pyarrow.parquet.SortingColumn#

class pyarrow.parquet.SortingColumn(int column_index, bool descending=False, bool nulls_first=False)#

Bases: object

Sorting specification for a single column.

Returned by RowGroupMetaData.sorting_columns and used in ParquetWriter to specify the sort order of the data.

Parameters:
column_index : int

Index of column that data is sorted by.

descending : bool, default False

Whether column is sorted in descending order.

nulls_first : bool, default False

Whether null values appear before valid values.

Notes

Column indices are zero-based, refer only to leaf fields, and are in depth-first order. This may make the column indices for nested schemas different from what you expect. In most cases, it will be easier to specify the sort order using column names instead of column indices and converting using the from_ordering method.
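For instance (a minimal sketch with a hypothetical nested schema), a struct with two children occupies leaf indices 0 and 1, so a top-level column that follows it is leaf index 2, not 1:

>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> nested_schema = pa.schema([
...     ('meta', pa.struct([('version', pa.int32()), ('source', pa.string())])),
...     ('value', pa.float64()),
... ])
>>> # Leaf fields in depth-first order: meta.version (0), meta.source (1),
>>> # value (2). Sorting by 'value' therefore needs column index 2;
>>> # from_ordering(nested_schema, [('value', 'ascending')]) would compute
>>> # this mapping from the column name instead.
>>> pq.SortingColumn(2)
SortingColumn(column_index=2, descending=False, nulls_first=False)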

Examples

In other APIs, sort order is specified by names, such as:

>>> sort_order = [('id', 'ascending'), ('timestamp', 'descending')]

For Parquet, the column index must be used instead:

>>> import pyarrow.parquet as pq
>>> [pq.SortingColumn(0), pq.SortingColumn(1, descending=True)]
[SortingColumn(column_index=0, descending=False, nulls_first=False), SortingColumn(column_index=1, descending=True, nulls_first=False)]

Convert the sort_order into the list of sorting columns with from_ordering (note that the schema must be provided as well):

>>> import pyarrow as pa
>>> schema = pa.schema([('id', pa.int64()), ('timestamp', pa.timestamp('ms'))])
>>> sorting_columns = pq.SortingColumn.from_ordering(schema, sort_order)
>>> sorting_columns
(SortingColumn(column_index=0, descending=False, nulls_first=False), SortingColumn(column_index=1, descending=True, nulls_first=False))

Convert back to the sort order with to_ordering:

>>> pq.SortingColumn.to_ordering(schema, sorting_columns)
((('id', 'ascending'), ('timestamp', 'descending')), 'at_end')
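Sorting columns are typically attached to the file metadata when writing, so that readers can discover the order later. A minimal sketch, assuming a pyarrow version whose write_table and ParquetWriter accept a sorting_columns argument; note the argument is purely declarative, so the table must already be sorted in that order:

>>> table = pa.table({'id': [1, 2, 3],
...                   'timestamp': pa.array([3000, 2000, 1000], type=pa.timestamp('ms'))})
>>> pq.write_table(table, 'sorted_example.parquet', sorting_columns=sorting_columns)
>>> # Each written row group now records this order in its metadata,
>>> # retrievable via pq.read_metadata('sorted_example.parquet').row_group(0).sorting_columns.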
__init__(*args, **kwargs)#

Methods

__init__(*args, **kwargs)

from_ordering(cls, Schema schema, sort_keys)

Create a tuple of SortingColumn objects from the same arguments as pyarrow.compute.SortOptions.

to_dict(self)

Get dictionary representation of the SortingColumn.

to_ordering(Schema schema, sorting_columns)

Convert a tuple of SortingColumn objects to the same format as pyarrow.compute.SortOptions.

Attributes

column_index

"Index of column data is sorted by (int).

descending

Whether column is sorted in descending order (bool).

nulls_first

Whether null values appear before valid values (bool).

column_index#

Index of column data is sorted by (int).

descending#

Whether column is sorted in descending order (bool).

classmethod from_ordering(cls, Schema schema, sort_keys, null_placement='at_end')#

Create a tuple of SortingColumn objects from the same arguments as pyarrow.compute.SortOptions.

Parameters:
schema : Schema

Schema of the input data.

sort_keys : Sequence of (name, order) tuples

Names of field/column keys (str) to sort the input on, along with the order each field/column is sorted in. Accepted values for order are "ascending", "descending".

null_placement : {'at_start', 'at_end'}, default 'at_end'

Where null values should appear in the sort order.

Returns:
sorting_columns : tuple of SortingColumn
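A short sketch of the null_placement argument, reusing the flat schema from the examples above; 'at_start' should translate into nulls_first=True on every returned column:

>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> schema = pa.schema([('id', pa.int64()), ('timestamp', pa.timestamp('ms'))])
>>> pq.SortingColumn.from_ordering(schema, [('id', 'ascending')], null_placement='at_start')
(SortingColumn(column_index=0, descending=False, nulls_first=True),)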
nulls_first#

Whether null values appear before valid values (bool).

to_dict(self)#

Get dictionary representation of the SortingColumn.

Returns:
dict

Dictionary with a key for each attribute of this class.
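A small usage sketch (the keys mirror the attributes listed above; the exact key order shown here is an assumption):

>>> import pyarrow.parquet as pq
>>> pq.SortingColumn(0, descending=True).to_dict()
{'column_index': 0, 'descending': True, 'nulls_first': False}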

static to_ordering(Schema schema, sorting_columns)#

Convert a tuple of SortingColumn objects to the same format as pyarrow.compute.SortOptions.

Parameters:
schema : Schema

Schema of the input data.

sorting_columns : tuple of SortingColumn

Columns to sort the input on.

Returns:
sort_keys : tuple of (name, order) tuples

null_placement : {'at_start', 'at_end'}
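As a usage sketch mirroring the class-level example, the returned pair can be unpacked and passed to pyarrow.compute.SortOptions:

>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> import pyarrow.parquet as pq
>>> schema = pa.schema([('id', pa.int64()), ('timestamp', pa.timestamp('ms'))])
>>> sorting_columns = (pq.SortingColumn(0), pq.SortingColumn(1, descending=True))
>>> sort_keys, null_placement = pq.SortingColumn.to_ordering(schema, sorting_columns)
>>> sort_keys
(('id', 'ascending'), ('timestamp', 'descending'))
>>> null_placement
'at_end'
>>> options = pc.SortOptions(sort_keys=sort_keys, null_placement=null_placement)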