pyarrow.orc.ORCWriter
- class pyarrow.orc.ORCWriter(where, *, file_version='0.12', batch_size=1024, stripe_size=67108864, compression='uncompressed', compression_block_size=65536, compression_strategy='speed', row_index_stride=10000, padding_tolerance=0.0, dictionary_key_size_threshold=0.0, bloom_filter_columns=None, bloom_filter_fpp=0.05)
Bases: object
Writer interface for a single ORC file.
- Parameters:
- where
str or pyarrow.io.NativeFile Writable target. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface, pyarrow.io.BufferOutputStream or pyarrow.io.FixedSizeBufferWriter.
- file_version{“0.11”, “0.12”}, default “0.12”
Determines which ORC file version to use. Hive 0.11 / ORC v0 is the older version, while Hive 0.12 / ORC v1 is the newer one.
- batch_size
int, default 1024 Number of rows the ORC writer writes at a time.
- stripe_size
int, default 64 * 1024 * 1024 Size of each ORC stripe in bytes.
- compression
str, default ‘uncompressed’ The compression codec. Valid values: {‘UNCOMPRESSED’, ‘SNAPPY’, ‘ZLIB’, ‘LZ4’, ‘ZSTD’}. Note that LZO is currently not supported.
- compression_block_size
int, default 64 * 1024 Size of each compression block in bytes.
- compression_strategy
str, default ‘speed’ The compression strategy, i.e. speed vs. size reduction. Valid values: {‘SPEED’, ‘COMPRESSION’}
- row_index_stride
int, default 10000 The row index stride, i.e. the number of rows per entry in the row index.
- padding_tolerance
double, default 0.0 The padding tolerance.
- dictionary_key_size_threshold
double, default 0.0 The dictionary key size threshold. 0 to disable dictionary encoding. 1 to always enable dictionary encoding.
- bloom_filter_columns
None, set-like or list-like, default None Columns that use the bloom filter.
- bloom_filter_fpp
double, default 0.05 Upper limit of the false-positive rate of the bloom filter.
- __init__(where, *, file_version='0.12', batch_size=1024, stripe_size=67108864, compression='uncompressed', compression_block_size=65536, compression_strategy='speed', row_index_stride=10000, padding_tolerance=0.0, dictionary_key_size_threshold=0.0, bloom_filter_columns=None, bloom_filter_fpp=0.05)
Methods
__init__(where, *[, file_version, ...])
close() Close the ORC file.
write(table) Write the table into an ORC file.
Attributes
- is_open = False
- write(table)
Write the table into an ORC file. The schema of the table must be equal to the schema used when opening the ORC file.
- Parameters:
- table
pyarrow.Table The table to be written into the ORC file

