pyarrow.orc.ORCFile
- class pyarrow.orc.ORCFile(source)
Bases: object
Reader interface for a single ORC file
- Parameters:
- source : str or pyarrow.NativeFile
Readable source. For passing Python file objects or byte buffers, see pyarrow.io.PythonFileInterface or pyarrow.io.BufferReader.
Methods
__init__(source)
read([columns])            Read the whole file.
read_stripe(n[, columns])  Read a single stripe from the file.
Attributes
compression               Compression codec of the file
compression_size          Number of bytes to buffer for the compression codec in the file
content_length            Length of the data stripes in the file in bytes
file_footer_length        The number of compressed bytes in the file footer
file_length               The number of bytes in the file
file_postscript_length    The number of bytes in the file postscript
file_version              Format version of the ORC file, must be 0.11 or 0.12
metadata                  The file metadata, as an arrow KeyValueMetadata
nrows                     The number of rows in the file
nstripe_statistics        Number of stripe statistics
nstripes                  The number of stripes in the file
row_index_stride          Number of rows per entry in the row index, or 0 if there is no row index
schema                    The file schema, as an arrow schema
software_version          Software instance and version that wrote this file
stripe_statistics_length  The number of compressed bytes in the file stripe statistics
writer                    Name of the writer that wrote this file.
writer_version            Version of the writer
- property compression
Compression codec of the file
- property compression_size
Number of bytes to buffer for the compression codec in the file
- property content_length
Length of the data stripes in the file in bytes
- property file_footer_length
The number of compressed bytes in the file footer
- property file_length
The number of bytes in the file
- property file_postscript_length
The number of bytes in the file postscript
- property file_version
Format version of the ORC file, must be 0.11 or 0.12
- property metadata
The file metadata, as an arrow KeyValueMetadata
- property nrows
The number of rows in the file
- property nstripe_statistics
Number of stripe statistics
- property nstripes
The number of stripes in the file
- read(columns=None)
Read the whole file.
- Parameters:
- columns : list
If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’. Output always follows the ordering of the file and not the columns list.
- Returns:
pyarrow.Table
Content of the file as a Table.
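- read_stripe(n, columns=None)
Read a single stripe from the file.
- Parameters:
- n : int
The stripe index.
- columns : list
If not None, only these columns will be read from the stripe. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.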
- Returns:
pyarrow.RecordBatch
Content of the stripe as a RecordBatch.
- property row_index_stride
Number of rows per entry in the row index, or 0 if there is no row index
- property schema
The file schema, as an arrow schema
- property software_version
Software instance and version that wrote this file
- property stripe_statistics_length
The number of compressed bytes in the file stripe statistics
- property writer
Name of the writer that wrote this file. If the writer is unknown, its Writer ID (a number) is returned.
- property writer_version
Version of the writer