# Dataset
## Factory functions
| `dataset` | Open a dataset. |
| `parquet_dataset` | Create a FileSystemDataset from a `_metadata` file created via `pyarrow.parquet.write_metadata`. |
| `partitioning` | Specify a partitioning scheme. |
| `field` | Reference a column of the dataset. |
| `scalar` | Expression representing a scalar value. |
| `write_dataset` | Write a dataset to a given format and partitioning. |
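Taken together, these factories cover the common round trip. A minimal sketch, assuming a hypothetical `data/` output directory, that writes a hive-partitioned Parquet dataset and reads it back with a filter built from `field()` and `scalar()`:

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Build a small in-memory table and write it as a hive-partitioned
# Parquet dataset ("data/" is a hypothetical output directory).
table = pa.table({"year": [2022, 2022, 2023], "value": [1.0, 2.0, 3.0]})
ds.write_dataset(
    table,
    "data/",
    format="parquet",
    partitioning=ds.partitioning(pa.schema([("year", pa.int64())]), flavor="hive"),
)

# Reopen it, letting dataset() discover the hive-style directories,
# then filter with an Expression built from field() and scalar().
dataset = ds.dataset("data/", format="parquet", partitioning="hive")
table_2023 = dataset.to_table(filter=ds.field("year") == ds.scalar(2023))
```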
## Classes
| `CsvFileFormat` | FileFormat for CSV files. |
| `CsvFragmentScanOptions` | Scan-specific options for CSV fragments. |
| `JsonFileFormat` | FileFormat for JSON files. |
| `ParquetFileFormat` | FileFormat for Parquet. |
| `ParquetReadOptions` | Parquet format specific options for reading. |
| `ParquetFragmentScanOptions` | Scan-specific options for Parquet fragments. |
| `ParquetFileFragment` | A Fragment representing a parquet file. |
| `DirectoryPartitioning` | A Partitioning based on a specified Schema. |
| `HivePartitioning` | A Partitioning for "/$key=$value/" nested directories as found in Apache Hive. |
| `FilenamePartitioning` | A Partitioning based on a specified Schema. |
| `Dataset` | Collection of data fragments and potentially child datasets. |
| `FileSystemDataset` | A Dataset of file fragments. |
| `FileSystemFactoryOptions` | Influences the discovery of filesystem paths. |
| `FileSystemDatasetFactory` | Create a DatasetFactory from a list of paths with schema inspection. |
| `UnionDataset` | A Dataset wrapping child datasets. |
| `Fragment` | Fragment of data from a Dataset. |
| `FragmentScanOptions` | Scan options specific to a particular fragment and scan operation. |
| `TaggedRecordBatch` | A combination of a record batch and the fragment it came from. |
| `Scanner` | A materialized scan operation with context and options bound. |
| `Expression` | A logical expression to be evaluated against some input. |
| `InMemoryDataset` | A Dataset wrapping in-memory data. |
| `WrittenFile` | Metadata information about files written as part of a dataset write operation. |
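Most of these classes are returned by the factory functions rather than constructed directly. A short sketch, reusing the hypothetical `data/` dataset from above, of how `Fragment`, `Scanner`, and `TaggedRecordBatch` relate:

```python
import pyarrow.dataset as ds

# Reuse the hypothetical "data/" dataset written above.
dataset = ds.dataset("data/", format="parquet", partitioning="hive")

# get_fragments() yields one ParquetFileFragment per file; its
# partition_expression records the hive keys as an Expression.
for fragment in dataset.get_fragments():
    print(fragment.path, fragment.partition_expression)

# A Scanner binds columns, filter, and batch size to the dataset.
# scan_batches() yields TaggedRecordBatch objects, pairing each
# RecordBatch with the Fragment it was read from.
scanner = dataset.scanner(columns=["value"], batch_size=64_000)
for tagged in scanner.scan_batches():
    batch = tagged.record_batch   # the data
    source = tagged.fragment      # the fragment it came from
```

Working at the fragment level keeps scans streaming: batches are materialized one at a time rather than loading the whole dataset into memory.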
## Helper functions
| `get_partition_keys` | Extract partition keys (equality constraints between a field and a scalar) from an expression as a dict mapping the field's name to its value. |
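For example, again against the hypothetical `data/` dataset, `get_partition_keys` recovers the keys baked into a fragment's `partition_expression`:

```python
import pyarrow.dataset as ds

# Assuming the hypothetical hive-partitioned "data/" dataset from above.
dataset = ds.dataset("data/", format="parquet", partitioning="hive")
for fragment in dataset.get_fragments():
    print(ds.get_partition_keys(fragment.partition_expression))
    # e.g. {'year': 2022}
```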

