pyarrow.dataset.parquet_dataset

pyarrow.dataset.parquet_dataset(metadata_path, schema=None, filesystem=None, format=None, partitioning=None, partition_base_dir=None)

Create a FileSystemDataset from a _metadata file created via pyarrow.parquet.write_metadata.

Parameters:
metadata_path : path

Path pointing to a single Parquet metadata file.

schema : Schema, optional

Optionally provide the Schema for the Dataset, in which case it will not be inferred from the source.

filesystem : FileSystem or URI str, default None

If a single path is given as source and filesystem is None, then the filesystem will be inferred from the path. If a URI string is passed, then a filesystem object is constructed using the URI’s optional path component as a directory prefix. Note that URIs on Windows must follow the ‘file:///C:…’ or ‘file:/C:…’ patterns.

format : ParquetFileFormat

An instance of a ParquetFileFormat if special options need to be passed.

partitioning : Partitioning, PartitioningFactory, str, list of str

The partitioning scheme specified with the partitioning() function. A flavor string can be used as a shortcut, and with a list of field names a DirectoryPartitioning will be inferred.

partition_base_dir : str, optional

For the purposes of applying the partitioning, paths will be stripped of the partition_base_dir. Files not matching the partition_base_dir prefix will be skipped for partitioning discovery. The ignored files will still be part of the Dataset, but will not have partition information.

Returns:
FileSystemDataset

The dataset corresponding to the given metadata.