pyarrow.dataset.parquet_dataset
- pyarrow.dataset.parquet_dataset(metadata_path, schema=None, filesystem=None, format=None, partitioning=None, partition_base_dir=None)
Create a FileSystemDataset from a _metadata file created via pyarrow.parquet.write_metadata.
- Parameters:
- metadata_path
path
Path pointing to a single-file Parquet metadata file.
- schema
Schema, optional
Optionally provide the Schema for the Dataset, in which case it will not be inferred from the source.
- filesystem
FileSystem or URI str, default None
If a single path is given as source and filesystem is None, then the filesystem will be inferred from the path. If a URI string is passed, then a filesystem object is constructed using the URI’s optional path component as a directory prefix. Note that URIs on Windows must follow the ‘file:///C:…’ or ‘file:/C:…’ patterns.
- format
ParquetFileFormat
An instance of a ParquetFileFormat if special options need to be passed.
- partitioning
Partitioning, PartitioningFactory, str, list of str
The partitioning scheme specified with the partitioning() function. A flavor string can be used as a shortcut, and with a list of field names a DirectoryPartitioning will be inferred.
- partition_base_dir
str, optional
For the purposes of applying the partitioning, paths will be stripped of the partition_base_dir. Files not matching the partition_base_dir prefix will be skipped for partitioning discovery. The ignored files will still be part of the Dataset, but will not have partition information.
- Returns:
FileSystemDataset
The dataset corresponding to the given metadata.

