pyarrow.dataset.parquet_dataset

pyarrow.dataset.parquet_dataset(metadata_path, schema=None, filesystem=None, format=None, partitioning=None, partition_base_dir=None)

Create a FileSystemDataset from a _metadata file created via pyarrow.parquet.write_metadata.

Parameters:

metadata_path (path): Path pointing to a single-file Parquet metadata file.
schema (Schema, optional): Optionally provide the Schema for the Dataset, in which case it will not be inferred from the source.
filesystem (FileSystem or URI str, default None): If a single path is given as source and filesystem is None, then the filesystem will be inferred from the path. If a URI string is passed, then a filesystem object is constructed using the URI’s optional path component as a directory prefix. Note that URIs on Windows must follow the ‘file:///C:…’ or ‘file:/C:…’ patterns.
format (ParquetFileFormat): An instance of ParquetFileFormat, if special options need to be passed.
partitioning (Partitioning, PartitioningFactory, str, or list of str): The partitioning scheme specified with the partitioning() function. A flavor string can be used as a shortcut, and with a list of field names a DirectoryPartitioning will be inferred.
partition_base_dir (str, optional): For the purposes of applying the partitioning, paths will be stripped of the partition_base_dir. Files not matching the partition_base_dir prefix will be skipped for partitioning discovery. The ignored files will still be part of the Dataset, but will not have partition information.

Returns:

FileSystemDataset: The dataset corresponding to the given metadata
