pyarrow.dataset.FileSystemDatasetFactory

class pyarrow.dataset.FileSystemDatasetFactory(FileSystem filesystem, paths_or_selector, FileFormat format, FileSystemFactoryOptions options=None)

Bases: DatasetFactory

Create a DatasetFactory from a list of paths with schema inspection.

Parameters:
filesystem : pyarrow.fs.FileSystem

Filesystem in which to discover files.

paths_or_selector : pyarrow.fs.FileSelector or list of path-likes

Either a FileSelector object or a list of path-like objects.

format : FileFormat

Currently only ParquetFileFormat and IpcFileFormat are supported.

options : FileSystemFactoryOptions, optional

Various flags influencing the discovery of filesystem paths.

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

finish(self, Schema schema=None)

Create a Dataset using the inspected schema or an explicit schema (if given).

inspect(self, *[, promote_options, fragments])

Inspect data fragments and return a common Schema.

inspect_schemas(self)

Attributes

finish(self, Schema schema=None)

Create a Dataset using the inspected schema or an explicit schema (if given).

Parameters:
schema : Schema, default None

The schema to conform the source to. If None, the inspected schema is used.

Returns:
Dataset
inspect(self, *, promote_options='default', fragments=None)

Inspect data fragments and return a common Schema.

Parameters:
promote_options : str, default “default”

Control how to unify types. Accepts strings “default” and “permissive”. Default: types must match exactly, except nulls can be merged with other types. Permissive: types are promoted when possible.

fragments : int, default None

How many fragments should be inspected to infer the unified schema. Use None to inspect all fragments.

Returns:
Schema
inspect_schemas(self)
root_partition