pandas.read_parquet
pandas.read_parquet(path, engine='auto', columns=None, storage_options=None, use_nullable_dtypes=<no_default>, dtype_backend=<no_default>, filesystem=None, filters=None, **kwargs)
Load a parquet object from the file path, returning a DataFrame.
- Parameters:
- path : str, path object or file-like object
  String, path object (implementing `os.PathLike[str]`), or file-like object implementing a binary `read()` function. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: `file://localhost/path/to/table.parquet`. A file URL can also be a path to a directory that contains multiple partitioned parquet files. Both pyarrow and fastparquet support paths to directories as well as file URLs. A directory path could be: `file://localhost/path/to/tables` or `s3://bucket/partition_dir`.
- engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
  Parquet library to use. If 'auto', then the option `io.parquet.engine` is used. The default `io.parquet.engine` behavior is to try 'pyarrow', falling back to 'fastparquet' if 'pyarrow' is unavailable.
  When using the `'pyarrow'` engine, no storage options are provided, and a filesystem is implemented by both `pyarrow.fs` and `fsspec` (e.g. "s3://"), then the `pyarrow.fs` filesystem is attempted first. Use the `filesystem` keyword with an instantiated fsspec filesystem if you wish to use its implementation.
- columns : list, default None
  If not None, only these columns will be read from the file.
- storage_options : dict, optional
  Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to `urllib.request.Request` as header options. For other URLs (e.g. starting with "s3://" and "gcs://") the key-value pairs are forwarded to `fsspec.open`. Please see the `fsspec` and `urllib` documentation for more details and examples on storage options.
  Added in version 1.3.0.
- use_nullable_dtypes : bool, default False
  If True, use dtypes that use `pd.NA` as missing value indicator for the resulting DataFrame (only applicable for the `pyarrow` engine). As new dtypes are added that support `pd.NA` in the future, the output with this option will change to use those dtypes. Note: this is an experimental option, and behaviour (e.g. additional support dtypes) may change without notice.
  Deprecated since version 2.0.
- dtype_backend : {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'
  Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:
  - "numpy_nullable": returns nullable-dtype-backed DataFrame (default).
  - "pyarrow": returns pyarrow-backed nullable `ArrowDtype` DataFrame.
  Added in version 2.0.
- filesystem : fsspec or pyarrow filesystem, default None
  Filesystem object to use when reading the parquet file. Only implemented for `engine="pyarrow"`.
  Added in version 2.1.0.
- filters : List[Tuple] or List[List[Tuple]], default None
  To filter out data. Filter syntax: `[[(column, op, val), …], …]` where op is one of `==`, `=`, `>`, `>=`, `<`, `<=`, `!=`, `in`, `not in`. The innermost tuples are transposed into a set of filters applied through an AND operation. The outer list combines these sets of filters through an OR operation. A single list of tuples can also be used, meaning that no OR operation between sets of filters is to be conducted.
  Using this argument will NOT result in row-wise filtering of the final partitions unless `engine="pyarrow"` is also specified. For other engines, filtering is only performed at the partition level, that is, to prevent the loading of some row-groups and/or files.
  Added in version 2.1.0.
- **kwargs
Any additional kwargs are passed to the engine.
- Returns:
- DataFrame
See also
DataFrame.to_parquet : Create a parquet object that serializes a DataFrame.
Examples
>>> original_df = pd.DataFrame(
...     {"foo": range(5), "bar": range(5, 10)}
... )
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> df_parquet_bytes = original_df.to_parquet()
>>> from io import BytesIO
>>> restored_df = pd.read_parquet(BytesIO(df_parquet_bytes))
>>> restored_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> restored_df.equals(original_df)
True
>>> restored_bar = pd.read_parquet(BytesIO(df_parquet_bytes), columns=["bar"])
>>> restored_bar
   bar
0    5
1    6
2    7
3    8
4    9
>>> restored_bar.equals(original_df[["bar"]])
True
The function uses kwargs that are passed directly to the engine. In the following example, we use the filters argument of the pyarrow engine to filter the rows of the DataFrame.
Since pyarrow is the default engine, we can omit the engine argument. Note that the filters argument is implemented by the pyarrow engine, which can benefit from multithreading and also potentially be more economical in terms of memory.
>>> sel = [("foo", ">", 2)]
>>> restored_part = pd.read_parquet(BytesIO(df_parquet_bytes), filters=sel)
>>> restored_part
   foo  bar
0    3    8
1    4    9