pandas.DataFrame.to_parquet

DataFrame.to_parquet(path=None, *, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)

Write a DataFrame to the binary parquet format.

This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.

Parameters:
path : str, path object, file-like object, or None, default None

String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. If None, the result is returned as bytes. If a string or path, it will be used as the root directory path when writing a partitioned dataset.
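
For example, passing no path returns the raw parquet bytes, which can then be written to disk or sent elsewhere (a minimal sketch, assuming a parquet engine such as pyarrow is installed):

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2]})
>>> raw = df.to_parquet()  # path=None, so the content is returned as bytes
>>> isinstance(raw, bytes)
True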

engine : {‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’

Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
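
A short sketch of pinning the engine explicitly rather than relying on ‘auto’, assuming the named library is installed (file names are illustrative):

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2]})
>>> df.to_parquet('df_pa.parquet', engine='pyarrow')      # requires pyarrow
>>> df.to_parquet('df_fp.parquet', engine='fastparquet')  # requires fastparquet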

compression : str or None, default ‘snappy’

Name of the compression to use. Use None for no compression. Supported options: ‘snappy’, ‘gzip’, ‘brotli’, ‘lz4’, ‘zstd’.
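
For instance, to pick a different codec or skip compression entirely (codec availability can depend on how the engine was built; file names are illustrative):

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2]})
>>> df.to_parquet('df.parquet.zstd', compression='zstd')  # zstd codec
>>> df.to_parquet('df.parquet', compression=None)         # no compression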

index : bool, default None

If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to True, the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.
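
A short sketch of the effect of index=False: the original index is dropped, and reading the file back yields a fresh default RangeIndex (file name is illustrative):

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2]}, index=['a', 'b'])
>>> df.to_parquet('no_index.parquet', index=False)  # 'a'/'b' labels are discarded
>>> pd.read_parquet('no_index.parquet').index
RangeIndex(start=0, stop=2, step=1)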

partition_cols : list, optional, default None

Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.
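
For example, partitioning by a column writes a directory tree with one subdirectory per value (Hive-style key=value paths when using the pyarrow engine; the directory name below is illustrative):

>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2023, 2023, 2024],
...                    'value': [1, 2, 3]})
>>> df.to_parquet('dataset_root', partition_cols=['year'])
>>> # creates e.g. dataset_root/year=2023/... and dataset_root/year=2024/...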

storage_options : dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://” and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and see the pandas user guide for more examples on storage options.
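
A sketch of writing directly to object storage. This assumes s3fs is installed; the bucket name is illustrative, and the 'key'/'secret' option names follow fsspec/s3fs conventions:

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2]})
>>> df.to_parquet(
...     's3://my-bucket/df.parquet',
...     storage_options={'key': 'ACCESS_KEY', 'secret': 'SECRET_KEY'},
... )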

**kwargs

Additional arguments passed to the parquet library. See pandas io for more details.
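
For example, with the pyarrow engine the extra keyword arguments are forwarded to pyarrow.parquet.write_table, so engine-specific options such as row_group_size can be passed through (file name is illustrative):

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': range(1_000_000)})
>>> df.to_parquet('df.parquet', engine='pyarrow', row_group_size=100_000)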

Returns:
bytes if no path argument is provided, else None

See also

read_parquet

Read a parquet file.

DataFrame.to_orc

Write an orc file.

DataFrame.to_csv

Write a csv file.

DataFrame.to_sql

Write to a sql table.

DataFrame.to_hdf

Write to hdf.

Notes

This function requires either the fastparquet or pyarrow library.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')
>>> pd.read_parquet('df.parquet.gzip')
   col1  col2
0     1     3
1     2     4

If you want to get a buffer to the parquet content you can use an io.BytesIO object, as long as you don’t use partition_cols, which creates multiple files.

>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()
