Python Pandas Tutorial

Python Pandas - Feather File Format



The Feather file format gives Pandas a fast and efficient way to store and retrieve DataFrame data in binary form. It is optimized for high-performance I/O operations and is portable across different platforms.

What is the Feather File Format?

Feather is a binary columnar file format designed for efficient storage and fast retrieval of tabular data. It supports all Pandas data types, including extension types such as categorical and timezone-aware datetime types. The format is based on Apache Arrow's memory specification, which enables high-performance I/O operations.

Feather is also a language-independent format designed for efficient data exchange. It is supported in both Python and R, which makes it easy to share data between the two languages. Reading and writing are fast and use little memory.

Important Considerations

When working with Feather files in Pandas, you need to keep the following points in mind −

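  • Index Storage: Feather does not store a DataFrame's Index or MultiIndex. According to the Pandas documentation, saving a DataFrame with a non-default index raises an error. Use the reset_index() method to keep the index as a regular column, or reset_index(drop=True) to discard it.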

  • Unique Column Names: Duplicate or non-string column names are not supported.

  • Object Data Types: Columns of object data type that hold arbitrary Python objects are not supported and raise an error during serialization. Object columns that contain only strings can be saved.

Saving a Pandas DataFrame as a Feather File

To save a Pandas object to a Feather file, use the DataFrame.to_feather() method, which writes the DataFrame's data to a file in Feather format.

Note: Before saving or reading a Feather file, make sure the 'pyarrow' library is installed. It is an optional dependency of Pandas and can be installed with the following command −

pip install pyarrow

Example

The following example uses the to_feather() method to save a Pandas DataFrame to a Feather file.

import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
    "a": list("abc"),
    "b": list(range(1, 4)),
    "c": np.arange(3, 6).astype("u1"),
    "d": np.arange(4.0, 7.0),
    "e": [True, False, True],
    "f": pd.Categorical(list("abc")),
    "g": pd.date_range("20240101", periods=3),
})

print("Original DataFrame:")
print(df)

# Save the DataFrame as a feather file
df.to_feather("df_feather_file.feather")

print("\nDataFrame is successfully saved as a feather file.")

When we run the above program, it produces the following result −

Original DataFrame:
   a  b  c    d      e  f          g
0  a  1  3  4.0   True  a 2024-01-01
1  b  2  4  5.0  False  b 2024-01-02
2  c  3  5  6.0   True  c 2024-01-03

DataFrame is successfully saved as a feather file.

Reading a Feather File into Pandas

To load data from a Feather file into a Pandas object, use the read_feather() method. It accepts options for customizing how the data is read, such as the columns parameter for loading only a subset of columns.

Example

This example saves a DataFrame to a Feather file and reads it back using the Pandas read_feather() method.

import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame({
    "a": list("abc"),
    "b": list(range(1, 4)),
    "c": np.arange(3, 6).astype("u1"),
    "d": np.arange(4.0, 7.0),
    "e": [True, False, True],
    "f": pd.Categorical(list("abc")),
    "g": pd.date_range("20240101", periods=3),
})

# Save the DataFrame as a feather file
df.to_feather("df_feather_file.feather")

# Load the feather file
result = pd.read_feather("df_feather_file.feather")

# Display the DataFrame
print(result)

# Verify data types
print("\nData Type of the each column:")
print(result.dtypes)

While executing the above code we get the following output −

   a  b  c    d      e  f          g
0  a  1  3  4.0   True  a 2024-01-01
1  b  2  4  5.0  False  b 2024-01-02
2  c  3  5  6.0   True  c 2024-01-03

Data Type of the each column:
a            object
b             int64
c             uint8
d           float64
e              bool
f          category
g    datetime64[ns]
dtype: object

Handling Feather Files in Memory

In-memory files in Python store data in RAM instead of reading from or writing to a disk, which avoids the high latency of physical I/O operations. The Python ecosystem offers several kinds of in-memory or memory-backed files, some in the standard library and some in third-party packages, including −

  • Memory-mapped files

  • StringIO

  • BytesIO

  • MemoryFS

Example

This example demonstrates saving and loading a DataFrame in Feather format in memory, using the to_feather() and read_feather() methods with an io.BytesIO object as the in-memory binary buffer.

import pandas as pd
import io

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": range(5, 10)})

print("Original DataFrame:")
print(df)

# Save the DataFrame as In-Memory feather
buf = io.BytesIO()
df.to_feather(buf)

# Rewind the buffer so reading starts from the first byte
buf.seek(0)

# Read the DataFrame from the in-memory buffer
loaded_df = pd.read_feather(buf)

print("\nDataFrame Loaded from In-Memory Feather:")
print(loaded_df)

Following is the output of the above code −

Original DataFrame:
   Col_1  Col_2
0      0      5
1      1      6
2      2      7
3      3      8
4      4      9

DataFrame Loaded from In-Memory Feather:
   Col_1  Col_2
0      0      5
1      1      6
2      2      7
3      3      8
4      4      9