Python Pandas Tutorial

Python Pandas - Working with HDF5 Format



When working with large datasets, we may run into "out of memory" errors. An optimized storage format such as HDF5 helps avoid these problems. The pandas library offers the HDFStore class and the read/write APIs to store, retrieve, and manipulate data while keeping memory usage and retrieval time low.

HDF5, which stands for Hierarchical Data Format version 5, is an open-source file format designed to store large, complex, and heterogeneous data efficiently. It organizes data in a hierarchical structure similar to a file system, with groups acting like directories and datasets functioning as files. Because it can hold different types of data (such as arrays, images, tables, and documents) in the same file, it is well suited to managing heterogeneous data.

Creating an HDF5 file using HDFStore in Pandas

The HDFStore class in pandas is a dictionary-like object that reads and writes pandas data in the HDF5 format using the PyTables library.

Example

Here is an example demonstrating how to create an HDF5 file in Pandas using the pandas.HDFStore class.

    import pandas as pd
    import numpy as np

    # Create the store using the HDFStore class
    store = pd.HDFStore("store.h5")

    # Display the store
    print(store)

    # It is important to close the store after use
    store.close()

Following is the output of the above code −

    <class 'pandas.io.pytables.HDFStore'>
    File path: store.h5

Note: To work with the HDF5 format in pandas, you need the PyTables library (published on PyPI as tables). It is an optional dependency of pandas and must be installed separately using one of the following commands −

    # Using pip
    pip install tables

    # or using the conda installer
    conda install pytables
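Before running the examples, it can help to confirm that PyTables is importable. The following is a minimal sketch using only the standard library; the helper name pytables_available is hypothetical and chosen for illustration:

```python
import importlib.util


def pytables_available() -> bool:
    """Return True if the `tables` module (PyTables) can be imported."""
    # find_spec locates the module without importing it
    return importlib.util.find_spec("tables") is not None


if pytables_available():
    print("PyTables is installed; HDF5 support in pandas should work.")
else:
    print("PyTables is missing; install it with pip or conda first.")
```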

Writing and Reading Data with HDFStore in Pandas

HDFStore is a dict-like object, so we can write and read data in the HDF5 store directly using key-value pairs.

Example

The example below demonstrates how to write data to and read data from the HDF5 file using HDFStore in Pandas.

    import pandas as pd
    import numpy as np

    # Create the store
    store = pd.HDFStore("store.h5")

    # Create the data
    index = pd.date_range("1/1/2024", periods=8)
    s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
    df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=["A", "B", "C"])

    # Write Pandas data to the store, which is equivalent to store.put('s', s)
    store["s"] = s
    store["df"] = df

    # Read data from the store, which is equivalent to store.get('df')
    from_store = store["df"]
    print('Retrieved Data From the HDFStore:\n', from_store)

    # Close the store after use
    store.close()

Following is the output of the above code (the values differ between runs because the data is random) −

    Retrieved Data From the HDFStore:
                       A         B         C
    2024-01-01  0.200467  0.341899  0.105715
    2024-01-02 -0.379214  1.527714  0.186246
    2024-01-03 -0.418122  1.008820  1.331104
    2024-01-04  0.146418  0.587433 -0.750389
    2024-01-05 -0.556524 -0.551443 -0.161225
    2024-01-06 -0.214145 -0.722693  0.072083
    2024-01-07  0.631878 -0.521474 -0.769847
    2024-01-08 -0.361999  0.435252  1.177110
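The dict-like behaviour goes beyond item assignment. The sketch below, which assumes PyTables is installed, shows membership tests, key listing, deletion, and slash-separated keys that map onto the hierarchical groups of HDF5. The file name demo_store.h5 and the key experiments/run1/df are hypothetical, chosen only for illustration:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file name in a temporary directory, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "demo_store.h5")

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["A", "B"])
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# The context manager closes the store even if an error occurs
with pd.HDFStore(path) as store:
    store["s"] = s
    # Keys may contain slashes; each path segment becomes an HDF5 group
    store["experiments/run1/df"] = df

    keys = sorted(store.keys())   # keys are reported with a leading "/"
    has_s = "s" in store          # membership test, as with a dict

    # Remove an entry, as with a dict
    del store["s"]
    keys_after_delete = store.keys()

    restored = store["experiments/run1/df"]

print(keys)
print(has_s)
print(keys_after_delete)
print(restored)
```

Using the store as a context manager avoids having to remember the explicit close() call.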

Reading and Writing HDF5 Files Using Pandas APIs

Pandas also provides high-level APIs that simplify working with HDF5 files. These APIs let you read and write data directly to and from HDF5 files without manually creating an HDFStore object. The primary APIs for handling HDF5 files in pandas are to_hdf() for writing and read_hdf() for reading.

Writing Pandas Data to HDF5 Using to_hdf()

The to_hdf() function allows you to write pandas objects such as DataFrames and Series directly to an HDF5 file; it uses HDFStore internally. It takes optional parameters that control compression (complevel, complib), the handling of missing values (dropna, nan_rep), the storage format (format), and more, allowing you to store your data efficiently.

Example

This example uses the DataFrame.to_hdf() function to write data to an HDF5 file.

    import pandas as pd
    import numpy as np

    # Create a DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

    # Write data to an HDF5 file using to_hdf()
    df.to_hdf("data_store.h5", key="df", mode="w", format="table")

    print("Data successfully written to HDF5 file")

Following is the output of the above code −

Data successfully written to HDF5 file
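To illustrate the compression parameters of to_hdf(), the following sketch writes the same DataFrame twice, once without compression and once with complevel=9 and complib="zlib", and compares the file sizes. It assumes PyTables is installed; the file names and the deliberately repetitive data are made up for illustration, and the size reduction on real data depends on how compressible it is:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file names in a temporary directory
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "plain.h5")
packed_path = os.path.join(tmp_dir, "packed.h5")

# Highly repetitive data, chosen so that compression has something to gain
df = pd.DataFrame({"A": np.zeros(100_000), "B": np.ones(100_000)})

# Same data, written without and with compression
df.to_hdf(plain_path, key="df", mode="w", format="table")
df.to_hdf(packed_path, key="df", mode="w", format="table",
          complevel=9, complib="zlib")

print("uncompressed bytes:", os.path.getsize(plain_path))
print("compressed bytes:  ", os.path.getsize(packed_path))

# Compression is transparent when reading the data back
roundtrip = pd.read_hdf(packed_path, key="df")
print(roundtrip.shape)
```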

Reading Data from HDF5 Using read_hdf()

The pandas.read_hdf() function retrieves a pandas object stored in an HDF5 file. It accepts the path of the file to read (as a string or path object) or an already open HDFStore, along with the key of the object to retrieve.

Example

This example demonstrates how to read data stored under the key "df" from the HDF5 file "data_store.h5" using the pd.read_hdf() function.

    import pandas as pd

    # Read data from the HDF5 file using read_hdf()
    retrieved_df = pd.read_hdf("data_store.h5", key="df")

    # Display the retrieved data
    print("Retrieved Data:\n", retrieved_df.head())

Following is the output of the above code −

    Retrieved Data:
       A  B
    x  1  4
    y  2  5
    z  3  6
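read_hdf() can also load only part of a stored table, which helps when the full dataset does not fit in memory. The sketch below assumes PyTables is installed and that the data was written with format="table"; the file name and column values are hypothetical. The where, columns, start, and stop arguments are parameters of read_hdf(), and data_columns is what makes column "A" usable in a where condition:

```python
import os
import tempfile

import pandas as pd

# Hypothetical file name in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "query_store.h5")

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "C": [0.1, 0.2, 0.3, 0.4],
})

# data_columns makes "A" queryable on disk; this requires format="table"
df.to_hdf(path, key="df", mode="w", format="table", data_columns=["A"])

# Load only the rows where A > 2, and only columns A and B
subset = pd.read_hdf(path, key="df", where="A > 2", columns=["A", "B"])
print(subset)

# Load only the first two rows by position
first_rows = pd.read_hdf(path, key="df", start=0, stop=2)
print(first_rows)
```

Queries of this kind are not available for objects stored in the default fixed format.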

Appending Data to HDF5 Files Using to_hdf()

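Data can be appended to an existing HDF5 file by calling to_hdf() with mode="a", which opens the file without overwriting it, together with append=True, which adds the new rows to the object already stored under the given key. Appending requires the table format. This is useful when you want to add new data to a file without overwriting the existing content.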

Example

This example demonstrates how to append data to an existing HDF5 file using the to_hdf() function.

    import pandas as pd
    import numpy as np

    # Create a DataFrame to append
    df_new = pd.DataFrame({'A': [7, 8], 'B': [1, 1]}, index=['i', 'j'])

    # Append the new data to the existing HDF5 file
    df_new.to_hdf("data_store.h5", key="df", mode="a", format="table", append=True)

    print("Data successfully appended")

    # Now read data from the HDF5 file using read_hdf()
    retrieved_df = pd.read_hdf("data_store.h5", key='df')

    # Display the retrieved data
    print("Retrieved Data:\n", retrieved_df.head())

Following is the output of the above code −

    Data successfully appended
    Retrieved Data:
       A  B
    x  1  4
    y  2  5
    z  3  6
    i  7  1
    j  8  1
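Appending is also a way to build a large file piece by piece, so that the whole dataset never has to be held in memory at once. The following is a minimal sketch under that assumption, with PyTables installed; the file name, chunk size, and generated values are hypothetical stand-ins for data arriving in batches:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file name in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "chunked_store.h5")

# Write three chunks of three rows each; the first call creates the table
for start in range(0, 9, 3):
    rows = np.arange(start, start + 3)
    chunk = pd.DataFrame({"A": rows, "B": rows * 10}, index=rows)
    chunk.to_hdf(path, key="df", mode="a", format="table", append=True)

# The stored table now holds all chunks in the order they were appended
combined = pd.read_hdf(path, key="df")
print(combined)
```

Every appended chunk must have the same columns as the table that already exists under the key.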