
Python Pandas - Working with HDF5 Format
Datasets that are too large to hold comfortably in memory can trigger "out of memory" errors. Storing them in an optimized on-disk format such as HDF5 helps, because the data can be written incrementally and read back in parts. The pandas library offers the HDFStore class and the read/write APIs read_hdf() and to_hdf() to store, retrieve, and manipulate such data while keeping memory usage and retrieval time under control.
HDF5 stands for Hierarchical Data Format version 5. It is an open-source file format designed to store large, complex, and heterogeneous data efficiently. It organizes data in a hierarchical structure similar to a file system, with groups acting like directories and datasets functioning as files. A single HDF5 file can therefore hold different types of data (such as arrays, images, tables, and documents) side by side.
Creating an HDF5 file using HDFStore in Pandas
The HDFStore class in pandas manages HDF5 files in a dictionary-like manner: it reads and writes pandas objects in the HDF5 format using the PyTables library.
Example
The following example demonstrates how to create an HDF5 file in Pandas using the pandas.HDFStore class.

    import pandas as pd
    import numpy as np

    # Create the store using the HDFStore class
    store = pd.HDFStore("store.h5")

    # Display the store
    print(store)

    # It is important to close the store after use
    store.close()

Following is the output of the above code:

    <class 'pandas.io.pytables.HDFStore'>
    File path: store.h5

Note: To work with the HDF5 format in pandas, you need the PyTables library. It is an optional dependency of pandas and must be installed separately using one of the following commands:

    # Using pip
    pip install tables

    # or using the conda installer
    conda install pytables

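Since a store has to be closed after use, one way to avoid forgetting the close() call is a with statement. The following is a minimal sketch, assuming PyTables is installed; the helper name store_is_closed_after_with and the key "numbers" are made up for illustration and are not part of pandas.

```python
import pandas as pd


def store_is_closed_after_with(path):
    """Open an HDF5 store in a with-block and report its open state
    inside the block and after leaving it."""
    with pd.HDFStore(path, mode="w") as store:
        # Hypothetical key name, chosen only for this sketch
        store["numbers"] = pd.Series([1, 2, 3])
        open_inside = store.is_open
    # Leaving the with-block closes the underlying file
    return open_inside, store.is_open
```

Calling store_is_closed_after_with("demo_store.h5") should report that the store was open inside the block and closed afterwards, without an explicit store.close().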
Writing and Reading Data Using HDFStore in Pandas
HDFStore is a dict-like object, so we can write data to and read data from the HDF5 store directly using key-value pairs.
Example
The example below demonstrates how to write data to and read data from an HDF5 file using HDFStore in Pandas.

    import pandas as pd
    import numpy as np

    # Create the store
    store = pd.HDFStore("store.h5")

    # Create the data
    index = pd.date_range("1/1/2024", periods=8)
    s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
    df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=["A", "B", "C"])

    # Write Pandas data to the store, which is equivalent to store.put('s', s)
    store["s"] = s
    store["df"] = df

    # Read data from the store, which is equivalent to store.get('df')
    from_store = store["df"]
    print('Retrieved Data From the HDFStore:\n', from_store)

    # Close the store after use
    store.close()

Following is the output of the above code (the values are random, so they differ on each run):

Retrieved Data From the HDFStore:
| | A | B | C |
|---|---|---|---|
| 2024-01-01 | 0.200467 | 0.341899 | 0.105715 |
| 2024-01-02 | -0.379214 | 1.527714 | 0.186246 |
| 2024-01-03 | -0.418122 | 1.008820 | 1.331104 |
| 2024-01-04 | 0.146418 | 0.587433 | -0.750389 |
| 2024-01-05 | -0.556524 | -0.551443 | -0.161225 |
| 2024-01-06 | -0.214145 | -0.722693 | 0.072083 |
| 2024-01-07 | 0.631878 | -0.521474 | -0.769847 |
| 2024-01-08 | -0.361999 | 0.435252 | 1.177110 |
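The dict-like behaviour and the hierarchical layout described earlier can be seen together in a short sketch. It assumes PyTables is installed; the function name explore_store and the keys under "sensors/" are hypothetical names chosen for illustration.

```python
import pandas as pd


def explore_store(path):
    """Write two objects under nested keys and inspect the store
    with dictionary-style operations."""
    temperature = pd.DataFrame({"celsius": [20.5, 21.0, 19.8]})
    humidity = pd.Series([40.0, 42.5])
    with pd.HDFStore(path, mode="w") as store:
        # A slash-separated key creates nested groups, similar to directories
        store["sensors/temperature"] = temperature
        store["sensors/humidity"] = humidity
        keys_before = sorted(store.keys())
        # Membership test works like it does for a dict
        found = "sensors/temperature" in store
        # Deleting a key removes that object from the file
        del store["sensors/humidity"]
        keys_after = sorted(store.keys())
    return keys_before, found, keys_after
```

The keys returned by store.keys() are absolute paths with a leading slash, such as "/sensors/temperature", which reflects the file-system-like structure of HDF5.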
Reading and Writing HDF5 Files Using Pandas APIs
Pandas also provides high-level APIs that simplify working with HDF5 files. These APIs let you read and write data directly, without manually creating an HDFStore object. The primary APIs for handling HDF5 files in pandas are:
- pandas.read_hdf(): Reads a pandas object from an HDF5 file.
- pandas.DataFrame.to_hdf() or pandas.Series.to_hdf(): Writes a pandas object to an HDF5 file using HDFStore.
Writing Pandas Data to HDF5 Using to_hdf()
The to_hdf() method writes pandas objects such as DataFrames and Series directly to an HDF5 file using HDFStore. It provides optional parameters for compression (complevel, complib), the storage format (format), appending (append), and the handling of missing values (dropna, nan_rep), allowing you to store your data efficiently.
Example
This example uses the DataFrame.to_hdf() method to write data to an HDF5 file.

    import pandas as pd
    import numpy as np

    # Create a DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

    # Write data to an HDF5 file using to_hdf()
    df.to_hdf("data_store.h5", key="df", mode="w", format="table")

    print("Data successfully written to HDF5 file")

Following is the output of the above code:

    Data successfully written to HDF5 file

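The compression options of to_hdf() can be illustrated with a small sketch. It assumes PyTables is installed and uses the zlib codec; the helper name write_compressed is hypothetical, and the compression level shown is only an example value, not a recommendation.

```python
import pandas as pd


def write_compressed(df, path):
    """Write df in table format with zlib compression, then read it back."""
    # complevel ranges from 0 (no compression) to 9 (highest);
    # complib selects the compression library
    df.to_hdf(path, key="df", mode="w", format="table",
              complevel=9, complib="zlib")
    return pd.read_hdf(path, key="df")
```

Compression is transparent to the reader: write_compressed(df, "compressed_store.h5") should return the same values that were written, and read_hdf() needs no extra arguments to decompress them.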
Reading Data from HDF5 Using read_hdf()
The pandas.read_hdf() function retrieves a pandas object stored in an HDF5 file. It accepts a file path (a string or path object) or an open HDFStore from which the data is read.
Example
This example demonstrates how to read data stored under the key "df" from the HDF5 file "data_store.h5" using the pd.read_hdf() function.

    import pandas as pd

    # Read data from the HDF5 file using read_hdf()
    retrieved_df = pd.read_hdf("data_store.h5", key="df")

    # Display the retrieved data
    print("Retrieved Data:\n", retrieved_df.head())

Following is the output of the above code:

Retrieved Data:
| | A | B |
|---|---|---|
| x | 1 | 4 |
| y | 2 | 5 |
| z | 3 | 6 |
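When a file is too large to load in full, read_hdf() can also return just part of a stored table. The sketch below assumes PyTables is installed and that the data was written with format="table" and with column A declared as a data column, which is what makes it searchable; the function name read_subset and the sample values are made up for illustration.

```python
import pandas as pd


def read_subset(path):
    """Write a small table, then read back only the rows matching a condition."""
    df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10.0, 20.0, 30.0, 40.0]})
    # data_columns makes column A queryable on disk (table format only)
    df.to_hdf(path, key="df", mode="w", format="table", data_columns=["A"])
    # Only rows with A > 2 are read; the result keeps just column B
    return pd.read_hdf(path, key="df", where="A > 2", columns=["B"])
```

Here read_subset("query_store.h5") should return the two rows where A is greater than 2, with B as the only column.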
Appending Data to HDF5 Files Using to_hdf()
Data can be appended to an existing key in an HDF5 file by passing append=True together with format="table" to the to_hdf() method. The mode="a" option (the default) opens the file without overwriting its existing content, so earlier data is preserved.
Example
This example demonstrates how to append data to an existing HDF5 file using the to_hdf() method.

    import pandas as pd
    import numpy as np

    # Create a DataFrame to append
    df_new = pd.DataFrame({'A': [7, 8], 'B': [1, 1]}, index=['i', 'j'])

    # Append the new data to the existing HDF5 file
    df_new.to_hdf("data_store.h5", key="df", mode="a", format="table", append=True)

    print("Data successfully appended")

    # Now read data from the HDF5 file using read_hdf()
    retrieved_df = pd.read_hdf("data_store.h5", key='df')

    # Display the retrieved data
    print("Retrieved Data:\n", retrieved_df.head())

Following is the output of the above code:

    Data successfully appended
    Retrieved Data:

| | A | B |
|---|---|---|
| x | 1 | 4 |
| y | 2 | 5 |
| z | 3 | 6 |
| i | 7 | 1 |
| j | 8 | 1 |
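Appending also works through an HDFStore object, and the stored table can then be read back piece by piece, which keeps memory use bounded for large files. The following is a rough sketch, assuming PyTables is installed and that every appended chunk has the same columns and dtypes; the function name append_and_scan and its parameters are hypothetical.

```python
import pandas as pd


def append_and_scan(path, chunks, chunksize=2):
    """Append each DataFrame in chunks to one key, then iterate over
    the stored table in pieces of at most chunksize rows."""
    total_rows = 0
    parts = 0
    with pd.HDFStore(path, mode="w") as store:
        for chunk in chunks:
            # append() stores data in table format and adds rows to the key
            store.append("df", chunk)
        # With chunksize set, select() returns an iterator of DataFrames
        for part in store.select("df", chunksize=chunksize):
            total_rows += len(part)
            parts += 1
    return total_rows, parts
```

For example, appending one frame with three rows and another with two rows, then scanning with chunksize=2, should visit five rows in three pieces.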