
Python Pandas - Iteration
Iterating over pandas objects is a fundamental task in data manipulation, and the behavior of iteration depends on the type of object you're dealing with. This tutorial explains how iteration works in pandas, specifically focusing on Series and DataFrame objects.
The iteration behavior in pandas varies between Series and DataFrame objects −
Series: Iterating over a Series object yields the values directly, making it similar to an array-like structure.
DataFrame: Iterating over a DataFrame follows a dictionary-like convention, where the iteration produces the column labels (i.e., the keys).
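The difference between the two behaviors can be seen in a minimal sketch. The sample values and the names s, df, series_items and frame_items are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Iterating over a Series yields its values
series_items = [value for value in s]

# Iterating over a DataFrame yields its column labels
frame_items = [label for label in df]

print(series_items)
print(frame_items)
```

Note that the index labels of the Series ("a", "b", "c") do not appear during iteration; only the values do.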
Iterating Through Rows in a DataFrame
To iterate over the contents of a DataFrame, column by column or row by row, we can use the following methods −
items(): iterate over the columns as (label, Series) pairs
iterrows(): iterate over the rows as (index,series) pairs
itertuples(): iterate over the rows as namedtuples
Iterate Over Column Pairs
The items() method allows you to iterate over each column as a key-value pair, with the label as the key and the column values as a Series object. This method is consistent with the dictionary-like interface of a DataFrame.
Example
The following example iterates over the columns of a DataFrame using the items() method. Each column is yielded separately as a key-value pair, where the value is a Series.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])
print("Original DataFrame:\n", df)

# Iterate through the DataFrame columns
print("Iterated Output:")
for key,value in df.items():
    print(key,value)
Its output is as follows −
Original DataFrame:
        col1      col2      col3
0  0.422561  0.094621 -0.214307
1  0.430612 -0.334812 -0.010867
2  0.350962 -0.145470  0.988463
3  1.466426 -1.258297 -0.824569
Iterated Output:
col1 0    0.422561
1    0.430612
2    0.350962
3    1.466426
Name: col1, dtype: float64
col2 0    0.094621
1   -0.334812
2   -0.145470
3   -1.258297
Name: col2, dtype: float64
col3 0   -0.214307
1   -0.010867
2    0.988463
3   -0.824569
Name: col3, dtype: float64
Observe that each column is iterated separately, where key is the column name and value is the corresponding Series object.
Iterate Over DataFrame as Series Pairs
The iterrows() method returns an iterator that yields (index, row) pairs, where each row is represented as a Series object containing the data in that row.
Example
The following example iterates over the DataFrame rows using the iterrows() method.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
print("Original DataFrame:\n", df)

# Iterate Through DataFrame rows
print("Iterated Output:")
for row_index,row in df.iterrows():
    print(row_index,row)
Its output is as follows −
Original DataFrame:
        col1      col2      col3
0  0.468160 -0.634193 -0.603612
1  1.231840  0.090565 -0.449989
2 -1.645371  0.032578 -0.165950
3  1.956370 -0.261995  2.168167
Iterated Output:
0 col1    0.468160
col2   -0.634193
col3   -0.603612
Name: 0, dtype: float64
1 col1    1.231840
col2    0.090565
col3   -0.449989
Name: 1, dtype: float64
2 col1   -1.645371
col2    0.032578
col3   -0.165950
Name: 2, dtype: float64
3 col1    1.956370
col2   -0.261995
col3    2.168167
Name: 3, dtype: float64
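Note: Because iterrows() returns each row as a Series, it does not preserve data types across the row; values from columns of different types are upcast to a common dtype. In the output above, 0, 1, 2 and 3 are the row index labels, and col1, col2 and col3 are the column labels.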
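The loss of data types is easiest to see with columns of mixed types. The following is a minimal sketch; the column names int_col and float_col and the sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
df = pd.DataFrame({"int_col": [1, 2], "float_col": [1.5, 2.5]})

# Each row comes back as a Series, which has a single dtype,
# so the integer value is upcast to float
_, first_row = next(df.iterrows())

print(df["int_col"].dtype)   # int64
print(first_row.dtype)       # float64
print(first_row["int_col"])  # 1.0
```

If the original types matter, itertuples() is usually the safer choice, since it yields the values of each column without combining them into one Series.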
Iterate Over DataFrame as Namedtuples
The itertuples() method returns an iterator yielding a named tuple for each row in the DataFrame. The first element of the tuple is the row's corresponding index value, while the remaining values are the row values. This method is generally faster than iterrows() and preserves the data types of the row elements.
Example
The following example uses the itertuples() method to loop through the DataFrame rows as namedtuples.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
print("Original DataFrame:\n", df)

# Iterate Through DataFrame rows
print("Iterated Output:")
for row in df.itertuples():
    print(row)
Its output is as follows −
Original DataFrame:
        col1      col2      col3
0  0.501238 -0.353269 -0.058190
1 -0.426044 -0.012733 -0.532594
2 -0.704042  2.201186 -1.960429
3  0.514151 -0.844160  0.508056
Iterated Output:
Pandas(Index=0, col1=0.5012381423628608, col2=-0.3532690739340918, col3=-0.058189913290578134)
Pandas(Index=1, col1=-0.42604395958954777, col2=-0.012733326002509393, col3=-0.5325942971498149)
Pandas(Index=2, col1=-0.7040424042099052, col2=2.201186165472291, col3=-1.9604285032438307)
Pandas(Index=3, col1=0.5141508750506754, col2=-0.8441600001815068, col3=0.5080555294913854)
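The fields of each namedtuple can be read by name, and the index and name parameters of itertuples() control the shape of what is yielded. The following is a small sketch; the sample data and the name plain_rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with string index labels
df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 1.5]}, index=["a", "b"])

# Fields of each namedtuple can be read by attribute name;
# the index value is available as row.Index
for row in df.itertuples():
    print(row.Index, row.col1, row.col2)

# index=False drops the index field; name=None yields plain tuples
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)
```

Plain tuples can be convenient when the rows are passed on to code that does not need field names.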
Iterating Through DataFrame Columns
When you iterate over a DataFrame directly, it simply returns the column names.
Example
Let us consider the following example to understand iteration over the columns of a DataFrame.
import pandas as pd
import numpy as np

N = 5
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N-1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=N).tolist()
})
print("Original DataFrame:\n", df)

# Iterate Through DataFrame Columns
print("Output:")
for col in df:
    print(col)
Its output is as follows −
Original DataFrame:
            A    x         y     C           D
0 2016-01-01  0.0  0.990949   Low  114.143838
1 2016-01-02  1.0  0.314517  High   95.559640
2 2016-01-03  2.0  0.180237   Low  121.134817
3 2016-01-04  3.0  0.170095   Low   95.643132
4 2016-01-05  4.0  0.920718   Low   96.379692
Output:
A
x
y
C
D
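Since direct iteration yields only the labels, each label can be used to look up its column when the data is needed as well. A minimal sketch follows, assuming a small hypothetical frame; the name column_dtypes is introduced only for illustration.

```python
import pandas as pd

# Hypothetical two-column frame
df = pd.DataFrame({"x": [0.0, 1.0], "C": ["Low", "High"]})

# Use each column label to look up the corresponding column
column_dtypes = {col: str(df[col].dtype) for col in df}
print(column_dtypes)
```

This is equivalent in spirit to looping with items(), which yields the label and the column together.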
Example
While iterating over a DataFrame, you should not modify the object you are iterating over. Iteration is meant for reading; depending on the data types, the iterator returns a copy of the data rather than a view, so changes made to it are not reflected in the original object. The following example demonstrates this.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])

for index, row in df.iterrows():
    row['a'] = 10
print(df)
Its output is as follows −
       col1      col2      col3
0 -1.739815  0.735595 -0.295589
1  0.635485  0.106803  1.527922
2 -0.939064  0.547095  0.038585
3 -1.016509 -0.116580 -0.523158
As you can see, no changes are reflected in the DataFrame, because each row yielded by iterrows() is a copy of the data rather than a view of it.
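When the goal really is to change the DataFrame, one option is to assign through the DataFrame itself rather than through the objects yielded by the iterator. The following is a sketch of that approach, not the only way to do it; the new column name 'a' follows the example above, and the value 0.0 is an arbitrary assumption.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4, 3), columns=["col1", "col2", "col3"])

# Assign a whole column at once instead of writing to each row copy
df["a"] = 10

# When a per-row update is needed, write through the DataFrame with .loc
for index in df.index:
    df.loc[index, "col1"] = 0.0

print(df)
```

Whole-column assignment is usually preferable to a row-by-row loop, since it avoids Python-level iteration altogether.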