Python Pandas - Calculations with Missing Data

When working with data, you will often come across missing values, which Pandas represents as NaN (Not a Number). Calculations with missing values require extra attention, because NaN propagates through most arithmetic operations and can silently change your results.

Pandas offers flexible ways to manage missing data during calculations, allowing you to control how these values affect your results. In this tutorial, we will learn how Pandas handles missing data during calculations, including arithmetic operations, descriptive statistics, and cumulative operations.
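To see what "propagation" means before looking at Pandas-specific behavior, here is a minimal sketch using only NumPy's NaN and a small Series; the values are illustrative and not taken from the examples below.

```python
import numpy as np
import pandas as pd

# Any arithmetic involving NaN produces NaN
print(np.nan + 1)   # nan
print(np.nan * 0)   # nan: even multiplying by zero does not remove it

# Element-wise arithmetic on a Series propagates NaN position by position
ser = pd.Series([1.0, np.nan, 3.0])
scaled = ser * 10
print(scaled)
```

Only the position that held NaN stays NaN in `scaled`; the other values are computed normally.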

Arithmetic Operations with Missing Data

When performing arithmetic operations between Pandas objects, missing values (NaN) are propagated by default. For example, when you add two Series, the result is NaN at every position where either Series has a missing value.

Example

The following example adds two Series objects that contain missing values.

import pandas as pd
import numpy as np
# Create 2 input series objects
ser1 = pd.Series([1, np.nan, np.nan, 2])
ser2 = pd.Series([2, np.nan, 1, np.nan])
# Display the series
print("Input Series 1:\n", ser1)
print("\nInput Series 2:\n", ser2)
# Adding two series with NaN values
result = ser1 + ser2
print('\nResult After adding Two series:\n', result)

Following is the output of the above code −

Input Series 1:
 0    1.0
1    NaN
2    NaN
3    2.0
dtype: float64

Input Series 2:
 0    2.0
1    NaN
2    1.0
3    NaN
dtype: float64

Result After adding Two series:
 0    3.0
1    NaN
2    NaN
3    NaN
dtype: float64
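If a value missing from only one operand should count as 0, the flexible arithmetic methods such as `Series.add()` accept a `fill_value` parameter. The sketch below reuses the same two Series; note that a position missing from both operands still produces NaN.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, np.nan, np.nan, 2])
ser2 = pd.Series([2, np.nan, 1, np.nan])

# fill_value=0 substitutes 0 where only ONE side is missing;
# positions missing on both sides stay NaN
result = ser1.add(ser2, fill_value=0)
print(result)  # values: 3.0, NaN, 1.0, 2.0
```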

Handling Missing Data in Descriptive Statistics

The Pandas library provides several methods for computing descriptive statistics, such as summing, calculating the product, or finding the cumulative sum or product. By default these methods skip missing values (skipna=True), so a few NaN entries do not invalidate the whole result.
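Skipping NaN also affects statistics that divide by a count. As a small illustration using the same values as column A in the next example, `mean()` divides by the number of valid values reported by `count()`, not by the length of the Series.

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, 2, np.nan, 4])

n_valid = ser.count()   # counts only non-missing values
avg = ser.mean()        # NaN skipped: (2 + 4) / 2
print(n_valid, len(ser))  # 2 4
print(avg)                # 3.0
```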

Example: Summing with Missing Values

When summing data with missing values, NaN values are excluded. This allows you to calculate meaningful totals even when some data is missing.

The following example sums a DataFrame column using the sum() function. By default, NaN values are skipped in the summation.

import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
# Display the input DataFrame
print("Input DataFrame:\n", df)
# Summing a column with NaN values
result = df['A'].sum()
print('\nResult After Summing the values of a column:\n', result)

Following is the output of the above code −

Input DataFrame:
      A  B
0  NaN  5
1  2.0  6
2  NaN  7
3  4.0  8

Result After Summing the values of a column:
 6.0
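The default can be changed in two ways. `skipna=False` lets a single NaN propagate into the total, and `min_count` sets how many valid values are required; without it, a column that is entirely NaN sums to 0.0, which may hide the fact that no data was present. A short sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, 7, 8]})

# skipna=False: one NaN makes the whole sum NaN
total_strict = df['A'].sum(skipna=False)
print(total_strict)  # nan

# An all-NaN Series sums to 0.0 by default; min_count=1 returns NaN instead
all_missing = pd.Series([np.nan, np.nan])
print(all_missing.sum())             # 0.0
print(all_missing.sum(min_count=1))  # nan
```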

Example: Product Calculation with Missing Values

Similar to summing, when calculating a product, missing values (NaN) are skipped by default, which has the same effect as treating them as 1. This ensures that missing values do not alter the final product.

The following example uses the df.prod() method to calculate the product of each column of a DataFrame.

import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
df = pd.DataFrame(data)
# Display the input DataFrame
print("Input DataFrame:\n", df)
# Product with NaN values
result = df.prod()
print('\nResult After Product the values of a DataFrame:\n', result)

Following is the output of the above code −

Input DataFrame:
      A    B
0  NaN  5.0
1  2.0  6.0
2  NaN  NaN
3  4.0  NaN

Result After Product the values of a DataFrame:
 A     8.0
B    30.0
dtype: float64
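The same parameters apply to prod(). With `skipna=False`, every column that contains a NaN gets a NaN product. And because skipped values act like 1, a Series that is entirely NaN has a product of 1.0 unless `min_count` requires at least one valid value. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]})

# Both columns contain NaN, so both products become NaN
strict = df.prod(skipna=False)
print(strict)

# All-NaN input: 1.0 by default, NaN when at least one valid value is required
all_missing = pd.Series([np.nan, np.nan])
print(all_missing.prod())             # 1.0
print(all_missing.prod(min_count=1))  # nan
```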

Cumulative Operations with Missing Data

Pandas provides cumulative methods like cumsum() and cumprod() to generate running totals or products. By default, these methods skip missing values but keep them in place in the output. If you want missing values to propagate through the calculation instead, set the skipna parameter to False.

Example: Cumulative Sum with Missing Values

The following example demonstrates calculating the cumulative sum of a DataFrame with missing values using the df.cumsum() method.

import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
df = pd.DataFrame(data)
# Display the input DataFrame
print("Input DataFrame:\n", df)
# Calculate cumulative sum by ignoring NaN
print('Cumulative sum by ignoring NaN:\n', df.cumsum())

Following is the output of the above code −

Input DataFrame:
      A    B
0  NaN  5.0
1  2.0  6.0
2  NaN  NaN
3  4.0  NaN
Cumulative sum by ignoring NaN:
      A     B
0  NaN   5.0
1  2.0  11.0
2  NaN   NaN
3  6.0   NaN

The output shows that missing values are skipped and the cumulative sum is computed over the available values, while the NaN entries themselves remain in place.
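If you need a running total with no gaps, one option, assuming that treating a missing entry as 0 is acceptable for your data, is to fill the NaN values before accumulating. Under that assumption the last total is simply carried forward across missing rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]})

# Replace NaN with 0 first, then accumulate; no NaN remains in the result
running = df.fillna(0).cumsum()
print(running)
```

Whether 0 is the right substitute depends on what the missing values mean; for a cumulative product, 1 would be the neutral value instead.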

Example: Including NaN in Cumulative Sum

This example includes missing values in the cumulative sum by calling the df.cumsum() method with skipna=False.

import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
df = pd.DataFrame(data)
# Display the input DataFrame
print("Input DataFrame:\n", df)
# Calculate the cumulative sum by preserving NaN
print('Cumulative sum by including NaN:\n', df.cumsum(skipna=False))

Following is the output of the above code −

Input DataFrame:
      A    B
0  NaN  5.0
1  2.0  6.0
2  NaN  NaN
3  4.0  NaN
Cumulative sum by including NaN:
      A     B
0  NaN   5.0
1  NaN  11.0
2  NaN   NaN
3  NaN   NaN

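With skipna=False, once the cumulative sum encounters a NaN value, that position and all subsequent positions in the column become NaN. In column A the very first value is NaN, so the entire column is NaN, while column B keeps its running total until its first NaN in row 2.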
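The cumprod() method mentioned above follows the same rules. A minimal sketch on a small illustrative Series compares the default behavior with skipna=False:

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, np.nan, 3, 4])

# Default: the NaN is skipped but kept in place; the running product continues past it
skip_prod = ser.cumprod()
print(skip_prod.tolist())    # [2.0, nan, 6.0, 24.0]

# skipna=False: the NaN propagates to every later position
strict_prod = ser.cumprod(skipna=False)
print(strict_prod.tolist())  # [2.0, nan, nan, nan]
```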
