

Python Pandas Tutorial

Python Pandas - Duplicated Labels



In Pandas, the row and column labels of both Series and DataFrames are not required to be unique. When an index contains the same label more than once, those labels are called duplicate labels. Duplicate labels can lead to unexpected results in operations such as filtering, aggregating, or slicing.
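To make the "unexpected results" concrete, here is a small sketch using label-based selection with .loc on the same kind of DataFrame used in this tutorial's examples. Selecting a unique label returns a single row as a Series, while selecting a duplicated label returns a DataFrame containing every matching row, so code that assumes one row per label can behave differently depending on the label.

```python
import pandas as pd

# DataFrame whose row label "a" appears twice
df = pd.DataFrame({"A": [0, 1, 2], "B": [4, 1, 1]}, index=["a", "a", "b"])

# A unique label selects a single row, returned as a Series
row_b = df.loc["b"]
print(type(row_b).__name__)

# A duplicated label selects every matching row, returned as a DataFrame
rows_a = df.loc["a"]
print(type(rows_a).__name__)
print(rows_a)
```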

Pandas provides several methods for working with duplicate labels. In this tutorial, we will learn how to detect, manage, and handle them.

Checking for Unique Labels

To check whether the row or column labels of a DataFrame are unique, use the pandas Index.is_unique attribute on df.index (row labels) or df.columns (column labels). If it returns False, the Index contains duplicate labels.

Example

The following example uses the pandas Index.is_unique attribute to check whether the row and column labels of a DataFrame are unique.

import pandas as pd

# Creating a DataFrame with duplicate row labels
df = pd.DataFrame({"A": [0, 1, 2], "B": [4, 1, 1]}, index=["a", "a", "b"])

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Check if the row index is unique
print("Is row index unique:", df.index.is_unique)

# Check if the column index is unique
print("Is column index unique:", df.columns.is_unique)

Following is the output of the above code −

Original DataFrame:
   A  B
a  0  4
a  1  1
b  2  1
Is row index unique: False
Is column index unique: True
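The same check applies to column labels. As a small sketch, the DataFrame below is built with a repeated column name; it also shows Index.has_duplicates, which is simply the negation of is_unique and can read more naturally in a condition.

```python
import pandas as pd

# DataFrame with the column label "A" used twice
df = pd.DataFrame([[0, 1, 2], [3, 4, 5]], columns=["A", "A", "B"])

# Column labels are not unique here
print("Is column index unique:", df.columns.is_unique)

# has_duplicates is the negation of is_unique
print("Column index has duplicates:", df.columns.has_duplicates)

# Row labels are the default RangeIndex (0, 1), which is unique
print("Is row index unique:", df.index.is_unique)
```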

Detecting Duplicate Labels

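The Index.duplicated() method detects duplicate labels in a Pandas object. It returns a boolean array indicating whether each label in the Index is a duplicate. By default (keep="first"), the first occurrence of each label is marked False and every later occurrence is marked True.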
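The keep parameter controls which occurrences get flagged. The sketch below compares the three documented options on a standalone Index; .tolist() is used only to turn the returned NumPy array into a plain Python list for easier reading.

```python
import pandas as pd

idx = pd.Index(["a", "a", "b", "c", "c"])

# keep="first" (default): every occurrence after the first is flagged
first = idx.duplicated(keep="first").tolist()

# keep="last": every occurrence before the last is flagged
last = idx.duplicated(keep="last").tolist()

# keep=False: every occurrence of a repeated label is flagged
all_dups = idx.duplicated(keep=False).tolist()

print(first)
print(last)
print(all_dups)
```

keep=False is useful when you want to see every row involved in a duplicate rather than only the extra copies.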

Example

The following example uses the Index.duplicated() method to detect the duplicate row labels of a Pandas DataFrame.

import pandas as pd

# Creating a DataFrame with duplicate row labels
df = pd.DataFrame({"A": [0, 1, 2], "B": [4, 1, 1]}, index=["a", "a", "b"])

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Identify duplicated row labels
print("Duplicated Row Labels:", df.index.duplicated())

Following is the output of the above code −

Original DataFrame:
   A  B
a  0  4
a  1  1
b  2  1
Duplicated Row Labels: [False  True False]
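Once duplicates are detected, two common ways to resolve them are to keep one row per label by negating the duplicated() mask, or to combine the rows that share a label with groupby(level=0). Which one is appropriate depends on what the repeated rows mean in your data; the sketch below simply applies both to the example DataFrame, using sum() as an illustrative aggregation.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2], "B": [4, 1, 1]}, index=["a", "a", "b"])

# Option 1: keep only the first row for each label
deduped = df[~df.index.duplicated(keep="first")]
print(deduped)

# Option 2: combine rows that share a label by summing them
aggregated = df.groupby(level=0).sum()
print(aggregated)
```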

Rejecting Duplicate Labels

Pandas can also reject duplicate labels. By default, pandas allows duplicate labels, but you can disallow them with .set_flags(allows_duplicate_labels=False). This works for both Series and DataFrames. If the object already contains duplicate labels when the flag is set, pandas raises a DuplicateLabelError.
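The flag also matters after it is set. In the sketch below, a Series with unique labels accepts the flag, and a later rename() that would turn "a" into a second "b" is rejected because rename() carries the flag over to its result. The pandas documentation describes this flag as experimental and notes that not every method propagates it yet, so treat this as an illustration rather than a guarantee for all operations.

```python
import pandas as pd

# Unique labels, so disallowing duplicates succeeds
s = pd.Series([0, 1], index=["a", "b"]).set_flags(allows_duplicate_labels=False)
print("Allows duplicates:", s.flags.allows_duplicate_labels)

# rename() keeps the flag, so renaming "a" to "b" would create a
# duplicate label and raises DuplicateLabelError
rejected = False
try:
    s.rename({"a": "b"})
except pd.errors.DuplicateLabelError as err:
    rejected = True
    print(err)
```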

Example

The following example creates a Pandas Series that contains duplicate labels and then disallows duplicates, which raises a DuplicateLabelError.

import pandas as pd

# Create a Series with duplicate labels and disallow duplicates
try:
    pd.Series([0, 1, 2], index=["a", "b", "b"]).set_flags(allows_duplicate_labels=False)
except pd.errors.DuplicateLabelError as e:
    print(e)

Following is the output of the above code −

Index has duplicates.
      positions
label          
b        [1, 2]
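Because a DataFrame has two axes, the check covers column labels as well as row labels. The sketch below uses a DataFrame whose row labels are unique but whose column label "A" is repeated; setting the flag still fails. It also shows that set_flags() returns a new object, so the original DataFrame keeps the default setting.

```python
import pandas as pd

# DataFrame whose column label "A" is repeated; the row labels are unique
df = pd.DataFrame([[0, 1], [2, 3]], columns=["A", "A"])

rejected = False
try:
    df.set_flags(allows_duplicate_labels=False)
except pd.errors.DuplicateLabelError as err:
    rejected = True
    print(err)

# set_flags() returns a new object, so the original keeps the default (True)
print("Original allows duplicates:", df.flags.allows_duplicate_labels)
```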