pandas.DataFrame.duplicated
- DataFrame.duplicated(subset=None, keep='first')
Return boolean Series denoting duplicate rows.
Considering certain columns is optional.
- Parameters:
- subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates; by default, use all of the columns.
- keep : {'first', 'last', False}, default 'first'
Determines which duplicates (if any) to mark.
'first' : Mark duplicates as True except for the first occurrence.
'last' : Mark duplicates as True except for the last occurrence.
False : Mark all duplicates as True.
- Returns:
- Series
Boolean Series indicating, for each row, whether it is a duplicate.
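Because the result is a boolean Series aligned with the frame's index, it can be aggregated directly. The following is a minimal sketch using the ramen dataset from the Examples section; the variable names `mask`, `n_duplicates`, and `has_duplicates` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# One boolean per row, sharing df's index.
mask = df.duplicated()

# True counts as 1, so the sum is the number of rows flagged as duplicates.
n_duplicates = int(mask.sum())

# Quick check for whether the frame contains any duplicate rows at all.
has_duplicates = bool(mask.any())
```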
See also
Index.duplicated
Equivalent method on index.
Series.duplicated
Equivalent method on Series.
Series.drop_duplicates
Remove duplicate values from Series.
DataFrame.drop_duplicates
Remove duplicate values from DataFrame.
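As a rough illustration of how `duplicated` relates to `DataFrame.drop_duplicates`, the sketch below filters with the negated mask and compares the result to `drop_duplicates` for the same `keep` value. The dataset is the one from the Examples section; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# Keep only the rows that are NOT flagged as duplicates.
kept_via_mask = df[~df.duplicated(keep='first')]

# drop_duplicates with the same keep value should retain the same rows.
kept_via_drop = df.drop_duplicates(keep='first')

same_result = kept_via_mask.equals(kept_via_drop)
```

On this data, both approaches retain the rows labelled 0, 2, 3, and 4.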
Examples
Consider a dataset containing ramen ratings.
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
By default, for each set of duplicated values, the first occurrence is set to False and all others to True.
>>> df.duplicated()
0    False
1     True
2    False
3    False
4    False
dtype: bool
By using keep='last', the last occurrence of each set of duplicated values is set to False and all others to True.
>>> df.duplicated(keep='last')
0     True
1    False
2    False
3    False
4    False
dtype: bool
By setting keep to False, all duplicates are marked True.
>>> df.duplicated(keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool
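One possible use of keep=False is to pull out every copy of a duplicated row for inspection, since no occurrence is exempted from the mask. This is a sketch on the same ramen dataset; `all_copies` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# keep=False flags every member of a duplicate group, so indexing with the
# mask returns all copies rather than only the later (or earlier) ones.
all_copies = df[df.duplicated(keep=False)]
```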
To find duplicates on specific column(s), use subset.
>>> df.duplicated(subset=['brand'])
0    False
1     True
2    False
3     True
4     True
dtype: bool
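The subset and keep parameters can be combined, and subset accepts either a single column label or a sequence of labels. The sketch below, again on the ramen dataset, compares the two forms of subset and then flags every row whose brand and style pair occurs more than once; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# A single label and a one-element list select the same column.
by_label = df.duplicated(subset='brand')
by_list = df.duplicated(subset=['brand'])
same_mask = by_label.equals(by_list)

# Compare on two columns and mark every member of each duplicate group.
# The differing 'rating' values in rows 3 and 4 are ignored here.
by_brand_style = df.duplicated(subset=['brand', 'style'], keep=False)
```

On this data, rows 3 and 4 are flagged even though their ratings differ, because only brand and style are compared.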