
pandas.DataFrame.duplicated

DataFrame.duplicated(subset=None, keep='first')

Return boolean Series denoting duplicate rows.

Considering certain columns is optional.

Parameters:
subset : column label or sequence of labels, optional

Only consider certain columns for identifying duplicates. By default, all columns are used.

keep : {'first', 'last', False}, default 'first'

Determines which duplicates (if any) to mark.

  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.
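To compare the three `keep` modes side by side, here is a minimal sketch, assuming pandas is importable as `pd`. The one-column frame, its `key` values and the `marks` column names are hypothetical, chosen so that one value occurs three times and another only once.

```python
import pandas as pd

# Hypothetical toy data: "a" occurs at rows 0, 2 and 3; "b" occurs only at row 1.
df = pd.DataFrame({"key": ["a", "b", "a", "a"]})

# Collect the three boolean masks as columns so they can be compared row by row.
marks = pd.DataFrame({
    "first": df.duplicated(keep="first"),    # rows 2 and 3 flagged (row 0 kept)
    "last": df.duplicated(keep="last"),      # rows 0 and 2 flagged (row 3 kept)
    "keep_false": df.duplicated(keep=False), # rows 0, 2 and 3 flagged (none kept)
})
print(marks)
```

Row 1 (`"b"`) is never flagged: a value that occurs only once is not a duplicate under any `keep` setting.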

Returns:
Series

Boolean Series indicating, for each row, whether it is a duplicate.
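Because the returned Series shares the DataFrame's index, it can be used directly as a boolean mask. A minimal sketch, assuming pandas is importable as `pd` and reusing the ramen data from the Examples section; the names `mask`, `dupes` and `unique_rows` are only illustrative.

```python
import pandas as pd

# Same ramen-rating data as in the Examples section.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

mask = df.duplicated()  # boolean Series aligned with df.index

dupes = df[mask]          # only the rows flagged as duplicates
unique_rows = df[~mask]   # the first occurrence of every distinct row

print(dupes.index.tolist())        # [1]
print(unique_rows.index.tolist())  # [0, 2, 3, 4]
```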

See also

Index.duplicated

Equivalent method on index.

Series.duplicated

Equivalent method on Series.

Series.drop_duplicates

Remove duplicate values from Series.

DataFrame.drop_duplicates

Remove duplicate values from DataFrame.
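The link to `drop_duplicates` can be checked directly: dropping duplicates keeps exactly the rows that `duplicated` marks as False. A minimal sketch, assuming pandas is importable as `pd`; the assertion loop is a sanity check written for this illustration, not part of pandas.

```python
import pandas as pd

# Same ramen-rating data as in the Examples section.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# For every keep mode, filtering with ~duplicated() matches drop_duplicates().
for keep in ("first", "last", False):
    via_mask = df[~df.duplicated(keep=keep)]
    assert via_mask.equals(df.drop_duplicates(keep=keep))

# For a single column, Series.duplicated gives the same flags as subset=[column].
# Only the values are compared: the two results carry different Series names.
series_flags = df["brand"].duplicated().tolist()
frame_flags = df.duplicated(subset=["brand"]).tolist()
print(series_flags == frame_flags)  # True
```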

Examples

Consider a dataset containing ramen ratings.

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, for each set of duplicated values, the first occurrence is set to False and all others to True.

>>> df.duplicated()
0    False
1     True
2    False
3    False
4    False
dtype: bool
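Since `True` counts as 1 when summed, the default mask also answers how many rows merely repeat an earlier one. A minimal sketch on the same ramen data, assuming pandas is importable as `pd`; `n_redundant` and `has_dupes` are illustrative names.

```python
import pandas as pd

# Same ramen-rating data as above.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

mask = df.duplicated()

n_redundant = int(mask.sum())  # rows that repeat an earlier row
has_dupes = bool(mask.any())   # True if at least one row is flagged

print(n_redundant, has_dupes)  # 1 True
```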

By using 'last', the last occurrence of each set of duplicated values is set to False and all others to True.

>>> df.duplicated(keep='last')
0     True
1    False
2    False
3    False
4    False
dtype: bool
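Under both 'first' and 'last', a group of k identical rows contributes k - 1 flags; only which rows carry them changes. A minimal sketch checking this, assuming pandas is importable as `pd`; the frame below is hypothetical, built so that one row appears three times.

```python
import pandas as pd

# Hypothetical data: the row ("x", 1) appears at index 0, 2 and 4.
df = pd.DataFrame({"a": ["x", "y", "x", "z", "x"], "b": [1, 2, 1, 3, 1]})

first = df.duplicated(keep="first")
last = df.duplicated(keep="last")

# Same number of flags: the group of three contributes 3 - 1 = 2 either way.
print(int(first.sum()), int(last.sum()))  # 2 2

# The surviving copy differs: index 0 under 'first', index 4 under 'last'.
print(df.index[~first].tolist())  # [0, 1, 3]
print(df.index[~last].tolist())   # [1, 3, 4]
```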

By setting keep to False, all duplicates are True.

>>> df.duplicated(keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool
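Because keep=False flags every member of a duplicated group, it is a convenient way to pull out all rows involved in duplication, for example to inspect them before deciding which copy to keep. A minimal sketch on the ramen data, assuming pandas is importable as `pd`; `involved` and `singletons` are illustrative names.

```python
import pandas as pd

# Same ramen-rating data as above.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Every row that has at least one identical copy elsewhere in the frame.
involved = df[df.duplicated(keep=False)]
print(involved.index.tolist())  # [0, 1]

# Rows with no identical copy anywhere in the frame.
singletons = df[~df.duplicated(keep=False)]
print(singletons.index.tolist())  # [2, 3, 4]
```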

To find duplicates on specific column(s), use subset.

>>> df.duplicated(subset=['brand'])
0    False
1     True
2    False
3     True
4     True
dtype: bool
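subset also accepts a single column label, several columns, and can be combined with keep. A minimal sketch on the ramen data, assuming pandas is importable as `pd`; the variable names are illustrative.

```python
import pandas as pd

# Same ramen-rating data as above.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# A single column label behaves like a one-element list.
by_label = df.duplicated(subset="brand")
by_list = df.duplicated(subset=["brand"])
print(by_label.equals(by_list))  # True

# Several columns: rows match only if brand AND style are equal (rating ignored).
by_pair = df.duplicated(subset=["brand", "style"])
print(by_pair.tolist())  # [False, True, False, False, True]

# subset combined with keep='last': the last row of each brand is not flagged.
last_per_brand = df.duplicated(subset=["brand"], keep="last")
print(last_per_brand.tolist())  # [True, False, True, True, False]
```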
