pandas.Index.factorize

Index.factorize(sort=False, use_na_sentinel=True)

Encode the object as an enumerated type or categorical variable.

This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values. factorize is available as both a top-level function, pandas.factorize(), and as the methods Series.factorize() and Index.factorize().

Parameters:

sort : bool, default False: Sort uniques and shuffle codes to maintain the relationship.
use_na_sentinel : bool, default True: If True, the sentinel -1 will be used for NaN values. If False, NaN values will be encoded as non-negative integers, and NaN will not be dropped from the uniques of the values.
Added in version 1.5.0.

Returns:

codes : ndarray: An integer ndarray that is an indexer into uniques. uniques.take(codes) will have the same values as values.
uniques : ndarray, Index, or Categorical: The unique valid values. When values is Categorical, uniques is a Categorical. When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.
Note
Even if there is a missing value in values, uniques will not contain an entry for it.
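
The indexer relationship described under Returns can be checked directly. The following is a minimal sketch, assuming an illustrative Index of strings with no missing values; the variable names are chosen for this example and are not part of the pandas API.

```python
import pandas as pd

# Illustrative input (an assumption for this sketch): no missing values
idx = pd.Index(["b", "b", "a", "c", "b"])

codes, uniques = idx.factorize()

# uniques is an Index because the input is a pandas object;
# taking uniques at the positions in codes rebuilds the original values
roundtrip = uniques.take(codes)

print(codes.tolist())      # integer positions into uniques
print(list(uniques))       # distinct values in order of first appearance
print(roundtrip.equals(idx))
```

The round trip is shown only for input without missing values, since a -1 sentinel in codes is not a valid position in uniques.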

See also

cut: Discretize continuous-valued array.
unique: Find the unique value in an array.

Notes

Reference the user guide for more examples.

Examples

These examples all show factorize as a top-level function, as in pd.factorize(values). The results are identical for methods such as Series.factorize() and Index.factorize().

>>> codes, uniques = pd.factorize(np.array(['b', 'b', 'a', 'c', 'b'], dtype="O"))
>>> codes
array([0, 0, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)

With sort=True, the uniques will be sorted, and codes will be shuffled so that the relationship is maintained.

>>> codes, uniques = pd.factorize(np.array(['b', 'b', 'a', 'c', 'b'], dtype="O"),
...                               sort=True)
>>> codes
array([1, 1, 0, 2, 1])
>>> uniques
array(['a', 'b', 'c'], dtype=object)

When use_na_sentinel=True (the default), missing values are indicated in the codes with the sentinel value -1, and missing values are not included in uniques.

>>> codes, uniques = pd.factorize(np.array(['b', None, 'a', 'c', 'b'], dtype="O"))
>>> codes
array([ 0, -1,  1,  2,  0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)

Thus far, we've only factorized lists (which are internally coerced to NumPy arrays). When factorizing pandas objects, the type of uniques will differ. For Categoricals, a Categorical is returned.

>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1])
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']

Notice that 'b' is in uniques.categories, despite not being present in cat.values.
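
The difference between the values of uniques and its categories can be made explicit. This is a small sketch that reuses the Categorical from the example above; the names values_in_uniques and declared_categories are introduced here for illustration only.

```python
import pandas as pd

cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"])
codes, uniques = pd.factorize(cat)

# Values actually observed in the input, in order of first appearance
values_in_uniques = list(uniques)

# Categories declared on the original Categorical are carried over,
# including 'b', which never occurs in the data
declared_categories = list(uniques.categories)

print(values_in_uniques)
print(declared_categories)
```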

For all other pandas objects, an Index of the appropriate type is returned.

>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1])
>>> uniques
Index(['a', 'c'], dtype='object')

If NaN is in the values and we want to include NaN in the uniques, this can be achieved by setting use_na_sentinel=False.

>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values)  # default: use_na_sentinel=True
>>> codes
array([ 0,  1,  0, -1])
>>> uniques
array([1., 2.])

>>> codes, uniques = pd.factorize(values, use_na_sentinel=False)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1.,  2., nan])
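
The examples above call the top-level function. As a sketch of the method form documented on this page, the following calls Index.factorize() with each parameter in turn, assuming an illustrative float Index whose last entry is NaN.

```python
import numpy as np
import pandas as pd

# Illustrative input (an assumption for this sketch)
idx = pd.Index([3.0, 1.0, 3.0, np.nan])

# sort=True: uniques are sorted and codes are remapped to match;
# the missing value still gets the -1 sentinel
codes, uniques = idx.factorize(sort=True)

# use_na_sentinel=False: NaN is kept in uniques and gets its own
# non-negative code
codes_na, uniques_na = idx.factorize(use_na_sentinel=False)

print(codes.tolist(), uniques.tolist())
print(codes_na.tolist(), uniques_na.tolist())
```

Because the input is an Index, uniques is returned as an Index in both calls rather than as a NumPy array.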
