Nullable integer data type#

Note

IntegerArray is currently experimental. Its API or implementation may change without warning. It uses pandas.NA as the missing value.

In Working with missing data, we saw that pandas primarily uses NaN to represent missing data. Because NaN is a float, this forces an array of integers with any missing values to become floating point. In some cases, this may not matter much. But if your integer column is, say, an identifier, casting to float can be problematic. Some integers cannot even be represented as floating point numbers.
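As a minimal sketch of that last point (the name `big_id` and the chosen value are illustrative assumptions, not taken from the text above), the following compares a float cast with a nullable integer array for an integer just above 2**53:

```python
import pandas as pd

# 2**53 + 1 is the smallest positive integer that float64 cannot store exactly.
big_id = 2**53 + 1

# Casting to float, which a NaN-based missing value would force,
# silently changes the stored value.
as_float = pd.Series([big_id], dtype="int64").astype("float64")
round_trip = int(as_float.iloc[0])

# A nullable integer array keeps the exact value next to a missing entry.
as_nullable = pd.array([big_id, None], dtype="Int64")

print(round_trip == big_id)      # False
print(as_nullable[0] == big_id)  # True
```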

Construction#

pandas can represent integer data with possibly missing values using arrays.IntegerArray. This is an extension type implemented within pandas.

In [1]: arr = pd.array([1, 2, None], dtype=pd.Int64Dtype())

In [2]: arr
Out[2]:
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

Or the string alias "Int64" (note the capital "I") to differentiate from NumPy’s 'int64' dtype:

In [3]: pd.array([1, 2, np.nan], dtype="Int64")
Out[3]:
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64
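To make the difference between the two spellings concrete, here is a small sketch (the variable names are illustrative) that builds one Series with each alias and inspects the resulting dtype:

```python
import numpy as np
import pandas as pd

numpy_backed = pd.Series([1, 2], dtype="int64")  # lowercase: NumPy dtype
nullable = pd.Series([1, 2], dtype="Int64")      # capital "I": pandas extension dtype

print(numpy_backed.dtype == np.dtype("int64"))    # True
print(isinstance(nullable.dtype, pd.Int64Dtype))  # True
print(nullable.dtype == "Int64")                  # True
```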

All NA-like values are replaced with pandas.NA.

In [4]: pd.array([1, 2, np.nan, None, pd.NA], dtype="Int64")
Out[4]:
<IntegerArray>
[1, 2, <NA>, <NA>, <NA>]
Length: 5, dtype: Int64

This array can be stored in a DataFrame or Series like any NumPy array.

In [5]: pd.Series(arr)
Out[5]:
0       1
1       2
2    <NA>
dtype: Int64

You can also pass the list-like object to the Series constructor with the dtype.

Warning

Currently pandas.array() and pandas.Series() use different rules for dtype inference. pandas.array() will infer a nullable-integer dtype:

In [6]: pd.array([1, None])
Out[6]:
<IntegerArray>
[1, <NA>]
Length: 2, dtype: Int64

In [7]: pd.array([1, 2])
Out[7]:
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64

For backwards compatibility, Series infers these as either integer or float dtype.

In [8]: pd.Series([1, None])
Out[8]:
0    1.0
1    NaN
dtype: float64

In [9]: pd.Series([1, 2])
Out[9]:
0    1
1    2
dtype: int64

We recommend explicitly providing the dtype to avoid confusion.

In [10]: pd.array([1, None], dtype="Int64")
Out[10]:
<IntegerArray>
[1, <NA>]
Length: 2, dtype: Int64

In [11]: pd.Series([1, None], dtype="Int64")
Out[11]:
0       1
1    <NA>
dtype: Int64

In the future, we may provide an option for Series to infer a nullable-integer dtype.

Operations#

Operations involving an integer array behave similarly to NumPy arrays. Missing values are propagated, and the data is coerced to another dtype if needed.

In [12]: s = pd.Series([1, 2, None], dtype="Int64")

# arithmetic
In [13]: s + 1
Out[13]:
0       2
1       3
2    <NA>
dtype: Int64

# comparison
In [14]: s == 1
Out[14]:
0     True
1    False
2     <NA>
dtype: boolean

# slicing operation
In [15]: s.iloc[1:3]
Out[15]:
1       2
2    <NA>
dtype: Int64

# operate with other dtypes
In [16]: s + s.iloc[1:3].astype("Int8")
Out[16]:
0    <NA>
1       4
2    <NA>
dtype: Int64

# coerce when needed
In [17]: s + 0.01
Out[17]:
0    1.01
1    2.01
2    <NA>
dtype: Float64

These dtypes can operate as part of a DataFrame.

In [18]: df = pd.DataFrame({"A": s, "B": [1, 1, 3], "C": list("aab")})

In [19]: df
Out[19]:
      A  B  C
0     1  1  a
1     2  1  a
2  <NA>  3  b

In [20]: df.dtypes
Out[20]:
A     Int64
B     int64
C    object
dtype: object

These dtypes can be merged, reshaped, and cast.

In [21]: pd.concat([df[["A"]], df[["B", "C"]]], axis=1).dtypes
Out[21]:
A     Int64
B     int64
C    object
dtype: object

In [22]: df["A"].astype(float)
Out[22]:
0    1.0
1    2.0
2    NaN
Name: A, dtype: float64
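Casting also works in the other direction. The following sketch, which assumes a float column whose non-missing values are all whole numbers, converts it to the nullable dtype so that NaN becomes pandas.NA:

```python
import numpy as np
import pandas as pd

# A float column that only became float because of a missing value.
floats = pd.Series([1.0, 2.0, np.nan])

# Casting to "Int64" restores integer values and marks NaN as <NA>.
ints = floats.astype("Int64")

print(ints.dtype)            # Int64
print(ints.isna().tolist())  # [False, False, True]
```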

Reduction and groupby operations such as sum() work as well.

In [23]: df.sum(numeric_only=True)
Out[23]:
A    3
B    5
dtype: Int64

In [24]: df.sum()
Out[24]:
A      3
B      5
C    aab
dtype: object

In [25]: df.groupby("B").A.sum()
Out[25]:
B
1    3
3    0
Name: A, dtype: Int64
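In the groupby output above, the group containing only a missing value sums to 0 because missing values are skipped by default. As a hedged sketch of how that default can be adjusted on a plain Series, using the standard `skipna` and `min_count` arguments of `sum()` (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

total = s.sum()                     # missing value is skipped
strict_total = s.sum(skipna=False)  # missing value propagates

# With min_count=1, an all-missing input yields <NA> instead of 0.
all_missing = pd.Series([None], dtype="Int64").sum(min_count=1)

print(total)         # 3
print(strict_total)  # <NA>
print(all_missing)   # <NA>
```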

Scalar NA Value#

arrays.IntegerArray uses pandas.NA as its scalar missing value. Slicing a single element that’s missing will return pandas.NA.

In [26]: a = pd.array([1, None], dtype="Int64")

In [27]: a[1]
Out[27]: <NA>
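Because the returned scalar is pandas.NA rather than a float NaN, it is usually tested with `pd.isna` or an identity check, not with `==`. A short sketch (the names `missing` and `raised` are illustrative):

```python
import pandas as pd

a = pd.array([1, None], dtype="Int64")
missing = a[1]

print(missing is pd.NA)  # True
print(pd.isna(missing))  # True
print(missing == 1)      # <NA>: comparisons propagate the missing value

# pd.NA has no truth value, so using it directly in a condition raises.
try:
    bool(missing)
    raised = False
except TypeError:
    raised = True

print(raised)            # True
```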
