pandas.DataFrame.convert_dtypes
DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True, dtype_backend='numpy_nullable')
Convert columns to the best possible dtypes using dtypes supporting pd.NA.
Parameters:
- infer_objects : bool, default True
  Whether object dtypes should be converted to the best possible types.
- convert_string : bool, default True
  Whether object dtypes should be converted to StringDtype().
- convert_integer : bool, default True
  Whether, if possible, conversion can be done to integer extension types.
- convert_boolean : bool, default True
  Whether object dtypes should be converted to BooleanDtype().
- convert_floating : bool, default True
  Whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, preference will be given to integer dtypes if the floats can be faithfully cast to integers.
- dtype_backend : {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'
  Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:
  - "numpy_nullable": returns a nullable-dtype-backed DataFrame (default).
  - "pyarrow": returns a pyarrow-backed nullable ArrowDtype DataFrame.
  Added in version 2.0.
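The interplay between convert_integer and convert_floating can be illustrated with a small sketch. The variable names (whole, as_int, as_float) are illustrative only, and the dtypes in the comments are what the rule described above should produce for a float64 input.

```python
import numpy as np
import pandas as pd

# Float column whose non-missing values are all whole numbers.
whole = pd.Series([1.0, 2.0, np.nan], dtype="float64")

# Default flags: the integer extension type is preferred because
# every non-missing float can be cast to an integer without loss.
as_int = whole.convert_dtypes()

# With convert_integer=False the column stays floating, but becomes
# a nullable floating extension type that uses pd.NA.
as_float = whole.convert_dtypes(convert_integer=False)

print(as_int.dtype)    # Int64
print(as_float.dtype)  # Float64
```

In both cases the missing value is represented by pd.NA rather than np.nan.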
Returns:
- Series or DataFrame
  Copy of input object with new dtype.
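A minimal sketch of the return behaviour, assuming a single int64 column named "n" (a name chosen here purely for illustration): the result is a new object, and the input keeps its original dtype.

```python
import pandas as pd

# Hypothetical one-column frame with an explicit NumPy integer dtype.
original = pd.DataFrame({"n": pd.Series([1, 2, 3], dtype="int64")})

# convert_dtypes returns a new object; it does not modify `original`.
converted = original.convert_dtypes()

print(original["n"].dtype)   # int64
print(converted["n"].dtype)  # Int64
```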
See also
infer_objects
Infer dtypes of objects.
to_datetime
Convert argument to datetime.
to_timedelta
Convert argument to timedelta.
to_numeric
Convert argument to a numeric type.
Notes
By default, convert_dtypes will attempt to convert a Series (or each Series in a DataFrame) to dtypes that support pd.NA. By using the options convert_string, convert_integer, convert_boolean and convert_floating, it is possible to turn off individual conversions to StringDtype, the integer extension types, BooleanDtype or floating extension types, respectively.
For object-dtyped columns, if infer_objects is True, use the inference rules as during normal Series/DataFrame construction. Then, if possible, convert to StringDtype, BooleanDtype or an appropriate integer or floating extension type, otherwise leave as object.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an appropriate integer extension type. Otherwise, convert to an appropriate floating extension type.
In the future, as new dtypes are added that support pd.NA, the results of this method will change to support those new dtypes.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
...         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
...         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
...         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
...         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
...         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
...     }
... )
Start with a DataFrame with default dtypes.
>>> df
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0
>>> df.dtypes
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object
Convert the DataFrame to use best possible dtypes.
>>> dfn = df.convert_dtypes()
>>> dfn
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0
>>> dfn.dtypes
a             Int32
b    string[python]
c           boolean
d    string[python]
e             Int64
f           Float64
dtype: object
Start with a Series of strings and missing data represented by np.nan.
>>> s = pd.Series(["a", "b", np.nan])
>>> s
0      a
1      b
2    NaN
dtype: object
Obtain a Series with dtype StringDtype.
>>> s.convert_dtypes()
0       a
1       b
2    <NA>
dtype: string
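The Notes mention that individual conversions can be turned off. As a rough sketch of that behaviour, the snippet below disables only the boolean conversion for an object-dtyped Series; the variable names (flags, default, kept) are illustrative, and the object-dtype outcome assumes the fallback described in the Notes, where a column that cannot be converted is left as object.

```python
import numpy as np
import pandas as pd

# Object-dtyped Series holding booleans and one missing value.
flags = pd.Series([True, False, np.nan], dtype=object)

# Default behaviour: converted to the nullable boolean extension type.
default = flags.convert_dtypes()

# With convert_boolean=False no other conversion applies here,
# so the Series is left as object.
kept = flags.convert_dtypes(convert_boolean=False)

print(default.dtype)  # boolean
print(kept.dtype)     # object
```

The same pattern applies to convert_string, convert_integer and convert_floating, each of which switches off only its own target dtype.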