
pandas arrays, scalars, and data types#

Objects#

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.

For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.

| Kind of Data | pandas Data Type | Scalar | Array |
|---|---|---|---|
| TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetimes |
| Timedeltas | (none) | Timedelta | Timedeltas |
| Period (time spans) | PeriodDtype | Period | Periods |
| Intervals | IntervalDtype | Interval | Intervals |
| Nullable Integer | Int64Dtype, … | (none) | Nullable integer |
| Nullable Float | Float64Dtype, … | (none) | Nullable float |
| Categorical | CategoricalDtype | (none) | Categoricals |
| Sparse | SparseDtype | (none) | Sparse |
| Strings | StringDtype | str | Strings |
| Nullable Boolean | BooleanDtype | bool | Nullable Boolean |
| PyArrow | ArrowDtype | Python Scalars or NA | PyArrow |

pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() method can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.

array(data[, dtype, copy])

Create an array.
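As a rough illustration of the paragraph above, the following sketch builds two arrays with the top-level array() function and stores them in a Series and a DataFrame. The variable names and the sample values are made up for the example.

```python
import pandas as pd

# With no dtype given, integers plus a missing value are inferred as the
# nullable Int64 extension type rather than a float NumPy array.
int_arr = pd.array([1, 2, None])

# An explicit dtype (here the "category" string alias) selects the array type.
cat_arr = pd.array(["a", "b", "a"], dtype="category")

# The arrays can back a Series or a column of a DataFrame.
ser = pd.Series(int_arr, name="counts")
df = pd.DataFrame({"counts": int_arr, "label": cat_arr})

print(int_arr.dtype)
print(df.dtypes)
```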

PyArrow#

Warning

This feature is experimental, and the API can change in a future release without warning.

The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.

PyArrow provides array and data type support similar to NumPy's, including first-class nullability support for all data types, immutability, and more.

The table below shows the equivalent pyarrow-backed (pa), pandas extension, and numpy (np) types that are recognized by pandas. Pyarrow-backed types below need to be passed into ArrowDtype to be recognized by pandas, e.g. pd.ArrowDtype(pa.bool_()).

| PyArrow type | pandas extension type | NumPy type |
|---|---|---|
| pyarrow.bool_() | BooleanDtype | np.bool_ |
| pyarrow.int8() | Int8Dtype | np.int8 |
| pyarrow.int16() | Int16Dtype | np.int16 |
| pyarrow.int32() | Int32Dtype | np.int32 |
| pyarrow.int64() | Int64Dtype | np.int64 |
| pyarrow.uint8() | UInt8Dtype | np.uint8 |
| pyarrow.uint16() | UInt16Dtype | np.uint16 |
| pyarrow.uint32() | UInt32Dtype | np.uint32 |
| pyarrow.uint64() | UInt64Dtype | np.uint64 |
| pyarrow.float32() | Float32Dtype | np.float32 |
| pyarrow.float64() | Float64Dtype | np.float64 |
| pyarrow.time32() | (none) | (none) |
| pyarrow.time64() | (none) | (none) |
| pyarrow.timestamp() | DatetimeTZDtype | np.datetime64 |
| pyarrow.date32() | (none) | (none) |
| pyarrow.date64() | (none) | (none) |
| pyarrow.duration() | (none) | np.timedelta64 |
| pyarrow.binary() | (none) | (none) |
| pyarrow.string() | StringDtype | np.str_ |
| pyarrow.decimal128() | (none) | (none) |
| pyarrow.list_() | (none) | (none) |
| pyarrow.map_() | (none) | (none) |
| pyarrow.dictionary() | CategoricalDtype | (none) |

Note

Pyarrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()). pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.

While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as a Python int, or NA for missing values.

arrays.ArrowExtensionArray(values)

Pandas ExtensionArray backed by a PyArrow ChunkedArray.

ArrowDtype(pyarrow_dtype)

An ExtensionDtype for PyArrow data types.

For more information, please see the PyArrow user guide.

Datetimes#

NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.

Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.

Timestamp([ts_input, year, month, day, ...])

Pandas replacement for python datetime.datetime object.
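A minimal sketch of the statements above: it creates one timezone-naive and one timezone-aware Timestamp, checks the relationship to datetime.datetime, and shows NaT as the missing value. The dates are arbitrary sample values.

```python
import datetime

import pandas as pd

# The same wall-clock time, once without and once with a time zone.
naive = pd.Timestamp("2024-01-15 12:30:00")
aware = pd.Timestamp("2024-01-15 12:30:00", tz="UTC")

# Timestamp subclasses datetime.datetime, so isinstance checks succeed.
print(isinstance(naive, datetime.datetime))

# tz is None for naive data and a tzinfo object for aware data.
print(naive.tz, aware.tz)

# Field properties from the listing below.
print(naive.year, naive.month, naive.day, naive.day_name())

# NaT plays the role of the missing value.
print(pd.isna(pd.NaT))
```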

Properties#

Timestamp.asm8

Return numpy datetime64 format in nanoseconds.

Timestamp.day

Timestamp.dayofweek

Return day of the week.

Timestamp.day_of_week

Return day of the week.

Timestamp.dayofyear

Return the day of the year.

Timestamp.day_of_year

Return the day of the year.

Timestamp.days_in_month

Return the number of days in the month.

Timestamp.daysinmonth

Return the number of days in the month.

Timestamp.fold

Timestamp.hour

Timestamp.is_leap_year

Return True if year is a leap year.

Timestamp.is_month_end

Check if the date is the last day of the month.

Timestamp.is_month_start

Check if the date is the first day of the month.

Timestamp.is_quarter_end

Check if date is last day of the quarter.

Timestamp.is_quarter_start

Check if the date is the first day of the quarter.

Timestamp.is_year_end

Return True if date is last day of the year.

Timestamp.is_year_start

Return True if date is first day of the year.

Timestamp.max

Timestamp.microsecond

Timestamp.min

Timestamp.minute

Timestamp.month

Timestamp.nanosecond

Timestamp.quarter

Return the quarter of the year.

Timestamp.resolution

Timestamp.second

Timestamp.tz

Alias for tzinfo.

Timestamp.tzinfo

Timestamp.unit

The abbreviation associated with self._creso.

Timestamp.value

Timestamp.week

Return the week number of the year.

Timestamp.weekofyear

Return the week number of the year.

Timestamp.year

Methods#

Timestamp.as_unit(unit[, round_ok])

Convert the underlying int64 representation to the given unit.

Timestamp.astimezone(tz)

Convert timezone-aware Timestamp to another time zone.

Timestamp.ceil(freq[, ambiguous, nonexistent])

Return a new Timestamp ceiled to this resolution.

Timestamp.combine(date, time)

Combine date, time into datetime with same date and time fields.

Timestamp.ctime()

Return ctime() style string.

Timestamp.date()

Return date object with same year, month and day.

Timestamp.day_name([locale])

Return the day name of the Timestamp with specified locale.

Timestamp.dst()

Return the daylight saving time (DST) adjustment.

Timestamp.floor(freq[, ambiguous, nonexistent])

Return a new Timestamp floored to this resolution.

Timestamp.fromordinal(ordinal[, tz])

Construct a timestamp from a proleptic Gregorian ordinal.

Timestamp.fromtimestamp(ts)

Convert a POSIX timestamp (timestamp[, tz]) to tz's local time.

Timestamp.isocalendar()

Return a named tuple containing ISO year, week number, and weekday.

Timestamp.isoformat([sep, timespec])

Return the time formatted according to ISO 8601.

Timestamp.isoweekday()

Return the day of the week represented by the date.

Timestamp.month_name([locale])

Return the month name of the Timestamp with specified locale.

Timestamp.normalize()

Normalize Timestamp to midnight, preserving tz information.

Timestamp.now([tz])

Return new Timestamp object representing current time local to tz.

Timestamp.replace([year, month, day, hour, ...])

Implements datetime.replace, handles nanoseconds.

Timestamp.round(freq[, ambiguous, nonexistent])

Round the Timestamp to the specified resolution.

Timestamp.strftime(format)

Return a formatted string of the Timestamp.

Timestamp.strptime(string, format)

Function is not implemented.

Timestamp.time()

Return time object with same time but with tzinfo=None.

Timestamp.timestamp()

Return POSIX timestamp as float.

Timestamp.timetuple()

Return time tuple, compatible with time.localtime().

Timestamp.timetz()

Return time object with same time and tzinfo.

Timestamp.to_datetime64()

Return a numpy.datetime64 object with same precision.

Timestamp.to_numpy([dtype, copy])

Convert the Timestamp to a NumPy datetime64.

Timestamp.to_julian_date()

Convert Timestamp to a Julian Date.

Timestamp.to_period([freq])

Return a period of which this timestamp is an observation.

Timestamp.to_pydatetime([warn])

Convert a Timestamp object to a native Python datetime object.

Timestamp.today([tz])

Return the current time in the local timezone.

Timestamp.toordinal()

Return proleptic Gregorian ordinal.

Timestamp.tz_convert(tz)

Convert timezone-aware Timestamp to another time zone.

Timestamp.tz_localize(tz[, ambiguous, ...])

Localize the Timestamp to a timezone.

Timestamp.tzname()

Return time zone name.

Timestamp.utcfromtimestamp(ts)

Construct a timezone-aware UTC datetime from a POSIX timestamp.

Timestamp.utcnow()

Return a new Timestamp representing UTC day and time.

Timestamp.utcoffset()

Return utc offset.

Timestamp.utctimetuple()

Return UTC time tuple, compatible with time.localtime().

Timestamp.weekday()

Return the day of the week represented by the date.
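To make a few of the methods listed above concrete, here is a small sketch that rounds a Timestamp down, attaches a time zone, converts it to another zone, and hands it back to the standard library. The sample time and the choice of "Asia/Tokyo" are assumptions made for illustration.

```python
import pandas as pd

ts = pd.Timestamp("2024-03-10 14:37:52")

# floor() drops everything below the given resolution (here: minutes).
floored = ts.floor("min")

# normalize() resets the time of day to midnight.
midnight = ts.normalize()

# tz_localize() attaches a zone to naive data; tz_convert() then moves
# the same instant to another zone.
utc = ts.tz_localize("UTC")
tokyo = utc.tz_convert("Asia/Tokyo")

# Interoperability helpers.
iso = ts.isoformat()
py_dt = ts.to_pydatetime()

print(floored, midnight, tokyo, iso, sep="\n")
```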

A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of an arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.

If the data are timezone-aware, then every value in the array must have the same timezone.

arrays.DatetimeArray(values[, dtype, freq, copy])

Pandas ExtensionArray for tz-naive or tz-aware datetime data.

DatetimeTZDtype([unit, tz])

An ExtensionDtype for timezone-aware datetime data.
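The dtype difference between naive and aware data can be inspected as in the sketch below. It reaches the DatetimeArray through Series.array, which is one convenient route among several, and the date range is a made-up sample. The resolution of the naive dtype can depend on the pandas version, so the example checks the dtype kind rather than a specific unit.

```python
import numpy as np
import pandas as pd

# Series.array exposes the extension array that backs a datetime Series.
naive = pd.Series(pd.date_range("2024-01-01", periods=3, freq="D")).array
aware = pd.Series(
    pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC")
).array

# Both are DatetimeArray instances, but their dtypes differ in kind:
# a plain NumPy datetime64 dtype versus a DatetimeTZDtype.
print(type(naive).__name__, naive.dtype)
print(type(aware).__name__, aware.dtype)

# All values of the aware array share this single time zone.
print(aware.dtype.tz)
```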

Timedeltas#

NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.

Timedelta([value, unit])

Represents a duration, the difference between two dates or times.

Properties#

Timedelta.asm8

Return a numpy timedelta64 array scalar view.

Timedelta.components

Return a components namedtuple-like.

Timedelta.days

Returns the days of the timedelta.

Timedelta.max

Timedelta.microseconds

Timedelta.min

Timedelta.nanoseconds

Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.

Timedelta.resolution

Timedelta.seconds

Return the total hours, minutes, and seconds of the timedelta as seconds.

Timedelta.unit

Timedelta.value

Timedelta.view(dtype)

Array view compatibility.

Methods#

Timedelta.as_unit(unit[, round_ok])

Convert the underlying int64 representation to the given unit.

Timedelta.ceil(freq)

Return a new Timedelta ceiled to this resolution.

Timedelta.floor(freq)

Return a new Timedelta floored to this resolution.

Timedelta.isoformat()

Format the Timedelta as ISO 8601 Duration.

Timedelta.round(freq)

Round the Timedelta to the specified resolution.

Timedelta.to_pytimedelta()

Convert a pandas Timedelta object into a Python datetime.timedelta object.

Timedelta.to_timedelta64()

Return a numpy.timedelta64 object with 'ns' precision.

Timedelta.to_numpy([dtype, copy])

Convert the Timedelta to a NumPy timedelta64.

Timedelta.total_seconds()

Total seconds in the duration.

A collection of Timedelta objects may be stored in a TimedeltaArray.

arrays.TimedeltaArray(values[, dtype, freq, ...])

Pandas ExtensionArray for timedelta data.
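The Timedelta scalar and its array counterpart can be sketched as follows. The durations are arbitrary, and Series.array is used here only as a simple way to obtain a TimedeltaArray.

```python
import datetime

import pandas as pd

td = pd.Timedelta(days=1, hours=2, minutes=30)

# days and seconds are separate components; total_seconds() combines them.
print(td.days, td.seconds, td.total_seconds())

# Subtracting two Timestamps yields a Timedelta.
diff = pd.Timestamp("2024-01-02") - pd.Timestamp("2024-01-01")

# Conversion to the standard-library type.
py_td = td.to_pytimedelta()

# A collection of timedeltas is backed by a TimedeltaArray.
td_arr = pd.Series(pd.to_timedelta(["1 day", "2 hours"])).array
print(type(td_arr).__name__)
```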

Periods#

pandas represents spans of time as Period objects.

Period#

Period([value, freq, ordinal, year, month, ...])

Represents a period of time.

Properties#

Period.day

Get day of the month that a Period falls on.

Period.dayofweek

Day of the week the period lies in, with Monday=0 and Sunday=6.

Period.day_of_week

Day of the week the period lies in, with Monday=0 and Sunday=6.

Period.dayofyear

Return the day of the year.

Period.day_of_year

Return the day of the year.

Period.days_in_month

Get the total number of days in the month that this period falls on.

Period.daysinmonth

Get the total number of days of the month that this period falls on.

Period.end_time

Get the Timestamp for the end of the period.

Period.freq

Period.freqstr

Return a string representation of the frequency.

Period.hour

Get the hour of the day component of the Period.

Period.is_leap_year

Return True if the period's year is in a leap year.

Period.minute

Get minute of the hour component of the Period.

Period.month

Return the month this Period falls on.

Period.ordinal

Period.quarter

Return the quarter this Period falls on.

Period.qyear

Fiscal year the Period lies in according to its starting-quarter.

Period.second

Get the second component of the Period.

Period.start_time

Get the Timestamp for the start of the period.

Period.week

Get the week of the year on the given Period.

Period.weekday

Day of the week the period lies in, with Monday=0 and Sunday=6.

Period.weekofyear

Get the week of the year on the given Period.

Period.year

Return the year this Period falls on.

Methods#

Period.asfreq(freq[, how])

Convert Period to desired frequency, at the start or end of the interval.

Period.now(freq)

Return the period of now's date.

Period.strftime(fmt)

Returns a formatted string representation of the Period.

Period.to_timestamp([freq, how])

Return the Timestamp representation of the Period.

A collection of Period objects may be stored in an arrays.PeriodArray. Every period in an arrays.PeriodArray must have the same freq.

arrays.PeriodArray(values[, dtype, freq, copy])

Pandas ExtensionArray for storing Period data.

PeriodDtype(freq)

An ExtensionDtype for Period data.
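A short sketch of a Period and a PeriodArray, assuming a monthly frequency and sample dates chosen for illustration:

```python
import pandas as pd

# A Period covers a whole span, here the month of March 2024.
p = pd.Period("2024-03", freq="M")

# The span has a start, an end, and calendar properties.
print(p.start_time, p.end_time, p.days_in_month)

# asfreq() converts to another frequency, anchored at the start or end.
first_day = p.asfreq("D", how="start")

# Every element of a PeriodArray shares one frequency, recorded in its dtype.
parr = pd.period_range("2024-01", periods=3, freq="M").array
print(type(parr).__name__, parr.dtype)
```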

Intervals#

Arbitrary intervals can be represented as Interval objects.

Interval

Immutable object implementing an Interval, a bounded slice-like interval.

Properties#

Interval.closed

String describing the inclusive side of the intervals.

Interval.closed_left

Check if the interval is closed on the left side.

Interval.closed_right

Check if the interval is closed on the right side.

Interval.is_empty

Indicates if an interval is empty, meaning it contains no points.

Interval.left

Left bound for the interval.

Interval.length

Return the length of the Interval.

Interval.mid

Return the midpoint of the Interval.

Interval.open_left

Check if the interval is open on the left side.

Interval.open_right

Check if the interval is open on the right side.

Interval.overlaps(other)

Check whether two Interval objects overlap.

Interval.right

Right bound for the interval.

A collection of intervals may be stored in an arrays.IntervalArray.

arrays.IntervalArray(data[, closed, dtype, ...])

Pandas array for interval data that are closed on the same side.

IntervalDtype([subtype, closed])

An ExtensionDtype for Interval data.
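The Interval properties listed above can be tried out with a sketch like this one. The bounds are sample values, and IntervalArray.from_breaks is just one of the available constructors.

```python
import pandas as pd

# (0, 5]: open on the left, closed on the right.
iv = pd.Interval(0, 5, closed="right")

# Membership respects which side is closed.
print(5 in iv, 0 in iv)

print(iv.left, iv.right, iv.length, iv.mid)
print(iv.closed_right, iv.open_left)

# overlaps() compares two Interval objects.
print(iv.overlaps(pd.Interval(4, 8)))

# All intervals in an IntervalArray are closed on the same side.
iv_arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(iv_arr.closed, iv_arr.dtype)
```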

Nullable integer#

numpy.ndarray cannot natively represent integer data with missing values. pandas provides this through arrays.IntegerArray.

arrays.IntegerArray(values, mask[, copy])

Array of integer (optional missing) values.

Int8Dtype()

An ExtensionDtype for int8 integer data.

Int16Dtype()

An ExtensionDtype for int16 integer data.

Int32Dtype()

An ExtensionDtype for int32 integer data.

Int64Dtype()

An ExtensionDtype for int64 integer data.

UInt8Dtype()

An ExtensionDtype for uint8 integer data.

UInt16Dtype()

An ExtensionDtype for uint16 integer data.

UInt32Dtype()

An ExtensionDtype for uint32 integer data.

UInt64Dtype()

An ExtensionDtype for uint64 integer data.
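The contrast with NumPy can be sketched as follows: a NumPy array needs a float dtype to hold NaN, while an IntegerArray keeps the integer dtype and tracks missing values separately. The values are made up for the example.

```python
import numpy as np
import pandas as pd

# NumPy falls back to float64 as soon as a NaN is present.
np_vals = np.array([1, 2, np.nan])
print(np_vals.dtype)

# The "Int64" alias (capital I) requests the nullable extension dtype.
int_arr = pd.array([1, 2, None], dtype="Int64")
print(type(int_arr).__name__, int_arr.dtype)

# Missing entries are reported as pd.NA and propagate through arithmetic.
print(int_arr.isna())
print(int_arr + 1)

# Reductions skip missing values by default.
total = int_arr.sum()
```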

Nullable float#

arrays.FloatingArray(values, mask[, copy])

Array of floating (optional missing) values.

Float32Dtype()

An ExtensionDtype for float32 data.

Float64Dtype()

An ExtensionDtype for float64 data.

Categoricals#

pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.

CategoricalDtype([categories, ordered])

Type for categorical data with the categories and orderedness.

CategoricalDtype.categories

An Index containing the unique categories allowed.

CategoricalDtype.ordered

Whether the categories have an ordered relationship.

Categorical data can be stored in a pandas.Categorical:

Categorical(values[, categories, ordered, ...])

Represent a categorical variable in classic R / S-plus fashion.

The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:

Categorical.from_codes(codes[, categories, ...])

Make a Categorical type from codes and categories or dtype.
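A minimal sketch of from_codes(), assuming a three-level ordered category set invented for the example. Each code is a position in the categories list.

```python
import pandas as pd

categories = ["low", "mid", "high"]
codes = [0, 1, 0, 2]

# Build the Categorical directly from positions into `categories`.
cat = pd.Categorical.from_codes(codes, categories=categories, ordered=True)

print(list(cat))
print(cat.codes)
print(cat.dtype)
```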

The dtype information is available on the Categorical:

Categorical.dtype

The CategoricalDtype for this instance.

Categorical.categories

The categories of this categorical.

Categorical.ordered

Whether the categories have an ordered relationship.

Categorical.codes

The category codes of this categorical index.

np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so categories and order information are not preserved!

Categorical.__array__([dtype, copy])

The numpy array interface.
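The loss of category information can be seen in a sketch like the following, which uses made-up string categories including one ("c") that never occurs in the data:

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"], ordered=True)

# np.asarray goes through Categorical.__array__ and returns the plain values.
arr = np.asarray(cat)

print(type(arr).__name__, list(arr))

# The unused category "c" and the ordering exist only on the Categorical.
print(list(cat.categories), cat.ordered)
print(hasattr(arr, "categories"))
```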

A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either the string "category" or an instance of CategoricalDtype.

If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
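The two ways of creating a categorical Series and the Series.cat accessor can be sketched as below. The labels and the renaming are illustrative assumptions.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"])

# Option 1: convert an existing Series; categories are inferred from the data.
cat_s = s.astype("category")

# Option 2: pass an explicit CategoricalDtype at construction time.
dtype = pd.CategoricalDtype(categories=["a", "b", "c"], ordered=True)
typed = pd.Series(["a", "b", "a"], dtype=dtype)

# The .cat accessor exposes and modifies the categorical metadata.
print(typed.cat.categories, typed.cat.ordered)
print(typed.cat.codes.tolist())
renamed = typed.cat.rename_categories({"a": "A"})
print(renamed.tolist())
```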

Sparse#

Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.

arrays.SparseArray(data[, sparse_index, ...])

An ExtensionArray for storing sparse data.

SparseDtype([dtype, fill_value])

Dtype for data stored in SparseArray.

The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
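A small sketch of a SparseArray and the Series.sparse accessor, using a made-up array that is mostly zeros:

```python
import pandas as pd

# Only the two non-zero entries are physically stored; 0 is the fill value.
sp = pd.arrays.SparseArray([0, 0, 1, 0, 0, 2, 0])
print(sp.dtype, sp.fill_value, sp.npoints)

# In a Series, the .sparse accessor exposes sparse-specific attributes.
s = pd.Series(sp)
print(s.sparse.density)

# to_dense() materializes every value again.
dense = s.sparse.to_dense()
print(dense.tolist())
```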

Strings#

When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").

arrays.StringArray(values[, copy])

Extension array for string data.

arrays.ArrowStringArray(values)

Extension array for string data in a pyarrow.ChunkedArray.

StringDtype([storage])

Extension dtype for string data.

The Series.str accessor is available for Series backed by an arrays.StringArray. See String handling for more.
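As an illustration, the sketch below creates a Series with the "string" alias and applies two Series.str methods. The sample words are arbitrary, and which storage backend is used underneath may depend on the installed pandas version and options.

```python
import pandas as pd

# The "string" alias selects StringDtype; None becomes a missing value.
s = pd.Series(["apple", None, "cherry"], dtype="string")
print(s.dtype)

# String methods skip over missing entries and keep them missing.
upper = s.str.upper()
lengths = s.str.len()

print(upper.tolist())
print(lengths.tolist())
```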

Nullable Boolean#

The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.

arrays.BooleanArray(values, mask[, copy])

Array of boolean (True/False) data with missing values.

BooleanDtype()

Extension dtype for boolean data.
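A minimal sketch of the nullable boolean dtype with made-up values. The logical operations at the end follow the three-valued (Kleene) logic that pandas documents for this dtype, where a missing operand only stays missing if it could change the result.

```python
import pandas as pd

flags = pd.array([True, False, None], dtype="boolean")
print(type(flags).__name__, flags.dtype)

# The missing entry is pd.NA rather than a forced True or False.
print(flags[2])

# NA | True is True, because the result does not depend on the unknown value.
either = flags | True

# NA & True stays NA, because the result does depend on it.
both = flags & True

print(either, both, sep="\n")
```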

Utilities#

Constructors#

api.types.union_categoricals(to_union[, ...])

Combine list-like of Categorical-like, unioning categories.

api.types.infer_dtype(value[, skipna])

Return a string label of the type of a scalar or list-like of values.

api.types.pandas_dtype(dtype)

Convert input into a pandas-only dtype object or a numpy dtype object.
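The three helpers above can be exercised with a sketch like this one. The inputs are sample values chosen for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype, pandas_dtype, union_categoricals

# union_categoricals concatenates the values and unions the categories.
a = pd.Categorical(["a", "b"])
b = pd.Categorical(["b", "c"])
combined = union_categoricals([a, b])
print(list(combined), list(combined.categories))

# infer_dtype returns a descriptive string label, not a dtype object.
print(infer_dtype([1, 2, 3]))
print(infer_dtype(["x", "y"]))

# pandas_dtype resolves aliases to either an extension dtype or a NumPy dtype.
nullable = pandas_dtype("Int64")
plain = pandas_dtype("int64")
print(repr(nullable), repr(plain))
```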

Data type introspection#

api.types.is_any_real_numeric_dtype(arr_or_dtype)

Check whether the provided array or dtype is of a real number dtype.

api.types.is_bool_dtype(arr_or_dtype)

Check whether the provided array or dtype is of a boolean dtype.

api.types.is_categorical_dtype(arr_or_dtype)

(DEPRECATED) Check whether an array-like or dtype is of the Categorical dtype.

api.types.is_complex_dtype(arr_or_dtype)

Check whether the provided array or dtype is of a complex dtype.

api.types.is_datetime64_any_dtype(arr_or_dtype)

Check whether the provided array or dtype is of the datetime64 dtype.

api.types.is_datetime64_dtype(arr_or_dtype)

Check whether an array-like or dtype is of the datetime64 dtype.

api.types.is_datetime64_ns_dtype(arr_or_dtype)

Check whether the provided array or dtype is of the datetime64[ns] dtype.

api.types.is_datetime64tz_dtype(arr_or_dtype)

(DEPRECATED) Check whether an array-like or dtype is of a DatetimeTZDtype dtype.

api.types.is_extension_array_dtype(arr_or_dtype)

Check if an object is a pandas extension array type.

api.types.is_float_dtype(arr_or_dtype)

Check whether the provided array or dtype is of a float dtype.

api.types.is_int64_dtype(arr_or_dtype)

(DEPRECATED) Check whether the provided array or dtype is of the int64 dtype.

api.types.is_integer_dtype(arr_or_dtype)

Check whether the provided array or dtype is of an integer dtype.

api.types.is_interval_dtype(arr_or_dtype)

(DEPRECATED) Check whether an array-like or dtype is of the Interval dtype.

api.types.is_numeric_dtype(arr_or_dtype)

Check whether the provided array or dtype is of a numeric dtype.

api.types.is_object_dtype(arr_or_dtype)

Check whether an array-like or dtype is of the object dtype.

api.types.is_period_dtype(arr_or_dtype)

(DEPRECATED) Check whether an array-like or dtype is of the Period dtype.

api.types.is_signed_integer_dtype(arr_or_dtype)

Check whether the provided array or dtype is of a signed integer dtype.

api.types.is_string_dtype(arr_or_dtype)

Check whether the provided array or dtype is of the string dtype.

api.types.is_timedelta64_dtype(arr_or_dtype)

Check whether an array-like or dtype is of the timedelta64 dtype.

api.types.is_timedelta64_ns_dtype(arr_or_dtype)

Check whether the provided array or dtype is of the timedelta64[ns] dtype.

api.types.is_unsigned_integer_dtype(arr_or_dtype)

Check whether the provided array or dtype is of an unsigned integer dtype.

api.types.is_sparse(arr)

(DEPRECATED) Check whether an array-like is a 1-D pandas sparse array.
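A few of the dtype checks above are sketched below on sample Series. For the entries marked as deprecated, the example uses an isinstance check against the dtype class instead, which is the kind of replacement the pandas deprecation messages point to.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_float_dtype,
    is_integer_dtype,
    is_numeric_dtype,
)

ints = pd.Series([1, 2, 3])
nullable_ints = pd.array([1, None], dtype="Int64")
floats = pd.Series([1.0, 2.5])
words = pd.Series(["a", "b"])
stamps = pd.Series(pd.date_range("2024-01-01", periods=2, tz="UTC"))
cats = pd.Series(["a", "b"], dtype="category")

# The checks accept either an array-like or a dtype.
print(is_integer_dtype(ints), is_integer_dtype(nullable_ints))
print(is_float_dtype(floats), is_numeric_dtype(words))
print(is_bool_dtype(pd.Series([True, False])))
print(is_datetime64_any_dtype(stamps))

# Instead of the deprecated is_categorical_dtype / is_datetime64tz_dtype:
print(isinstance(cats.dtype, pd.CategoricalDtype))
print(isinstance(stamps.dtype, pd.DatetimeTZDtype))
```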

Iterable introspection#

api.types.is_dict_like(obj)

Check if the object is dict-like.

api.types.is_file_like(obj)

Check if the object is a file-like object.

api.types.is_list_like(obj[, allow_sets])

Check if the object is list-like.

api.types.is_named_tuple(obj)

Check if the object is a named tuple.

api.types.is_iterator(obj)

Check if the object is an iterator.

Scalar introspection#

api.types.is_bool(obj)

Return True if given object is boolean.

api.types.is_complex(obj)

Return True if given object is complex.

api.types.is_float(obj)

Return True if given object is float.

api.types.is_hashable(obj)

Return True if hash(obj) will succeed, False otherwise.

api.types.is_integer(obj)

Return True if given object is integer.

api.types.is_interval(obj)

api.types.is_number(obj)

Check if the object is a number.

api.types.is_re(obj)

Check if the object is a regex pattern instance.

api.types.is_re_compilable(obj)

Check if the object can be compiled into a regex pattern instance.

api.types.is_scalar(val)

Return True if given object is scalar.
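The iterable and scalar introspection helpers listed above can be compared side by side in a short sketch. The inputs are arbitrary examples.

```python
from pandas.api.types import (
    is_dict_like,
    is_float,
    is_hashable,
    is_integer,
    is_iterator,
    is_list_like,
    is_re_compilable,
    is_scalar,
)

# Strings are iterable in Python but are not treated as list-like.
print(is_list_like([1, 2]), is_list_like("abc"))

# Sets count as list-like unless allow_sets=False is passed.
print(is_list_like({1, 2}), is_list_like({1, 2}, allow_sets=False))

print(is_dict_like({"a": 1}), is_iterator(iter([1, 2])))

# Scalar checks distinguish single values from containers.
print(is_scalar(3), is_scalar([3]))
print(is_integer(1), is_integer(1.0), is_float(1.0))

# Lists are unhashable; a pattern string can be compiled to a regex.
print(is_hashable([]), is_re_compilable(".*"))
```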

