pandas.DataFrame.describe

DataFrame.describe(percentiles=None, include=None, exclude=None)

Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
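The sketch below uses made-up sample values to show the NaN handling described above. The missing entry is left out of both the count and the mean.

```python
import numpy as np
import pandas as pd

# A Series with one missing value (sample data for illustration only).
s = pd.Series([1.0, 2.0, np.nan, 4.0])
summary = s.describe()

# count reports only the non-NaN observations: 3, not 4.
print(summary["count"])  # 3.0
# mean is taken over the three non-missing values: (1 + 2 + 4) / 3.
print(summary["mean"])
```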

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Parameters:

percentiles : list-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

include : ‘all’, list-like of dtypes or None (default), optional

A white list of data types to include in the result. Ignored for Series. Here are the options:

‘all’ : All columns of the input will be included in the output.
A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types, submit numpy.number. To limit it instead to object columns, submit the object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'.
None (default) : The result will include all numeric columns.

exclude : list-like of dtypes or None (default), optional

A black list of data types to omit from the result. Ignored for Series. Here are the options:

A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types, submit numpy.number. To exclude object columns, submit the object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'.
None (default) : The result will exclude nothing.
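As a sketch of how include and exclude select columns, the frame below (sample data chosen for illustration) mirrors the one used in the examples. It checks that describe(include=[np.number]) matches selecting the numeric columns with select_dtypes first and then calling describe(); this is an observed equivalence for this data, not a claim about how describe is implemented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "categorical": pd.Categorical(["d", "e", "f"]),
    "numeric": [1, 2, 3],
    "object": ["a", "b", "c"],
})

# include/exclude filter columns by dtype, in the style of select_dtypes.
only_numeric = df.describe(include=[np.number])
no_numeric = df.describe(exclude=[np.number])

# Selecting the numeric columns first gives the same summary for this frame.
expected = df.select_dtypes(include=[np.number]).describe()
pd.testing.assert_frame_equal(only_numeric, expected)

print(list(only_numeric.columns))  # ['numeric']
print(list(no_numeric.columns))    # ['categorical', 'object']
```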

Returns:

Series or DataFrame : Summary statistics of the Series or DataFrame provided.
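A quick check of the return type on hypothetical sample data: a Series yields a Series indexed by statistic name, while a DataFrame yields one column per described column and one row per statistic.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

print(type(s.describe()).__name__)   # Series
print(type(df.describe()).__name__)  # DataFrame
# 8 statistic rows (count, mean, std, min, 25%, 50%, 75%, max) x 2 columns.
print(df.describe().shape)           # (8, 2)
```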

See also

DataFrame.count: Count number of non-NA/null observations.
DataFrame.max: Maximum of the values in the object.
DataFrame.min: Minimum of the values in the object.
DataFrame.mean: Mean of the values.
DataFrame.std: Standard deviation of the observations.
DataFrame.select_dtypes: Subset of a DataFrame including/excluding columns based on their dtype.
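The methods listed above can be used to cross-check describe(). This sketch, on made-up numeric data, compares each matching row of the summary with the corresponding DataFrame method.

```python
import pandas as pd

df = pd.DataFrame({"x": [2.0, 4.0, 4.0, 5.0], "y": [1, 3, 5, 7]})
summary = df.describe()

for stat in ["count", "mean", "std", "min", "max"]:
    # getattr(df, stat)() calls DataFrame.count, .mean, .std, .min or .max;
    # astype(float) matches the float64 dtype of the describe() rows.
    direct = getattr(df, stat)().astype(float)
    # check_names=False ignores the Series name ("count", "mean", ...).
    pd.testing.assert_series_equal(summary.loc[stat], direct, check_names=False)
```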

Notes

For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
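A sketch of requesting non-default percentiles on a small made-up Series. The 0.5 entry is listed explicitly so the 50% row is certain to appear, and it is then compared with Series.median().

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Ask for the 10th, 50th and 90th percentiles instead of the default quartiles.
summary = s.describe(percentiles=[0.1, 0.5, 0.9])
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']

# The 50% row is the median.
assert summary["50%"] == s.median()
```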

For object data (e.g. strings), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Datetime data is instead summarized like numeric data, with count, mean, min, the percentiles and max (but no std), as the timestamp example below shows.
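For object data, top and freq correspond to the leading entry of value_counts(). A sketch on made-up strings with no tie for the most common value:

```python
import pandas as pd

s = pd.Series(["apple", "pear", "apple", "fig", "apple"])
summary = s.describe()

print(summary["unique"])  # 3
print(summary["top"])     # apple
print(summary["freq"])    # 3

# With no tie, top/freq agree with the first entry of value_counts().
counts = s.value_counts()
assert summary["top"] == counts.index[0]
assert summary["freq"] == counts.iloc[0]
```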

If multiple object values have the highest count, then the top result will be arbitrarily chosen from among those values; freq is that shared highest count.
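A sketch of the tie case on made-up data. The check only requires top to be one of the tied values, since which one is returned is not guaranteed.

```python
import pandas as pd

# "x" and "y" both occur twice, so they tie for the highest count.
s = pd.Series(["x", "y", "x", "y", "z"])
summary = s.describe()

# top is one of the tied values.
assert summary["top"] in {"x", "y"}
# freq is the shared highest count, whichever value was picked.
assert summary["freq"] == 2
```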

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the DataFrame consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.
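The default column selection can be seen by describing two small illustrative frames, one with a numeric column and one without:

```python
import pandas as pd

mixed = pd.DataFrame({"n": [1, 2, 3], "s": ["a", "b", "b"]})
text_only = pd.DataFrame({
    "s": ["a", "b", "b"],
    "c": pd.Categorical(["u", "v", "u"]),
})

# Mixed dtypes: only the numeric column is summarized by default.
print(list(mixed.describe().columns))      # ['n']
# No numeric columns: object and categorical columns are both summarized.
print(list(text_only.describe().columns))  # ['s', 'c']
# include='all' combines the numeric and object statistics in one index.
print(list(mixed.describe(include="all").index))
```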

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.
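A sketch showing that include is ignored for a Series: passing include=[object] to a numeric Series (sample data for illustration) yields the same summary as the default call.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="numeric")

# include/exclude only filter DataFrame columns; a Series is described in full.
default = s.describe()
with_include = s.describe(include=[object])
pd.testing.assert_series_equal(default, with_include)
```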

Examples

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d', 'e', 'f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                    })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
