pandas.crosstab
- pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
Compute a simple cross tabulation of two (or more) factors.
By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.
- Parameters:
- index : array-like, Series, or list of arrays/Series
Values to group by in the rows.
- columns : array-like, Series, or list of arrays/Series
Values to group by in the columns.
- values : array-like, optional
Array of values to aggregate according to the factors. Requires aggfunc to be specified.
- rownames : sequence, default None
If passed, must match the number of row arrays passed.
- colnames : sequence, default None
If passed, must match the number of column arrays passed.
- aggfunc : function, optional
If specified, requires values to be specified as well.
- margins : bool, default False
Add row/column margins (subtotals).
- margins_name : str, default 'All'
Name of the row/column that will contain the totals when margins is True.
- dropna : bool, default True
Do not include columns whose entries are all NaN.
- normalize : bool, {'all', 'index', 'columns'}, or {0, 1}, default False
Normalize by dividing all values by the sum of values.
If passed 'all' or True, will normalize over all values.
If passed 'index', will normalize over each row.
If passed 'columns', will normalize over each column.
If margins is True, will also normalize margin values.
- Returns:
- DataFrame
Cross tabulation of the data.
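The interaction between values, aggfunc, and margins can be sketched as follows. This is a minimal illustration on made-up data: the names region, product, and units are hypothetical and chosen only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
region = pd.Series(["north", "north", "south", "south", "south"], name="region")
product = pd.Series(["pen", "ink", "pen", "pen", "ink"], name="product")
units = np.array([10, 4, 7, 3, 5])

# Default behavior: count how often each (region, product) pair occurs.
counts = pd.crosstab(region, product)

# With values and aggfunc: sum the units within each pair instead of counting.
totals = pd.crosstab(region, product, values=units, aggfunc="sum")

# margins=True appends a totals row and column, labeled by margins_name.
with_margins = pd.crosstab(region, product, margins=True, margins_name="Total")

print(counts)
print(totals)
print(with_margins)
```

Passing values without aggfunc (or the reverse) raises an error, which is why the two always appear together here.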
See also
DataFrame.pivot : Reshape data based on column values.
pivot_table : Create a pivot table as a DataFrame.
Notes
Any Series passed will have their name attributes used unless row or column names for the cross-tabulation are specified.
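A short sketch of this naming rule, using made-up Series (color and size are hypothetical names, as are the "shade" and "fit" labels):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
color = pd.Series(["red", "blue", "red"], name="color")
size = pd.Series(["S", "S", "L"], name="size")

# Without rownames/colnames, the Series names label the result's axes.
named = pd.crosstab(color, size)
print(named.index.name, named.columns.name)

# Explicit rownames/colnames take precedence over the Series names.
renamed = pd.crosstab(color, size, rownames=["shade"], colnames=["fit"])
print(renamed.index.name, renamed.columns.name)
```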
Any input passed containing Categorical data will have all of its categories included in the cross-tabulation, even if the actual data does not contain any instances of a particular category.
In the event that there aren't overlapping indexes, an empty DataFrame will be returned.
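The empty result for non-overlapping indexes can be reproduced with two Series whose index labels are disjoint. The sketch below uses made-up data; resetting the index is one possible workaround when positional pairing is intended, not the only one.

```python
import pandas as pd

# Hypothetical sample data: the two Series share no index labels.
left = pd.Series([1, 2, 3], index=[1, 2, 3])
right = pd.Series([4, 5, 6], index=[4, 5, 6])

# Series are aligned on their index labels first, so nothing is left to tabulate.
result = pd.crosstab(left, right)
print(result.empty)

# Dropping the original labels pairs the values by position instead.
aligned = pd.crosstab(left.reset_index(drop=True), right.reset_index(drop=True))
print(aligned.shape)
```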
See the user guide for more examples.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> a = np.array(
...     ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo", "foo", "foo"],
...     dtype=object,
... )
>>> b = np.array(
...     ["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"],
...     dtype=object,
... )
>>> c = np.array(
...     ["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"],
...     dtype=object,
... )
>>> pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2
Here 'c' and 'f' are not represented in the data and will not be shown in the output because dropna is True by default. Set dropna=False to preserve categories with no data.
>>> foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
>>> bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])
>>> pd.crosstab(foo, bar)
col_0  d  e
row_0
a      1  0
b      0  1
>>> pd.crosstab(foo, bar, dropna=False)
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0
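The normalize options listed under Parameters can be compared side by side. The following is a minimal sketch on made-up data (group and flag are hypothetical names), showing that each mode rescales the counts so that a different slice sums to 1.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
group = pd.Series(["x", "x", "y", "y"], name="group")
flag = pd.Series(["yes", "no", "yes", "yes"], name="flag")

# normalize=True (equivalent to "all"): the whole table sums to 1.
overall = pd.crosstab(group, flag, normalize=True)

# normalize="index": each row sums to 1.
by_row = pd.crosstab(group, flag, normalize="index")

# normalize="columns": each column sums to 1.
by_col = pd.crosstab(group, flag, normalize="columns")

print(overall)
print(by_row)
print(by_col)
```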