pandas.Grouper

class pandas.Grouper(*args, **kwargs)

A Grouper allows the user to specify a groupby instruction for an object.

This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object.

If axis and/or level are passed as keywords to both Grouper and groupby, the values passed to Grouper take precedence.

Parameters:

key : str, defaults to None

Groupby key, which selects the grouping column of the target.

level : name/number, defaults to None

The level for the target index.

freq : str / frequency object, defaults to None

This will group by the specified frequency if the target selection (via key or level) is a datetime-like object. For the full specification of available frequencies, please see the offset aliases section of the time series user guide.

axis : str, int, defaults to 0

Number/name of the axis.

sort : bool, defaults to False

Whether to sort the resulting labels.

closed : {‘left’ or ‘right’}

Closed end of interval. Only used when the freq parameter is passed.

label : {‘left’ or ‘right’}

Interval boundary to use for labeling. Only used when the freq parameter is passed.

convention : {‘start’, ‘end’, ‘e’, ‘s’}

Only used if the grouper is a PeriodIndex and the freq parameter is passed.

origin : Timestamp or str, default ‘start_day’

The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If string, must be one of the following:

‘epoch’: origin is 1970-01-01
‘start’: origin is the first value of the timeseries
‘start_day’: origin is the first day at midnight of the timeseries
‘end’: origin is the last value of the timeseries
‘end_day’: origin is the ceiling midnight of the last day

Added in version 1.3.0.

offset : Timedelta or str, default is None

An offset timedelta added to the origin.

dropna : bool, default True

If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
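As a rough illustration of the dropna parameter, the sketch below groups a small, made-up frame whose key column contains a missing value. The column names `key` and `val` are hypothetical and chosen only for this example; the comparison uses group counts and totals so it does not depend on the order of the resulting labels.

```python
import pandas as pd

# Hypothetical data: the second row has a missing group key.
df = pd.DataFrame({"key": ["a", None, "a", "b"], "val": [1, 2, 3, 4]})

# Default dropna=True: the row with the missing key is left out.
dropped = df.groupby(pd.Grouper(key="key")).sum()

# dropna=False: the missing key is kept as a group of its own.
kept = df.groupby(pd.Grouper(key="key", dropna=False)).sum()

print(len(dropped), int(dropped["val"].sum()))
print(len(kept), int(kept["val"].sum()))
```

With dropna=False the result should gain one extra group, and the row that was previously dropped contributes to the totals again.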

Returns:

Grouper or pandas.api.typing.TimeGrouper: A TimeGrouper is returned if freq is not None. Otherwise, a Grouper is returned.
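The return type can be checked directly. This is a minimal sketch that inspects the class name rather than importing TimeGrouper, since the public import path may differ between pandas versions; the key names are placeholders and no data is needed to construct the objects.

```python
import pandas as pd

# Without freq, a plain Grouper is constructed.
plain = pd.Grouper(key="Animal")

# With freq, the constructor hands back a TimeGrouper instead.
timed = pd.Grouper(key="Publish date", freq="1W")

print(type(plain).__name__)
print(type(timed).__name__)

# TimeGrouper is a subclass, so isinstance checks against Grouper still pass.
print(isinstance(timed, pd.Grouper))
```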

Examples

df.groupby(pd.Grouper(key="Animal")) is equivalent to df.groupby('Animal')

>>> df = pd.DataFrame(
...     {
...         "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
...         "Speed": [100, 5, 200, 300, 15],
...     }
... )
>>> df
   Animal  Speed
0  Falcon    100
1  Parrot      5
2  Falcon    200
3  Falcon    300
4  Parrot     15
>>> df.groupby(pd.Grouper(key="Animal")).mean()
        Speed
Animal
Falcon  200.0
Parrot   10.0
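The level parameter selects an index level instead of a column. The following sketch assumes a Series with a two-level MultiIndex; the level names `region` and `date` and all values are invented for illustration. It groups once by a plain level and once by a datetime level combined with freq.

```python
import pandas as pd

# Hypothetical two-level index: a region label and a date.
idx = pd.MultiIndex.from_product(
    [
        ["north", "south"],
        pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-10"]),
    ],
    names=["region", "date"],
)
sales = pd.Series([1, 2, 3, 4, 5, 6], index=idx, name="sales")

# Group by the "region" level of the index.
by_region = sales.groupby(pd.Grouper(level="region")).sum()

# Group the datetime-like "date" level into weekly bins.
weekly = sales.groupby(pd.Grouper(level="date", freq="W")).sum()

print(by_region)
print(weekly)
```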

Specify a resample operation on the column ‘Publish date’

>>> df = pd.DataFrame(
...     {
...         "Publish date": [
...             pd.Timestamp("2000-01-02"),
...             pd.Timestamp("2000-01-02"),
...             pd.Timestamp("2000-01-09"),
...             pd.Timestamp("2000-01-16")
...         ],
...         "ID": [0, 1, 2, 3],
...         "Price": [10, 20, 30, 40]
...     }
... )
>>> df
  Publish date  ID  Price
0   2000-01-02   0     10
1   2000-01-02   1     20
2   2000-01-09   2     30
3   2000-01-16   3     40
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
               ID  Price
Publish date
2000-01-02    0.5   15.0
2000-01-09    2.0   30.0
2000-01-16    3.0   40.0
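For comparison, the same weekly aggregation can be written with DataFrame.resample and its on argument. The sketch below rebuilds the frame from the example above and computes both results side by side; it is meant to show how the two spellings relate, not to suggest that one is preferred.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Publish date": pd.to_datetime(
            ["2000-01-02", "2000-01-02", "2000-01-09", "2000-01-16"]
        ),
        "ID": [0, 1, 2, 3],
        "Price": [10, 20, 30, 40],
    }
)

# Weekly bins through groupby with a Grouper specification.
via_grouper = df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()

# Weekly bins through resample on the same column.
via_resample = df.resample("1W", on="Publish date").mean()

print(via_grouper)
print(via_resample)
```

A Grouper is the more flexible form when the time-based key has to be combined with other grouping keys in a single groupby call.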

If you want to adjust the start of the bins based on a fixed timestamp:

>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min')).sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min', origin='epoch')).sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min', origin='2000-01-01')).sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, dtype: int64

If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent:

>>> ts.groupby(pd.Grouper(freq='17min', origin='start')).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min', offset='23h30min')).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64

To replace the use of the deprecated base argument, you can now use offset; in this example it is equivalent to having base=2:

>>> ts.groupby(pd.Grouper(freq='17min', offset='2min')).sum()
2000-10-01 23:16:00     0
2000-10-01 23:33:00     9
2000-10-01 23:50:00    36
2000-10-02 00:07:00    39
2000-10-02 00:24:00    24
Freq: 17min, dtype: int64
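The closed and label parameters have no example above, so here is a small sketch of how they could change the binning. It assumes a one-minute series of nine made-up values and 3-minute bins, and compares the result without these parameters against closed='right' combined with label='right'.

```python
import pandas as pd

# Hypothetical series: values 0..8 at one-minute steps from midnight.
ts = pd.Series(
    range(9), index=pd.date_range("2000-01-01", periods=9, freq="min")
)

# Without closed/label: each 3-minute bin includes its left edge
# and is labeled with it.
left = ts.groupby(pd.Grouper(freq="3min")).sum()

# closed='right' includes the right edge of each bin instead;
# label='right' names each bin after that right edge.
right = ts.groupby(
    pd.Grouper(freq="3min", closed="right", label="right")
).sum()

print(left)
print(right)
```

With the right edge closed, the value at midnight falls into a bin of its own that ends at 00:00, which is why the second result has one more row than the first.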
