pandas.Grouper
- class pandas.Grouper(*args, **kwargs)
A Grouper allows the user to specify a groupby instruction for an object.
This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object.
If axis and/or level are passed as keywords to both Grouper and groupby, the values passed to Grouper take precedence.
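As a rough illustration of selecting an index level rather than a column, the sketch below groups a small made-up DataFrame by one level of a MultiIndex. The data and the level names (Animal, Origin) are assumptions chosen for this example only.

```python
import pandas as pd

# Hypothetical data: the grouping information lives in the index, not in a column.
idx = pd.MultiIndex.from_tuples(
    [
        ("Falcon", "wild"),
        ("Parrot", "captive"),
        ("Falcon", "captive"),
        ("Parrot", "wild"),
    ],
    names=["Animal", "Origin"],
)
df = pd.DataFrame({"Speed": [100, 5, 200, 15]}, index=idx)

# Group on the "Animal" index level through a Grouper ...
by_grouper = df.groupby(pd.Grouper(level="Animal")).mean()

# ... and through the plain groupby keyword, for comparison.
by_keyword = df.groupby(level="Animal").mean()

print(by_grouper)
print(by_keyword)
```

Both calls should produce one row per animal with the mean speed. The Grouper form is mainly useful when a level has to be combined with other keys in a single list passed to groupby.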
- Parameters:
- key : str, defaults to None
Groupby key, which selects the grouping column of the target.
- level : name/number, defaults to None
The level for the target index.
- freq : str / frequency object, defaults to None
This will group by the specified frequency if the target selection (via key or level) is a datetime-like object. For the full specification of available frequencies, see the offset aliases section of the pandas time series user guide.
- axis : str, int, defaults to 0
Number/name of the axis.
- sort : bool, defaults to False
Whether to sort the resulting labels.
- closed : {‘left’ or ‘right’}
Closed end of interval. Only when the freq parameter is passed.
- label : {‘left’ or ‘right’}
Interval boundary to use for labeling. Only when the freq parameter is passed.
- convention : {‘start’, ‘end’, ‘e’, ‘s’}
If grouper is PeriodIndex and the freq parameter is passed.
- origin : Timestamp or str, default ‘start_day’
The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If string, must be one of the following:
‘epoch’: origin is 1970-01-01
‘start’: origin is the first value of the timeseries
‘start_day’: origin is the first day at midnight of the timeseries
‘end’: origin is the last value of the timeseries
‘end_day’: origin is the ceiling midnight of the last day
Added in version 1.3.0.
- offset : Timedelta or str, default is None
An offset timedelta added to the origin.
- dropna : bool, default True
If True, and if group keys contain NA values, the NA values together with their row/column will be dropped. If False, NA values will also be treated as a key in groups.
- Returns:
- Grouper or pandas.api.typing.TimeGrouper
A TimeGrouper is returned if freq is not None. Otherwise, a Grouper is returned.
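The return rule can be checked by inspecting the type of the constructed object. This is only a small sketch; the column names passed as key are placeholders and no DataFrame is needed to construct a Grouper.

```python
import pandas as pd

# Without freq, the constructor gives back a plain Grouper.
plain = pd.Grouper(key="Animal")

# With freq set, the object that comes back is a TimeGrouper,
# which is itself a subclass of Grouper.
timed = pd.Grouper(key="Publish date", freq="1W")

print(type(plain).__name__)
print(type(timed).__name__)
print(isinstance(timed, pd.Grouper))
```

Because the time-based variant is still a Grouper instance, code that checks isinstance(obj, pd.Grouper) accepts both forms.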
Examples
df.groupby(pd.Grouper(key="Animal")) is equivalent to df.groupby('Animal')
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
...         "Speed": [100, 5, 200, 300, 15],
...     }
... )
>>> df
   Animal  Speed
0  Falcon    100
1  Parrot      5
2  Falcon    200
3  Falcon    300
4  Parrot     15
>>> df.groupby(pd.Grouper(key="Animal")).mean()
        Speed
Animal
Falcon  200.0
Parrot   10.0
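Key-based grouping also interacts with the dropna parameter described under Parameters. The following is a minimal sketch with made-up data containing one missing key; it assumes the Grouper's own dropna setting is the one applied when the Grouper is the only key passed to groupby.

```python
import pandas as pd

# Hypothetical data: the third row has no animal recorded.
df = pd.DataFrame(
    {
        "Animal": ["Falcon", "Parrot", None, "Falcon"],
        "Speed": [100, 5, 50, 300],
    }
)

# Default dropna=True: the row with the missing key is left out.
dropped = df.groupby(pd.Grouper(key="Animal")).mean()

# dropna=False: the missing key forms a group of its own.
kept = df.groupby(pd.Grouper(key="Animal", dropna=False)).mean()

print(dropped)
print(kept)
```

With dropna=False the result should gain one extra row whose index label is NaN.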
Specify a resample operation on the column ‘Publish date’:
>>> df = pd.DataFrame(
...     {
...         "Publish date": [
...             pd.Timestamp("2000-01-02"),
...             pd.Timestamp("2000-01-02"),
...             pd.Timestamp("2000-01-09"),
...             pd.Timestamp("2000-01-16")
...         ],
...         "ID": [0, 1, 2, 3],
...         "Price": [10, 20, 30, 40]
...     }
... )
>>> df
  Publish date  ID  Price
0   2000-01-02   0     10
1   2000-01-02   1     20
2   2000-01-09   2     30
3   2000-01-16   3     40
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
               ID  Price
Publish date
2000-01-02    0.5   15.0
2000-01-09    2.0   30.0
2000-01-16    3.0   40.0
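To make the link to resampling concrete, the sketch below compares the Grouper call with an explicit resample on the same column. It reuses the data from the example above; the variable names via_grouper and via_resample are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Publish date": [
            pd.Timestamp("2000-01-02"),
            pd.Timestamp("2000-01-02"),
            pd.Timestamp("2000-01-09"),
            pd.Timestamp("2000-01-16"),
        ],
        "ID": [0, 1, 2, 3],
        "Price": [10, 20, 30, 40],
    }
)

# Weekly mean price via a Grouper on the datetime column.
via_grouper = df.groupby(pd.Grouper(key="Publish date", freq="1W"))["Price"].mean()

# The same aggregation after moving the column into the index and resampling.
via_resample = df.set_index("Publish date")["Price"].resample("1W").mean()

print(via_grouper)
print(via_resample)
```

The Grouper form avoids the set_index step and can be combined with other grouping keys in the same groupby call.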
If you want to adjust the start of the bins based on a fixed timestamp:
>>> import numpy as np
>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min')).sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', origin='epoch')).sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', origin='2000-01-01')).sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, dtype: int64
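The position of the first bin can be reproduced by hand. The sketch below is an illustration under two assumptions: bins are left-closed (the default for a minute frequency), and bin edges fall on whole multiples of the frequency counted from the origin. The names freq, origin, steps and first_edge are introduced only for this example.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

freq = pd.Timedelta("17min")
origin = pd.Timestamp("1970-01-01")  # the timestamp that origin='epoch' refers to

# Count the whole 17-minute steps that fit between the origin and the
# first observation, then step forward from the origin by that many.
steps = (ts.index[0] - origin) // freq
first_edge = origin + steps * freq

grouped = ts.groupby(pd.Grouper(freq="17min", origin="epoch")).sum()

print(first_edge)
print(grouped.index[0])
```

Swapping origin for pd.Timestamp("2000-01-01") in the same arithmetic gives 2000-10-01 23:24:00, which matches the first label of the third result above.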
If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent:
>>> ts.groupby(pd.Grouper(freq='17min', origin='start')).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', offset='23h30min')).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64
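The equivalence holds for this series because the default origin, ‘start_day’, is midnight of 2000-10-01, and adding 23h30min to it lands exactly on the first observation. A short sketch that checks this programmatically (variable names are illustrative):

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

# Bins anchored at the first observation.
by_start = ts.groupby(pd.Grouper(freq="17min", origin="start")).sum()

# Bins anchored at midnight of the first day, shifted by 23h30min.
by_offset = ts.groupby(pd.Grouper(freq="17min", offset="23h30min")).sum()

print(by_start.equals(by_offset))
```

For a series whose first value is not at 23:30, the two calls would generally differ, since offset is applied relative to the origin rather than to the data.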
To replace the use of the deprecated base argument, you can now use offset. In this example, it is equivalent to having base=2:
>>> ts.groupby(pd.Grouper(freq='17min', offset='2min')).sum()
2000-10-01 23:16:00     0
2000-10-01 23:33:00     9
2000-10-01 23:50:00    36
2000-10-02 00:07:00    39
2000-10-02 00:24:00    24
Freq: 17min, dtype: int64