
pandas.Series.resample

Series.resample(rule, axis=<no_default>, closed=None, label=None, convention=<no_default>, kind=<no_default>, on=None, level=None, origin='start_day', offset=None, group_keys=False)

Resample time-series data.

Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or the caller must pass the label of a datetime-like series/index to the on/level keyword parameter.

Parameters:
rule : DateOffset, Timedelta or str

The offset string or object representing target conversion.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Which axis to use for up- or down-sampling. For Series this parameter is unused and defaults to 0. Must be DatetimeIndex, TimedeltaIndex or PeriodIndex.

Deprecated since version 2.0.0: Use frame.T.resample(…) instead.

closed : {‘right’, ‘left’}, default None

Which side of the bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’, ‘BA’, ‘BQE’, and ‘W’, which all have a default of ‘right’.

label : {‘right’, ‘left’}, default None

Which bin edge to use as the label for each bucket. The default is ‘left’ for all frequency offsets except for ‘ME’, ‘YE’, ‘QE’, ‘BME’, ‘BA’, ‘BQE’, and ‘W’, which all have a default of ‘right’.

convention : {‘start’, ‘end’, ‘s’, ‘e’}, default ‘start’

For PeriodIndex only, controls whether to use the start or end of rule.

Deprecated since version 2.2.0: Convert PeriodIndex to DatetimeIndex before resampling instead.

kind : {‘timestamp’, ‘period’}, optional, default None

Pass ‘timestamp’ to convert the resulting index to a DatetimeIndex or ‘period’ to convert it to a PeriodIndex. By default the input representation is retained.

Deprecated since version 2.2.0: Convert index to desired type explicitly instead.

on : str, optional

For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.

level : str or int, optional

For a MultiIndex, level (name or number) to use for resampling. level must be datetime-like.

origin : Timestamp or str, default ‘start_day’

The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If string, must be one of the following:

  • ‘epoch’: origin is 1970-01-01

  • ‘start’: origin is the first value of the timeseries

  • ‘start_day’: origin is the first day at midnight of the timeseries

  • ‘end’: origin is the last value of the timeseries

  • ‘end_day’: origin is the ceiling midnight of the last day

Added in version 1.3.0.

Note

Only takes effect for Tick-frequencies (i.e. fixed frequencies like days, hours, and minutes, rather than months or quarters).

offset : Timedelta or str, default is None

An offset timedelta added to the origin.

group_keys : bool, default False

Whether to include the group keys in the result index when using .apply() on the resampled object.

Added in version 1.5.0: Not specifying group_keys will retain values-dependent behavior from pandas 1.4 and earlier (see pandas 1.5.0 Release notes for examples).

Changed in version 2.0.0: group_keys now defaults to False.
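
The frequency-dependent defaults of closed and label are easy to overlook. The following is a minimal sketch, assuming ten daily observations that start on Monday 2024-01-01 (illustrative data, not part of the pandas documentation). It contrasts the weekly default with an explicitly left-closed, left-labelled resample:

```python
import pandas as pd

# Illustrative data (an assumption): ten daily values starting on
# Monday, 2024-01-01.
idx = pd.date_range("2024-01-01", periods=10, freq="D")
s = pd.Series(range(10), index=idx)

# 'W' means weeks ending on Sunday; it defaults to
# closed='right' and label='right'.
weekly_default = s.resample("W").sum()

# Overriding both makes each bin start on, and be labelled by, a Sunday.
weekly_left = s.resample("W", closed="left", label="left").sum()

print(weekly_default)
print(weekly_left)
```

With this data the default result is labelled 2024-01-07 and 2024-01-14, while the left-closed variant is labelled 2023-12-31 and 2024-01-07, and the value observed on Sunday 2024-01-07 moves from the first bin to the second.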

Returns:
pandas.api.typing.Resampler

Resampler object.
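
The returned object does not hold a result by itself; an aggregation or fill method has to be called on it. As a small sketch (variable names are illustrative), one Resampler can be created once and reused for several aggregations:

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Calling resample only describes the binning; no values are combined yet.
resampler = series.resample("3min")

totals = resampler.sum()    # one value per 3-minute bin
means = resampler.mean()

# agg with a list returns a DataFrame with one column per aggregation.
summary = resampler.agg(["sum", "mean"])
print(summary)
```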

See also

Series.resample

Resample a Series.

DataFrame.resample

Resample a DataFrame.

groupby

Group Series/DataFrame by mapping, function, label, or list of labels.

asfreq

Reindex a Series/DataFrame with the given frequency without grouping.
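
To make the difference between asfreq and resample concrete, here is a short sketch on a nine-minute series (illustrative variable names): asfreq only selects the values that already sit on the new grid, whereas resample groups every value into a bin before aggregating.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# asfreq reindexes: it keeps the values at 00:00, 00:03 and 00:06 only.
picked = series.asfreq("3min")

# resample groups all nine values into 3-minute bins and sums each bin.
summed = series.resample("3min").sum()

print(picked)
print(summed)
```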

Notes

See the user guide for more.

To learn more about the offset strings, please see this link.

Examples

Start by creating a series with nine one-minute timestamps.

>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('1/1/2000', periods=9, freq='min')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: min, dtype: int64

Downsample the series into 3 minute bins and sum the values of the timestamps falling into a bin.

>>> series.resample('3min').sum()
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3min, dtype: int64

Downsample the series into 3 minute bins as above, but label each bin using the right edge instead of the left. Please note that the value in the bucket used as the label is not included in the bucket that it labels. For example, in the original series the bucket 2000-01-01 00:03:00 contains the value 3, but the summed value in the resampled bucket with the label 2000-01-01 00:03:00 does not include 3 (if it did, the summed value would be 6, not 3).

>>> series.resample('3min', label='right').sum()
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3min, dtype: int64

To include this value, close the right side of the bin interval, as shown below.

>>> series.resample('3min', label='right', closed='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
Freq: 3min, dtype: int64

Upsample the series into 30 second bins.

>>> series.resample('30s').asfreq()[0:5]  # Select first 5 rows
2000-01-01 00:00:00   0.0
2000-01-01 00:00:30   NaN
2000-01-01 00:01:00   1.0
2000-01-01 00:01:30   NaN
2000-01-01 00:02:00   2.0
Freq: 30s, dtype: float64

Upsample the series into 30 second bins and fill the NaN values using the ffill method.

>>> series.resample('30s').ffill()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
Freq: 30s, dtype: int64

Upsample the series into 30 second bins and fill the NaN values using the bfill method.

>>> series.resample('30s').bfill()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:00:30    1
2000-01-01 00:01:00    1
2000-01-01 00:01:30    2
2000-01-01 00:02:00    2
Freq: 30s, dtype: int64
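
Forward and backward filling copy a neighbouring observation into each gap. As a hedged aside that is not one of the original examples, the Resampler also offers interpolate, which fills the gaps from both neighbours instead. A minimal sketch using the default linear method:

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Upsample to 30-second bins and fill each new slot by linear
# interpolation between the surrounding one-minute observations.
interpolated = series.resample("30s").interpolate()[0:5]
print(interpolated)
```

Unlike ffill and bfill, the result is floating point, because the filled values fall halfway between the integer observations.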

Pass a custom function via apply.

>>> def custom_resampler(arraylike):
...     return np.sum(arraylike) + 5
...
>>> series.resample('3min').apply(custom_resampler)
2000-01-01 00:00:00     8
2000-01-01 00:03:00    17
2000-01-01 00:06:00    26
Freq: 3min, dtype: int64

For DataFrame objects, the keyword on can be used to specify the column instead of the index for resampling.

>>> d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...      'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018',
...                                     periods=8,
...                                     freq='W')
>>> df
   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25
>>> df.resample('ME', on='week_starting').mean()
               price  volume
week_starting
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0
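
One way to read the on keyword is as a shortcut for moving the column into the index first. The sketch below uses a hypothetical frame with a datetime-like column named when (both the frame and the column name are assumptions for illustration) and checks that the two spellings agree:

```python
import pandas as pd

# Hypothetical frame: 'when' is a datetime-like column, not the index.
df = pd.DataFrame({
    "when": pd.date_range("2018-01-01", periods=6, freq="D"),
    "price": [10, 11, 9, 13, 14, 18],
})

# Resample on the column directly ...
via_on = df.resample("2D", on="when").mean()

# ... or move the column into the index and resample on the index.
via_index = df.set_index("when").resample("2D").mean()

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(via_on, via_index)
print(via_on)
```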

For a DataFrame with MultiIndex, the keyword level can be used to specify on which level the resampling needs to take place.

>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...       'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df2 = pd.DataFrame(
...     d2,
...     index=pd.MultiIndex.from_product(
...         [days, ['morning', 'afternoon']]
...     )
... )
>>> df2
                      price  volume
2000-01-01 morning       10      50
           afternoon     11      60
2000-01-02 morning        9      40
           afternoon     13     100
2000-01-03 morning       14      50
           afternoon     18     100
2000-01-04 morning       17      40
           afternoon     19      50
>>> df2.resample('D', level=0).sum()
            price  volume
2000-01-01     21     110
2000-01-02     22     140
2000-01-03     32     150
2000-01-04     36      90

If you want to adjust the start of the bins based on a fixed timestamp:

>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, dtype: int64

>>> ts.resample('17min').sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64

>>> ts.resample('17min', origin='epoch').sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, dtype: int64

>>> ts.resample('17min', origin='2000-01-01').sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, dtype: int64

If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent:

>>> ts.resample('17min', origin='start').sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64

>>> ts.resample('17min', offset='23h30min').sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64
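
The equivalence can also be checked programmatically rather than by comparing printed output. A small sketch that rebuilds the same ts series and compares the two results (the names by_origin and by_offset are illustrative):

```python
import numpy as np
import pandas as pd

start, end = "2000-10-01 23:30:00", "2000-10-02 00:30:00"
rng = pd.date_range(start, end, freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

by_origin = ts.resample("17min", origin="start").sum()
by_offset = ts.resample("17min", offset="23h30min").sum()

# Raises AssertionError if values, index, or dtype differ.
pd.testing.assert_series_equal(by_origin, by_offset)
```

The two calls agree for this series because its first timestamp lies exactly 23 hours 30 minutes after midnight of its first day, so shifting the default 'start_day' origin by that offset lands on the first value.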

If you want to take the largest Timestamp as the end of the bins:

>>> ts.resample('17min', origin='end').sum()
2000-10-01 23:35:00     0
2000-10-01 23:52:00    18
2000-10-02 00:09:00    27
2000-10-02 00:26:00    63
Freq: 17min, dtype: int64

In contrast with the start_day, you can use end_day to take the ceiling midnight of the largest Timestamp as the end of the bins and drop the bins not containing data:

>>> ts.resample('17min', origin='end_day').sum()
2000-10-01 23:38:00     3
2000-10-01 23:55:00    15
2000-10-02 00:12:00    45
2000-10-02 00:29:00    45
Freq: 17min, dtype: int64
