Create multiple columns in pandas aggregation function

Question 1

I'd like to create multiple columns while resampling a pandas DataFrame like the built-in ohlc method.

    import numpy as np
    import pandas

    def mhl(data):
        return pandas.Series([np.mean(data), np.max(data), np.min(data)],
                             index=['mean', 'high', 'low'])

    ts.resample('30Min', how=mhl)

Dies with

Exception: Must produce aggregated value

Any suggestions? Thanks!

Accepted Answer · Zelazny7 · 2013-02-16 02:15:53Z

You can pass a dictionary of functions to the resample method. Given a series like this:

    In [35]: ts
    Out[35]:
    2013-01-01 00:00:00     0
    2013-01-01 00:15:00     1
    2013-01-01 00:30:00     2
    2013-01-01 00:45:00     3
    2013-01-01 01:00:00     4
    2013-01-01 01:15:00     5
    ...
    2013-01-01 23:00:00    92
    2013-01-01 23:15:00    93
    2013-01-01 23:30:00    94
    2013-01-01 23:45:00    95
    2013-01-02 00:00:00    96
    Freq: 15T, Length: 97

Create a dictionary of functions:

mhl = {'m':np.mean, 'h':np.max, 'l':np.min}

Pass the dictionary to the how parameter of resample:

    In [36]: ts.resample("30Min", how=mhl)
    Out[36]:
                          h     m   l
    2013-01-01 00:00:00   1   0.5   0
    2013-01-01 00:30:00   3   2.5   2
    2013-01-01 01:00:00   5   4.5   4
    2013-01-01 01:30:00   7   6.5   6
    2013-01-01 02:00:00   9   8.5   8
    2013-01-01 02:30:00  11  10.5  10
    2013-01-01 03:00:00  13  12.5  12
    2013-01-01 03:30:00  15  14.5  14
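Newer pandas releases no longer accept the how= keyword, so the call above may not run as written on a current install. Here is a minimal sketch of an equivalent, assuming a recent pandas: it rebuilds the 97-point series shown above, passes a list of string reducer names to agg, and renames the columns afterwards. The rename step and the lowercase "30min" / "15min" aliases are my choices, not part of the original answer.

```python
import pandas as pd

# Rebuild the series from the answer: 97 integers at 15-minute spacing.
index = pd.date_range("2013-01-01", periods=97, freq="15min")
ts = pd.Series(range(97), index=index)

# Apply several reducers to each 30-minute bin in one call, then rename
# the resulting columns to the short labels used in the answer.
result = (
    ts.resample("30min")
    .agg(["mean", "max", "min"])
    .rename(columns={"mean": "m", "max": "h", "min": "l"})
)

print(result.head())
```

Each reducer becomes one column, so this gives the same shape of output as the dictionary form without writing a custom function that returns a Series.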

Comments

It's about 10x faster to use "mean" than to use np.mean. The same goes for 'min' and 'max'.

Is there a way to specify a default for most columns (e.g., sum instead of mean) and then override the method for a single column?

Neat trick: you can even pass a dictionary (for the columns) of dictionaries of functions, like so:

    mhl = {'data_column_1': {'resultA': np.mean, 'resultB': max},
           'data_column_2': {'resultC': min, 'resultD': sum}}
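A sketch of how the two per-column ideas from the comments might look with a recent pandas. As far as I know, newer versions reject the nested dictionary-of-dictionaries form, so this version passes a list of reducers per column and assigns the result names afterwards. The sample frame, the "default then override" dictionary, and the variable names are illustrative assumptions, not something from the original thread.

```python
import pandas as pd

# Hypothetical two-column frame; the column names follow the comment above.
index = pd.date_range("2013-01-01", periods=8, freq="15min")
df = pd.DataFrame(
    {"data_column_1": range(8), "data_column_2": range(10, 18)},
    index=index,
)

# Default reducer for every column, then an override for a single column.
spec = {col: "sum" for col in df.columns}
spec["data_column_2"] = "mean"
defaults = df.resample("30min").agg(spec)

# Several reducers per column: pass a list per column, then replace the
# two-level column index with flat result names.
multi = df.resample("30min").agg(
    {"data_column_1": ["mean", "max"], "data_column_2": ["min", "sum"]}
)
multi.columns = ["resultA", "resultB", "resultC", "resultD"]

print(defaults)
print(multi)
```

Building the spec with a dict comprehension keeps the default in one place, and only the columns that differ need an explicit entry.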
