I'd like to create multiple columns while resampling a pandas DataFrame like the built-in ohlc method.
    def mhl(data):
        return pandas.Series([np.mean(data), np.max(data), np.min(data)], index=['mean', 'high', 'low'])

    ts.resample('30Min', how=mhl)

Dies with

    Exception: Must produce aggregated value

Any suggestions? Thanks!
1 Answer
You can pass a dictionary of functions to the resample method:
    In [35]: ts
    Out[35]:
    2013-01-01 00:00:00     0
    2013-01-01 00:15:00     1
    2013-01-01 00:30:00     2
    2013-01-01 00:45:00     3
    2013-01-01 01:00:00     4
    2013-01-01 01:15:00     5
    ...
    2013-01-01 23:00:00    92
    2013-01-01 23:15:00    93
    2013-01-01 23:30:00    94
    2013-01-01 23:45:00    95
    2013-01-02 00:00:00    96
    Freq: 15T, Length: 97

Create a dictionary of functions:

    mhl = {'m':np.mean, 'h':np.max, 'l':np.min}

Pass the dictionary to the how parameter of resample:

    In [36]: ts.resample("30Min", how=mhl)
    Out[36]:
                          h     m   l
    2013-01-01 00:00:00   1   0.5   0
    2013-01-01 00:30:00   3   2.5   2
    2013-01-01 01:00:00   5   4.5   4
    2013-01-01 01:30:00   7   6.5   6
    2013-01-01 02:00:00   9   8.5   8
    2013-01-01 02:30:00  11  10.5  10
    2013-01-01 03:00:00  13  12.5  12
    2013-01-01 03:30:00  15  14.5  14
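For readers on a recent pandas: the how argument used above was deprecated and later removed, so the call above will most likely fail there. A rough equivalent, assuming a pandas version where resample returns an object with an agg method, is sketched below. The series is rebuilt from the output shown above, and the rename step is just one way to get the m/h/l labels back; none of this is from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild a series like the one in the answer: 97 points, 15 minutes apart.
idx = pd.date_range("2013-01-01", periods=97, freq="15min")
ts = pd.Series(np.arange(97), index=idx)

# Passing a list of function names yields one output column per function.
out = ts.resample("30min").agg(["mean", "max", "min"])

# Relabel the columns to match the keys used in the original dictionary.
out = out.rename(columns={"mean": "m", "max": "h", "min": "l"})

print(out.head())
```

String names such as "mean" are used here instead of np.mean, which lets pandas pick its own built-in implementations.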
3 Comments
Tom Leys
It's about 10x faster to use "mean" than to use np.mean. Same goes for 'min' and 'max'.

Eric Walker
Is there a way to specify a default for most columns (e.g., sum instead of mean) and then override the method for a single column?

Def_Os
Neat trick: you can even pass a dictionary (for the columns) of dictionaries of functions, like so:

    mhl = {'data_column_1': {'resultA': np.mean, 'resultB': max}, 'data_column_2': {'resultC': min, 'resultD': sum}}
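On the last two comments: one possible way to get a default with per-column overrides is to build the column-to-functions dictionary in a comprehension and then replace individual entries. The sketch below assumes a recent pandas, where (as far as I know) the nested dictionary-of-dictionaries form is no longer accepted, so each column maps to a list of function names instead. The DataFrame, its third column, and the flattened column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 8 rows, 15 minutes apart, three data columns.
idx = pd.date_range("2013-01-01", periods=8, freq="15min")
df = pd.DataFrame(
    {
        "data_column_1": np.arange(8),
        "data_column_2": np.arange(8) * 10,
        "data_column_3": np.ones(8),
    },
    index=idx,
)

# Default: every column is summed.
spec = {col: ["sum"] for col in df.columns}

# Override: one column gets several aggregations instead.
spec["data_column_1"] = ["mean", "max"]

# A dict of lists gives a result with two-level (column, function) labels.
out = df.resample("30min").agg(spec)

# Flatten the two-level labels into single strings such as "data_column_1_mean".
out.columns = ["_".join(pair) for pair in out.columns]

print(out)
```

If custom result names like resultA are wanted, renaming the flattened columns afterwards with out.rename(columns=...) should do it.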