Given the following dataframe
In [31]: rand = np.random.RandomState(1)
         df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                            'B': rand.randn(6),
                            'C': rand.rand(6) > .5})

In [32]: df
Out[32]:
     A         B      C
0  foo  1.624345  False
1  bar -0.611756   True
2  baz -0.528172  False
3  foo -1.072969   True
4  bar  0.865408  False
5  baz -2.301539   True

I would like to sort it in groups (A) by the aggregated sum of B, and then by the value in C (not aggregated). So basically get the order of the A groups with
In [28]: df.groupby('A').sum().sort('B')
Out[28]:
            B  C
A
baz -2.829710  1
bar  0.253651  1
foo  0.551377  1

And then by True/False, so that it ultimately looks like this:
In [30]: df.ix[[5, 2, 1, 4, 3, 0]]
Out[30]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False

How can this be done?
4 Answers
Groupby A:
In [0]: grp = df.groupby('A')

Within each group, sum over B and broadcast the values using transform. Then sort by B:
In [1]: grp[['B']].transform(sum).sort('B')
Out[1]:
          B
2 -2.829710
5 -2.829710
1  0.253651
4  0.253651
0  0.551377
3  0.551377

Index the original df by passing the index from above. This will re-order the A values by the aggregate sum of the B values:
In [2]: sort1 = df.ix[grp[['B']].transform(sum).sort('B').index]

In [3]: sort1
Out[3]:
     A         B      C
2  baz -0.528172  False
5  baz -2.301539   True
1  bar -0.611756   True
4  bar  0.865408  False
0  foo  1.624345  False
3  foo -1.072969   True

Finally, sort the 'C' values within groups of 'A' using the sort=False option to preserve the A sort order from step 1:
In [4]: f = lambda x: x.sort('C', ascending=False)

In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

In [6]: sort2
Out[6]:
         A         B      C
A
baz 5  baz -2.301539   True
    2  baz -0.528172  False
bar 1  bar -0.611756   True
    4  bar  0.865408  False
foo 3  foo -1.072969   True
    0  foo  1.624345  False

Clean up the df index by using reset_index with drop=True:
In [7]: sort2.reset_index(0, drop=True)
Out[7]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False

4 Comments
groupby's sort=False flag would return an arbitrary, not necessarily sorted order (I guess I was associating them with python dictionaries for some reason). But this answer implies that the flag is guaranteed to preserve the original order of the dataframe rows?

Out[7], inplace=True should be added to the arguments in Input[7].

Here's a more concise approach...
df['a_bsum'] = df.groupby('A')['B'].transform(sum)
df.sort(['a_bsum','C'], ascending=[True, False]).drop('a_bsum', axis=1)

The first line adds a column to the data frame with the groupwise sum. The second line performs the sort and then removes the extra column.
Result:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False

NOTE: sort is deprecated, use sort_values instead
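Following the note about sort_values, here is a minimal sketch of the same idea for current pandas. Using assign to build the temporary column on a copy is my own choice rather than part of the answer, and the name result is only illustrative.

```python
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                   'B': rand.randn(6),
                   'C': rand.rand(6) > .5})

# Add the group-wise sum of B as a temporary column on a copy of df,
# sort by that sum (ascending) and then by C (True first),
# and drop the temporary column again.
result = (df.assign(a_bsum=df.groupby('A')['B'].transform('sum'))
            .sort_values(['a_bsum', 'C'], ascending=[True, False])
            .drop(columns='a_bsum'))
print(result)
```

Because assign returns a new frame, df itself never gains the a_bsum column, so there is nothing to clean up afterwards.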
2 Comments
sort_values the last operation is not dropping the column. That is happening because the default is inplace=False. So, specifying inplace=True will also do the work. An alternative would be using the following df.drop('a_bsum', axis=1, inplace=True) after.

df will do the work as well: df = df.sort_values(['a_bsum','C'], ascending=[True, False]).drop('a_bsum', axis=1).

One way to do this is to insert a dummy column with the sums in order to sort:
In [10]: sum_B_over_A = df.groupby('A').sum().B

In [11]: sum_B_over_A
Out[11]:
A
bar    0.253652
baz   -2.829711
foo    0.551376
Name: B

In [12]: df['sum_B_over_A'] = df.A.apply(sum_B_over_A.get_value)

In [13]: df
Out[13]:
     A         B      C  sum_B_over_A
0  foo  1.624345  False      0.551376
1  bar -0.611756   True      0.253652
2  baz -0.528172  False     -2.829711
3  foo -1.072969   True      0.551376
4  bar  0.865408  False      0.253652
5  baz -2.301539   True     -2.829711

In [14]: df.sort(['sum_B_over_A', 'A', 'B'])
Out[14]:
     A         B      C  sum_B_over_A
5  baz -2.301539   True     -2.829711
2  baz -0.528172  False     -2.829711
1  bar -0.611756   True      0.253652
4  bar  0.865408  False      0.253652
3  foo -1.072969   True      0.551376
0  foo  1.624345  False      0.551376

and maybe you would drop the dummy column:
In [15]: df.sort(['sum_B_over_A', 'A', 'B']).drop('sum_B_over_A', axis=1)
Out[15]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False

5 Comments
df.A.map(dict(zip(sum_B_over_A.index, sum_B_over_A))) :) (should be get_value, no?). Also didn't know about column-wise drops, thanks a lot. (though I kinda prefer the version w/out the dummy column for some reason)

df.sort(['sum_B_over_A', 'A', 'B', 'C'])... I should really add link to the sort docs.

The question is difficult to understand. However, you can group by A, sum B, and sort the sums in descending order, so that the order of the A groups depends on B. You can then filter the original dataframe by each A value, in that order, to build a new dataframe.
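import numpy as np
import pandas as pd

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                   'B': rand.randn(6),
                   'C': rand.rand(6) > .5})
# Sum B per group and sort the sums in descending order
grouped = df.groupby('A')['B'].sum().sort_values(ascending=False)
print(grouped)
# The index holds the A values in sorted group order
print(grouped.index.get_level_values(0))

Output: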
A
foo    0.551377
bar    0.253651
baz   -2.829710
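The answer stops at the sorted group sums and does not show the filtering step. One possible way to finish it is sketched below. The names group_order, parts and ordered are illustrative, and sorting C inside each group is an assumption taken from the question, not from this answer.

```python
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                   'B': rand.randn(6),
                   'C': rand.rand(6) > .5})

# A values ordered by the descending sum of B, as in the answer.
group_order = df.groupby('A')['B'].sum().sort_values(ascending=False).index

# Filter the rows of each group in that order, put True before False
# within each group (assumption from the question), and stack the pieces.
parts = [df[df['A'] == name].sort_values('C', ascending=False)
         for name in group_order]
ordered = pd.concat(parts)
print(ordered)
```

This follows the descending order used in the answer, so foo comes first. Passing ascending=True when sorting the sums should give the baz, bar, foo order that the question asks for.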


