Adding a parameter for aggregating rows when calling Frame.set_index on a column with duplicated labels #319
Several times I've found myself writing utility functions to reindex Frames/Series with an optional callable to handle duplicates. For example, this function reindexes a series based on the values of a second series. It is passed a duplicate aggregation function and calls it with any duplicates it finds.

```python
import typing as tp

import static_frame as sf


def reindex(
        old_idx_to_values: sf.Series,
        old_idx_to_new_idx: sf.Series,
        duplicate_aggregation: tp.Callable[[sf.Series], tp.Any] = sum,
        ) -> sf.Series:
    '''Given two series, reindex the first series with the second's values.

    `duplicate_aggregation` is called with duplicated values in the first series, and should return a single element.
    '''
    temp = sf.Frame.from_concat((old_idx_to_new_idx, old_idx_to_values), axis=1)
    idx_name = old_idx_to_new_idx.name
    val_name = old_idx_to_values.name

    # If there are duplicates, handle them.
    if old_idx_to_new_idx.duplicated().any():
        new_idx = []
        new_values = []
        for group in temp.iter_group(idx_name, axis=0):
            new_idx.append(group[idx_name].values[0])
            new_values.append(duplicate_aggregation(group[val_name].values))
        result = sf.Series(new_values, name=val_name, index=sf.Index(new_idx, name=idx_name))
    else:
        result = temp.set_index(idx_name)[val_name]
    return result
```

In other circumstances

This pattern is useful enough that I think it would be beneficial to include in StaticFrame. For example,
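To make the intended semantics concrete without depending on StaticFrame, here is a minimal standard-library sketch of the same idea. The function name `reindex_with_aggregation` and the use of plain dicts in place of Series are assumptions made for illustration only; they are not part of StaticFrame or of the function above.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List


def reindex_with_aggregation(
        old_to_value: Dict[Hashable, Any],
        old_to_new: Dict[Hashable, Hashable],
        duplicate_aggregation: Callable[[List[Any]], Any] = sum,
        ) -> Dict[Hashable, Any]:
    '''Hypothetical helper: relabel values by their new label, collapsing
    duplicated new labels with `duplicate_aggregation`.'''
    # Collect every value that maps to the same new label.
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for old_label, value in old_to_value.items():
        groups[old_to_new[old_label]].append(value)

    # Mirror the post: aggregate only when some new label is duplicated.
    if any(len(values) > 1 for values in groups.values()):
        return {label: duplicate_aggregation(values) for label, values in groups.items()}
    return {label: values[0] for label, values in groups.items()}


values = {'a': 1, 'b': 2, 'c': 3}
new_labels = {'a': 'x', 'b': 'x', 'c': 'y'}
print(reindex_with_aggregation(values, new_labels))       # {'x': 3, 'y': 3}
print(reindex_with_aggregation(values, new_labels, max))  # {'x': 2, 'y': 3}
```

Here `'a'` and `'b'` both map to the new label `'x'`, so their values are passed together to the aggregation callable, while `'y'` receives a single-element list.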
Replies: 1 comment
Thank you for this suggestion. I began to explore an implementation in

```python
>>> labels = (
...     (1, 1, 'a'),
...     (1, 2, 'b'),
...     (1, 3, 'c'),
...     (2, 1, 'd'),
...     (2, 2, 'e'),
...     (2, 3, 'd'),
...     (3, 1, 'b'),
...     (3, 2, 'h'),
...     (3, 3, 'b'),
...     )
>>> f = sf.Frame.from_records(labels, columns=('x', 'y', 'z'))
>>> f
<Frame>
<Index> x       y       z     <<U1>
<Index>
0       1       1       a
1       1       2       b
2       1       3       c
3       2       1       d
4       2       2       e
5       2       3       d
6       3       1       b
7       3       2       h
8       3       3       b
<int64> <int64> <int64> <<U1>
>>> f.pivot(index_fields='z')
<Frame>
<Index: values> x       y       <<U1>
<Index: z>
a               1       1
b               7       6
c               1       3
d               4       4
e               2       2
h               3       2
<<U1>           <int64> <int64>
```

This does not, however, yet permit applying a different function per column. Multiple functions are permitted, but the default usage of those functions is to apply all of them to each column.

If per-column function application is needed, it seems like adding a parameter to
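As a library-independent illustration of what per-column aggregation over duplicated labels might look like, the following is a minimal sketch using only the standard library. The helper `aggregate_by_label` and its `funcs` mapping are hypothetical names introduced here; they do not correspond to any existing or planned StaticFrame parameter. The records are the same ones used in the `pivot` example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence

records = (
    (1, 1, 'a'), (1, 2, 'b'), (1, 3, 'c'),
    (2, 1, 'd'), (2, 2, 'e'), (2, 3, 'd'),
    (3, 1, 'b'), (3, 2, 'h'), (3, 3, 'b'),
)
columns = ('x', 'y', 'z')


def aggregate_by_label(
        records: Sequence[Sequence[Any]],
        columns: Sequence[str],
        key: str,
        funcs: Dict[str, Callable[[List[Any]], Any]],
        ) -> Dict[Hashable, Dict[str, Any]]:
    '''Hypothetical helper: group records by the `key` column and reduce
    every other column with its own function from `funcs`.'''
    key_pos = columns.index(key)
    groups: Dict[Hashable, Dict[str, List[Any]]] = defaultdict(lambda: defaultdict(list))
    for record in records:
        label = record[key_pos]
        for name, value in zip(columns, record):
            if name != key:
                groups[label][name].append(value)
    # Apply the column-specific function to each group's collected values.
    return {
        label: {name: funcs[name](values) for name, values in cols.items()}
        for label, cols in groups.items()
    }


# Same function for every column, matching the summed output of the pivot example.
summed = aggregate_by_label(records, columns, 'z', {'x': sum, 'y': sum})
print(summed['b'])  # {'x': 7, 'y': 6}

# A different function per column: sum for 'x', max for 'y'.
mixed = aggregate_by_label(records, columns, 'z', {'x': sum, 'y': max})
print(mixed['b'])  # {'x': 7, 'y': 3}
```

Passing a mapping from column name to callable keeps the single-function case simple while leaving room for per-column behavior; whether that shape suits StaticFrame's API is a separate design question.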
This discussion was converted from issue #279 on February 18, 2021 19:10.