Adding a parameter for aggregating rows when calling Frame.set_index on a column with duplicated labels #319

ForeverWintr started this conversation in Ideas

Several times I've found myself writing utility functions to reindex Frames/Series with an optional callable to handle duplicates. For example, this function reindexes a series based on the values of a second series. It is passed a duplicate aggregation function and calls it with any duplicates it finds.

    import typing as tp

    import static_frame as sf


    def reindex(
        old_idx_to_values: sf.Series,
        old_idx_to_new_idx: sf.Series,
        duplicate_aggregation: tp.Callable[[sf.Series], tp.Any] = sum,
    ) -> sf.Series:
        '''Given two series, reindex the first series with the second's values. `duplicate_aggregation`
        is called with duplicated values in the first series, and should return a single element.
        '''
        temp = sf.Frame.from_concat((old_idx_to_new_idx, old_idx_to_values), axis=1)
        idx_name = old_idx_to_new_idx.name
        val_name = old_idx_to_values.name

        # If there are duplicates, handle them.
        if old_idx_to_new_idx.duplicated().any():
            new_idx = []
            new_values = []
            for group in temp.iter_group(idx_name, axis=0):
                new_idx.append(group[idx_name].values[0])
                new_values.append(duplicate_aggregation(group[val_name].values))
            result = sf.Series(new_values, name=val_name, index=sf.Index(new_idx, name=idx_name))
        else:
            result = temp.set_index(idx_name)[val_name]
        return result

In other circumstances duplicate_aggregation might be a function that checks that all values in the group are identical and arbitrarily returns one of them.
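A minimal sketch of such a checking callable, for illustration only. The name `require_identical` is hypothetical and not part of Static Frame; it accepts any iterable of values (including the arrays passed by the function above) and raises rather than silently picking a value when the group disagrees.

```python
import typing as tp


def require_identical(values: tp.Iterable[tp.Any]) -> tp.Any:
    """Return the single distinct value in `values`; raise if they differ."""
    items = list(values)
    if not items:
        raise ValueError('cannot aggregate an empty group')
    first = items[0]
    for item in items[1:]:
        # Any mismatch means the duplicates are not interchangeable.
        if item != first:
            raise ValueError(f'conflicting values in group: {first!r} != {item!r}')
    return first
```

It would be passed as the `duplicate_aggregation` argument in place of the default `sum`.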

This pattern is useful enough that I think it would be beneficial to include in Static Frame. For example, Frame.set_index could take duplicate_aggregation. What do you think?
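To make the requested semantics concrete, here is a rough plain-Python model of the pattern. It deliberately avoids Static Frame, so nothing here is an existing or proposed API; `reindex_pairs` and its parameters are hypothetical names, and a dict stands in for the resulting Series.

```python
import typing as tp


def reindex_pairs(
    new_labels: tp.Sequence[tp.Hashable],
    values: tp.Sequence[tp.Any],
    duplicate_aggregation: tp.Callable[[tp.List[tp.Any]], tp.Any] = sum,
) -> tp.Dict[tp.Hashable, tp.Any]:
    """Map each new label to its value, aggregating values that share a label."""
    if len(new_labels) != len(values):
        raise ValueError('new_labels and values must have the same length')
    # Collect values per label, preserving first-seen label order.
    groups: tp.Dict[tp.Hashable, tp.List[tp.Any]] = {}
    for label, value in zip(new_labels, values):
        groups.setdefault(label, []).append(value)
    # Only labels that actually repeat are handed to the aggregation callable.
    return {
        label: group[0] if len(group) == 1 else duplicate_aggregation(group)
        for label, group in groups.items()
    }


result = reindex_pairs(['a', 'b', 'a'], [1, 10, 100])
# result == {'a': 101, 'b': 10}
```

One difference from the function above: this sketch calls the callable only for labels that repeat, whereas `reindex` calls it for every group once any duplicate exists.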

Replies: 1 comment

Thank you for this suggestion.

I began to explore an implementation in Frame.set_index but then realized that I had implemented something very similar in Frame.pivot. I believe Frame.pivot covers the basic case of what this would do in Frame.set_index. For example:

    >>> labels = (
    ...                 (1, 1, 'a'),
    ...                 (1, 2, 'b'),
    ...                 (1, 3, 'c'),
    ...                 (2, 1, 'd'),
    ...                 (2, 2, 'e'),
    ...                 (2, 3, 'd'),
    ...                 (3, 1, 'b'),
    ...                 (3, 2, 'h'),
    ...                 (3, 3, 'b'),
    ...                 )
    >>> f = sf.Frame.from_records(labels, columns=('x', 'y', 'z'))
    >>> f
    <Frame>
    <Index> x       y       z     <<U1>
    <Index>
    0       1       1       a
    1       1       2       b
    2       1       3       c
    3       2       1       d
    4       2       2       e
    5       2       3       d
    6       3       1       b
    7       3       2       h
    8       3       3       b
    <int64> <int64> <int64> <<U1>
    >>> f.pivot(index_fields='z')
    <Frame>
    <Index: values> x       y       <<U1>
    <Index: z>
    a               1       1
    b               7       6
    c               1       3
    d               4       4
    e               2       2
    h               3       2
    <<U1>           <int64> <int64>

This does not, however, yet permit applying a different function per column. Multiple functions are permitted, but by default all of them are applied to each column:

    >>> f.pivot(index_fields='z', func=dict(x=np.sum, y=np.mean))
    <Frame>
    <IndexHierarchy: ('values', 'func')> x       x                  y       y         <<U1>
                                         x       y                  x       y         <<U1>
    <Index: z>
    a                                    1       1.0                1       1.0
    b                                    7       2.3333333333333335 6       2.0
    c                                    1       1.0                3       3.0
    d                                    4       2.0                4       2.0
    e                                    2       2.0                2       2.0
    h                                    3       3.0                2       2.0
    <<U1>                                <int64> <float64>          <int64> <float64>

If per-column function application is needed, it seems like adding a parameter to Frame.pivot to alter how multiple functions are used might be the best approach.
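For illustration, the per-column behaviour being discussed can be modelled in plain Python. This is only a sketch of the desired result, not how Frame.pivot works or would be extended; `aggregate_per_column` and its parameters are hypothetical names, and nested dicts stand in for a Frame.

```python
import typing as tp
from statistics import fmean

Row = tp.Tuple[tp.Any, ...]


def aggregate_per_column(
    records: tp.Sequence[Row],
    columns: tp.Sequence[str],
    index_field: str,
    funcs: tp.Mapping[str, tp.Callable[[tp.List[tp.Any]], tp.Any]],
) -> tp.Dict[tp.Any, tp.Dict[str, tp.Any]]:
    """Group `records` by `index_field`, applying one function per column in `funcs`."""
    key_pos = columns.index(index_field)
    positions = {name: columns.index(name) for name in funcs}
    # Bucket the rows by their value in the index column.
    groups: tp.Dict[tp.Any, tp.List[Row]] = {}
    for record in records:
        groups.setdefault(record[key_pos], []).append(record)
    # Each column is reduced by its own function only (no cross product).
    return {
        key: {name: funcs[name]([r[positions[name]] for r in rows]) for name in funcs}
        for key, rows in sorted(groups.items())
    }


labels = (
    (1, 1, 'a'), (1, 2, 'b'), (1, 3, 'c'),
    (2, 1, 'd'), (2, 2, 'e'), (2, 3, 'd'),
    (3, 1, 'b'), (3, 2, 'h'), (3, 3, 'b'),
)
table = aggregate_per_column(labels, ('x', 'y', 'z'), 'z', {'x': sum, 'y': fmean})
# table['b'] == {'x': 7, 'y': 2.0}
```

Using the same records as the example above, `x` is summed and `y` is averaged, so each label yields two values rather than the four produced when every function is applied to every column.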

2 participants: @ForeverWintr, @flexatone
This discussion was converted from issue #279 on February 18, 2021 19:10.