IndexHierarchy.astype does strange things with boolean masks. #496

ForeverWintr started this conversation in General

Description

Trying to astype levels of an IndexHierarchy doesn't seem to work correctly.

Example
Given this IH:

idx = sf.IndexHierarchy.from_labels([("1", 2), (3, "4"), ('5', '6')], name=(5, 6))
# <IndexHierarchy: (5, 6)>
# 1                        2
# 3                        4
# 5                        6
# <object>                 <object>

This looks like it should astype only the first level, but it astypes both.

idx.astype[[True, False]](str)
# <IndexHierarchy>
# 1                2
# 3                4
# 5                6
# <<U1>            <<U1>

This looks like it should astype both levels, but it astypes only the second.

idx.astype[[True, True]](str)
# <IndexHierarchy>
# 1                2
# 3                4
# 5                6
# <object>         <<U1>

Platform

Run the following function (static-frame >= 0.8.1) and provide the results to define your platform and environment:

>>> import static_frame as sf
>>> sf.Platform.display()
<Series: platform>
<Index>
platform      Linux-5.13.0-48-generic-x86_64-with
sys.version   3.8.12 (default, Jun 8 2022, 11:51:38) [GCC 7.5.0]
static-frame  0.9.3
numpy         1.17.4
pandas        0.24.2
xlsxwriter    1.1.5
openpyxl      3.0.9
xarray        0.18.0
tables        3.6.1
pyarrow       0.17.0
msgpack       (1, 0, 0)
msgpack_numpy None
<<U13>        <object>

Replies: 8 comments 1 reply

Many thanks for posting this question.

This behavior is correct and is consistent with other selection interfaces, but it certainly seems surprising. Boolean selection in StaticFrame is only used if given a Boolean array. Here, we are using lists of bools, which are interpreted like any other selection list: as the "labels" to select. As IndexHierarchy selection is based on integer "iloc" positions, this can produce confusing results, as every IndexHierarchy has positions 0 and 1.

We can see below that if we use a Boolean array instead of a list, we get the expected results.

>>> idx = sf.IndexHierarchy.from_labels([("1", 2), (3, "4"), ('5', '6')], name=(5, 6))
>>> idx.astype[np.array((True, False))](str)
<IndexHierarchy>
1                2
3                4
5                6
<<U1>            <object>
>>> idx.astype[np.array((True, True))](str)
<IndexHierarchy>
1                2
3                4
5                6
<<U1>            <<U1>
>>> idx.astype[np.array((False, True))](str)
<IndexHierarchy>
1                2
3                4
5                6
<object>         <<U1>
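
One way to see why a list of bools can land on integer positions: in Python, bool is a subclass of int, so True and False compare and hash equal to 1 and 0. The sketch below is plain Python and is not StaticFrame's implementation; level_positions and resolve are hypothetical stand-ins for a depth-2 index's level lookup, and the de-duplication step is an assumption chosen to mirror the outputs reported in the description.

```python
# bool is a subclass of int, so True/False behave as 1/0 in lookups.
assert issubclass(bool, int)
assert True == 1 and False == 0
assert hash(True) == hash(1) and hash(False) == hash(0)

# Hypothetical stand-in for the level positions of a depth-2 index.
level_positions = {0: "level 0", 1: "level 1"}


def resolve(keys):
    """Treat each key as a position label; return the distinct levels hit."""
    return sorted({level_positions[key] for key in keys})


both_levels = resolve([True, False])  # keys act as positions 1 and 0
second_only = resolve([True, True])   # both keys act as position 1
```

Under these assumptions, [True, False] reaches both levels and [True, True] reaches only the second, which lines up with the two surprising results in the original report.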

@flexatone
It seems to work fine if we use 0,1 in a list to specify the iloc position.

>>> idx.astype[[0, 1]](str)
<IndexHierarchy>
1                2
3                4
5                6
<<U1>            <<U1>

@Acexxxxxxxxx : yes, that is expected. In that case we are selecting "columns" 0 and 1 (which is interpreted the same as False and True when given in a list).


ForeverWintr
Jun 16, 2022
Collaborator Author

Interesting! I wonder if it is worth having a special case for bools here, as it seems undesirable to treat them as integers in this case.

I think then that to change the dtypes of some levels in an index, e.g., convert objects to string, the best approach is to do this:

# 1. select types to convert
needs_conversion = idx.dtypes == np.dtype("O")
# 2. pass .values to astype:
converted = idx.astype[needs_conversion.values](str)
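
When the mask starts life as a plain list rather than as a dtypes comparison, the same idea applies: convert it to a Boolean array before selecting. A minimal sketch, assuming NumPy is available; as_bool_array is a hypothetical helper, not part of StaticFrame, and it rejects integers so that 0 and 1 are not silently read as False and True.

```python
import numpy as np


def as_bool_array(flags):
    """Return a Boolean ndarray, refusing anything that is not a bool."""
    flags = list(flags)
    if not all(isinstance(flag, (bool, np.bool_)) for flag in flags):
        raise TypeError("expected only bool values")
    return np.array(flags, dtype=bool)


mask = as_bool_array([True, False])

# Integers are rejected rather than coerced.
try:
    as_bool_array([0, 1])
    rejected_ints = False
except TypeError:
    rejected_ints = True
```

The result could then be passed as idx.astype[mask](str), in the same way as needs_conversion.values above.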

@ForeverWintr, thanks for your comments.

Can you elaborate on what you suggest regarding bools? I think I misspoke when I said that (on a depth-2 index) np.array([False, True]) will be treated the same as [0, 1]; np.array([False, True]) would be the same as simply [1].

Regarding the "best approach", the following works exactly as you suggest:

>>> idx = sf.IndexHierarchy.from_product(('a', 'b'), (1, 2), (True, False))
>>> idx
<IndexHierarchy>
a                1       True
a                1       False
a                2       True
a                2       False
b                1       True
b                1       False
b                2       True
b                2       False
<<U1>            <int64> <bool>
>>> idx.astype[(idx.dtypes == bool).values](str)
<IndexHierarchy>
a                1       True
a                1       False
a                2       True
a                2       False
b                1       True
b                1       False
b                2       True
b                2       False
<<U1>            <int64> <<U5>

ForeverWintr
Jun 20, 2022
Collaborator Author

Your example shows the same thing as mine, right?

If I understand what's happening, astype[np.array([False, True])] is treated the same as simply astype[[1]], while astype[[False, True]] is treated the same as astype[[0, 1]]. In fact, astype[[False, True, True, True, True]] is also treated the same as astype[[0, 1]]!

Re-reading this, I'm not sure if I understood your original comment correctly.

"Boolean selection in StaticFrame is only used if given a Boolean array."

I don't think this is true, as there are cases where a boolean list works as I expected. For example, in loc selection:

>>> f = sf.Frame.from_records([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> f.loc[[True, False, True]]
<Frame>
<Index> 0       1       2       <int64>
<Index>
0       1       2       3
2       7       8       9
<int64> <int64> <int64> <int64>

I see this is not the case in all selections though.

I guess then my suggestion was that StaticFrame treat boolean lists the same as boolean arrays, but at the time I thought astype was the only interface that didn't do that. I see now that that's not the case.

However, selection with a boolean list works in numpy; should it work in more cases in static frame too?
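
For reference, this is the NumPy behavior being compared against, as a minimal sketch with made-up values: a list of bools is read as a Boolean mask, while a list of ints is read as positions.

```python
import numpy as np

values = np.array([10, 20, 30])

# A list of bools acts as a Boolean mask: keeps positions 0 and 2.
masked = values[[True, False, True]]

# A list of ints acts as integer positions: takes 1, then 0, then 1.
taken = values[[1, 0, 1]]
```

The two lists compare equal element by element in plain Python, yet NumPy selects different data for each, because it only has integer positions to disambiguate against.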


Thanks for the example of a list of bool working as a Boolean np.array... that is not expected and I will investigate (#497).

With NumPy, selection with a list of bool is never ambiguous as data is only labeled by integers. With StaticFrame, data can be labeled by any hashable, including Booleans, leading to some potentially very ambiguous selections if we do not exclusively treat Boolean np.array as Boolean selectors. The following contrived example makes this explicit, but there are other, more subtle challenges.

>>> f = sf.Frame.from_fields((list('abc'), list('def')), columns=(True, False))
>>> f
<Frame>
<Index> True  False <bool>
<Index>
0       a     d
1       b     e
2       c     f
<int64> <<U1> <<U1>
>>> f[[False, True]]  # a selection list
<Frame>
<Index> False True  <bool>
<Index>
0       d     a
1       e     b
2       f     c
<int64> <<U1> <<U1>
>>> f[np.array([False, True])]  # a Boolean selection
<Frame>
<Index> False <bool>
<Index>
0       d
1       e
2       f
<int64> <<U1>

Putting the responsibility on the caller to make explicit what type of selection they are doing, by forcing the usage of Boolean arrays, is well within the spirit of StaticFrame's stricter interfaces, I believe.
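
The rule can be sketched in a few lines. This is an illustration only, not StaticFrame's code: select_columns is a hypothetical function that treats a Boolean ndarray as a positional mask and anything else as a list of labels, using the same Boolean column labels as the Frame above.

```python
import numpy as np


def select_columns(labels, key):
    """Hypothetical dispatcher: Boolean ndarray -> mask, otherwise -> labels."""
    if isinstance(key, np.ndarray) and key.dtype == bool:
        if len(key) != len(labels):
            raise ValueError("Boolean mask length must match the labels")
        return [label for label, keep in zip(labels, key) if keep]
    # Any other iterable is a list of labels, looked up by value.
    lookup = {label: position for position, label in enumerate(labels)}
    return [labels[lookup[item]] for item in key]


columns = [True, False]

by_label = select_columns(columns, [False, True])           # label lookup
by_mask = select_columns(columns, np.array([False, True]))  # positional mask
```

Here the list returns the columns labeled False and True, in that order, while the array keeps only the second column (labeled False), matching the two Frame selections shown above.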


ForeverWintr
Jun 21, 2022
Collaborator Author

Good point about the possibility of Boolean labels!

Is this the only case where selecting with an array changes the selection behavior? It seems to me that arrays are treated the same as lists for other selections, e.g., f[['a', 'b']] is equivalent to f[np.array(['a', 'b'])].

@flexatone

Yes, I think that is correct: in other situations you can use a list or an array to do selections.

This discussion was converted from issue #487 on June 21, 2022 15:55.