IndexHierarchy.astype does strange things with boolean masks.#496
-
Description
Trying to

Example

    idx = sf.IndexHierarchy.from_labels([("1", 2), (3, "4"), ('5', '6')], name=(5, 6))
    # <IndexHierarchy: (5, 6)>
    # 1                        2
    # 3                        4
    # 5                        6
    # <object>                 <object>

This looks like it should astype only the first level, but it astypes both.

    idx.astype[[True, False]](str)
    # <IndexHierarchy>
    # 1                2
    # 3                4
    # 5                6
    # <<U1>            <<U1>

This looks like it should astype both levels, but it astypes only the second.

    idx.astype[[True, True]](str)
    # <IndexHierarchy>
    # 1                2
    # 3                4
    # 5                6
    # <object>         <<U1>

Platform
Run the following function (static-frame >= 0.8.1) and provide the results to define your platform and environment:

    >>> import static_frame as sf
    >>> sf.Platform.display()
    <Series: platform>
    <Index>
    platform          Linux-5.13.0-48-generic-x86_64-with
    sys.version       3.8.12 (default, Jun 8 2022, 11:51:38) [GCC 7.5.0]
    static-frame      0.9.3
    numpy             1.17.4
    pandas            0.24.2
    xlsxwriter        1.1.5
    openpyxl          3.0.9
    xarray            0.18.0
    tables            3.6.1
    pyarrow           0.17.0
    msgpack           (1, 0, 0)
    msgpack_numpy     None
    <<U13>            <object>
Replies: 8 comments 1 reply
-
Many thanks for posting this question. This behavior is correct and is consistent with other selection interfaces, but it certainly seems surprising. Boolean selection in StaticFrame is only used if given a Boolean array. Here, we are using lists of bools, which are interpreted like any other selection list: as the "labels" to select. We can see below that if we use a Boolean array instead of a list, we get the expected results.

    >>> idx = sf.IndexHierarchy.from_labels([("1", 2), (3, "4"), ('5', '6')], name=(5, 6))
    >>> idx.astype[np.array((True, False))](str)
    <IndexHierarchy>
    1                2
    3                4
    5                6
    <<U1>            <object>
    >>> idx.astype[np.array((True, True))](str)
    <IndexHierarchy>
    1                2
    3                4
    5                6
    <<U1>            <<U1>
    >>> idx.astype[np.array((False, True))](str)
    <IndexHierarchy>
    1                2
    3                4
    5                6
    <object>         <<U1>
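The rule described above can be modeled in a few lines of plain Python. This is only a toy sketch, not StaticFrame's implementation: `BoolMask` is a hypothetical stand-in for a Boolean NumPy array, and `select_positions` is an invented helper. It shows that reading a list of bools as labels on a depth-2 index gives the selections reported in the original post.

```python
from typing import Hashable, List, Sequence


class BoolMask:
    """Hypothetical stand-in for a Boolean NumPy array."""

    def __init__(self, flags: Sequence[bool]) -> None:
        self.flags = list(flags)


def select_positions(labels: Sequence[Hashable], key) -> List[int]:
    """Return the positions chosen by key (toy model, not StaticFrame code)."""
    if isinstance(key, BoolMask):
        # Boolean selection: keep the positions whose flag is True.
        return [pos for pos, flag in enumerate(key.flags) if flag]
    # Selection list: every element is looked up as a label.
    label_to_pos = {label: pos for pos, label in enumerate(labels)}
    return [label_to_pos[item] for item in key]


depth_labels = (0, 1)  # the two levels of a depth-2 hierarchy

list_both = select_positions(depth_labels, [True, False])    # labels 1 and 0
list_second = select_positions(depth_labels, [True, True])   # label 1, twice
mask_first = select_positions(depth_labels, BoolMask([True, False]))
mask_both = select_positions(depth_labels, BoolMask([True, True]))

print(list_both, list_second, mask_first, mask_both)
```

In this model, `[True, False]` as a list touches both levels and `[True, True]` touches only the second, while the mask versions select what the flags say.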
-
@flexatone
-
@Acexxxxxxxxx : yes, that is expected. In that case we are selecting "columns" 0 and 1 (which are interpreted the same as False and True when given in a list).
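The equivalence of 0/1 and False/True comes from Python itself: `bool` is a subclass of `int`, and `True` and `False` compare and hash like `1` and `0`. A short illustration using an ordinary dict (the `positions` mapping is made up for this example and is not a StaticFrame object):

```python
# bool is a subclass of int, so bools and the ints 0/1 are interchangeable as keys.
positions = {0: "first level", 1: "second level"}

from_bools = [positions[key] for key in [False, True]]
from_ints = [positions[key] for key in [0, 1]]

same_hash = hash(True) == hash(1) and hash(False) == hash(0)

print(issubclass(bool, int), same_hash, from_bools == from_ints)
```

Any label lookup built on hashing and equality will therefore find integer labels when given bools in a list.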
-
Interesting! I wonder if it is worth having a special case for bools here, as it seems undesirable to treat them as integers in this case. I think then that to change the dtypes of some levels in an index, e.g., convert objects to string, the best approach is to do this:

    # 1. select types to convert
    needs_conversion = idx.dtypes == np.dtype("O")
    # 2. pass .values to astype:
    converted = idx.astype[needs_conversion.values](str)
-
@ForeverWintr , thanks for your comments. Can you elaborate on what you suggest regarding bools? I think I misspoke when I said that (on a depth-2 index)

Regarding the "best approach", the following works exactly as you suggest:

    >>> idx = sf.IndexHierarchy.from_product(('a', 'b'), (1, 2), (True, False))
    >>> idx
    <IndexHierarchy>
    a                1       True
    a                1       False
    a                2       True
    a                2       False
    b                1       True
    b                1       False
    b                2       True
    b                2       False
    <<U1>            <int64> <bool>
    >>> idx.astype[(idx.dtypes == bool).values](str)
    <IndexHierarchy>
    a                1       True
    a                1       False
    a                2       True
    a                2       False
    b                1       True
    b                1       False
    b                2       True
    b                2       False
    <<U1>            <int64> <<U5>
-
Your example shows the same thing as mine, right? If I understand what's happening, Re-reading this, I'm not sure if I understood your original comment correctly.
I don't think this is true, as there are cases where a boolean list works as (I) expected. For example:

    f = sf.Frame.from_records([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    f.loc[[True, False, True]]
    <Frame>
    <Index> 0       1       2       <int64>
    <Index>
    0       1       2       3
    2       7       8       9
    <int64> <int64> <int64> <int64>

I see this is not the case in all selections though. I guess then my suggestion was that static frame treat boolean lists the same as boolean arrays, but at the time I thought

However, selection with a boolean list works in numpy; should it work in more cases in static frame too?
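To make the suggestion concrete, here is one way a list of bools could be told apart from a list of integers. This is purely a hypothetical sketch: `is_bool_list` is an invented name, and nothing in this thread says StaticFrame performs such a check.

```python
def is_bool_list(key: object) -> bool:
    """Hypothetical check: True only for a non-empty list made solely of bools."""
    # type(item) is bool excludes plain ints, even though bool subclasses int.
    return (
        isinstance(key, list)
        and len(key) > 0
        and all(type(item) is bool for item in key)
    )


print(is_bool_list([True, False]))  # a list of bools
print(is_bool_list([1, 0]))         # a list of ints
print(is_bool_list([True, 1]))      # mixed
```

Such a check can separate bools from ints, but it cannot tell whether the caller meant the bools as a mask or as labels.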
-
Thanks for the example of a list of

With NumPy, selection with a list of

    >>> f = sf.Frame.from_fields((list('abc'), list('def')), columns=(True, False))
    >>> f
    <Frame>
    <Index> True  False <bool>
    <Index>
    0       a     d
    1       b     e
    2       c     f
    <int64> <<U1> <<U1>
    >>> f[[False, True]]  # a selection list
    <Frame>
    <Index> False True  <bool>
    <Index>
    0       d     a
    1       e     b
    2       f     c
    <int64> <<U1> <<U1>
    >>> f[np.array([False, True])]  # a Boolean selection
    <Frame>
    <Index> False <bool>
    <Index>
    0       d
    1       e
    2       f
    <int64> <<U1>

Putting the responsibility on the caller to make explicit what type of selection they are doing, by forcing the usage of Boolean arrays, is well within the spirit of StaticFrame's stricter interfaces, I believe.
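The ambiguity in the Frame example can be reproduced without StaticFrame. The sketch below is a plain-Python model (the variable names are invented for illustration) that applies both readings of the same key to columns labeled `True` and `False`:

```python
column_labels = (True, False)
key = [False, True]

# Read as a selection list: each element is a label to look up, in key order.
label_to_pos = {label: pos for pos, label in enumerate(column_labels)}
as_labels = [column_labels[label_to_pos[item]] for item in key]

# Read as a Boolean mask: keep the columns whose flag is True.
as_mask = [label for label, flag in zip(column_labels, key) if flag]

print(as_labels)  # both columns, reordered
print(as_mask)    # only the column in the second position
```

The two readings give different results for the same key, which mirrors the difference between `f[[False, True]]` and `f[np.array([False, True])]` shown above.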
-
Good point about the possibility of Boolean labels! Is this the only case where selecting with an
-
Yes, I think that is correct: in other situations you can use a list or an array to do selections.
This discussion was converted from issue #487 on June 21, 2022 15:55.