Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

ENH: Adding __array_ufunc__ capability to MaskedArrays (again)#22914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
greglucas wants to merge20 commits intonumpy:main
base:main
Choose a base branch
Loading
fromgreglucas:masked-ufunc-2

Conversation

greglucas
Copy link
Contributor

This is a rebase and update of several previous PRs:#21977 and#16022, which were both reverted due to issues downstream. I've kept a bunch of smaller commits showing the updates to comments to try and make it easier for review.

I've tested with Astropy and there are still two failing tests over there, which I don't quite understand yet, so I'm hoping a review might be able to point out some hints for what I'm missing. It looks like we do enterMaskedColumn.__array_wrap__, but with acontext=None, so that wrap isn't undoing anything because it doesn't know which function it came from... I'm not sure if this means the update needs to be handled in numpy or astropy.

astropy/units/tests/test_quantity_ufuncs.py::TestQuantityMathFuncs::test_power_array_array3astropy/table/tests/test_column.py::TestColumn::test_numpy_boolean_ufuncs[MaskedColumn]

cc@rcomer,@seberg,@mhvk since you've all had some great comments and help on the previous PRs!

Summary

This enables any ufunc numpy operations that are called on a MaskedArray to use the masked version of that function automatically without needing to resort to np.ma.func() calls directly.

Example

importnumpyasnpa=np.ma.array(np.arange(10), [0,1]*5)x=a<10# x.data == [True,  True,  True,  True,  True, False, False, False, False, False]y=np.ma.less(a,10)# y.data == [True,  True,  True,  True,  True,  True, False,  True, False, True]

Even though data values ina are masked, they are still acted upon and evaluated. Changing that underlying data. However, calling the masked less version doesn't do anything with the data in the array. Note that this is a big change but to a somewhat ambiguous case of what happens to masked values under function evaluation.

The real reason I began looking into this was to not evaluate the function in the masked values because I'd already premasked the condition that would throw warnings at me. See#4959 for evaluating log10 at locations less than 0 that were under the mask, so still threw warnings.

importnumpyasnpA=arange(-2,5)/2Am=numpy.ma.masked_less_equal(A,0)np.log10(Am)# Previously threw a warning, now works as expectednp.ma.log10(Am)# No warnings before or after

Implementation

I haven't added any documentation yet because I wanted to check and see if this was even the right way of approaching this idea. I struggled to figure out how to pass theout= object pointers around in a consistent manner, so I am definitely interested in suggestions on improving that. I had to update quite a few seemingly random areas of the Masked codebase to make that work everywhere. There are a few tests that I had to update because values under the mask were being compared, and those have now changed with this implementation applying the masked version of ufuncs everywhere.

This is a pretty major (underlying) API change, so I expect there to be quite a bit of discussion on the best approaches here.

Linked Issues

Fixes#4959
Closes#15200

rcomer and edmundhenley-mo reacted with heart emojiMuellerSeb reacted with eyes emoji
greglucasand others added19 commitsJanuary 2, 2023 17:15
This enables any ufunc numpy operations that are called on aMaskedArray to use the masked version of that function automaticallywithout needing to resort to np.ma.func() calls.
Fixes the problem reported atnumpy#21977 (comment)The reduce method here effectively calls itself with an unmaskedMaskedArray (mask=nomask) and then expects either a MaskedArray ora scalar.  This change ensures that an ordinary ndarray isconverted to a MaskedArray, following the pattern already used inmean and var in this module.
Now we are calling np.power() in std() which goes through the ufuncmachinery, so we don't want to pass any additional unsafe castingkwargs that aren't allowed within the masked implementation.
Move the np-ufunc check to the top of the routine so we immediatelygo to super() when necessary. Before we were returning NotImplementedif an arg wasn't able to be handled.Update the arg instance check to defer for everything but anotherclass that has implemented __array_ufunc__
This allows for subclasses to be handled correctly
This is handled in the C code now within the ufunc machinery.
@mhvk
Copy link
Contributor

mhvk commentedJan 3, 2023

@greglucas - theastropy.units error I think we would need to work around - as is, we don't really treatMaskedArray correctly and it shows here; we will just have to deal with it...
But theastropy.table error is more perhaps more serious:MaskedColumn is a subclass ofMaskedArray and it assumes it does not have to worry about__array_ufunc__. While we could likely define it and make things work again (indeed, that might be good), we may just be the canary in the coalmine: any otherMaskedArray subclasses may well break too.

@mhvk
Copy link
Contributor

mhvk commentedJan 3, 2023

Actually, I quickly tried to implement__array_ufunc__ myself onMaskedColumn, and it exposed a bug in the implementation here even fornp.ma.MaskedArray:

In [2]: np.signbit(np.ma.MaskedArray([1., 2.], mask=[True, False]))---------------------------------------------------------------------------TypeError                                 Traceback (most recent call last)Cell In[2], line 1----> 1 np.signbit(np.ma.MaskedArray([1., 2.], mask=[True, False]))TypeError: operand type(s) all returned NotImplemented from __array_ufunc__(<ufunc 'signbit'>, '__call__', masked_array(data=[--, 2.0],             mask=[ True, False],       fill_value=1e+20)): 'MaskedArray'

The reason is that if the function is not defined innp.ma, you just hand off tosuper(), i.e.,np.ndarray.__array_ufunc__. But that one returnsNotImplemented because it does not know how to deal withMaskedArray. I think you'd have to extract the data and copy the mask over. Though perhaps easier is to definenp.ma.signbit!

@mhvk
Copy link
Contributor

mhvk commentedJan 3, 2023

p.s. Might want to check test completeness by checking all functions innp.testing.overrides.get_overridable_numpy_ufuncs() are covered...

if ma_ufunc is np_ufunc:
# We didn't have a Masked version of the ufunc, so we need to
# call the ndarray version to prevent infinite recursion.
return super().__array_ufunc__(np_ufunc, method, *inputs, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You will need to unwrap the data arrays here and then call things again. On the plus side, it may actually make sense to always just do that.
Also, you should not usesuper() here, but rather call the original ufunc I think, because after unwrapping it could be that you need to dispatch to a different argument.

There is the small change that the ufuncs behave a bit different from thenp.ma.ufunc version (and maybe even better?): seegh-22347.
So there might even be a point of just ignoring whatnp.ma.func does.

After the fallout from the "simple"gh-22046, I am a bit hesitant about changing these subtleties, we probably need to test very carefully and I am wondering if we should push it of for a "2.0" (which I hope to bring up now for in a year). In that case, a possible approach could be to hide these definitions behind an environment variable (feature flag) maybe.

@mhvk
Copy link
Contributor

mhvk commentedJan 4, 2023

Maybe worth pulling@seberg's comment in the main thread: a question is whether the regular numpy functions should just call thenp.ma ones, including the, to me, rather strange use of domains (seegh-22347), or rather do a simpler wrapping where masks of input arguments simply or-d together, without worry about the values of the inputs. Arguing against this of course is that one then is basically moving to a new class (which I did for astropy; seehttps://docs.astropy.org/en/latest/utils/masked/index.html and in particular the discussion of differences withnp.ma.MaskedArray).

But probably it is easier to just add the few missing ufunc tonp.ma and work with what you have!

@greglucas
Copy link
ContributorAuthor

p.s. Might want to check test completeness by checking all functions in np.testing.overrides.get_overridable_numpy_ufuncs() are covered...

But probably it is easier to just add the few missing ufunc to np.ma and work with what you have!

Thanks for pointing me to that function! Somewhat surprisingly, there are actually 30 ufuncs that have no implementation innp.ma currently! Most of them are easy, but I don't think there is any wrapper class that handles multi-output functions yet (ex:https://numpy.org/doc/stable/reference/generated/numpy.frexp.html)


I went digging through some of the old mailing list archives that I wasn't aware of and came across this really nice discussion:
https://mail.python.org/archives/list/numpy-discussion@python.org/thread/OSGGDTJX6XZK3LXLNQYGZP766FD3YFQQ/#QNVWET3O7KXFGAB7NVZNWUGHE766TGNL
and if I'm parsing it all correctly it seems like people are basically on the same page and working out the final details is all that is left. I had also forgotten that@ahaldane had pointed me at their implementation of a new class in one of the previous PRs:
https://github.com/ahaldane/ndarray_ducktypes/blob/master/ndarray_ducktypes/MaskedArray.py

My goal for this was to have a bit of a stepping stone before going all-in on a new class. However, I am now possibly seeing that this might just be playing a game of whack-a-mole that I'm sure will change some behavior and upset someone, so maybe we should just do a hard-breaking-change all at once instead of introduce these small papercuts. (i.e. work on the better implementation for a 2.0 release rather than adding these now)

I think one of the things missing from many of the discussions is that many downstream libraries make use ofnp.sum(),np.stack(), etc... without thinking about what instance is being passed into them (various arrays from numpy, pandas, dask, masked), so I do think it is important to try and start dispatching thenp.func tonp.ma.func specific methods for the Masked class since that seems to be the odd one out at the moment...

My desires from a new class:

  1. No auto-masking of "bad" computation locations, let the user do that apriori if they want

I haven't seen anyone argue for this yet, and this looks like it goes way back in git blame.

  1. No warnings when doing a computation of a bad value under a mask

It would be annoying to get divide by zero warnings on your computations when you've already pre-masked the data (see my example at the top of the PR for the motivation here!)

  1. np.func gets forwarded tonp.ma.func whenever possible (implement__array_ufunc__ and__array_function__)

It is quite surprising right now to donp.stack(ma, ma) and get a Masked instance back, but with an incorrect mask! (it would be better to just get a plain ndarray IMO)

  1. Proper subclass wrapping/heirarchy preservation

Should fall out of (3) if implemented properly with NotImplemented being returned.


I'll try and take a better look at@ahaldane's implementation in nd_duck_array and@mhvk's implementation in Astropy. I think right now we are missing having a clear path between where we are now and where we want to go, so something a little more concrete with actions of what is needed might be nice. I'll try and give it some thought.

seberg reacted with rocket emoji

@mhvk
Copy link
Contributor

mhvk commentedJan 5, 2023

Thanks, that's a great summary! There is only one item I'd add: think carefully how to deal with underlyingdata other thanndarray. For instance, part of the reason I made a separate implementation in astropy was that the currentMaskedArray simply cannot be used to createMaskedQuantity -- withMaskedArray holding aQuantity attributes like the unit become hidden (and I think@ahaldane's version did not make that easy either). I found it worked much better by havingQuantity on top, effectively holding a masked array as data (in our case, with corresponding class hierarchyMaskedQuantity(Quantity, MaskedNDArray)). But for something likedask, it might be better to have both the data and the mask be dask arrays, while for Xarray, that would not make sense.

Overall, it certainly is not crazy to move in small steps! I think a bigger question here is whether when wrapping regular numpyufunc, one wants to just get thenp.ma.<ufunc> equivalent, or something that moves closer to a final goal (i.e., as you note, no masking of bad elements but also no warnings for masked ones [note that the latter may be quite tricky...]).

greglucas reacted with thumbs up emoji

@MuellerSeb
Copy link

Any news here?

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@sebergsebergseberg left review comments

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

BUG: MaskedArray does not seem to respect ufunc dispatch hierarchy Do not warn when invalid values are masked in a MaskedArray
6 participants
@greglucas@mhvk@MuellerSeb@seberg@charris@rcomer

[8]ページ先頭

©2009-2025 Movatter.jp