NotificationsYou must be signed in to change notification settings
Fork11.9k
Star31k

API,MAINT: Reorganize array-wrap calling and introduce`return_scalar`#25409

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

seberg merged 18 commits intonumpy:mainfromseberg:clean-wrapping

Jan 20, 2024

Merged

API,MAINT: Reorganize array-wrap calling and introduce`return_scalar`#25409

seberg merged 18 commits intonumpy:mainfromseberg:clean-wrapping

Jan 20, 2024

Conversation

Copy link

Member

seberg commentedDec 17, 2023

This reorganize how array-wrap is called. It might very mildly change the semantics for reductions I think (and for negative priorities).

Overall, it now passes a newreturn_scalar=False/True when calling__array_wrap__ and deprecates any array-wrap which does not acceptarr, context, return_scalar.

I have not integrated it yet, but half the reason for the reorganization is to integrate it/reuse it in thearray_coverter helper PR (gh-24214), which stumbled over trying to make the scalar handling sane.

Forcing downstream to addreturn_scalar=False to the signature is a bit annoying, but e.g. our memory maps currently try toguess at it, which seems bad. I am hoping, this can be part to making the scalar vs. array return more sane.

But, maybe mainly, I hope it consolidates things (together withgh-24124 mainly, as if ufuncs were the only complex place we used this, it wouldn't matter much).

seberg added5 commits

December 17, 2023 18:22

API: Reorganize__array_wrap__ and addreturn_scalar=False

0ba20d7

This also deprecates any `__array_wrap__` which does not accept`context` and `return_scalar`.

BUG: Fix niche bug in rounding.

e6fdbdc

MAINT: Adjust __array_wrap__ in code and tests (also deprecation test)

b9fb29e

MAINT: Use/move the simplest C-wrapping also

3c51b0f

DOC: Update doc and add release note

ad0e168

github-actionsbot added the 30 - API label

Dec 17, 2023

seberg commented

Dec 17, 2023

View reviewed changes

numpy/_core/src/multiarray/calculation.c

		Py_DECREF(f);
		Py_DECREF(out);
		if (ret_int) {
		if (ret_int&&ret!=NULL) {

Copy link

MemberAuthor

sebergDec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sorry, unrelated and a bit annoying to trigger. But the deprecations triggered the bug...

seberg added3 commits

December 17, 2023 18:39

STY: Make linter happy in old tests

09ba729

MAINT: Silence GCC warning (value cannot really be used uninitialized)

2ef4302

MAINT: Small style fixes

411e2db

Copy link

Contributor

mhvk commentedDec 18, 2023

Hmm, seeing this I'm really not sure it is worth breaking every package that implemented__array_wrap__ for something so relatively small... In the end, most classes could check whetherndim==0 and then do something based on that.

Another idea, that does not break the signature (but might well break implementations), is to pass on to__array_wrap__ what numpy would return, i.e., an array or a scalar.

But there also remains something to be said for just more generally using array scalars, and let that be the API break. It might just make code simpler rather than more complex...

p.s. Should stress that I haven't thought enough about this! I should really look more at yourtry-0d-preservation branch...

Copy link

MemberAuthor

seberg commentedDec 18, 2023•
edited
Loading

It would be nice not to require this, but...

In the end, most classes could check whether ndim==0 and then do something based on that.

No because 0-D doesn't imply that NumPy should be returning a scalar, after all that is what annoys us all the time! I would like to be able to decide thatnp.add(1, 1) should maybe a scalar, butnp.add(np.array(1), 1) should be an array. Orarr1d.sum() vs.arr1d.sum(-1) maybe/probably.

It doesn't matter much which cases are which, so long there is no absolute uniform 0-d is always array/scalar...

is to pass on to__array_wrap__ what numpy would return, i.e., an array or a scalar.

Might work out mostlyuntildtype=object hits and all is lost, since you need at leastobj.view(my_subclass) to work, and it cannot. And it isn't even really reasonable to check forisinstance(obj, ndarray)...

Which was, why I thought I would just callresult[()] instead, but that also doesn't work because we don't know that the result defines such indexing semantics.

So, I would love a different idea, but the only thing I have left is piggy-backing oncontext or try/except indefinitely. But I really dislike try/except indefinitely, so that would leave adding one more try/except branch as the current one breaks subclasses that rely oncontext (although I hope beyond masked arrays few exist, and everyone who might want to use it moves to__array_ufunc__).

BUG: Fix reference leak in ufunc array-wrapping

4846b1d

This probably doesn't fix the 32bit issue unfortunately, only the windows ones...

seberg force-pushed theclean-wrapping branch fromabe3530 to4846b1dCompare

December 18, 2023 19:54

BUG: Fix leak for result arrays in all ufunc calls

a2cee02

Copy link

MemberAuthor

seberg commentedDec 19, 2023

FWIW, I have updated thetry-0d-preservation branch with these (and the array-converter) changes. There are some tests still failing, but the last commit shows that it is not impossible anymore to return scalars for scalar inputs and arrays otherwise.
(With the admitted caveat in the strange case where e.g.dtype="i,i" means that a tuple is considered a scalar rather, when it normally is not.)

Copy link

Contributor

mhvk commentedDec 19, 2023

Which was, why I thought I would just call result[()] instead, but that also doesn't work because we don't know that the result defines such indexing semantics.

Maybe we should still go with this? I think it may cause less breakage than a new argument...

Though perhaps I'm overly worried; not sure how many projects in fact have an__array_wrap__ to start with...

Copy link

MemberAuthor

seberg commentedDec 20, 2023

Let's say this, we currently have two possible paths:

Functions which always return an array
Functions which return a scalar if the result is 0-D.

And__array_wrap__ cannot possibly match that, because even if it relies oncontext being set for the second, it will miss reductions.

So to some degree, I really want to add the ability to pick the two behaviors consciously for NumPy. Now, you could just not tell__array_wrap__ about it as it rarely needs to know in practice (most implementations are already happy to just ignore it).

I think it may cause less breakage than a new argument...

Well, in the sense of requiring less code change? Yes, for sure. In the sense of braking thingsdisregarding the deprecation warning, not sure? Since I think some things like pandas Series will run intoresult[()] not working.

Though perhaps I'm overly worried; not sure how many projects in fact have an__array_wrap__ to start with...

Right, and if I add another step which tries to passarr, context whenarr, context, return_scalar fails, then this is aDeprecationWarning and will not fail straight away (albeit add a bunch of overhead).

I don't have a strong opinion, it seems tedious, but OTOH, it seems bound to be annoying for someone to not be able to match NumPy arrays (e.g.mmap is such a case).

Copy link

MemberAuthor

seberg commentedJan 7, 2024

@mhvk sorry, need to come back to this, I think it is pretty important (if just to push the followup ofarray_coverter(), whichis important to fixup NEP 50 niche case bugs in the Pytho API).

We had discussed this in a meeting a while ago, and I don't think anyone had serious concerns with adding it. But I do value your opinion more than most when it comes to__array_function__. What I will say is that this is just a deprecation, so yes, everyone must update, but end-users wouldn't see it immediately (although it's slow). It could be aPendingDeprecationWarning, but that just hides it even more with no gain I think. I can will add a try for__array_function__(arr, context), so that it is truly just a deprecation (even if it could go through 2 trys before succeeding).

To me, the new kwarg still just seems easier than any other polymorphism to try to guess this without it, and it is only some libraries that need to adapt.

mhvk reviewed

Jan 7, 2024

View reviewed changes

Copy link

Contributor

mhvk left a comment•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@seberg - sorry to have forgotten about this! Looking with a fresh eye, I think your solution is for the best -- everything I can think of has the risk of subtle breakage.

So, now just a proper review of the code -- which looks great! One genuine comment about whether forcing wrapping for reductions is correct (looks like the old code did not). Though if forced wrapping for reductions is OK, then all the calls tonpy_apply_wrap in fact used forced, so the argument could be removed. Indeed, that suggests that if we do not want to force wrapping in reductions, we could just move theif statement to the reduction instead (or are you using it in the follow-up PR?).

numpy/_core/memmap.py Outdated

		# Return scalar instead of 0d memmap, e.g. for np.sum with
		# axis=None
		ifarr.shape== ():
		ifarr.shape== ()andreturn_scalar:

Copy link

Contributor

mhvkJan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This can just beif return_scalar:, right? or does the calling code not ensure this is set only for 0-d arrays?

Copy link

MemberAuthor

sebergJan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You are right, thanks. When I started I hadn't ensured that yet, but it does now (and I think it is much cleaner/better).

numpy/_core/src/multiarray/arraywrap.c Outdated

		* @param inputs Original input objects
		* @param out_wrap Set to the python callable or None (on success).
		* @param out_wrap_type If not NULL, set to the type belonging to the wrapper
		* or set to NULL.

Copy link

Contributor

mhvkJan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

From the code, this seems no longer true:*out_wrap_type is always set. This means the rest of the comment can go, which is probably for the better!

numpy/_core/src/multiarray/arraywrap.c Outdated

		* Wrapping is always done for ufuncs because the values stored will have been
		* mutated (guarantees calling array-wrap if any mutation may have occurred;
		* this is not true for Python calls, though).
		* UFuncs always call array-wrap with base-class arrays in any case.

Copy link

Contributor

mhvkJan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think removing the comment up to this point is OK.

Or was this comment meant fornpy_apply_wrap?

Copy link

MemberAuthor

sebergJan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Removed this is a leftover and better and sufficiently covered byforce_wrap now...

numpy/_core/src/multiarray/arraywrap.c Outdated

		arr= (PyArrayObject*)obj;
		}
		else {
		/* TODO: This branch should ideally be NEVER taken! */

Copy link

Contributor

mhvkJan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Tests fail if you remove this? Might be worth an issue enumerating those (if not too many...).

Copy link

MemberAuthor

sebergJan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I didn't really try, but the point is that as long as we convert 0-D arrays to scalars randomly, it would seem rather typical to callwrap(a + b) and the result of that can be a scalar.

I will expand the comment ofwhy rather than investigating for now (just seems more useful to me).

numpy/_core/src/multiarray/arraywrap.cShow resolvedHide resolved

numpy/_core/src/umath/ufunc_object.c

		returnNULL;
		for (intout_i=0;out_i<ufunc->nout;out_i++) {
		context.out_i=out_i;
		PyObject*original_out=NULL;

Copy link

Contributor

mhvkJan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd make this a one-liner, but up to you!

PyObject *original_out = (full_args.out) ?  PyTuple_GET_ITEM(full_args.out, out_i) : NULL;

numpy/_core/src/umath/ufunc_object.cShow resolvedHide resolved

numpy/_core/tests/test_deprecations.py Outdated

		def__array_wrap__(self,arr,context=None):
		return'pass'

		classTest2:

Copy link

Contributor

mhvkJan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can inherit fromTest1 to allow removing the__array__ method.

numpy/_core/src/multiarray/arraywrap.cShow resolvedHide resolved

seberg added5 commits

January 17, 2024 14:20

Ensure we try passing context and address related smaller review comm…

3caf819

…ents

Merge branch 'main' into clean-wrapping

5d2c04c

Ensure we don't trycontext=None and expand code comment

181a376

Rely on return_scalar always being right (and style nit)

8621eee

Remove outdated comments as per review

3122538

seberg commented

Jan 17, 2024

View reviewed changes

Copy link

MemberAuthor

seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the review, sorry, I have not gotten back earlier. I think I addressed the issues (the largest one maybe being to try just passing context).

Forcing wrapping in the reduction path seemed right to me, but that said I don't care about just keeping things as they were. It doesn'treally belong here, except that I wanted to slowly try to standardize behavior more.

numpy/_core/src/multiarray/arraywrap.c Outdated

		arr= (PyArrayObject*)obj;
		}
		else {
		/* TODO: This branch should ideally be NEVER taken! */

Copy link

MemberAuthor

sebergJan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I didn't really try, but the point is that as long as we convert 0-D arrays to scalars randomly, it would seem rather typical to callwrap(a + b) and the result of that can be a scalar.

I will expand the comment ofwhy rather than investigating for now (just seems more useful to me).

numpy/_core/memmap.py Outdated

		# Return scalar instead of 0d memmap, e.g. for np.sum with
		# axis=None
		ifarr.shape== ():
		ifarr.shape== ()andreturn_scalar:

Copy link

MemberAuthor

sebergJan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You are right, thanks. When I started I hadn't ensured that yet, but it does now (and I think it is much cleaner/better).

numpy/_core/src/multiarray/arraywrap.c Outdated

		* Wrapping is always done for ufuncs because the values stored will have been
		* mutated (guarantees calling array-wrap if any mutation may have occurred;
		* this is not true for Python calls, though).
		* UFuncs always call array-wrap with base-class arrays in any case.

Copy link

MemberAuthor

sebergJan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Removed this is a leftover and better and sufficiently covered byforce_wrap now...

numpy/_core/src/umath/ufunc_object.c Outdated


		PyObject*wrapped_result=npy_apply_wrap(
		(PyObject*)ret,out_obj,wrap,wrap_type,NULL,
		PyArray_NDIM(ret)==0,NPY_TRUE);

Copy link

MemberAuthor

sebergJan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I would say it is correct, but you are right, it is a subtle change? I find it odd to not do it for reductions, but do it for ufuncs, that said, ufuncs are special also in passingcontext, so I would be happy to undo it.

(I think besides this, looking up might be unnecessary sometimes, but nothing is changed.)

Let's just undo force-wrap for now for reductions (its a change...)

497c535

Copy link

MemberAuthor

seberg commentedJan 17, 2024

Nevermind, I siwtched to not do force-wrap which I think leaves the reduction result unchanged. While I would like to align them, I need this in more for the other PR then delay thinking about it.

ENH: Chain the original error when the deprecationwarning is raised

f50ff16

Doing this due tonumpygh-25635 since it is super confusing with thebad retrying...

Copy link

MemberAuthor

seberg commentedJan 19, 2024

gh-25635 was a good reason to chain exceptions. Because this wasalways terrible (really it got no worse, except now you have a chance of getting to the real error). It shows nicely how that try/except can go wrong if it isn't very specific enough.

In [4]: np.sqrt(np.ma.masked)---------------------------------------------------------------------------TypeError                                 Traceback (most recent call last)File ~/forks/numpy/build-install/usr/lib/python3.10/site-packages/numpy/ma/core.py:6686, in MaskedConstant.__array_wrap__(self, obj, context, return_scalar)   6685 def __array_wrap__(self, obj, context=None, return_scalar=False):-> 6686     return self.view(MaskedArray).__array_wrap__(obj, context)File ~/forks/numpy/build-install/usr/lib/python3.10/site-packages/numpy/ma/core.py:3131, in MaskedArray.__array_wrap__(self, obj, context, return_scalar)   3129     fill_value = self.fill_value-> 3131 np.copyto(result, fill_value, where=d)   3133 # Update the maskTypeError: Cannot cast scalar from dtype('float64') to dtype('bool') according to the rule 'safe'The above exception was the direct cause of the following exception:DeprecationWarning                        Traceback (most recent call last)Cell In[4], line 1----> 1 np.sqrt(np.ma.masked)DeprecationWarning: __array_wrap__ must accept context and return_scalar arguments (positionally) in the future. (Deprecated NumPy 2.0)

mhvk reviewed

Jan 19, 2024

View reviewed changes

Copy link

Contributor

mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Mostly small comments about the logic innpy_find_array_wrap (and one ref-counting issue).

numpy/_core/src/multiarray/arraywrap.c Outdated

		for (inti=0;i<nin;i++) {
		PyObject*obj=inputs[i];
		if (PyArray_CheckExact(obj)) {
		if (wrap==NULL\|\|priority<0) {

Copy link

Contributor

mhvkJan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Priority cannot be< 0 here, right? Should the initialization above be at-inf? A benefit of that is that I think the test forwrap == NULL would not be needed any more.

Copy link

Contributor

mhvkJan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd also writepriority < NPY_PRIORITY and use that below forpriority = NPY_PRIORITY - somewhat easier to read than having to know the priority of arrays is 0.

Copy link

MemberAuthor

sebergJan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Priority can be< 0, this makes sense for example formemmap and is used there.

I had initially thought about it, and decided it seemed dubious. Maybe all it changes is that-inf really means that__array_wrap__ shouldnever be called, which would make sense (and probably nobody uses that anyway).

There is the odd code, that the old code justskipped ndarrays, which is worse, because negative wraps take precendence over ndarray! (yes this is a change here)

Copy link

MemberAuthor

sebergJan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

(Will replace theNPY_PRIORITY to not hardcode the 0)

Copy link

Contributor

mhvkJan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

But if you initializepriority = 0, doesn't that mean thatnp.memmap.__array_wrap__ never gets selected? Is that OK?

Copy link

MemberAuthor

sebergJan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The code here always uses the first one (if there is more than one). That is what theNULL check does.

Copy link

MemberAuthor

sebergJan 19, 2024•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ah, that is nicer though of course! I should writei == 0 that is much clearer...

EDIT: The only reason I even initialized priority at all is because gcc incorrectly thinks it can be used uninitialized.

Copy link

MemberAuthor

sebergJan 19, 2024•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Wait no, arg... that doesn't work, how about just adding a comment for what theNULL check does early on?

EDIT: Done this, I do think we can tweak this, but as of now... this is the smallest variation from what we had before (which was very strange and didn't agree with the python side)

numpy/_core/src/multiarray/arraywrap.c

		}
		}
		elseif (PyArray_IsAnyScalar(obj)) {
		if (wrap==NULL\|\|priority<NPY_SCALAR_PRIORITY) {

Copy link

Contributor

mhvkJan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Same here.

numpy/_core/src/multiarray/arraywrap.cShow resolvedHide resolved

BUG,MAINT: Address Martens review comments

606f5c3

seberg force-pushed theclean-wrapping branch fromdbb85c0 to606f5c3Compare

January 19, 2024 21:36

charris changed the title~~API,MAINT: Reorganize array-wrap calling and introducereturn_scalar~~API,MAINT: Reorganize array-wrap calling and introducereturn_scalar

Jan 20, 2024

mhvk reviewed

Jan 20, 2024

View reviewed changes

numpy/_core/src/multiarray/arraywrap.c

		for (inti=0;i<nin;i++) {
		PyObject*obj=inputs[i];
		if (PyArray_CheckExact(obj)) {
		if (wrap==NULL\|\|priority<NPY_PRIORITY) {

Copy link

Contributor

mhvkJan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm still confused: if you initializepriority = -inf (or some other very large negative number), will this line not just always be true ifwrap is undefined? So, why is thewrap == NULL needed?

Copy link

MemberAuthor

sebergJan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

not if the object definespriority = -inf. Maybe that would be preferable, but 🤷.

Copy link

MemberAuthor

sebergJan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ah, I only need it on the last branch, yes. Would that seem clearer (then the last branch is still special, this way priority is only initialized to silence warnings)?

Copy link

Contributor

mhvkJan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think coding against something that definespriority = -inf yet defined__array_wrap__ seems unnecessary! But if you really prefer what you have, that is fine too -- my own sense is that the logic becomes clearer without the extra conditions that are needed to grab the first argument unconditionally.

Copy link

MemberAuthor

sebergJan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't disagree, I think I just wanted to matchone of the implementations inside NumPy.
Maybe it is laziness speaking: This way no user can run into this (not even hypothesis) and its there already there...?