ENH: Create helper for conversion to arrays #24214


Merged: ngoldbaum merged 4 commits into numpy:main from seberg:creation-helper on Jan 22, 2024

Conversation

@seberg (Member)

Had to give this a start to be able to experiment. It does solve at least some problems (e.g. the np.select() use-case, and it looks like it would fix and/or simplify the linalg code).

The main thing is that we often convert multiple arrays, and then want something like:

  • a, b, c = np.asarray(a), np.asarray(b), np.asarray(c) and then dtype = np.result_type(a, b, c), often with an additional 0.0. That is a pretty common pattern; NEP 50 makes it tricky if you wish to support weak scalars, because you need to call result_type with the scalar information intact (before the asarray, or manually undo it afterwards). (For example, this happens in np.select().)
    • Maybe libraries can disregard that, but for NumPy it's hard, since you want many functions to be ufunc-like, so if ufuncs do it, others probably should be able to/try to.
  • Maybe mainly internally, we sometimes need __array_wrap__ for subclass support, although it seems consistently used only for single-input functions (outside ufuncs).
    • Additional motive: a clean wrapping pattern seems required for making the scalar logic sane.
  • It may be useful to know whether an input was a scalar or not (as opposed to a 0-D array). You need to ask NumPy about it, but it is discovered at conversion time. So it is information that is weird to tag onto an array (probably?), but figuring it out later would require duplicating work.
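The pattern from the first bullet can be sketched as follows (a minimal illustration; add_scaled is a made-up function, not NumPy API — note how calling result_type on the already-converted arrays is exactly what discards the weak-scalar information NEP 50 cares about):

```python
import numpy as np

def add_scaled(a, b):
    # Convert all inputs to arrays up front, then compute a common
    # dtype with an extra 0.0 to force an inexact result.  Because
    # result_type only sees the converted arrays here, any "weak"
    # Python-scalar information from the original inputs is lost.
    a, b = np.asarray(a), np.asarray(b)
    dtype = np.result_type(a, b, 0.0)
    return (a + b).astype(dtype)
```

For integer inputs this gives a float64 result under both the legacy and the NEP 50 promotion rules.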

First, I wanted to just make a single function. But there are multiple things above that may be interesting to return, and e.g. result_type() might error, and I am not sure we always need it.

So, is it insane to create a rich object (rather than a complicated return value and many kwargs):

# better names of course, currently in `np.core._multiarray_umath.ArrayCreationHelper`
arrays = creation_helper(*inputs)

# e.g. a linspace could do something like this:
dtype = arrays.common_dtype(ensure_inexact=True)
arrays.arrays  # arrays as tuple (probably allow arrays[0])

# Apply the typical wrapping if needed
# (should cache wrap, or rather fetch it early on; maybe just
# make it a method which applies the wrapping: `arrays.wrap(result)`).
if arrays.wrap is not None:
    return arrays.wrap(result)
return result

(I can see a, b, info = helper(a, b) as a nice pattern; OTOH it makes including a casting kwarg heavy, since you somewhat need to include it in the first step.)

Further things might include:

  • Casting to a given or the common dtype (maybe arrays.to_common_dtype(ensure_inexact=True)).
  • I think @mdhaber was hoping to include broadcasting. Not sure, since it is one call away, but it also seems fine to add.

seberg changed the title from "ENH: Create helper for conversion to arrays" to "DISCUSS, ENH: Create helper for conversion to arrays" on Jul 19, 2023
@mhvk (Contributor)

I like this idea. Following the broadcasting suggestion, it may be nice to try to be reasonably compatible with what ufuncs do generally. I guess that might include an optional out argument (and corresponding out attribute), or perhaps instead a method output_arrays(out=None) that would create the array with the right dtype and subclass, or check whether a given array is OK. Taking this perhaps too far, one could similarly have an axis argument/method as well (which would just use normalize_axis on the broadcast shape under the hood).

p.s. Might it be possible to factor out the ufunc machinery (rather than write this from scratch)? Or indeed use something like this inside the ufuncs?

@mdhaber (Contributor) commented Jul 19, 2023 (edited)

I think @mdhaber was hoping to include broadcasting. Not sure, since it is one call away, but it also seems fine to add.

That would be nice. It seems like a common enough follow-up step that it would be a convenient addition to this convenience function.

So, is it insane to create a rich object...

We do this frequently in SciPy now, so not insane. Seems unusual for NumPy, but I wouldn't complain.

  1. Idea: maybe instead of an extra parameter ensure_inexact=True, could extra_dtype accept classes like np.inexact, if that makes sense?

  2. I'm not used to reading C. Would it be easy to add a docstring or the equivalent sequence of existing NumPy commands so I can check my understanding of what is going on?

  3. We had also discussed the possibility of adding axis to be ignored when broadcasting, for use in, say, multi-sample reduction statistics. For example, in ttest_ind, it is sensible for x.shape == (5,) and y.shape == (3, 4) with axis=-1; when the arrays are broadcasted, the shapes become (3, 5) and (3, 4). I implemented this already in SciPy, but should I open an issue about it being moved upstream?

Thanks for this!
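The axis-ignoring broadcast from point 3 can be sketched like this (a hypothetical helper mirroring the SciPy behavior described above; broadcast_except_last is my name, not an existing API):

```python
import numpy as np

def broadcast_except_last(x, y):
    # Broadcast all dimensions except the last one (the reduction
    # axis), which each array keeps as-is.
    x, y = np.asarray(x), np.asarray(y)
    outer = np.broadcast_shapes(x.shape[:-1], y.shape[:-1])
    return (np.broadcast_to(x, outer + x.shape[-1:]),
            np.broadcast_to(y, outer + y.shape[-1:]))

# The ttest_ind example: (5,) and (3, 4) with axis=-1
x, y = broadcast_except_last(np.zeros(5), np.zeros((3, 4)))
```

Here x.shape becomes (3, 5) and y.shape stays (3, 4), matching the shapes described in the comment.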

@seberg (Member, Author)

Yeah, I agree that the ufunc machinery should share as much as possible, although I was a bit hoping to mainly consolidate the Python paths first. The reason is that the array-wrap implementation in ufuncs is messy and buggy (there is an issue), so fixing that seems like a messy change that is easier done on its own.

I think a method for output array prep could make sense; adding it to the initial call seems like it would overload meaning (we don't know the common dtype is right, for one thing).

Idea: maybe instead of an extra parameter ensure_inexact=True, could extra_dtype accept classes like np.inexact, if that makes sense?

Right, that is how it is implemented. Just dtype=np.inexact doesn't quite work yet, and I thought the result_type(..., 0.) pattern is so common it might actually make sense to have it explicitly.

  1. [discussed] adding axis to be ignored when broadcasting for use in, say, multi-sample reduction statistics [...]

Ah, right! Yeah, maybe make it its own issue; it seems to me we should start with integrating that into np.broadcast_arrays first. Admittedly, I think that may require reimplementing the whole function (maybe for the better).

@rgommers (Member)

@seberg from your description it is not entirely clear to me if you mean for this to be a public or private function? With the namespace we discussed somewhere else, np.lib.array_utils I believe, we should get a place to put utilities like this - but I can't tell from the description yet if this would be useful more widely.

Since it was mentioned above: we recently came across the need for normalize_axis_index in SciPy - that's also a useful utility, which is not public right now.

@seberg (Member, Author)

The first issue I want to solve would mean private is fine, but yes, I suspect we should just make it public, and if we are unsure about some methods/attributes, give them an _ prefix (which would mean putting it into the np.lib.array_utils namespace, or a different name if anyone suggests a better one soon).
(And @mhvk's interest makes me lean more towards public, at least if I can get some of what @mdhaber wants too.)

@mhvk (Contributor)

Agreed that the output makes more sense as a separate method, rather than on input, especially since in most cases something has to be generated.

Also, 👍 to making normalize_axis_index a publicly accessible tool (we already import it...); probably all of shape_base can go into the same collection too.

p.s. A better name than creation_helper would still be good. Will try to think a bit more.

p.p.s. You'll not be surprised to hear that I think that by default subclasses should be passed through... But it may be good to think a bit more generally about what to do with array mimics - the benefit of getting result_type depends only on having a dtype, not on being an ndarray (subclass) (and similarly for a possible broadcast shape). Though the easiest approach for that is perhaps just to let this be overridden by __array_function__.


@mdhaber (Contributor) commented Aug 12, 2023 (edited)

I just ran into this little gem, which might have been avoided with use of a helper like this.

np.float16([2]) < np.float32(2.0009766)    # array([False])
np.float16([2]) < np.float32([2.0009766])  # array([ True])

@seberg (Member, Author)

You might think that helps, but it probably wouldn't :). This is NEP 50 stuff.

@mdhaber (Contributor) commented Aug 12, 2023 (edited)

If before comparison the float16 is promoted to the result type or if the float32 is broadcasted to the result shape, the comparison is done correctly. So I think the helper would help with those things.

@seberg (Member, Author)

@mdhaber no, np.result_type() would do the same down-promotion in the above case until you do NEP 50.

@mdhaber (Contributor)

I always pass the dtypes into result_type instead of the arrays.

import numpy as np
x = np.float16([2])
y = np.float32(2.0009766)
print(x < y)
x1, y1 = np.broadcast_arrays(x, y)
print(x1 < y1)
dtype = np.result_type(x.dtype, y.dtype)
print(x.astype(dtype) < y.astype(dtype))

[False]
[ True]
[ True]

@seberg (Member, Author)

Right, if you pass the dtypes you get the right thing, but that is the same as adopting NEP 50.
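The dtype-based call being discussed can be checked directly (a minimal illustration of the distinction, not code from the PR):

```python
import numpy as np

# Passing dtypes (not 0-D values) to result_type gives the full
# promotion that NEP 50 also applies to arrays; the legacy value-based
# demotion only happens when 0-D *values* are passed in.
dt = np.result_type(np.float16, np.float32)
print(dt)  # float32 under both legacy and NEP 50 promotion rules
```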

@seberg (Member, Author)

I was trying to get back to this a bit, but having difficulties formulating quite what we need/want here. I would prefer not to go the feature-creep route for starters.

Things that I would like:

  • Allow converting multiple arrays at once more conveniently (easy).
  • The result should have common_dtype(), as it is pretty useful, especially with NEP 50.
  • The result should have a wrap() to apply __array_wrap__() when it makes sense.¹
  • We could have more options:
    • shape (broadcast shape)
    • A broadcast() method to modify the "arrays" (probably needs a local implementation for starters, which might then be adoptable elsewhere). Broadcasting could be a kwarg also.

There are some options which would probably fit as kwargs²:

  • subok=True (to set it to False)
  • broadcasting could be a kwarg.

Another API concern I worry about more: it would be nice to provide a clean way to preserve Python scalars. This is to allow later normal promotion to use NEP 50 rules.
Due to the isclose and similar use-cases, it also seems to me that in many cases it would make sense for broadcasting to not modify scalars (or 0-D objects).
For that, it would make sense to either have a kwarg preserve_scalars=True or an attribute arrays_or_scalars?

Maybe one solution is to make it:

conv = array_converter(*arrays)
dt = conv.common_dtype()  # always applicable, may do initial conversion to arrays
conv.shape  # could do this to get the broadcast shape, but could also defer.
conv.was_pyscalar  # maybe.  Or an enum with `0-D, scalar, python scalar`.
arrays = conv.arrays(subok=True, allow_scalars=True, broadcast=True, dtype=dt)
# Wrap result (if wanted) (see footnote 2)
return conv.wrap(result)

Any thoughts? Implementing any of this isn't super hard, but getting more clarity on where to start without feature creep is, for me.


I think I like that last thought as API (initialization takes no kwargs, but fetching the arrays is a function and not an attribute/property). But it isn't quite clear to me, especially how to do scalar preservation.

Footnotes

  1. We currently mainly/only seem to use wrap if subok=False, so the details here are a bit tricky (should you ever call it when you already got a subclass?).
    However, one reason I would like a clear entry-point is that it would help a lot thinking about ENH: No longer auto-convert array scalars to numpy scalars in ufuncs (and elsewhere?) #24897, since it could take a kwarg like preserve_scalars=True.

  2. kwargs are slow for classes due to tp_new pre Python 3.13, but let's not worry about it.

stop = asanyarray(stop) * 1.0
conv = _array_converter(start, stop)
start, stop = conv.as_arrays(subok=True)
dt = conv.result_type(ensure_inexact=True)
@seberg (Member, Author) commented Nov 23, 2023 (edited):

Iterated a bit; this is still a bit rough (I would like to consolidate __array_wrap__ a bit at least).

This does define conv.wrap() as the one true way to call __array_wrap__, but for NEP 50 the interesting part is really this part.
(I have also moved the cast to float into the subtract via dtype=, which isn't strictly necessary.)

The point is that this means np.linspace(np.float32(3), 10) returns float32. (Yes, this still needs tests, but I thought I would note it, as this pattern would hopefully be useful in a few places that are currently subtly broken in the same way.)

@seberg (Member, Author):

@mhvk beyond cleanup, part of my thought for consolidating is that it should help with gh-24897, because wrapping is a reasonable place to decide scalar vs. array logic.

@mhvk (Contributor) left a comment:

@seberg - now had a better look. I think this is really nice and it should be both relatively straightforward to extend and to use some of it inside the ufuncs.

One suggestion: it might be nice if conv can be indexed like a tuple. That should then return the "anyarray" version (in which case the default subok=False for as_arrays() makes sense; without this, I feel subok=True would be a more logical default).

If not doing that, I would suggest replacing ary, = conv.as_arrays() with ary = conv.as_arrays()[0] for clarity (the trailing comma before the equal sign is easy to miss...).
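The two spellings being contrasted come down to plain tuple handling (a small generic illustration; the 1-tuple here just stands in for the return value of as_arrays()):

```python
import numpy as np

t = (np.arange(3),)   # a 1-tuple, like a single-input as_arrays() result
ary, = t              # single-element unpacking; the comma is easy to miss
ary2 = t[0]           # equivalent, and more visible when reading
```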

Py_ssize_t narrs_ssize_t = (args == NULL) ? 0 : PyTuple_GET_SIZE(args);
int narrs = (int)narrs_ssize_t;
/* Limit to 64 for now to use in a few places. */
if (narrs_ssize_t > 64) {
Contributor:

More logical to use NPY_MAXARGS?

item->DType = NPY_DTYPE(PyArray_DESCR(item->array));
Py_INCREF(item->DType);

if (!npy_mark_tmp_array_if_pyscalar(
Contributor:

I think I'd swap the branches to avoid the !.

Py_INCREF(item->descr);
}
else {
    /* just clear the flag again */
Contributor:

Well, swap only if this is really not needed! This would seem to preclude using it later; is that intended? Or is ->was_scalar exactly the same (i.e., python scalars only)?

@seberg (Member, Author):

I also leave descr empty. So yes, it is just the same. I just wanted to go through that, because that is where other paths go through.

res_item = item->object;
}
else {
    res_item = (PyObject *)item->array;
Contributor:

Probably premature optimization, but it would seem logical not to even create the array from a python scalar unless it is actually needed.

@seberg (Member, Author):

yes, but that requires splitting up the creation, so I indeed thought it premature.

This is one of the things that we could do in the ufunc if and only if we delete the "promotion state" (to allow setting it to warn for NumPy 2).

creation_helper_dealloc(PyArrayCreationHelperObject *self)
{
    creation_item *item = self->items;
    for (int i = 0; i < self->narrs; i++) {
Contributor:

You need an item++ here too! (Or do self->items[i].)

@seberg (Member, Author):

Ouch - luckily, that is probably the reason the sanitizer job failed...

creation_helper_as_arrays(PyArrayCreationHelperObject *self,
        PyObject *const *args, Py_ssize_t len_args, PyObject *kwnames)
{
    npy_bool subok = NPY_FALSE;
Contributor:

I think this will replace more asanyarray than asarray, so maybe make the default NPY_TRUE? In general, subclasses should be OK...

@seberg (Member, Author):

OK, let me change it. In practice it isn't a big change. For a bit I was thinking "why asanyarray at all?!"
I am not actually sure it is more useful in most places, but since conv.wrap() is (normally) a no-op for those now, I also don't really care to prefer it.

The point of asanyarray() is allowing functions to work through composition of other functions/methods working out, when simple wrapping might not propagate info well.
In practice, I do wonder how often that is actually well advised (compared to using __array_function__ and dealing with it manually). The point being: if the class is not trivial enough for just wrapping, maybe it should use __array_function__. OTOH, I can see a small amount of pragmatism in it, so I am fine with preferring it.

"""
conv=_array_converter(ary)
ary,=conv.as_arrays(subok=True)
Contributor:

Maybe ary = conv.as_arrays(subok=True)[0].ravel().

It does feel one should just be able to index conv (effectively making it a bit like a subclass of tuple).
Then it would be ary = conv[0].ravel().

# matrices have to be transposed first, because they collapse dimensions!
out_arr = transpose(buff, buff_permute)
return res.__array_wrap__(out_arr)
res = transpose(buff, buff_permute)
Contributor:

For subclasses other than matrix, this would be an API change if .transpose() is overridden. That seems rather unlikely (and wrong-headed), so I'm fine with making the change regardless...

@seberg (Member, Author) left a comment:

I have addressed all the comments and changed the default for subok (I was torn on that initially, but I have made my peace with it).

I allow indexing now (which also allows a, b = conv unpacking), although calling the method is actually a tiny bit faster (the reason is that unpacking checks conv[length] to trigger the index error).


I have a start on reorganizing the ufuncs to share some infrastructure for array-wrap at least, but decided that shouldn't stop us here, since it mostly means moving the find_wrap function out to share it, and the ufunc changes are not quite utterly trivial.



}
else {
    /*
     * `__array_wrap__` expects an array argument, so we ensure that.
@seberg (Member, Author):

Well, I think that scalars remain common in some code paths (for the time being at least). But I rephrased the comment.



seberg marked this pull request as ready for review on November 28, 2023 14:12
@mhvk (Contributor) left a comment:

Looks mostly great now! Just a few remaining comments.

One comment: with the indexing giving any array, I think it is actually OK to have subok=False as the default for as_arrays(). Though I am also completely happy to keep it as you have it now - in principle, subclasses should behave like regular ones, so it should be fine to keep them (and it is nice if the default actually corresponds to the least amount of work, which is the case for subok=True since you are not viewing as a regular array).

}
}

Py_INCREF(self);
Contributor:

Are you sure you need the INCREF here? PyObject_NewVar returns a new reference already (which you DECREF in the fail path).

@seberg (Member, Author):

Right, thanks! Must have been some remnant of some debugging confusion...

return NULL;
}

/* When we only have scalars, the default is to retur an array: */
Contributor:

retur ->return

@mhvk (Contributor)

p.s. On the asanyarray: beyond the rationale that subclasses should behave the same way and thus not be converted (of course, they don't always do...), it is certainly nice if they generally get passed on, since then __array_ufunc__ coverage is often good enough for functions to work. For Quantity at least, there are still a lot of functions we do not have to override; see https://github.com/astropy/astropy/blob/main/astropy/units/quantity_helper/function_helpers.py (the SUBCLASS_SAFE_FUNCTIONS set; as you'll notice, there are also a lot that do need overrides, which means a very long list of rather boilerplate code that is little fun to write, so all the better if some functions don't need it!)

@seberg (Member, Author) left a comment:

Yeah, about SUBCLASS_SAFE_FUNCTIONS: it would be great if we could formalize this in some form and document the pattern. But the best would be to generalize it to non-subclasses somehow, which might be just calling func._implementation() rather than ndarray.__array_function__.

with the indexing giving any array, I think it is actually OK to have subok=False as default for

I agree more with the latter part, unless we had asanyarray also: align the default and the short-hand. Plus, if you want a real nudge toward subok, having the opposite default is no good.

The main reason for not doing it was maybe always matrices (and probably historic). But we should hopefully worry less and less about those.

Also added a start for documentation.


seberg changed the title from "DISCUSS, ENH: Create helper for conversion to arrays" to "ENH: Create helper for conversion to arrays" on Nov 29, 2023
seberg added this to the 2.0.0 release milestone on Nov 29, 2023
@ngoldbaum (Member) left a comment:

I looked this over and spotted a few minor things but no showstoppers.

* This means it was:
* - Recognized as a scalar by a/the dtype. This can be DType specific,
* for example a tuple may be a scalar, but only for structured dtypes.
* - Anything not recognized a scalar (mapped to a DType) but also not
Member:

Suggested change:
- * - Anything not recognized a scalar (mapped to a DType) but also not
+ * - Anything not recognized as an instance of a DType's scalar type but also not

I got tripped up on the original wording - take it or leave it, but if you leave it, it should be "recognized as a scalar".

Helper to convert one or more objects to arrays. Integrates machinery
to deal with the ``result_type`` and ``__array_wrap__``.
The reason for this is that e.g. ``result_type`` needs to covert to arrays
Member:

Suggested change:
- The reason for this is that e.g. ``result_type`` needs to covert to arrays
+ The reason for this is that e.g. ``result_type`` needs to convert to arrays

@mhvk (Contributor) left a comment:

I think what's left is only typos and some suggestions for the new docstrings!

wrap(arr, /, to_scalar=None)
Call ``__array_wrap__`` on the input ``arr``. When ``arr`` is already
the correct subclass, this may be a no-op.
Contributor:

I think one should be explicit about what is done: "this may be a no-op" -> "no wrapping is done". Or maybe rewrite the note:

Call ``__array_wrap__`` on ``arr`` if ``arr`` is not the same subclass as the input the ``__array_wrap__`` method was retrieved from.

add_newdoc('numpy._core._multiarray_umath', '_array_converter', ('scalar_input',
    """
    A tuple which is ``True`` if the original inputs were scalars and
    ``False`` otherwise. Scalars are all objects which are coerced to a
Contributor:

How about:

A tuple which indicates for each input whether it was a scalar that was coerced to a 0-D array (and was not already an array or something converted via a protocol like ``__array__()``).

``False`` otherwise. Scalars are all objects which are coerced to a
0-D array but are not already arrays or converted via one of the protocols.
.. note::
Contributor:

My sense would be to delete this note: it probably adds more confusion than it solves...

----------
arr : ndarray
The object to be wrapped. Ideally, this is always an ndarray
or subclass, although NumPy scalars are accepted as of now by
Contributor:

How about:

The object to be wrapped. Normally an ndarray or subclass, although for backward compatibility NumPy scalars are also accepted (these will be converted to a NumPy array before being passed on to the ``__array_wrap__`` method).

When ``True`` will convert a 0-d array to a scalar via ``result[()]``
(with a fast-path for non-subclasses). If ``False`` the result should
be an array-like (as ``__array_wrap__`` is free to return a non-array).
By default (``None``), a scalar is returned if all inputs are scalar.
Contributor:

are scalar -> were scalar

PyObject *obj = self->items[i].object;
if (PyArray_CheckExact(obj)) {
    if (wrap == NULL || priority < 0) {
        Py_INCREF(Py_None);
Contributor:

Strictly, you don't need to set the wrap here, given the if (wrap == NULL) clause at the end. (Same for scalar.)

@mhvk (Contributor)

p.s. Probably time for some squashing of commits too...

@seberg (Member, Author)

I don't want to jinx this because of it, but the [()] logic for subclasses doesn't work for the current memmap (which incorrectly always does it).
Now, that is maybe even never used right now, but it makes me wonder if we need another approach, which might mean passing scalar=True in __array_wrap__!?

@mhvk (Contributor)

I'm a bit confused: why would it not work for memmap? It doesn't hurt to do [()] twice, does it? (I do agree we can and perhaps should adjust memmap.__array_wrap__ to not create scalars itself, but that seems independent of this PR.)

That said, you are right that in principle it would be better to give __array_wrap__ a chance to react to whether or not a scalar is expected, instead of using a heuristic as you have now. And I guess 2.0 is the right time for such an API change. Though on the other hand, __array_wrap__ is still a somewhat out-of-date API, and generally classes have to be able to deal with [()] anyway. So perhaps we do not need to worry?

@ngoldbaum (Member)

see #25278, which was reported with impressive timing for this PR


@mhvk (Contributor)

I'm still not quite sold on return_scalar, but I'm also realizing I've lost what little overview I had of the problems caused by the various steps. This makes me wonder if it would be better to have a PR first that creates a helper that simply keeps the status quo, i.e., makes no attempt to be clever about wrapping, just reproducing the __array_wrap__ calls that are present now.

seberg added a commit that referenced this pull request Jan 20, 2024

…r`` (#25409)

This reorganizes how array-wrap is called. It might very mildly change the semantics for reductions, I think (and for negative priorities). Overall, it now passes a new `return_scalar=False/True` when calling `__array_wrap__` and deprecates any array-wrap which does not accept `arr`, `context`, `return_scalar`.

I have not integrated it yet, but half the reason for the reorganization is to integrate it/reuse it in the array-converter helper PR (gh-24214), which stumbled over trying to make the scalar handling sane.

Forcing downstream to add `return_scalar=False` to the signature is a bit annoying, but e.g. our memory maps currently try to guess at it, which seems bad. I am hoping this can be part of making the scalar vs. array return more sane. But, maybe mainly, I hope it consolidates things (together with gh-24124 mainly; if ufuncs were the only complex place we used this, it wouldn't matter much).

Squashed commits:

* API: Reorganize `__array_wrap__` and add `return_scalar=False` (this also deprecates any `__array_wrap__` which does not accept `context` and `return_scalar`)
* BUG: Fix niche bug in rounding.
* MAINT: Adjust `__array_wrap__` in code and tests (also deprecation test)
* MAINT: Use/move the simplest C-wrapping also
* DOC: Update doc and add release note
* STY: Make linter happy in old tests
* MAINT: Silence GCC warning (value cannot really be used uninitialized)
* MAINT: Small style fixes
* BUG: Fix reference leak in ufunc array-wrapping (this probably doesn't fix the 32bit issue unfortunately, only the windows ones...)
* BUG: Fix leak for result arrays in all ufunc calls
* Ensure we try passing context and address related smaller review comments
* Ensure we don't try `context=None` and expand code comment
* Rely on `return_scalar` always being right (and style nit)
* Remove outdated comments as per review
* Let's just undo force-wrap for now for reductions (it's a change...)
* ENH: Chain the original error when the DeprecationWarning is raised (doing this due to gh-25635 since it is super confusing with the bad retrying...)
* BUG,MAINT: Address Marten's review comments

This helper allows dealing with `result_type` and array wrapping without explicitly holding on to the original objects and especially potentially converting them many times.

This doesn't quite fix all places, but it does most (most prominently, `linalg` also does some quite awkward wrapping).
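For context, the deprecation described in the commit above means downstream `__array_wrap__` implementations need to accept the new arguments. A minimal sketch of a compatible subclass (the class name is hypothetical; defaulting both extra arguments keeps the method callable from older NumPy versions, which omit them):

```python
import numpy as np

class WrappedArray(np.ndarray):
    """Hypothetical subclass showing the post-#25409 signature."""

    def __array_wrap__(self, obj, context=None, return_scalar=False):
        # NumPy 2.0 tells us explicitly when a scalar result is
        # expected, instead of the subclass having to guess.
        if return_scalar:
            return obj[()]
        return super().__array_wrap__(obj, context)

arr = np.arange(3.0).view(WrappedArray)
res = arr + 1  # the ufunc machinery wraps the result via __array_wrap__
```

With this signature in place, no deprecation warning is raised and the subclass no longer needs heuristics (like `memmap`'s) to decide whether to return a scalar.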
@seberg
Member Author

@mhvk and @ngoldbaum, I merged main (i.e. the array-wrap helper) and applied all the remaining doc suggestions (thanks, they were great!). Then I squashed it down to three commits.

I think this should be pretty much ready now, since IIRC it was relatively settled. But happy to do another round if things come up, of course.

@mhvk (Contributor) left a comment

This looks good! Looking at it again, I mainly wonder about methods and arguments that are defined but not actually used. Arguably, we should not have those without a clear use case... But as long as this is private, perhaps it is OK?

its memory, in which case you can traverse ``a.base`` for a memory handler.
""")

add_newdoc('numpy._core._multiarray_umath', '_array_converter',
Contributor

I wondered a bit about whether this needs to be private, as it seems generally useful. But I guess this allows possible changes before making it public?

result = wrap(result)
return result
conv = _array_converter(obj)
# As this already tried the method, subok is maybe quite reasonable here:
Contributor

Maybe add `# but this follows what was done before. TODO: revisit this.`, or something else that makes it clearer that it would be good to make this change (even if best not to do so in this PR!).

@@ -0,0 +1,471 @@
/*
* This file defines an _array_converter object used internally to NumPy
Contributor

"used internally in NumPy to [deal ...]"

item->DType = NPY_DTYPE(PyArray_DESCR(item->array));
Py_INCREF(item->DType);

if (npy_mark_tmp_array_if_pyscalar(
Contributor

Could this be `if (item->scalar_input && npy_mark...)`? If so, maybe combine with l.93-95:

if (item->scalar_input) {
    if (npy_mark...) {
        ...
    }
    else {
        ...
    }
}
else {  /* non-scalar input */
    self->flags &= ~(NPY_CH_ALL_PYSCALARS | NPY_CH_ALL_SCALARS);
}

Member Author

Yes, that is correct, and I adopted it, but not as suggested here. The pretty-code annoyance is that `item->descr` needs to be dealt with, so whatever you do, you have some duplication.

But I agree, it isn't super nice to split this from l.93, and it is maybe easier to grok if it is clear that `npy_mark_tmp_array_if_pyscalar` does indeed require `scalar_input`.

float, or complex.
""")

add_newdoc('numpy._core._multiarray_umath', '_array_converter', ('scalar_input',
Contributor

I think `scalar_input` is not used anywhere in the code. Do you have cases in mind where it would be useful? It sort of feels like it should be, but if there is no use, should it be exposed? (Fine to leave it in, just asking.)

Also, right now it is a property. Might it make sense to make it a method (with no arguments), so that one could extend it if needed? (E.g., to ask not just for scalar, but for Python scalar inputs.)

But perhaps all this just argues for keeping `_array_converter` private for now, so we can see what is best!

Member

The linter says this line is one character too long

Member Author

I didn't think we cared, but fair, pushed the fix (included it in the previous commit).

returnNULL;
}

/* When we only have scalars, the default is to return an array: */
Contributor

I was again a bit confused about this; maybe it helps to say "We follow the default setting for `as_arrays` of `CONVERT_IF_NO_ARRAY`".

----------
subok : True or False, optional
Whether array subclasses are preserved.
pyscalars : {"convert", "preserve", "convert_if_no_array"}, optional
Contributor

This argument is not used anywhere, so a bit of the same question as for `scalar_input` above. In particular, I wonder if it wouldn't be handier to pass this on input (perhaps still allowing to override it here). But without a use case, probably not worth overthinking it now!

@seberg
Member Author

Addressed the smaller comments, thanks.

The API issues are not very clear to me. I really like this type of feature, but it is quite fuzzy to me how it should look and which details are useful (enough).

That is why I opted for private for now: I am not comfortable with saying that this API seems good (enough) unless others give very clear +1's.

Generally, I would like this or something similar to be public, because I think it can be generally useful and also specifically because I think it is more useful with NEP 50 if downstream wants to be pedantic.

From a maintenance-burden perspective I am not super worried even if we deprecate it again soonish; we would name-squat `np.lib.array_utils.array_converter`, though.


If we keep things private, I somewhat like just keeping those odd features. I can see `scalar_input` being useful at least in niches, like a Python searchsorted-like:

conv = array_converter(haystack, needle)
haystack, needle = conv
# The needle decides whether a scalar result makes sense:
return conv.wrap(result, to_scalar=conv.scalar_input[1])

I am not sure that we will use it (in fact I doubt it in the near future), but I also like the idea of "advertising" that this is possible.
(Part of why it is there, was that it is very easy and I also thought it may be useful for the python scalar preservation, which the conversion already does now and it seems to work well.)

If we lean towards public API, I fully agree to ditch them all until it is very clear we or downstream use them. Those odd features shouldn't stop us from making this public!
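As an illustration of why such a helper is useful with NEP 50: the common dtype must be computed from the original objects, before `asarray` erases the "weak" status of Python scalars. The core pattern can be sketched in pure Python (the function name `convert_and_promote` is hypothetical, not part of this PR):

```python
import numpy as np

def convert_and_promote(*objs, extra=None):
    # Hypothetical sketch: compute the common dtype from the ORIGINAL
    # objects so Python scalars keep their NEP 50 "weak" promotion
    # behavior, then convert everything to arrays of that dtype.
    arrays = [np.asarray(obj) for obj in objs]
    to_promote = objs if extra is None else (*objs, extra)
    dtype = np.result_type(*to_promote)
    return [arr.astype(dtype, copy=False) for arr in arrays], dtype

# A Python int stays "weak", so float32 wins the promotion:
(a, b), dtype = convert_and_promote(np.float32([1.0, 2.0]), 2)
```

Calling `np.result_type` on the already-converted arrays instead would treat the `2` as an `int64` 0-d array and promote to `float64`; the C helper avoids exactly this by recording the scalar information at conversion time.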

@mhvk (Contributor) left a comment

OK, the code looks good now! On the API, I'm fine with keeping it private for now and seeing what ends up being useful and what not. I'll approve now, but won't merge, to give @ngoldbaum a chance to look.

p.s. Note that the linter failed.

@ngoldbaum
Member

I'll look at this tomorrow.

@seberg
Member Author

FWIW, I rebased my `try-0d-preservation` branch, and ended up using those features (not that I am sure alternatives wouldn't be better). It actually seems surprisingly close to doing something reasonable.

@ngoldbaum (Member) left a comment

I looked over the C code again and didn't spot any issues. Please push a fixup for the linter then this should be good to go.


@ngoldbaum
Member

I figure it's better to keep the linter happy so people don't get surprise linter corrections on future PRs. Thanks!

@ngoldbaum merged commit 55afe35 into numpy:main on Jan 22, 2024

Reviewers

@ngoldbaum left review comments

@mhvk approved these changes

Assignees

No one assigned

Projects

None yet

Milestone

2.0.0 release

5 participants

@seberg @mhvk @mdhaber @rgommers @ngoldbaum
