ENH: Create helper for conversion to arrays #24214


Merged: ngoldbaum merged 4 commits into numpy:main from seberg:creation-helper on Jan 22, 2024

Conversation

@seberg (Member)

Had to give this a start to be able to experiment. It does solve at least some problems (e.g. the np.select() use-case, and it looks like it would fix and/or simplify the linalg code).

The main thing is that we often convert multiple arrays, and then want something like:

  • a, b, c = np.asarray(a), np.asarray(b), np.asarray(c) and then dtype = np.result_type(a, b, c), often with an additional 0.0. That is a pretty common pattern; NEP 50 makes it tricky if you wish to support weak scalars, because you need to call result_type with the scalar information intact (before the asarray, or manually undo it afterwards). (For example, this happens in np.select().)
    • Maybe libraries can disregard that, but for NumPy it's hard, since you want many functions to be ufunc-like, so if ufuncs do it, others probably should be able to/try to.
  • Maybe mainly internally, we sometimes need __array_wrap__ for subclass support, although it seems consistently used only for single-input functions (outside ufuncs).
    • Additional motive: a clean wrapping pattern seems required for making the scalar logic sane.
  • It may be useful to know whether an input was a scalar or not (as opposed to a 0-D array). You need to ask NumPy about it, but it is discovered at conversion time. So it is information that is weird to tag onto an array (probably?), but figuring it out later would require duplicating work.
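The pattern from the first bullet can be sketched as follows (a minimal illustration; add_scaled is a made-up function, not NumPy API — note how calling result_type on the already-converted arrays is exactly what discards the weak-scalar information NEP 50 cares about):

```python
import numpy as np

def add_scaled(a, b):
    # Convert all inputs to arrays up front, then compute a common
    # dtype with an extra 0.0 to force an inexact result.  Because
    # result_type only sees the converted arrays here, any "weak"
    # Python-scalar information from the original inputs is lost.
    a, b = np.asarray(a), np.asarray(b)
    dtype = np.result_type(a, b, 0.0)
    return (a + b).astype(dtype)
```

For integer inputs this gives a float64 result under both the legacy and the NEP 50 promotion rules.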

First, I wanted to just make a single function. But there are multiple things above that may be interesting to return, and e.g. result_type() might error, and I am not sure we always need it.

So, is it insane to create a rich object (rather than a complicated return value and many kwargs):

# better names of course, currently in `np.core._multiarray_umath.ArrayCreationHelper`
arrays = creation_helper(*inputs)

# e.g. a linspace could do something like this:
dtype = arrays.common_dtype(ensure_inexact=True)
arrays.arrays  # arrays as tuple (probably allow arrays[0])

# Apply the typical wrapping if needed
# (should cache wrap, or rather fetch it early on; maybe just
# make it a method which applies the wrapping: `arrays.wrap(result)`).
if arrays.wrap is not None:
    return arrays.wrap(result)
return result

(I can see a, b, info = helper(a, b) as a nice pattern; OTOH it makes including a casting kwarg heavy, since you somewhat need to include it in the first step.)

Further things might include:

  • Casting to a given or the common dtype (maybe arrays.to_common_dtype(ensure_inexact=True)).
  • I think @mdhaber was hoping to include broadcasting. Not sure, since it is one call away, but it also seems fine to add.

seberg changed the title from "ENH: Create helper for conversion to arrays" to "DISCUSS, ENH: Create helper for conversion to arrays" on Jul 19, 2023
@mhvk (Contributor)

I like this idea. Following the broadcasting suggestion, it may be nice to try to be reasonably compatible with what ufuncs do generally. I guess that might include an optional out argument (and corresponding out attribute), or perhaps instead a method output_arrays(out=None) that would create the array with the right dtype and subclass, or check whether a given array is OK. Taking this perhaps too far, one could similarly have an axis argument/method as well (which would just use normalize_axis on the broadcast shape under the hood).

p.s. Might it be possible to factor out the ufunc machinery (rather than write this from scratch)? Or indeed use something like this inside the ufuncs?

@mdhaber (Contributor) commented Jul 19, 2023 (edited)

I think @mdhaber was hoping to include broadcasting. Not sure, since it is one call away, but it also seems fine to add.

That would be nice. It seems like a common enough follow-up step that it would be a convenient addition to this convenience function.

So, is it insane to create a rich object...

We do this frequently in SciPy now, so not insane. Seems unusual for NumPy, but I wouldn't complain.

  1. Idea: maybe instead of an extra parameter ensure_inexact=True, could extra_dtype accept classes like np.inexact, if that makes sense?

  2. I'm not used to reading C. Would it be easy to add a docstring or the equivalent sequence of existing NumPy commands so I can check my understanding of what is going on?

  3. We had also discussed the possibility of adding axis to be ignored when broadcasting, for use in, say, multi-sample reduction statistics. For example, in ttest_ind, it is sensible for x.shape == (5,) and y.shape == (3, 4) with axis=-1; when the arrays are broadcasted, the shapes become (3, 5) and (3, 4). I implemented this already in SciPy, but should I open an issue about it being moved upstream?

Thanks for this!
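The axis-ignoring broadcast from point 3 can be sketched like this (a hypothetical helper mirroring the SciPy behavior described above; broadcast_except_last is my name, not an existing API):

```python
import numpy as np

def broadcast_except_last(x, y):
    # Broadcast all dimensions except the last one (the reduction
    # axis), which each array keeps as-is.
    x, y = np.asarray(x), np.asarray(y)
    outer = np.broadcast_shapes(x.shape[:-1], y.shape[:-1])
    return (np.broadcast_to(x, outer + x.shape[-1:]),
            np.broadcast_to(y, outer + y.shape[-1:]))

# The ttest_ind example: (5,) and (3, 4) with axis=-1
x, y = broadcast_except_last(np.zeros(5), np.zeros((3, 4)))
```

Here x.shape becomes (3, 5) and y.shape stays (3, 4), matching the shapes described in the comment.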

@seberg (Member, Author)

Yeah, I agree that the ufunc machinery should share as much as possible, although I was a bit hoping to mainly consolidate the Python paths first. The reason is that the array-wrap implementation in ufuncs is messy and buggy (there is an issue), so fixing that seems like a messy change that is easier done on its own.

I think a method for output array prep could make sense; adding it to the initial call seems like it would overload meaning (we don't know the common dtype is right, for one thing).

Idea: maybe instead of an extra parameter ensure_inexact=True, could extra_dtype accept classes like np.inexact, if that makes sense?

Right, that is how it is implemented. Just dtype=np.inexact doesn't quite work yet, and I thought the result_type(..., 0.) pattern is so common it might actually make sense to have it explicitly.

  1. [discussed] adding axis to be ignored when broadcasting for use in, say, multi-sample reduction statistics [...]

Ah, right! Yeah, maybe make it its own issue; it seems to me we should start with integrating that into np.broadcast_arrays first. Admittedly, I think that may require reimplementing the whole function (maybe for the better).

@rgommers (Member)

@seberg from your description it is not entirely clear to me if you mean for this to be a public or private function? With the namespace we discussed somewhere else, np.lib.array_utils I believe, we should get a place to put utilities like this - but I can't tell from the description yet if this would be useful more widely.

Since it was mentioned above: we recently came across the need for normalize_axis_index in SciPy - that's also a useful utility, which is not public right now.

@seberg (Member, Author)

The first issue I want to solve would mean private is fine, but yes, I suspect we should just make it public, and if we are unsure about some methods/attributes, give them an _ prefix (which would mean putting it into the np.lib.array_utils namespace, or a different name if anyone suggests a better one soon).
(And @mhvk's interest makes me lean more towards public, at least if I can get some of what @mdhaber wants too.)

@mhvk (Contributor)

Agreed that the output makes more sense as a separate method, rather than on input, especially since in most cases something has to be generated.

Also, 👍 to making normalize_axis_index a publicly accessible tool (we already import it...); probably all of shape_base can go into the same collection too.

p.s. A better name than creation_helper would still be good. Will try to think a bit more.

p.p.s. You'll not be surprised to hear that I think that by default subclasses should be passed through... But it may be good to think a bit more generally about what to do with array mimics - the benefit of getting result_type depends only on having a dtype, not on being an ndarray (subclass) (and similarly for a possible broadcast shape). Though the easiest approach for that is perhaps just to let this be overridden by __array_function__.


@mdhaber (Contributor) commented Aug 12, 2023 (edited)

I just ran into this little gem, which might have been avoided with use of a helper like this.

np.float16([2]) < np.float32(2.0009766)    # array([False])
np.float16([2]) < np.float32([2.0009766])  # array([ True])

@seberg (Member, Author)

You might think that helps, but it probably wouldn't :). This is NEP 50 stuff.

@mdhaber (Contributor) commented Aug 12, 2023 (edited)

If before comparison the float16 is promoted to the result type or if the float32 is broadcasted to the result shape, the comparison is done correctly. So I think the helper would help with those things.

@seberg (Member, Author)

@mdhaber no, np.result_type() would do the same down-promotion in the above case until you do NEP 50.

@mdhaber (Contributor)

I always pass the dtypes into result_type instead of the arrays.

import numpy as np
x = np.float16([2])
y = np.float32(2.0009766)
print(x < y)
x1, y1 = np.broadcast_arrays(x, y)
print(x1 < y1)
dtype = np.result_type(x.dtype, y.dtype)
print(x.astype(dtype) < y.astype(dtype))

[False]
[ True]
[ True]

@seberg (Member, Author)

Right, if you pass the dtypes you get the right thing, but that is the same as adopting NEP 50.
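The dtype-based call being discussed can be checked directly (a minimal illustration of the distinction, not code from the PR):

```python
import numpy as np

# Passing dtypes (not 0-D values) to result_type gives the full
# promotion that NEP 50 also applies to arrays; the legacy value-based
# demotion only happens when 0-D *values* are passed in.
dt = np.result_type(np.float16, np.float32)
print(dt)  # float32 under both legacy and NEP 50 promotion rules
```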

@seberg (Member, Author)

I was trying to get back to this a bit, but having difficulties formulating quite what we need/want here. I would prefer not to go the feature-creep route for starters.

Things that I would like:

  • Allow converting multiple arrays at once more conveniently (easy).
  • The result should have common_dtype(), as it is pretty useful, especially with NEP 50.
  • The result should have a wrap() to apply __array_wrap__() when it makes sense.¹
  • We could have more options:
    • shape (broadcast shape)
    • A broadcast() method to modify the "arrays" (probably needs a local implementation for starters, which might then be adoptable elsewhere). Broadcasting could be a kwarg also.

There are some options which would probably fit as kwargs²:

  • subok=True (to set it to False)
  • broadcasting could be a kwarg.

Another API concern I worry about more: it would be nice to provide a clean way to preserve Python scalars. This is to allow later normal promotion to use NEP 50 rules.
Due to the isclose and similar use-cases, it also seems to me that in many cases it would make sense for broadcasting to not modify scalars (or 0-D objects).
For that, it would make sense to either have a kwarg preserve_scalars=True or an attribute arrays_or_scalars?

Maybe one solution is to make it:

conv = array_converter(*arrays)
dt = conv.common_dtype()  # always applicable, may do initial conversion to arrays
conv.shape  # could do this to get the broadcast shape, but could also defer.
conv.was_pyscalar  # maybe.  Or an enum with `0-D, scalar, python scalar`.
arrays = conv.arrays(subok=True, allow_scalars=True, broadcast=True, dtype=dt)
# Wrap result (if wanted) (see footnote 2)
return conv.wrap(result)

Any thoughts? Implementing any of this isn't super hard, but getting more clarity on where to start without feature creep is, for me.


I think I like that last thought as API (initialization takes no kwargs, but fetching the arrays is a function and not an attribute/property). But it isn't quite clear to me, especially how to do scalar preservation.

Footnotes

  1. We currently mainly/only seem to use wrap if subok=False, so the details here are a bit tricky (should you ever call it when you already got a subclass?).
    However, one reason I would like a clear entry-point is that it would help a lot thinking about ENH: No longer auto-convert array scalars to numpy scalars in ufuncs (and elsewhere?) #24897, since it could take a kwarg like preserve_scalars=True.

  2. kwargs are slow for classes due to tp_new pre Python 3.13, but let's not worry about it.

stop = asanyarray(stop) * 1.0
conv = _array_converter(start, stop)
start, stop = conv.as_arrays(subok=True)
dt = conv.result_type(ensure_inexact=True)
@seberg (Member, Author) commented Nov 23, 2023 (edited):

Iterated a bit; this is still a bit rough (I would like to consolidate __array_wrap__ a bit at least).

This does define conv.wrap() as the one true way to call __array_wrap__, but for NEP 50 the interesting part is really this part.
(I have also moved the cast to float into the subtract via dtype=, which isn't strictly necessary.)

The point is that this means np.linspace(np.float32(3), 10) returns float32. (Yes, this still needs tests, but I thought I would note it, as this pattern would hopefully be useful in a few places that are currently subtly broken in the same way.)

@seberg (Member, Author):

@mhvk beyond cleanup, part of my thought for consolidating is that it should help with gh-24897, because wrapping is a reasonable place to decide scalar vs. array logic.

@mhvk (Contributor) left a comment:

@seberg - now had a better look. I think this is really nice and it should be both relatively straightforward to extend and to use some of it inside the ufuncs.

One suggestion: it might be nice if conv can be indexed like a tuple. That should then return the "anyarray" version (in which case the default subok=False for as_arrays() makes sense; without this, I feel subok=True would be a more logical default).

If not doing that, I would suggest replacing ary, = conv.as_arrays() with ary = conv.as_arrays()[0] for clarity (the trailing comma before the equal sign is easy to miss...).
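The two spellings being contrasted come down to plain tuple handling (a small generic illustration; the 1-tuple here just stands in for the return value of as_arrays()):

```python
import numpy as np

t = (np.arange(3),)   # a 1-tuple, like a single-input as_arrays() result
ary, = t              # single-element unpacking; the comma is easy to miss
ary2 = t[0]           # equivalent, and more visible when reading
```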

Py_ssize_t narrs_ssize_t = (args == NULL) ? 0 : PyTuple_GET_SIZE(args);
int narrs = (int)narrs_ssize_t;
/* Limit to 64 for now to use in a few places. */
if (narrs_ssize_t > 64) {
Contributor:

More logical to use NPY_MAXARGS?

item->DType = NPY_DTYPE(PyArray_DESCR(item->array));
Py_INCREF(item->DType);

if (!npy_mark_tmp_array_if_pyscalar(
Contributor:

I think I'd swap the branches to avoid the !.

Py_INCREF(item->descr);
}
else {
    /* just clear the flag again */
Contributor:

Well, swap only if this is really not needed! This would seem to preclude using it later; is that intended? Or is ->was_scalar exactly the same (i.e., python scalars only)?

@seberg (Member, Author):

I also leave descr empty. So yes, it is just the same. I just wanted to go through that, because that is where other paths go through.

res_item = item->object;
}
else {
    res_item = (PyObject *)item->array;
Contributor:

Probably premature optimization, but it would seem logical not to even create the array from a python scalar unless it is actually needed.

@seberg (Member, Author):

yes, but that requires splitting up the creation, so I indeed thought it premature.

This is one of the things that we could do in the ufunc if and only if we delete the "promotion state" (to allow setting it to warn for NumPy 2).

creation_helper_dealloc(PyArrayCreationHelperObject *self)
{
    creation_item *item = self->items;
    for (int i = 0; i < self->narrs; i++) {
Contributor:

You need an item++ here too! (Or do self->items[i].)

@seberg (Member, Author):

Ouch - luckily, that is probably the reason the sanitizer job failed...

creation_helper_as_arrays(PyArrayCreationHelperObject *self,
        PyObject *const *args, Py_ssize_t len_args, PyObject *kwnames)
{
    npy_bool subok = NPY_FALSE;
Contributor:

I think this will replace more asanyarray than asarray, so maybe make the default NPY_TRUE? In general, subclasses should be OK...

@seberg (Member, Author):

OK, let me change it. In practice it isn't a big change. For a bit I was thinking "why asanyarray at all?!"
I am not actually sure it is more useful in most places, but since conv.wrap() is (normally) a no-op for those now, I also don't really care to prefer it.

The point of asanyarray() is allowing functions to work through composition of other functions/methods working out, when simple wrapping might not propagate info well.
In practice, I do wonder how often that is actually well advised (compared to using __array_function__ and dealing with it manually). The point being: if the class is not trivial enough for just wrapping, maybe it should use __array_function__. OTOH, I can see a small amount of pragmatism in it, so I am fine with preferring it.

"""
conv=_array_converter(ary)
ary,=conv.as_arrays(subok=True)
Contributor:

Maybe ary = conv.as_arrays(subok=True)[0].ravel().

It does feel one should just be able to index conv (effectively making it a bit like a subclass of tuple).
Then it would be ary = conv[0].ravel().

# matrices have to be transposed first, because they collapse dimensions!
out_arr = transpose(buff, buff_permute)
return res.__array_wrap__(out_arr)
res = transpose(buff, buff_permute)
Contributor:

For subclasses other than matrix, this would be an API change if .transpose() is overridden. That seems rather unlikely (and wrong-headed), so I'm fine with making the change regardless...

@seberg (Member, Author) left a comment:

I have addressed all the comments and changed the default for subok (I was torn on that initially, but I have made my peace with it).

I allow indexing now (which also allows a, b = conv unpacking), although calling the method is actually a tiny bit faster (the reason is that unpacking checks conv[length] to trigger the index error).


I have a start on reorganizing the ufuncs to share some infrastructure for array-wrap at least, but decided that shouldn't stop us here, since it mostly means moving the find_wrap function out to share it, and the ufunc changes are not quite utterly trivial.



}
else {
    /*
     * `__array_wrap__` expects an array argument, so we ensure that.
@seberg (Member, Author):

Well, I think that scalars remain common in some code paths (for the time being at least). But I rephrased the comment.



seberg marked this pull request as ready for review on November 28, 2023 14:12
@mhvk (Contributor) left a comment:

Looks mostly great now! Just a few remaining comments.

One comment: with the indexing giving any array, I think it is actually OK to have subok=False as the default for as_arrays(). Though I am also completely happy to keep it as you have it now - in principle, subclasses should behave like regular ones, so it should be fine to keep them (and it is nice if the default actually corresponds to the least amount of work, which is the case for subok=True since you are not viewing as a regular array).

}
}

Py_INCREF(self);
Contributor:

Are you sure you need the INCREF here? PyObject_NewVar returns a new reference already (which you DECREF in the fail path).

@seberg (Member, Author):

Right, thanks! Must have been some remnant of some debugging confusion...

return NULL;
}

/* When we only have scalars, the default is to retur an array: */
Contributor:

retur ->return

@mhvk (Contributor)

p.s. On the asanyarray: beyond the rationale that subclasses should behave the same way and thus not be converted (of course, they don't always do...), it is certainly nice if they generally get passed on, since then __array_ufunc__ coverage is often good enough for functions to work. For Quantity at least, there are still a lot of functions we do not have to override; see https://github.com/astropy/astropy/blob/main/astropy/units/quantity_helper/function_helpers.py (the SUBCLASS_SAFE_FUNCTIONS set; as you'll notice, there are also a lot that do need overrides, which means a very long list of rather boilerplate code that is little fun to write, so all the better if some functions don't need it!)

@seberg (Member, Author) left a comment:

Yeah, about SUBCLASS_SAFE_FUNCTIONS: it would be great if we could formalize this in some form and document the pattern. But the best would be to generalize it to non-subclasses somehow, which might be just calling func._implementation() rather than ndarray.__array_function__.

with the indexing giving any array, I think it is actually OK to have subok=False as default for

I agree more with the latter part, unless we had asanyarray also: align the default and the short-hand. Plus, if you want a real nudge toward subok, having the opposite default is no good.

The main reason for not doing it was maybe always matrices (and probably historic). But we should hopefully worry less and less about those.

Also added a start for documentation.


seberg changed the title from "DISCUSS, ENH: Create helper for conversion to arrays" to "ENH: Create helper for conversion to arrays" on Nov 29, 2023
seberg added this to the 2.0.0 release milestone on Nov 29, 2023
@ngoldbaum (Member) left a comment:

I looked this over and spotted a few minor things but no showstoppers.

* This means it was:
* - Recognized as a scalar by a/the dtype. This can be DType specific,
* for example a tuple may be a scalar, but only for structured dtypes.
* - Anything not recognized a scalar (mapped to a DType) but also not
Member:

Suggested change:
- * - Anything not recognized a scalar (mapped to a DType) but also not
+ * - Anything not recognized as an instance of a DType's scalar type but also not

I got tripped up on the original wording - take it or leave it, but if you leave it, it should be "recognized as a scalar".

Helper to convert one or more objects to arrays. Integrates machinery
to deal with the ``result_type`` and ``__array_wrap__``.
The reason for this is that e.g. ``result_type`` needs to covert to arrays
Member:

Suggested change:
- The reason for this is that e.g. ``result_type`` needs to covert to arrays
+ The reason for this is that e.g. ``result_type`` needs to convert to arrays

@mhvk (Contributor) left a comment:

I think what's left is only typos and some suggestions for the new docstrings!

wrap(arr, /, to_scalar=None)
Call ``__array_wrap__`` on the input ``arr``. When ``arr`` is already
the correct subclass, this may be a no-op.
Contributor:

I think one should be explicit about what is done: "this may be a no-op" -> "no wrapping is done". Or maybe rewrite the note:

Call ``__array_wrap__`` on ``arr`` if ``arr`` is not the same subclass as the input the ``__array_wrap__`` method was retrieved from.

add_newdoc('numpy._core._multiarray_umath', '_array_converter', ('scalar_input',
    """
    A tuple which is ``True`` if the original inputs were scalars and
    ``False`` otherwise. Scalars are all objects which are coerced to a
Contributor:

How about:

A tuple which indicates for each input whether it was a scalar that was coerced to a 0-D array (and was not already an array or something converted via a protocol like ``__array__()``).

``False`` otherwise. Scalars are all objects which are coerced to a
0-D array but are not already arrays or converted via one of the protocols.
.. note::
Contributor:

My sense would be to delete this note: it probably adds more confusion than it solves...

----------
arr : ndarray
The object to be wrapped. Ideally, this is always an ndarray
or subclass, although NumPy scalars are accepted as of now by
Contributor:

How about:

The object to be wrapped. Normally an ndarray or subclass, although for backward compatibility NumPy scalars are also accepted (these will be converted to a NumPy array before being passed on to the ``__array_wrap__`` method).

When ``True`` will convert a 0-d array to a scalar via ``result[()]``
(with a fast-path for non-subclasses). If ``False`` the result should
be an array-like (as ``__array_wrap__`` is free to return a non-array).
By default (``None``), a scalar is returned if all inputs are scalar.
Contributor:

are scalar -> were scalar

PyObject *obj = self->items[i].object;
if (PyArray_CheckExact(obj)) {
    if (wrap == NULL || priority < 0) {
        Py_INCREF(Py_None);
Contributor:

Strictly, you don't need to set the wrap here, given the if (wrap == NULL) clause at the end. (Same for scalar.)

@mhvk (Contributor)

p.s. Probably time for some squashing of commits too...

@seberg (Member, Author)

I don't want to jinx this because of it, but the [()] logic for subclasses doesn't work for the current memmap (which incorrectly always does it).
Now, that is maybe even never used right now, but it makes me wonder if we need another approach, which might mean passing scalar=True in __array_wrap__!?

@mhvk (Contributor)

I'm a bit confused: why would it not work for memmap? It doesn't hurt to do [()] twice, does it? (I do agree we can and perhaps should adjust memmap.__array_wrap__ to not create scalars itself, but that seems independent of this PR.)

That said, you are right that in principle it would be better to give __array_wrap__ a chance to react to whether or not a scalar is expected, instead of using a heuristic as you have now. And I guess 2.0 is the right time for such an API change. Though on the other hand, __array_wrap__ is still a somewhat out-of-date API, and generally classes have to be able to deal with [()] anyway. So perhaps we do not need to worry?

@ngoldbaum (Member)

see #25278, which was reported with impressive timing for this PR


@mhvk (Contributor)

I'm still not quite sold on return_scalar, but I'm also realizing I've lost what little overview I had of the problems caused by the various steps. This makes me wonder if it would be better to have a PR first that creates a helper that simply keeps the status quo, i.e., makes no attempt to be clever about wrapping, just reproducing the __array_wrap__ calls that are present now.

seberg added a commit that referenced this pull request Jan 20, 2024

…r`` (#25409)

This reorganizes how array-wrap is called. It might very mildly change the semantics for reductions, I think (and for negative priorities). Overall, it now passes a new `return_scalar=False/True` when calling `__array_wrap__` and deprecates any array-wrap which does not accept `arr`, `context`, `return_scalar`.

I have not integrated it yet, but half the reason for the reorganization is to integrate it/reuse it in the array-converter helper PR (gh-24214), which stumbled over trying to make the scalar handling sane.

Forcing downstream to add `return_scalar=False` to the signature is a bit annoying, but e.g. our memory maps currently try to guess at it, which seems bad. I am hoping this can be part of making the scalar vs. array return more sane. But, maybe mainly, I hope it consolidates things (together with gh-24124 mainly; if ufuncs were the only complex place we used this, it wouldn't matter much).

Squashed commits:

* API: Reorganize `__array_wrap__` and add `return_scalar=False` (this also deprecates any `__array_wrap__` which does not accept `context` and `return_scalar`)
* BUG: Fix niche bug in rounding.
* MAINT: Adjust `__array_wrap__` in code and tests (also deprecation test)
* MAINT: Use/move the simplest C-wrapping also
* DOC: Update doc and add release note
* STY: Make linter happy in old tests
* MAINT: Silence GCC warning (value cannot really be used uninitialized)
* MAINT: Small style fixes
* BUG: Fix reference leak in ufunc array-wrapping (this probably doesn't fix the 32bit issue unfortunately, only the windows ones...)
* BUG: Fix leak for result arrays in all ufunc calls
* Ensure we try passing context and address related smaller review comments
* Ensure we don't try `context=None` and expand code comment
* Rely on `return_scalar` always being right (and style nit)
* Remove outdated comments as per review
* Let's just undo force-wrap for now for reductions (it's a change...)
* ENH: Chain the original error when the DeprecationWarning is raised (doing this due to gh-25635 since it is super confusing with the bad retrying...)
* BUG,MAINT: Address Marten's review comments

This helper allows dealing with `result_type` and array wrapping without explicitly holding on to the original objects and especially potentially converting them many times.

This doesn't quite fix all places, but it does most (most prominently, `linalg` also does some quite awkward wrapping).
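For context, the deprecation described in the commit above means downstream `__array_wrap__` implementations need to accept the new arguments. A minimal sketch of a compatible subclass (the class name is hypothetical; defaulting both extra arguments keeps the method callable from older NumPy versions, which omit them):

```python
import numpy as np

class WrappedArray(np.ndarray):
    """Hypothetical subclass showing the post-#25409 signature."""

    def __array_wrap__(self, obj, context=None, return_scalar=False):
        # NumPy 2.0 tells us explicitly when a scalar result is
        # expected, instead of the subclass having to guess.
        if return_scalar:
            return obj[()]
        return super().__array_wrap__(obj, context)

arr = np.arange(3.0).view(WrappedArray)
res = arr + 1  # the ufunc machinery wraps the result via __array_wrap__
```

With this signature in place, no deprecation warning is raised and the subclass no longer needs heuristics (like `memmap`'s) to decide whether to return a scalar.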
@seberg
Member Author

@mhvk and @ngoldbaum, I merged main (i.e. the array-wrap helper) and applied all the remaining doc suggestions (thanks, they were great!). Then I squashed it down to three commits.

I think this should be pretty much ready now, since IIRC it was relatively settled. But happy to do another round if things come up, of course.

@mhvk (Contributor) left a comment

This looks good! Looking at it again, I mainly wonder about methods and arguments that are defined but not actually used. Arguably, we should not have those without a clear use case... But as long as this is private, perhaps it is OK?

its memory, in which case you can traverse ``a.base`` for a memory handler.
""")

add_newdoc('numpy._core._multiarray_umath', '_array_converter',
Contributor

I wondered a bit about whether this needs to be private, as it seems generally useful. But I guess this allows possible changes before making it public?

result = wrap(result)
return result
conv = _array_converter(obj)
# As this already tried the method, subok is maybe quite reasonable here:
Contributor

Maybe add `# but this follows what was done before. TODO: revisit this.`, or something else that makes it clearer that it would be good to make this change (even if best not to do so in this PR!).

@@ -0,0 +1,471 @@
/*
* This file defines an _array_converter object used internally to NumPy
Contributor

"used internally in NumPy to [deal ...]"

item->DType = NPY_DTYPE(PyArray_DESCR(item->array));
Py_INCREF(item->DType);

if (npy_mark_tmp_array_if_pyscalar(
Contributor

Could this be `if (item->scalar_input && npy_mark...)`? If so, maybe combine with l.93-95:

if (item->scalar_input) {
    if (npy_mark...) {
        ...
    }
    else {
        ...
    }
}
else {  /* non-scalar input */
    self->flags &= ~(NPY_CH_ALL_PYSCALARS | NPY_CH_ALL_SCALARS);
}

Member Author

Yes, that is correct, and I adopted it, but not as suggested here. The pretty-code annoyance is that `item->descr` needs to be dealt with, so whatever you do, you have some duplication.

But I agree, it isn't super nice to split this from l.93, and it is maybe easier to grok if it is clear that `npy_mark_tmp_array_if_pyscalar` does indeed require `scalar_input`.

float, or complex.
""")

add_newdoc('numpy._core._multiarray_umath', '_array_converter', ('scalar_input',
Contributor

I think `scalar_input` is not used anywhere in the code. Do you have cases in mind where it would be useful? It sort of feels like it should be, but if there is no use, should it be exposed? (Fine to leave it in, just asking.)

Also, right now it is a property. Might it make sense to make it a method (with no arguments), so that one could extend it if needed? (E.g., to ask not just for scalar, but for Python scalar inputs.)

But perhaps all this just argues for keeping `_array_converter` private for now, so we can see what is best!

Member

The linter says this line is one character too long

Member Author

I didn't think we cared, but fair, pushed the fix (included it in the previous commit).

returnNULL;
}

/* When we only have scalars, the default is to return an array: */
Contributor

I was again a bit confused about this; maybe it helps to say "We follow the default setting for `as_arrays` of `CONVERT_IF_NO_ARRAY`".

----------
subok : True or False, optional
Whether array subclasses are preserved.
pyscalars : {"convert", "preserve", "convert_if_no_array"}, optional
Contributor

This argument is not used anywhere, so a bit of the same question as for `scalar_input` above. In particular, I wonder if it wouldn't be handier to pass this on input (perhaps still allowing to override it here). But without a use case, probably not worth overthinking it now!

@seberg
Member Author

Addressed the smaller comments, thanks.

The API issues are not very clear to me. I really like this type of feature, but it is quite fuzzy to me how it should look and which details are useful (enough).

That is why I opted for private for now: I am not comfortable with saying that this API seems good (enough) unless others give very clear +1's.

Generally, I would like this or something similar to be public, because I think it can be generally useful and also specifically because I think it is more useful with NEP 50 if downstream wants to be pedantic.

From a maintenance-burden perspective I am not super worried even if we deprecate it again soonish; we would name-squat `np.lib.array_utils.array_converter`, though.


If we keep things private, I somewhat like just keeping those odd features. I can see `scalar_input` being useful at least in niches, like a Python searchsorted-like:

conv = array_converter(haystack, needle)
haystack, needle = conv
# The needle decides whether a scalar result makes sense:
return conv.wrap(result, to_scalar=conv.scalar_input[1])

I am not sure that we will use it (in fact I doubt it in the near future), but I also like the idea of "advertising" that this is possible.
(Part of why it is there, was that it is very easy and I also thought it may be useful for the python scalar preservation, which the conversion already does now and it seems to work well.)

If we lean towards public API, I fully agree to ditch them all until it is very clear we or downstream use them. Those odd features shouldn't stop us from making this public!
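As an illustration of why such a helper is useful with NEP 50: the common dtype must be computed from the original objects, before `asarray` erases the "weak" status of Python scalars. The core pattern can be sketched in pure Python (the function name `convert_and_promote` is hypothetical, not part of this PR):

```python
import numpy as np

def convert_and_promote(*objs, extra=None):
    # Hypothetical sketch: compute the common dtype from the ORIGINAL
    # objects so Python scalars keep their NEP 50 "weak" promotion
    # behavior, then convert everything to arrays of that dtype.
    arrays = [np.asarray(obj) for obj in objs]
    to_promote = objs if extra is None else (*objs, extra)
    dtype = np.result_type(*to_promote)
    return [arr.astype(dtype, copy=False) for arr in arrays], dtype

# A Python int stays "weak", so float32 wins the promotion:
(a, b), dtype = convert_and_promote(np.float32([1.0, 2.0]), 2)
```

Calling `np.result_type` on the already-converted arrays instead would treat the `2` as an `int64` 0-d array and promote to `float64`; the C helper avoids exactly this by recording the scalar information at conversion time.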

@mhvk (Contributor) left a comment

OK, the code looks good now! On the API, I'm fine with keeping it private for now and seeing what ends up being useful and what not. I'll approve now, but won't merge, to give @ngoldbaum a chance to look.

p.s. Note that the linter failed.

@ngoldbaum
Member

I'll look at this tomorrow.

@seberg
Member Author

FWIW, I rebased my `try-0d-preservation` branch, and ended up using those features (not that I am sure alternatives wouldn't be better). It actually seems surprisingly close to doing something reasonable.

@ngoldbaum (Member) left a comment

I looked over the C code again and didn't spot any issues. Please push a fixup for the linter then this should be good to go.


@ngoldbaum
Member

I figure it's better to keep the linter happy so people don't get surprise linter corrections on future PRs. Thanks!

@ngoldbaum merged commit 55afe35 into numpy:main on Jan 22, 2024

Reviewers

@ngoldbaum left review comments

@mhvk approved these changes

Assignees

No one assigned

Projects

None yet

Milestone

2.0.0 release

5 participants

@seberg @mhvk @mdhaber @rgommers @ngoldbaum
