Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

DEP: Deprecate setting the strides and dtype of a numpy array#28901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
eendebakpt wants to merge14 commits intonumpy:main
base:main
Choose a base branch
Loading
fromeendebakpt:deprecate_array_stride_set

Conversation

eendebakpt
Copy link
Contributor

@eendebakpteendebakpt marked this pull request as draftMay 4, 2025 21:37
@eendebakpteendebakpt changed the titleDraft: DEP: Deprecate setting the strides and dtype of a numpy arrayDEP: Deprecate setting the strides and dtype of a numpy arrayMay 4, 2025
Copy link
Member

@sebergseberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Error paths are both incorrect and untested, so that is definitely required.

Other than that, should have a release note as well. But I guess you had it as draft for a reason :).

@@ -124,6 +124,11 @@ array_strides_set(PyArrayObject *self, PyObject *obj, void *NPY_UNUSED(ignored))
npy_intp upper_offset = 0;
Py_buffer view;

/* DEPRECATED 2025-05-04, NumPy 2.3 */
PyErr_WarnEx(PyExc_DeprecationWarning,
"Setting the strides on a Numpy array has been deprecated in Numpy 2.3.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We could point tostrided_window_view andstride_tricks.as_strided. Although, not sure it is needed for this one.

msg = "Setting the strides on a Numpy array has been deprecated"
arr = np.arange(48)
with pytest.warns(DeprecationWarning, match=msg):
arr.strides = arr.strides
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please move the tests totest_deprecations.py. They should test the error path as well (whichtest_deprecations machinery does for you, although you can do it explicitly also.).

@jorenham

This comment was marked as resolved.

@eendebakpt
Copy link
ContributorAuthor

eendebakpt commentedMay 5, 2025
edited
Loading

The.view() method sets the dtype and this is a bit hard to handle. InPyArray_View (seeconvert.c) first a new array is constructed withPyArray_NewFromDescr_int and then the dtype is updated withPyObject_SetAttrString. I replaced the call toPyObject_SetAttrString with a direct call to the code updating the dtype, bypassing the deprecation warning. This works fine for exact arrays, but gives issues for subclasses, in particular the masked array.

We could usePyObject_SetAttrString insidePyArray_View and catch the generated deprecation warning. This would hide changing the dtype inPyArray_View (but it is ok, there is only a single reference to the array), but users changing the dtype from python will get the deprecation warning.

Any other ideas on how to approach this?

Update: refactored the PR to use unique references for warning. This will exclude the calls from insidePyArray_View.

@eendebakpteendebakptforce-pushed thedeprecate_array_stride_set branch fromdf42838 to2fa5086CompareMay 5, 2025 21:22
@seberg
Copy link
Member

Any other ideas on how to approach this?
Update: refactored the PR to use unique references for warning. This will exclude the calls from inside PyArray_View.

Unique reference cheeck seems OK to me, also just to be nice for a bit. Overall, I think there is no reason not to pass thedtype already while creating the initial view (that currently has the same dtype), but not quite sure how that pans out.
My first guess would be that the only reason for this organization is thatarr.dtype = includes the necessary checks for whether a view is OK.

As for some newer code changes here, let's avoid any warning filtering (yes I guess on new Python it's threadsafe at least, but...). Either, this still works fine now that you did the refcount check, or we should just use aarr.view() which could be guarded bydtype != dtype.

(I am not certain the refcount checks will work on PyPy as is, so there might be a problem with using it for avoiding thview() change.)

@eendebakpteendebakptforce-pushed thedeprecate_array_stride_set branch fromaf05298 toc27f61aCompareMay 6, 2025 09:46
@eendebakpt
Copy link
ContributorAuthor

As for some newer code changes here, let's avoid any warning filtering (yes I guess on new Python it's threadsafe at least, but...). Either, this still works fine now that you did the refcount check, or we should just use aarr.view() which could be guarded bydtype != dtype.

The warning filtering is indeed not nice. But using a view won't work directly. For example for the recarray the dtype is updated in the array finalizer.

def__array_finalize__(self,obj):
ifself.dtype.typeisnotrecordandself.dtype.namesisnotNone:
# if self.dtype is not np.record, invoke __setattr__ which will
# convert it to a record if it is a void dtype.
withwarnings.catch_warnings():
# gh-28901
warnings.filterwarnings("ignore",category=DeprecationWarning)
self.dtype=self.dtype

I can create a view with the new dtype (maybe this will create an infinite recursion, I will have to try), but I cannot replace self with the new view.

@seberg
Copy link
Member

but I cannot replace self with the new view.

Right, this was maybe the reason for why it wasn't deprecated before. I think we need a solution for it, though. And as much as I dislike this record array stuff and we should maybe discourage its use.

My first thought is one or both of:

  • Allow this specific change for record-arrays, where the new dtype is known to be fully equivalent and behave exactly the same.
  • Just add a by-pass, with something_dtype = setting, or a (semi?)private function that forces it. (I slightly fear that_dtype might be used, so maybe something more awkward.)

Unless the "unique reference" path can fix this (but I guess it didn't).

The point is, if our code needs a warning manager, then we are just kicking the real solution down the road, since eventually that should be an error and the warning context manager will stop working anyway.

@eendebakpteendebakptforce-pushed thedeprecate_array_stride_set branch fromb933d58 to11f6569CompareMay 6, 2025 16:41
@eendebakpt
Copy link
ContributorAuthor

but I cannot replace self with the new view.

Right, this was maybe the reason for why it wasn't deprecated before. I think we need a solution for it, though. And as much as I dislike this record array stuff and we should maybe discourage its use.

My first thought is one or both of:

  • Allow this specific change for record-arrays, where the new dtype is known to be fully equivalent and behave exactly the same.

Do you means checking inside the dtype setter whether the object is a subclass of masked array or recarray? That is possible (a bit inconvenient as the masked array and rec array are defined in python).

  • Just add a by-pass, with something_dtype = setting, or a (semi?)private function that forces it. (I slightly fear that_dtype might be used, so maybe something more awkward.)

This seems a reasonable approach. Any user making use of_dtype would be aware this is an internal method and best avoided.

It would be much nicer if we could rewrite the code for masked array and recarray to not update the dtype at all. If we are not able to do this, then perhaps there are users with their own subclasses that will face the same issues. For masked array the most changes are introduced in#10314 and#5943.@ahaldane@mhvk do you know whether avoiding setting the dtype in the finalizer would be possible?

Unless the "unique reference" path can fix this (but I guess it didn't).

The unique reference path does not help here, at the point where the recarray or masked array sets the dtype, we already have 5 references to the object.

The point is, if our code needs a warning manager, then we are just kicking the real solution down the road, since eventually that should be an error and the warning context manager will stop working anyway.

Kicking down the solution might be ok at first. This PR is to signal users we are deprecating setting thedtype andstrides. We can adapt our own code in a followup PR.

@seberg
Copy link
Member

Do you means checking inside the dtype setter whether the object is a subclass of masked array or recarray?

I was thinking forrecord dtypes, you could check that the only difference in the dtype is indeed that it's a "record" one. (I don't quite recall what that meant, it's an annoying concept that exists solely to return different scalars, IIRC.)

It would be nice to just not need any of this and a great improvement to try and do something about it. But it might be a rather deep rabbit hole...

Kicking down the solution might be ok at first. This PR is to signal users we are deprecating setting the dtype and strides. We can adapt our own code in a followup PR.

Yeah it may be fine. It would be nice not to if it isn't hard, for three reasons (none of which are quite blocking maybe):

  1. Doing it know gives us full confidence we can go through with it.
  2. Whatever work-around we add may be useful/needed by downstream. Which makes life easier for them (worst case they need one version dependent set of code, not update that again).
  3. Warning context managers are not very fast and not thread safe on older Python versions.

It would be much nicer if we could rewrite the code for masked array and recarray to not update the dtype at all.

Yes, but especially for record arrays it may be a rather deep rabbit hole as mentioned above.

@jorenhamjorenham self-requested a reviewMay 8, 2025 00:36
Copy link
Member

@jorenhamjorenham left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Squigglies appear where they should (at least with Pylance on VSCode):
image

So typing-wise there's nothing I could possibly complain about now 👌🏻

Copy link
Contributor

@mhvkmhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Overall, this is nice! Some mostly small comments in-line. My main comment echoes the discussion -- ideally, we avoid this whole_set_dtype. Will try to have a bit of a look at least forMaskedArray...

@@ -124,6 +124,14 @@ array_strides_set(PyArrayObject *self, PyObject *obj, void *NPY_UNUSED(ignored))
npy_intp upper_offset = 0;
Py_buffer view;

/* DEPRECATED 2025-05-04, NumPy 2.3 */
int ret = PyErr_WarnEx(PyExc_DeprecationWarning,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why not use theDEPRECATE() macro? As e.g. done at

if (DEPRECATE("Data type alias 'a' was deprecated in NumPy 2.0. "
"Use the 'S' alias instead.")<0) {
returnNULL;
}

array_descr_set(PyArrayObject *self, PyObject *arg, void *NPY_UNUSED(ignored))
{
// to be replaced with PyUnstable_Object_IsUniquelyReferenced https://github.com/python/cpython/pull/133144
int unique_reference = (Py_REFCNT(self) == 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I guess we probably do not care too much about a missed deprecation warning in some cases on python 3.14, but we do have

check_unique_temporary(PyObject*lhs)
, and using it could make sure we don't miss updating this...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

arr isn't a temporary though, so it probably doesn't apply but the "unique" part of that function should apply.


def set_dtype(x):
x.dtype = int
self.assert_deprecated(lambda: set_dtype(x))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe simpler and clearer asself.assert_deprecated(setattr, x, "dtype", int)?

eendebakpt reacted with thumbs up emoji
s = x.strides
x.strides = s

self.assert_deprecated(lambda: set_strides(x))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As above

@@ -1215,7 +1216,9 @@ def test_zero_stride(self):
arr = np.broadcast_to(arr, 10)
assert arr.strides == (0,)
with pytest.raises(ValueError):
arr.dtype = "i1"
with warnings.catch_warnings(): # gh-28901
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Might as well test that the deprecation warning happens here?

with pytest.warns(DeprecationWarning, match="Setting the dtype"):

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There are some cases where we sometimes get a warning, and sometimes not (depending on optimizations perhaps? or pypy? refcounting is funny stuff). For that reason I pickedwarnings.catch_warnings() which works for either a warning or no warning.

I will check once more whether we can use apytest.warns here.

@@ -366,7 +368,11 @@ def make_array(size, offset, strides):
offset=offset * x.itemsize)
except Exception as e:
raise RuntimeError(e)
r.strides = strides = strides * x.itemsize

with warnings.catch_warnings(): # gh-28901
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As above, I think one might as well usepytest.warns (and below too)


# test 0d
arr_0d = np.array(0)
arr_0d.strides = ()
arr_0d = np.lib.stride_tricks.as_strided(arr_0d,strides = ())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Remove spaces around= for arguments to functions (PEP8) -- also in the other cases below.

@@ -101,7 +102,10 @@ def as_strided(x, shape=None, strides=None, subok=False, writeable=True):
array = np.asarray(DummyArray(interface, base=x))
# The route via `__interface__` does not preserve structured
# dtypes. Since dtype should remain unchanged, we set it explicitly.
array.dtype = x.dtype
with warnings.catch_warnings():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd prefer not to have_set_dtype at all, but if we keep it, then this seems a good place to use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Agreed, the idea of having this work around, is exactly the ability to avoid any warning context managers both in NumPy (and potentially outside of NumPy).

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Agreed. I wanted to be on the safe side since there is a difference between.dtype = ... and._set_dtype(...) for subclasses that override setting attributes. Here array is created bynp.asarray so it is fine and I will update the code.

@@ -3492,7 +3492,8 @@ def dtype(self):

@dtype.setter
def dtype(self, dtype):
super(MaskedArray, type(self)).dtype.__set__(self, dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm confused - isn't it OK to have deprecation warnings also forMaskedArray?

seberg reacted with thumbs up emoji
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There is a valid code path from numpy where the dtype of a masked array is updated (I think it was thePyArray_View) and the refcount is already larger than 1. In that case we do not want to generate a warning, hence the use of_set_dtype.
The downside of this approach is that if a user changes the dtype, we will not get a deprecation warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is why I was suggesting to have the warning only forndarray proper -- then we can just not worry aboutMaskedArray andrecarray yet.

@@ -2126,7 +2126,11 @@ def test_assign_dtype(self):
a = np.zeros(4, dtype='f4,i4')

m = np.ma.array(a)
m.dtype = np.dtype('f4')
with warnings.catch_warnings():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Here for sure usepytest.warns - we do want to check warnings are emitted also forMaskedArray (and also below of course)

@mhvk
Copy link
Contributor

mhvk commentedMay 8, 2025

Darn, what a rabbit hole indeed! One possible solution is to start giving the warning only onndarray itself instead of also on subclasses - then we do not need any changes to masked and record arrays. I.e.,

--- a/numpy/_core/src/multiarray/getset.c+++ b/numpy/_core/src/multiarray/getset.c@@ -527,9 +527,7 @@ static int array_descr_set(PyArrayObject *self, PyObject *arg, void *NPY_UNUSED(ignored)) {     // to be replaced with PyUnstable_Object_IsUniquelyReferenced https://github.com/python/cpython/pull/133144-    int unique_reference = (Py_REFCNT(self) == 1);--    if (!unique_reference) {+    if (PyArray_CheckExact(self) && (Py_REFCNT(self) != 1)) {         // this will not emit deprecation warnings for all cases, but for most it will         /* DEPRECATED 2025-05-04, NumPy 2.3 */         int ret = PyErr_WarnEx(PyExc_DeprecationWarning,

Not ideal, but at least it makes a start, and we don't need the new_set_dtype method...

@seberg
Copy link
Member

Not ideal, but at least it makes a start, and we don't need the new _set_dtype method...

I guess I am not too worried about a_set_dtype if that is the a way make broader progress without larger refactors (including calling it semi-public, but discouraging its use).
However, 2.3 branching is around the corner (<~2 weeks) so especially if you want to push for that and the alternatives are just unwieldy, then omitting subclasses seems like a good way to make a very good start here.

@mhvk
Copy link
Contributor

mhvk commentedMay 8, 2025

Good point about 2.3 being around the corner. I think following my suggestion is easiest in the sense that it introduces no new API, and we can then think about how to doPyArray_View better. E.g., one could envisage only calling the dtype setter if it is different from the standard one.

eendebakpt reacted with thumbs up emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@sebergsebergseberg left review comments

@mhvkmhvkmhvk left review comments

@jorenhamjorenhamjorenham approved these changes

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@eendebakpt@jorenham@seberg@mhvk

[8]ページ先頭

©2009-2025 Movatter.jp