NotificationsYou must be signed in to change notification settings
Fork11.1k
Star29.9k

Show shape any time it cannot be inferred in repr#27482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

mattip merged 2 commits intonumpy:mainfrommhvk:arrayprint-shape-for-summarized

Nov 7, 2024

Merged

Show shape any time it cannot be inferred in repr#27482

mattip merged 2 commits intonumpy:mainfrommhvk:arrayprint-shape-for-summarized

Nov 7, 2024

Conversation

Copy link

Contributor

mhvk commentedSep 30, 2024•
edited
Loading

As noted in#27461, when an array is summarized, therepr no longer makes it clear what the shape is. This is a bit odd, since for the case of a zero-sized array, the shape is shown when it is not clear what it would be (i.e., when the size is zero but the shape multi-dimensional). Similarly, if the dtype is not obvious, it is shown. So, this PR also shows the shape when summaries are made:

In [2]: np.arange(1001)Out[2]: array([   0,    1,    2, ...,  998,  999, 1000], shape=(1001,))In [3]: np.arange(1001, dtype="i2")Out[3]: array([   0,    1,    2, ...,  998,  999, 1000],      shape=(1001,), dtype=int16)

One can get the old behaviour by usinglegacy='2.1' - I'm not sure this is really required; happy to remove it if desired.

Fixes#27461

p.s. The first commit reorganizes the routine to make it easier to add the shape, while the second has the actual enhancement and tests.

mhvk added 01 - Enhancement component: numpy._core labels

Sep 30, 2024

mhvk added this to the2.2.0 release milestone

Sep 30, 2024

mhvk commented

Sep 30, 2024

View reviewed changes

numpy/_core/tests/test_arrayprint.py

		...,
		[[ 0.]]]])""")
		)
		try:

Copy link

ContributorAuthor

mhvkSep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This change does two things: correctly put the printoptions setting inside atry/finally, and ensure that there are never any spaces at the end of a line (my editor removed those, thus causing the tests to fail).

mhvk mentioned this pull request

Sep 30, 2024

ENH: Addshape control parameter toset_printoptions#27461

Closed

Copy link

Member

rkern commentedSep 30, 2024

We're not going to change the default representation without a wider discussion onthe mailing list and a NEP. This will cause a ton of churn for docstring examples in all downstream projects, so we won't pull the trigger on this without a significant consensus from a lot of stakeholders. I have my doubts that we'll get that kind of strong agreement at this time.NEP 51 was the last change of this kind, and it was significantly smaller in scope.

In any case, since there is noshape= argument tonp.array(), this particular choice for__repr__ is probably not going to happen.

mhvk force-pushed thearrayprint-shape-for-summarized branch from2530d2d to20b2babCompare

September 30, 2024 16:44

Copy link

ContributorAuthor

mhvk commentedSep 30, 2024•
edited
Loading

In any case, since there is noshape= argument tonp.array(), this particular choice for__repr__ is probably not going to happen.

Hmm, we do give the shape already for shapes that include zeros, and therepr cannot be used anyway to round-trip the array if it is being summarized... Also, I don't actually think this is going to be very disruptive, since the shape is only shown in the rare case that summarization is done (needs a size > 1000). Indeed, in numpy, there is all of one docstring test that now fails. (EDIT: I also checked astropy: also one test failure for a case where it explicitly checks what happens for an array that exceeds the threshold).

Anyway, point taken that this should go by the mailing list (though if the consensus is that this requires a NEP, I will just close this...).

Copy link

Contributor

stefanv commentedSep 30, 2024

since there is noshape= argument tonp.array(), this particular choice for__repr__

This part does strike me as contrary to what a repr does, so would prefer the version withof shape=(...) hinted at in the mailing list post (given that our repr's cannot be eval'd anyway).

Copy link

Member

seberg commentedOct 1, 2024

I don't mind this change. It seems right that this is far less impactful than the scalar repr change if it only kicks in when eval round-tripping can't possibly work (if we need a NEP, then just a tiny one for the process of agreeing to it, I think).
Usingshape= as if it was a call is a bit awkward, but not sure that trailingof shape=(...) will read better, so I honestly don't have an opinion on which part.

Copy link

ContributorAuthor

mhvk commentedOct 2, 2024

Given the feedback on the mailing list, I tried implementing adding # shape=... for the cases where the shape is not obvious, but at least locallyspin check-docs now hangs where that happens (innp.set_printoptions docstring): it looks like one may be able to have a comment that does not get checked by the doctest, but not have output that has a comment... I pushed it up so people can see it, but one may have to kill the relevant test run... Any pointers to where these doctests are run?

Current sample output

In [4]: np.arange(1001)Out[4]: array([   0,    1,    2, ...,  998,  999, 1000])  # shape=(1001,)In [5]: np.arange(1001, dtype="i2")Out[5]: array([   0,    1,    2, ...,  998,  999, 1000],      dtype=int16)  # shape=(1001,)In [6]: np.empty((10,2,0), dtype="i2")Out[6]: array([], dtype=int16)  # shape=(10, 2, 0)

mhvk force-pushed thearrayprint-shape-for-summarized branch from98e5bbf to4eb54b2Compare

October 2, 2024 19:47

Copy link

ContributorAuthor

mhvk commentedOct 2, 2024

OK, I figured out the doctests - it is just very slow if something fails. Must admit that the comment after the array works less well when the array is part of alist, as is the case in the examples for the varioussplit functions. E.g.,

x = np.arange(16.0).reshape(2, 2, 4)np.dsplit(x, np.array([3, 6]))[array([[[ 0.,  1.,  2.],         [ 4.,  5.,  6.]],         [[ 8.,  9., 10.],         [12., 13., 14.]]]), array([[[ 3.],         [ 7.]],         [[11.],         [15.]]]), array([], dtype=float64)  # shape=(2, 2, 0)]

But arguably rare enough that it doesn't matter much. I do like not having theshape argument inside thearray(...) part.

Copy link

Contributor

stefanv commentedOct 3, 2024•
edited
Loading

I guess the whole point here is not to be disruptive, otherwise<ndarray data=[...] shape=(...) dtype=float> may have been the better choice.

That# shape=(...) in a list is pretty confusing, since the closing bracket no longer parses :/

There is no constructor for which it makes sense to derive shape from an input object AND specify the shape (ndarray comes closest, but takes a buffer instead of obj), so I suppose it's futile.

Slightly radical, perhaps, but we could add ashape argument toarray(), ignoring it if it matches the object shape, or raise if it doesn't. Kind of useless, but at least then we have some valid excuse to haveshape inside the keyword list. (We'll tell marketing to sell it as a Shape Verifier™.)

Copy link

Member

seberg commentedOct 3, 2024

Yeah, while anndim orndmax would be useful information, a shape tuple is probably really just useful for verification (even if you allow(-1, 3)); unless you allow reshaping, but that seems too strange to me.
FWIW, my "I don't care", is actually being in favor of using, shape=; I suppose I don't care about the "broken" syntax all that much.

Copy link

ContributorAuthor

mhvk commentedOct 3, 2024

Slightly radical, perhaps, but we could add ashape argument toarray(), ignoring it if it matches the object shape, or raise if it doesn't. Kind of useless,

Actually, not totally useless fordtype=object - it could tell what the desired depth is to which lists of lists should be parsed, i.e., which level theobject is meant to refer to.

Back to this PR, I find the comment inside the list rather bothersome too, and agree with@seberg that the "broken" syntax is not that bad - after all, it is only shown for cases where the data string itself could not be parsed anyway. Shall I go ahead and remove the last commit that tried it?

p.s. I did have one further alternative, which would be to replace the... in summarization with something indicating the number of elements omitted. I felt it was getting too complex, but could revisit...

Copy link

Member

seberg commentedOct 3, 2024

p.s. I did have one further alternative, which would be to replace the ... in summarization with something indicating the

I remember seeing this discussed as an issue or mailing list post at some point (i.e. someone felt strong about it). Personally, I tend to think it adds much visual clutter (maybe in an html repr where you can reduce the clutter).

Copy link

Member

mattip commentedOct 30, 2024

Summarizing the more difficult problem: changing the repr breaks doc tests. It seems there are about a dozen in scipy, but only a handful in NumPy. If we go ahead with this, we could recommend projects uselegacy='2.1' if they update to 2.x and their doc tests are broken.

Copy link

ContributorAuthor

mhvk commentedOct 30, 2024

What was the final verdict on where to have the shape information? As a "fake" argument (perhaps to be added later as a real one), or as the comment? If the former, I'll need to remove the last commit.

Copy link

Contributor

stefanv commentedOct 31, 2024

That# shape=(...) in a list is pretty confusing, since the closing bracket no longer parses :/

I've warmed to the non-existent keyword argument, because of the above.

Copy link

Member

mattip commentedOct 31, 2024

Agreed. That was also the sentiment at the triage meeting: a fake argument not a comment.

MAINT: refactor array_repr to treat shape and dtype similary

4faf84e

To help set oneself up for using shape for different reasons.

mhvk force-pushed thearrayprint-shape-for-summarized branch from4eb54b2 toc5f2516Compare

October 31, 2024 11:31

Copy link

ContributorAuthor

mhvk commentedOct 31, 2024

OK, removed the last commit (and rebased), so it is back to a fakeshape argument.

mattip approved these changes

Nov 4, 2024

View reviewed changes

Copy link

Member

mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM. A short release note that mentions thelegacy=2.1 fix for projects with many doctest failures should be added.

ENH: include shape in repr also for summarized arrays

3c6d763

So that it is always there when the shape cannot be inferred from list

mhvk force-pushed thearrayprint-shape-for-summarized branch fromc5f2516 to3c6d763Compare

November 5, 2024 18:30

Copy link

ContributorAuthor

mhvk commentedNov 5, 2024

OK, thanks, I added a changelog snippet.

mattip merged commitfbffb8c intonumpy:main

Nov 7, 2024

67 checks passed

Copy link

Member

mattip commentedNov 7, 2024

Thanks@mhvk

mhvk deleted the arrayprint-shape-for-summarized branch

November 7, 2024 12:07

neutrinoceros mentioned this pull request

Nov 10, 2024

TST: fix running array repr tests with an ellipsis on numpy 2.2 (dev)astropy/astropy#17364

Merged

1 task

ev-br mentioned this pull request

Dec 9, 2024

Adapt to numpy==2.2 repr changesscipy/scipy_doctest#175

Closed

Copy link

Contributor

ev-br commentedDec 10, 2024

PSA note for projects using scipy-doctest: upgrade to scipy-doctest==1.5.1, update your doctests to use the new repr, and your doctesting will work on both numpy 2.2 and older versions.

TomNicholas mentioned this pull request

Dec 10, 2024

numpy formatting change breaks testspydata/xarray#9873

Closed

Copy link

TomNicholas commentedDec 10, 2024•
edited
Loading

This change broke xarray's printed representation (xrefpydata/xarray#9873) - thank you for providing an escape hatch, but is there a way for me to enable that as a global option?