Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Show shape any time it cannot be inferred in repr#27482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
mattip merged 2 commits intonumpy:mainfrommhvk:arrayprint-shape-for-summarized
Nov 7, 2024

Conversation

mhvk
Copy link
Contributor

@mhvkmhvk commentedSep 30, 2024
edited
Loading

As noted in#27461, when an array is summarized, therepr no longer makes it clear what the shape is. This is a bit odd, since for the case of a zero-sized array, the shape is shown when it is not clear what it would be (i.e., when the size is zero but the shape multi-dimensional). Similarly, if the dtype is not obvious, it is shown. So, this PR also shows the shape when summaries are made:

In [2]: np.arange(1001)Out[2]: array([   0,    1,    2, ...,  998,  999, 1000], shape=(1001,))In [3]: np.arange(1001, dtype="i2")Out[3]: array([   0,    1,    2, ...,  998,  999, 1000],      shape=(1001,), dtype=int16)

One can get the old behaviour by usinglegacy='2.1' - I'm not sure this is really required; happy to remove it if desired.

Fixes#27461

p.s. The first commit reorganizes the routine to make it easier to add the shape, while the second has the actual enhancement and tests.

eric-wieser, mathause, and charris reacted with thumbs up emoji
...,
[[ 0.]]]])""")
)
try:
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This change does two things: correctly put the printoptions setting inside atry/finally, and ensure that there are never any spaces at the end of a line (my editor removed those, thus causing the tests to fail).

@rkern
Copy link
Member

We're not going to change the default representation without a wider discussion onthe mailing list and a NEP. This will cause a ton of churn for docstring examples in all downstream projects, so we won't pull the trigger on this without a significant consensus from a lot of stakeholders. I have my doubts that we'll get that kind of strong agreement at this time.NEP 51 was the last change of this kind, and it was significantly smaller in scope.

In any case, since there is noshape= argument tonp.array(), this particular choice for__repr__ is probably not going to happen.

@mhvkmhvkforce-pushed thearrayprint-shape-for-summarized branch from2530d2d to20b2babCompareSeptember 30, 2024 16:44
@mhvk
Copy link
ContributorAuthor

mhvk commentedSep 30, 2024
edited
Loading

In any case, since there is noshape= argument tonp.array(), this particular choice for__repr__ is probably not going to happen.

Hmm, we do give the shape already for shapes that include zeros, and therepr cannot be used anyway to round-trip the array if it is being summarized... Also, I don't actually think this is going to be very disruptive, since the shape is only shown in the rare case that summarization is done (needs a size > 1000). Indeed, in numpy, there is all of one docstring test that now fails. (EDIT: I also checked astropy: also one test failure for a case where it explicitly checks what happens for an array that exceeds the threshold).

Anyway, point taken that this should go by the mailing list (though if the consensus is that this requires a NEP, I will just close this...).

@stefanv
Copy link
Contributor

since there is noshape= argument tonp.array(), this particular choice for__repr__

This part does strike me as contrary to what a repr does, so would prefer the version withof shape=(...) hinted at in the mailing list post (given that our repr's cannot be eval'd anyway).

mhvk reacted with thumbs up emoji

@seberg
Copy link
Member

I don't mind this change. It seems right that this is far less impactful than the scalar repr change if it only kicks in when eval round-tripping can't possibly work (if we need a NEP, then just a tiny one for the process of agreeing to it, I think).
Usingshape= as if it was a call is a bit awkward, but not sure that trailingof shape=(...) will read better, so I honestly don't have an opinion on which part.

mhvk reacted with thumbs up emoji

@mhvk
Copy link
ContributorAuthor

mhvk commentedOct 2, 2024

Given the feedback on the mailing list, I tried implementing adding # shape=... for the cases where the shape is not obvious, but at least locallyspin check-docs now hangs where that happens (innp.set_printoptions docstring): it looks like one may be able to have a comment that does not get checked by the doctest, but not have output that has a comment... I pushed it up so people can see it, but one may have to kill the relevant test run... Any pointers to where these doctests are run?

Current sample output

In [4]: np.arange(1001)Out[4]: array([   0,    1,    2, ...,  998,  999, 1000])  # shape=(1001,)In [5]: np.arange(1001, dtype="i2")Out[5]: array([   0,    1,    2, ...,  998,  999, 1000],      dtype=int16)  # shape=(1001,)In [6]: np.empty((10,2,0), dtype="i2")Out[6]: array([], dtype=int16)  # shape=(10, 2, 0)

@mhvkmhvkforce-pushed thearrayprint-shape-for-summarized branch from98e5bbf to4eb54b2CompareOctober 2, 2024 19:47
@mhvk
Copy link
ContributorAuthor

mhvk commentedOct 2, 2024

OK, I figured out the doctests - it is just very slow if something fails. Must admit that the comment after the array works less well when the array is part of alist, as is the case in the examples for the varioussplit functions. E.g.,

x = np.arange(16.0).reshape(2, 2, 4)np.dsplit(x, np.array([3, 6]))[array([[[ 0.,  1.,  2.],         [ 4.,  5.,  6.]],         [[ 8.,  9., 10.],         [12., 13., 14.]]]), array([[[ 3.],         [ 7.]],         [[11.],         [15.]]]), array([], dtype=float64)  # shape=(2, 2, 0)]

But arguably rare enough that it doesn't matter much. I do like not having theshape argument inside thearray(...) part.

@stefanv
Copy link
Contributor

stefanv commentedOct 3, 2024
edited
Loading

I guess the whole point here is not to be disruptive, otherwise<ndarray data=[...] shape=(...) dtype=float> may have been the better choice.

That# shape=(...) in a list is pretty confusing, since the closing bracket no longer parses :/

There is no constructor for which it makes sense to derive shape from an input object AND specify the shape (ndarray comes closest, but takes a buffer instead of obj), so I suppose it's futile.

Slightly radical, perhaps, but we could add ashape argument toarray(), ignoring it if it matches the object shape, or raise if it doesn't. Kind of useless, but at least then we have some valid excuse to haveshape inside the keyword list. (We'll tell marketing to sell it as a Shape Verifier™.)

@seberg
Copy link
Member

Yeah, while anndim orndmax would be useful information, a shape tuple is probably really just useful for verification (even if you allow(-1, 3)); unless you allow reshaping, but that seems too strange to me.
FWIW, my "I don't care", is actually being in favor of using, shape=; I suppose I don't care about the "broken" syntax all that much.

@mhvk
Copy link
ContributorAuthor

mhvk commentedOct 3, 2024

Slightly radical, perhaps, but we could add ashape argument toarray(), ignoring it if it matches the object shape, or raise if it doesn't. Kind of useless,

Actually, not totally useless fordtype=object - it could tell what the desired depth is to which lists of lists should be parsed, i.e., which level theobject is meant to refer to.

Back to this PR, I find the comment inside the list rather bothersome too, and agree with@seberg that the "broken" syntax is not that bad - after all, it is only shown for cases where the data string itself could not be parsed anyway. Shall I go ahead and remove the last commit that tried it?

p.s. I did have one further alternative, which would be to replace the... in summarization with something indicating the number of elements omitted. I felt it was getting too complex, but could revisit...

@seberg
Copy link
Member

p.s. I did have one further alternative, which would be to replace the ... in summarization with something indicating the

I remember seeing this discussed as an issue or mailing list post at some point (i.e. someone felt strong about it). Personally, I tend to think it adds much visual clutter (maybe in an html repr where you can reduce the clutter).

mhvk reacted with thumbs up emoji

@mattip
Copy link
Member

Summarizing the more difficult problem: changing the repr breaks doc tests. It seems there are about a dozen in scipy, but only a handful in NumPy. If we go ahead with this, we could recommend projects uselegacy='2.1' if they update to 2.x and their doc tests are broken.

seberg reacted with thumbs up emoji

@mhvk
Copy link
ContributorAuthor

What was the final verdict on where to have the shape information? As a "fake" argument (perhaps to be added later as a real one), or as the comment? If the former, I'll need to remove the last commit.

@stefanv
Copy link
Contributor

That# shape=(...) in a list is pretty confusing, since the closing bracket no longer parses :/

I've warmed to the non-existent keyword argument, because of the above.

@mattip
Copy link
Member

Agreed. That was also the sentiment at the triage meeting: a fake argument not a comment.

To help set oneself up for using shape for different reasons.
@mhvkmhvkforce-pushed thearrayprint-shape-for-summarized branch from4eb54b2 toc5f2516CompareOctober 31, 2024 11:31
@mhvk
Copy link
ContributorAuthor

OK, removed the last commit (and rebased), so it is back to a fakeshape argument.

Copy link
Member

@mattipmattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM. A short release note that mentions thelegacy=2.1 fix for projects with many doctest failures should be added.

So that it is always there when the shape cannot be inferred from list
@mhvkmhvkforce-pushed thearrayprint-shape-for-summarized branch fromc5f2516 to3c6d763CompareNovember 5, 2024 18:30
@mhvk
Copy link
ContributorAuthor

mhvk commentedNov 5, 2024

OK, thanks, I added a changelog snippet.

@mattipmattip merged commitfbffb8c intonumpy:mainNov 7, 2024
67 checks passed
@mattip
Copy link
Member

Thanks@mhvk

seberg reacted with hooray emoji

@ev-br
Copy link
Contributor

PSA note for projects using scipy-doctest: upgrade to scipy-doctest==1.5.1, update your doctests to use the new repr, and your doctesting will work on both numpy 2.2 and older versions.

@TomNicholas
Copy link

TomNicholas commentedDec 10, 2024
edited
Loading

This change broke xarray's printed representation (xrefpydata/xarray#9873) - thank you for providing an escape hatch, but is there a way for me to enable that as a global option?

@seberg
Copy link
Member

You can also uselegacy="..." printoptions to avoid the new repr (that defers the problem a bit maybe).

Although, in this case since it seems like a single test, maybe it is just as easy to adapt the test to allow both?

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@mattipmattipmattip approved these changes

Assignees
No one assigned
Projects
None yet
Milestone
2.2.0 release
Development

Successfully merging this pull request may close these issues.

ENH: Addshape control parameter toset_printoptions
7 participants
@mhvk@rkern@stefanv@seberg@mattip@ev-br@TomNicholas

[8]ページ先頭

©2009-2025 Movatter.jp