Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Improve pandas/xarray/... conversion#22560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
jklymak merged 2 commits intomatplotlib:mainfromoscargus:pandasconversion
Apr 21, 2022

Conversation

oscargus
Copy link
Member

@oscargusoscargus commentedFeb 24, 2022
edited
Loading

PR Summary

Seehttps://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array/54508052#54508052 for motivation.

Related to#16402

PR Checklist

Tests and Styling

  • Has pytest style unit tests (andpytest passes).
  • IsFlake 8 compliant (installflake8-docstrings and runflake8 --docstring-convention=all).

Documentation

  • [N/A] New features are documented, with examples if plot related.
  • [N/A] New features have an entry indoc/users/next_whats_new/ (follow instructions in README.rst there).
  • [N/A] API changes documented indoc/api/next_api_changes/ (follow instructions in README.rst there).
  • [N/A] Documentation is sphinx and numpydoc compliant (the docs shouldbuild without error).

if hasattr(X, 'values'): # support pandas.Series
X = X.values
if hasattr(X, 'to_numpy'): # support pandas.Series
X = X.to_numpy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Are there any libraries we need to worry about that implementX.values, but notX.to_numpy()? Maybe we also want to leave anelif with theX.values block as a fallback just in case...?

@QuLogic
Copy link
Member

Perhaps should be coordinated with#22141?

@oscargus
Copy link
MemberAuthor

oscargus commentedFeb 25, 2022
edited
Loading

Perhaps should be coordinated with#22141?

That's where I got the idea. As seen, I have not touchedcbook.

Related to the question from@greglucas, I do not know. But I saw this code incbook (which I didn't touch):

# unpack if we have a values or to_numpy method.
try:
X=X.to_numpy()
exceptAttributeError:
try:
ifisinstance(X.values,np.ndarray):
X=X.values
exceptAttributeError:
pass

This is a bit more complex as it uses a try-except approach (not sure how much that affects things though), has a fallback on values and check that values actually returns an nparray. One can of course use a similar approach here (and in#22141). Possibly slightly improved as I do not know if there are cases if values actually is non-trivial, so no need to run it twice.

Edit: I have now touched this and use a similar approach, but withhasattr instead oftry-except seehttps://stackoverflow.com/questions/903130/hasattr-vs-try-except-block-to-deal-with-non-existent-attributes as the assumption is that most often we do not pass pandas or xarray (or whatever) data.

@greglucas
Copy link
Contributor

I guess you could write acbook._unpack_pandas(x) helper to use that in all of these areas for consistency? I would say it is safer to keep the.values fallback, and if you do want to get rid of it eventually, you could put in a deprecation warning if that branch is hit.

@oscargus
Copy link
MemberAuthor

Yes, I was also thinking about a helper function (but didn't know where to place it, so great info that cbook is the place). I do not have any strong opinions as such, more that I read the link and it seemed like the right thing to do.

@oscargusoscargus marked this pull request as draftFebruary 25, 2022 16:46
@oscargusoscargus marked this pull request as ready for reviewFebruary 27, 2022 12:11
@oscargusoscargus mentioned this pull requestFeb 27, 2022
6 tasks
@oscargus
Copy link
MemberAuthor

Seems like the easiest way is to wait for#22141, add those conversions here as well, and then discuss the correct name for the function.

@oscargus
Copy link
MemberAuthor

I'm thinking that maybe_unpack_to_numpy is a general enough name?

jklymak reacted with thumbs up emoji

@oscargus
Copy link
MemberAuthor

oscargus commentedFeb 28, 2022
edited
Loading

This is now updated:

  • renamed function to_unpack_to_numpy
  • added test for xarray(now an extra test dependency) (Removed extra dependency)
  • (removed duplicate test code for pandas) (No duplicate code to start with...)

@oscargusoscargus changed the titleImprove Pandas conversionImprove pandas/xarray/... conversionFeb 28, 2022
@tacaswelltacaswell added this to thev3.5.2 milestoneMar 3, 2022
@tacaswell
Copy link
Member

Please update the lines touched by#22141

oscargus reacted with thumbs up emoji

@oscargusoscargusforce-pushed thepandasconversion branch 3 times, most recently from19e86ca tof17f3afCompareMarch 3, 2022 22:13
@oscargus
Copy link
MemberAuthor

oscargus commentedMar 3, 2022
edited
Loading

There is also this function where it doesn't work to simply replaceto_numpy with_unpack_to_numpy. No idea why. (One reason may be that there is no attribute error forto_numpy anymore, so some types of input data is handled well by_check_1d, has an index method, but noto_numpy method... Seems like plain Python lists can be a culprit here then...)

defindex_of(y):
"""
A helper function to create reasonable x values for the given *y*.
This is used for plotting (x, y) if x values are not explicitly given.
First try ``y.index`` (assuming *y* is a `pandas.Series`), if that
fails, use ``range(len(y))``.
This will be extended in the future to deal with more types of
labeled data.
Parameters
----------
y : float or array-like
Returns
-------
x, y : ndarray
The x and y values to plot.
"""
try:
returny.index.to_numpy(),y.to_numpy()
exceptAttributeError:
pass
try:
y=_check_1d(y)
except (np.VisibleDeprecationWarning,ValueError):
# NumPy 1.19 will warn on ragged input, and we can't actually use it.
pass
else:
returnnp.arange(y.shape[0],dtype=float),y
raiseValueError('Input could not be cast to an at-least-1D NumPy array')

np.testing.assert_array_equal(Idx, IdxRef)


def test_index_of_xarray(xr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Does xarray get us more coverage here? They have ato_numpy() method the same as pandas I believe.
https://xarray.pydata.org/en/stable/generated/xarray.DataArray.to_numpy.html

So, it seems like a pretty heavy dependency to add for just this one test...

jklymak reacted with thumbs up emoji
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not so much for coverage as for actually testing using data of specified formats. With the discussion about which formats we support, it makes sense to test them as well. Right now some of these are tested in the plots, but it can possibly make sense to simply test them here as these are the core function used to get data that can be plotted.

If we claim (which we actually don't, maybe we should?) that we can plot xarray, we should probably test it as well. And other types that we may want to claim to support. Or maybe fork off a specific dependency test that is not executed on all platforms/version, including pandas (which is 11.7 MB, xarray is 870 kB).

(There is another xarray-test above, so two.)

I can of course remove them, but I think we should discuss if we want to support more formats than pandas and numpy (and Python list/tuple), and, if so, have explicit tests for them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I agree, we should probably discuss what we want to support/test. To me, this doesn't seem to add a whole lot of value for adding a new dependency.

There was also a discussion around removing Scipy as a dependency in the docs:#22120

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I removed the dependencies but kept the tests. Hence, they will run if xarray is available.

I also opened#22645 for discussions (probably should be discussed at a dev-call as well).

@tacaswell
Copy link
Member

I made an executive decision to install xarray on CI.

We already have all of its dependencies installed and it is a pure-python package.

oscargus and timhoffm reacted with thumbs up emoji



def _unpack_to_numpy(x):
"""Internal helper to extract data from e.g. pandas and xarray objects."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please document what we intend to support, i.e. everything with .to_numpy() or .values, and what types we expect to catch with it, e.g. values-> older pandas dataframes(?).

@jklymakjklymak merged commit709fba8 intomatplotlib:mainApr 21, 2022
@tacaswell
Copy link
Member

We discussed this on the call, I will write up an issues shortly about the documentation of the intention etc.

timhoffm reacted with thumbs up emoji

@QuLogic
Copy link
Member

meeseeksdev backport to v3.5.x

meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull requestApr 21, 2022
QuLogic added a commit that referenced this pull requestApr 29, 2022
…560-on-v3.5.xBackport PR#22560 on branch v3.5.x (Improve pandas/xarray/... conversion)
mocquin added a commit to mocquin/physipy that referenced this pull requestMay 7, 2022
seematplotlib/matplotlib#22973,matplotlib/matplotlib#22879, andmatplotlib/matplotlib#22560It is not clear to me as to which is the standard interface for unit handling (for eg, hist still doesn't handle unit by default)
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@timhoffmtimhoffmtimhoffm left review comments

@greglucasgreglucasgreglucas left review comments

@tacaswelltacaswelltacaswell approved these changes

Assignees
No one assigned
Projects
None yet
Milestone
v3.5.2
Development

Successfully merging this pull request may close these issues.

6 participants
@oscargus@QuLogic@greglucas@tacaswell@timhoffm@jklymak

[8]ページ先頭

©2009-2025 Movatter.jp