Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

FIX: clean up unit conversion unpacking of data, particularly for dates and pandas series#11664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

Conversation

jklymak
Copy link
Member

@jklymakjklymak commentedJul 15, 2018
edited
Loading

PR Summary

Closes#10022 and#11391

This does one small thing (run units conversion on the position argument ofbxp/boxplot) and two big things:

  1. At unit conversion, runnp.asarray on incoming objects to unpack numpy arrays from them.Previously in order to get native pandas support we were unpacking thevalues field, but running
    np.asarray does the same thing and allows the packed values to not necessarily be stored invalues. i.e.
dates=np.arange('2005-02','2005-05',dtype='datetime64[M]')values=np.sin(np.array(range(len(dates))))df=pd.DataFrame({'dates':dates,'z':values})print(df['dates'])print(np.asarray(df['dates']))print(df['dates'].values)

returns:

0   2005-02-011   2005-03-012   2005-04-01Name: dates, dtype: datetime64[ns]['2005-02-01T00:00:00.000000000' '2005-03-01T00:00:00.000000000' '2005-04-01T00:00:00.000000000']['2005-02-01T00:00:00.000000000' '2005-03-01T00:00:00.000000000' '2005-04-01T00:00:00.000000000']

This re-does#10638, which unpacked pandas py checking for and usingvalues.

  1. For the pandas tests, this deregisters the pandas converters, because its not clear we should be testing those. With the above, our nativedate handling should be able to strip pandas series of their date information and convert. This passes all the pandas tests, but I admit this may be a controversial step.

Immediate fix code:

importnumpyasnpimportpandasaspdimportmatplotlib.pyplotaspltpd.plotting.deregister_matplotlib_converters()data=np.random.rand(5,2)years=pd.date_range('1/1/2000',periods=2,freq=pd.DateOffset(years=1)).year# Does not workplt.figure()plt.boxplot(data,positions=years)plt.show()

failed on master. This fixes...

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak
Copy link
MemberAuthor

jklymak commentedJul 15, 2018
edited
Loading

BTW I've used this fix in#10638. Not sure this normalization shouldn't be incbook somewhere:cbook._check_1d? EDIT: obsolete comment

@jklymakjklymak changed the titleFIX: let boxplot take pandas positionFIX: let boxplot take pandas date as positionJul 15, 2018
@jklymakjklymakforce-pushed thefix-boxplot-pandas-position branch from12b3b7f to445047cCompareJuly 15, 2018 18:21
@@ -3644,6 +3644,8 @@ def dopatch(xs, ys, **kwargs):
datashape_message = ("List of boxplot statistics and `{0}` "
"values must have same the length")
# check position
if hasattr(positions, 'values'):
positions = positions.values

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Are there any better ways to check if something is a pandasSeries?

In this case the following (admittedly very constructed) example would fail:

import numpy as npimport matplotlib.pyplot as pltclass Years(list):    values = Truedata = np.random.rand(5,2)years = Years([1999,2003])plt.figure()plt.boxplot(data, positions=years)plt.show()

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Well, its a damned if you do, damned if you don't situation.

Other data types use thevalues convention as well (xarray for instance). The goal here was to unpack the pandas series w/o having to know what a pandas series is (there is no pandas import in the main library).

If someone tries to pass a class that hasvalues meaning something else, then I guess they will have to unpack before they plot.

FWIW, my preference would be that we only support numpy arrays and basic python lists, and let users and/or downstream libraries do their own unpacking.

NelleV reacted with thumbs up emoji
Copy link
Member

@ImportanceOfBeingErnestImportanceOfBeingErnestJul 15, 2018
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't like the idea at all that only because some object has avalues attribute, matplotlib now thinks that it should use it (and not the actual data in the object, e.g. provided by some iterator).

Or, if you think this is what should always happen, it should actually always happen. As it stands, matplotlib would use the iterator for usual numbers, but thevalues attribute for dates (#10638) or in boxplots (this PR).

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

OK, agreed. Stand by for a more holistic solution - though potentially more controversial...

@jklymakjklymakforce-pushed thefix-boxplot-pandas-position branch from445047c tofb7eb0bCompareJuly 15, 2018 20:55
@jklymakjklymakforce-pushed thefix-boxplot-pandas-position branch 6 times, most recently fromb84c4b5 to95e4e52CompareJuly 16, 2018 03:26
@jklymakjklymak changed the titleFIX: let boxplot take pandas date as positionFIX: clean up unit conversion unpacking of data, particularly for dates and pandas seriesJul 16, 2018
@jklymakjklymakforce-pushed thefix-boxplot-pandas-position branch 4 times, most recently fromd87ac91 todb394cbCompareJuly 16, 2018 12:46
@jklymak
Copy link
MemberAuthor

ping@jorisvandenbossche as this is a redo of a PR you helped with previously...

@tacaswelltacaswell added this to thev3.1 milestoneJul 17, 2018
@jklymak
Copy link
MemberAuthor

Maybe closes#5896

@jklymakjklymakforce-pushed thefix-boxplot-pandas-position branch fromdb394cb tob16c311CompareNovember 30, 2018 03:37
@jklymak
Copy link
MemberAuthor

Rebased, this is still ready to go and waiting for review.

@timhoffm
Copy link
Member

Looks okish, but I'm not deep enough into the numpy/pandas data formats to judge the implications.

Copy link
Member

@efiringefiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe just a very minor change and some explanation or additional comments.

@@ -6048,7 +6048,7 @@ def pcolormesh(self, *args, alpha=None, norm=None, cmap=None, vmin=None,

# convert to one dimensional arrays
C = C.ravel()
coords = np.column_stack((X, Y)).astype(float, copy=False)
coords = np.column_stack((X, Y)).astype(float)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't see any point in removing the "copy=False" kwarg. It saves time, and it is logically what is wanted here. A copy was already made via thecolumn_stack call, so nothing done to coords will affect X or Y.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not sure. Changed back

if munits.ConversionInterface.is_numlike(x):
return x
if isinstance(x, list) or isinstance(x, numbers.Number):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What is the rationale for returning a list as-is instead of running it throughnp.asarray? Just that the latter is unnecessary? Or are there cases where the conversion would be an error?

return x
if isinstance(x, list) or isinstance(x, numbers.Number):
return x
if issubclass(type(x), np.ma.MaskedArray):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why is this usingissubclass(type(x)... instead ofisinstance?

return x
if issubclass(type(x), np.ma.MaskedArray):
return np.ma.asarray(x)
return np.asarray(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is the point of this whole addition above to short-circuit some of what would otherwise happen inget_converter? I'm not opposed, just confused.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

it seems worse, because callingnp.asarray can drop information in some cases, making it unavailable in the call toconvert below. Unless I'm missing something...

if ((isinstance(d, np.ndarray) and
np.issubdtype(d.dtype, np.datetime64)) or
isinstance(d, np.datetime64)):
return _dt64_to_ordinalf(d)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What case is this whole block covering? Again, I'm confused.

data = np.random.rand(5, 2)
years = pd.date_range('1/1/2000',
periods=2, freq=pd.DateOffset(years=1)).year
# Does not work
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Another place I am confused. Is the comment obsolete?

@@ -1477,9 +1479,14 @@ def have_units(self):
return self.converter is not None or self.units is not None

def convert_units(self, x):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This seems to be converting anything numlike, which includes things likepint.Quantity, into an array/masked array before callingconvert. That seems like it would breakpint, though I will note that our unit tests seem to be passing.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Great, I’ll check vs pint when I look at the other comments above. Thanks for the reminder...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, I guess I still don't understand how all of this works. 😉 Sois_numlike only returnstrue for things that are basic numbers (or contain them) and don't need converting. I think I'm ok with this code then. Tests passing is always a good thing too. 😄

@jklymak
Copy link
MemberAuthor

OK, somewhat confused about everything I was thinking here, and some of the type checking has moved on since this was written. So I'm going to close. Note#13304 deregisters the pandas converters which I think was an important contribution here. Sorry for the reviewer effort on this...

@jklymakjklymak deleted the fix-boxplot-pandas-position branchJanuary 28, 2019 00:57
@jklymakjklymak mentioned this pull requestFeb 27, 2019
6 tasks
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@dopplershiftdopplershiftdopplershift left review comments

@ImportanceOfBeingErnestImportanceOfBeingErnestImportanceOfBeingErnest left review comments

@efiringefiringefiring requested changes

Assignees
No one assigned
Projects
None yet
Milestone
v3.1.0
Development

Successfully merging this pull request may close these issues.

boxplot: positions used to take Int64Index
6 participants
@jklymak@timhoffm@efiring@dopplershift@ImportanceOfBeingErnest@tacaswell

[8]ページ先頭

©2009-2025 Movatter.jp