Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork8.1k
FIX: clean up unit conversion unpacking of data, particularly for dates and pandas series#11664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
jklymak commentedJul 15, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
|
12b3b7f to445047cComparelib/matplotlib/axes/_axes.py Outdated
| "values must have same the length") | ||
| # check position | ||
| ifhasattr(positions,'values'): | ||
| positions=positions.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Are there any better ways to check if something is a pandasSeries?
In this case the following (admittedly very constructed) example would fail:
import numpy as npimport matplotlib.pyplot as pltclass Years(list): values = Truedata = np.random.rand(5,2)years = Years([1999,2003])plt.figure()plt.boxplot(data, positions=years)plt.show()There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Well, its a damned if you do, damned if you don't situation.
Other data types use thevalues convention as well (xarray for instance). The goal here was to unpack the pandas series w/o having to know what a pandas series is (there is no pandas import in the main library).
If someone tries to pass a class that hasvalues meaning something else, then I guess they will have to unpack before they plot.
FWIW, my preference would be that we only support numpy arrays and basic python lists, and let users and/or downstream libraries do their own unpacking.
ImportanceOfBeingErnestJul 15, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't like the idea at all that only because some object has avalues attribute, matplotlib now thinks that it should use it (and not the actual data in the object, e.g. provided by some iterator).
Or, if you think this is what should always happen, it should actually always happen. As it stands, matplotlib would use the iterator for usual numbers, but thevalues attribute for dates (#10638) or in boxplots (this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, agreed. Stand by for a more holistic solution - though potentially more controversial...
445047c tofb7eb0bCompareb84c4b5 to95e4e52Compared87ac91 todb394cbComparejklymak commentedJul 16, 2018
ping@jorisvandenbossche as this is a redo of a PR you helped with previously... |
jklymak commentedJul 24, 2018
Maybe closes#5896 |
db394cb tob16c311Comparejklymak commentedNov 30, 2018
Rebased, this is still ready to go and waiting for review. |
timhoffm commentedDec 4, 2018
Looks okish, but I'm not deep enough into the numpy/pandas data formats to judge the implications. |
efiring left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe just a very minor change and some explanation or additional comments.
| # convert to one dimensional arrays | ||
| C=C.ravel() | ||
| coords=np.column_stack((X,Y)).astype(float,copy=False) | ||
| coords=np.column_stack((X,Y)).astype(float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't see any point in removing the "copy=False" kwarg. It saves time, and it is logically what is wanted here. A copy was already made via thecolumn_stack call, so nothing done to coords will affect X or Y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Not sure. Changed back
| # some iterables need to be massaged a bit... | ||
| ifmunits.ConversionInterface.is_numlike(x): | ||
| returnx | ||
| ifisinstance(x,list)orisinstance(x,numbers.Number): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What is the rationale for returning a list as-is instead of running it throughnp.asarray? Just that the latter is unnecessary? Or are there cases where the conversion would be an error?
| returnx | ||
| ifisinstance(x,list)orisinstance(x,numbers.Number): | ||
| returnx | ||
| ifissubclass(type(x),np.ma.MaskedArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why is this usingissubclass(type(x)... instead ofisinstance?
| returnx | ||
| ifissubclass(type(x),np.ma.MaskedArray): | ||
| returnnp.ma.asarray(x) | ||
| returnnp.asarray(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Is the point of this whole addition above to short-circuit some of what would otherwise happen inget_converter? I'm not opposed, just confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
it seems worse, because callingnp.asarray can drop information in some cases, making it unavailable in the call toconvert below. Unless I'm missing something...
| if ((isinstance(d,np.ndarray)and | ||
| np.issubdtype(d.dtype,np.datetime64))or | ||
| isinstance(d,np.datetime64)): | ||
| return_dt64_to_ordinalf(d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What case is this whole block covering? Again, I'm confused.
| data=np.random.rand(5,2) | ||
| years=pd.date_range('1/1/2000', | ||
| periods=2,freq=pd.DateOffset(years=1)).year | ||
| # Does not work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Another place I am confused. Is the comment obsolete?
| defhave_units(self): | ||
| returnself.converterisnotNoneorself.unitsisnotNone | ||
| defconvert_units(self,x): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This seems to be converting anything numlike, which includes things likepint.Quantity, into an array/masked array before callingconvert. That seems like it would breakpint, though I will note that our unit tests seem to be passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Great, I’ll check vs pint when I look at the other comments above. Thanks for the reminder...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ok, I guess I still don't understand how all of this works. 😉 Sois_numlike only returnstrue for things that are basic numbers (or contain them) and don't need converting. I think I'm ok with this code then. Tests passing is always a good thing too. 😄
jklymak commentedJan 28, 2019
OK, somewhat confused about everything I was thinking here, and some of the type checking has moved on since this was written. So I'm going to close. Note#13304 deregisters the pandas converters which I think was an important contribution here. Sorry for the reviewer effort on this... |
Uh oh!
There was an error while loading.Please reload this page.
PR Summary
Closes#10022 and#11391
This does one small thing (run units conversion on the position argument of
bxp/boxplot) and two big things:np.asarrayon incoming objects to unpack numpy arrays from them.Previously in order to get native pandas support we were unpacking thevaluesfield, but runningnp.asarraydoes the same thing and allows the packed values to not necessarily be stored invalues. i.e.returns:
This re-does#10638, which unpacked pandas py checking for and using
values.datehandling should be able to strip pandas series of their date information and convert. This passes all the pandas tests, but I admit this may be a controversial step.Immediate fix code:
failed on master. This fixes...
PR Checklist