Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork7.9k
FIX: clean up unit conversion unpacking of data, particularly for dates and pandas series#11664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
jklymak commentedJul 15, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
|
12b3b7f
to445047c
Comparelib/matplotlib/axes/_axes.py Outdated
@@ -3644,6 +3644,8 @@ def dopatch(xs, ys, **kwargs): | |||
datashape_message = ("List of boxplot statistics and `{0}` " | |||
"values must have same the length") | |||
# check position | |||
if hasattr(positions, 'values'): | |||
positions = positions.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Are there any better ways to check if something is a pandasSeries
?
In this case the following (admittedly very constructed) example would fail:
import numpy as npimport matplotlib.pyplot as pltclass Years(list): values = Truedata = np.random.rand(5,2)years = Years([1999,2003])plt.figure()plt.boxplot(data, positions=years)plt.show()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Well, its a damned if you do, damned if you don't situation.
Other data types use thevalues
convention as well (xarray for instance). The goal here was to unpack the pandas series w/o having to know what a pandas series is (there is no pandas import in the main library).
If someone tries to pass a class that hasvalues
meaning something else, then I guess they will have to unpack before they plot.
FWIW, my preference would be that we only support numpy arrays and basic python lists, and let users and/or downstream libraries do their own unpacking.
ImportanceOfBeingErnestJul 15, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't like the idea at all that only because some object has avalues
attribute, matplotlib now thinks that it should use it (and not the actual data in the object, e.g. provided by some iterator).
Or, if you think this is what should always happen, it should actually always happen. As it stands, matplotlib would use the iterator for usual numbers, but thevalues
attribute for dates (#10638) or in boxplots (this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, agreed. Stand by for a more holistic solution - though potentially more controversial...
445047c
tofb7eb0b
Compareb84c4b5
to95e4e52
Compared87ac91
todb394cb
Compareping@jorisvandenbossche as this is a redo of a PR you helped with previously... |
Maybe closes#5896 |
db394cb
tob16c311
CompareRebased, this is still ready to go and waiting for review. |
Looks okish, but I'm not deep enough into the numpy/pandas data formats to judge the implications. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe just a very minor change and some explanation or additional comments.
@@ -6048,7 +6048,7 @@ def pcolormesh(self, *args, alpha=None, norm=None, cmap=None, vmin=None, | |||
# convert to one dimensional arrays | |||
C = C.ravel() | |||
coords = np.column_stack((X, Y)).astype(float, copy=False) | |||
coords = np.column_stack((X, Y)).astype(float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't see any point in removing the "copy=False" kwarg. It saves time, and it is logically what is wanted here. A copy was already made via thecolumn_stack
call, so nothing done to coords will affect X or Y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Not sure. Changed back
if munits.ConversionInterface.is_numlike(x): | ||
return x | ||
if isinstance(x, list) or isinstance(x, numbers.Number): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What is the rationale for returning a list as-is instead of running it throughnp.asarray
? Just that the latter is unnecessary? Or are there cases where the conversion would be an error?
return x | ||
if isinstance(x, list) or isinstance(x, numbers.Number): | ||
return x | ||
if issubclass(type(x), np.ma.MaskedArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why is this usingissubclass(type(x)...
instead ofisinstance
?
return x | ||
if issubclass(type(x), np.ma.MaskedArray): | ||
return np.ma.asarray(x) | ||
return np.asarray(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Is the point of this whole addition above to short-circuit some of what would otherwise happen inget_converter
? I'm not opposed, just confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
it seems worse, because callingnp.asarray
can drop information in some cases, making it unavailable in the call toconvert
below. Unless I'm missing something...
if ((isinstance(d, np.ndarray) and | ||
np.issubdtype(d.dtype, np.datetime64)) or | ||
isinstance(d, np.datetime64)): | ||
return _dt64_to_ordinalf(d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What case is this whole block covering? Again, I'm confused.
data = np.random.rand(5, 2) | ||
years = pd.date_range('1/1/2000', | ||
periods=2, freq=pd.DateOffset(years=1)).year | ||
# Does not work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Another place I am confused. Is the comment obsolete?
@@ -1477,9 +1479,14 @@ def have_units(self): | |||
return self.converter is not None or self.units is not None | |||
def convert_units(self, x): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This seems to be converting anything numlike, which includes things likepint.Quantity
, into an array/masked array before callingconvert
. That seems like it would breakpint
, though I will note that our unit tests seem to be passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Great, I’ll check vs pint when I look at the other comments above. Thanks for the reminder...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ok, I guess I still don't understand how all of this works. 😉 Sois_numlike
only returnstrue
for things that are basic numbers (or contain them) and don't need converting. I think I'm ok with this code then. Tests passing is always a good thing too. 😄
OK, somewhat confused about everything I was thinking here, and some of the type checking has moved on since this was written. So I'm going to close. Note#13304 deregisters the pandas converters which I think was an important contribution here. Sorry for the reviewer effort on this... |
Uh oh!
There was an error while loading.Please reload this page.
PR Summary
Closes#10022 and#11391
This does one small thing (run units conversion on the position argument of
bxp/boxplot
) and two big things:np.asarray
on incoming objects to unpack numpy arrays from them.Previously in order to get native pandas support we were unpacking thevalues
field, but runningnp.asarray
does the same thing and allows the packed values to not necessarily be stored invalues
. i.e.returns:
This re-does#10638, which unpacked pandas py checking for and using
values
.date
handling should be able to strip pandas series of their date information and convert. This passes all the pandas tests, but I admit this may be a controversial step.Immediate fix code:
failed on master. This fixes...
PR Checklist