Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork7.9k
Box plot and violin plot now ignore masked points#13651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
febf9bc
to4c48547
Comparedasrirez commentedMar 11, 2019 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Hi@phobson, could you please review this PR once you get the chance? (P.S. I forgot to mention that this PR is on behalf of a group of students from University of Toronto Scarborough's CSCD01 course, instructed by@atafliovich) |
lib/matplotlib/cbook/__init__.py Outdated
if isinstance(X, np.ndarray): | ||
X = X.T | ||
elif len(X) and isinstance(next(iter(X), None), np.ma.MaskedArray): | ||
X = np.asarray([delete_masked_points(x).pop() for x in X]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can this not simply beX = x[~x.mask]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I've been trying to get that to work, but when I feed the reproduction from#13533 without masking anything like so:
...#a = np.ma.masked_less( a, -9000 )#b = np.ma.masked_less( b, -9000 )#c = np.ma.masked_less( c, -9000 )...
The filtered X causes_reshape_2D
to throw an exception, namely this one
matplotlib/lib/matplotlib/cbook/__init__.py
Line 1436 in4c48547
raiseValueError("{} must have 2 or fewer dimensions".format(name)) |
Something aboutX.ndim
is not consistent with the change in this particular scenario. Usingdelete_masked_points
gave it a value of 2, butX = x[~x.mask]
gave it a value of 3.
(Pdb) X[masked_array(data=[ 229.5, 374. , 536.5, -9999. , -9999. , -9999. , -9999. , -9999. , 182. , 42.5, 49. ], mask=False, fill_value=1e+20), masked_array(data=[ 363. , 118.5, 159. , -9999. , 311. , 516. , 380. , 338.5, 223. , 211.5, 128.5], mask=False, fill_value=1e+20), masked_array(data=[205.5, 277.5, 141.5, 278. , 302. , 251. , 299. , 250. , 315.5, 92.1, 211.9], mask=False, fill_value=1e+20)](Pdb) np.atleast_1d(np.asarray([delete_masked_points(x).pop() for x in X]))array([[ 229.5, 374. , 536.5, -9999. , -9999. , -9999. , -9999. , -9999. , 182. , 42.5, 49. ], [ 363. , 118.5, 159. , -9999. , 311. , 516. , 380. , 338.5, 223. , 211.5, 128.5], [ 205.5, 277.5, 141.5, 278. , 302. , 251. , 299. , 250. , 315.5, 92.1, 211.9]])(Pdb) np.atleast_1d(np.asarray([delete_masked_points(x).pop() for x in X])).ndim2(Pdb) np.atleast_1d([x[~x.mask] for x in X])array([[[ 229.5, 374. , 536.5, -9999. , -9999. , -9999. , -9999. , -9999. , 182. , 42.5, 49. ]], [[ 363. , 118.5, 159. , -9999. , 311. , 516. , 380. , 338.5, 223. , 211.5, 128.5]], [[ 205.5, 277.5, 141.5, 278. , 302. , 251. , 299. , 250. , 315.5, 92.1, 211.9]]])(Pdb) np.atleast_1d([x[~x.mask] for x in X]).ndim3
Couldn't really figure out why this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sorry, I mistyped that:X = X[~X.mask]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Oh, in that caseX
wouldn't have amask
attribute, the collections within may have the typenp.ma.MaskedArray
but not itself.
FiguredX = np.asarray([delete_masked_points(x).pop() for x in X])
can be swapped withX = [(x[~x.mask] if np.ma.getmask(x).any() else x) for x in X]
if preferred, but masked point can't be removed without iterating overX
as far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, I guess I don't quite understand what case you are looking at. I guess I don't understand what X is for this case. Can you add a test that is more informative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@jklymak are 2-D masked arrays supported? The helper functiondelete_masked_points
doesn't seem to support them.
Arguments can be in any of 5 categories: 1) 1-D masked arrays 2) 1-D ndarrays 3) ndarrays with more than one dimension 4) other non-string iterables 5) anything else
When I try and usedelete_masked_points
on a 2-D array, it reaches this point and returns aValueError
if isinstance(x, np.ma.MaskedArray): if x.ndim > 1: raise ValueError("Masked arrays must be 1-D")
This is the test case I was trying it out on
def test_boxplot_masked_array_2d(): fig, ax = plt.subplots() data = [[-1, -1], [1, 3], [3, 6]] masked_data = np.ma.masked_less(data, 0) ax.boxplot(masked_data) fig.canvas.draw() assert ax.get_ylim() == (1, 6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, masked arrays can be of any dimension.
I'm pretty sure boxplot and friends take 2-D numpy arrays, so they should take 2-D masked arrays too.delete_masked_points
of course can't delete from more than 1-D because you may need to remove more or fewer values from each row/column and that won't be two-d anymore. Thats why I was suggesting you also think about what happens if a np.array has a NaN, because its pretty easy to change a masked array into an array w/ NaNs at the mask points.
nathan78906Mar 26, 2019 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ok, I'm able to get test cases passing for a 2D array(and all other cases I believe) using this,
if isinstance(X, np.ma.MaskedArray): X = X[~X.mask] if np.ma.getmask(X).any() else X elif isinstance(next(iter(X), None), np.ma.MaskedArray): X = [(x[~x.mask] if np.ma.getmask(x).any() else x) for x in X]
However, when I try with array that has NaN points, it seems like Y limits get changed.
By using this code I don't see anything when I callplt.show()
import numpy as npimport matplotlib.pyplot as plta = np.ma.array([1, 2, 3, np.NaN, 4, 5])fig = plt.figure( 3 )ax = fig.add_subplot( 111 )bp = ax.boxplot( [a], vert = True)plt.show()
Removing the np.NaN gives me the desired result
(EDIT: alsoa = [1, 2, 3, np.NaN, 4, 5]
anda = np.array([1, 2, 3, np.NaN, 4, 5])
shows nothing also)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, so I think you are almost there. Looks like you will have to do the "pop" trick.But, before you do, I think you need to doX=X.T
if its a 2-D masked array because the semantics of 2-D arrays are different than a list of lists.
I would still like to see the test have a 2-D masked array as the input.... i.e. I think:data = np.array([[1, 2, 3, -1,], [3, 2, -1, 1]])
a = np.ma.masked_less(data, 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Added a test case for the 2-D masked array.
@jklymak We've rebased onto master, is there anything else we need to do? |
I think you accidently didn't do the rebase properly - all those commits shouldnot be there. Hopefully you backed up your branch to try again! For this PR, I still think you need to test that
and
Give the same thing. I'm not sure that they do in your code because 2-D matrices have the opposite indexing convention than nested lists (don't ask me why, but its too late to change it). |
5b24dc8
toc50578d
CompareModify _reshape_2D to check for single MaskedArray object or multiple MaskedArray objects in an array
c50578d
to08853d7
Compare@jklymak When I try
with this example
I get this right after
Which results in 4 different columns, whereas without using a masked array the If I remove the
Also this example below will produce the same plot as the example right above, which is what my current code produces(without
Adding the Any thoughts on this? From examples online (http://blog.bharatbhole.com/creating-boxplots-with-matplotlib/) I think that the number of arrays in the nested list equals the number of columns? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Addressing this problem--handling masked arrays--is a good idea, and I see the appeal of doing it in this private utility function, but I think the approach needs to be changed a bit. Modifying thiscbook._reshape_2D
function affects other functions likehist
; and it makes the function do 2 things (edit and reshape), not just reshape. Is this always wanted? I suspect it would break some hist cases in which weights are used. Maybe the_reshape_2D
should be left alone, and followed by a separate function to handle the masked-point deletion.
Is nan-handling also needed? If so, the easy way to combine it with masked-point deletion, once the arrays of any type are 1-D (or even lists), is to invokex = cbook.safe_masked_invalid(x).compressed()
. At that point someone may object, "what aboutinf
?". Personally, I think that filtering them out as invalid values in this context makes perfect sense, but if necessary we could easily make asafe_masked_nan
function that would leaveinf
alone.
# Delete masked points for MaskedArray | ||
if isinstance(X, np.ma.MaskedArray): | ||
if X.ndim == 2: | ||
X = [(x[~x.mask] if np.ma.getmask(x).any() else x) for x in X] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
These expressions can be greatly simplified by using thecompressed()
method of masked arrays. But see my general comment.
@efiring Thanks, I tried to use the following
but I'm still running into the problem where
and
return different results because of the opposite indexing convention with 2D arrays (transpose basically). If I try adding
which returns 4 columns, not 3 (as in the example without a masked array) |
PR Summary
PR to address#13533.
Using the masking feature of numpy was previously unsupported for box and violin plots.
New behaviour for these plots ensures that masked elements of a numpy masked array are cleared out before showing the plot.
Tests added, verified failing before fix and passing after.
PR Checklist