Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Box plot and violin plot now ignore masked points#13651

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
dasrirez wants to merge7 commits intomatplotlib:main
base:main
Choose a base branch
Loading
fromdasrirez:filter-masked-points

Conversation

dasrirez
Copy link

PR Summary

PR to address#13533.
Using the masking feature of numpy was previously unsupported for box and violin plots.
New behaviour for these plots ensures that masked elements of a numpy masked array are cleared out before showing the plot.
Tests added, verified failing before fix and passing after.

PR Checklist

  • Has Pytest style unit tests
  • Code isFlake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@dasrirezdasrirez changed the titleBox plot and violin plot ignore masked pointsBox plot and violin plot now ignore masked pointsMar 11, 2019
@jklymakjklymak added this to thev3.1.1 milestoneMar 11, 2019
@dasrirez
Copy link
Author

dasrirez commentedMar 11, 2019
edited
Loading

Hi@phobson, could you please review this PR once you get the chance?
Alternatively if you're busy would@tacaswell be able to review whenever convenient?
Thanks!

(P.S. I forgot to mention that this PR is on behalf of a group of students from University of Toronto Scarborough's CSCD01 course, instructed by@atafliovich)

if isinstance(X, np.ndarray):
X = X.T
elif len(X) and isinstance(next(iter(X), None), np.ma.MaskedArray):
X = np.asarray([delete_masked_points(x).pop() for x in X])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can this not simply beX = x[~x.mask]?

Copy link
Author

@dasrirezdasrirezMar 23, 2019
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've been trying to get that to work, but when I feed the reproduction from#13533 without masking anything like so:

...#a = np.ma.masked_less( a, -9000 )#b = np.ma.masked_less( b, -9000 )#c = np.ma.masked_less( c, -9000 )...

The filtered X causes_reshape_2D to throw an exception, namely this one

raiseValueError("{} must have 2 or fewer dimensions".format(name))

Something aboutX.ndim is not consistent with the change in this particular scenario. Usingdelete_masked_points gave it a value of 2, butX = x[~x.mask] gave it a value of 3.

(Pdb) X[masked_array(data=[  229.5,   374. ,   536.5, -9999. , -9999. , -9999. ,                   -9999. , -9999. ,   182. ,    42.5,    49. ],             mask=False,       fill_value=1e+20), masked_array(data=[  363. ,   118.5,   159. , -9999. ,   311. ,   516. ,                     380. ,   338.5,   223. ,   211.5,   128.5],             mask=False,       fill_value=1e+20), masked_array(data=[205.5, 277.5, 141.5, 278. , 302. , 251. , 299. , 250. ,                   315.5,  92.1, 211.9],             mask=False,       fill_value=1e+20)](Pdb) np.atleast_1d(np.asarray([delete_masked_points(x).pop() for x in X]))array([[  229.5,   374. ,   536.5, -9999. , -9999. , -9999. , -9999. ,        -9999. ,   182. ,    42.5,    49. ],       [  363. ,   118.5,   159. , -9999. ,   311. ,   516. ,   380. ,          338.5,   223. ,   211.5,   128.5],       [  205.5,   277.5,   141.5,   278. ,   302. ,   251. ,   299. ,          250. ,   315.5,    92.1,   211.9]])(Pdb) np.atleast_1d(np.asarray([delete_masked_points(x).pop() for x in X])).ndim2(Pdb) np.atleast_1d([x[~x.mask] for x in X])array([[[  229.5,   374. ,   536.5, -9999. , -9999. , -9999. , -9999. ,         -9999. ,   182. ,    42.5,    49. ]],       [[  363. ,   118.5,   159. , -9999. ,   311. ,   516. ,   380. ,           338.5,   223. ,   211.5,   128.5]],       [[  205.5,   277.5,   141.5,   278. ,   302. ,   251. ,   299. ,           250. ,   315.5,    92.1,   211.9]]])(Pdb) np.atleast_1d([x[~x.mask] for x in X]).ndim3

Couldn't really figure out why this happens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sorry, I mistyped that:X = X[~X.mask]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oh, in that caseX wouldn't have amask attribute, the collections within may have the typenp.ma.MaskedArray but not itself.

FiguredX = np.asarray([delete_masked_points(x).pop() for x in X]) can be swapped withX = [(x[~x.mask] if np.ma.getmask(x).any() else x) for x in X] if preferred, but masked point can't be removed without iterating overX as far as I know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

OK, I guess I don't quite understand what case you are looking at. I guess I don't understand what X is for this case. Can you add a test that is more informative?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@jklymak are 2-D masked arrays supported? The helper functiondelete_masked_points doesn't seem to support them.

    Arguments can be in any of 5 categories:    1) 1-D masked arrays    2) 1-D ndarrays    3) ndarrays with more than one dimension    4) other non-string iterables    5) anything else

When I try and usedelete_masked_points on a 2-D array, it reaches this point and returns aValueError

            if isinstance(x, np.ma.MaskedArray):                if x.ndim > 1:                    raise ValueError("Masked arrays must be 1-D")

This is the test case I was trying it out on

def test_boxplot_masked_array_2d():    fig, ax = plt.subplots()    data = [[-1, -1], [1, 3], [3, 6]]    masked_data = np.ma.masked_less(data, 0)    ax.boxplot(masked_data)    fig.canvas.draw()    assert ax.get_ylim() == (1, 6)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, masked arrays can be of any dimension.

I'm pretty sure boxplot and friends take 2-D numpy arrays, so they should take 2-D masked arrays too.delete_masked_points of course can't delete from more than 1-D because you may need to remove more or fewer values from each row/column and that won't be two-d anymore. Thats why I was suggesting you also think about what happens if a np.array has a NaN, because its pretty easy to change a masked array into an array w/ NaNs at the mask points.

Copy link
Contributor

@nathan78906nathan78906Mar 26, 2019
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok, I'm able to get test cases passing for a 2D array(and all other cases I believe) using this,

    if isinstance(X, np.ma.MaskedArray):        X = X[~X.mask] if np.ma.getmask(X).any() else X    elif isinstance(next(iter(X), None), np.ma.MaskedArray):        X = [(x[~x.mask] if np.ma.getmask(x).any() else x) for x in X]

However, when I try with array that has NaN points, it seems like Y limits get changed.
By using this code I don't see anything when I callplt.show()

import numpy as npimport matplotlib.pyplot as plta = np.ma.array([1, 2, 3, np.NaN, 4, 5])fig = plt.figure( 3 )ax = fig.add_subplot( 111 )bp = ax.boxplot( [a], vert = True)plt.show()

Removing the np.NaN gives me the desired result

(EDIT: alsoa = [1, 2, 3, np.NaN, 4, 5] anda = np.array([1, 2, 3, np.NaN, 4, 5])shows nothing also)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

OK, so I think you are almost there. Looks like you will have to do the "pop" trick.But, before you do, I think you need to doX=X.T if its a 2-D masked array because the semantics of 2-D arrays are different than a list of lists.

I would still like to see the test have a 2-D masked array as the input.... i.e. I think:data = np.array([[1, 2, 3, -1,], [3, 2, -1, 1]])a = np.ma.masked_less(data, 0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Added a test case for the 2-D masked array.

@nathan78906
Copy link
Contributor

@jklymak We've rebased onto master, is there anything else we need to do?

@jklymak
Copy link
Member

I think you accidently didn't do the rebase properly - all those commits shouldnot be there. Hopefully you backed up your branch to try again!

For this PR, I still think you need to test that

data = np.array([[1, 2, 3, ], [3, 2, 1]])

and

data = np.array([[1, 2, 3, -1,], [3, 2, -1, 1]])a = np.ma.masked_less(data, 0)

Give the same thing. I'm not sure that they do in your code because 2-D matrices have the opposite indexing convention than nested lists (don't ask me why, but its too late to change it).

@nathan78906
Copy link
Contributor

@jklymak When I try

    if isinstance(X, np.ma.MaskedArray):        if X.ndim == 2:            X = X.T            X = [(x[~x.mask] if np.ma.getmask(x).any() else x) for x in X]        else:            X = X[~X.mask] if np.ma.getmask(X).any() else X

with this example

data_to_plot = np.array([[1, 2, 3, -1,], [3, 2, -1, 1]])data_to_plot = np.ma.masked_less(data_to_plot, 0)

I get this right afterX=X.T

masked_array(  data=[[1, 3],        [2, 2],        [3, --],        [--, 1]],  mask=[[False, False],        [False, False],        [False,  True],        [ True, False]],  fill_value=999999)

Which results in 4 different columns, whereas without using a masked array the[3, 1] would be joined and apart of the same column (so 3 columns).

If I remove theX=X.T line and use a nested list as below, I think it makes more sense that the following example would return 2 columns, not 3 (or 4). The code that I currently have withoutX=X.T returns 2 columns in this example.

data1 = np.array([1, 2, 3, -1])data2 = np.array([3, 2, -1, 1])data_to_plot = [data1, data2]data_to_plot = np.ma.masked_less(data_to_plot, 0)

Also this example below will produce the same plot as the example right above, which is what my current code produces(withoutX=X.T).

data_to_plot = np.array([[1, 2, 3, -1,], [3, 2, -1, 1]])data_to_plot = np.ma.masked_less(data_to_plot, 0)

Adding theX=X.T causes there to be 4 columns in both examples.

Any thoughts on this? From examples online (http://blog.bharatbhole.com/creating-boxplots-with-matplotlib/) I think that the number of arrays in the nested list equals the number of columns?

Copy link
Member

@efiringefiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Addressing this problem--handling masked arrays--is a good idea, and I see the appeal of doing it in this private utility function, but I think the approach needs to be changed a bit. Modifying thiscbook._reshape_2D function affects other functions likehist; and it makes the function do 2 things (edit and reshape), not just reshape. Is this always wanted? I suspect it would break some hist cases in which weights are used. Maybe the_reshape_2D should be left alone, and followed by a separate function to handle the masked-point deletion.
Is nan-handling also needed? If so, the easy way to combine it with masked-point deletion, once the arrays of any type are 1-D (or even lists), is to invokex = cbook.safe_masked_invalid(x).compressed(). At that point someone may object, "what aboutinf?". Personally, I think that filtering them out as invalid values in this context makes perfect sense, but if necessary we could easily make asafe_masked_nan function that would leaveinf alone.

# Delete masked points for MaskedArray
if isinstance(X, np.ma.MaskedArray):
if X.ndim == 2:
X = [(x[~x.mask] if np.ma.getmask(x).any() else x) for x in X]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These expressions can be greatly simplified by using thecompressed() method of masked arrays. But see my general comment.

@nathan78906
Copy link
Contributor

@efiring Thanks, I tried to use the following

    if isinstance(X, np.ma.MaskedArray):        if X.ndim == 2:            X = [safe_masked_invalid(x).compressed() for x in X]        else:            X = safe_masked_invalid(X).compressed()    elif isinstance(next(iter(X), None), np.ma.MaskedArray):        X = [safe_masked_invalid(x).compressed() for x in X]

but I'm still running into the problem where

data_to_plot = np.array([[1, 2, 3, -1,], [3, 2, -1, 1]])data_to_plot = np.ma.masked_less(data_to_plot, 0)

and

data_to_plot = np.array([[1, 2, 3], [3, 2, 1]])

return different results because of the opposite indexing convention with 2D arrays (transpose basically). If I try addingX=X.T afterif X.ndim == 2 I run into the problem where I get the following

masked_array(  data=[[1, 3],        [2, 2],        [3, --],        [--, 1]],  mask=[[False, False],        [False, False],        [False,  True],        [ True, False]],  fill_value=999999)

which returns 4 columns, not 3 (as in the example without a masked array)

@efiringefiring modified the milestones:v3.1.1,v3.2.0May 25, 2019
@tacaswelltacaswell modified the milestones:v3.2.0,v3.3.0Sep 5, 2019
@QuLogicQuLogic modified the milestones:v3.3.0,v3.4.0May 2, 2020
@QuLogicQuLogic modified the milestones:v3.4.0,v3.5.0Jan 22, 2021
@jklymakjklymak marked this pull request as draftMarch 24, 2021 23:31
@QuLogicQuLogic modified the milestones:v3.5.0,v3.6.0Aug 18, 2021
@timhoffmtimhoffm modified the milestones:v3.6.0,unassignedApr 30, 2022
@story645story645 modified the milestones:unassigned,needs sortingOct 6, 2022
@tacaswelltacaswell mentioned this pull requestJun 2, 2023
@dstansbydstansby mentioned this pull requestJan 6, 2024
5 tasks
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@jklymakjklymakjklymak left review comments

@nathan78906nathan78906nathan78906 left review comments

@efiringefiringefiring requested changes

@phobsonphobsonAwaiting requested review from phobson

Requested changes must be addressed to merge this pull request.

Assignees
No one assigned
Projects
None yet
Milestone
future releases
Development

Successfully merging this pull request may close these issues.

8 participants
@dasrirez@nathan78906@jklymak@efiring@tacaswell@QuLogic@story645@timhoffm

[8]ページ先頭

©2009-2025 Movatter.jp