Preparations for multivariate plotting #29877


Open
trygvrad wants to merge 8 commits into matplotlib:main from trygvrad:multivariate-plot-prapare-2

trygvrad
Contributor

PR summary

This PR continues the work of #28658, #28454, and #29876, aiming to close #14168 (Feature request: Bivariate colormapping).

This is part two of the former PR, #29221, and builds upon #29876. Please see #29221 for the previous discussion.

#29876 includes:

  • A MultiNorm class. This is a subclass of colors.Normalize and holds n_variate norms.
  • Testing of the MultiNorm class
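To make the idea of "one norm per variate" concrete, here is a minimal sketch. It does not use the MultiNorm class from #29876: normalize_variates is a hypothetical helper, and plain linear scaling is assumed as a stand-in for whatever norm each variate actually uses.

```python
import numpy as np


def normalize_variates(variates, limits):
    """Hypothetical helper: scale each variate with its own (vmin, vmax).

    Plain linear scaling stands in for an arbitrary per-variate norm.
    """
    if len(variates) != len(limits):
        raise ValueError("Need exactly one (vmin, vmax) pair per variate")
    return [(np.asarray(v, dtype=float) - vmin) / (vmax - vmin)
            for v, (vmin, vmax) in zip(variates, limits)]


temperature = np.array([0.0, 5.0, 10.0])
pressure = np.array([100.0, 150.0, 200.0])

# Each variate is normalized independently of the other.
scaled = normalize_variates([temperature, pressure],
                            [(0.0, 10.0), (100.0, 200.0)])
```

Each entry of scaled lies in the 0 to 1 range that a colormap lookup would expect for that variate.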

Features included in this PR:

  • changes to colorizer.py needed to expose the MultiNorm class

Features not included in this PR:

  • Exposing the functionality provided by MultiNorm together with BivarColormap and MultivarColormap to the plotting functions axes.imshow(...), axes.pcolor(...), and axes.pcolormesh(...)
  • Testing of the new plotting methods
  • Examples in the docs

This commit introduces the MultiNorm class to prepare for the introduction of multivariate plotting methods
@trygvrad changed the title from "Multivariate plot prapare 2" to "Preparations for multivariate plotting" on Apr 6, 2025
    return x
else:
    # in case of a dtype with multiple fields:
    try:
Member

Would be good to get at least partial coverage for this branch.

Member

I haven't really been involved in this work, nor do I understand how it works, but there seems to be quite a bit of newly introduced code to deal with multiple data types. If this will be covered by tests or functionality in later PRs, that is fine; if not, please add tests for (most of) it.

Contributor Author

I was asked to split #29221 into multiple PRs, and this PR is one of them.
There are tests for this functionality in #29221 using the top-level plotting functions (axes.imshow() etc.).
In my mind it is better to test using the top-level API, but if you wish I could add dedicated testing to this PR.

if self.norm.n_output != cmap_obj.n_variates:
    raise ValueError(f"The colormap {cmap} does not support "
                     f"{self.norm.n_output} variates as required by "
                     f"the {type(self.norm)} on this Colorizer.")
Contributor

Error messages typically have no end dot (same comment applies throughout).

trygvrad reacted with thumbs up emoji
Contributor Author

Thanks, I'll need to change this in the other PR as well.
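As an illustration of the convention raised in this thread, the check could be written without the trailing period. This is only a sketch: check_n_variates and its parameters are hypothetical names, not the code in the PR, and the message wording is shortened.

```python
def check_n_variates(n_output, n_variates, cmap_name):
    """Hypothetical stand-in for the variate-count check under review."""
    if n_output != n_variates:
        # The message deliberately ends without a period.
        raise ValueError(
            f"The colormap {cmap_name} does not support "
            f"{n_output} variates as required by the norm on this Colorizer")


try:
    check_n_variates(2, 1, "viridis")
except ValueError as err:
    message = str(err)
```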

mask = np.empty(x.shape, dtype=np.dtype('bool, '*len(x.dtype.descr)))
for dd, dm in zip(x.dtype.descr, mask.dtype.descr):
    mask[dm[0]] = ~(np.isfinite(x[dd[0]]))
xm = np.ma.array(x, mask=mask, copy=False)
Contributor

Do numpy masked arrays actually support struct arrays as mask, with possibly different masking of the fields?

Contributor Author

I have found that this is the only way numpy supports masking dtypes with multiple fields, but I will see whether [("mask", bool, len(x.dtype.descr))], as you suggest below, is a reasonable approach to using a single mask.
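A small standalone check of the behaviour discussed in this thread, assuming a recent NumPy: a masked array over a structured dtype carries a structured mask, so each field can be masked independently. The field names v0 and v1 are illustrative, and np.ma.make_mask_descr is used here to build the mask dtype rather than the comma-string from the diff.

```python
import numpy as np

# Two variates stored as fields of one structured array.
x = np.zeros(3, dtype=[("v0", float), ("v1", float)])
x["v0"] = [1.0, np.nan, 3.0]
x["v1"] = [np.inf, 5.0, 6.0]

# The mask dtype mirrors the data dtype, with one bool per field.
mask = np.zeros(x.shape, dtype=np.ma.make_mask_descr(x.dtype))
for name in x.dtype.names:
    mask[name] = ~np.isfinite(x[name])

xm = np.ma.array(x, mask=mask)

# Selecting a field gives a masked array with that field's own mask.
v0_mask = xm["v0"].mask.tolist()
v1_mask = xm["v1"].mask.tolist()
```

Here the non-finite entries sit at different positions in the two fields, and the two masks differ accordingly.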

else:
    # in case of a dtype with multiple fields:
    try:
        mask = np.empty(x.shape, dtype=np.dtype('bool, '*len(x.dtype.descr)))
Contributor

Could the dtype be e.g. [("mask", bool, len(x.dtype.descr))] (with a slightly different API)?

Contributor Author

This is an interesting idea. I'll make a prototype and see if this would add unnecessary complexity somewhere else.
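For reference, the two mask dtypes under discussion can be compared directly in NumPy. The sketch below only shows what each dtype looks like; it does not show how either would be wired into matplotlib. The field names v0 and v1 are illustrative, and np.ma.make_mask_descr stands in for the comma-string construction in the diff.

```python
import numpy as np

x = np.zeros((4, 5), dtype=[("v0", float), ("v1", float)])
n_fields = len(x.dtype.descr)

# One boolean field per data field, as in the mask built in the diff.
per_field = np.ma.make_mask_descr(x.dtype)

# The suggested dtype: a single "mask" field holding n_fields booleans.
single_field = np.dtype([("mask", bool, n_fields)])

mask_a = np.zeros(x.shape, dtype=per_field)
mask_b = np.zeros(x.shape, dtype=single_field)
```

Both masks keep the shape of x; with the single-field dtype, mask_b["mask"] gains a trailing axis of length n_fields.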

@trygvrad
Contributor Author
@anntzer I think this is important, so I wanted to reply to this in the main thread.

Could the dtype be e.g. [("mask", bool, len(x.dtype.descr))] (with a slightly different API)?

The context here is that multivariate data is stored internally as an array with a data type with multiple fields.
This was chosen because it ensures that data.shape returns the same shape for both scalar and multivariate data.
If a numpy array with multiple fields is masked, it must have a separate mask for each channel. I read @anntzer's suggestion as letting the mask be another field, i.e. ['bool', 'float64', 'float64'] interpreted as [mask, variate0, variate1] when a dataset with two variates is masked.
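The shape argument can be seen in a few lines of NumPy. This is a sketch with illustrative field names (v0, v1), not the internal representation code itself.

```python
import numpy as np

scalar = np.arange(6, dtype=float).reshape(2, 3)

# Two variates on the same grid, packed as fields of one structured array.
multivariate = np.empty((2, 3), dtype=[("v0", float), ("v1", float)])
multivariate["v0"] = scalar
multivariate["v1"] = scalar * 10

# Stacking along a new axis would change the shape instead.
stacked = np.stack([scalar, scalar * 10])
```

The structured array reports the grid shape (2, 3), the same as the scalar data, whereas the stacked array reports (2, 2, 3).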

It should be noted that when a regular np.array is masked and the mask is False for all values, only a single False is stored (instead of a full array of bools). This is not the case for structured arrays: a full mask, with a separate bool for each field, is stored in all cases.
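This difference can be checked directly. The sketch below reflects behaviour observed in recent NumPy versions and uses illustrative field names.

```python
import numpy as np

regular = np.ma.array(np.array([1.0, 2.0, 3.0]))
structured = np.ma.array(np.zeros(3, dtype=[("v0", float), ("v1", float)]))

# With nothing masked, a regular masked array stores only the shared
# np.ma.nomask sentinel rather than a full boolean array.
regular_mask = np.ma.getmask(regular)

# A masked array over a structured dtype always holds a full mask,
# with one bool per field and per element.
structured_mask = np.ma.getmask(structured)
```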

I didn't actually get as far as to prototype this, but I did have a look around.

I have found that it will largely involve changes to colors.multi_norm._iterable_variates_in_data() and cbook.safe_masked_invalid().

I have tried to list the advantages/disadvantages of the two approaches below:

A: Use a masked array with a struct array.

  • Implication: each variate has a separate mask

Advantages:

  1. It is easy to iterate over the channels (this is in any case handled by colors.multi_norm._iterable_variates_in_data())
  2. Easy to parse masked input
  3. np.ma.is_masked() will work for both multivariate and scalar data
    3.1 I don't think this is actually used internally in the context of the data for a relevant plotting method, so this appears to be a minor issue.
  4. Each variate may have a different mask, and we may implement a different blending mode in color for each.
    4.1 i.e. instead of having the masked values be transparent, it is possible to map them to unique colors (colors that do not otherwise occur in the colormap, typically cyan, magenta, or bright green) so that the user knows which channel has masked [invalid] data.
    4.2 We will probably not support this initially, but choosing this route gives us the flexibility to do so in the future.

Disadvantages:

  1. Need to store a separate mask for each channel.
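A minimal sketch of what option A looks like in plain NumPy, under the assumption of two variates with illustrative field names. The invalid_code array is a hypothetical illustration of point 4.1 (telling the user which channel is invalid), not an existing matplotlib feature.

```python
import numpy as np

data = np.ma.array(
    np.array([(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)],
             dtype=[("v0", float), ("v1", float)]),
    mask=[(False, True), (False, False), (True, False)],
)

# Selecting a field yields a masked array carrying only that field's mask.
variates = [data[name] for name in data.dtype.names]

# Hypothetical use of the separate masks: encode which variate is invalid
# (0 = none, 1 = first, 2 = second, 3 = both).
invalid_code = sum(np.ma.getmaskarray(v).astype(int) << i
                   for i, v in enumerate(variates))
```

Such a code could later select a distinct marker color per combination of invalid channels.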

B: Store the mask as an additional field in the struct array, i.e. [("mask", bool, len(x.dtype.descr))]

  • Implication: a shared mask for all channels

Advantages:

  1. Only one mask
    1.1 Less memory use
    1.2 No ambiguity as to what data is masked

Disadvantages:

  1. In order to iterate over the channels, a masked array must be created for each channel. (i.e. slicing the array will not produce masked arrays – this can be handled in colors.multi_norm._iterable_variates_in_data().)
    1.1. The data may be iterated over multiple times in order to produce a single plot [autoset limits(?) etc.]. One way to interpret option A is that it caches each variate in its masked form, whereas with option B the masked version of each variate is created only upon access.
  2. An unmasked and a masked version of the same array have a different number of fields, which can lead to confusion.
  3. Masked input must be parsed
    3.1 With this implementation, practically all data will need to be formatted upon input, whereas with implementation A, data that is already structs [or is complex!] is interoperable with the internal workings of matplotlib.
  4. I suspect it will be more difficult to onboard new developers with this approach.
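For comparison, option B might look roughly like the sketch below. It follows the [mask, variate0, variate1] reading given earlier in this comment; the field names and the iter_variates helper are hypothetical and were not prototyped in the PR.

```python
import numpy as np

# Hypothetical layout for option B: a shared "mask" field next to the variates.
data = np.zeros(3, dtype=[("mask", bool), ("v0", float), ("v1", float)])
data["v0"] = [1.0, 2.0, 3.0]
data["v1"] = [4.0, 5.0, 6.0]
data["mask"] = [False, True, False]


def iter_variates(arr):
    """Yield each variate as a masked array, rebuilt on every call."""
    for name in arr.dtype.names:
        if name != "mask":
            yield np.ma.array(arr[name], mask=arr["mask"])


variates = list(iter_variates(data))
```

Every variate ends up with the same mask, and the masked arrays are constructed each time the data is iterated, which corresponds to disadvantage 1.1.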

Having looked at this, my personal opinion is that option A is more suitable for matplotlib because I think it will be easier to maintain.

@anntzer let me know if I have interpreted your suggestion correctly, and if you agree with my assessment of approach A or B, or if you think I should make a full prototype to explore this further.

@trygvrad mentioned this pull request on Apr 17, 2025
Reviewers

@anntzer left review comments

@oscargus left review comments

3 participants: @trygvrad, @anntzer, @oscargus
