Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork7.9k
MultiNorm class#29876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
MultiNorm class#29876
Conversation
This commit introduces the MultiNorm calss to prepare for the introduction of multivariate plotting methods
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
lib/matplotlib/colors.py Outdated
@@ -2320,6 +2320,16 @@ def __init__(self, vmin=None, vmax=None, clip=False): | |||
self._scale = None | |||
self.callbacks = cbook.CallbackRegistry(signals=["changed"]) | |||
@property | |||
def n_input(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Perhaps input_dims, output_dims for consistency with Transforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Dimensions is probably somewhat confusing term in this context, because it typically means the number of axes in an array, i.e.np.zeros((4,5,6))
has three dimensions, but the correspondingn_input
would be four [if shown as an image of size (5,6)].
What do you think ofinput_variates
andoutput_variates
, orinput_vars
andoutput_vars
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sure, although transforms use the same terminology? No strong opinion there.
However this does raise another issue, which is that you seem to put the variates as thefirst dimension. Intuitively I'd rather put them as thelast dimension (just like if you pass a 2D array to plot, thelast dimension are the different variates). I'm sure this design choice must have been discussed somewhere, can you link that discussion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We had a brief discussion on this here#14168 (comment), but we followed that up on a weekly meeting 14 September 2023https://hackmd.io/@matplotlib/Skaz1lrh6#Mutlivariate-colormapping .
The argument is not formulated well in the log, but basically(V, N, M)
because there areV
cmaps and norms, and in this way the input for the data mirrors that of the vmin, vmax and norm keywords, i.e.:ax.imshow((data0, data1), vmin=(vmin0, vmin1), cmap='BiOrangeBlue')
or
ax.imshow((preassure, temperature), vmin=(vmin_preassure, vmin_temperature), cmap='BiOrangeBlue')
Quite often the data is qualitatively different, like the example at the bottom of the page here:https://trygvrad.github.io/mpl_docs/Multivariate%20colormaps.html whereGDP_per_capita
andAnnual_growth
are plotted together. I find it is not very intuitive to wrap these in a single array.
(also: this allows the different variates to have different data types)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sure, looks like I already lost that discussion once so let's not relitigate it :)
I'll just ping@timhoffm who may have some idea as to the best name for n_input/input_dims (or some other name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thank you@timhoffm :D
Why do we have to distinguish input and output? They are both the same here. Do you anticipate that there can be norms with different inputs and outputs? What would be an example for that? - I'm asking because that decides whether we need the input/output parts in the name.
The thinking was that there is no requirement that they be equal, so as to be future-proof they are here implemented as separate. I do believe this was mentioned at a meeting at one point, but I have spent the last day trying to imagine a use case where it would be beneficial to have them separate. I can imagine some use cases where one might want to map a single scalar to two colorbars, such as height above and below sea level on separate colorbars, but even as I stretch my mind to find a use case where one scalar maps to multiple colorbars, I cannot find a use case where it is more intuitive to integrate that into the norm than to handle the additional complexity a separate pre-processing step and use a norm where the number of inputs and outputs are the same.
With that in mind I think we should change the names from n_input/n_output to a single variable name.
In my mind the following are good options:
n_variates
n_channels
n_dims
Or we could go even shorted and just usevariates
,channels
ordims
.
I have no strong opinion now, but this has been my logic on this topic previously:
Whenn_variates
was implemented inMultivariateColormap
my thinking was that we have the ability to define our own vocabulary in this case, and whatever word we choose, it is better if we are consistent, thus we ended up withMultivariateColormap
andn_variates
, but we could alternatively haveMultichannelColormap
andn_channels
. If we choose to usen_variates
also forMultiNorm
it would give an indication to that this relates toMultivariateColormap
. Similarly, the word 'dims' is used in the context of transform, while the word 'channel' is weakly connected to the concept of an alpha-channel. My hunch has been that using a less-used word [variates] gives us more power to imbue that word with meaning, and that makes it easier to be specific in our vocabulary going forward.
How often will these properties be used by end users? - I'm asking, because if they are rarely used, we may affort longer names for more clarity.
I don't intend for users to use them. The number is implicit from the creation of aMultiNorm
, i.e.MultiNorm(['log', None])
or evenplt.imshow([a, b], cmap='2AddA', norm=['log', None])
.
Their importance is only for error-handling, to make sure that the data, norm and colormap are compatible, and give the user a reasonable error.
@anntzer I do not hold strong opinion, so if you let me know what you prefer, I will update this PR accordingly.
As this is not something we expect users to interact with much, the main concern should be the maintainers :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
- I agree that a single variable is sufficient. It's not the responsibility of the Norm to create or condense information. So it can only be
N -> N
. It could possibly even be1 -> 1
if we allow a list of Norm instances for theN -> N
case. Advantage would be that we don't need to create any new norms and the logic stays simpler. The only thing we'd loose is the ability to couple the variables, but I don't see that that's needed. - We could always later introduce MultiNorm. To be checked: Maybe it's favorable internally to collect these into a single instance, but that would not need to be public. What do you think?
- I asked ChatGPT "would you use variate as a standalone term?":
Excellent question — and honestly,"variate" isn’t super commonly used by itself in casual conversation or even in a lot of applied work, but itdoes exist as a proper term in stats.
Technically:
- Avariate is a random variable, or a measurable quantity that can take on different values.
Example: If you're studying people's heights, each person’s height would be a value of the "height variate."In practice:
- You’re way more likely to hear "variable" or "random variable" than "variate."
- "Variate" mostly shows up in more formal, old-school, or academic statistical texts.
Example: "The sample contains 100 observations of the variate X."It really shines in compound terms like:
- Univariate → one variable
- Bivariate → two variables
- Multivariate → multiple variables
But you wouldn’t usually say, “Let’s analyze this variate.” You’d just say, “variable.”
- I don't intend for users to use
n_variates
. Then (if we still want to expose MultiNorm), let's just call thisn_variables
and keep it private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Lets go forn_variables
then, it's a good name :)
So it can only be N -> N. It could possibly even be 1 -> 1 if we allow a list of Norm instances for the N -> N case. Advantage would be that we don't need to create any new norms and the logic stays simpler. The only thing we'd loose is the ability to couple the variables, but I don't see that that's needed. - We could always later introduce MultiNorm. To be checked: Maybe it's favorable internally to collect these into a single instance, but that would not need to be public. What do you think?
We have discussed this before in#28428 where I initially implemented multivariate color mapping with list of norms instead of MultiNorm#28428 (comment) . This was then later changed to MultiNorm.
However, this was before the introduction of theColorizer
as a container for norm→colormap.
The choice of having a list of norms or a MultiNorm object is now somewhat a question of where we put the additional complexity for supporting norms for multivariate/bivariate colormaps. Do we put the complexity in A: Colorizer or B: MultiNorm. My hypothesis is that having MultiNorm as a separate class simplifies the colorizer class, as we can call methods on theColorizer.norm
regardless of if the data is multivariate or not. In my mind this will make it easier to maintain.
The top-level plotting functions should be largely unaffected by this choice, as they should operate through the colorizer interface, and should not use the norm directly.
If we choose to have a MultiNorm class, it is then a separate question if MultiNorm should be private. To me, this boils down to what should the user recieve if requesting the norm after plotting a bivariate/multivariate image:
im = plt.imshow((a, b), cmap='BiOrangeBlue')im.norm # ← what should this be?
My hypothesis here has been that it is easier for the userìm.norm
has type stability, i.e. it is always a subclass ofcolors.Normalize
, and therefore we make MultiNorm public.
@timhoffm Let me know if you want me to implement a version with a list of norms instead of MultiNorm, it will have to be prototyped as a complete solution (with working top-level plotting methods) similar to#29221 , but if we go for that solution I could then break it into smaller PRs for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Well, you don't really have type stability. While MultiNorm is formally a subclass of Normalize, it violates LSP (*), the call you need to do onim.norm
depends on the type.
(*) I'm not a LSP nazi, it's sometimes ok to violate it. But you can't argue for that with type stability.
My hypothesis is that having MultiNorm as a separate class simplifies the colorizer class
This is a valid argument and we should at least have MultiNorm internally.
To me, this boils down to what should the user recieve if requesting the norm after plotting a bivariate/multivariate image
Yes. Technically that's a separate question, and I don't have a strong preference here. But since the class exist, it may be ok to expose it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
All true, I was a bit inaccurate in my comment.
I have changed from n_input/n_output to n_variables as agreed :)
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
lib/matplotlib/colors.py Outdated
in the case where an invalid string is used. This cannot use | ||
`_api.check_getitem()`, because the norm keyword accepts arguments | ||
other than strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm confused because this function is only called forisinstance(norm, str)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
So this function exists because the norm keyword acceptsNormalize
objects in addition to strings.
This is fundamentally the same error you get if you give an invalid norm to aColorizer
object.
In main, the@norm.setter
oncolorizer.Colorizer
reads:
@norm.setter def norm(self, norm): _api.check_isinstance((colors.Normalize, str, None), norm=norm) if norm is None: norm = colors.Normalize() elif isinstance(norm, str): try: scale_cls = scale._scale_mapping[norm] except KeyError: raise ValueError( "Invalid norm str name; the following values are " f"supported: {', '.join(scale._scale_mapping)}" ) from None norm = _auto_norm_from_scale(scale_cls)() ...
The_get_scale_cls_from_str()
exists in this PR because this functionality is now needed by bothcolorizer.Colorizer.norm()
andcolors.MultiNorm
.
Note this PR does not include changes tocolorizer.Colorizer.norm()
so that it makes use of_get_scale_cls_from_str()
. These changes follow in the next PR:#29877 .
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Thank you@QuLogicCo-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
55b85e3
tof42d65b
CompareUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
def scaled(self): | ||
"""Return whether both *vmin* and *vmax* are set on all constituent norms""" | ||
return all([(n.vmin is not None and n.vmax is not None) for n in self.norms]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
A bit too many parentheses, not that it really matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm changing this to:
return all([n.scaled for n in self.norms])
which should be more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This wasn't changed?
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just some minor points, plus re-pinging@timhoffm in case he has an opinion re: n_input / input_dims naming?
Thank you@timhoffm |
""" | ||
Parameters | ||
---------- | ||
norms : List of strings or `Normalize` objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
None
seems to be left out here:
norms :List ofstrings or`Normalize` objects | |
norms :list of(str,`Normalize`, or None) |
""" | ||
if isinstance(norms, str) or not np.iterable(norms): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is equivalent, I think?
ifisinstance(norms, str) or not np.iterable(norms): | |
ifcbook.is_scalar_or_string(norms): |
# Convert the list of norms to a tuple to make it immutable. | ||
# If there is a use case for swapping a single norm, we can add support for | ||
# that later | ||
self._norms = tuple(norms) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Does this not need a check that the length matchesself.n_variables
?
Returns | ||
------- | ||
Data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Single returns should be the type, not a name.
Data, must be of length `n_variables` or be an np.ndarray type with | ||
`n_variables` fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This seems to be the only docstring for a data parameter that mentionsnp.ndarray
; is that required or not? The others should be made consistent either way.
def scaled(self): | ||
"""Return whether both *vmin* and *vmax* are set on all constituent norms""" | ||
return all([(n.vmin is not None and n.vmax is not None) for n in self.norms]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This wasn't changed?
Returns | ||
------- | ||
list of np.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Dedent.
n.autoscale(a) | ||
self._changed() | ||
def autoscale_None(self, A): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Seems to be missing a test?
PR summary
This PR continues the work of#28658 and#28454, aiming toclose#14168. (Feature request: Bivariate colormapping)
This is part one of the former PR,#29221. Please see#29221 for the previous discussion
Featuresincluded in this PR:
MultiNorm
class. This is a subclass ofcolors.Normalize
and holdsn_variate
norms.MultiNorm
classFeaturesnot included in this PR:
MultiNorm
together withBivarColormap
andMultivarColormap
to the plotting functionsaxes.imshow(...)
,axes.pcolor
, and `axes.pcolormesh(...)