Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork8.2k
Conversation
story645 commentedApr 25, 2018
I like the idea of a generalized way to do legends for PathCollections & PatchCollections, but wonder if this is overly complicated: # produce a legend with the unique colors from the scatterax.legend(*scatter.legend_items(),loc=3,title="Classes")# produce a legend with a cross section of sizes from the scatterhandles,labels=scatter.legend_items(mode="sizes",alpha=0.6)ax.legend(handles,labels,loc=1,title="Sizes") overlapping with the discussion in ##7383, I wonder if there shouldn't just be a subclass of legend that handles Path/Patch and cmap/norm stuff. |
ImportanceOfBeingErnest commentedApr 26, 2018
How is a single line "too complicated"? If I understand correctly, a Legend subclass would then need to take the artist (in this case the PathCollection) as input, as well as those arguments needed to steer the output. In that sense you could replace I also don't see much overlap to What is proposed in this PR is completely analogue to the existing |
jklymak left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think this a useful thing to add. Some minor suggestions/comments below that I think will improve it. I feel somewhat strongly that the name should not just belegend_items() as the returned tuple is non-unique.
My only other issue is that we are going to add aax.get_legend_handles_labels. I've not quite wrapped my brain around where this fits into that.
lib/matplotlib/collections.py Outdated
| Parameters | ||
| ---------- | ||
| mode : string, optional, default *"colors"* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Not sure I like "mode" here, though I guess its OK. "key" would be more natural to me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What makeskeymore natural? Any good argument forkey?
In principle I'm open to other names here, e.g.prop because colors or sizes are properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yeah,prop works for me.
lib/matplotlib/collections.py Outdated
| for obtaining a legend for a :meth:`~.Axes.scatter` plot. E.g.:: | ||
| scatter = plt.scatter([1,2,3], [4,5,6], c=[7,2,3]) | ||
| plt.legend(*scatter.legend_items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm not sure I like the namelegend_items just because it makes it sound like these are a property of the PathCollection; I think a verb would help likecreate_legend_items ormake_legend_items
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
For consistency I would go withlegend_elements as commented above already. Unless people are prepared to break the existing contour method and callboth something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I still like having a verb because you are actually doing something versus just returning a property of the collection. But I don't feel super strongly...
lib/matplotlib/collections.py Outdated
| Can be *"colors"* or *"sizes"*. In case of *"colors"*, the legend | ||
| handles will show the different colors of the collection. In case | ||
| of "sizes", the legend will show the different sizes. | ||
| useall : bool or string "auto", optional (default "auto") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
EDIT: oops, this somehow got lost: I don't think you need bothuseall andnum kwargs. num=None is all data, num=int is number bins, num='auto' is auto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ithink that won't work, but I will recheck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
No, you're right, that works. The only "drawback" is that in that casenum cannot act as a parameter for the automatic determination; but I guess it's fine to have that parameter hardcoded (choosing num=9 now for the border between showing all data or some cross section of it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I didn't get that num interacted w/ auto somehow. (still don't get it w/o checking the code).
I wonder if what you really want is a "bins" argument, and make this completely analagous to the histogram bins argument.https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html You could even optionally put in the legend how many elements there are in each bin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't think I want "bins". I want "ticks"; and I get them using a Locator. From my side this is all working fine. The idea is of course to get nice "tick positions", unlike in the case of histogram bins where you'd end up with arbitrarily ugly numbers like 1.57214.
What this function does not allow for is to add custom ticks (which is possible for bins inhistogram). I decided against that because I think this is exactly the point where you'd anyways simply create the legend the way you want it with the artists created manually. But if someone feels that this is needed, I could also add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, thats cool. What about letting a (power) user pass in a Locator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That is sure possible. I think it would then simply ignorenum completely. I'm not sure if it's needed, but if there is the wish to add it, I can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I meant let them pass in a locator as the argument to num. I don’t think anyone who knows how to write a locator would be put off by the slight semantic incongruity.
| The format or formatter to use for the labels. If a string must be | ||
| a valid input for a `~.StrMethodFormatter`. If None (the default), | ||
| use a `~.ScalarFormatter`. | ||
| func : function, default *lambda x: x* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I found this too confusing. What does*func* do and why do you need it? Maybe this needs an example...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
func would allow to calculate the values to be shown in the legend. This is mainly useful for "sizes", because unlike thec parameter in scatter, which takes numbers and maps them to color using aNormalize, thes parameter does not have a normalization. This means that usually you would encode your data into sizes and provide those sizes in the scatter call. However in the legend you might still want to show the original data's values. Example: You use thes to show price ranging between 0.5 and 10 dollars. Now you cannot use those numbers directly as scatter sizes, because that would result in tiny dots. So you convert them to sizes vias = lambda price : 0.5*(price*10)^2. Nowfunc would allow you to provideprice = lambda s: np.sqrt(2*s)/10 to have a nice legend using the actual prices in dollars.
I think you are right that an example might be useful here. I could add that in addition to the already extended example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Makes sense. I was particularly flummoxed by "This converts the initial values for color or size and needs to take a numpy array as input." I wasn't sure what "initial" meant.
For the doc string maybe: "Function to calculate the labels. Often thesize (orcolor) argument to scatter will have been pre-processed by the user using a functions = f(x) to make the markers visible; egsize = np.log10(x)`. Providing this function allows that pre-processing to be inverted, so that the legend labels have the correct values. egnp.exp(x, 10)``"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I reformulated the docstring and also added an additional example now.
story645 commentedApr 29, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Sorry, I was reacting to the * notation and should have looked at this way more carefully. It's fine as is, especially if you go with |
jklymak commentedApr 29, 2018
Actually ax.get_legend_handles_labels already exists. As does the concept of legend handlers. Any way this can be turned into a legend handler? |
ImportanceOfBeingErnest commentedApr 29, 2018
would produce a single legend row with one scatter dot and the label "mylabel". It is hence equalvalent to simply calling Now what is possible using a legend handler would be to change the single handle from a single point to something more characteristic of a scatter, e.g.: However, this is still a single legend entry for the complete scatter. What this PR proposes is a way to getseveral legend entries forone single scatter plot. This is not possible with a One might consider putting the proposed API for |
lib/matplotlib/collections.py Outdated
| def get_paths(self): | ||
| return self._paths | ||
| def legend_items(self, mode="colors", useall="auto", num=10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
How far up the class hierarchy does it make sense to move this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This particular method only makes sense for thePathCollection. A less useful version of it could be created for anyCollection and yes, evenArtist could in principle have such method. However the point being that at least for more complex things like scatter (PathCollection), LineCollection, contour (ContourSet), imshow (AxesImage) there need to be some options for the user to decide on the final appearance. I'm unsure how to generalize such arguments and how to use them, other than letting the user call this method and supply the returned handles/labels tolegend - see comment below.
tacaswell commentedApr 29, 2018
The issue here is that the "one artist : one dataset : one legend entry" mapping that exist for ex A concern with the way we currently do legends is that the handlers are created by lookup up state held in the How far can the notion of 'legend_elements' be generalized? Even if we only have a real implemenation for PathCollection, having a I think for now coming up with a consistent name and API for |
ImportanceOfBeingErnest commentedApr 29, 2018
jklymak commentedApr 29, 2018
I wasn't exactly sure the scope of what legend_handler could do. I agree that the functionality here is useful, and I don't really see how it can easily be made extensible. What I'm hung up on is that I am not sure I like the API of having it a method on OTOH, if we do this for every different artist type, I imagine this gets to be a mess so I can see the point of having it attached to the artist. Again, just trying to think the API through a bit. I think this is a good idea overall. |
ImportanceOfBeingErnest commentedApr 30, 2018
we currently have:
Concerning the API, i.e. the needed arguments, I think there might be little to no overlap between what different artists need as input. Even |
a801e21 to809c130Comparejklymak commentedApr 30, 2018
Sorry, I missed the point about |
jklymak left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think this is useful, and consistent w/ the similar method toContourSet. "num" could possible pass a Locator if we want more flexibility in the binning of the sizes, but that could either happen now or as a follow up if someone wants more flexibility. Thanks a lot for this!
dfe09f3 to1097412CompareImportanceOfBeingErnest commentedMay 2, 2018
Update: You may now pass a locator to Concerning the discussion about having a However, as long as this isn't used anywhere internally, it's not really useful. In any case this can be part of a different PR, in case one wants to let |
jklymak commentedMay 14, 2018
@ImportanceOfBeingErnest we talks about this on the weekly phone call, and the issue is with attaching this method to The request would be that we add a If you don't feel like doing this, I imagine someone else will be up for picking it up! Thanks for a great idea... |
ImportanceOfBeingErnest commentedMay 14, 2018
But this is not only useful for scatters. If you have a different PathCollection than the one being created by a scatter, you may still use it to create legend entries for it. As of now, it seems no other matplotlib function other than scatter creates PathCollections, but if there were others, why restrict this functionality to only |
ImportanceOfBeingErnest commentedMay 28, 2018
What's the status here? |
jklymak commentedMay 28, 2018
Myunderstanding which may be incomplete, is that If you still disagree with this organization, then we can bring other folks into the conversation. |
ImportanceOfBeingErnest commentedMay 28, 2018
As said, this is not specific to |
jklymak commentedMay 28, 2018
I don't have any skin in this game - you'll have to work it out w/@tacaswell |
jklymak commentedJun 11, 2018
ping@tacaswell |
jklymak commentedAug 19, 2018
Marking as "Needs Review" to hopefully burble to the top of people's lists... |
dstansby commentedNov 3, 2018
Looks like doc build and travis aren't happy at the moment? |
3d85295 to431e074CompareImportanceOfBeingErnest commentedNov 3, 2018
I'm happy to fix all the little hickups that come and go with updated dependencies (pep8 to flake, sphinx 1.7 to 1.8 etc.) once the general fate of this PR is decided upon. |
jklymak commentedFeb 9, 2019
... is this really release critical? No, but I can imagine@ImportanceOfBeingErnest is a bit frustrated its sat around for months... |
efiring left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Overall, this seems like important functionality and a reasonable implementation. I would be reluctant to hold it up indefinitely based on the controversy over whether the method should be on something other than thePathCollection. I don't see any harm in putting it there; that doesn't preclude some future refactoring, so long as the user would still be able to call it as a method of whatever a futurescatter returns. I would like to see some consideration of my suggestion regarding the "num" kwarg (possibly with a new "locator" kwarg) to facilitate what I would expect to be the very common use case of wanting to specify the levels as a sequence.
| scatter = ax.scatter(x, y, c=c, s=s) | ||
| # produce a legend with the unique colors from the scatter | ||
| legend1 = ax.legend(*scatter.legend_elements(), loc=3, title="Classes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Minor nit-pick: I think that the numeric location codes are a terrible API and should not be used in examples. (I would favor removing them entirely from the legend documentation.)
lib/matplotlib/collections.py Outdated
| Can be *"colors"* or *"sizes"*. In case of *"colors"*, the legend | ||
| handles will show the different colors of the collection. In case | ||
| of "sizes", the legend will show the different sizes. | ||
| num : int, None, "auto" (default), or `~.ticker.Locator`, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Although the ability to use aLocator is good, and provides the ability to put in a list via aFixedLocator, wouldn't one of the most common--maybethe most common--use cases be an explicit list? For example, sizes represent populations of cities, so the legend shows sizes for[10000, 100000, 1000000, 10000000]. If this would be a common use case, then it might be worth supporting it directly by accepting a sequence. To make all this more likecontour with its 'levels', a separate kwarg could be used to specify aLocator for the rare cases when it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I would still use the same argumentnum though, because a list of values is an alternative to those other options. Else one would need to go through checking what happens if both are provided etc.
So what is a canonical way of checking if something is a list, tuple or array of numbers? (isinstance(x, collections.abc.Iterable) only does half the job; is there something incbook that I overlooked?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
np.iterable() should suffice, I think.
lib/matplotlib/collections.py Outdated
| ------- | ||
| tuple (handles, labels) | ||
| with *handles* being a list of `.Line2D` objects | ||
| and *labels* a list of strings of the same length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Very minor suggestion: 'andlabels being a matching list of strings.' (Otherwise, it could be misinterpreted as meaning the strings must all be the same length.)
lib/matplotlib/collections.py Outdated
| u = np.unique(self.get_sizes()) | ||
| color = kwargs.pop("color", "k") | ||
| else: | ||
| warnings.warn("Invalid prop provided, or collection without " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What is the rationale for a warning rather than an Exception here? At least in the case of an invalid prop, shouldn't this raise an Exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think my rationale here was that similar to how a legend is still created even if no labeled objects are found, here it should not break the code in case no handles/labels can be determined. So one would still get a plot out, just without propper legend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, but then I think that the "invalid prop" case should raiseValueError and only the lack of an array should raise the warning.
Uh oh!
There was an error while loading.Please reload this page.
lib/matplotlib/collections.py Outdated
| loc = mpl.ticker.MaxNLocator(nbins=num, min_n_ticks=num-1, | ||
| steps=[1, 2, 2.5, 3, 5, 6, 8, 10]) | ||
| label_values = loc.tick_values(func(arr).min(), func(arr).max()) | ||
| cond = (label_values >= func(arr).min()) & \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Style preference: when possible, as it is here, use parentheses rather than trailing backslashes for line continuation.
lib/matplotlib/collections.py Outdated
| kw.update(kwargs) | ||
| for val, lab in zip(values, label_values): | ||
| if prop == "colors" and hasarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Again, if I am reading the code correctly, thehasarray check is not needed. (But it makes me wonder whether I am missing something, which is quite possible.)
af5bb74 to646e2c8Comparetacaswell commentedFeb 11, 2019
I am 👍 on this,@efiring is responsible for getting this merged. |
646e2c8 tod9bd109CompareImportanceOfBeingErnest commentedFeb 12, 2019
Thanks for the review@efiring. I think I have now incoorporated all the requested changes. I think it's good to have the API fixed for the case of the scatter (which is the most common usecase I assume). We can now still implement this for other collections. If you want I can already now put a |
efiring commentedFeb 12, 2019
Looks good, so I merged it. I don't think that putting the method into the Collection base would be good, because it probably makes sense to implement it only for some types of Collection. |
jklymak commentedFeb 12, 2019
Thanks@efiring and thanks@ImportanceOfBeingErnest for your patience with this! |
tacaswell commentedFeb 12, 2019
🎉 I am glad this landed! |




PR Summary
This PR proposes to include an easy, yet versatile option to create legends for scatterplots.
Motivation:
Scatter plots create a Collection of points, for which it is rather straight forward to create a colorbar. This is useful for continuous variables mapped to color. However, creating a legend with discrete entries requires to manually set up the necessary proxy artists. It appears that scatter plots are more and more used to show discrete of categorical variables. In such cases, a discrete legend is often more desireable than a (continuous, or even discrete) colorbar.
ggplot offers an easy method to create legends for scatter plots,

ggplot(...) + geom_point(aes(x=.., y=.., size=.., color=..)), resulting in something likeThis PR adds a method
legend_itemsto thePathCollection, which returns the handles and labels for a legend.Creating a scatter legend would then look like
PR Checklist