Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

ENH: Add grouped_bar() method#28560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
timhoffm wants to merge6 commits intomatplotlib:main
base:main
Choose a base branch
Loading
fromtimhoffm:grouped_bar

Conversation

timhoffm
Copy link
Member

@timhoffmtimhoffm commentedJul 13, 2024
edited
Loading

This is a WIP to implement#24313. It will be updated incrementally. As a first step, I've designed the data and label input API.

Please check the included example, which explains the API variants and motivations/consistency considerations. It's a WIP for the illustration and discussion of the API, and will not go into the final PR

I've flagged as "work in progress" because many aspects are not implemented yet. **But please give comments as I consider the implemented parts as complete proposal for that aspects.**

The API proposal is complete. Please check whether it is reasonable. You can get an overview thepreliminary overview example and in theAPI docs forgrouped_bar. Design decisions and background are listed in right below in the todos:

👉Ready for review

Todos:

  • API: How to structure and interpret the two data dimensions and the associated labels.

  • For now, only categorical labels are supported as x. Should we also support numbers as with bar?
    Numbers are now also supported.

  • How to parametrize bar width and group spacing?
    A bar group takes horizontal spacew (default: 1).

    I believe explicit bar widths are not helpful because they can easily overflow if enough bars are in a group, and "reasonable" values need fine-tuning for per number of bars in a group. I therefore suggest to define the spacing (margin / padding; name t.b.d.) in relative units. Options:

    • padding in units ofw: e.g.pad=0.1 would mean a bar group targeting the interval(x-0.5*w, x+0.5*w) would have the outer bar edges at(x-(0.5-pad*w), x+(0.5-pad*w) =(x-0.4w, x+0.4w)`.
    • padding as multiples of the bar width: e.g. apad=1 would mean that the bar width is chosen so that there's a padding of 1-bar width on both sides of the group.

    I believe, we can choose reasonable defaults for both cases. It's possibly easier to tune a desired look with the second approach, but harder if one wants to fully control positions.

    Decision: Implemented as multiples of bar width

  • Do we want to support spacing between the bars in a group?
    Yes. Spacing should follow the same relative conventions as group padding.
    Since we have num_bars times the spacing "spacing in units ofw will require tuning depending on the number of bars.
    Spacing would be universal when given as "fraction of bar width".
    When defaulting to 0 (my slight preference) both variants will work for any number of bars by default.

  • How to support horizontal bars?
    Useorientation parameter. Replicating to a separate functiongrouped_barh is overly cumbersome.

  • Do we want to support stacked bars? And grouped stacked bars? - Not for now/here.
    While stacked bars could use the same input datastructures, other plotting parameters like width and spacing settings do not match, and somebar() flexibility like varying width will not be available viagrouped_bar(). I thus recommend not to try and squeeze stacked bars in togrouped_bar(). Stacking bars is already much simpler with the existingbar() methods compared to grouping bars.bar() is thus more sufficient. If we want a higher level interface, a separatestacked_bar() would be a thin wrapper aroundbar(). I don't see a reasonable function for stacked and grouped bars, because the addtional dimension of data and labelling would make the interface very difficult. If you need stacked and grouped bars, you should use a loop ofgrouped_bar() with adjustedbottom.

  • support kwargs (Rectangle properties)

  • support vectorized style parameters - at least colors

  • optional: supportbottom values, so that one could stack (but could be added later)

  • dataset labels naming?label as inbar(), orlabels or dataset_labels or …
    Note:stackplot() useslabels. This would be an argument for consistency. What's different:
    We potentially have two labels: dataset labels and group labels. This may warrant deviating from the simplelabels
    apporach to disambiguate them. OTOH, I feel "dataset" is quite a generic name (but I don't have any better) so that
    it does not make things much clearer. The group labels are implicit in x (or historically in bar(), calledtick_labels.
    I'm going withlabels because its

    • consistent withstackplot() andbar() (which useslabel)
    • consistent with legend labels always being specified throughlabel
    • a longer name does not significantly help with disambiguation
  • add pyplot wrapper

  • add mypy stubs

  • improve documentation

  • add tests

  • Should we mark this as provisional? This would allow to react to user feedback. -> Yes.

  • What is the return type? List ofBarContainer, a newBarGroupContainer, something else, nothing for a start (because we can't decide yet)?
    Going withlist[BarContainer}. This is most simple and in analogy to whatstackplot() does. Since the API is provisional, we can still reconsider.

@story645
Copy link
Member

For now, only categorical labels are supported as x. Should we also support numbers as with bar?

Am thinking of a bunch of situations (date times, coded data, machine learning labels) where the numbers are really categoricals but the dtype isn't string and I think folks will expect it to just work. Also the categorical remapping mechanism isn't terribly robust 😓 so I could see folks wanting to just use numbers + formatter if they're trying to do something like animated/interactive sorting of bars (something like bar chart race) where the position of the bar is actually independent of its label as categorical will maintain the same cat->int mapping as stuff is added to the chart. Which tldr, yes numbers b/c that allows for a lot of fine tuning of bar position.

@timhoffm
Copy link
MemberAuthor

Added numeric values support for x. I've required that x ist equidistant, becuause otherwise positioning will be quite messy.

if they're trying to do something like animated/interactive sorting of bars (something like bar chart race) where the position of the bar is actually independent of its label as categorical will maintain the same cat->int mapping as stuff is added to the chart.

I've not thought this case though, but remember thatgrouped_bar() is just a convenience interface for positioning multiple sets of data. One boundary condition is that each group (=bars at one category/tick position) has the same datasets in the same order. If you need something exotic, you'll have to go back to drawing individual bars.

@story645
Copy link
Member

story645 commentedJul 15, 2024
edited
Loading

If you need something exotic, you'll have to go back to drawing individual bars

I don't think this is technically feasible, but this has me thinking it'd be nice if the return object was a 2D bar container that supported slicing out all the segments at a given X or all the segments of a group for easier consistent updates.

timhoffm reacted with eyes emoji

@timhoffm
Copy link
MemberAuthor

timhoffm commentedSep 11, 2024
edited
Loading

The API proposal is complete. Please check whether it is reasonable. You can get an overview thepreliminary overview example
and in theAPI docs forgrouped_bar. Design decisions and background are listed in thefirst comment of this PR. Feedback is welcome!

@tacaswell
Copy link
Member

A couple of quick comments:

  • thoughts on how this could be extended if (when) we implement hierarchical tick labels?
  • for the return did you consider a dict? It might be worth considering a minimal custom class (__getitem__ by the labels andobj.remove()) as well? The two things I think are important are the by-label access (particularly when user passed us a dict to begin with) and the ability to remove the whole lot in one shot. Do we want to return a new compound artist (that legit lives in the tree)?
  • is there a way to control styling by label (e.g. put a different hatch on each group)? I'm not sure this is useful, but also not sure it is not useful. "grab the returned object and do it your self" is also a good answer for now.
  • the first example in the docs should be turned into an figure comparison test to get at least some coverage?

@timhoffm
Copy link
MemberAuthor

timhoffm commentedSep 13, 2024
edited
Loading

thoughts on how this could be extended if (when) we implement hierarchical tick labels?

Not thought about it. But the fundamental APIgrouped_bar(x, heights) should be able to bear this. Likely with a hierarchialx and with or without substructuring inheights.

Side note: There is one aspect, I've been pondering: We could revese the positional argument order togroupde_bar(heights, x). We have kinda precendence for that instairs. This would allow to (i) leave out x entirely for a quick look (use range() as default) and (ii) more importantly make it possible to pass a complex objectgrouped_bar(dataframe), which contains all the labelling information; this could then also be hierarchial. I've refrained from it to keep more analogy tobar. After all, our API is mostly low-level component-based. I believe to change that, it would be better to add a data preprocessing decorator that takes complex objects and decomposes them into the data components.

for the return did you consider a dict? It might be worth considering a minimal custom class (__getitem__ by the labels andobj.remove()) as well? The two things I think are important are the by-label access (particularly when user passed us a dict to begin with) and the ability to remove the whole lot in one shot. Do we want to return a new compound artist (that legit lives in the tree)?

We have no precedence that plotting functions return dicts. Additionally, one can plot without passinglabels, which would us require to invent keys. So dict is not an option to me. Mid-term, I'd like to have better return objects, but his is a larger topic, which I don't want to figure out in this PR. I see the list of Container as a temporary approach - one of the main reasons to make this provisonal. It is certainly not the most elegant approach, but it's consistent (c.f.stackplot()), simple and gives users a grip on the underlying artists. Given that re-using the Artists is quite rare, that's good enough for now. Mid-term, I'd like to have better return objects. I will add an explicit note that this is likely to change.

is there a way to control styling by label (e.g. put a different hatch on each group)? I'm not sure this is useful, but also not sure it is not useful. "grab the returned object and do it your self" is also a good answer for now.

No. I've never seen this. Generally, groups are distinguished by location (and identified by x tick labels), and there's a point in that the bars of one dataset look the same over all groups to help make the connection. Per-group settings is not something I would want to encourage through API (though we could later introduce a parameter dictgroup_props if you convince me that's really a valid use case 😃). For now: Do it yourself.

the first example in the docs should be turned into an figure comparison test to get at least some coverage?

Yes, as the todo in the intial PR shows, tests are still open. I'm collecting feedback before so that I don't have to rewrite all tests in case I would have to overhaul the design.

@tacaswell
Copy link
Member

We could revese the positional argument order to groupde_bar(heights, x). We have kinda precendence for that in stairs.

I agree that is interesting, but as you point out could be solved by layers on top of this.

Additionally, one can plot without passing labels, which would us require to invent keys.

We could fallback torange() or[f"dataset_{i}" for in range()] but I agree that is less than great. That would probably mean we would also force those invented labels into the legend which is not great. I'm convinced we should not return adict (but a bit sad about it).

Rather than alist could we do a custom object like:

classGroupedBarReturn:def__init__(self,bars):self.bars=barsdefremove(self):        [b.remove()forbinself.bars]

? This would also give us a reasonable expansion path if this API grew to add errorbars or annotations to the bars. Also a path toobj.by_labels[key]

Even though it is provisional, I'm a bit worried about returning an object with an API that would be annoying to replicate (e.g. not fall into the trap ofax.errorbar).

This may be a bigger discussion, but I think we should try to make everything we return from plotting methods to have a.remove method so "make it go away" is always the same.

That said, I'm not going to push this further and returning a list is acceptable to merge.

No. I've never seen this....For now: Do it yourself.

sounds reasonable.

@timhoffm
Copy link
MemberAuthor

timhoffm commentedSep 13, 2024
edited
Loading

Edit: I think I've reasonbly addressed the doc build failure.

Technical topic: I don't understand the doc failure:

/home/circleci/project/doc/api/axes_api.rst:294:<autosummary>:1: WARNING: py:obj reference target not found: Axes.dataLim [ref.obj]/home/circleci/project/doc/api/axes_api.rst:449:<autosummary>:1: WARNING: py:obj reference target not found: AxesBase [ref.obj]

The call stacks goes through the.. autosummary:: sectionsaxes_api.rst ending with the stated line numbers. They try to autogenerate the doc pages forupdate_datalim andadd_child_axes but fail to resolve the following object references:

Extend the `~.Axes.dataLim` Bbox to include the given points.

and
Add an `.AxesBase` to the Axes' children; return the child Axes.

What's funny is that this PR does not touch the respective docstrings or targeted objects. Also, the only modification ofaxes_api.rst itself is the addition ofgrouped_bar (in another autosummary).

It seems that the failure is not related to the PR but some external influence (change of sphinx?). The failure may even be expected, because both references useAxesBase, which is not in our documentation, and the stable/devedocs render them as unresolved references. But why does it happen here and now? Other PRs do not see the problem. Do we have caching of the doc builds, and I'm the first to touchaxes_api.rst to trigger regeneration?

@story645
Copy link
Member

more importantly make it possible to pass a complex object grouped_bar(dataframe), which contains all the labelling information

Also would make it trivial to transpose, which honestly is one of the best parts of the pandas API.

Which also, I think my request for an BarContainerArray and Tom's for a dict based return are both pointing towards a data frame type artist container - which yes I can see being out of scope here and also I think you proposed something like this once for Axes so you could do both mosaic label based and index based

timhoffm added a commit to timhoffm/matplotlib that referenced this pull requestSep 16, 2024
These were not rendered as links in the current docs, because theassociated code objects do not exist as targets in the docs. For somereason, sphinx did not complain about it. But it does in some recentPRs (extracted frommatplotlib#28560) - I'm unclear why, but anyway the correctsolution is to change the references:- `AxesBase` is not documented itself and in the given context, we don't  have to make the specific distinction, so just link `Axes`- The datalim attribute is documented as part of the Axes class, but  does not have a dedicated target to link to. So we simply use a  literal instead of a link.
@timhoffm
Copy link
MemberAuthor

Changed to a provisional custom return object as proposed in#28560 (comment) to ease possible future migration.

timhoffm added a commit to timhoffm/matplotlib that referenced this pull requestSep 17, 2024
These were not rendered as links in the current docs, because theassociated code objects do not exist as targets in the docs. For somereason, sphinx did not complain about it. But it does in some recentPRs (extracted frommatplotlib#28560) - I'm unclear why, but anyway the correctsolution is to change the references:- `AxesBase` is not documented itself and in the given context, we don't  have to make the specific distinction, so just link `Axes`- The datalim attribute is documented as part of the Axes class, but  does not have a dedicated target to link to. So we simply use a  literal instead of a link.
@timhoffmtimhoffmforce-pushed thegrouped_bar branch 4 times, most recently fromdcd253e to3825074CompareJanuary 23, 2025 14:05
@timhoffmtimhoffm marked this pull request as ready for reviewJanuary 23, 2025 16:43
@timhoffm
Copy link
MemberAuthor

timhoffm commentedJan 23, 2025
edited
Loading

From my perspective, this is ready for review.

@timhoffmtimhoffmforce-pushed thegrouped_bar branch 2 times, most recently from7cf1e6b to5e82fbbCompareJanuary 23, 2025 23:55
@timhoffm
Copy link
MemberAuthor

@story645 A big thank you for a though round of review 🎉.

I've either changed the commented positions or replied why I think they should be kept as is. Please mark the comments as resolved on which you are ok with my reply so that only the open topics remain visible.

story645 reacted with thumbs up emoji

@timhoffmtimhoffmforce-pushed thegrouped_bar branch 2 times, most recently frome023e0e to162ce79CompareJanuary 24, 2025 11:18
Co-authored-by: hannah <story645@gmail.com>
Copy link
Member

@story645story645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

take or leave the table tweak

Co-authored-by: hannah <story645@gmail.com>
@timhoffmtimhoffm added New feature and removed topic: pyplot API Documentation: examplesfiles in galleries/examples Documentation: APIfiles in lib/ and doc/api labelsFeb 11, 2025
Comment on lines +80 to +81
def remove(self):
[b.remove() for b in self.bars]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This appears to need testing; there is noself.bars attribute.


grouped_bar([dataset_1, dataset_2, dataset_3],
tick_labels=['A', 'B'],
labels=['dataset 1', 'dataset 2', 'dataset 3'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The embedded plot usesdataset 0 throughdataset 2.


Returns
-------
_GroupedBarReturn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Shouldn't be indented.

1.5 bar widths between bar groups.

bar_spacing : float, default: 0
The space between bars as multiples of bar width.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
Thespacebetweenbarsasmultiplesofbarwidth.
Thespacebetweenbarsasamultipleofbarwidth.

These will show up in the legend.

group_spacing : float, default: 1.5
The space between two bar groups as multiples of bar width.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
Thespacebetweentwobargroupsasmultiplesofbarwidth.
Thespacebetweentwobargroupsasamultipleofbarwidth.

ax.legend()


@check_figures_equal(extensions=["png"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ditto.

ax.legend()


@check_figures_equal(extensions=["png"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ditto.

Comment on lines +3298 to +3299
raise ValueError(
"'labels' cannot be used if 'heights' are a mapping")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
raiseValueError(
"'labels' cannot be used if 'heights' are a mapping")
raiseValueError("'labels' cannot be used if 'heights' is a mapping")

Comment on lines +3354 to +3355
for i, (hs, label, color) in enumerate(
zip(heights, labels, colors)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
fori, (hs,label,color)inenumerate(
zip(heights,labels,colors)):
fori, (hs,label,color)inenumerate(zip(heights,labels,colors)):

import pandas as pd


class _GroupedBarReturn:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Missing the property.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@QuLogicQuLogicQuLogic left review comments

@rcomerrcomerrcomer left review comments

@story645story645story645 approved these changes

Assignees
No one assigned
Projects
None yet
Milestone
v3.11.0
Development

Successfully merging this pull request may close these issues.

5 participants
@timhoffm@story645@tacaswell@rcomer@QuLogic

[8]ページ先頭

©2009-2025 Movatter.jp