Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

DOC Update plots in Categorical Feature Support in GBDT example#31062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
betatim merged 13 commits intoscikit-learn:mainfromArturoAmorQ:change_plots
Jul 17, 2025

Conversation

ArturoAmorQ
Copy link
Member

@ArturoAmorQArturoAmorQ commentedMar 24, 2025
edited
Loading

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Inspired bythis example's way of showing the test_score/fit_time trade-off, I find a scatter plot easier to read and interpret than bar plots.

This PR also introduces a log scale for fit times.

Current plot:

sphx_glr_plot_gradient_boosting_categorical_main

This PR (most recent render):

sphx_glr_plot_gradient_boosting_categorical_001

Any other comments?

I took the opportunity to show the intermediate html diagrams and a introduce a wording tweak.

@github-actionsGitHub Actions
Copy link

github-actionsbot commentedMar 24, 2025
edited
Loading

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit:96b7c7c. Link to the linter CI:here

@ogrisel
Copy link
Member

ogrisel commentedApr 30, 2025
edited
Loading

As discussed during the bi-weekly meeting, it would be great to make it explicit that the best models are in the bottom left corner, maybe using a matplotlib arrow annotation in the bottom left corner with the text "best models" pointing towards the point at coordinate(0, 0) (or(0.1, 0.1) because of the log scale on the x axis).

Also, could you please add more ticks on the x axis, e.g.: 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4.

Also, please add theTargetEncoder to this plot.

@ArturoAmorQ
Copy link
MemberAuthor

Current result:

image

Also, please add theTargetEncoder to this plot.

As that also requires adding narrative on the interpretation, I would rather leave that to a follow-up PR.

Copy link
Member

@ogriselogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM, thanks!

Copy link
Member

@lucyleeowlucyleeow left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sorry this took me so long to get to!
This is a much nicer graph, thank you!

I couldn't comment on the line but maybe we could add the name of the category handling strategy in the bullet list at the top, i.e.:

- "Dropped": dropping the categorical features- "One Hot": using a :class:`~preprocessing.OneHotEncoder`- "Ordinal": using an :class:`~preprocessing.OrdinalEncoder` and treat categories as  ordered, equidistant quantities- "Native": using an :class:`~preprocessing.OrdinalEncoder` and rely on the :ref:`native  category support <categorical_support_gbdt>` of the  :class:`~ensemble.HistGradientBoostingRegressor` estimator.

Do you think it's worth mentioning somewhere on the graph that the error bars are 1 standard deviation or is it obvious what the error bars are?

I saw thecomment on addingTargetEncoder but maybe that could be left to another PR...?

It's nice that we make it clear to the user where the best models are (with the arrow and 'best models' text), but it does confuse me at first:

  • I expect it to be pointing to something, e.g., a scatter point. We could add a circle that it points to, but this is not necessarily clear either
  • The graph axes does not start at 0,0 so it's possibly mis-leading?

I can't think how to improve this. Maybe it is clear enough if people read the label titles, that smaller error/faster is better? Maybe we could add to the x and y axes labels, e.g., an arrow; "<- faster fitting" and "↓ better model/lower error"). Or just explaining it in the text may be enough?

Edit: Maybe we could just have the text 'Best models' in the bottom left corner and no arrow ?



class CustomLogFormatter(ticker.Formatter):
def __call__(self, x, pos=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe a docstring here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Also I'm surprised it's so hard to format superscript in mpl?!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe scientific notation is good enough? e.g.,"%1.1e" ?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You are right, that simplifies code a lot.


def plot_performance_tradeoff(results, title):
fig, ax = plt.subplots()
markers = ["s", "o", "^", "x"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nitpick, the last graph, it is a bit tricky to quickly tell the native and ordinal markers/error bars
Would it be 'easy' to change the colour of the the marker, to be the same colour as the scatter points?

image

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think that risks leading the reader to think that the marker is an actual scatter point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That's a good point.

Stupid question, for 'Ordinal', why is the black error bar (?) marker not centered between the horizontal and vertical error bars?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nevermind, I see that it is in the new rendered doc, for some reason the horizontal error bar was not rendering properly in the first one above

image

coeff_str = f"{coeff:.1f}x"

# Format exponent using Unicode superscripts
superscripts = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Would math notation work? e.g.,"$10^{%d}$"

@ArturoAmorQ
Copy link
MemberAuthor

I saw the#31062 (comment) on adding TargetEncoder but maybe that could be left to another PR...?

Yes. I prefer reducing the scope of this PR.

Copy link
Member

@lucyleeowlucyleeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM thanks!

Sorry I meant the words 'faster fitting'/'lower error' may be added to the x and y axis labels.
They do also make sense here, but maybe they are too long inside the graph and 'Best models' was better?

image

@betatim
Copy link
Member

I'll merge this for now as it has two approvals and I am looking for something to merge to update the banner on scikit-learn.org

@betatimbetatim merged commit588f396 intoscikit-learn:mainJul 17, 2025
36 checks passed
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@ogriselogriselogrisel approved these changes

@lucyleeowlucyleeowlucyleeow approved these changes

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@ArturoAmorQ@ogrisel@betatim@lucyleeow

[8]ページ先頭

©2009-2025 Movatter.jp