Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork26.1k
DOC Update plots in Categorical Feature Support in GBDT example#31062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
github-actionsbot commentedMar 24, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
ogrisel commentedApr 30, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
As discussed during the bi-weekly meeting, it would be great to make it explicit that the best models are in the bottom left corner, maybe using a matplotlib arrow annotation in the bottom left corner with the text "best models" pointing towards the point at coordinate Also, could you please add more ticks on the x axis, e.g.: 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4. Also, please add the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM, thanks!
lucyleeow left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sorry this took me so long to get to!
This is a much nicer graph, thank you!
I couldn't comment on the line but maybe we could add the name of the category handling strategy in the bullet list at the top, i.e.:
- "Dropped": dropping the categorical features- "One Hot": using a :class:`~preprocessing.OneHotEncoder`- "Ordinal": using an :class:`~preprocessing.OrdinalEncoder` and treat categories as ordered, equidistant quantities- "Native": using an :class:`~preprocessing.OrdinalEncoder` and rely on the :ref:`native category support <categorical_support_gbdt>` of the :class:`~ensemble.HistGradientBoostingRegressor` estimator.
Do you think it's worth mentioning somewhere on the graph that the error bars are 1 standard deviation or is it obvious what the error bars are?
I saw thecomment on addingTargetEncoder
but maybe that could be left to another PR...?
It's nice that we make it clear to the user where the best models are (with the arrow and 'best models' text), but it does confuse me at first:
- I expect it to be pointing to something, e.g., a scatter point. We could add a circle that it points to, but this is not necessarily clear either
- The graph axes does not start at 0,0 so it's possibly mis-leading?
I can't think how to improve this. Maybe it is clear enough if people read the label titles, that smaller error/faster is better? Maybe we could add to the x and y axes labels, e.g., an arrow; "<- faster fitting" and "↓ better model/lower error"). Or just explaining it in the text may be enough?
Edit: Maybe we could just have the text 'Best models' in the bottom left corner and no arrow ?
Uh oh!
There was an error while loading.Please reload this page.
class CustomLogFormatter(ticker.Formatter): | ||
def __call__(self, x, pos=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe a docstring here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Also I'm surprised it's so hard to format superscript in mpl?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe scientific notation is good enough? e.g.,"%1.1e"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You are right, that simplifies code a lot.
def plot_performance_tradeoff(results, title): | ||
fig, ax = plt.subplots() | ||
markers = ["s", "o", "^", "x"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think that risks leading the reader to think that the marker is an actual scatter point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That's a good point.
Stupid question, for 'Ordinal', why is the black error bar (?) marker not centered between the horizontal and vertical error bars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
coeff_str = f"{coeff:.1f}x" | ||
# Format exponent using Unicode superscripts | ||
superscripts = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Would math notation work? e.g.,"$10^{%d}$"
Yes. I prefer reducing the scope of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'll merge this for now as it has two approvals and I am looking for something to merge to update the banner on scikit-learn.org |
588f396
intoscikit-learn:mainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Inspired bythis example's way of showing the test_score/fit_time trade-off, I find a scatter plot easier to read and interpret than bar plots.
This PR also introduces a log scale for fit times.
Current plot:
This PR (most recent render):
Any other comments?
I took the opportunity to show the intermediate html diagrams and a introduce a wording tweak.