Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

BUG: Fix _most_frequent to safely handle incomparable types (e.g. Non…#31737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
AHB30 wants to merge6 commits intoscikit-learn:main
base:main
Choose a base branch
Loading
fromAHB30:main

Conversation

AHB30
Copy link

@AHB30AHB30 commentedJul 10, 2025
edited by jeremiedbb
Loading

Fixes#31717

Hi maintainers

This PR addresses aTypeError in_most_frequent when arrays contain mixed, incomparable types (e.g.,None,str,int).

The patch ensures safe, deterministic tie-breaking usingtype +hash, in line withscipy.stats.mode.

Unit tests are included to cover:

  • Tie betweenNone andstr
  • DominantNone
  • Empty array handling
  • Tie-breaking againstextra_value

Thanks for your time and review!

…e vs str)This patch improves `_most_frequent` to gracefully handle arrays with mixed, incomparable types (e.g. `None`, `str`, `int`), which previously raised `TypeError` due to Python's `min()` not supporting heterogeneous comparisons.Fix:- Uses `type` + `hash` as a tie-breaking key to avoid direct comparison between incompatible types.- Keeps behaviour consistent with `scipy.stats.mode` tie-breaking logic.- Maintains deterministic and efficient output.Adds unit tests for edge cases:- Tie between `None` and `str`- Dominant `None`- Empty array handling- Tie involving extra_valueCloses:scikit-learn#31717
@github-actionsGitHub Actions
Copy link

github-actionsbot commentedJul 10, 2025
edited
Loading

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enablingpre-commit hooks. Instructions to enable them can be foundhere.

You can see the details of the linting issues under thelint jobhere


ruff check

ruff detected issues. Please runruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installedruff version isruff=0.11.7.

sklearn/impute/_base.py:18:27: F401 [*] `..utils.fixes._mode` imported but unused   |16 | from ..utils._missing import is_pandas_na, is_scalar_nan17 | from ..utils._param_validation import MissingValues, StrOptions18 | from ..utils.fixes import _mode   |                           ^^^^^ F40119 | from ..utils.sparsefuncs import _get_median20 | from ..utils.validation import (   |   = help: Remove unused import: `..utils.fixes._mode`Found 1 error.[*] 1 fixable with the `--fix` option.

Generated for commit:574b23d. Link to the linter CI:here

This update improves the _most_frequent function to gracefully handle arrays containing mixed, incomparable types (e.g., None, str, int). The patch uses a deterministic tie-breaking strategy based on type and hash to avoid TypeErrors, consistent with scipy.stats.mode behaviour.
…lintingThis update improves the _most_frequent function by:Ensuring safe, deterministic tie-breaking for mixed, incomparable types (e.g., None, str, int) using (type, hash) as a key.Fixing docstring formatting and style issues to meet project linting requirements.Aligning logic with scipy.stats.mode tie-breaking behavior.Improving code readability and PEP-8 compliance.
Ensure safe comparison in _most_frequent by using (str(type(x)), hash(x)) as a tie-breaking key.- Handles mixed types like None, str, int.- Prevents TypeError from Python's min() on incomparable values.- Aligns with scipy.stats.mode behaviour.- Ready for review and linted locally.
…akingThis patch improves the _most_frequent function to safely process arrays with mixed, incomparable types (e.g., None, str, int) without raising TypeError.
Copy link
Contributor

@0xs1d0xs1d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

lgtm with vectorization and type check. check the ci linting issue, though

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@0xs1d0xs1d0xs1d left review comments

At least 1 approving review is required to merge this pull request.

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

SimpleImputer fails in "most_frequent" if incomparable types only if ties
2 participants
@AHB30@0xs1d

[8]ページ先頭

©2009-2025 Movatter.jp