Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork26k
BUG: Fix _most_frequent to safely handle incomparable types (e.g. Non…#31737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
…e vs str)This patch improves `_most_frequent` to gracefully handle arrays with mixed, incomparable types (e.g. `None`, `str`, `int`), which previously raised `TypeError` due to Python's `min()` not supporting heterogeneous comparisons.Fix:- Uses `type` + `hash` as a tie-breaking key to avoid direct comparison between incompatible types.- Keeps behaviour consistent with `scipy.stats.mode` tie-breaking logic.- Maintains deterministic and efficient output.Adds unit tests for edge cases:- Tie between `None` and `str`- Dominant `None`- Empty array handling- Tie involving extra_valueCloses:scikit-learn#31717
github-actionsbot commentedJul 10, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
❌ Linting issuesThis PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling You can see the details of the linting issues under the
|
This update improves the _most_frequent function to gracefully handle arrays containing mixed, incomparable types (e.g., None, str, int). The patch uses a deterministic tie-breaking strategy based on type and hash to avoid TypeErrors, consistent with scipy.stats.mode behaviour.
…lintingThis update improves the _most_frequent function by:Ensuring safe, deterministic tie-breaking for mixed, incomparable types (e.g., None, str, int) using (type, hash) as a key.Fixing docstring formatting and style issues to meet project linting requirements.Aligning logic with scipy.stats.mode tie-breaking behavior.Improving code readability and PEP-8 compliance.
Ensure safe comparison in _most_frequent by using (str(type(x)), hash(x)) as a tie-breaking key.- Handles mixed types like None, str, int.- Prevents TypeError from Python's min() on incomparable values.- Aligns with scipy.stats.mode behaviour.- Ready for review and linted locally.
…akingThis patch improves the _most_frequent function to safely process arrays with mixed, incomparable types (e.g., None, str, int) without raising TypeError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
lgtm with vectorization and type check. check the ci linting issue, though
Uh oh!
There was an error while loading.Please reload this page.
Fixes#31717
Hi maintainers
This PR addresses a
TypeError
in_most_frequent
when arrays contain mixed, incomparable types (e.g.,None
,str
,int
).The patch ensures safe, deterministic tie-breaking using
type
+hash
, in line withscipy.stats.mode
.Unit tests are included to cover:
None
andstr
None
extra_value
Thanks for your time and review!