Fix: Found array with 0 sample(s) #743

Open: allenyllee wants to merge 1 commit into scikit-learn-contrib:master from allenyllee:patch-1

Conversation

allenyllee

Symptom:
When using `SVMSMOTE` on a dataset whose minority class has very few samples (possibly fewer than 10), it raises `ValueError: Found array with 0 sample(s) (shape=(0, 600)) while a minimum of 1 is required.`

Reference Issue

#742

What does this implement/fix? Explain your changes.

Root cause:
The line `noise_bool = self._in_danger_noise(...)` flags noise samples based on the `n_neighbors` attribute of the `kneighbors` estimator, which comes from the `m_neighbors` attribute of the `SVMSMOTE` class. If `SVMSMOTE` is initialized with a very large `m_neighbors`, for example `SVMSMOTE(m_neighbors=1000)`, the error goes away. This is because the neighbor search range is then large enough to include another minority class data point, so the center data point is not treated as noise by the check `n_maj == nn_estimator.n_neighbors - 1`. But when `m_neighbors` is small (the default is 10) and the minority class has very few samples, every minority class sample may be treated as noise, so the returned `noise_bool` is all true. `_safe_indexing(...)` then removes all of these samples, leaving zero rows in `support_vector`.
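To make the root cause concrete, here is a minimal NumPy-only sketch of the noise criterion described above. It is not the library's implementation: `noise_mask` is a hypothetical helper, the toy data is invented for illustration, and the function takes the number of neighbors searched (excluding the sample itself) directly instead of going through a neighbors estimator.

```python
import numpy as np


def noise_mask(X, y, samples, target_class, m_neighbors):
    """Hypothetical helper: flag samples whose m nearest neighbours
    (the sample itself excluded) all belong to another class."""
    # Distance from every candidate sample to every point in X.
    dist = np.linalg.norm(samples[:, None, :] - X[None, :, :], axis=2)
    # Column 0 is the sample itself (distance 0), so skip it.
    idx = np.argsort(dist, axis=1, kind="stable")[:, 1 : m_neighbors + 1]
    n_maj = (y[idx] != target_class).sum(axis=1)
    return n_maj == m_neighbors


# Toy data (assumption): 3 isolated minority points, each surrounded
# by 12 majority points that are much closer than any other minority point.
centers = np.array([0.0, 100.0, 200.0])
offsets = np.array([-3.0, -2.5, -2.0, -1.5, -1.0, -0.5,
                    0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
minority = np.column_stack([centers, np.zeros_like(centers)])
maj_x = (centers[:, None] + offsets[None, :]).ravel()
majority = np.column_stack([maj_x, np.zeros_like(maj_x)])
X = np.vstack([minority, majority])
y = np.array([1] * len(minority) + [0] * len(majority))

# Small neighbourhood: only majority points are found, so every
# minority sample is flagged as noise and filtering leaves nothing.
noise_small = noise_mask(X, y, minority, target_class=1, m_neighbors=10)
remaining = minority[~noise_small]

# Neighbourhood covering all other points: other minority samples
# are reached, so nothing is flagged.
noise_large = noise_mask(X, y, minority, target_class=1, m_neighbors=38)
```

With the small neighborhood, `remaining` has shape `(0, 2)`, which is the kind of empty array that triggers the reported `ValueError` downstream.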

Solution:
Save `support_vector` before trimming noise data points. After trimming, check whether the length of `support_vector` is zero; if so, restore the previously saved `support_vector`. This forces every minority data point to be used as `support_vector`.
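The save-and-restore idea can be sketched as follows. This is an illustration under assumptions, not the patch itself: `trim_noise_with_fallback` is a hypothetical name, and plain boolean indexing stands in for `_safe_indexing(...)`.

```python
import numpy as np


def trim_noise_with_fallback(support_vector, noise_bool):
    """Hypothetical helper: drop noisy rows, but fall back to the
    untrimmed array if trimming would leave no rows at all."""
    saved = support_vector                       # keep the original rows
    trimmed = support_vector[np.logical_not(noise_bool)]
    if trimmed.shape[0] == 0:
        # Every row was flagged as noise: restore the saved array.
        return saved
    return trimmed


sv = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

# Corner case: all rows flagged, so the original rows are kept.
all_noise = trim_noise_with_fallback(sv, np.array([True, True, True]))

# Normal case: only the flagged row is removed.
some_noise = trim_noise_with_fallback(sv, np.array([True, False, False]))
```

The fallback changes behavior only in the all-noise case; whenever at least one row survives trimming, the result is the same as without it.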

Any other comments?

@pep8speaks

Hello @allenyllee! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 557:1: W293 blank line contains whitespace
Line 559:1: W293 blank line contains whitespace
Line 566:1: W293 blank line contains whitespace
Line 569:1: W293 blank line contains whitespace
Line 612:17: W503 line break before binary operator
Line 857:89: E501 line too long (91 > 88 characters)

@glemaitre
Member

You will need to correct the PEP8 issues. I think that we should raise a warning as well, because we are not strictly performing the expected algorithm (but we are in a corner case).
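One way the suggested warning could look, as a sketch only: `restore_if_empty` is a hypothetical helper, and the message text and the choice of `UserWarning` are assumptions rather than anything agreed in this thread.

```python
import warnings

import numpy as np


def restore_if_empty(trimmed, saved):
    """Hypothetical helper: return `trimmed`, or warn and return
    `saved` when trimming removed every row."""
    if len(trimmed) == 0:
        warnings.warn(
            "All support vectors were flagged as noise; keeping them "
            "all instead. Consider increasing m_neighbors.",
            UserWarning,
            stacklevel=2,
        )
        return saved
    return trimmed


saved = np.array([[0.0, 0.0], [1.0, 1.0]])
empty = saved[:0]  # what trimming yields when every row is noise

# Capture the warning so the corner case can be checked explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = restore_if_empty(empty, saved)
```

Emitting a warning rather than raising keeps resampling usable on very small minority classes while still telling the user that the noise-removal step was skipped.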

@glemaitre
Member

We will need a non-regression test (the one you posted in the issue) and an entry in what's new as well, since it would impact the end user.

3 participants
@allenyllee, @pep8speaks, @glemaitre
