scikit-learn-contrib/imbalanced-learn (Public)


FIX BorderlineSMOTE-2 uses the full dataset to generate new samples #1023


Merged

glemaitre merged 5 commits into scikit-learn-contrib:master from glemaitre:is/846

Jul 10, 2023


glemaitre (Member) commented Jul 10, 2023

Closes #861

Make sure that we use the full dataset to generate new samples in BorderlineSMOTE version 2.
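As a rough illustration of what "use the full dataset" means for the neighbour search, here is a minimal pure-Python sketch; it is not the imbalanced-learn code. It assumes that a Borderline-SMOTE1 style step looks for neighbours of each "danger" sample among minority samples only, while the Borderline-SMOTE2 step searches the whole dataset. The helpers `k_nearest` and `neighbour_pool` and the toy data are hypothetical. Because the danger sample is itself part of the pool, the first neighbour returned is the sample itself, which is why the reviewed snippet slices `[:, 1:]`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_nearest(query: Point, pool: Sequence[Point], k: int) -> List[int]:
    """Indices of the k points of `pool` closest to `query` (Euclidean).

    When `query` itself belongs to `pool`, it is returned first (distance 0).
    """
    order = sorted(range(len(pool)), key=lambda i: math.dist(query, pool[i]))
    return order[:k]


def neighbour_pool(
    X: Sequence[Point], y: Sequence[int], minority_label: int, variant: int
) -> List[Point]:
    """Hypothetical helper: the samples the neighbour search runs over.

    variant=1 -> minority samples only (Borderline-SMOTE1 style)
    variant=2 -> the full dataset (the behaviour this PR aims for)
    """
    if variant == 1:
        return [x for x, label in zip(X, y) if label == minority_label]
    return list(X)


# Toy data: label 1 is the minority class, X[0] is treated as a "danger" sample.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.2, 0.1), (0.9, 0.9)]
y = [1, 1, 1, 0, 0]
danger = X[0]

# Ask for k + 1 neighbours and drop the first one (the sample itself),
# mirroring the `[:, 1:]` slice in the reviewed snippet.
k = 2
nn_minority = k_nearest(danger, neighbour_pool(X, y, 1, variant=1), k + 1)[1:]
nn_full = k_nearest(danger, neighbour_pool(X, y, 1, variant=2), k + 1)[1:]

print(nn_minority)  # indices into the minority-only pool
print(nn_full)      # indices into X; index 3 is a majority sample
```

With the full dataset, the closest neighbour of the danger sample is the majority sample at index 3, which a minority-only search can never return.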

Commit af5d20a: FIX make sure that BorderlineSMOTE-2 uses the full dataset to generate synthetic samples

glemaitre marked this pull request as draft (July 10, 2023 19:24)

glemaitre added 2 commits (July 10, 2023 22:33): 6bd1fba "iter", bcf16cc "iter"

glemaitre marked this pull request as ready for review (July 10, 2023 20:39)

glemaitre added 2 commits (July 10, 2023 22:50): c09eb5a "iter", 638b0f4 "iter"

glemaitre merged commit 2859cb0 into scikit-learn-contrib:master (Jul 10, 2023)

solegalli reviewed Jul 11, 2023

imblearn/over_sampling/_smote/filter.py


		self.nn_k_.fit(X_to_sample_from)
		nns = self.nn_k_.kneighbors(X_danger, return_distance=False)[:, 1:]
		X_new, y_new = self._make_samples(

solegalli (Contributor) commented Jul 11, 2023

This implementation does not fully reflect the description of Borderline-SMOTE2 in the paper. According to the paper, when a synthetic sample is created by interpolating between a minority template and one of its majority neighbours, the difference is multiplied by a random factor between 0 and 0.5 (instead of between 0 and 1), so that the synthetic sample lies closer to the minority sample.

If I understand this code correctly, we multiply everything by a factor between 0 and 1. Please correct me if I am wrong.
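A minimal sketch of the interpolation rule described in this comment, assuming the reviewer's reading of the paper: a synthetic sample is `x + gap * (neighbour - x)`, with `gap` drawn between 0 and 1 for a minority neighbour and between 0 and 0.5 for a majority neighbour. This is not the imbalanced-learn implementation or the eventual fix; `make_sample` and its parameters are hypothetical.

```python
import random
from typing import Tuple

Point = Tuple[float, ...]


def make_sample(
    x: Point, neighbour: Point, neighbour_is_minority: bool, rng: random.Random
) -> Point:
    """Hypothetical helper: one synthetic sample on the segment x -> neighbour.

    The gap is drawn uniformly between 0 and 1 for a minority neighbour, and
    between 0 and 0.5 for a majority neighbour, so a sample generated towards
    the majority class never passes the midpoint of the segment, i.e. it stays
    at least as close to the minority sample `x` as to the majority neighbour.
    """
    upper = 1.0 if neighbour_is_minority else 0.5
    gap = rng.uniform(0.0, upper)
    return tuple(xi + gap * (ni - xi) for xi, ni in zip(x, neighbour))


rng = random.Random(0)
x = (0.0, 0.0)
neighbour = (1.0, 1.0)

# Same pair of points, treated once as a minority and once as a majority neighbour.
minority_samples = [make_sample(x, neighbour, True, rng) for _ in range(1000)]
majority_samples = [make_sample(x, neighbour, False, rng) for _ in range(1000)]

# Majority-side samples never go past the midpoint (0.5, 0.5).
print(max(c for s in majority_samples for c in s) <= 0.5)
```

Only the upper bound of the random gap changes between the two cases; the neighbour search itself is untouched.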

glemaitre (Member, Author) commented Jul 11, 2023

No, you are right. I forgot to look at the next page of the article. I will try to propose a fix.

