DOC improve documentation of NCR #1017

Open
solegalli wants to merge 8 commits into scikit-learn-contrib:master from solegalli:update_ncl_docs

Conversation

solegalli
Contributor

Reword documentation and docstrings for the NCR.

Related to #854

solegalli
Contributor, Author

@glemaitre ready for review

^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :class:`NeighbourhoodCleaningRule` is another "cleaning" algorithm. It removes
samples from the majority class that are closest to the boundary with the minority
Member

Suggested change
- samples from the majority class that are closest to the boundary with the minority
+ samples from the majority class that are the closest to the boundary formed by the samples of the minority class

Contributor, Author

I don't totally understand this sentence. Let me try a modification in a new commit.


The :class:`NeighbourhoodCleaningRule` expands on the cleaning performed by
:class:`EditedNearestNeighbours` by eliminating additional majority class samples if
they are among the 3 closest neighbours of a sample from the minority class.
Member

We have a parameter controlling the 3-NN.

Suggested change
- they are among the 3 closest neighbours of a sample from the minority class.
+ they are among the :math:`N` closest neighbours (i.e. using the parameter `n_neighbours`) of a sample from the minority class.

Contributor, Author

Throughout the docs we are using K as the number of neighbours, not N. I guess the n in n_neighbours comes from n=number. I'd rather stick to K if that's alright with you, for consistency. I'll fix this in a separate commit.

Contributor, Author

Actually, I removed this sentence altogether as per below suggestion.

The procedure for the :class:`NeighbourhoodCleaningRule` is as follows:

1. Remove observations from the majority class with edited nearest neighbors (ENN).
2. Remove additional samples from the majority class if they are one of the k closest
Member

Since we are repeating the same sentence as above, I would remove the paragraph above and keep only the bullet point sequence.

solegalli reacted with thumbs up emoji
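
The two-step procedure under review can be sketched in plain Python as follows. This is a minimal illustration that follows the wording quoted in this thread, not the library's implementation: the function names `k_nearest` and `ncr_sketch` are hypothetical, Euclidean distance and a simple majority vote for the ENN step are assumptions, and K defaults to 3 only because that is the value mentioned above.

```python
from collections import Counter
from math import dist


def k_nearest(index, points, k):
    """Return indices of the k points closest to points[index], excluding itself."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: dist(points[index], points[j]))
    return others[:k]


def ncr_sketch(points, labels, k=3):
    """Return the indices kept after the two illustrative cleaning steps."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    removed = set()

    # Step 1 (ENN-like, assumed voting rule): drop a majority sample when
    # fewer than half of its k nearest neighbours share its class.
    for i, label in enumerate(labels):
        if label == minority:
            continue
        neighbour_labels = [labels[j] for j in k_nearest(i, points, k)]
        if neighbour_labels.count(label) * 2 < len(neighbour_labels):
            removed.add(i)

    # Step 2: drop majority samples that are among the k nearest
    # neighbours of any minority sample.
    for i, label in enumerate(labels):
        if label != minority:
            continue
        for j in k_nearest(i, points, k):
            if labels[j] != minority:
                removed.add(j)

    return [i for i in range(len(points)) if i not in removed]


# Toy data: three minority samples (class 1) near the origin, one majority
# sample (class 0) right next to them, and four majority samples far away.
points = [(0, 0), (1, 0), (0, 1), (2, 0), (50, 50), (50, 51), (51, 50), (51, 51)]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(ncr_sketch(points, labels, k=3))  # [0, 1, 2, 4, 5, 6, 7]
```

On this toy data only the majority sample at index 3, which sits next to the minority cluster, is removed; both steps are evaluated on the original data and their removals are combined.
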
To carry out step 2 there is one condition: a sample will only be removed if its class
has a minimum number of observations. The minimum number of observations is regulated
by the `threshold_cleaning` parameter. In the original article
:cite:`laurikkala2001improving`, samples would be removed if the class had at
Member

I would not go into detail regarding the original paper. Instead, I would just state that we check that the number of samples in the class to under-sample is above the threshold times the number of samples in the minority class.

solegalli reacted with thumbs up emoji
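
The condition described in the review comment above can be sketched as follows. It is a minimal illustration, assuming the check is "class count greater than `threshold_cleaning` times the minority class count" exactly as phrased by the reviewer; the helper name `classes_eligible_for_cleaning` is hypothetical and the library's actual check may differ.

```python
from collections import Counter


def classes_eligible_for_cleaning(labels, threshold_cleaning=0.5):
    """Return the non-minority classes large enough to be cleaned in step 2."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    n_minority = counts[minority]
    # Assumed rule: a class qualifies only if its size exceeds
    # threshold_cleaning times the size of the minority class.
    return sorted(
        cls
        for cls, n_samples in counts.items()
        if cls != minority and n_samples > threshold_cleaning * n_minority
    )


# Toy labels: 100 samples of "a", 20 of "b", 4 of "c" (the minority class).
labels = ["a"] * 100 + ["b"] * 20 + ["c"] * 4
print(classes_eligible_for_cleaning(labels, threshold_cleaning=10))   # ['a']
print(classes_eligible_for_cleaning(labels, threshold_cleaning=0.5))  # ['a', 'b']
```

With a threshold of 10, class "b" (20 samples) does not exceed 10 x 4 = 40 and is left alone in step 2, while class "a" (100 samples) qualifies.
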
glemaitre changed the title from "re-word explanation and docstrings of NCR" to "DOC improve documentation of NCR" on Jul 10, 2023
solegalli and others added 3 commits on July 11, 2023 13:37
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
solegalli
Contributor, Author

How can I check the linting error message?

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
solegalli
Contributor, Author

Thank you!

Reviewers

glemaitre left review comments


2 participants: @solegalli, @glemaitre
