[ENH] Make SMOTENC version without internal One-Hot Encoding #990

Closed

#1000

Description

ArseniuML opened on May 2, 2023

I have a large dataset with categorical features that have a huge number of levels. Usually, I encode this dataset with One-Hot Encoding using the min_frequency parameter.
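To illustrate why min_frequency matters here, below is a minimal pure-Python sketch of infrequent-category grouping. It is a simplified model of the idea, not scikit-learn's implementation; the helper `one_hot_columns`, the `<infrequent>` label, and the sample data are all hypothetical.

```python
from collections import Counter


def one_hot_columns(values, min_frequency=None):
    """Return the column labels a one-hot encoding of `values` would need.

    Categories seen fewer than `min_frequency` times share a single
    "<infrequent>" column (hypothetical label, simplified model).
    """
    counts = Counter(values)
    if min_frequency is None:
        # One column per distinct category.
        return sorted(counts)
    frequent = sorted(c for c, n in counts.items() if n >= min_frequency)
    has_infrequent = any(n < min_frequency for n in counts.values())
    return frequent + (["<infrequent>"] if has_infrequent else [])


# Hypothetical column: two common levels plus 1000 levels seen once each.
column = ["a"] * 500 + ["b"] * 300 + [f"rare_{i}" for i in range(1000)]

print(len(one_hot_columns(column)))                    # 1002
print(len(one_hot_columns(column, min_frequency=10)))  # 3
```

With grouping, the 1000 rare levels collapse into one column, which is the reduction that is lost when the encoding is done without min_frequency.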

But it seems that SMOTENC uses One-Hot Encoding under the hood (without min_frequency), and this leads to memory overflow in my case.
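A rough back-of-the-envelope estimate shows how quickly this can blow up. The sketch below assumes the encoded matrix is held densely as 8-byte floats, which is an assumption for illustration only (the issue does not say how SMOTENC stores it); the function name, row count, and level counts are hypothetical.

```python
def dense_one_hot_bytes(n_rows, levels_per_feature, bytes_per_value=8):
    """Bytes needed for a dense one-hot matrix (assumed float64 storage)."""
    total_columns = sum(levels_per_feature)
    return n_rows * total_columns * bytes_per_value


# Hypothetical dataset: 1,000,000 rows and three categorical features.
full = dense_one_hot_bytes(1_000_000, [50_000, 20_000, 5_000])
# Same features after grouping rare levels (hypothetical level counts).
reduced = dense_one_hot_bytes(1_000_000, [200, 150, 50])

print(full / 1e9)     # 600.0 (GB)
print(reduced / 1e9)  # 3.2 (GB)
```

Sparse storage would change these numbers substantially, so treat this only as an indication of why the number of levels dominates the cost.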

I suggest creating a SMOTENC version that does not use One-Hot Encoding internally but relies on the user to do it. As before, SMOTENC_NO_ONE_HOT should accept the categorical_features parameter, but these categorical features would actually be binary 0-1 features.

Moreover, this would be more efficient: there would be no need for internal encoding and internal reverse encoding.
