Closed
Description
I have a large dataset whose categorical features have a huge number of levels. I usually encode this dataset with One-Hot Encoding and the min_frequency parameter.
However, SMOTENC seems to use One-Hot Encoding under the hood (without min_frequency), which leads to memory overflow in my case.
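To give a rough sense of the column counts involved, here is a toy sketch in plain Python. It only mimics the idea of grouping rare levels into a single "infrequent" column; the helper `encoded_width` and the data are made up for illustration and are not how scikit-learn or imbalanced-learn implement encoding.

```python
from collections import Counter


def encoded_width(values, min_frequency=None):
    """Count the one-hot columns needed for one categorical feature.

    Hypothetical helper: levels seen fewer than `min_frequency` times
    are assumed to share a single "infrequent" column.
    """
    counts = Counter(values)
    if min_frequency is None:
        # One column per distinct level.
        return len(counts)
    frequent = sum(1 for c in counts.values() if c >= min_frequency)
    has_infrequent = frequent < len(counts)
    return frequent + (1 if has_infrequent else 0)


# Toy feature: 2 common levels plus 1000 levels that each appear once.
values = ["a"] * 500 + ["b"] * 500 + [f"rare_{i}" for i in range(1000)]

full_width = encoded_width(values)                      # no threshold
capped_width = encoded_width(values, min_frequency=10)  # rare levels grouped
print(full_width, capped_width)
```

With many such features, the gap between the two widths is what separates an encoding that fits in memory from one that does not.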
I suggest adding a SMOTENC variant that does not apply One-Hot Encoding internally but relies on the user to do it beforehand. As before, SMOTENC_NO_ONE_HOT would accept a categorical_features parameter, but these categorical_features would actually be binary 0/1 features.
Moreover, this would be more efficient, since the internal encoding and reverse encoding would no longer be needed.
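A minimal sketch of what generating one synthetic row could look like on already-encoded data, in plain Python to stay dependency-free. Everything here is an assumption for illustration: the function `synthesize` and the `onehot_groups` argument (which columns belong to the same original categorical feature) are hypothetical, not part of imbalanced-learn, and the nearest-neighbour search is left out. Continuous columns are interpolated as in SMOTE, and each categorical group takes the level that is most common among the neighbours.

```python
import random
from collections import Counter


def synthesize(sample, neighbors, continuous_idx, onehot_groups, rng):
    """Build one synthetic row from `sample` and its nearest `neighbors`.

    continuous_idx: column indices of continuous features.
    onehot_groups: list of lists; each inner list holds the column indices
        of one pre-encoded categorical feature (hypothetical argument).
    """
    neighbor = rng.choice(neighbors)
    gap = rng.random()
    new_row = [0.0] * len(sample)
    # Continuous part: interpolate between the sample and one neighbor.
    for j in continuous_idx:
        new_row[j] = sample[j] + gap * (neighbor[j] - sample[j])
    # Categorical part: majority vote over the active column of each neighbor,
    # so every group ends up with exactly one 1.
    for group in onehot_groups:
        votes = Counter(max(group, key=lambda j: row[j]) for row in neighbors)
        winner = votes.most_common(1)[0][0]
        new_row[winner] = 1.0
    return new_row


rng = random.Random(0)
sample = [1.0, 1.0, 0.0, 0.0]
neighbors = [
    [2.0, 1.0, 0.0, 0.0],
    [3.0, 0.0, 1.0, 0.0],
    [2.5, 1.0, 0.0, 0.0],
]
new_row = synthesize(sample, neighbors, continuous_idx=[0],
                     onehot_groups=[[1, 2, 3]], rng=rng)
print(new_row)
```

One design point this raises: if each binary column were voted on independently rather than per group, a synthetic row could end up with zero or several active columns for the same original feature, so the variant would probably need to know which columns belong together.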