Download the scripts and extract them into a directory. Copy the imagenet_val_*.sh scripts into the validation set subdirectory of the dataset (val/) and run them from there in the following order:

Fix image labels based on confident learning:

./imagenet_val_1_image_fixes.sh

Remove the wrong or problematic images based on model consensus and confident learning:

./imagenet_val_2_image_removal.sh

Apply categorical fixes:

./imagenet_val_3_categorical_fixes.sh
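The three steps above have to run in the listed order from inside val/. As a rough illustration, the sequence could be wrapped in a small Python helper that stops at the first failing step. The names `ordered_scripts` and `run_cleanup` are hypothetical and not part of the repository, and invoking the scripts through `bash` is an assumption.

```python
import subprocess
from pathlib import Path

# Step suffixes in the order listed above.
STEPS = ("1_image_fixes", "2_image_removal", "3_categorical_fixes")


def ordered_scripts(prefix):
    """Return the script names for a prefix such as 'imagenet_val'."""
    return [f"{prefix}_{step}.sh" for step in STEPS]


def run_cleanup(directory, prefix="imagenet_val"):
    """Run the scripts from inside `directory`, stopping at the first error."""
    directory = Path(directory)
    scripts = ordered_scripts(prefix)
    missing = [s for s in scripts if not (directory / s).is_file()]
    if missing:
        raise FileNotFoundError(f"copy these scripts into {directory}: {missing}")
    for script in scripts:
        # Assumption: the scripts are bash-compatible.
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(["bash", script], cwd=directory, check=True)
```

Calling `run_cleanup("val")` would then run all three validation scripts. Stopping on the first error avoids applying categorical fixes to a partially processed directory.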

Clean up ImageNet 1k (Training set)

Download the scripts and extract them into a directory. Copy the imagenet_train_*.sh scripts into the training set subdirectory of the dataset (train/) and run them from there in the following order:

Fix image labels based on confident learning:

./imagenet_train_1_image_fixes.sh

Remove the wrong or problematic images based on model consensus and confident learning:

./imagenet_train_2_image_removal.sh

Apply categorical fixes:

./imagenet_train_3_categorical_fixes.sh

Optional steps:

Removing only the wrong images found by confident learning (a subset of step 2): imagenet_train_2_image_removal1.sh
Removing only the wrong images found by model consensus (a subset of step 2): imagenet_train_2_image_removal3.sh
Applying the image fixes and removals to the CAE/EDSR images (https://github.com/hendrycks/imagenet-r/tree/master/DeepAugment) before the categorical fixes: imagenet_train_cae_edsr_1_image_fixes.sh and imagenet_train_cae_edsr_2_image_removal.sh

Note: The CAE and EDSR scripts expect the CAE/EDSR images to be renamed to a new naming scheme (e.g. n01440764_10042.JPEG -> n01440764_10042_CAE.JPEG).
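The renaming in the note above can be sketched in Python as follows. The function names are hypothetical, and using `_EDSR` as the suffix for EDSR images is an assumption made by analogy with the `_CAE` example. The sketch defaults to a dry run that only prints the planned renames.

```python
from pathlib import Path


def suffixed_name(filename, suffix):
    """Insert _<suffix> before the extension: a.JPEG -> a_CAE.JPEG."""
    path = Path(filename)
    return f"{path.stem}_{suffix}{path.suffix}"


def rename_images(directory, suffix, dry_run=True):
    """Rename every .JPEG under `directory`; only prints when dry_run is set."""
    # sorted() collects all paths first, so renaming does not disturb the walk.
    for path in sorted(Path(directory).rglob("*.JPEG")):
        if path.stem.endswith(f"_{suffix}"):
            continue  # already renamed
        target = path.with_name(suffixed_name(path.name, suffix))
        print(f"{path} -> {target}")
        if not dry_run:
            path.rename(target)
```

For example, `rename_images("CAE_DIRECTORY", "CAE", dry_run=False)` would rename the files in place, where CAE_DIRECTORY is a placeholder for wherever the CAE images were extracted.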

Clean up ImageNetV2 Matched Frequency (Validation set)

Download the scripts and extract them into a directory. Copy the imagenetv2_*.sh scripts into the ImageNetV2 subdirectory and run them from there in the following order:

Fix image labels based on confident learning:

./imagenetv2_matched_frequency_format_1_image_fixes.sh

Remove the wrong or problematic images based on model consensus and confident learning:

./imagenetv2_matched_frequency_format_2_image_removal.sh

Apply categorical fixes:

./imagenetv2_matched_frequency_format_3_categorical_fixes.sh

Optional steps:

Removing only the wrong images found by confident learning (a subset of step 2): imagenetv2_matched_frequency_format_2_image_removal1.sh
Removing only the wrong images found by model consensus (a subset of step 2): imagenetv2_matched_frequency_format_2_image_removal3.sh
Renaming the alphabetical folder names to nxxxxxxx format: imagenetv2_folder_name_fixes.sh
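After the folder renaming step, a quick check can confirm that every class folder follows the new naming. The sketch below assumes the target names are WordNet IDs of the form n01440764 ("n" plus eight digits); the function name `non_wnid_folders` is hypothetical.

```python
import re
from pathlib import Path

# Assumed target format: 'n' followed by eight digits, e.g. n01440764.
WNID = re.compile(r"n\d{8}")


def non_wnid_folders(root):
    """List subdirectories of `root` whose names are not WordNet IDs."""
    return sorted(
        p.name
        for p in Path(root).iterdir()
        if p.is_dir() and not WNID.fullmatch(p.name)
    )
```

An empty list from `non_wnid_folders("IMAGENETV2_DIRECTORY")` would indicate that all folders were renamed.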

Pretrained PyTorch models

The pretrained models have the following name schema:

model_name-widthxheight-variant.pth.tar

model_name - efficientnet_b0, shufflenet_v2_x1_5 or squeezenet1_1
variant - baseline (trained on original ImageNet), clean (trained on ImageNet Clean), clean-imagenet-r (trained on ImageNet Clean with CAE/EDSR images)
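The naming scheme above can be split into its parts mechanically. The following sketch parses a checkpoint filename and derives the values for the `--input-size` option of timm's validate.py, which takes channels, height and width in that order. The function names are hypothetical, and three input channels (RGB) are assumed.

```python
import re

# model_name-widthxheight-variant.pth.tar
PATTERN = re.compile(
    r"(?P<model>.+?)-(?P<width>\d+)x(?P<height>\d+)-(?P<variant>.+)\.pth\.tar"
)


def parse_checkpoint_name(filename):
    """Split a checkpoint filename into model, width, height and variant."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {filename}")
    return {
        "model": match["model"],
        "width": int(match["width"]),
        "height": int(match["height"]),
        "variant": match["variant"],
    }


def input_size_args(filename):
    """Return [channels, height, width]; assumes 3-channel RGB input."""
    info = parse_checkpoint_name(filename)
    return [3, info["height"], info["width"]]


print(parse_checkpoint_name("efficientnet_b0-384x216-clean.pth.tar"))
print(input_size_args("efficientnet_b0-384x216-clean.pth.tar"))  # [3, 216, 384]
```

Note that the filename lists width before height, while `--input-size` expects height before width.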

Install PyTorch Image Models:

pip3 install timm

Pretrained PyTorch models (example validations)

Validate an EfficientNet-B0 model (trained on ImageNet Clean, portrait input 216x384) on the cleaned ImageNetV2 dataset (top-1/top-5: 69.26%/89.29%):

./validate.py --model efficientnet_b0 --checkpoint efficientnet_b0-384x216-clean.pth.tar -b 64 --log-interval 100 --input-size 3 216 384 --num-classes 1000 IMAGENETV2_DIRECTORY

Validate a SqueezeNet 1.1 model (trained on ImageNet Clean+CAE/EDSR, landscape input 320x180) on the ImageNet validation dataset (top-1/top-5: 60.89%/83.15%):

./validate.py --torchvision-model squeezenet1_1 --checkpoint squeezenet1_1-180x320-clean-imagenetr.pth.tar -b 64 --log-interval 100 --input-size 3 320 180 --num-classes 1000 IMAGENET_VALIDATION_DIRECTORY

Validate a ShuffleNetV2 (x1_5) model (trained on the original ImageNet, standard input 224x224) on the cleaned ImageNet validation dataset (top-1/top-5: 77.93%/94.57%):

./validate.py --hub-model-github-or-dir kecsap/vision --hub-model shufflenet_v2_x1_5 --checkpoint shufflenet_v2_x1_5-224x224-baseline.pth.tar -b 64 --log-interval 100 --num-classes 1000 IMAGENET_VALIDATION_DIRECTORY
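The top-1/top-5 figures quoted above are standard top-k accuracies. As a reminder of what they measure, here is a minimal pure-Python sketch on made-up toy scores; it is not the code validate.py uses, and the function name is hypothetical.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data: four samples, three classes (values are illustrative only).
scores = [
    [0.1, 0.7, 0.2],    # highest score: class 1
    [0.5, 0.3, 0.2],    # highest score: class 0
    [0.2, 0.3, 0.5],    # highest score: class 2
    [0.4, 0.35, 0.25],  # highest score: class 0
]
labels = [1, 1, 2, 2]

print(topk_accuracy(scores, labels, k=1))  # 50.0
print(topk_accuracy(scores, labels, k=2))  # 75.0
```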

Citation

If this helps your research, please cite the paper (https://arxiv.org/abs/2103.16324):

@misc{kertész2021automated,
      title={Automated Cleanup of the ImageNet Dataset by Model Consensus, Explainability and Confident Learning},
      author={Csaba Kertész},
      year={2021},
      eprint={2103.16324},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

About

Automated cleanup of ImageNet 1k and ImageNetV2 datasets
