Automated cleanup of ImageNet 1k and ImageNetV2 datasets

kecsap/imagenet-clean

This repository contains Bash scripts to clean up the ImageNet 1k dataset, together with PyTorch models pretrained in different configurations.

The Bash scripts can be downloaded from https://www.dropbox.com/s/pyzem2svhnx5h6m/imagenet_clean_scripts.tar.gz?dl=0.

The pretrained PyTorch models can be downloaded from https://www.dropbox.com/s/lzm60bz90wfl6ys/imagenet_clean_models.tar.gz?dl=0.

Requirements

Clean up ImageNet 1k (Validation set)

Download and extract the scripts in a directory. Copy the imagenet_val_*.sh scripts into the validation set subdirectory of the dataset (val/) and execute the scripts in the following order:

  1. Fix image labels based on confident learning:
./imagenet_val_1_image_fixes.sh
  2. Remove the wrong or problematic images based on model consensus and confident learning:
./imagenet_val_2_image_removal.sh
  3. Apply categorical fixes:
./imagenet_val_3_categorical_fixes.sh
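
The three steps are meant to run in a fixed order, so it can help to drive them through a small wrapper that stops at the first failure. The following is a minimal sketch and not part of the downloaded archive: `run_in_order` is a hypothetical helper name, and it assumes the scripts have already been copied into the target directory.

```shell
#!/usr/bin/env bash
# Sketch: run the given scripts one after another inside a dataset directory
# and stop at the first one that is missing or exits with a non-zero status.
# `run_in_order` is a hypothetical helper, not provided by this repository.
run_in_order() {
    local dir="$1"; shift
    local script
    for script in "$@"; do
        if [ ! -f "$dir/$script" ]; then
            echo "missing: $script" >&2
            return 1
        fi
        echo "running: $script"
        # Subshell keeps the caller's working directory unchanged.
        ( cd "$dir" && bash "./$script" ) || {
            echo "failed: $script" >&2
            return 1
        }
    done
}

# Example call (commented out; adjust the directory to your dataset layout):
# run_in_order val \
#     imagenet_val_1_image_fixes.sh \
#     imagenet_val_2_image_removal.sh \
#     imagenet_val_3_categorical_fixes.sh
```

The same helper can be pointed at train/ or the ImageNetV2 directory with the corresponding script names.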

Clean up ImageNet 1k (Training set)

Download and extract the scripts in a directory. Copy the imagenet_train_*.sh scripts into the training set subdirectory of the dataset (train/) and execute the scripts in the following order:

  1. Fix image labels based on confident learning:
./imagenet_train_1_image_fixes.sh
  2. Remove the wrong or problematic images based on model consensus and confident learning:
./imagenet_train_2_image_removal.sh
  3. Apply categorical fixes:
./imagenet_train_3_categorical_fixes.sh
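
Because step 2 removes images, comparing file counts before and after it gives a quick sanity check. This is a minimal sketch under two assumptions: the images use the .JPEG extension of the standard ImageNet file names, and the removal step deletes the files or moves them out of the directory. `count_images` is a hypothetical helper, not part of the script archive.

```shell
#!/usr/bin/env bash
# Sketch: count JPEG files below a directory (recursively, case-insensitive).
# `count_images` is a hypothetical helper name.
count_images() {
    find "$1" -type f -iname '*.JPEG' | wc -l | tr -d '[:space:]'
}

# Example use around the removal step (commented out; run inside train/):
# before=$(count_images .)
# ./imagenet_train_2_image_removal.sh
# after=$(count_images .)
# echo "removed $((before - after)) images"
```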

Optional steps:

  • Removing the wrong images only found by confident learning (a subset of point 2): imagenet_train_2_image_removal1.sh
  • Removing the wrong images only found by model consensus (a subset of point 2): imagenet_train_2_image_removal3.sh
  • Applying the fixes and removal for CAE/EDSR images (https://github.com/hendrycks/imagenet-r/tree/master/DeepAugment) before the category fixes: imagenet_train_cae_edsr_1_image_fixes.sh and imagenet_train_cae_edsr_2_image_removal.sh

Note: The CAE and EDSR scripts expect the CAE/EDSR images to be renamed to a new naming scheme (e.g. n01440764_10042.JPEG -> n01440764_10042_CAE.JPEG).
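
The renaming described in the note can be sketched as follows. This is an illustration under stated assumptions, not a script from the archive: `suffixed_name` and `rename_with_suffix` are hypothetical helpers, and only the `_CAE` suffix is confirmed by the example above (an `_EDSR` suffix for EDSR images is an assumption).

```shell
#!/usr/bin/env bash
# Sketch: append a tag before the file extension,
# e.g. n01440764_10042.JPEG -> n01440764_10042_CAE.JPEG.
# Both functions are hypothetical helpers.
suffixed_name() {
    local file="$1" tag="$2"
    local stem="${file%.*}"   # path without the last extension
    local ext="${file##*.}"   # last extension, e.g. JPEG
    printf '%s_%s.%s\n' "$stem" "$tag" "$ext"
}

# Rename every *.JPEG file in one directory, skipping files already tagged.
rename_with_suffix() {
    local dir="$1" tag="$2" f
    for f in "$dir"/*.JPEG; do
        [ -e "$f" ] || continue                        # no match in this dir
        case "$f" in *_"$tag".JPEG) continue ;; esac   # already renamed
        mv -- "$f" "$(suffixed_name "$f" "$tag")"
    done
}

# Example (commented out): tag all CAE images, one class folder at a time.
# for class_dir in CAE/*/; do rename_with_suffix "$class_dir" CAE; done
```

Skipping already tagged files makes the rename safe to run twice on the same directory.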

Clean up ImageNetV2 Matched Frequency (Validation set)

Download and extract the scripts in a directory. Copy the imagenetv2_*.sh scripts into the ImageNetV2 subdirectory and execute the scripts in the following order:

  1. Fix image labels based on confident learning:
./imagenetv2_matched_frequency_format_1_image_fixes.sh
  2. Remove the wrong or problematic images based on model consensus and confident learning:
./imagenetv2_matched_frequency_format_2_image_removal.sh
  3. Apply categorical fixes:
./imagenetv2_matched_frequency_format_3_categorical_fixes.sh

Optional steps:

  • Removing the wrong images only found by confident learning (a subset of point 2): imagenetv2_matched_frequency_format_2_image_removal1.sh
  • Removing the wrong images only found by model consensus (a subset of point 2): imagenetv2_matched_frequency_format_2_image_removal3.sh
  • Renaming the alphabetical folder names to the nxxxxxxx format: imagenetv2_folder_name_fixes.sh

Pretrained PyTorch models

The pretrained models follow this naming scheme:

model_name-widthxheight-variant.pth.tar

  • model_name - efficientnet_b0, shufflenet_v2_x1_5 or squeezenet1_1
  • variant - baseline (trained on original ImageNet), clean (trained on ImageNet Clean), clean-imagenet-r (trained on ImageNet Clean with CAE/EDSR images)
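
A checkpoint name can be split back into its parts with plain Bash parameter expansion. The sketch below is illustrative only: `parse_checkpoint` and `input_size_args` are hypothetical helpers, and the mapping from `widthxheight` to `--input-size 3 height width` is inferred from the example validation commands in this README rather than stated by the author.

```shell
#!/usr/bin/env bash
# Sketch: split "model_name-widthxheight-variant.pth.tar" into its fields.
# Prints: model width height variant
# `parse_checkpoint` is a hypothetical helper name.
parse_checkpoint() {
    local base="${1##*/}"          # drop any leading directory
    base="${base%.pth.tar}"        # drop the extension
    local model="${base%%-*}"      # text before the first hyphen
    local rest="${base#*-}"
    local size="${rest%%-*}"       # widthxheight
    local variant="${rest#*-}"     # may itself contain hyphens
    printf '%s %s %s %s\n' "$model" "${size%%x*}" "${size#*x}" "$variant"
}

# Build the --input-size argument (channels, height, width) for validate.py.
# Assumption: the file name stores width first, as the examples suggest.
input_size_args() {
    local model width height variant
    read -r model width height variant <<< "$(parse_checkpoint "$1")"
    printf -- '--input-size 3 %s %s\n' "$height" "$width"
}
```

For example, `input_size_args efficientnet_b0-384x216-clean.pth.tar` prints `--input-size 3 216 384`.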

Install PyTorch Image Models (timm):

pip3 install timm

Pretrained PyTorch models (example validations)

Validate an EfficientNet-B0 model (trained on ImageNet Clean, portrait input 216x384) on the cleaned ImageNetV2 dataset (top-1/top-5: 69.26 %/89.29 %):

./validate.py --model efficientnet_b0 --checkpoint efficientnet_b0-384x216-clean.pth.tar -b 64 --log-interval 100 --input-size 3 216 384 --num-classes 1000 IMAGENETV2_DIRECTORY

Validate a SqueezeNet 1.1 model (trained on ImageNet Clean+CAE/EDSR, landscape input 320x180) on the ImageNet validation dataset (top-1/top-5: 60.89 %/83.15 %):

./validate.py --torchvision-model squeezenet1_1 --checkpoint squeezenet1_1-180x320-clean-imagenetr.pth.tar -b 64 --log-interval 100 --input-size 3 320 180 --num-classes 1000 IMAGENET_VALIDATION_DIRECTORY

Validate a ShuffleNetV2 (x1_5) model (trained on original ImageNet, standard input 224x224) on the cleaned ImageNet validation dataset (top-1/top-5: 77.93 %/94.57 %):

./validate.py --hub-model-github-or-dir kecsap/vision --hub-model shufflenet_v2_x1_5 --checkpoint shufflenet_v2_x1_5-224x224-baseline.pth.tar -b 64 --log-interval 100 --num-classes 1000 IMAGENET_VALIDATION_DIRECTORY

Citation

If this helps your research, please cite the paper (https://arxiv.org/abs/2103.16324):

@misc{kertész2021automated,
      title={Automated Cleanup of the ImageNet Dataset by Model Consensus, Explainability and Confident Learning},
      author={Csaba Kertész},
      year={2021},
      eprint={2103.16324},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
