This repository contains the code of our CVPR'15 paper *Learning from Massive Noisy Labeled Data for Image Classification* (paper link).
Clone this repository.

```shell
# Make sure to clone with --recursive to get the modified Caffe
git clone --recursive https://github.com/Cysu/noisy_label.git
```
Build the Caffe.

```shell
cd external/caffe
# Now follow the Caffe installation instructions here:
# http://caffe.berkeleyvision.org/installation.html
# If you're experienced with Caffe and have all of the requirements installed
# and your Makefile.config in place, then simply do:
make -j8 && make py
cd -
```
Set up an experiment directory. You can either create a new one under `external/`, or make a link to another existing directory.
```shell
mkdir -p external/exp
```
or
```shell
ln -s /path/to/your/exp/directory external/exp
```
Download the CIFAR-10 data (python version).
```shell
scripts/cifar10/download_cifar10.sh
```
Synthesize label noise and prepare LMDBs. This will corrupt the labels of 40k randomly selected training images, while leaving the labels of the other 10k images unchanged.
```shell
scripts/cifar10/make_db.sh 0.3
```
The parameter 0.3 controls the level of label noise. It can be any number in [0, 1].
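For intuition, label-noise synthesis at a given noise level can be sketched as below. This is a hypothetical illustration only, not the actual implementation in `scripts/cifar10/make_db.sh`; the function name `corrupt_labels` and the uniform-flip scheme are assumptions.

```python
import random

def corrupt_labels(labels, noise_level, num_classes=10, seed=0):
    """Hypothetical sketch: flip a noise_level fraction of labels to a
    random *different* class. The real script may use another scheme."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_corrupt = int(noise_level * len(labels))
    for i in rng.sample(range(len(labels)), n_corrupt):
        # Pick any class other than the current one.
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy

# On the 40k noisy split with noise level 0.3, this flips 12k labels.
labels = [i % 10 for i in range(40000)]
noisy = corrupt_labels(labels, 0.3)
print(sum(a != b for a, b in zip(labels, noisy)))  # 12000
```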
Run a series of experiments.

```shell
# Train a CIFAR10-quick model using only the 10k clean labeled images
scripts/cifar10/train_clean.ssh

# Baseline: treat 40k noisy labels as ground truth and finetune
# from the previous model
scripts/cifar10/train_noisy_gt_ft_clean.sh

# Our method
scripts/cifar10/train_ntype.sh
scripts/cifar10/init_noisy_label_loss.sh
scripts/cifar10/train_noisy_label_loss.sh
```
We provide the training logs in `logs/cifar10/` for reference.
Clothing1M is the dataset we proposed in our paper.
Download the dataset. Please contact tong.xiao.work[at]gmail[dot]com to get the download link. Untar the images and unzip the annotations under `external/exp/datasets/clothing1M`. The directory structure should be

```
external/exp/datasets/clothing1M/
├── category_names_chn.txt
├── category_names_eng.txt
├── clean_label_kv.txt
├── clean_test_key_list.txt
├── clean_train_key_list.txt
├── clean_val_key_list.txt
├── images
│   ├── 0
│   ├── ⋮
│   └── 9
├── noisy_label_kv.txt
├── noisy_train_key_list.txt
├── README.md
└── venn.png
```
Make the LMDBs and compute the matrix C to be used.
```shell
scripts/clothing1M/make_db.sh
```
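The matrix C mentioned above can be understood as a confusion matrix between clean and noisy labels. The sketch below is a hypothetical illustration; the helper name `estimate_c` and the row-normalization convention are assumptions, and the actual computation is performed by the script.

```python
def estimate_c(clean_labels, noisy_labels, num_classes):
    """Hypothetical sketch: C[i][j] estimates the probability that an
    image with clean label i received noisy label j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for c, n in zip(clean_labels, noisy_labels):
        counts[c][n] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Row-normalize so each row is a probability distribution.
        matrix.append([x / total if total else 0.0 for x in row])
    return matrix

# Toy example: half of the class-0 images were mislabeled as class 1.
C = estimate_c([0, 0, 1, 1], [0, 1, 1, 1], 2)
print(C)  # [[0.5, 0.5], [0.0, 1.0]]
```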
Run experiments for our method
```shell
# Download the ImageNet pretrained CaffeNet
wget -P external/exp/snapshots/ http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel

# Train the clothing prediction CNN using only the clean labeled images
scripts/clothing1M/train_clean.sh

# Train the noise type prediction CNN
scripts/clothing1M/train_ntype.sh

# Train the whole net using noisy labeled data
scripts/clothing1M/init_noisy_label_loss.sh
scripts/clothing1M/train_noisy_label_loss.sh
```
We provide the training logs in `logs/clothing1M/` for reference. A final trained model is also provided here. To test its performance, please download the model, place it under `external/exp/snapshots/clothing1M/`, and then run
```shell
# Run the test
external/caffe/build/tools/caffe test \
    -model models/clothing1M/noisy_label_loss_test.prototxt \
    -weights external/exp/snapshots/clothing1M/noisy_label_loss_inference.caffemodel \
    -iterations 106 \
    -gpu 0
```
The self-brewed `external/caffe` supports data parallelism over multiple GPUs using MPI. One can accelerate the training / test process by

- Compiling the Caffe with MPI enabled
- Tweaking the training shell scripts to use multiple GPUs, for example,

```shell
mpirun -n 2 ... -gpu 0,1
```
Detailed instructions are listed here.
```
@inproceedings{xiao2015learning,
  title={Learning from Massive Noisy Labeled Data for Image Classification},
  author={Xiao, Tong and Xia, Tian and Yang, Yi and Huang, Chang and Wang, Xiaogang},
  booktitle={CVPR},
  year={2015}
}
```