deeplearning-wisc/npos
This codebase provides a PyTorch implementation for the paper NPOS: Non-parametric Outlier Synthesis at ICLR 2023.
Out-of-distribution (OOD) detection is indispensable for safely deploying machine learning models in the wild. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Recent work on outlier synthesis modeled the feature space as parametric Gaussian distribution, a strong and restrictive assumption that might not hold in reality. In this paper, we propose a novel framework, non-parametric outlier synthesis (NPOS), which generates artificial OOD training data and facilitates learning a reliable decision boundary between ID and OOD data. Importantly, our proposed synthesis approach does not make any distributional assumption on the ID embeddings, thereby offering strong flexibility and generality. We show that our synthesis approach can be mathematically interpreted as a rejection sampling framework. Extensive experiments show that NPOS can achieve superior OOD detection performance, outperforming the competitive rivals by a significant margin.
Check out our
- ICLR'22 work VOS on parametric outlier synthesis for object detection and classification networks.
- CVPR'22 work STUD on unknown synthesis for object detection in video datasets.
- NeurIPS'23 work DREAM-OOD on outlier generation in the pixel space (by diffusion models) if you are interested!
Our experiments are conducted on Ubuntu Linux 20.04 with Python 3.8 and PyTorch 1.11. In addition, the following packages must be installed:
Remarks: This is the initial version of our codebase, and while the scripts are functional, there is much room for improvement in terms of streamlining the pipelines for better efficiency. Additionally, there are unused parts that we plan to remove in an upcoming cleanup soon. Stay tuned for more updates.
This part refers to this codebase
We consider the following (in-distribution) datasets: CIFAR and ImageNet.
Please download ImageNet-1k and place the training data and validation data in ./datasets/imagenet/train and ./datasets/imagenet/val, respectively.
Please download CIFAR and place the training data and validation data in ./datasets/CIFAR/CIFAR100 and ./datasets/CIFAR/CIFAR100, respectively.
For small-scale ID (e.g. CIFAR-10), we use SVHN, Textures (dtd), Places365, LSUN-C (LSUN), LSUN-R (LSUN_resize), and iSUN.
OOD datasets can be downloaded via the following links (source: ATOM):
- SVHN: download it and place it in the folder datasets/small_OOD_dataset/svhn. Then run python utils/select_svhn_data.py to generate the test subset.
- Textures: download it and place it in the folder datasets/small_OOD_dataset/dtd.
- Places365: download it and place it in the folder datasets/ood_datasets/places365/test_subset. We randomly sample 10,000 images from the original test dataset.
- LSUN-C: download it and place it in the folder datasets/small_OOD_dataset/LSUN.
- LSUN-R: download it and place it in the folder datasets/small_OOD_dataset/LSUN_resize.
- iSUN: download it and place it in the folder datasets/small_OOD_dataset/iSUN.
For example, run the following commands in the root directory to download LSUN-C:
cd datasets/small_OOD_dataset
wget https://www.dropbox.com/s/fhtsw1m3qxlwj6h/LSUN.tar.gz
tar -xvzf LSUN.tar.gz
The directory structure looks like:
datasets/
---CIFAR10/
---CIFAR100/
---small_OOD_dataset/
------dtd/
------iSUN/
------LSUN/
------LSUN_resize/
------places365/
------SVHN/
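Before launching training or evaluation, it can help to confirm that the folders are where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` and the `EXPECTED_DIRS` list are hypothetical, and the folder names are simply copied from the directory structure shown above.

```python
from pathlib import Path

# Sub-folders copied from the directory structure above.
# This list is illustrative; adjust it if your layout differs.
EXPECTED_DIRS = [
    "CIFAR10",
    "CIFAR100",
    "small_OOD_dataset/dtd",
    "small_OOD_dataset/iSUN",
    "small_OOD_dataset/LSUN",
    "small_OOD_dataset/LSUN_resize",
    "small_OOD_dataset/places365",
    "small_OOD_dataset/SVHN",
]


def missing_dataset_dirs(root="datasets", expected=EXPECTED_DIRS):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Calling `missing_dataset_dirs()` from the repository root returns an empty list when every folder is present, and otherwise lists the folders that still need to be downloaded or moved.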
For large-scale ID (e.g. ImageNet-100), we use the curated 4 OOD datasets from iNaturalist, SUN, Places, and Textures, and de-duplicated concepts overlapped with ImageNet-1k. The datasets are created by Huang et al., 2021.
The subsampled iNaturalist, SUN, and Places can be downloaded via the following links:
wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/iNaturalist.tar.gz
wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/SUN.tar.gz
wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/Places.tar.gz
The directory structure looks like this:
datasets/
---ImageNet100/
---ImageNet_OOD_dataset/
------dtd/
------iNaturalist/
------Places/
------SUN/
Firstly, enter the CLIP-based method folder by running
cd CLIP_based/OOD
To reduce the training time, we only fine-tuned a limited number of layers in the pre-trained model. Therefore, we pre-extract features for training, instead of using images to repeatedly extract features with fixed parameters every iteration.
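The idea behind pre-extraction is that a frozen encoder returns the same output for the same image, so its features can be computed once and reused in every epoch. The sketch below illustrates that caching pattern only; it is not the repository's extraction script. The function name `load_or_extract_features`, the pickle cache format, and the `frozen_encoder` callable are all assumptions made for illustration.

```python
import pickle
from pathlib import Path


def load_or_extract_features(cache_path, samples, frozen_encoder):
    """Return features for `samples`, computing them at most once.

    `frozen_encoder` is a hypothetical callable standing in for the
    fixed (non-trainable) part of the model. Its outputs are cached on
    disk so later runs skip the forward pass entirely.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: reuse the features saved by an earlier run.
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # Cache miss: run the frozen encoder once over all samples.
    features = [frozen_encoder(x) for x in samples]
    with cache_path.open("wb") as f:
        pickle.dump(features, f)
    return features
```

With this pattern, only the small set of trainable layers has to run during each training iteration, which is where the time saving comes from.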
We provide the following script to get the features of CLIP (ViT-B/16):
sh feature_extraction_imagenet_100.sh
sh feature_extraction_imagenet_1k.sh
We also provide pretrained features for ImageNet-100.
Since the file of the ImageNet-1k pre-extracted feature is too large, we cannot provide it for the time being. Please extract it yourself based on the dataset of ImageNet-1k.
We provide the file formulation for ImageNet-1k for reference.
We provide sample scripts to train from scratch. Feel free to modify the hyperparameters and training configurations.
sh train_npos_imagenet_100.sh
sh train_npos_imagenet_1k.sh
We use the MCM score as the OOD score to evaluate the fine-tuned CLIP model.
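For readers unfamiliar with MCM (maximum concept matching), the score is the largest softmax probability over the temperature-scaled cosine similarities between an image embedding and the text embeddings of the ID class names. The following is a minimal pure-Python sketch of that formula, not the evaluation code shipped in this repository. The function names and the default `temperature=1.0` are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mcm_score(image_feature, text_features, temperature=1.0):
    """Maximum softmax probability over image-to-text cosine similarities.

    A higher score suggests the image matches one ID concept well;
    a lower score suggests the image may be OOD.
    """
    logits = [cosine_similarity(image_feature, t) / temperature
              for t in text_features]
    # Subtract the maximum logit for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    return max(exps) / sum(exps)
```

An image embedding aligned with a single class embedding yields a higher score than one that is equally similar to every class, which is the behaviour a detection threshold relies on.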
We provide scripts for checkpoint evaluations:
sh test_npos_imagenet_1k.sh
sh test_npos_imagenet_100.sh
Our checkpoints can be downloaded here for ImageNet-100 and ImageNet-1k. The performance of these checkpoints is consistent with the results in our paper.
Firstly, enter the training from scratch method folder by running:
cd training_from_sctrach
We provide sample scripts to train from scratch. Feel free to modify the hyperparameters and training configurations.
sh train_npos_cifar10.sh
sh train_npos_cifar100.sh
sh train_npos_imagenet_100.sh
Since methods based on contrastive learning do not produce explicit logit outputs, we use the KNN distance as the OOD score to evaluate the model trained from scratch.
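As a rough illustration of a KNN-based OOD score, the sketch below L2-normalizes the embeddings and returns the Euclidean distance from a test embedding to its k-th nearest training embedding. It is a brute-force, pure-Python sketch under stated assumptions, not the repository's evaluation code: the function names, the default `k=1`, and the convention that a larger distance means more likely OOD are choices made here for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def knn_ood_score(test_feature, train_features, k=1):
    """Distance from a test embedding to its k-th nearest ID embedding.

    Both sides are L2-normalized first. Larger values mean the test
    point lies farther from the ID training data, i.e. more likely OOD.
    Requires 1 <= k <= len(train_features).
    """
    z = l2_normalize(test_feature)
    distances = sorted(math.dist(z, l2_normalize(t)) for t in train_features)
    return distances[k - 1]
```

On real feature banks a brute-force sort like this is slow, so a nearest-neighbor search library is the usual substitute; the scoring rule itself stays the same.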
We provide scripts for checkpoint evaluations:
sh test_npos_cifar10.sh
sh test_npos_cifar100.sh
sh test_npos_imagenet_100.sh
Our checkpoints can be downloaded here for ImageNet-100, CIFAR-10, and CIFAR-100. The performance of these checkpoints is consistent with the results in our paper.
If you find this work useful in your own research, please cite the paper as:
@inproceedings{tao2023nonparametric,
  title={Non-parametric Outlier Synthesis},
  author={Leitian Tao and Xuefeng Du and Jerry Zhu and Yixuan Li},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=JHklpEZqduQ}
}