
source code for ICLR'23 paper "Non-parametric Outlier Synthesis"


Non-parametric Outlier Synthesis (To be updated)

This codebase provides a PyTorch implementation of the paper NPOS: Non-parametric Outlier Synthesis, published at ICLR 2023.

Abstract

Out-of-distribution (OOD) detection is indispensable for safely deploying machine learning models in the wild. One of the key challenges is that models lack supervision signals from unknown data, and as a result, can produce overconfident predictions on OOD data. Recent work on outlier synthesis modeled the feature space as parametric Gaussian distribution, a strong and restrictive assumption that might not hold in reality. In this paper, we propose a novel framework, non-parametric outlier synthesis (NPOS), which generates artificial OOD training data and facilitates learning a reliable decision boundary between ID and OOD data. Importantly, our proposed synthesis approach does not make any distributional assumption on the ID embeddings, thereby offering strong flexibility and generality. We show that our synthesis approach can be mathematically interpreted as a rejection sampling framework. Extensive experiments show that NPOS can achieve superior OOD detection performance, outperforming the competitive rivals by a significant margin.
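As a rough illustration of the idea in the abstract, the sketch below synthesizes outliers without fitting any distribution to the ID embeddings. It assumes that k-nearest-neighbour distance serves as the density proxy: boundary ID points are those with the largest k-NN distance, candidates are drawn by Gaussian perturbation around them, and only the candidates that lie farthest from the ID data are accepted. The function names and the parameters k, n_boundary, n_candidates, sigma, and n_keep are hypothetical and chosen for illustration; this is not the code in this repository.

```python
import numpy as np

def knn_distance(queries, reference, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbour in `reference`."""
    # Pairwise Euclidean distances, shape (n_queries, n_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    dists.sort(axis=1)
    # If the queries are the reference set itself, column 0 is the zero
    # self-distance, so the k-th neighbour sits one column further right.
    idx = k if exclude_self else k - 1
    return dists[:, idx]

def synthesize_outliers(id_embeddings, k=5, n_boundary=10, n_candidates=20,
                        sigma=0.1, n_keep=50, seed=0):
    """Illustrative non-parametric outlier synthesis (all defaults are assumptions)."""
    rng = np.random.default_rng(seed)
    dim = id_embeddings.shape[1]

    # 1. Boundary ID points: those with the largest k-NN distance (low density).
    d_id = knn_distance(id_embeddings, id_embeddings, k, exclude_self=True)
    boundary = id_embeddings[np.argsort(d_id)[-n_boundary:]]

    # 2. Propose candidates by Gaussian perturbation around each boundary point.
    noise = rng.normal(scale=sigma, size=(n_boundary, n_candidates, dim))
    candidates = (boundary[:, None, :] + noise).reshape(-1, dim)

    # 3. Accept only the candidates farthest from the ID data.
    d_cand = knn_distance(candidates, id_embeddings, k)
    return candidates[np.argsort(d_cand)[-n_keep:]]

rng = np.random.default_rng(0)
id_embeddings = rng.normal(size=(200, 8))
outliers = synthesize_outliers(id_embeddings)
```

The accept step is what gives the procedure its rejection-sampling flavour: proposals come from a simple distribution around the boundary, and the k-NN distance decides which ones survive.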

Check out our related work if you are interested:

ICLR'22 work VOS on parametric outlier synthesis for object detection and classification networks.
CVPR'22 work STUD on unknown synthesis for object detection in video datasets.
NeurIPS'23 work DREAM-OOD on outlier generation in the pixel space (by diffusion models).

Setup

Required Packages

Our experiments were conducted on Ubuntu Linux 20.04 with Python 3.8 and PyTorch 1.11. In addition, the following packages are required:

Quick Start

Remarks: This is the initial version of our codebase. The scripts are functional, but there is much room to streamline the pipelines for better efficiency. There are also unused parts that we plan to remove in an upcoming cleanup. Stay tuned for more updates.

Data Preparation

This part refers to this codebase.

In-distribution dataset

We consider the following (in-distribution) datasets: CIFAR and ImageNet.

Please download ImageNet-1k and place the training data and validation data in ./datasets/imagenet/train and ./datasets/imagenet/val, respectively.

Please download CIFAR and place the training data and validation data in ./datasets/CIFAR/CIFAR100 and ./datasets/CIFAR/CIFAR100, respectively.

Out-of-distribution dataset

Small-scale OOD datasets (for CIFAR)

For small-scale ID (e.g. CIFAR-10), we use SVHN, Textures (dtd), Places365, LSUN-C (LSUN), LSUN-R (LSUN_resize), and iSUN.

OOD datasets can be downloaded via the following links (source: ATOM):

SVHN: download it and place it in datasets/small_OOD_dataset/svhn. Then run python utils/select_svhn_data.py to generate the test subset.
Textures: download it and place it in datasets/small_OOD_dataset/dtd.
Places365: download it and place it in datasets/ood_datasets/places365/test_subset. We randomly sample 10,000 images from the original test dataset.
LSUN-C: download it and place it in datasets/small_OOD_dataset/LSUN.
LSUN-R: download it and place it in datasets/small_OOD_dataset/LSUN_resize.
iSUN: download it and place it in datasets/small_OOD_dataset/iSUN.

For example, run the following commands in the root directory to download LSUN-C:

cd datasets/small_OOD_dataset
wget https://www.dropbox.com/s/fhtsw1m3qxlwj6h/LSUN.tar.gz
tar -xvzf LSUN.tar.gz

The directory structure looks like:

datasets/
---CIFAR10/
---CIFAR100/
---small_OOD_dataset/
------dtd/
------iSUN/
------LSUN/
------LSUN_resize/
------places365/
------SVHN/

Large-scale OOD datasets (for ImageNet)

For large-scale ID (e.g. ImageNet-100), we use four OOD datasets curated from iNaturalist, SUN, Places, and Textures, with concepts that overlap with ImageNet-1k removed. The datasets were created by Huang et al., 2021.

The subsampled iNaturalist, SUN, and Places can be downloaded via the following links:

wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/iNaturalist.tar.gz
wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/SUN.tar.gz
wget http://pages.cs.wisc.edu/~huangrui/imagenet_ood_dataset/Places.tar.gz

The directory structure looks like this:

datasets/
---ImageNet100/
---ImageNet_OOD_dataset/
------dtd/
------iNaturalist/
------Places/
------SUN/
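Before launching training, it can help to confirm that the OOD folders are where the scripts expect them. The sketch below checks for the subfolders named in the two directory trees of this section. The helper missing_dirs and the EXPECTED mapping are hypothetical and not part of this repository; the demo builds a throwaway tree in a temporary directory so it runs without any downloads.

```python
import tempfile
from pathlib import Path

# Subfolders taken from the two directory trees in this section.
EXPECTED = {
    "small_OOD_dataset": ["dtd", "iSUN", "LSUN", "LSUN_resize", "places365", "SVHN"],
    "ImageNet_OOD_dataset": ["dtd", "iNaturalist", "Places", "SUN"],
}

def missing_dirs(root, expected=EXPECTED):
    """Return the expected 'parent/child' folders that do not exist under root."""
    root = Path(root)
    return [
        f"{parent}/{child}"
        for parent, children in expected.items()
        for child in children
        if not (root / parent / child).is_dir()
    ]

# Demo on a temporary tree that deliberately leaves one folder out.
root = Path(tempfile.mkdtemp()) / "datasets"
for parent, children in EXPECTED.items():
    for child in children:
        if (parent, child) != ("ImageNet_OOD_dataset", "SUN"):
            (root / parent / child).mkdir(parents=True)

missing = missing_dirs(root)
print(missing)
```

In practice, missing_dirs("datasets") would be called from the repository root, and an empty list means every expected folder is present.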

Training and Evaluation

CLIP-based model

First, enter the CLIP-based method folder by running:

cd CLIP_based/OOD

Feature extraction

To reduce training time, we fine-tune only a limited number of layers in the pre-trained model. We therefore pre-extract features for training, rather than re-running the frozen layers on the images at every iteration.
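The caching pattern described above can be sketched as follows. This is a minimal illustration, not the repository's extraction script: frozen_encoder is a stand-in (a fixed linear map plus L2 normalisation) for the frozen part of the backbone, and the array shapes, batch size, and file name are assumptions.

```python
import os
import tempfile
import numpy as np

def frozen_encoder(images, weights):
    """Stand-in for a frozen backbone: fixed linear map plus L2 normalisation."""
    feats = images.reshape(len(images), -1) @ weights
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def extract_and_cache(images, labels, weights, path, batch_size=64):
    """Run the frozen encoder once over the dataset and save the features to disk."""
    chunks = [
        frozen_encoder(images[i:i + batch_size], weights)
        for i in range(0, len(images), batch_size)
    ]
    np.savez(path, features=np.concatenate(chunks), labels=labels)

# Toy data standing in for a real image dataset.
rng = np.random.default_rng(0)
images = rng.normal(size=(100, 3, 8, 8))
labels = rng.integers(0, 10, size=100)
weights = rng.normal(size=(3 * 8 * 8, 16))

cache_path = os.path.join(tempfile.mkdtemp(), "features.npz")
extract_and_cache(images, labels, weights, cache_path)

# Training then loads the cached features instead of re-encoding images.
cached = np.load(cache_path)
train_features, train_labels = cached["features"], cached["labels"]
```

Because the frozen layers never change, their outputs are identical in every epoch, so computing them once trades disk space for a large saving in compute.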

We provide the following script to get the features of CLIP (ViT-B/16):

sh feature_extraction_imagenet_100.sh
sh feature_extraction_imagenet_1k.sh

We also provide pre-extracted features for ImageNet-100.

Because the pre-extracted feature file for ImageNet-1k is too large, we cannot provide it for the time being. Please extract the features yourself from the ImageNet-1k dataset.

We provide the file layout for ImageNet-1k for reference.

Training

We provide sample scripts to train from scratch. Feel free to modify the hyperparameters and training configurations.

sh train_npos_imagenet_100.sh
sh train_npos_imagenet_1k.sh

Evaluation

We use the MCM score as the OOD score to evaluate the fine-tuned CLIP model.
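For readers unfamiliar with MCM (maximum concept matching), the score is commonly described as the maximum softmax probability over temperature-scaled cosine similarities between an image embedding and the class text embeddings. The sketch below follows that description. The function name mcm_score and the default temperature tau are assumptions, and the evaluation scripts in this repository may compute the score differently.

```python
import numpy as np

def mcm_score(image_features, text_features, tau=1.0):
    """Max softmax over cosine similarities; higher means more ID-like."""
    img = image_features / np.linalg.norm(image_features, axis=1, keepdims=True)
    txt = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    # Cosine similarity of every image to every class prompt: (n_images, n_classes).
    logits = (img @ txt.T) / tau
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)

# Three class embeddings along the coordinate axes.
text_features = np.eye(3)
image_features = np.array([
    [1.0, 0.0, 0.0],  # aligned with class 0
    [1.0, 1.0, 1.0],  # equally similar to all classes
])
scores = mcm_score(image_features, text_features)
```

An image that matches one concept strongly gets a peaked softmax and a high score, while an image equally similar to every concept gets a score near 1 divided by the number of classes.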

We provide scripts for checkpoint evaluations:

sh test_npos_imagenet_1k.sh
sh test_npos_imagenet_100.sh

Our checkpoints can be downloaded here for ImageNet-100 and ImageNet-1k. The performance of these checkpoints is consistent with the results in our paper.

Training from scratch

First, enter the training-from-scratch method folder by running:

cd training_from_scratch

Training

We provide sample scripts to train from scratch. Feel free to modify the hyperparameters and training configurations.

sh train_npos_cifar10.sh
sh train_npos_cifar100.sh
sh train_npos_imagenet_100.sh

Evaluation

Since methods based on contrastive learning do not produce explicit logit outputs, we use the KNN distance as the OOD score to evaluate the model trained from scratch.
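A KNN-distance score of this kind can be sketched as below: embeddings are L2-normalised, and a test sample is scored by the negated distance to its k-th nearest training embedding, so that higher values indicate more ID-like inputs. The function name knn_ood_score and the default k are assumptions for illustration; the evaluation scripts in this repository may use a different k or a dedicated nearest-neighbour library.

```python
import numpy as np

def knn_ood_score(test_features, train_features, k=5):
    """Negated distance to the k-th nearest training embedding (higher = more ID-like)."""
    test = test_features / np.linalg.norm(test_features, axis=1, keepdims=True)
    train = train_features / np.linalg.norm(train_features, axis=1, keepdims=True)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    kth = np.sort(dists, axis=1)[:, k - 1]
    return -kth

# Toy ID training embeddings clustered around the direction (1, 0).
rng = np.random.default_rng(0)
train_features = np.array([1.0, 0.0]) + rng.normal(scale=0.05, size=(50, 2))
test_features = np.array([
    [1.0, 0.01],   # close to the training cluster
    [-1.0, 0.0],   # opposite side of the unit circle
])
knn_scores = knn_ood_score(test_features, train_features)
```

The score needs only embeddings, not logits, which is why it suits encoders trained with a contrastive objective.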

We provide scripts for checkpoint evaluations:

sh test_npos_cifar10.sh
sh test_npos_cifar100.sh
sh test_npos_imagenet_100.sh

Our checkpoints can be downloaded here for ImageNet-100, CIFAR-10, and CIFAR-100. The performance of these checkpoints is consistent with the results in our paper.

Citation

If you find this work useful in your own research, please cite the paper as:

@inproceedings{tao2023nonparametric,
  title={Non-parametric Outlier Synthesis},
  author={Leitian Tao and Xuefeng Du and Jerry Zhu and Yixuan Li},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=JHklpEZqduQ}
}
