Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
[Project Page] | [arXiv] | [Data Repository]
We report counter-intuitive observations that theoretically unjustified design choices for attributing diffusion models empirically outperform previous baselines by a large margin.
Counterfactual visualizations on CIFAR-2 and ArtBench-2 (figures omitted here; see the project page).
Check quickstart.ipynb to conduct data attribution on pre-trained diffusion models loaded directly from Hugging Face!
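For orientation, here is a minimal, hypothetical sketch (not the notebook itself) of the starting point: loading a pre-trained DDPM from the Hugging Face Hub with diffusers. The model id below is an illustrative choice, not necessarily the one used in quickstart.ipynb.

```python
# Hypothetical sketch: load a pre-trained DDPM and sample a few images to attribute.
# The model id is an illustrative choice; quickstart.ipynb is the reference.
import torch
from diffusers import DDPMPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32").to(device)

# The U-Net's parameters are what TRAK/D-TRAK differentiate through;
# sampling with a fixed seed gives reproducible generation-set queries.
images = pipe(batch_size=4, generator=torch.Generator(device).manual_seed(0)).images
```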
To get started, follow these steps:
- Clone the GitHub Repository: Begin by cloning the repository using the command:
git clone https://github.com/sail-sg/D-TRAK.git
- Set Up Python Environment: Ensure you have a Python 3.8 environment; create and activate one with the following commands (replace dtrak with your preferred environment name):
conda create -n dtrak python=3.8 -y
conda activate dtrak
- Install Dependencies: Install the necessary dependencies by running:
pip install -r requirements.txt
We provide the commands to run experiments on CIFAR-2. It is easy to transfer them to other datasets.
Data pre-processing:
cd CIFAR2
Run 00_EDA.ipynb to create dataset splits and subsets of the training set.
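The notebook is the source of truth for the exact splits; the hypothetical sketch below only illustrates the general idea of sampling a two-class CIFAR-10 subset and saving an index file. The class pair, subset size, and file name are placeholders.

```python
# Hypothetical sketch of building a CIFAR-2-style split: pick two CIFAR-10
# classes and save the selected training indices. Class pair, sizes, and
# file name are placeholders; 00_EDA.ipynb defines the real splits.
import pickle
import numpy as np
from torchvision.datasets import CIFAR10

train = CIFAR10(root="./data", train=True, download=True)
labels = np.array(train.targets)
classes = [1, 7]  # placeholder class pair
candidates = np.where(np.isin(labels, classes))[0]
rng = np.random.default_rng(42)
idx_train = rng.choice(candidates, size=5000, replace=False)

with open("idx-train.pkl", "wb") as f:
    pickle.dump(np.sort(idx_train), f)
```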
Train a diffusion model and generate images:
bash scripts/run_train.sh 0 18888 5000-0.5
bash scripts/run_gen.sh 0 0 5000-0.5
Construct the LDS benchmark:
Train 64 models corresponding to 64 subsets of the training set
bash scripts/run_lds_val_sub.sh 0 18888 5000-0.5 0 63
Evaluate the model outputs on the validation set
bash scripts/run_eval_lds_val_sub.sh 0 0 5000-0.5 idx_val.pkl 0 63
bash scripts/run_eval_lds_val_sub.sh 0 1 5000-0.5 idx_val.pkl 0 63
bash scripts/run_eval_lds_val_sub.sh 0 2 5000-0.5 idx_val.pkl 0 63
Evaluate the model outputs on the generation set
bash scripts/run_eval_lds_val_sub.sh 0 0 5000-0.5 idx_gen.pkl 0 63
bash scripts/run_eval_lds_val_sub.sh 0 1 5000-0.5 idx_gen.pkl 0 63
bash scripts/run_eval_lds_val_sub.sh 0 2 5000-0.5 idx_gen.pkl 0 63
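These 64 retrained models and their recorded outputs are what the LDS (linear datamodeling score) is later computed against: for each validation or generated sample, an attribution method predicts how the model output changes on each training subset, and the LDS is the rank correlation between those predictions and the measured outputs. A rough sketch of the metric, with assumed array shapes and names (not the repository's API):

```python
# Rough sketch of the LDS metric; shapes and names are assumptions.
import numpy as np
from scipy.stats import spearmanr

def lds(scores, subset_masks, retrained_outputs):
    """scores: (n_train, n_query) attribution scores per training-example/query pair.
    subset_masks: (n_subsets, n_train) 0/1 membership of the 64 retraining subsets.
    retrained_outputs: (n_subsets, n_query) model outputs measured after retraining."""
    predicted = subset_masks.astype(float) @ scores  # additive prediction per subset
    per_query = [spearmanr(predicted[:, j], retrained_outputs[:, j])[0]
                 for j in range(scores.shape[1])]
    return float(np.mean(per_query))
```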
Compute gradients:
We shard the training set into 5 parts, each with 1,000 examples.
Use the following commands to compute the gradients to be used for TRAK.
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 0 ddpm/checkpoint-8000 loss uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 1 ddpm/checkpoint-8000 loss uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 2 ddpm/checkpoint-8000 loss uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 3 ddpm/checkpoint-8000 loss uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 4 ddpm/checkpoint-8000 loss uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-val.pkl 0 ddpm/checkpoint-8000 loss uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-gen.pkl 0 ddpm/checkpoint-8000 loss uniform 10 32768
Use the following commands to compute the gradients to be used for D-TRAK.
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 0 ddpm/checkpoint-8000 mean-squared-l2-norm uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 1 ddpm/checkpoint-8000 mean-squared-l2-norm uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 2 ddpm/checkpoint-8000 mean-squared-l2-norm uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 3 ddpm/checkpoint-8000 mean-squared-l2-norm uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-train.pkl 4 ddpm/checkpoint-8000 mean-squared-l2-norm uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-val.pkl 0 ddpm/checkpoint-8000 mean-squared-l2-norm uniform 10 32768
bash scripts/run_grad.sh 0 0 5000-0.5 idx-gen.pkl 0 ddpm/checkpoint-8000 mean-squared-l2-norm uniform 10 32768
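Both sets of commands compute per-example gradients of a scalar model-output function and compress them with a random projection (presumably the trailing 32768 is the projection dimension); the only difference is the output function argument, loss for TRAK versus mean-squared-l2-norm (the squared norm of the noise prediction) for D-TRAK. A schematic sketch of that step; the dense projection and small k below are for illustration only and are not the repository's implementation:

```python
# Schematic sketch of projected per-example gradients; names, the dense
# projection, and the tiny k are assumptions, not the repo's code.
import torch

def make_projector(dim_in, k=4096, seed=0):
    # Fixed Gaussian projection. A dense matrix at the full 32768 dimension
    # would be far too large for a real U-Net; treat this purely as a schematic.
    g = torch.Generator().manual_seed(seed)
    P = torch.randn(dim_in, k, generator=g) / k ** 0.5
    return lambda v: v @ P

def projected_grad(model, scalar_output, projector):
    # scalar_output: the chosen output function evaluated on one example, e.g.
    # the denoising loss (TRAK) or the squared norm of the predicted noise (D-TRAK).
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(scalar_output, params)
    flat = torch.cat([g.reshape(-1) for g in grads])
    return projector(flat)
```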
Compute the TRAK/D-TRAK attributions and evaluate the LDS scores
Run the notebooks in methods/04_if.
The implementations of other baselines can also be found in methods.
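Schematically, those notebooks turn the projected gradients into attribution scores with a TRAK-style estimator: a kernel built from the training-example features is inverted and applied to the query features. A hedged sketch with assumed shapes and names (the notebooks in methods/ are the reference implementation):

```python
# Hedged sketch of a TRAK-style score computation from projected gradients;
# variable names, the regularizer, and shapes are assumptions.
import torch

def trak_style_scores(phi_train, phi_query, lam=0.0):
    """phi_train: (n_train, k) projected training gradients.
    phi_query: (n_query, k) projected gradients of validation/generated samples.
    Returns an (n_train, n_query) score matrix."""
    k = phi_train.shape[1]
    kernel = phi_train.T @ phi_train + lam * torch.eye(k, dtype=phi_train.dtype)
    scores = phi_query @ torch.linalg.solve(kernel, phi_train.T)
    return scores.T
```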
Data pre-processing
Run this notebook first to get the indices of the training examples to be removed.
Retrain models after removing the top-influential training examples
bash scripts/run_counter.sh 0 18888 5000-0.5 0 59
Generate images using the retrained models
Measure l2 distance
Measure CLIP cosine similarity
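Both metrics compare an image generated by the original model with the image generated from the same seed by a retrained model: pixel-space l2 distance and cosine similarity between CLIP image embeddings. A hedged sketch follows; the CLIP checkpoint and preprocessing here are assumptions, not necessarily what the repository's scripts use.

```python
# Hedged sketch of the two counterfactual metrics; the CLIP checkpoint is an
# assumption, not necessarily the one used by the repository's scripts.
import numpy as np
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def l2_distance(img_a, img_b):
    # img_a, img_b: PIL images of the same size; distance in [0, 1]-scaled pixel space.
    a = np.asarray(img_a, dtype=np.float32) / 255.0
    b = np.asarray(img_b, dtype=np.float32) / 255.0
    return float(np.linalg.norm(a - b))

def clip_cosine(img_a, img_b):
    inputs = clip_proc(images=[img_a, img_b], return_tensors="pt")
    with torch.no_grad():
        emb = clip_model.get_image_features(**inputs)
    return F.cosine_similarity(emb[0:1], emb[1:2]).item()
```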
If you find this project useful in your research, please consider citing our paper:
@inproceedings{zheng2023intriguing,
  title={Intriguing Properties of Data Attribution on Diffusion Models},
  author={Zheng, Xiaosen and Pang, Tianyu and Du, Chao and Jiang, Jing and Lin, Min},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024},
}