zer0int/CLIP-SAE-finetune

Sparse Autoencoders (SAE) vs CLIP fine-tuning fun.

  • ⚠️ This is EXPERIMENTAL code / a repo for messing with CLIP + Sparse Autoencoders (SAE)
  • For 'good, known-working' code (and more scripts + info), please see zer0int/CLIP-fine-tune!

Changes 19/DEC/2024:


🔨

  • Contains the code used to fine-tune my model HF: zer0int/CLIP-SAE-ViT-L-14 🤗
  • See the "attack" folder to obtain datasets required / used in 'a1-finetune.py'
  • Gradients will be very large throughout training. Comment out 'monitor_gradient_norms' as needed
  • Use a2 to convert the GmP model back to .weight after fine-tuning -> a normal CLIP model (usable in any 'import clip' downstream task)
  • Use a4 to quickly zero-shot test the 3 typographic attack test images provided
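
For orientation, the zero-shot check is just standard 'import clip' inference. Below is a minimal sketch of that idea; the prompts, image path, and checkpoint name are placeholders, and the repo's own a4 script is the actual reference:

```python
# Minimal zero-shot sanity check with the stock OpenAI CLIP package.
# Prompts, paths, and the checkpoint name are placeholders, not repo files.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)
# Optionally load a fine-tuned state_dict that was converted back to .weight via a2:
# model.load_state_dict(torch.load("my-finetuned-clip.pt", map_location=device))

prompts = ["a photo of a cat", "a photo of a dog", "a sign with text"]  # placeholder labels
image = preprocess(Image.open("typographic-attack-test.png")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)   # CLIP image-text similarity logits
    probs = logits_per_image.softmax(dim=-1).cpu()

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.3f}  {prompt}")
```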

🔎

  • The attack dataset was curated via the SAE
  • Selected for typographic attack salience (i.e. CLIP's 'text obsession' -> the image gets misclassified because text in it is highly salient to the model)
  • Fine-tune: Geometric Parametrization (GmP) + scaling of the 'text-salient' neurons' top-stimulating images (via SAE); a rough GmP sketch follows after this list
  • For details about GmP, see my other repo: zer0int/CLIP-fine-tune
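
Purely as an illustration of the GmP idea (a hypothetical sketch, assuming a row-wise magnitude/direction split; the real implementation lives in zer0int/CLIP-fine-tune):

```python
# Illustrative-only sketch of Geometric Parametrization (GmP): store a linear layer's
# weights as magnitude (r) and direction (theta) instead of a plain .weight matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GmPLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w, a=5 ** 0.5)
        norms = w.norm(dim=1, keepdim=True)
        self.r = nn.Parameter(norms)          # per-output-row magnitude ("radius")
        self.theta = nn.Parameter(w / norms)  # per-output-row direction ("angle")
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    @property
    def weight(self) -> torch.Tensor:
        # Recompose a standard weight matrix; saving this tensor back as '.weight'
        # is, conceptually, what the a2 conversion step does after fine-tuning.
        return self.r * F.normalize(self.theta, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight, self.bias)
```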

🔬


💡❓

  • My SAE: Encoder-Decoder, tied weights + Top-K (puzzled together from the above!); see the rough sketch after this list
  • Is this a good autoencoder for CLIP? I don't know. 🤔
  • Small hidden dimension + low Top-K => very sparse -> will learn concepts from CLIP that [with SAE-reconstructed embeds] retrieve images of very narrow concepts, e.g. ONLY stop signs.
  • Huge hidden dimension (e.g. 8192) -> not so sparse, accuracy drops, more (seemingly) random encoded concepts (judging via image retrieval)
  • Intermediate -> Learns complex, surprising, but meaningful concepts that are 'totally an AI-thing to encode'
  • In any case: the SAE is empirically shown to be 'working', but is it good? What is BEST? 🤔
  • Should I be using projection? Going 'back up' in the model with pinv? Hook into residual stream? I don't (yet) know! 🤷
  • I will publish the code for the SAE once I am more confident that I know what I am actually doing (and have cleaned up this mess of code 😂).
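
In the meantime, here is a generic sketch of what a tied-weight + Top-K autoencoder over CLIP embeddings can look like. It is NOT this repo's (unreleased) SAE code; embed_dim (768 for projected ViT-L/14 embeds), hidden_dim, and k are placeholder values:

```python
# Generic sketch of a tied-weight, Top-K sparse autoencoder over CLIP embeddings.
# NOT this repo's actual SAE; embed_dim / hidden_dim / k are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKTiedSAE(nn.Module):
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 2048, k: int = 32):
        super().__init__()
        self.W = nn.Parameter(torch.randn(hidden_dim, embed_dim) * 0.01)  # shared encoder/decoder weights
        self.b_enc = nn.Parameter(torch.zeros(hidden_dim))
        self.b_dec = nn.Parameter(torch.zeros(embed_dim))
        self.k = k

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        pre = F.linear(x - self.b_dec, self.W, self.b_enc)
        top = torch.topk(pre, self.k, dim=-1)
        # Keep only the k strongest latents per sample (everything else stays zero).
        return torch.zeros_like(pre).scatter_(-1, top.indices, F.relu(top.values))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return F.linear(z, self.W.t(), self.b_dec)  # decoder reuses W ("tied weights")

    def forward(self, x: torch.Tensor):
        z = self.encode(x)
        return self.decode(z), z

# Train against CLIP embeddings with a plain reconstruction loss, e.g.:
# sae = TopKTiedSAE(); x_hat, z = sae(clip_embeds); loss = F.mse_loss(x_hat, clip_embeds)
```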

🤪 For now, here's a fun concept of "things on the back of other things" in CLIP ViT-L/14 that the SAE learned:

(image: SAE concept visualization)

Example of the effect of images the SAE had chosen as salient typographic attacks for CLIP.

(image: typographic attack examples)

And zero-shot results via script a4:

(image: results-zeroshot)
