zer0int/CLIP-SAE-finetune

Sparse Autoencoders (SAE) vs CLIP fine-tuning fun.

  • ⚠️ This is EXPERIMENTAL code / a repo for messing with CLIP + Sparse Autoencoders (SAE)
  • For 'good, known-working' code (and more scripts + info), please see zer0int/CLIP-fine-tune!

Changes 19/DEC/2024:


🔨

  • Contains the code used to fine-tune my model HF: zer0int/CLIP-SAE-ViT-L-14 🤗
  • See the "attack" folder to obtain datasets required / used in 'a1-finetune.py'
  • Gradients will be very large throughout training. Comment out 'monitor_gradient_norms' as needed
  • Use a2 to convert the GmP model back to .weight after fine-tuning -> a normal CLIP model (usable in any 'import clip' downstream task)
  • Use a4 to quickly zero-shot test the 3 typographic attack test images provided
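
For orientation, the zero-shot check is just standard 'import clip' inference. Below is a minimal sketch of that idea; the prompts, image path, and checkpoint name are placeholders, and the repo's own a4 script is the actual reference:

```python
# Minimal zero-shot sanity check with the stock OpenAI CLIP package.
# Prompts, paths, and the checkpoint name are placeholders, not repo files.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)
# Optionally load a fine-tuned state_dict that was converted back to .weight via a2:
# model.load_state_dict(torch.load("my-finetuned-clip.pt", map_location=device))

prompts = ["a photo of a cat", "a photo of a dog", "a sign with text"]  # placeholder labels
image = preprocess(Image.open("typographic-attack-test.png")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)   # CLIP image-text similarity logits
    probs = logits_per_image.softmax(dim=-1).cpu()

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.3f}  {prompt}")
```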

🔎

  • The attack dataset was curated via the SAE
  • Selected for typographic attack salience (i.e. CLIP's 'text obsession' -> the image gets misclassified because text in it is highly salient to the model)
  • Fine-tune: Geometric Parametrization (GmP) + scaling of the 'text-salient' neurons' top-stimulating images (via SAE); a rough GmP sketch follows after this list
  • For details about GmP, see my other repo: zer0int/CLIP-fine-tune
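
Purely as an illustration of the GmP idea (a hypothetical sketch, assuming a row-wise magnitude/direction split; the real implementation lives in zer0int/CLIP-fine-tune):

```python
# Illustrative-only sketch of Geometric Parametrization (GmP): store a linear layer's
# weights as magnitude (r) and direction (theta) instead of a plain .weight matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GmPLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w, a=5 ** 0.5)
        norms = w.norm(dim=1, keepdim=True)
        self.r = nn.Parameter(norms)          # per-output-row magnitude ("radius")
        self.theta = nn.Parameter(w / norms)  # per-output-row direction ("angle")
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    @property
    def weight(self) -> torch.Tensor:
        # Recompose a standard weight matrix; saving this tensor back as '.weight'
        # is, conceptually, what the a2 conversion step does after fine-tuning.
        return self.r * F.normalize(self.theta, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight, self.bias)
```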

🔬


💡❓

  • My SAE: Encoder-Decoder, tied weights + Top-K (puzzled together from the above!); see the rough sketch after this list
  • Is this a good autoencoder for CLIP? I don't know. 🤔
  • Small hidden dimension + low Top-K => very sparse -> will learn concepts from CLIP that [with SAE-reconstructed embeds] retrieve images of very narrow concepts, e.g. ONLY stop signs.
  • Huge hidden dimension (e.g. 8192) -> not so sparse, accuracy drops, more (seemingly) random encoded concepts (judging via image retrieval)
  • Intermediate -> Learns complex, surprising, but meaningful concepts that are 'totally an AI-thing to encode'
  • In any case: the SAE is empirically shown to be 'working', but is it good? What is BEST? 🤔
  • Should I be using projection? Going 'back up' in the model with pinv? Hook into residual stream? I don't (yet) know! 🤷
  • I will publish the code for the SAE once I am more confident that I know what I am actually doing (and have cleaned up this mess of code 😂).
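
In the meantime, here is a generic sketch of what a tied-weight + Top-K autoencoder over CLIP embeddings can look like. It is NOT this repo's (unreleased) SAE code; embed_dim (768 for projected ViT-L/14 embeds), hidden_dim, and k are placeholder values:

```python
# Generic sketch of a tied-weight, Top-K sparse autoencoder over CLIP embeddings.
# NOT this repo's actual SAE; embed_dim / hidden_dim / k are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKTiedSAE(nn.Module):
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 2048, k: int = 32):
        super().__init__()
        self.W = nn.Parameter(torch.randn(hidden_dim, embed_dim) * 0.01)  # shared encoder/decoder weights
        self.b_enc = nn.Parameter(torch.zeros(hidden_dim))
        self.b_dec = nn.Parameter(torch.zeros(embed_dim))
        self.k = k

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        pre = F.linear(x - self.b_dec, self.W, self.b_enc)
        top = torch.topk(pre, self.k, dim=-1)
        # Keep only the k strongest latents per sample (everything else stays zero).
        return torch.zeros_like(pre).scatter_(-1, top.indices, F.relu(top.values))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return F.linear(z, self.W.t(), self.b_dec)  # decoder reuses W ("tied weights")

    def forward(self, x: torch.Tensor):
        z = self.encode(x)
        return self.decode(z), z

# Train against CLIP embeddings with a plain reconstruction loss, e.g.:
# sae = TopKTiedSAE(); x_hat, z = sae(clip_embeds); loss = F.mse_loss(x_hat, clip_embeds)
```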

🤪 For now, here's a fun concept of "things on the back of other things" in CLIP ViT-L/14 that the SAE learned:

(image: SAE concept visualization)

Example of the effect of images the SAE had chosen as salient typographic attacks for CLIP.

(image: typographic attack examples)

And zero-shot results via script a4:

(image: results-zeroshot)
