integrate SAM (segment anything) encoder with Unet #757
Conversation
Rusteam commented May 5, 2023
Hi @qubvel, is there any update on this?
Rusteam commented May 14, 2023
Rusteam commented May 15, 2023
Make sure you install this package from my fork. So far I have been able to train the model, but I can't say it's learning; I'm still struggling there. Also, I cannot fit more than 1 sample per batch on a 32 GB GPU with a 512 input size.
ccl-private commented May 16, 2023
@Rusteam how about this: https://github.com/tianrun-chen/SAM-Adapter-PyTorch
Rusteam commented May 16, 2023
Thanks for sharing, I'll try it if my current approach does not work. I've been able to get some learning with this transformers notebook.
qubvel commented May 17, 2023
Hi @Rusteam, thanks a lot for your contribution and sorry for the delay. I am going to review the request and will let you know.
Rusteam commented May 17, 2023
Hey hey hey. While this solution worked, I can't say the model was able to learn on my data. We might need to use the version before my DDP adjustments, make the model handle points and boxes as inputs, or use the SAM image encoder with Unet or other architectures.
from typing import Optional, Union, List, Tuple
import torch
from segment_anything.modeling import MaskDecoder, TwoWayTransformer, PromptEncoder
Is it a pip package? We probably need to add it to reqs.
Just added it to reqs. Or should we make it optional?
qubvel commented May 17, 2023
Yes, I was actually thinking about just pre-trained encoder integration. Did you test it?
Rusteam commented May 18, 2023
@qubvel It didn't work with Unet yet, but I can make it work. Which models would be essential to integrate?
Rusteam commented May 18, 2023 • edited
That was my intention as well, but I was unable to make it learn without passing box/point prompts. However, when passing a prompt along with the input image, it does learn. We might need to integrate multiple inputs to
siddpiku commented Jul 5, 2023
The following worked for me:
|
Rusteam commented Jul 13, 2023
@qubvel hey, any updates?
Rusab commented Sep 6, 2023
Please add this. This library hasn't had new features for a long time.
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
csaroff commented Nov 17, 2023
Is this PR ready?
Rusteam commented Nov 18, 2023
It's ready.
17SIM commented Nov 21, 2023
The current PR seems to work only with images of size 1024x1024.
Rusteam commented Nov 21, 2023
Yes, same as the original SAM model.
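For readers hitting the 1024x1024 restriction: the original SAM pipeline gets arbitrary images to that size by resizing the longest side to 1024 and then padding the shorter side. A minimal sketch of that shape arithmetic follows, assuming the resize-then-pad scheme used in the segment_anything repository. The helper names `sam_preprocess_shape` and `padding_to_square` are hypothetical and not part of this PR.

```python
from typing import Tuple


def sam_preprocess_shape(height: int, width: int, long_side: int = 1024) -> Tuple[int, int]:
    """Return (new_h, new_w) after scaling so the longest side equals long_side."""
    scale = long_side / max(height, width)
    # Round to the nearest integer pixel count.
    new_h = int(height * scale + 0.5)
    new_w = int(width * scale + 0.5)
    return new_h, new_w


def padding_to_square(height: int, width: int, long_side: int = 1024) -> Tuple[int, int]:
    """Return (pad_h, pad_w) needed to reach a long_side x long_side canvas."""
    new_h, new_w = sam_preprocess_shape(height, width, long_side)
    return long_side - new_h, long_side - new_w


if __name__ == "__main__":
    print(sam_preprocess_shape(480, 640))
    print(padding_to_square(480, 640))
```

This only covers the geometry. Masks and any box/point prompts would need the same scaling applied, which is not shown here.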
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Stinosko commented Jan 28, 2024
Any progress on this?
Rusab commented Jan 29, 2024
Why is the library dying? No new updates in a long time.
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Rusteam commented Mar 30, 2024
@qubvel can you merge this? It did work.
isaaccorley commented Apr 9, 2024
adamjstewart commented Jan 18, 2025
giswqs commented Jan 18, 2025
A relevant PR: huggingface/transformers#32317
isaaccorley commented Jan 18, 2025 • edited
I think you can already do this because timm supports the SAM ViT weights like:
|
adamjstewart commented Jan 18, 2025
But I'm not sure how well SAM works with U-Net instead of their own custom decoder.
isaaccorley commented Jan 18, 2025
Agreed, it's likely highly dependent on the prompt embeddings as well.
ogencoglu commented Jun 10, 2025
I don't think SAM works out of the box like this.
Closes #756
Added:
[...] (vit_h, vit_b and vit_l) to encoders
Changed: