sovit-123/SAM_Molmo_Whisper

An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language.

License

Apache-2.0 license

SAM_Molmo_Whisper

Note: The project is in its very early stages and will change drastically in the near future. Things may break.

Go to Setup

A simple integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language.

Capabilities:

Segment objects with SAM2.1 using point prompts.
Points can be obtained by prompting Molmo with natural language. Molmo can take inputs from the text box (typing) or from Whisper via microphone (speech to text).
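
The hand-off from Molmo to SAM2.1 can be sketched as below. This is a minimal illustration, not the code in this repository: it assumes Molmo returns points as XML-like `<point ...>` / `<points ...>` tags with coordinates given as percentages of the image width and height, and `molmo_points_to_pixels` is a hypothetical helper name. The repository's own parsing may differ.

```python
import re

# Assumption: Molmo emits tags such as
#   <point x="50.0" y="25.0" alt="dog">dog</point>
#   <points x1="25.0" y1="50.0" x2="75.0" y2="100.0" alt="cats">cats</points>
# with coordinates expressed as percentages (0-100) of the image size.
TAG = re.compile(r"<points?\s+([^>]*)>")
ATTR = re.compile(r'\b([xy])(\d*)\s*=\s*"(-?\d+(?:\.\d+)?)"')


def molmo_points_to_pixels(text, width, height):
    """Hypothetical helper: convert Molmo point tags to (x, y) pixel tuples."""
    pixels = []
    for tag in TAG.finditer(text):
        coords = {}
        # Group x/y attributes by their numeric suffix (no suffix -> index 0).
        for axis, index, value in ATTR.findall(tag.group(1)):
            coords.setdefault(int(index or 0), {})[axis] = float(value)
        for index in sorted(coords):
            pair = coords[index]
            # Skip incomplete pairs rather than guessing a coordinate.
            if "x" in pair and "y" in pair:
                pixels.append((pair["x"] / 100.0 * width,
                               pair["y"] / 100.0 * height))
    return pixels
```

The resulting pixel coordinates are the kind of input a point-prompted segmenter expects, usually alongside a label per point marking it as foreground.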

Run the Gradio demo using:

python app.py

Demo video: sam2_molmo_whisper-2024-10-11_07.09.47.mp4

What's New

October 30, 2024

Added a tabbed interface for video segmentation. The process remains the same: prompt via text or voice, upload a video, and get the segmentation maps of the objects.

Setup

Clone Repo

git clone https://github.com/sovit-123/SAM_Molmo_Whisper.git

cd SAM_Molmo_Whisper

Installing Requirements

Install PyTorch, Hugging Face Transformers, and the rest of the base requirements.

pip install -r requirements.txt

Install SAM2

It is highly recommended to clone SAM2 into a separate directory outside this project directory and run the installation commands there.

git clone https://github.com/facebookresearch/sam2.git && cd sam2

pip install -e .

To Use CLIP Auto Labelling

After installing the requirements, install spaCy's en_core_web_sm model.

spacy download en_core_web_sm

Run the App

python app.py
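
If the app fails to start, a quick import check can help narrow down which setup step was missed. The sketch below is not part of this repository; the module names in `REQUIRED_MODULES` are assumptions based on the libraries named in this README, and `requirements.txt` remains the authoritative list.

```python
import importlib.util

# Assumed import names for the libraries mentioned in this README;
# adjust to match requirements.txt.
REQUIRED_MODULES = ["torch", "transformers", "gradio", "sam2", "spacy"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

This only checks that each top-level module can be located. It does not verify versions, GPU support, or that the spaCy model has been downloaded.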
