An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language.


sovit-123/SAM_Molmo_Whisper


Note: The project is at a very early stage and will change drastically in the near future. Things may break.

Go to Setup

A simple integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language.

Capabilities:

  • Segment objects with SAM2.1 using point prompts.
  • Points can be obtained by prompting Molmo with natural language. Molmo can take input from the text box (typing) or from Whisper via microphone (speech to text).
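
The hand-off from Molmo to SAM2.1 can be sketched as follows. This is a minimal illustration, not the code used in app.py. It assumes Molmo replies with XML-like point tags whose coordinates are percentages of the image width and height. The function name molmo_points_to_pixels and the sample reply are hypothetical.

```python
import re

# Assumed Molmo reply format (not taken from this repository):
#   <point x="61.5" y="40.4" alt="cat">cat</point>
#   <points x1="10.0" y1="20.0" x2="30.0" y2="40.0" alt="cats">cats</points>
# Coordinates are assumed to be percentages of image width and height.
COORD_PATTERN = re.compile(r'x(\d*)="([\d.]+)"\s+y\1="([\d.]+)"')


def molmo_points_to_pixels(molmo_text, image_width, image_height):
    """Convert percentage coordinates in a Molmo reply to pixel (x, y) pairs."""
    points = []
    for _, x_pct, y_pct in COORD_PATTERN.findall(molmo_text):
        x = float(x_pct) / 100.0 * image_width
        y = float(y_pct) / 100.0 * image_height
        points.append((x, y))
    return points


# Hypothetical reply for the prompt "point to the dogs".
reply = '<points x1="25.0" y1="50.0" x2="75.0" y2="25.0" alt="dogs">dogs</points>'
pixel_points = molmo_points_to_pixels(reply, image_width=640, image_height=480)

# One foreground label (1) per point, the usual convention for point prompts.
point_labels = [1] * len(pixel_points)
print(pixel_points, point_labels)
```

The resulting pixel coordinates and labels are the kind of input a SAM2 point prompt expects, typically after conversion to NumPy arrays.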

Run the Gradio demo using:

python app.py
Demo video: sam2_molmo_whisper-2024-10-11_07.09.47.mp4

What's New

October 30, 2024

  • Added a tabbed interface for video segmentation. The process remains the same: prompt via text or voice, upload a video, and get the segmentation maps of the objects.

Setup

Clone Repo

git clone https://github.com/sovit-123/SAM_Molmo_Whisper.git
cd SAM_Molmo_Whisper

Installing Requirements

Install PyTorch, Hugging Face Transformers, and the rest of the base requirements.

pip install -r requirements.txt

Install SAM2

It is highly recommended to clone SAM2 into a separate directory outside this project directory and run the installation commands there.

git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .

To Use CLIP Auto Labelling

After installing the requirements, install spaCy's en_core_web_sm model.

spacy download en_core_web_sm
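
Once the setup steps are done, a quick import check can catch a missing package before launching the app. The snippet below is only a sketch. The module names in required are assumed import names for the packages mentioned in this README and are not read from requirements.txt. The helper missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed top-level import names; adjust to match the actual requirements.
required = ["torch", "transformers", "gradio", "sam2", "spacy"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

Run it from the project directory with the same Python environment that will run app.py.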

Run the App

python app.py
