
WhisperFusion

Seamless conversations with AI (with ultra-low latency)

Welcome to WhisperFusion. WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. Both the LLM and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities, while WhisperSpeech is optimized with torch.compile.
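At a high level, audio flows through three stages: WhisperLive transcribes speech in real time, the LLM generates a reply, and WhisperSpeech turns that reply back into audio. The sketch below is purely illustrative; the function names (transcribe_chunk, generate_reply, synthesize) are hypothetical stand-ins for those components, not WhisperFusion's actual API.

    # Illustrative sketch of the speech -> text -> LLM -> speech loop.
    # All functions here are hypothetical placeholders, not WhisperFusion code.
    from typing import Iterator

    def transcribe_chunk(audio: bytes) -> str:
        """Stand-in for WhisperLive real-time speech-to-text."""
        return "hello"

    def generate_reply(text: str) -> str:
        """Stand-in for the TensorRT-optimized LLM (e.g. Mistral or Phi)."""
        return f"You said: {text}"

    def synthesize(text: str) -> bytes:
        """Stand-in for WhisperSpeech text-to-speech."""
        return text.encode()

    def conversation_loop(audio_chunks: Iterator[bytes]) -> Iterator[bytes]:
        # Each incoming audio chunk is transcribed, answered, and spoken back.
        for chunk in audio_chunks:
            reply = generate_reply(transcribe_chunk(chunk))
            yield synthesize(reply)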

Features

  • Real-Time Speech-to-Text: Utilizes OpenAI WhisperLive to convert spoken language into text in real-time.

  • Large Language Model Integration: Adds Mistral, a Large Language Model, to enhance the understanding and context of the transcribed text.

  • TensorRT Optimization: Both the LLM and Whisper are optimized to run as TensorRT engines, ensuring high-performance and low-latency processing.

  • torch.compile: WhisperSpeech uses torch.compile to speed up inference by JIT-compiling PyTorch code into optimized kernels (see the short sketch after this list).
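As a rough illustration of the torch.compile point above (a generic PyTorch sketch, not WhisperSpeech's actual code), wrapping a model compiles its forward pass into optimized kernels the first time it runs:

    import torch

    # Any PyTorch module works; a small transformer layer stands in for the TTS model here.
    model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).eval()
    compiled = torch.compile(model)  # JIT-compiles the forward pass on first use

    x = torch.randn(1, 128, 512)
    with torch.no_grad():
        out = compiled(x)  # first call triggers compilation; later calls reuse the optimized kernels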

Hardware Requirements

  • A GPU with at least 24GB of memory (VRAM)
  • For optimal latency, the GPU should have FP16 (half-precision) TFLOPS similar to the RTX 4090. See the hardware specifications for the RTX 4090.

The demo was run on a single RTX 4090 GPU. WhisperFusion uses the Nvidia TensorRT-LLM library for CUDA optimized versions of popular LLM models. TensorRT-LLM supports multiple GPUs, so it should be possible to run WhisperFusion for even better performance on multiple GPUs.
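A quick way to check whether a local GPU meets these requirements (assuming PyTorch with CUDA is installed; this check is not part of WhisperFusion itself):

    import torch

    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"{props.name}: {vram_gb:.1f} GB VRAM, compute capability {props.major}.{props.minor}")
    if vram_gb < 24:
        print("Warning: WhisperFusion recommends a GPU with at least 24 GB of memory.")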

Getting Started

We provide a Docker Compose setup to streamline the deployment of the pre-built TensorRT-LLM docker container. This setup includes both Whisper and Phi converted to TensorRT engines, and the WhisperSpeech model is pre-downloaded to quickly start interacting with WhisperFusion. Additionally, we include a simple web server for the Web GUI.

  • Build and Run with docker compose:

        mkdir docker/scratch-space
        cp docker/scripts/build-* docker/scripts/run-whisperfusion.sh docker/scratch-space/
        docker compose build
        export MODEL=Phi-3-mini-4k-instruct  # Phi-3-mini-128k-instruct or phi-2; by default WhisperFusion uses phi-2
        docker compose up

  • Start the Web GUI on http://localhost:8000
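Once the containers are running, a minimal sanity check that the bundled web server is answering on port 8000 (assuming it serves the GUI over plain HTTP, as the URL above suggests):

    import urllib.request

    # Expect an HTTP 200 with the Web GUI page once `docker compose up` has finished starting.
    with urllib.request.urlopen("http://localhost:8000", timeout=5) as resp:
        print(resp.status, resp.headers.get("Content-Type"))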


Contact Us

For questions or issues, please open an issue. Contact us at: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com
