
WhisperFusion

Seamless conversations with AI (with ultra-low latency)

Welcome to WhisperFusion. WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. Both the LLM and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities, while WhisperSpeech is optimized with torch.compile.
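At a high level, audio flows through three stages: WhisperLive transcribes speech in real time, the LLM generates a reply, and WhisperSpeech turns that reply back into audio. The sketch below is purely illustrative; the function names (transcribe_chunk, generate_reply, synthesize) are hypothetical stand-ins for those components, not WhisperFusion's actual API.

    # Illustrative sketch of the speech -> text -> LLM -> speech loop.
    # All functions here are hypothetical placeholders, not WhisperFusion code.
    from typing import Iterator

    def transcribe_chunk(audio: bytes) -> str:
        """Stand-in for WhisperLive real-time speech-to-text."""
        return "hello"

    def generate_reply(text: str) -> str:
        """Stand-in for the TensorRT-optimized LLM (e.g. Mistral or Phi)."""
        return f"You said: {text}"

    def synthesize(text: str) -> bytes:
        """Stand-in for WhisperSpeech text-to-speech."""
        return text.encode()

    def conversation_loop(audio_chunks: Iterator[bytes]) -> Iterator[bytes]:
        # Each incoming audio chunk is transcribed, answered, and spoken back.
        for chunk in audio_chunks:
            reply = generate_reply(transcribe_chunk(chunk))
            yield synthesize(reply)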

Features

  • Real-Time Speech-to-Text: Utilizes OpenAI WhisperLive to convert spoken language into text in real-time.

  • Large Language Model Integration: Adds Mistral, a Large Language Model, to enhance the understanding and context of the transcribed text.

  • TensorRT Optimization: Both the LLM and Whisper are optimized to run as TensorRT engines, ensuring high-performance and low-latency processing.

  • torch.compile: WhisperSpeech uses torch.compile to speed up inference by JIT-compiling PyTorch code into optimized kernels (see the short sketch after this list).
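As a rough illustration of the torch.compile point above (a generic PyTorch sketch, not WhisperSpeech's actual code), wrapping a model compiles its forward pass into optimized kernels the first time it runs:

    import torch

    # Any PyTorch module works; a small transformer layer stands in for the TTS model here.
    model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).eval()
    compiled = torch.compile(model)  # JIT-compiles the forward pass on first use

    x = torch.randn(1, 128, 512)
    with torch.no_grad():
        out = compiled(x)  # first call triggers compilation; later calls reuse the optimized kernels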

Hardware Requirements

  • A GPU with at least 24GB of memory (VRAM)
  • For optimal latency, the GPU should have FP16 (half-precision) TFLOPS similar to the RTX 4090. See the hardware specifications for the RTX 4090.

The demo was run on a single RTX 4090 GPU. WhisperFusion uses the Nvidia TensorRT-LLM library for CUDA optimized versions of popular LLM models. TensorRT-LLM supports multiple GPUs, so it should be possible to run WhisperFusion for even better performance on multiple GPUs.
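A quick way to check whether a local GPU meets these requirements (assuming PyTorch with CUDA is installed; this check is not part of WhisperFusion itself):

    import torch

    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"{props.name}: {vram_gb:.1f} GB VRAM, compute capability {props.major}.{props.minor}")
    if vram_gb < 24:
        print("Warning: WhisperFusion recommends a GPU with at least 24 GB of memory.")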

Getting Started

We provide a Docker Compose setup to streamline the deployment of the pre-built TensorRT-LLM docker container. This setup includes both Whisper and Phi converted to TensorRT engines, and the WhisperSpeech model is pre-downloaded to quickly start interacting with WhisperFusion. Additionally, we include a simple web server for the Web GUI.

  • Build and Run with docker compose:

        mkdir docker/scratch-space
        cp docker/scripts/build-* docker/scripts/run-whisperfusion.sh docker/scratch-space/
        docker compose build
        export MODEL=Phi-3-mini-4k-instruct  # Phi-3-mini-128k-instruct or phi-2; by default WhisperFusion uses phi-2
        docker compose up

  • Start the Web GUI on http://localhost:8000
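Once the containers are running, a minimal sanity check that the bundled web server is answering on port 8000 (assuming it serves the GUI over plain HTTP, as the URL above suggests):

    import urllib.request

    # Expect an HTTP 200 with the Web GUI page once `docker compose up` has finished starting.
    with urllib.request.urlopen("http://localhost:8000", timeout=5) as resp:
        print(resp.status, resp.headers.get("Content-Type"))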


Contact Us

For questions or issues, please open an issue. Contact us at: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com
