speech-interaction
Here are 5 public repositories matching this topic...
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
- Updated May 19, 2025 - Python
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering, but not limited to, end-to-end speech interaction, end-to-end speech translation, and speech recognition.
- Updated Jan 8, 2025 - Python
A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
- Updated Apr 7, 2025 - Python
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
- Updated Apr 7, 2025 - Python
A web-based, voice-controlled media player designed to run in any modern browser (Chrome or Edge recommended).
- Updated Sep 20, 2025 - JavaScript