speech-language-model
Here are 18 public repositories matching this topic...
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
- Updated May 19, 2025 - Python
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
- Updated Mar 2, 2025 - Python
A Survey of Spoken Dialogue Models (60 pages)
- Updated Nov 28, 2024
[ICCV 2025] Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"
- Updated Jan 9, 2025 - Python
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
- Updated Apr 20, 2025 - Python
SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".
- Updated May 18, 2025 - Python
Code for DeSTA2.5-Audio
- Updated Aug 7, 2025 - Python
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
- Updated Jul 15, 2025 - HTML
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
- Updated Jun 4, 2025 - Python
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
- Updated May 20, 2025 - Python
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
- Updated Dec 20, 2024 - Python
Survey of audio language models
- Updated Jun 21, 2025 - Jupyter Notebook
The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)
- Updated Aug 15, 2025 - Python
Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
- Updated Sep 27, 2025 - Python
A collection of audio codecs with a standardized API
- Updated May 27, 2025 - Python
Speech Resynthesis and Language Modeling
- Updated Jun 11, 2025 - Python
A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
- Updated Apr 7, 2025 - Python
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
- Updated Apr 7, 2025 - Python