# speech-language-model

Here are 18 public repositories matching this topic...

ictnlp/LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

speech-to-text speech-to-speech large-language-models multimodal-large-language-models speech-language-model speech-interaction

Updated May 19, 2025
Python

jishengpeng/WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

semantic text-to-speech codec acoustic dac speech-representation audio-representation encodec soundstream music-representation-learning gpt4o speech-language-model

Updated Mar 2, 2025
Python

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

streaming duplex speech moshi speech-representation encodec gpt-4o speech-language-model spoken-dialogue-models modal-alignment intreaction mini-omni llama-omni wavtokenizer

Updated Nov 28, 2024

dvlab-research/Lyra

[ICCV 2025] Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"

efficiency vision-language-model speech-language-model omni-language-model multimodal-large-language-model

Updated Jan 9, 2025
Python

zhenye234/xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

audio music semantic text-to-speech tokenizer speech sound codec audio-codec gpt language-model self-supervised-learning text-to-music vall-e text-to-sound speech-language-model

Updated Apr 20, 2025
Python

slp-rl/slamkit

SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".

transformers language-model efficient-training speech-language-model

Updated May 18, 2025
Python

kehanlu/DeSTA2.5-Audio

Code for DeSTA2.5-Audio

large-language-model speech-language-model audio-language-model

Updated Aug 7, 2025
Python

kehanlu/DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

speech-processing large-language-models speech-language-model

Updated Jul 15, 2025
HTML

Ereboas/MagiCodec

A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.

text-to-speech pytorch tts codec speech-representation llm llms speech-language-model

Updated Jun 4, 2025
Python

ictnlp/SLED-TTS

Streamable Text-to-Speech model using a language modeling approach, without vector quantization

text-to-speech speech-synthesis streaming-inference speech-language-model

Updated May 20, 2025
Python

hhguo/SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

audio speech tts speech-codec speech-language-model

Updated Dec 20, 2024
Python

ryota-komatsu/slp2025

Survey of audio language models

speech speech-processing speech-representation multimodal-large-language-models speech-language-model

Updated Jun 21, 2025
Jupyter Notebook

slp-rl/salmon

The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)

audio-processing acoustic-model speech-language-model

Updated Aug 15, 2025
Python

ryota-komatsu/speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

speech speech-processing spoken-language-processing self-supervised-learning speech-language-model

Updated Sep 27, 2025
Python

lucadellalib/audiocodecs

A collection of audio codecs with a standardized API

text-to-speech pytorch speech-synthesis codec quantization mimi dac self-supervised-learning encodec wavlm speech-coding speechtokenizer speech-language-model

Updated May 27, 2025
Python

ryota-komatsu/speech_resynth

Speech Resynthesis and Language Modeling

speech speech-synthesis speech-processing self-supervised-learning speech-language-model

Updated Jun 11, 2025
Python

OmniMMI/OpenOmniNexus

A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.

multi-modal speech-language-model speech-interaction omni-language-model

Updated Apr 7, 2025
Python

OmniMMI/OmniMMI

[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

multi-modal large-language-models llms multimodal-large-language-models speech-language-model speech-interaction

Updated Apr 7, 2025
Python

