wavlm

This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trains linear multi-head self-attention layers on top of them to extract vocal beat activations. Then, it uses an HMM decoder to infer singing beats and t…

music music-information-retrieval beat-tracking self-supervised singing-voice hubert linear-transformer wavlm

Updated Sep 4, 2022
Python

lucadellalib/discrete-wavlm-codec

Star 23

A neural speech codec based on discrete WavLM representations

clustering pytorch speech-synthesis codec k-means quantization self-supervised-learning hifi-gan wavlm token-extraction neural-speech-coding

Updated Aug 28, 2024
Python

lucadellalib/audiocodecs

Star 11

A collection of audio codecs with a standardized API

text-to-speech pytorch speech-synthesis codec quantization mimi dac self-supervised-learning encodec wavlm speech-coding speechtokenizer speech-language-model

Updated Feb 12, 2025
Python

alessandropec/data_driven_ai_voice_cloning

Star 8

This repository contains the code for the main part of my master's thesis in Data Science & Engineering at Politecnico di Torino

machine-learning text-to-speech ai deep-learning speaker-verification zero-shot-learning speaker-embeddings voice-cloning tacotron2 fastspeech2 ecapa-tdnn wavlm generative-ai

Updated Mar 5, 2023
Python

Sarasadeghii/Sharif-WavLM

Star 8

In this repository, the WavLM model is applied to both high-quality and poor-quality data for the speaker verification task, and the PyCM library is used for evaluation.

confusion-matrix speaker-verification farsi-datasets wavlm pycm

Updated May 27, 2023
Jupyter Notebook

theolepage/wavlm_ssl_sv

Star 7

SOTA method for self-supervised speaker verification leveraging a large-scale pretrained ASR model.

pytorch speaker-recognition speaker-verification asr dino self-supervised-learning voxceleb meta-project-show meta-project-color-8877f4 meta-project-order-2 wavlm

Updated Feb 19, 2025
Python

bunyaminergen/WavLMMSDD

Star 6

This repository combines `WavLM`, a powerful speech representation model from Microsoft, with `MSDD` (Multi-Scale Diarization Decoder), a state-of-the-art approach for speaker diarization from Nvidia.

microsoft speech embedding speaker-diarization diarization nvidia-nemo wavlm speech-embedding

Updated Mar 10, 2025
Jupyter Notebook

sadPororo/UniPool-SV

Star 5

Universal Pooling Method for Speaker Verification Utilizing Pre-trained Multi-layer Features, 2025 preprint

pretrained-models speaker-recognition speaker-verification hubert wav2vec2 wavlm

Updated Sep 19, 2024
Python

zhu00121/Universal-representation-dynamics-of-deepfake-speech

Star 4

This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"

self-supervised deepfake-detection wav2vec2 wavlm modulation-transformation