revdotcom/reverb


Reverb

Open source inference and evaluation code for Rev's state-of-the-art speech recognition and diarization models. The speech recognition (ASR) code uses the WeNet framework and the speech diarization code uses the Pyannote framework. More detailed model descriptions can be found in our blog, and the models can be downloaded from HuggingFace.

Table of Contents

  • ASR
  • Diarization
  • Getting Started
  • Hosting the Model
  • License
  • Citations
  • Contributors

ASR

Speech-to-text code based on the WeNet framework. See the ASR folder for more details and usage instructions.

Long-form speech recognition WER results:

| Model | Earnings21 | Earnings22 | Rev16 |
|-------|------------|------------|-------|
| Reverb ASR | 9.68 | 13.68 | 10.30 |
| Whisper Large-v3 | 14.26 | 19.05 | 10.86 |
| Canary-1B | 14.40 | 19.01 | 13.82 |
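For reference, WER (word error rate) counts word substitutions, deletions, and insertions against a reference transcript, divided by the reference length; lower is better. A minimal sketch of the computation (illustrative only; the repository's actual evaluation code lives in the ASR folder):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference
    word count, via a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25
```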

Diarization

Speaker diarization code based on the Pyannote framework. See the diarization folder for more details and usage instructions.

Long-form WDER results, in combination with Rev's ASR:

| Model | Earnings21 | Rev16 |
|-------|------------|-------|
| Pyannote 3.0 | 0.051 | 0.090 |
| Reverb Diarization V1 | 0.047 | 0.077 |
| Reverb Diarization V2 | 0.046 | 0.078 |
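WDER (word diarization error rate) is, roughly, the fraction of words attributed to the wrong speaker; lower is better. A toy sketch under that simplified definition (illustrative only; real scoring also has to account for ASR errors, which this version ignores). Since diarization labels are arbitrary, the sketch minimizes errors over all label mappings:

```python
from itertools import permutations

def wder(ref_speakers, hyp_speakers):
    """Fraction of words whose hypothesized speaker differs from the
    reference, minimized over permutations of the (arbitrary) labels."""
    assert len(ref_speakers) == len(hyp_speakers) and ref_speakers
    labels = sorted(set(ref_speakers) | set(hyp_speakers))
    best_errors = len(ref_speakers)
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))  # relabel hypothesis speakers
        errors = sum(1 for r, h in zip(ref_speakers, hyp_speakers)
                     if mapping[h] != r)
        best_errors = min(best_errors, errors)
    return best_errors / len(ref_speakers)

# One of four words lands on the wrong speaker under the best relabeling:
print(wder(["A", "A", "B", "B"], ["X", "X", "X", "Y"]))  # 0.25
```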

Getting Started

Important

These instructions require that you set up:

  • A HuggingFace access token, and that you are logged in via the HuggingFace CLI.
  • Git LFS
    • Simply run git lfs install from your terminal.

Check out the READMEs within each subdirectory for more information on the ASR or diarization models.

Python Setup

This codebase is compatible with Python 3.10+. To get started, simply run

pip install .

This will install the reverb package, a modified version of the wenet Python package, into your Python environment. To use reverb's code, make sure you do not have another wenet installation in your environment, which could cause conflicts.
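If you want to verify which wenet Python would actually import before running anything, a small check can help (wenet_origin is a hypothetical helper, not part of the package):

```python
import importlib.util

def wenet_origin():
    """Return the file path Python would load `wenet` from, or None if the
    package is not importable in this environment."""
    spec = importlib.util.find_spec("wenet")
    return getattr(spec, "origin", None)

# If this prints a path outside your reverb install, a conflicting upstream
# wenet package is likely shadowing it.
print(wenet_origin())
```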

Tip

While we suggest using our CLI or Python package to download the reverb model, you can also download it manually by running:

git lfs install
git clone https://huggingface.co/Revai/reverb-asr

Command Line Usage

The following command can be used to transcribe audio files:

reverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results

You can also specify how "verbatim" the transcription should be:

reverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results --verbatimicity 0.2

You can even change the decoding mode:

reverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results --modes ctc_prefix_beam_search
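These flags compose, so one way to compare decoding modes is to generate one CLI invocation per mode. The mode names below are the usual WeNet decoding options and are an assumption here; `reverb --help` has the authoritative list:

```python
import shlex

# Decoding modes typically exposed by WeNet-style recognizers (assumed set;
# check `reverb --help` for the authoritative list).
MODES = ["attention", "ctc_greedy_search",
         "ctc_prefix_beam_search", "attention_rescoring"]

def reverb_command(audio_file, mode, result_dir="results"):
    """Build one reverb CLI invocation as an argv list."""
    return ["reverb", "--model", "reverb_asr_v1",
            "--audio_file", audio_file,
            "--result_dir", result_dir,
            "--modes", mode]

for mode in MODES:
    print(shlex.join(reverb_command("audio.mp3", mode)))
```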

For a full list of arguments, run:

reverb --help

or check out our script.

Python Usage

Reverb can also be used from within Python:

import wenet

reverb = wenet.load_model("reverb_asr_v1")
output = reverb.transcribe("audio.mp3")
print(output)

The load_model function will automatically download the reverb model from HuggingFace. If instead you have a local version of the model that you downloaded from our HuggingFace or that you've finetuned, you can simply pass load_model the path to the directory containing the .pt checkpoint, config.yaml, and extra files to use your model.

import wenet

reverb = wenet.load_model("/local/reverb-asr")
output = reverb.transcribe("audio.mp3")
print(output)
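Before pointing load_model at a local directory, you can sanity-check that it holds the minimum files mentioned above (a sketch; looks_like_reverb_model_dir is a hypothetical helper, and the model may ship additional required files):

```python
from pathlib import Path

def looks_like_reverb_model_dir(path: str) -> bool:
    """True if `path` is a directory containing a config.yaml and at least
    one .pt checkpoint -- the minimum a local model directory should hold."""
    p = Path(path)
    return p.is_dir() and (p / "config.yaml").is_file() and any(p.glob("*.pt"))
```

A failed check usually means the Git LFS files were not actually pulled (run `git lfs install` before cloning).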

If instead of text output, you'd prefer CTM output, simply specify the format in the transcribe command.

import wenet

reverb = wenet.load_model("reverb_asr_v1")
# Specifying the "format" will change the output
output = reverb.transcribe("audio.mp3", format="ctm")
print(output)
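CTM is the standard NIST time-marked word format, one word per line as `utterance channel start duration word [confidence]`. A minimal parser sketch for such output (field layout assumed from the standard format, not taken from this repo's code):

```python
def parse_ctm(text: str):
    """Parse CTM lines into dicts; blank lines are skipped."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        entry = {
            "utt": parts[0],
            "channel": parts[1],
            "start": float(parts[2]),     # seconds
            "duration": float(parts[3]),  # seconds
            "word": parts[4],
        }
        if len(parts) > 5:                # confidence column is optional
            entry["confidence"] = float(parts[5])
        entries.append(entry)
    return entries

sample = "audio 1 0.00 0.42 hello\naudio 1 0.42 0.30 world"
print(parse_ctm(sample)[1]["word"])  # world
```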

All arguments available to the reverb command line are also parameters that can be included in the transcribe command.

import wenet

reverb = wenet.load_model("reverb_asr_v1")
output = reverb.transcribe("audio.mp3", verbatimicity=0.5, beam_size=2, ctc_weight=0.6)
print(output)

Docker Image

Alternatively, you can use Docker to run ASR and/or diarization directly on your system without needing to install dependencies (including the model files). First, make sure Docker is installed on your system. If you wish to run on an NVIDIA GPU, more steps might be required. Then, run the following command to build the Docker image:

docker build -t reverb . --build-arg HUGGINGFACE_ACCESS_TOKEN=${YOUR_HUGGINGFACE_ACCESS_TOKEN}

And to run the container:

sudo docker run --entrypoint "/bin/bash" --gpus all --rm -it reverb

Hosting the Model

If your use case requires deploying these models at larger scale while maintaining strict security requirements, consider using our other release: https://github.com/revdotcom/reverb-self-hosted. This setup gives you full control over the deployment of our models on your own infrastructure, without the need for internet connectivity or cloud dependencies.

License

The license in this repository applies only to the code, not the models. See LICENSE for details. For model licenses, check out their pages on HuggingFace.

Citations

If you make use of this model, please cite this paper:

@article{bhandari2024reverb,
  title={Reverb: Open-Source ASR and Diarization from Rev},
  author={Bhandari, Nishchal and Chen, Danny and del Río Fernández, Miguel Ángel and Delworth, Natalie and Fox, Jennifer Drexler and Jetté, Miguel and McNamara, Quinten and Miller, Corey and Novotný, Ondřej and Profant, Ján and Qin, Nan and Ratajczak, Martin and Robichaud, Jean-Philippe},
  journal={arXiv preprint arXiv:2410.03930},
  year={2024}
}

Contributors

Nishchal Bhandari, Danny Chen, Miguel Del Rio, Natalie Delworth, Jennifer Drexler Fox, Miguel Jette, Quinn McNamara, Corey Miller, Ondrej Novotny, Jan Profant, Nan Qin, Martin Ratajczak, and Jean-Philippe Robichaud.

