DeutscheKI/tevr-asr-tool


State-of-the-art (ranked #1 Aug 2022) German Speech Recognition in 284 lines of C++. This is a 100% private 100% offline 100% free CLI tool.

Folders and files

.idea
kenlm @ fee7b05
tensorflow_src @ d8ce9f9
tevr-asr-data
wave @ 42bc0fe
.gitattributes
.gitignore
.gitmodules
CMakeLists.txt
LICENSE
README.md
debug_logits.h
op_erf.cc
package.sh
test_audio.wav
tevr_asr_tool.cc


TEVR ASR Tool

state-of-the-art performance
- 3.64% WER on Common Voice German
- rank #1 on paperswithcode.com
no GPU needed
100% offline
100% private
100% free
MIT license
Linux x86_64
command-line tool
easy to understand
- only 284 lines of C++ code
- AI model on HuggingFace

High Transcription Quality

In August 2022, we ranked #1 on "Speech Recognition on Common Voice German (using extra training data)" with a 3.64% word error rate. Accordingly, the performance of this tool is considered to be the best of what's currently possible in German speech recognition.
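For readers unfamiliar with the metric: word error rate is usually computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that definition, not the evaluation script behind the ranking; the function name `word_error_rate` is made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)
```

With the reference "mückenstiche sollte man nicht aufkratzen", a hypothesis that drops the word "nicht" has one error in five words, so the function returns 0.2.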

How does this work?

The line ranges below refer to tevr_asr_tool.cc:
- L175-L185 load the WAV file.
- L189-L229 execute the acoustic AI model.
- L260-L275 convert the predicted token logits into string snippets.
- L73-L162 implement the Beam search re-scoring based on a KenLM language model.
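To give a feel for the last two steps, here is a simplified Python sketch. It is not the tool's C++ code. It assumes CTC-style model output with a blank token at index 0, uses greedy decoding instead of a real beam search, and replaces KenLM with a hypothetical toy scoring function. The vocabulary, the `alpha` and `beta` weights, and all function names are made up for illustration.

```python
def ctc_greedy_decode(logits, vocab, blank_id=0):
    """Pick the best token per frame, collapse repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    pieces = []
    previous = None
    for token in best:
        if token != previous and token != blank_id:
            pieces.append(vocab[token])
        previous = token
    return "".join(pieces)


def toy_lm_logprob(text):
    """Stand-in for a language model: known words are cheap, others costly."""
    known = {"mückenstiche", "sollte", "man", "nicht", "aufkratzen"}
    return sum(-1.0 if word in known else -10.0 for word in text.split())


def rescore(candidates, lm_logprob, alpha=0.5, beta=1.0):
    """Return the candidate text with the best combined score.

    candidates: list of (text, acoustic_logprob) pairs.
    score = acoustic + alpha * language model + beta * word count
    """
    def total(candidate):
        text, acoustic = candidate
        return acoustic + alpha * lm_logprob(text) + beta * len(text.split())

    return max(candidates, key=total)[0]
```

In this toy setup, `rescore` prefers "mückenstiche" over the acoustically slightly better "mücken stiche", because the language model penalizes the two unknown words. A real beam search applies this kind of combined scoring while it extends partial hypotheses rather than only at the end.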

If you're curious how the acoustic AI model works and why I designed it that way, here's the paper: https://arxiv.org/abs/2206.12693 and here's a pre-trained HuggingFace transformers model: https://huggingface.co/fxtentacle/wav2vec2-xls-r-1b-tevr

Install the Debian/Ubuntu package

Download tevr_asr_tool-1.0.0-Linux-x86_64.deb from GitHub and extract the multipart ZIP:

wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.001"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.002"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.003"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.004"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.005"
cat tevr_asr_tool-1.0.0-Linux-x86_64.zip.00* > tevr_asr_tool-1.0.0-Linux-x86_64.zip
unzip tevr_asr_tool-1.0.0-Linux-x86_64.zip
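If one of the five parts was downloaded incompletely, the concatenated ZIP will be corrupt. As an optional check, the same join can be done in Python and verified with the standard `zipfile` module. This is only a sketch; the helper names `join_parts` and `archive_is_intact` are made up, and it assumes the parts sit in one directory and end in three-digit suffixes such as `.001`.

```python
import shutil
import zipfile
from pathlib import Path


def join_parts(directory, archive_name):
    """Concatenate archive_name.001, .002, ... into archive_name."""
    directory = Path(directory)
    parts = sorted(directory.glob(archive_name + ".[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {archive_name}")
    target = directory / archive_name
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return target


def archive_is_intact(path):
    """True if the file is a ZIP archive and every member passes its CRC check."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return archive.testzip() is None
```

For example, `archive_is_intact(join_parts(".", "tevr_asr_tool-1.0.0-Linux-x86_64.zip"))` should be true before you run unzip.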

Install it:

sudo dpkg -i tevr_asr_tool-1.0.0-Linux-x86_64.deb

Install from Source Code

Download submodules:

git submodule update --init

CMake configure and build:

cmake -DCMAKE_BUILD_TYPE=MinSizeRel -DCPACK_CMAKE_GENERATOR=Ninja -S. -B build
cmake --build build --target tevr_asr_tool -j 16

Create the Debian package:

(cd build && cpack -G DEB)

Install it:

sudo dpkg -i build/tevr_asr_tool-1.0.0-Linux-x86_64.deb

Usage

tevr_asr_tool --target_file=test_audio.wav 2>log.txt

should display the correct transcription "mückenstiche sollte man nicht aufkratzen". And log.txt will contain the diagnostics and progress that was logged to stderr during execution.
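If you want to call the tool from a script, a thin wrapper around the command above could look like the following sketch. It assumes the transcription is written to stdout (the README only says it is "displayed") and that `tevr_asr_tool` is on your PATH. The helper names `build_command` and `transcribe` are made up for illustration; only the `--target_file` flag comes from the usage line above.

```python
import subprocess


def build_command(wav_path, executable="tevr_asr_tool"):
    """Build the argument list for one transcription run."""
    return [executable, f"--target_file={wav_path}"]


def transcribe(wav_path, log_path="log.txt", executable="tevr_asr_tool"):
    """Run the tool, write its stderr to log_path and return its stdout.

    Assumption: the transcription is the tool's stdout output.
    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    with open(log_path, "wb") as log:
        result = subprocess.run(
            build_command(wav_path, executable),
            stdout=subprocess.PIPE,
            stderr=log,
            encoding="utf-8",
            check=True,
        )
    return result.stdout.strip()
```

With the package installed, `transcribe("test_audio.wav")` should return the sentence shown above and leave the diagnostics in log.txt.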

GPU Acceleration for Developers

I plan to release a Vulkan & OpenGL-accelerated real-time low-latency transcription software for developers soon. It'll run 100% private + 100% offline just like this tool, but instead of processing a WAV file on CPU it'll stream the real-time GPU transcription of your microphone input through a WebRTC-capable REST API so that you can easily integrate it with your own voice-controlled projects. For example, that'll enable hackable voice typing together with pynput.keyboard.

If you want to get notified when it launches, please enter your email at https://madmimi.com/signups/f0da3b13840d40ce9e061cafea6280d5/join

Commercial Customization

This tool itself is free to use, including for commercial purposes. And of course it comes with no warranty of any kind.

But if you have an idea for a commercial use-case for a customized version of this tool or for similar technology - ideally something that helps small and medium-sized businesses in northern Germany become more competitive - then please contact me at moin@DeutscheKI.de

Research Citation

If you use this for research, please cite:

@misc{https://doi.org/10.48550/arxiv.2206.12693,
  doi = {10.48550/ARXIV.2206.12693},
  url = {https://arxiv.org/abs/2206.12693},
  author = {Krabbenhöft, Hajo Nils and Barth, Erhardt},
  keywords = {Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, F.2.1; I.2.6; I.2.7},
  title = {TEVR: Improving Speech Recognition by Token Entropy Variance Reduction},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Replace the AI Model

The German AI model and my training scripts can be found on HuggingFace: https://huggingface.co/fxtentacle/wav2vec2-xls-r-1b-tevr

The model has undergone XLS-R cross-language pre-training. You can directly fine-tune it with a different language dataset - for example CommonVoice English - and then re-export the files in the tevr-asr-data folder.

Alternatively, you can donate roughly 2 weeks of A100 GPU credits to me and I'll train a suitable recognition model and upload it to HuggingFace.


Releases

v1.0.0 (latest), Aug 9, 2022
