State-of-the-art (ranked #1 Aug 2022) German speech recognition in 284 lines of C++. This is a 100% private, 100% offline, 100% free CLI tool.
DeutscheKI/tevr-asr-tool
- state-of-the-art performance
- 3.64% WER on Common Voice German
- rank #1 on paperswithcode.com
- no GPU needed
- 100% offline
- 100% private
- 100% free
- MIT license
- Linux x86_64
- command-line tool
- easy to understand
- only 284 lines of C++ code
- AI model on HuggingFace
In August 2022, we ranked #1 on "Speech Recognition on Common Voice German (using extra training data)" with a 3.64% word error rate. Accordingly, the performance of this tool is considered to be the best of what's currently possible in German speech recognition.
The main processing steps map to these lines of the C++ source:

- L175-L185 load the WAV file.
- L189-L229 execute the acoustic AI model.
- L260-L275 convert the predicted token logits into string snippets.
- L73-L162 implement the beam search re-scoring based on a KenLM language model.
If you're curious how the acoustic AI model works and why I designed it that way, here's the paper: https://arxiv.org/abs/2206.12693 and here's a pre-trained HuggingFace transformers model: https://huggingface.co/fxtentacle/wav2vec2-xls-r-1b-tevr
Download the multipart ZIP containing tevr_asr_tool-1.0.0-Linux-x86_64.deb from GitHub and extract it:
```
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.001"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.002"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.003"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.004"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.005"
cat tevr_asr_tool-1.0.0-Linux-x86_64.zip.00* > tevr_asr_tool-1.0.0-Linux-x86_64.zip
unzip tevr_asr_tool-1.0.0-Linux-x86_64.zip
```
Install it:
```
sudo dpkg -i tevr_asr_tool-1.0.0-Linux-x86_64.deb
```
To build from source instead, download the submodules from within a clone of this repository:

```
git submodule update --init
```
CMake configure and build:
```
cmake -DCMAKE_BUILD_TYPE=MinSizeRel -DCPACK_CMAKE_GENERATOR=Ninja -S . -B build
cmake --build build --target tevr_asr_tool -j 16
```

Create the Debian package:

```
(cd build && cpack -G DEB)
```

Install it:

```
sudo dpkg -i build/tevr_asr_tool-1.0.0-Linux-x86_64.deb
```
Running

```
tevr_asr_tool --target_file=test_audio.wav 2>log.txt
```

should display the correct transcription, "mückenstiche sollte man nicht aufkratzen." ("you shouldn't scratch mosquito bites"), and log.txt will contain the diagnostics and progress messages that were logged to stderr during execution.
I plan to release a Vulkan- and OpenGL-accelerated real-time, low-latency transcription software for developers soon. It'll run 100% private and 100% offline just like this tool, but instead of processing a WAV file on the CPU, it'll stream the real-time GPU transcription of your microphone input through a WebRTC-capable REST API so that you can easily integrate it with your own voice-controlled projects. For example, that'll enable hackable voice typing together with pynput.keyboard.
If you want to get notified when it launches, please enter your email at https://madmimi.com/signups/f0da3b13840d40ce9e061cafea6280d5/join
This tool itself is free to use, including for commercial purposes. And of course, it comes with no warranty of any kind.
But if you have an idea for a commercial use case for a customized version of this tool or for similar technology, ideally something that helps small and medium-sized businesses in northern Germany become more competitive, then please contact me at moin@DeutscheKI.de
If you use this for research, please cite:
```
@misc{https://doi.org/10.48550/arxiv.2206.12693,
  doi = {10.48550/ARXIV.2206.12693},
  url = {https://arxiv.org/abs/2206.12693},
  author = {Krabbenhöft, Hajo Nils and Barth, Erhardt},
  keywords = {Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, F.2.1; I.2.6; I.2.7},
  title = {TEVR: Improving Speech Recognition by Token Entropy Variance Reduction},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```
The German AI model and my training scripts can be found on HuggingFace: https://huggingface.co/fxtentacle/wav2vec2-xls-r-1b-tevr
The model has undergone XLS-R cross-language pre-training. You can directly fine-tune it on a dataset in a different language, for example CommonVoice English, and then re-export the files in the tevr-asr-data folder.
Alternatively, you can donate roughly 2 weeks of A100 GPU credits to me, and I'll train a suitable recognition model and upload it to HuggingFace.