State-of-the-art (ranked #1 Aug 2022) German Speech Recognition in 284 lines of C++. This is a 100% private 100% offline 100% free CLI tool.

DeutscheKI/tevr-asr-tool

  • state-of-the-art performance
  • no GPU needed
  • 100% offline
  • 100% private
  • 100% free
  • MIT license
  • Linux x86_64
  • command-line tool
  • easy to understand
    • only 284 lines of C++ code
    • AI model on HuggingFace

High Transcription Quality

In August 2022, we ranked #1 on "Speech Recognition on Common Voice German (using extra training data)" with a 3.64% word error rate. Accordingly, the performance of this tool is considered to be the best of what's currently possible in German speech recognition.
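For readers unfamiliar with the metric: word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference and the recognized text, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the benchmark's evaluation code; the function names `split_words` and `word_error_rate` are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split a sentence into whitespace-separated words.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Word error rate = word-level edit distance / number of reference words.
// (Illustrative helper, not part of tevr_asr_tool.)
double word_error_rate(const std::string& reference, const std::string& hypothesis) {
    const std::vector<std::string> ref = split_words(reference);
    const std::vector<std::string> hyp = split_words(hypothesis);
    assert(!ref.empty() && "reference must contain at least one word");

    // prev[j]: edit distance between the reference words processed so far
    // (before the current one) and the first j hypothesis words.
    std::vector<std::size_t> prev(hyp.size() + 1), cur(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= ref.size(); ++i) {
        cur[0] = i;  // deleting all i reference words
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            const std::size_t substitution = prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            const std::size_t deletion = prev[j] + 1;
            const std::size_t insertion = cur[j - 1] + 1;
            cur[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, cur);
    }
    return static_cast<double>(prev[hyp.size()]) / static_cast<double>(ref.size());
}
```

With this definition, one wrong word in a five-word sentence gives a word error rate of 20%, so 3.64% corresponds to roughly one word error per 27 reference words.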

How does this work?

  • L175-L185 load the WAV file.
  • L189-L229 execute the acoustic AI model.
  • L260-L275 convert the predicted token logits into string snippets.
  • L73-L162 implement the beam search re-scoring based on a KenLM language model.
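To make the last two steps more concrete, here is a simplified sketch of the general idea. It is not the code from this repository. `greedy_decode` turns per-frame logits into text and assumes a CTC-style output (a blank token and collapsing of repeated tokens), which is an assumption made for this illustration. `beam_search` ranks partial transcripts by acoustic log-probability plus a weighted language-model score; the `lm_score` callback is a hypothetical stand-in for a KenLM query, and `Candidate`, `Hypothesis` and `lm_weight` are names invented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Greedy decoding: pick the highest-scoring token per time step.
// ASSUMPTION: CTC-style output, i.e. `blank_id` means "no output" and
// consecutive repeats of the same token id collapse into one.
std::string greedy_decode(const std::vector<std::vector<float>>& logits,
                          const std::vector<std::string>& vocab,
                          std::size_t blank_id) {
    std::string text;
    std::size_t previous = blank_id;
    for (const std::vector<float>& frame : logits) {
        assert(!frame.empty() && frame.size() == vocab.size());
        const std::size_t best = static_cast<std::size_t>(
            std::max_element(frame.begin(), frame.end()) - frame.begin());
        if (best != blank_id && best != previous) text += vocab[best];
        previous = best;
    }
    return text;
}

struct Candidate { std::string snippet; double acoustic_logprob; };
struct Hypothesis { std::string text; double acoustic; double total; };

// Beam search over per-step snippet candidates. `lm_score` is a hypothetical
// callback standing in for a language-model query (e.g. KenLM).
std::vector<Hypothesis> beam_search(
        const std::vector<std::vector<Candidate>>& steps,
        const std::function<double(const std::string&)>& lm_score,
        double lm_weight, std::size_t beam_width) {
    assert(beam_width > 0);
    std::vector<Hypothesis> beam;
    beam.push_back(Hypothesis{"", 0.0, 0.0});
    for (const std::vector<Candidate>& candidates : steps) {
        if (candidates.empty()) continue;  // nothing to extend with
        std::vector<Hypothesis> expanded;
        for (const Hypothesis& h : beam) {
            for (const Candidate& c : candidates) {
                Hypothesis next{h.text + c.snippet, h.acoustic + c.acoustic_logprob, 0.0};
                // Re-score: acoustic evidence plus weighted language-model score.
                next.total = next.acoustic + lm_weight * lm_score(next.text);
                expanded.push_back(std::move(next));
            }
        }
        // Keep only the `beam_width` best partial transcripts.
        std::sort(expanded.begin(), expanded.end(),
                  [](const Hypothesis& a, const Hypothesis& b) { return a.total > b.total; });
        if (expanded.size() > beam_width) expanded.resize(beam_width);
        beam = std::move(expanded);
    }
    return beam;  // sorted best-first
}
```

The point of the re-scoring is that an acoustically slightly less likely snippet can still win if the language model considers the resulting sentence more plausible. Setting `lm_weight` to 0 reduces the search to a purely acoustic ranking.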

If you're curious how the acoustic AI model works and why I designed it that way, here's the paper: https://arxiv.org/abs/2206.12693 and here's a pre-trained HuggingFace transformers model: https://huggingface.co/fxtentacle/wav2vec2-xls-r-1b-tevr

Install the Debian/Ubuntu package

Download tevr_asr_tool-1.0.0-Linux-x86_64.deb from GitHub and extract the multipart ZIP:

wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.001"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.002"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.003"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.004"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.005"
cat tevr_asr_tool-1.0.0-Linux-x86_64.zip.00* > tevr_asr_tool-1.0.0-Linux-x86_64.zip
unzip tevr_asr_tool-1.0.0-Linux-x86_64.zip

Install it:

sudo dpkg -i tevr_asr_tool-1.0.0-Linux-x86_64.deb

Install from Source Code

Download submodules:

git submodule update --init

CMake configure and build:

cmake -DCMAKE_BUILD_TYPE=MinSizeRel -DCPACK_CMAKE_GENERATOR=Ninja -S . -B build
cmake --build build --target tevr_asr_tool -j 16

Create debian package:

(cd build && cpack -G DEB)

Install it:

sudo dpkg -i build/tevr_asr_tool-1.0.0-Linux-x86_64.deb

Usage

tevr_asr_tool --target_file=test_audio.wav 2>log.txt

should display the correct transcription "mückenstiche sollte man nicht aufkratzen", and log.txt will contain the diagnostics and progress messages that were logged to stderr during execution.
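If you want to call the tool from your own C++ program instead of a shell, a minimal sketch could look like the following. It assumes that the transcription is written to stdout (as the command above suggests) and uses POSIX `popen`, which fits the Linux target. The helpers `shell_quote`, `build_command` and `run_and_capture` are hypothetical names for this illustration and are not part of tevr_asr_tool.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <stdio.h>   // popen / pclose (POSIX)
#include <string>

// Wrap an argument in single quotes so the shell passes it through unchanged.
std::string shell_quote(const std::string& arg) {
    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'') quoted += "'\\''";  // close quote, escaped quote, reopen
        else quoted += c;
    }
    return quoted + "'";
}

// Build the command line shown above: the transcription goes to stdout
// (ASSUMPTION), while stderr diagnostics are redirected into `log_path`.
std::string build_command(const std::string& wav_path, const std::string& log_path) {
    assert(!wav_path.empty() && !log_path.empty());
    return "tevr_asr_tool --target_file=" + shell_quote(wav_path) +
           " 2>" + shell_quote(log_path);
}

// Run a shell command and return everything it wrote to stdout.
std::string run_and_capture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed");
    std::string output;
    std::array<char, 4096> buffer;
    std::size_t n = 0;
    while ((n = fread(buffer.data(), 1, buffer.size(), pipe)) > 0) {
        output.append(buffer.data(), n);
    }
    const int status = pclose(pipe);
    if (status != 0) throw std::runtime_error("command exited with a non-zero status");
    return output;
}
```

A call such as `run_and_capture(build_command("test_audio.wav", "log.txt"))` would then return whatever the tool printed to stdout, while log.txt is written exactly as in the shell example.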

GPU Acceleration for Developers

I plan to release Vulkan & OpenGL-accelerated real-time low-latency transcription software for developers soon. It'll run 100% private + 100% offline just like this tool, but instead of processing a WAV file on CPU it'll stream the real-time GPU transcription of your microphone input through a WebRTC-capable REST API so that you can easily integrate it with your own voice-controlled projects. For example, that'll enable hackable voice typing together with pynput.keyboard.

If you want to get notified when it launches, please enter your email at https://madmimi.com/signups/f0da3b13840d40ce9e061cafea6280d5/join

Commercial Customization

This tool itself is free to use, including for commercial purposes. And of course it comes with no warranty of any kind.

But if you have an idea for a commercial use-case for a customized version of this tool or for similar technology - ideally something that helps small and medium-sized businesses in northern Germany become more competitive - then please contact me at moin@DeutscheKI.de

Research Citation

If you use this for research, please cite:

@misc{https://doi.org/10.48550/arxiv.2206.12693,
  doi = {10.48550/ARXIV.2206.12693},
  url = {https://arxiv.org/abs/2206.12693},
  author = {Krabbenhöft, Hajo Nils and Barth, Erhardt},
  keywords = {Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, F.2.1; I.2.6; I.2.7},
  title = {TEVR: Improving Speech Recognition by Token Entropy Variance Reduction},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Replace the AI Model

The German AI model and my training scripts can be found on HuggingFace: https://huggingface.co/fxtentacle/wav2vec2-xls-r-1b-tevr

The model has undergone XLS-R cross-language pre-training. You can directly fine-tune it with a different language dataset - for example CommonVoice English - and then re-export the files in the tevr-asr-data folder.

Alternatively, you can donate roughly 2 weeks of A100 GPU credits to me and I'll train a suitable recognition model and upload it to HuggingFace.

