Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
mkiol/dsnote
Linux desktop and Sailfish OS app for note taking, reading and translating with offline Speech to Text, Text to Speech and Machine Translation
- Description
- Languages and Models
- How to install
- Flatpak packages
- Beta version
- Building from sources
- How to enable a custom model
- Contributing to Speech Note
- How to support
- Reviews and demos
- License
Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing take place entirely offline, locally on your computer, without using a network connection. Your privacy is always respected. No data is sent to the Internet.
Speech Note uses many different processing engines to do its job. Currently these are used:
- Speech to Text (STT)
- Text to Speech (TTS)
- Machine Translation (MT)
The Speech Note installation package does not include checkpoint files for the supported models. Instead, they can be downloaded using the graphical model browser built into the application.
The following languages and models are supported and enabled for download:
Lang ID | Name | DeepSpeech (STT) | Whisper (STT) | Vosk (STT) | April-ASR (STT) | Piper (TTS) | RHVoice (TTS) | espeak (TTS) | MBROLA (TTS) | Coqui (TTS) | Mimic3 (TTS) | WhisperSpeech (TTS) | Bergamot (MT) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
af | Afrikaans | ● | ● | ● | |||||||||
am | Amharic | ● (e) | ● | ● | ● | ||||||||
ar | Arabic | ● | ● | ● | ● | ● | ● | ● | |||||
bg | Bulgarian | ● | ● | ● | |||||||||
bn | Bengali | ● | ● | ● | ● | ||||||||
bs | Bosnian | ● | ● | ● | |||||||||
ca | Catalan | ● | ● | ● | ● | ● | ● | ● | |||||
cs | Czech | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||
cy | Welsh | ● | |||||||||||
da | Danish | ● | ● | ● | ● | ● | |||||||
de | German | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||
el | Greek | ● (e) | ● | ● | ● | ● | ● | ● | |||||
en | English | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
eo | Esperanto | ● | ● | ● | |||||||||
es | Spanish | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||
et | Estonian | ● (e) | ● | ● | ● | ● | ● | ||||||
eu | Basque | ● (e) | ● | ● | ● | ||||||||
fa | Persian | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||
fi | Finnish | ● | ● | ● | ● | ● | ● | ● | |||||
fr | French | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||
ga | Irish | ● | ● | ||||||||||
gu | Gujarati | ● | ● | ● | |||||||||
ha | Hausa | ● | ● | ||||||||||
he | Hebrew | ● | ● | ||||||||||
hi | Hindi | ● | ● | ● | |||||||||
hr | Croatian | ● | ● | ● | ● | ● | |||||||
hu | Hungarian | ● (e) | ● | ● | ● | ● | ● | ● | ● | ||||
id | Indonesian | ● (e) | ● | ● | ● | ● | ● | ||||||
is | Icelandic | ● | ● | ● | ● | ● | |||||||
it | Italian | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||
ja | Japanese | ● | ● | ● | ● | ||||||||
jv | Javanese | ● | ● | ||||||||||
ka | Georgian | ● | ● | ● | ● | ||||||||
kk | Kazakh | ● | ● | ● | ● | ● | |||||||
ko | Korean | ● | ● | ● | ● | ||||||||
ky | Kyrgyz | ● | ● | ||||||||||
la | Latin | ● | ● | ||||||||||
lb | Luxembourgish | ● | |||||||||||
lt | Lithuanian | ● | ● | ● | ● | ● | |||||||
lv | Latvian | ● | ● | ● | ● | ● | ● | ||||||
mk | Macedonian | ● | ● | ● | |||||||||
mn | Mongolian | ● (e) | ● | ● | |||||||||
mr | Marathi | ● | ● | ||||||||||
ms | Malay | ● | ● | ● | ● | ||||||||
mt | Maltese | ● | ● | ● | |||||||||
ne | Nepali | ● | ● | ● | ● | ||||||||
nl | Dutch | ● (e) | ● | ● | ● | ● | ● | ● | ● | ● | |||
no | Norwegian | ● | ● | ● | ● | ||||||||
pl | Polish | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
pt | Portuguese | ● (e) | ● | ● | ● | ● | ● | ● | ● | ||||
ro | Romanian | ● (e) | ● | ● | ● | ● | ● | ● | |||||
ru | Russian | ● | ● | ● | ● | ● | ● | ● | ● | ||||
sk | Slovak | ● | ● | ● | ● | ● | ● | ||||||
sl | Slovenian | ● (e) | ● | ● | ● | ● | ● | ||||||
sq | Albanian | ● | ● | ● | ● | ||||||||
sr | Serbian | ● | ● | ● | ● | ● | |||||||
sv | Swedish | ● | ● | ● | ● | ● | ● | ● | ● | ||||
sw | Swahili | ● | ● | ● | ● | ● | |||||||
te | Telugu | ● | ● | ● | |||||||||
th | Thai | ● (e) | ● | ● | ● | ||||||||
tl | Tagalog | ● | ● | ● | |||||||||
tn | Tswana | ● | ● | ● | |||||||||
tr | Turkish | ● (e) | ● | ● | ● | ● | ● | ● | ● | ||||
tt | Tatar | ● | ● | ● | ● | ||||||||
uk | Ukrainian | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||
uz | Uzbek | ● | ● | ● | ● | ||||||||
vi | Vietnamese | ● | ● | ● | ● | ● | ● | ||||||
yo | Yoruba | ● (e) | ● | ● | ● | ||||||||
zh | Chinese | ● | ● | ● | ● | ● | ● | ● |
(e) experimental, most likely doesn't work well
Faster Whisper, Coqui TTS and Mimic3 models are only available on x86-64.
Language models can be downloaded directly from the app.
Details of models which are currently configured for download are described in models.json (GitHub) or models.json (GitLab).
- Linux Desktop: Flatpak

      # Flatpak base package
      flatpak install net.mkiol.SpeechNote

      # Optional NVIDIA add-on package
      flatpak install net.mkiol.SpeechNote.Addon.nvidia

      # Optional AMD add-on package (not recommended)
      flatpak install net.mkiol.SpeechNote.Addon.amd

- Arch Linux (AUR):
- openSUSE (Packman repository)

      # Base package
      zypper in speechnote

      # Optional support for Python-based features in Speech Note
      zypper in speechnote-python-modules

- Sailfish OS: OpenRepos
The app distributed via Flatpak (published on Flathub) consists of the following packages:
- Base package "Speech Note" (net.mkiol.SpeechNote)
- Optional add-on for NVIDIA graphics card "Speech Note NVIDIA" (net.mkiol.SpeechNote.Addon.nvidia)
- Optional (and not recommended) add-on for AMD graphics card "Speech Note AMD" (net.mkiol.SpeechNote.Addon.amd)
The Base package includes all the dependencies needed to run every feature of the application. Add-ons add GPU acceleration, which speeds up some operations in the application.
The Base package and the add-ons contain many "heavy" libraries such as CUDA, ROCm, Torch and Python libraries. Because of this, the size of the packages and the space required after installation are significant. If you don't need all the functionality, you can use the much smaller "Tiny" package (available on the Releases page), which provides only the basic features. If needed, the "Tiny" package can also be used together with a GPU acceleration add-on.
Installing the AMD add-on is not recommended. It is very large and does not provide many benefits. In addition, ROCm 6.x included in the add-on may cause problems on some GPUs.
Comparison between Base, Tiny and Add-ons Flatpak packages:
Sizes | Base | Tiny | AMD add-on | NVIDIA add-on |
---|---|---|---|---|
Download size | 0.9 GiB | 70 MiB | +7.1 GiB | +3.7 GiB |
Unpacked size | 3.2 GiB | 170 MiB | +25.6 GiB | +6.4 GiB |
Features | Base | Tiny | AMD add-on | NVIDIA add-on |
---|---|---|---|---|
Coqui/DeepSpeech STT | + | + | | |
Vosk STT | + | + | | |
Whisper (whisper.cpp) STT | + | + | | |
Whisper (whisper.cpp) STT OpenCL ROCm | - | - | + | |
Whisper (whisper.cpp) STT OpenCL NVIDIA | + | + | | |
Whisper (whisper.cpp) STT ROCm | - | - | + | |
Whisper (whisper.cpp) STT CUDA | - | - | | + |
Whisper (whisper.cpp) STT OpenVINO | + | - | | |
Whisper (whisper.cpp) STT Vulkan | + | + | | |
Faster Whisper STT | + | - | | |
Faster Whisper STT CUDA | - | - | | + |
April-ASR STT | + | + | | |
eSpeak TTS | + | + | | |
MBROLA TTS | + | + | | |
Piper TTS | + | + | | |
RHVoice TTS | + | + | | |
Coqui TTS | + | - | | |
Coqui TTS ROCm | - | - | + | |
Coqui TTS CUDA | - | - | | + |
Mimic3 TTS | + | - | | |
WhisperSpeech TTS | + | - | | |
WhisperSpeech TTS ROCm | - | - | + | |
WhisperSpeech TTS CUDA | - | - | | + |
Punctuation restoration | + | - | | |
Translator | + | + | | |
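
The size figures from the comparison can be combined to estimate how much disk space a given combination of packages needs. The following is a minimal sketch that uses only the unpacked sizes listed in the table; the dictionary and function names are illustrative and are not part of Speech Note.

```python
# Unpacked sizes in GiB, copied from the comparison table.
# Add-on sizes are increments on top of the Base or Tiny package.
UNPACKED_GIB = {
    "base": 3.2,
    "tiny": 170 / 1024,  # 170 MiB expressed in GiB
    "amd": 25.6,
    "nvidia": 6.4,
}


def unpacked_total_gib(package: str, addons: tuple[str, ...] = ()) -> float:
    """Return the estimated unpacked size of a package plus optional add-ons."""
    if package not in ("base", "tiny"):
        raise ValueError(f"unknown package: {package}")
    for addon in addons:
        if addon not in ("amd", "nvidia"):
            raise ValueError(f"unknown add-on: {addon}")
    total = UNPACKED_GIB[package] + sum(UNPACKED_GIB[a] for a in addons)
    return round(total, 1)


if __name__ == "__main__":
    print(unpacked_total_gib("base", ("nvidia",)))  # Base + NVIDIA add-on
    print(unpacked_total_gib("tiny", ("nvidia",)))  # Tiny + NVIDIA add-on
```

The result is only an estimate of the installed packages themselves; downloaded models take additional space.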
In addition to the stable version in the Flathub repository, you can test the "Beta" version of the upcoming release. This version is usable, but may contain more bugs.
The Beta version is available in the "flathub-beta" repository. Follow these instructions to enable flathub-beta on your computer.
It is also possible to build and install the latest development (git) or latest stable (release) version from the repository using the provided PKGBUILD file (please note that the same remarks about building on Linux apply):

    git clone <git repository url>

    cd dsnote/arch/git # build latest git version
    # or
    cd dsnote/arch/release # build latest release version

    makepkg -si

It is also possible to build and install the latest development version from the repository using the provided SPEC file and the helper `make_rpm.sh` script:

    git clone <git repository url>

    cd dsnote/fedora

    # optionally install build dependencies
    dnf install rpmdevtools autoconf automake boost-devel cmake git kf5-kdbusaddons-devel libarchive-devel libxdo-devel libXinerama-devel libxkbcommon-x11-devel libXtst-devel libtool meson openblas-devel patchelf pybind11-devel python3-devel python3-pybind11 qt5-linguist qt5-qtmultimedia-devel qt5-qtquickcontrols2-devel qt5-qtx11extras-devel rubberband-devel taglib-devel vulkan-headers

    ./make_rpm.sh

Flatpak:

    git clone <git repository url>

    cd dsnote/flatpak

    # build a base package
    flatpak-builder --force-clean --user --install-deps-from=flathub --repo="<name or /path/to/local/flatpak/repo>" "/path/to/output/dir" net.mkiol.SpeechNote.yaml

    # build an optional NVIDIA add-on package
    flatpak-builder --force-clean --user --install-deps-from=flathub --repo="<name or /path/to/local/flatpak/repo>" "/path/to/output/dir" net.mkiol.SpeechNote.Addon.nvidia.yaml

Sailfish OS:

    git clone <git repository url>

    cd dsnote
    mkdir build
    cd build

    sfdk config --session specfile=../sfos/harbour-dsnote.spec
    sfdk config --session target=SailfishOS-4.4.0.58-aarch64
    sfdk cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_SFOS=ON -DWITH_PY=OFF
    sfdk package

Speech Note has many build-time and run-time dependencies. These include shared and static libraries, 3rd-party executables, and Python and Perl scripts. Because of this complexity, the recommended way to build is to use the Flatpak tool-chain (Flatpak manifest file and `flatpak-builder`). A direct build (i.e. without Flatpak) is also possible but more complicated.

    git clone <git repository url>

    cd dsnote
    mkdir build
    cd build

    cmake ../ -DCMAKE_BUILD_TYPE=Release -DWITH_DESKTOP=ON
    make

To build without support for Python components, add `-DWITH_PY=OFF` in the cmake step.
To see other build options, search for `option(BUILD_XXX)` in the `CMakeLists.txt` file.
All models available for download are specified in the configuration file (config/models.json). To enable a custom model that is compatible with the currently supported engines, simply edit this file and restart the application.
When you first run the application, the models configuration file is created in one of the following locations:
- `~/.local/share/net.mkiol/dsnote/models.json`
- `~/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote/models.json` (Flatpak)
- `~/.local/share/org.mkiol/dsnote/models.json` (Sailfish OS)
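
To find out which of these locations is in use on a given machine, the three paths can simply be checked in turn. The following is a minimal sketch; the function name `find_models_config` and the `home` parameter are illustrative assumptions, and only the paths themselves come from the list above.

```python
from pathlib import Path
from typing import Optional

# Candidate locations relative to the home directory, as listed above.
CANDIDATES = (
    ".local/share/net.mkiol/dsnote/models.json",  # regular desktop install
    ".var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote/models.json",  # Flatpak
    ".local/share/org.mkiol/dsnote/models.json",  # Sailfish OS
)


def find_models_config(home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing models.json, or None if none exists yet."""
    home = home if home is not None else Path.home()
    for relative in CANDIDATES:
        path = home / relative
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_models_config()
    print(found if found else "models.json not found; run Speech Note once first")
```

If both a regular and a Flatpak installation are present, this sketch returns the regular one first; adjust the order of `CANDIDATES` if that is not what you want.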
You can freely edit currently enabled models or add new ones.
Model definition looks like this:

    {
        "name": "<model name>",
        "model_id": "<model unique id>",
        "engine": "<engine type>",
        "lang_id": "<lang id>",
        "checksum": "<md5 checksum>",
        "checksum_quick": "<partial md5 checksum>",
        "comp": "<compression type>",
        "urls": [ <model URLs> ],
        "size": "<download size of all files>"
    }

Allowed engine types: `stt_ds`, `stt_vosk`, `stt_april`, `stt_whisper`, `stt_fasterwhisper`, `tts_piper`, `tts_rhvoice`, `tts_espeak`, `tts_coqui`, `tts_mimic3`, `mnt_bergamot`
Allowed compression types: `none`, `gz`, `xz`, `tarxz`, `targz`, `zip`, `zipall`, `dir`, `dirgz`
Allowed URL types: `http`, `https`, `file`
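
Before restarting the application, a hand-edited model definition can be checked against these allowed values. The following is a minimal sketch, not Speech Note's own validation logic: the function name `check_model_entry` is hypothetical, and the sets are copied from the lists above.

```python
from urllib.parse import urlparse

# Allowed values copied from the lists above.
ENGINES = {
    "stt_ds", "stt_vosk", "stt_april", "stt_whisper", "stt_fasterwhisper",
    "tts_piper", "tts_rhvoice", "tts_espeak", "tts_coqui", "tts_mimic3",
    "mnt_bergamot",
}
COMPRESSIONS = {"none", "gz", "xz", "tarxz", "targz", "zip", "zipall", "dir", "dirgz"}
URL_SCHEMES = {"http", "https", "file"}


def check_model_entry(entry: dict) -> list[str]:
    """Return a list of problems found in a model definition (empty if none)."""
    problems = []
    if entry.get("engine") not in ENGINES:
        problems.append(f"unknown engine: {entry.get('engine')!r}")
    if entry.get("comp") not in COMPRESSIONS:
        problems.append(f"unknown compression type: {entry.get('comp')!r}")
    for url in entry.get("urls", []):
        scheme = urlparse(url).scheme
        if scheme not in URL_SCHEMES:
            problems.append(f"unsupported URL type {scheme!r} in {url!r}")
    return problems


if __name__ == "__main__":
    entry = {
        "engine": "tts_piper",
        "comp": "dir",
        "urls": ["file:///home/me/models/new-model-medium.onnx"],
    }
    print(check_model_entry(entry) or "no problems found")
```

The check covers only the three fields with a fixed set of allowed values; it does not verify that the model files are actually compatible with the chosen engine.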
Checksums are calculated for all files after unpacking. If you are adding a new model, you can use the `--gen-checksums` command line option to find the right checksums. To do this, put empty strings in both `checksum` and `checksum_quick`, save the file and run Speech Note with the mentioned option.
For example:

    {
        "name": "New Piper Voice",
        "model_id": "en_piper_new",
        "engine": "tts_piper",
        "lang_id": "en",
        "checksum": "",
        "checksum_quick": "",
        "size": "",
        "comp": "dir",
        "urls": [
            "file:///home/me/models/new-model-medium.onnx",
            "file:///home/me/models/new-model-medium.onnx.json"
        ]
    }

Then run Speech Note with the option:

    flatpak run net.mkiol.SpeechNote --verbose --gen-checksums

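
An entry like the one in this example can also be generated rather than typed by hand, which avoids JSON syntax slips such as a missing comma. The following is a minimal sketch under stated assumptions: the helper names `file_url` and `local_model_entry` are hypothetical, the field values mirror the example, and the result still has to be pasted into `models.json` manually.

```python
import json
from urllib.parse import quote


def file_url(path: str) -> str:
    """Turn an absolute POSIX path into a file:// URL."""
    if not path.startswith("/"):
        raise ValueError(f"an absolute path is required: {path}")
    return "file://" + quote(path)


def local_model_entry(name, model_id, engine, lang_id, files):
    """Build a model definition for local files, with checksums left empty."""
    return {
        "name": name,
        "model_id": model_id,
        "engine": engine,
        "lang_id": lang_id,
        "checksum": "",        # left empty so --gen-checksums can be used
        "checksum_quick": "",  # left empty for the same reason
        "size": "",
        "comp": "dir",         # same compression type as in the example
        "urls": [file_url(f) for f in files],
    }


if __name__ == "__main__":
    entry = local_model_entry(
        "New Piper Voice", "en_piper_new", "tts_piper", "en",
        ["/home/me/models/new-model-medium.onnx",
         "/home/me/models/new-model-medium.onnx.json"],
    )
    print(json.dumps(entry, indent=4))
```

Using `json.dumps` guarantees the printed definition is valid JSON; characters that are not allowed in URLs, such as spaces in a path, are percent-encoded by `quote`.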
Any contribution is very welcome!
The project is hosted on both GitHub and GitLab. Feel free to make a PR/MR, report an issue or request a new feature on the platform you prefer.
Translation files in Qt format are in the `translations` directory.
The preferred way to contribute translations is via the Transifex service, but if you would like to make a direct PR/MR, please do so.
If you find Speech Note useful and would like to support this project, please consider doing one or two of the following:
- Give a ⭐ on GitHub and/or GitLab.
- Write a review in your applications manager app (Discover, Software or any other).
- Tell others about this app by mentioning it on social media.
- If you have spare money, make a small donation via ko-fi (one time) or Liberapay (recurring).
Speech Note relies on the following open source projects:
- Qt
- Coqui STT
- Coqui TTS
- Vosk
- whisper.cpp
- WebRTC VAD
- libarchive
- RNNoise-nu
- {fmt}
- Hugging Face Transformers
- Piper
- RHVoice
- ssplit-cpp
- espeak-ng
- bergamot-translator
- Rubber Band Library
- simdjson
- Nlohmann JSON
- uroman
- astrunc
- FFmpeg
- LAME
- Vorbis
- TagLib
- libnumbertext
- KDBusAddons
- QHotkey
- faster-whisper
- Mimic 3
- Unikud
- april-asr
- Opus
- html2md
- maddy
- WhisperSpeech
- libxdo
Speech Note 4.7 changes video (Speech Note 4.7)
Speech Note 4.6 changes video (Speech Note 4.6)
Speech Note 4.5 changes video (Speech Note 4.5)
Screenshots (Speech Note 4.5)
lwn.net (Speech Note 4.6)
Softpedia (Speech Note 4.6)
OSTechNix (Speech Note 4.6)
Best FREE Speech-to-Text For Linux Mint video (Speech Note 4.6)
Marco's Box (Speech Note 4.4, Italian)
Marco's Box video (Speech Note 4.4, Italian)
alternativalinux (Speech Note 4.4, Italian)
alternativalinux video (Speech Note 4.4, Italian)
ZDNET (Speech Note 4.2)
Translator feature video demo on Sailfish OS (Speech Note 4.0)
Translator feature video demo on PinePhone (Speech Note 4.0)
DebugPoint.com (Speech Note 4.0)
DebugPoint.com video (Speech Note 4.0)
OMG! Linux (Speech Note 4.0)
LinuxLinks (Speech Note 4.0)
The Linux Cast video (Speech Note 4.0)
CONNECTwww.com (Speech Note 4.0)
Speech Note is an open source project. Source code is released under the Mozilla Public License Version 2.0.
3rd party libraries:
- Coqui STT, released under the Mozilla Public License Version 2.0
- Coqui TTS, released under the Mozilla Public License Version 2.0
- Vosk API, released under the Apache License 2.0
- whisper.cpp, released under the MIT License
- WebRTC, released under this license
- libarchive, released under the BSD License
- RNNoise-nu, released under the BSD 3-Clause License
- {fmt}, released under this license
- Hugging Face Transformers, released under the Apache License 2.0
- Piper, released under the MIT License
- RHVoice, released under the GNU General Public License v2.0
- ssplit-cpp, released under the Apache License 2.0
- espeak-ng, released under the GNU General Public License v3.0
- bergamot-translator, released under the Mozilla Public License 2.0
- Rubber Band Library, released under the GNU General Public License (version 2 or later)
- simdjson, released under the Apache License 2.0
- Nlohmann JSON, released under the MIT License
- uroman, released under this license
- astrunc, released under the MIT License
- FFmpeg, released under the GNU Lesser General Public License version 2.1 or later
- LAME, released under the LGPL
- Vorbis, released under this license
- TagLib, released under the GNU Lesser General Public License (LGPL) and Mozilla Public License (MPL)
- libnumbertext, released under the BSD License
- KDBusAddons, released under the LGPL licenses
- QHotkey, released under the BSD-3-Clause License
- faster-whisper, released under the MIT License
- Mimic 3, released under the AGPL-3.0 license
- Unikud, released under the MIT License
- april-asr, released under the GNU General Public License v3.0
- libopus, released under this license
- html2md, released under the MIT License
- maddy, released under the MIT License
- WhisperSpeech, released under the MIT License
The files in the `nonbreaking_prefixes` directory were copied from the mosesdecoder project and are distributed under the GNU Lesser General Public License v2.1.