# ASRDeepSpeech x Sakura-ML

ASRDeepSpeech x Sakura-ML (English/Japanese) with the DeepSpeech2 model in PyTorch, with support from Zakuro AI.
Modules • Code structure • Installing the application • Makefile commands • Environments • Dataset • Running the application • Notes
This repository offers a clean-code version of the original repository from SeanNaren, with classes and modular components (e.g. trainers, models, loggers...).
I have added a configuration file to manage the parameters set in the model. You will also find a pretrained Japanese model achieving a CER of 34 on the JSUT test set.
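CER (character error rate) is the character-level edit distance between hypothesis and reference, divided by the reference length. As a minimal sketch of the metric (not the package's own implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate, in percent."""
    return 100.0 * levenshtein(ref, hyp) / len(ref)

# One substituted character over a 21-character reference -> 100/21 ≈ 4.76,
# which matches the "BEST" sample in the evaluation output further down.
print(cer("良ある人ならそんな風にに話しかけないだろう",
          "用ある人ならそんな風にに話しかけないだろう"))
```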
At a granular level, ASRDeepSpeech is a library that consists of the following components:
| Component | Description |
|---|---|
| asr_deepspeech | Speech Recognition package |
| asr_deepspeech.data | Data related module |
| asr_deepspeech.data.dataset | Build the dataset |
| asr_deepspeech.data.loaders | Load the dataset |
| asr_deepspeech.data.parsers | Parse the dataset |
| asr_deepspeech.data.samplers | Sample the dataset |
| asr_deepspeech.decoders | Decode the generated text |
| asr_deepspeech.loggers | Loggers |
| asr_deepspeech.modules | Components of the network |
| asr_deepspeech.parsers | Arguments parser |
| asr_deepspeech.tests | Test units |
| asr_deepspeech.trainers | Trainers |
```python
from setuptools import setup
from asr_deepspeech import __version__

setup(
    name="asr_deepspeech",
    version=__version__,
    short_description="ASRDeepspeech (English / Japanese)",
    long_description="".join(open("README.md", "r").readlines()),
    long_description_content_type="text/markdown",
    url="https://github.com/zakuro-ai/asr",
    license="MIT Licence",
    author="CADIC Jean-Maximilien",
    python_requires=">=3.8",
    packages=[
        "asr_deepspeech",
        "asr_deepspeech.audio",
        "asr_deepspeech.data",
        "asr_deepspeech.data.dataset",
        "asr_deepspeech.data.loaders",
        "asr_deepspeech.data.manifests",
        "asr_deepspeech.data.parsers",
        "asr_deepspeech.data.samplers",
        "asr_deepspeech.decoders",
        "asr_deepspeech.etl",
        "asr_deepspeech.loggers",
        "asr_deepspeech.models",
        "asr_deepspeech.modules",
        "asr_deepspeech.parsers",
        "asr_deepspeech.tests",
        "asr_deepspeech.trainers",
    ],
    include_package_data=True,
    package_data={"": ["*.yml"]},
    install_requires=[r.rsplit()[0] for r in open("requirements.txt")],
    author_email="git@zakuro.ai",
    description="ASRDeepspeech (English / Japanese)",
    platforms="linux_debian_10_x86_64",
    classifiers=[
        "Programming Language :: Python :: 3.8",
        "License :: OSI Approved :: MIT License",
    ],
)
```
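The `install_requires` line above reads `requirements.txt` and keeps only the first whitespace-delimited token of each line, discarding anything after it. A standalone illustration of that parsing, using an in-memory stand-in for the file:

```python
import io

# Stand-in for open("requirements.txt"); lines may carry trailing annotations.
requirements = io.StringIO(
    "torch==1.0.0\n"
    "numpy>=1.16  # numerical backend\n"
)

# Same expression as in setup.py: first whitespace-separated token per line.
install_requires = [r.rsplit()[0] for r in requirements]
print(install_requires)  # ['torch==1.0.0', 'numpy>=1.16']
```

Note that this expression would raise an `IndexError` on a blank line, so the real `requirements.txt` must not contain any.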
To clone and run this application, you'll need the following installed on your computer:
Install bpd:
```
# Clone this repository and install the code
git clone https://github.com/zakuro-ai/asr

# Go into the repository
cd asr
```
Exhaustive list of make commands:
```
pull            # Download the docker image
sandbox         # Launch the sandbox image
install_wheels  # Install the wheel
tests           # Test the code
```
We provide support for a local or Docker setup; however, we recommend Docker to avoid any difficulty running the code. If you decide to run the code locally, you will need Python 3.6 with CUDA >= 10.1. Several libraries need to be installed for training to work. I will assume that everything is installed in an Anaconda environment on Ubuntu, with PyTorch 1.0. Install PyTorch if you haven't already.
Note
Running this application by using Docker is recommended.
To build and run the docker image
```
make pull
make sandbox
```
Warning
Running this application by using PythonEnv is possible but not recommended.
```
make install_wheels
make tests
```
You should be able to get an output like
```
=1=TEST PASSED : asr_deepspeech
=1=TEST PASSED : asr_deepspeech.data
=1=TEST PASSED : asr_deepspeech.data.dataset
=1=TEST PASSED : asr_deepspeech.data.loaders
=1=TEST PASSED : asr_deepspeech.data.parsers
=1=TEST PASSED : asr_deepspeech.data.samplers
=1=TEST PASSED : asr_deepspeech.decoders
=1=TEST PASSED : asr_deepspeech.loggers
=1=TEST PASSED : asr_deepspeech.modules
=1=TEST PASSED : asr_deepspeech.parsers
=1=TEST PASSED : asr_deepspeech.test
=1=TEST PASSED : asr_deepspeech.trainers
```
By default we process the JSUT dataset. See the config section to learn how to process a custom dataset.
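The manifests referenced below (e.g. `train_clean.json`) pair audio files with their transcripts. The exact schema is defined by the package; purely as an illustration of the idea, a manifest for a custom dataset could be assembled like this (the field names here are assumptions, not the package's actual schema):

```python
import json

# Hypothetical manifest entries: path to an audio clip plus its transcript.
entries = [
    {"audio_filepath": "wav/0001.wav", "text": "こんにちは"},
    {"audio_filepath": "wav/0002.wav", "text": "ありがとう"},
]

# ensure_ascii=False keeps the Japanese text readable in the file.
with open("train_custom.json", "w", encoding="utf-8") as f:
    json.dump(entries, f, ensure_ascii=False, indent=2)
```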
```python
from gnutools.remote import gdrive
from asr_deepspeech import cfg

# This will download the JSUT dataset to /tmp
gdrive(cfg.gdrive_uri)
```
python -m asr_deepspeech.etl
To train on a single GPU
sakura -m asr_deepspeech.trainers
python -m asr_deepspeech
```
================ VARS ===================
manifest: clean
distributed: True
train_manifest: __data__/manifests/train_clean.json
val_manifest: __data__/manifests/val_clean.json
model_path: /data/ASRModels/deepspeech_jp_500_clean.pth
continue_from: None
output_file: /data/ASRModels/deepspeech_jp_500_clean.txt
main_proc: True
rank: 0
gpu_rank: 0
world_size: 2
==========================================
```
```
...
clean - 0:00:46 >> 2/1000 (1) | Loss 95.1626 | Lr 0.30e-3 | WER/CER 98.06/95.16 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:46<00:00, 2.59s/it]
clean - 0:00:47 >> 3/1000 (1) | Loss 96.3579 | Lr 0.29e-3 | WER/CER 97.55/97.55 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:47<00:00, 2.61s/it]
clean - 0:00:47 >> 4/1000 (1) | Loss 97.5705 | Lr 0.29e-3 | WER/CER 100.00/100.00 - (98.06/[95.16]): 100%|████████████████████| 18/18 [00:47<00:00, 2.66s/it]
clean - 0:00:48 >> 5/1000 (1) | Loss 97.8628 | Lr 0.29e-3 | WER/CER 98.74/98.74 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:50<00:00, 2.78s/it]
clean - 0:00:50 >> 6/1000 (5) | Loss 97.0118 | Lr 0.29e-3 | WER/CER 96.26/93.61 - (96.26/[93.61]): 100%|██████████████████████| 18/18 [00:49<00:00, 2.76s/it]
clean - 0:00:50 >> 7/1000 (5) | Loss 97.2341 | Lr 0.28e-3 | WER/CER 98.35/98.35 - (96.26/[93.61]): 17%|███▊ | 3/18 [00:10<00:55, 3.72s/it]
...
```
```
================= 100.00/34.49 =================
----- BEST -----
Ref: 良ある人ならそんな風にに話しかけないだろう
Hyp: 用ある人ならそんな風にに話しかけないだろう
WER: 100.0 - CER: 4.761904761904762
----- LAST -----
Ref: すみませんがオースチンさんは5日にはです
Hyp: すみませんがースンさんは一つかにはです
WER: 100.0 - CER: 25.0
----- WORST -----
Ref: 小切には内がみられる
Hyp: コには内先金地つ作みが見られる
WER: 100.0 - CER: 90.0
CER histogram
|###############################################################################
|███████████ 6 0-10
|███████████████████████████ 15 10-20
|███████████████████████████████████████████████████████████████████ 36 20-30
|█████████████████████████████████████████████████████████████████ 35 30-40
|██████████████████████████████████████████████████ 27 40-50
|█████████████████████████████ 16 50-60
|█████████ 5 60-70
|███████████ 6 70-80
| 0 80-90
|█ 1 90-100
=============================================
```
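The histogram above buckets per-utterance CER into 10-point bins and scales the bars to the largest bin. A text histogram of that shape can be reproduced in a few lines (a sketch, not the package's own logger):

```python
def cer_histogram(cers, width=60):
    """Render per-utterance CER values (0-100) as a text histogram with 10-point bins."""
    bins = [0] * 10
    for c in cers:
        bins[min(int(c // 10), 9)] += 1   # clamp 100.0 into the last bin
    peak = max(bins) or 1                 # avoid division by zero on empty input
    lines = []
    for i, n in enumerate(bins):
        bar = "█" * round(width * n / peak)
        lines.append(f"|{bar} {n} {i * 10}-{(i + 1) * 10}")
    return "\n".join(lines)

print(cer_histogram([4.8, 25.0, 25.0, 34.5, 90.0]))
```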
Thanks to Egor and Ryan for their contributions!
This is a fork from https://github.com/SeanNaren/deepspeech.pytorch. The code has been improved for readability only.
For any question please contact me at j.cadic[at]protonmail.ch