# ASRDeepSpeech x Sakura-ML

ASRDeepSpeech x Sakura-ML (English/Japanese) with the DeepSpeech2 model in PyTorch, with support from Zakuro AI.
Modules • Code structure • Installing the application • Makefile commands • Environments • Dataset • Running the application • Notes
This repository offers a clean-code version of the original repository from SeanNaren, with classes and modular components (e.g. trainers, models, loggers...).
I have added a configuration file to manage the parameters set in the model. You will also find a pretrained Japanese model achieving a CER of 34 on the JSUT test set.
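CER (character error rate) is the character-level edit distance between hypothesis and reference, divided by the reference length. As a minimal sketch of the metric (not the package's own implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate, in percent."""
    return 100.0 * levenshtein(ref, hyp) / len(ref)

# One substituted character over a 21-character reference -> 100/21 ≈ 4.76,
# which matches the "BEST" sample in the evaluation output further down.
print(cer("良ある人ならそんな風にに話しかけないだろう",
          "用ある人ならそんな風にに話しかけないだろう"))
```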
At a granular level, ASRDeepSpeech is a library that consists of the following components:
| Component | Description |
|---|---|
| asr_deepspeech | Speech Recognition package |
| asr_deepspeech.data | Data related module |
| asr_deepspeech.data.dataset | Build the dataset |
| asr_deepspeech.data.loaders | Load the dataset |
| asr_deepspeech.data.parsers | Parse the dataset |
| asr_deepspeech.data.samplers | Sample the dataset |
| asr_deepspeech.decoders | Decode the generated text |
| asr_deepspeech.loggers | Loggers |
| asr_deepspeech.modules | Components of the network |
| asr_deepspeech.parsers | Arguments parser |
| asr_deepspeech.tests | Test units |
| asr_deepspeech.trainers | Trainers |
```python
from setuptools import setup
from asr_deepspeech import __version__

setup(
    name="asr_deepspeech",
    version=__version__,
    short_description="ASRDeepspeech (English / Japanese)",
    long_description="".join(open("README.md", "r").readlines()),
    long_description_content_type="text/markdown",
    url="https://github.com/zakuro-ai/asr",
    license="MIT Licence",
    author="CADIC Jean-Maximilien",
    python_requires=">=3.8",
    packages=[
        "asr_deepspeech",
        "asr_deepspeech.audio",
        "asr_deepspeech.data",
        "asr_deepspeech.data.dataset",
        "asr_deepspeech.data.loaders",
        "asr_deepspeech.data.manifests",
        "asr_deepspeech.data.parsers",
        "asr_deepspeech.data.samplers",
        "asr_deepspeech.decoders",
        "asr_deepspeech.etl",
        "asr_deepspeech.loggers",
        "asr_deepspeech.models",
        "asr_deepspeech.modules",
        "asr_deepspeech.parsers",
        "asr_deepspeech.tests",
        "asr_deepspeech.trainers",
    ],
    include_package_data=True,
    package_data={"": ["*.yml"]},
    install_requires=[r.rsplit()[0] for r in open("requirements.txt")],
    author_email="git@zakuro.ai",
    description="ASRDeepspeech (English / Japanese)",
    platforms="linux_debian_10_x86_64",
    classifiers=[
        "Programming Language :: Python :: 3.8",
        "License :: OSI Approved :: MIT License",
    ],
)
```
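The `install_requires` line above reads `requirements.txt` and keeps only the first whitespace-delimited token of each line, discarding anything after it. A standalone illustration of that parsing, using an in-memory stand-in for the file:

```python
import io

# Stand-in for open("requirements.txt"); lines may carry trailing annotations.
requirements = io.StringIO(
    "torch==1.0.0\n"
    "numpy>=1.16  # numerical backend\n"
)

# Same expression as in setup.py: first whitespace-separated token per line.
install_requires = [r.rsplit()[0] for r in requirements]
print(install_requires)  # ['torch==1.0.0', 'numpy>=1.16']
```

Note that this expression would raise an `IndexError` on a blank line, so the real `requirements.txt` must not contain any.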
To clone and run this application, you'll need the following installed on your computer:
Install bpd:
```
# Clone this repository and install the code
git clone https://github.com/zakuro-ai/asr

# Go into the repository
cd asr
```
Exhaustive list of make commands:
```
pull            # Download the docker image
sandbox         # Launch the sandbox image
install_wheels  # Install the wheel
tests           # Test the code
```
We provide support for a local or Docker setup; however, we recommend Docker to avoid any difficulty running the code. If you decide to run the code locally, you will need Python 3.6 with CUDA >= 10.1. Several libraries need to be installed for training to work. I will assume that everything is installed in an Anaconda environment on Ubuntu, with PyTorch 1.0. Install PyTorch if you haven't already.
Note
Running this application by using Docker is recommended.
To build and run the docker image
```
make pull
make sandbox
```
Warning
Running this application by using PythonEnv is possible but not recommended.
```
make install_wheels
make tests
```
You should be able to get an output like
```
=1=TEST PASSED : asr_deepspeech
=1=TEST PASSED : asr_deepspeech.data
=1=TEST PASSED : asr_deepspeech.data.dataset
=1=TEST PASSED : asr_deepspeech.data.loaders
=1=TEST PASSED : asr_deepspeech.data.parsers
=1=TEST PASSED : asr_deepspeech.data.samplers
=1=TEST PASSED : asr_deepspeech.decoders
=1=TEST PASSED : asr_deepspeech.loggers
=1=TEST PASSED : asr_deepspeech.modules
=1=TEST PASSED : asr_deepspeech.parsers
=1=TEST PASSED : asr_deepspeech.test
=1=TEST PASSED : asr_deepspeech.trainers
```
By default we process the JSUT dataset. See the config section to learn how to process a custom dataset.
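The manifests referenced below (e.g. `train_clean.json`) pair audio files with their transcripts. The exact schema is defined by the package; purely as an illustration of the idea, a manifest for a custom dataset could be assembled like this (the field names here are assumptions, not the package's actual schema):

```python
import json

# Hypothetical manifest entries: path to an audio clip plus its transcript.
entries = [
    {"audio_filepath": "wav/0001.wav", "text": "こんにちは"},
    {"audio_filepath": "wav/0002.wav", "text": "ありがとう"},
]

# ensure_ascii=False keeps the Japanese text readable in the file.
with open("train_custom.json", "w", encoding="utf-8") as f:
    json.dump(entries, f, ensure_ascii=False, indent=2)
```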
```python
from gnutools.remote import gdrive
from asr_deepspeech import cfg

# This will download the JSUT dataset to /tmp
gdrive(cfg.gdrive_uri)
```
python -m asr_deepspeech.etl
To train on a single GPU
sakura -m asr_deepspeech.trainers
python -m asr_deepspeech
```
================ VARS ===================
manifest: clean
distributed: True
train_manifest: __data__/manifests/train_clean.json
val_manifest: __data__/manifests/val_clean.json
model_path: /data/ASRModels/deepspeech_jp_500_clean.pth
continue_from: None
output_file: /data/ASRModels/deepspeech_jp_500_clean.txt
main_proc: True
rank: 0
gpu_rank: 0
world_size: 2
==========================================
```
```
...
clean - 0:00:46 >> 2/1000 (1) | Loss 95.1626 | Lr 0.30e-3 | WER/CER 98.06/95.16 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:46<00:00, 2.59s/it]
clean - 0:00:47 >> 3/1000 (1) | Loss 96.3579 | Lr 0.29e-3 | WER/CER 97.55/97.55 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:47<00:00, 2.61s/it]
clean - 0:00:47 >> 4/1000 (1) | Loss 97.5705 | Lr 0.29e-3 | WER/CER 100.00/100.00 - (98.06/[95.16]): 100%|████████████████████| 18/18 [00:47<00:00, 2.66s/it]
clean - 0:00:48 >> 5/1000 (1) | Loss 97.8628 | Lr 0.29e-3 | WER/CER 98.74/98.74 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:50<00:00, 2.78s/it]
clean - 0:00:50 >> 6/1000 (5) | Loss 97.0118 | Lr 0.29e-3 | WER/CER 96.26/93.61 - (96.26/[93.61]): 100%|██████████████████████| 18/18 [00:49<00:00, 2.76s/it]
clean - 0:00:50 >> 7/1000 (5) | Loss 97.2341 | Lr 0.28e-3 | WER/CER 98.35/98.35 - (96.26/[93.61]): 17%|███▊ | 3/18 [00:10<00:55, 3.72s/it]
...
```
```
================= 100.00/34.49 =================
----- BEST -----
Ref: 良ある人ならそんな風にに話しかけないだろう
Hyp: 用ある人ならそんな風にに話しかけないだろう
WER: 100.0 - CER: 4.761904761904762
----- LAST -----
Ref: すみませんがオースチンさんは5日にはです
Hyp: すみませんがースンさんは一つかにはです
WER: 100.0 - CER: 25.0
----- WORST -----
Ref: 小切には内がみられる
Hyp: コには内先金地つ作みが見られる
WER: 100.0 - CER: 90.0
CER histogram
|###############################################################################
|███████████ 6 0-10
|███████████████████████████ 15 10-20
|███████████████████████████████████████████████████████████████████ 36 20-30
|█████████████████████████████████████████████████████████████████ 35 30-40
|██████████████████████████████████████████████████ 27 40-50
|█████████████████████████████ 16 50-60
|█████████ 5 60-70
|███████████ 6 70-80
| 0 80-90
|█ 1 90-100
=============================================
```
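The histogram above buckets per-utterance CER into 10-point bins and scales the bars to the largest bin. A text histogram of that shape can be reproduced in a few lines (a sketch, not the package's own logger):

```python
def cer_histogram(cers, width=60):
    """Render per-utterance CER values (0-100) as a text histogram with 10-point bins."""
    bins = [0] * 10
    for c in cers:
        bins[min(int(c // 10), 9)] += 1   # clamp 100.0 into the last bin
    peak = max(bins) or 1                 # avoid division by zero on empty input
    lines = []
    for i, n in enumerate(bins):
        bar = "█" * round(width * n / peak)
        lines.append(f"|{bar} {n} {i * 10}-{(i + 1) * 10}")
    return "\n".join(lines)

print(cer_histogram([4.8, 25.0, 25.0, 34.5, 90.0]))
```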
Thanks to Egor and Ryan for their contributions!
This is a fork from https://github.com/SeanNaren/deepspeech.pytorch. The code has been improved for readability only.
For any question please contact me at j.cadic[at]protonmail.ch