ASRDeepspeech x Sakura-ML (English/Japanese)

Modules | Code structure | Installing the application | Makefile commands | Environments | Datasets | Running the application | Notes

This repository offers a clean-code version of the original repository from SeanNaren, with classes and modular components (e.g. trainers, models, loggers...).

I have added a configuration file to manage the parameters set in the model. You will also find a pretrained Japanese model achieving a CER of 34 on the JSUT test set.
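For reference, CER (character error rate) is the character-level Levenshtein (edit) distance between the hypothesis and the reference transcript, expressed as a percentage of the reference length. A minimal self-contained sketch (illustrative only, not part of this package) that reproduces the per-sentence CER values shown in the Notes section:

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate, in percent of the reference length."""
    return 100.0 * levenshtein(ref, hyp) / max(len(ref), 1)

# Example from the BEST sample in the Notes section: one substituted
# character out of 21 gives CER ~= 4.76.
print(cer("良ある人ならそんな風にに話しかけないだろう",
          "用ある人ならそんな風にに話しかけないだろう"))
```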

Modules

At a granular level, ASRDeepSpeech is a library that consists of the following components:

| Component | Description |
| --- | --- |
| `asr_deepspeech` | Speech Recognition package |
| `asr_deepspeech.data` | Data related module |
| `asr_deepspeech.data.dataset` | Build the dataset |
| `asr_deepspeech.data.loaders` | Load the dataset |
| `asr_deepspeech.data.parsers` | Parse the dataset |
| `asr_deepspeech.data.samplers` | Sample the dataset |
| `asr_deepspeech.decoders` | Decode the generated text |
| `asr_deepspeech.loggers` | Loggers |
| `asr_deepspeech.modules` | Components of the network |
| `asr_deepspeech.parsers` | Arguments parser |
| `asr_deepspeech.tests` | Test units |
| `asr_deepspeech.trainers` | Trainers |
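To check that these components are present in an installed copy, a small illustrative snippet (standard library only, not part of the package) can enumerate the submodules; the names printed should match the table above:

```python
import pkgutil

import asr_deepspeech

# Walk the installed package and list every submodule it contains.
for module in pkgutil.walk_packages(asr_deepspeech.__path__,
                                    prefix="asr_deepspeech."):
    print(module.name)
```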

Code structure

```python
from setuptools import setup

from asr_deepspeech import __version__

setup(
    name="asr_deepspeech",
    version=__version__,
    short_description="ASRDeepspeech (English / Japanese)",
    long_description="".join(open("README.md", "r").readlines()),
    long_description_content_type="text/markdown",
    url="https://github.com/zakuro-ai/asr",
    license="MIT Licence",
    author="CADIC Jean-Maximilien",
    python_requires=">=3.8",
    packages=[
        "asr_deepspeech",
        "asr_deepspeech.audio",
        "asr_deepspeech.data",
        "asr_deepspeech.data.dataset",
        "asr_deepspeech.data.loaders",
        "asr_deepspeech.data.manifests",
        "asr_deepspeech.data.parsers",
        "asr_deepspeech.data.samplers",
        "asr_deepspeech.decoders",
        "asr_deepspeech.etl",
        "asr_deepspeech.loggers",
        "asr_deepspeech.models",
        "asr_deepspeech.modules",
        "asr_deepspeech.parsers",
        "asr_deepspeech.tests",
        "asr_deepspeech.trainers",
    ],
    include_package_data=True,
    package_data={"": ["*.yml"]},
    install_requires=[r.rsplit()[0] for r in open("requirements.txt")],
    author_email="git@zakuro.ai",
    description="ASRDeepspeech (English / Japanese)",
    platforms="linux_debian_10_x86_64",
    classifiers=[
        "Programming Language :: Python :: 3.8",
        "License :: OSI Approved :: MIT License",
    ],
)
```
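Given this setup.py at the repository root, the package can also be installed directly with a standard `pip install .` (or `pip install -e .` for development), independently of the Makefile targets described below.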

Installing the application

To clone and run this application, you'll need git installed on your computer, plus either Docker or a local Python environment (see the Environments section below).

Clone the repository:

```sh
# Clone this repository and install the code
git clone https://github.com/zakuro-ai/asr

# Go into the repository
cd asr
```

Makefile commands

Exhaustive list of make commands:

```sh
pull                # Download the docker image
sandbox             # Launch the sandbox image
install_wheels      # Install the wheel
tests               # Test the code
```

Environments

We provide support for a local or Docker setup. However, we recommend using Docker to avoid any difficulty running the code. If you decide to run the code locally, you will need Python >= 3.8 (as declared in setup.py) with CUDA >= 10.1. Several libraries need to be installed for training to work. I will assume that everything is installed in an Anaconda installation on Ubuntu, with PyTorch 1.0. Install PyTorch if you haven't already.
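If you go the local route, a quick way to confirm that PyTorch sees your CUDA toolchain (standard torch calls only):

```python
import torch

# Verify the local environment before training.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```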

Docker

Note

Running this application by using Docker is recommended.

To build and run the docker image

```sh
make pull
make sandbox
```

PythonEnv

Warning

Running this application by using PythonEnv is possible but not recommended.

```sh
make install_wheels
```

Test

```sh
make tests
```

You should be able to get an output like

```
=1=TEST PASSED : asr_deepspeech
=1=TEST PASSED : asr_deepspeech.data
=1=TEST PASSED : asr_deepspeech.data.dataset
=1=TEST PASSED : asr_deepspeech.data.loaders
=1=TEST PASSED : asr_deepspeech.data.parsers
=1=TEST PASSED : asr_deepspeech.data.samplers
=1=TEST PASSED : asr_deepspeech.decoders
=1=TEST PASSED : asr_deepspeech.loggers
=1=TEST PASSED : asr_deepspeech.modules
=1=TEST PASSED : asr_deepspeech.parsers
=1=TEST PASSED : asr_deepspeech.test
=1=TEST PASSED : asr_deepspeech.trainers
```

Datasets

By default we process the JSUT dataset. See the config section to learn how to process a custom dataset.

```python
from gnutools.remote import gdrive

from asr_deepspeech import cfg

# This will download the JSUT dataset in your /tmp
gdrive(cfg.gdrive_uri)
```

ETL

```sh
python -m asr_deepspeech.etl
```
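Once the ETL step has produced the manifests (the trainer configuration shown in the Notes section points at `__data__/manifests/*.json`), a small, hypothetical check that they are well-formed; this assumes each manifest is either a JSON list or a JSON-lines file, which you may need to adjust to the actual layout:

```python
import json
from pathlib import Path

# Hypothetical helper: count entries in a manifest produced by the ETL step.
# Assumes the manifest is a JSON list/object, or one JSON object per line.
def manifest_size(path):
    text = Path(path).read_text(encoding="utf-8").strip()
    try:
        return len(json.loads(text))
    except json.JSONDecodeError:
        return sum(1 for line in text.splitlines() if line.strip())

print(manifest_size("__data__/manifests/train_clean.json"))
```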

Running the application

Training a Model

To train on a single GPU:

```sh
sakura -m asr_deepspeech.trainers
```
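Note that the trainer is launched through `sakura` (Sakura-ML) rather than plain `python -m`; judging by the verbose output in the Notes section (`distributed: True`, `world_size: 2`), it takes care of spawning the distributed workers.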

Pretrained model

```sh
python -m asr_deepspeech
```

Notes

- Clean verbose during training

  ```
  ================ VARS ===================
  manifest: clean
  distributed: True
  train_manifest: __data__/manifests/train_clean.json
  val_manifest: __data__/manifests/val_clean.json
  model_path: /data/ASRModels/deepspeech_jp_500_clean.pth
  continue_from: None
  output_file: /data/ASRModels/deepspeech_jp_500_clean.txt
  main_proc: True
  rank: 0
  gpu_rank: 0
  world_size: 2
  ==========================================
  ```

- Progress bar

  ```
  ...
  clean - 0:00:46 >> 2/1000 (1) | Loss 95.1626 | Lr 0.30e-3 | WER/CER 98.06/95.16 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:46<00:00,  2.59s/it]
  clean - 0:00:47 >> 3/1000 (1) | Loss 96.3579 | Lr 0.29e-3 | WER/CER 97.55/97.55 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:47<00:00,  2.61s/it]
  clean - 0:00:47 >> 4/1000 (1) | Loss 97.5705 | Lr 0.29e-3 | WER/CER 100.00/100.00 - (98.06/[95.16]): 100%|████████████████████| 18/18 [00:47<00:00,  2.66s/it]
  clean - 0:00:48 >> 5/1000 (1) | Loss 97.8628 | Lr 0.29e-3 | WER/CER 98.74/98.74 - (98.06/[95.16]): 100%|██████████████████████| 18/18 [00:50<00:00,  2.78s/it]
  clean - 0:00:50 >> 6/1000 (5) | Loss 97.0118 | Lr 0.29e-3 | WER/CER 96.26/93.61 - (96.26/[93.61]): 100%|██████████████████████| 18/18 [00:49<00:00,  2.76s/it]
  clean - 0:00:50 >> 7/1000 (5) | Loss 97.2341 | Lr 0.28e-3 | WER/CER 98.35/98.35 - (96.26/[93.61]):  17%|███▊                   | 3/18 [00:10<00:55,  3.72s/it]
  ...
  ```

- Separated text file to check WER/CER, with a histogram of CER values (best/last/worst result)

  ```
  ================= 100.00/34.49 =================
  ----- BEST -----
  Ref:良ある人ならそんな風にに話しかけないだろう
  Hyp:用ある人ならそんな風にに話しかけないだろう
  WER:100.0  - CER:4.761904761904762
  ----- LAST -----
  Ref:すみませんがオースチンさんは5日にはです
  Hyp:すみませんがースンさんは一つかにはです
  WER:100.0  - CER:25.0
  ----- WORST -----
  Ref:小切には内がみられる
  Hyp:コには内先金地つ作みが見られる
  WER:100.0  - CER:90.0
  CER histogram
  |###############################################################################
  |███████████                                                           6  0-10
  |███████████████████████████                                          15  10-20
  |███████████████████████████████████████████████████████████████████  36  20-30
  |█████████████████████████████████████████████████████████████████    35  30-40
  |██████████████████████████████████████████████████                   27  40-50
  |█████████████████████████████                                        16  50-60
  |█████████                                                             5  60-70
  |███████████                                                           6  70-80
  |                                                                      0  80-90
  |█                                                                     1  90-100
  =============================================
  ```

Acknowledgements

Thanks to Egor and Ryan for their contributions!

This is a fork of https://github.com/SeanNaren/deepspeech.pytorch. The code has been improved for readability only.

For any question please contact me at j.cadic[at]protonmail.ch
