pannous/tensorflow-speech-recognition

🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks

Folders and files

.github
.idea
extra
images
layer
tensorpeers @ f571827
.gitignore
.gitmodules
LICENSE
README.md
WarpCTC.txt
__init__.py
bdlstm_utils.py
densenet_layer.py
generate_speech_data.py
lstm-tflearn.py
lstm_ctc_to_chars.py
lstm_mfcc_ctc_to_words.py
lstm_mfcc_to_chars.py
lstm_to_chars.py
mfcc_feature_classifier.py
number_classifier_tflearn.py
number_gan_layer.py
number_gan_tflearn.py
record-autoencoder.py
record.py
requirements.txt
speaker_classifier_tflearn.py
spectro_gan.py
speech2text-seq2seq.py
speech2text-tflearn.py
speech_data.py
speech_encoder.py
spoken_numbers_pcm.tar
spoken_numbers_spectros_64x64.tar
subtitle-downloader.py
subtitle_srt_parser.py
wave_GANerate.py
word_to_phonemes.swift

Tensorflow Speech Recognition

Speech recognition using Google's TensorFlow deep learning framework and sequence-to-sequence neural networks.

Replaces caffe-speech-recognition; see there for some background.

Update 2024: Use Whisper!

This (relatively) old project is NO LONGER UP TO DATE.
The TensorFlow 1.0 API it uses is no longer compatible with current releases, and the theory is no longer state of the art either.
We highly recommend you check out and use Whisper.

Update 2020: Mozilla released DeepSpeech

They achieve good error rates. Free Speech is in good hands, go there if you are an end user. For now this project is only maintained for educational purposes.

Ultimate goal

Create a decent standalone speech recognizer for Linux etc. Some people say we have the models but not enough training data. We disagree: there is plenty of training data (100GB here and 21GB here on openslr.org, synthetic text-to-speech snippets, movies with transcripts, Gutenberg, YouTube with captions, etc.); we just need a simple yet powerful model. It's only a question of time...

Sample spectrogram: Karen uttering 'zero' at 160 words per minute.
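For readers new to spectrograms, here is a minimal sketch of how one can be computed from raw audio with a short-time Fourier transform. It is not taken from this repository's code: the function name `spectrogram`, the frame length, the hop size, and the synthetic 440 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram, shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration; not the repository's own code.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad very short clips so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice overlapping frames and taper each one to reduce spectral leakage.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range, as is common for speech features.
    return np.log1p(magnitude)

# Synthetic stand-in for a recording: one second of a 440 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index -> frequency in Hz
print(spec.shape, peak_hz)
```

Each row is one time frame and each column one frequency bin, so an image of `spec` (transposed) gives the familiar time-versus-frequency picture.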

Installation

clone code

git clone https://github.com/pannous/tensorflow-speech-recognition
cd tensorflow-speech-recognition
git clone https://github.com/pannous/layer.git
git clone https://github.com/pannous/tensorpeers.git

pyaudio

Requires PortAudio from http://www.portaudio.com/

git clone https://git.assembla.com/portaudio.git
cd portaudio
./configure --prefix=/path/to/your/local
make
make install
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/local/lib
export LIBRARY_PATH=$LIBRARY_PATH:/path/to/your/local/lib
export CPATH=$CPATH:/path/to/your/local/include
source ~/.bashrc

install pyaudio

pip install pyaudio
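To check that the installation worked, something like the following sketch can list the available input devices. The helper name `list_input_devices` is hypothetical; it relies only on PyAudio's `PyAudio()`, `get_device_count()`, `get_device_info_by_index()` and `terminate()` calls, and returns `None` instead of failing when PyAudio or PortAudio is missing.

```python
def list_input_devices():
    """Return names of audio input devices, or None if PyAudio is unusable.

    Hypothetical helper for a quick installation check.
    """
    try:
        import pyaudio
    except ImportError:
        # PyAudio (or the PortAudio library it links against) is not installed.
        return None
    try:
        audio = pyaudio.PyAudio()
    except OSError:
        # PortAudio could not be initialised on this machine.
        return None
    try:
        names = []
        for i in range(audio.get_device_count()):
            info = audio.get_device_info_by_index(i)
            # Only devices with at least one input channel can record.
            if info.get("maxInputChannels", 0) > 0:
                names.append(info["name"])
        return names
    finally:
        audio.terminate()

devices = list_input_devices()
if devices is None:
    print("PyAudio is not usable; check the PortAudio installation.")
else:
    print("Input devices:", devices)
```

An empty list means PyAudio loads but no microphone was found.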

Getting started

Toy examples:
./number_classifier_tflearn.py
./speaker_classifier_tflearn.py

Some less trivial architectures:
./densenet_layer.py

Later:
./train.sh
./record.py

Update: Nervana demonstrated that it is possible for 'independents' to build speech recognizers that are state of the art.

Fun tasks for newcomers

Watch video: https://www.youtube.com/watch?v=u9FPqkuoEJ8
Understand and correct the corresponding code: lstm-tflearn.py
Data augmentation: create on-the-fly modulation of the data: increase the speech frequency, add background noise, alter the pitch, etc.
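As a starting point for the data augmentation task, here is a minimal sketch in plain NumPy. The function names (`add_noise`, `change_speed`, `augment`) and the parameter ranges are assumptions for illustration, not part of this repository. Note that simple resampling changes tempo and pitch together; shifting pitch independently would need a different technique, such as a phase vocoder.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Mix in white Gaussian noise at the given signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def change_speed(signal, factor):
    """Resample by linear interpolation.

    factor > 1 shortens the clip, raising tempo and pitch together.
    """
    n_out = int(round(len(signal) / factor))
    old_positions = np.arange(len(signal))
    new_positions = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_positions, old_positions, signal)

def augment(signal, rng):
    """Apply a random speed change followed by random background noise."""
    factor = rng.uniform(0.9, 1.1)   # assumed range: +/- 10 % speed
    snr_db = rng.uniform(10, 30)     # assumed range: 10 to 30 dB SNR
    return add_noise(change_speed(signal, factor), snr_db, rng)

# Synthetic stand-in for a recording: one second of a 300 Hz tone at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000
clip = np.sin(2 * np.pi * 300 * t)
augmented = augment(clip, rng)
print(len(clip), len(augmented))
```

Calling `augment` inside the batch generator, rather than once up front, gives the model a different variant of each clip on every pass.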

Extensions

Extensions to current tensorflow which are probably needed:

WarpCTC on the GPU, see issue
Incremental collaborative snapshots ('P2P learning')!
Modular graphs/models + persistence
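For context on the CTC item above: a CTC-trained network emits one label distribution per audio frame, including a special blank label, and the transcript is read off by collapsing repeated labels and dropping blanks. The following is a minimal sketch of that greedy decoding step in NumPy. The function name `ctc_greedy_decode`, the toy alphabet, and the choice of index 0 for the blank are assumptions for illustration; it is not WarpCTC and not this repository's code.

```python
import numpy as np

def ctc_greedy_decode(logits, alphabet, blank=0):
    """Greedy (best-path) CTC decoding.

    logits: array of shape (n_frames, n_labels).
    alphabet: string or list mapping label index -> character.
    blank: index of the CTC blank label (assumed to be 0 here).
    """
    best = np.argmax(logits, axis=1)  # most likely label per frame
    decoded = []
    previous = -1                     # sentinel: no previous label yet
    for index in best:
        # Emit a label only when it differs from the previous frame's
        # label (collapse repeats) and is not the blank.
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)

# Toy example: '-' stands for the blank; the frame-wise best path is
# a a - a b b -, which collapses to "aab".
alphabet = "-ab"
path = [1, 1, 0, 1, 2, 2, 0]
logits = np.eye(len(alphabet))[path]  # one-hot scores per frame
print(ctc_greedy_decode(logits, alphabet))
```

The blank between the two runs of `a` is what lets the decoder output a doubled letter instead of merging them.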

Even though this project is far from finished we hope it gives you some starting points.

Looking for a tensorflow collaboration / consultant / deep learning contractor? Reach out to info@pannous.com
