loochanin/pythoncode-tutorials (Public)

forked from x4nth055/pythoncode-tutorials

Commit 6963f1a

add speech recognition tutorial with transformers

1 parent 1facc7f, commit 6963f1a

File tree

8 files changed, +2611 -0 lines changed

README.md
machine-learning/nlp/speech-recognition-transformers

`README.md`

Lines changed: 1 addition & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -46,6 +46,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy`
`46`	`46`	`- [Conversational AI Chatbot with Transformers in Python](https://www.thepythoncode.com/article/conversational-ai-chatbot-with-huggingface-transformers-in-python). ([code](machine-learning/nlp/chatbot-transformers))`
`47`	`47`	`- [How to Pretrain BERT using Transformers in Python](https://www.thepythoncode.com/article/pretraining-bert-huggingface-transformers-in-python). ([code](machine-learning/nlp/pretraining-bert))`
`48`	`48`	`- [How to Perform Machine Translation using Transformers in Python](https://www.thepythoncode.com/article/machine-translation-using-huggingface-transformers-in-python). ([code](machine-learning/nlp/machine-translation))`
	`49`	`+- [Speech Recognition using Transformers in Python](https://www.thepythoncode.com/article/speech-recognition-using-huggingface-transformers-in-python). ([code](machine-learning/nlp/speech-recognition-transformers))`
`49`	`50`	`- ### [Computer Vision](https://www.thepythoncode.com/topic/computer-vision)`
`50`	`51`	`- [How to Detect Human Faces in Python using OpenCV](https://www.thepythoncode.com/article/detect-faces-opencv-python). ([code](machine-learning/face_detection))`
`51`	`52`	`- [How to Make an Image Classifier in Python using TensorFlow and Keras](https://www.thepythoncode.com/article/image-classification-keras-python). ([code](machine-learning/image-classifier))`

`machine-learning/nlp/speech-recognition-transformers/16-122828-0002.wav`

90.7 KB

Binary file not shown.

`machine-learning/nlp/speech-recognition-transformers/30-4447-0004.wav`

535 KB

Binary file not shown.

`machine-learning/nlp/speech-recognition-transformers/7601-291468-0006.wav`

1.07 MB

Binary file not shown.

`machine-learning/nlp/speech-recognition-transformers/AutomaticSpeechRecognition_PythonCodeTutorial.ipynb`

Lines changed: 2457 additions & 0 deletions

Large diffs are not rendered by default.

`machine-learning/nlp/speech-recognition-transformers/AutomaticSpeechRecognition_PythonCodeTutorial.py`

Lines changed: 143 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,143 @@`
	`1`	`+# %%`
	`2`	`+# !pip install transformers==4.11.2 datasets soundfile sentencepiece torchaudio pyaudio`
	`3`	`+`
	`4`	`+# %%`
	`5`	`+from transformers import *`
	`6`	`+import torch`
	`7`	`+import soundfile as sf`
	`8`	`+# import librosa`
	`9`	`+import os`
	`10`	`+import torchaudio`
	`11`	`+`
	`12`	`+# %%`
	`13`	`+# model_name = "facebook/wav2vec2-base-960h" # 360MB`
	`14`	`+model_name = "facebook/wav2vec2-large-960h-lv60-self" # 1.18GB`
	`15`	`+`
	`16`	`+processor = Wav2Vec2Processor.from_pretrained(model_name)`
	`17`	`+model = Wav2Vec2ForCTC.from_pretrained(model_name)`
	`18`	`+`
	`19`	`+# %%`
	`20`	`+# audio_url = "http://www.fit.vutbr.cz/~motlicek/sympatex/f2bjrop1.0.wav"`
	`21`	`+# audio_url = "http://www.fit.vutbr.cz/~motlicek/sympatex/f2bjrop1.1.wav"`
	`22`	`+# audio_url = "http://www.fit.vutbr.cz/~motlicek/sympatex/f2btrop6.0.wav"`
	`23`	`+# audio_url = "https://github.com/x4nth055/pythoncode-tutorials/raw/master/machine-learning/speech-recognition/16-122828-0002.wav"`
	`24`	`+audio_url = "https://github.com/x4nth055/pythoncode-tutorials/raw/master/machine-learning/speech-recognition/30-4447-0004.wav"`
	`25`	`+# audio_url = "https://github.com/x4nth055/pythoncode-tutorials/raw/master/machine-learning/speech-recognition/7601-291468-0006.wav"`
	`26`	`+# audio_url = "https://file-examples-com.github.io/uploads/2017/11/file_example_WAV_1MG.wav"`
	`27`	`+# audio_url = "http://www0.cs.ucl.ac.uk/teaching/GZ05/samples/lathe.wav"`
	`28`	`+`
	`29`	`+# %%`
	`30`	`+# load our wav file`
	`31`	`+speech, sr = torchaudio.load(audio_url)`
	`32`	`+speech = speech.squeeze()`
	`33`	`+# or using librosa`
	`34`	`+# speech, sr = librosa.load(audio_file, sr=16000)`
	`35`	`+sr, speech.shape`
	`36`	`+`
	`37`	`+# %%`
	`38`	`+# resample from whatever the audio sampling rate to 16000`
	`39`	`+resampler = torchaudio.transforms.Resample(sr, 16000)`
	`40`	`+speech = resampler(speech)`
	`41`	`+speech.shape`
	`42`	`+`
	`43`	`+# %%`
	`44`	`+# tokenize our wav`
	`45`	`+input_values = processor(speech, return_tensors="pt", sampling_rate=16000)["input_values"]`
	`46`	`+input_values.shape`
	`47`	`+`
	`48`	`+# %%`
	`49`	`+# perform inference`
	`50`	`+logits = model(input_values)["logits"]`
	`51`	`+logits.shape`
	`52`	`+`
	`53`	`+# %%`
	`54`	`+# use argmax to get the predicted IDs`
	`55`	`+predicted_ids = torch.argmax(logits, dim=-1)`
	`56`	`+predicted_ids.shape`
	`57`	`+`
	`58`	`+# %%`
	`59`	`+# decode the IDs to text`
	`60`	`+transcription = processor.decode(predicted_ids[0])`
	`61`	`+transcription.lower()`
	`62`	`+`
	`63`	`+# %%`
	`64`	`+def get_transcription(audio_path):`
	`65`	`+    # load our wav file`
	`66`	`+    speech, sr = torchaudio.load(audio_path)`
	`67`	`+    speech = speech.squeeze()`
	`68`	`+    # or using librosa`
	`69`	`+    # speech, sr = librosa.load(audio_file, sr=16000)`
	`70`	`+    # resample from whatever the audio sampling rate to 16000`
	`71`	`+    resampler = torchaudio.transforms.Resample(sr, 16000)`
	`72`	`+    speech = resampler(speech)`
	`73`	`+    # tokenize our wav`
	`74`	`+    input_values = processor(speech, return_tensors="pt", sampling_rate=16000)["input_values"]`
	`75`	`+    # perform inference`
	`76`	`+    logits = model(input_values)["logits"]`
	`77`	`+    # use argmax to get the predicted IDs`
	`78`	`+    predicted_ids = torch.argmax(logits, dim=-1)`
	`79`	`+    # decode the IDs to text`
	`80`	`+    transcription = processor.decode(predicted_ids[0])`
	`81`	`+    return transcription.lower()`
	`82`	`+`
	`83`	`+# %%`
	`84`	`+get_transcription(audio_url)`
	`85`	`+`
	`86`	`+# %%`
	`87`	`+import pyaudio`
	`88`	`+import wave`
	`89`	`+`
	`90`	`+# the file name output you want to record into`
	`91`	`+filename = "recorded.wav"`
	`92`	`+# set the chunk size of 1024 samples`
	`93`	`+chunk = 1024`
	`94`	`+# sample format`
	`95`	`+FORMAT = pyaudio.paInt16`
	`96`	`+# mono, change to 2 if you want stereo`
	`97`	`+channels = 1`
	`98`	`+# 16000 samples per second`
	`99`	`+sample_rate = 16000`
	`100`	`+record_seconds = 10`
	`101`	`+# initialize PyAudio object`
	`102`	`+p = pyaudio.PyAudio()`
	`103`	`+# open stream object as input & output`
	`104`	`+stream = p.open(format=FORMAT,`
	`105`	`+                channels=channels,`
	`106`	`+                rate=sample_rate,`
	`107`	`+                input=True,`
	`108`	`+                output=True,`
	`109`	`+                frames_per_buffer=chunk)`
	`110`	`+frames = []`
	`111`	`+print("Recording...")`
	`112`	`+for i in range(int(sample_rate / chunk * record_seconds)):`
	`113`	`+    data = stream.read(chunk)`
	`114`	`+    # if you want to hear your voice while recording`
	`115`	`+    # stream.write(data)`
	`116`	`+    frames.append(data)`
	`117`	`+print("Finished recording.")`
	`118`	`+# stop and close stream`
	`119`	`+stream.stop_stream()`
	`120`	`+stream.close()`
	`121`	`+# terminate pyaudio object`
	`122`	`+p.terminate()`
	`123`	`+# save audio file`
	`124`	`+# open the file in 'write bytes' mode`
	`125`	`+wf = wave.open(filename, "wb")`
	`126`	`+# set the channels`
	`127`	`+wf.setnchannels(channels)`
	`128`	`+# set the sample format`
	`129`	`+wf.setsampwidth(p.get_sample_size(FORMAT))`
	`130`	`+# set the sample rate`
	`131`	`+wf.setframerate(sample_rate)`
	`132`	`+# write the frames as bytes`
	`133`	`+wf.writeframes(b"".join(frames))`
	`134`	`+# close the file`
	`135`	`+wf.close()`
	`136`	`+`
	`137`	`+# %%`
	`138`	`+get_transcription("recorded.wav")`
	`139`	`+`
	`140`	`+# %%`
	`141`	`+`
	`142`	`+`
	`143`	`+`
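
The `torch.argmax` and `processor.decode` steps in the script above amount to greedy CTC decoding: take the highest-scoring token for each frame, merge consecutive repeats, then drop the blank (padding) token. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The toy vocabulary `VOCAB`, the `BLANK_ID` value, and the helper names are hypothetical and chosen only for illustration; the real vocabulary comes from the pretrained processor.

```python
from itertools import groupby

# Hypothetical toy vocabulary; index 0 plays the role of the CTC blank token.
VOCAB = ["<pad>", "|", "a", "c", "t"]
BLANK_ID = 0
WORD_DELIMITER = "|"  # assumed to stand for a space between words


def argmax(scores):
    """Return the index of the largest score in one frame."""
    return max(range(len(scores)), key=scores.__getitem__)


def greedy_ctc_decode(logits):
    """Decode per-frame scores: argmax, collapse repeats, drop blanks."""
    frame_ids = [argmax(frame) for frame in logits]
    # Merge runs of identical IDs into a single ID.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Remove blanks only after merging, so "a <pad> a" stays "aa".
    chars = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


def one_hot(index, size=len(VOCAB)):
    """Build a fake score frame that strongly prefers `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Frames spell "c c <pad> a a t | <pad> a": repeats merge, blanks vanish.
frames = [one_hot(i) for i in [3, 3, 0, 2, 2, 4, 1, 0, 2]]
print(greedy_ctc_decode(frames))  # prints "cat a"
```

Dropping blanks after merging repeats is what lets the model emit a doubled letter: a blank frame between two identical tokens keeps them from being collapsed into one.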

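The recording loop in the script reads `int(sample_rate / chunk * record_seconds)` buffers, so the captured audio comes out slightly shorter than `record_seconds` whenever the total sample count is not a multiple of `chunk`. A small sketch of that arithmetic with the script's values follows; the helper name `recording_stats` is hypothetical, and the 2-byte sample width assumes the 16-bit `paInt16` format used above.

```python
def recording_stats(sample_rate, chunk, record_seconds, sample_width=2, channels=1):
    """Return (buffers read, actual duration in seconds, raw audio size in bytes)."""
    # Same expression as the loop bound in the script; int() truncates.
    n_buffers = int(sample_rate / chunk * record_seconds)
    n_frames = n_buffers * chunk
    duration = n_frames / sample_rate
    n_bytes = n_frames * sample_width * channels
    return n_buffers, duration, n_bytes


# Values from the script: 16 kHz, 1024-sample buffers, 10 seconds, 16-bit mono.
n_buffers, duration, n_bytes = recording_stats(16000, 1024, 10)
print(n_buffers, duration, n_bytes)  # prints "156 9.984 319488"
```

With these settings the loop runs 156 times and stores about 9.98 seconds of audio, roughly 312 KB of raw samples before the WAV header.
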
`machine-learning/nlp/speech-recognition-transformers/README.md`

Lines changed: 5 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,5 @@`
	`1`	`+# [Speech Recognition using Transformers in Python](https://www.thepythoncode.com/article/speech-recognition-using-huggingface-transformers-in-python)`
	`2`	`+To get it running:`
	`3`	``+- `pip3 install -r requirements.txt` ``
	`4`	`+`
	`5`	`+Check [the tutorial](https://www.thepythoncode.com/article/speech-recognition-using-huggingface-transformers-in-python) and the [Colab notebook](https://colab.research.google.com/drive/1-0M8zvQrOzlZ8U8l7KdPOuLBNtzqtlsz?usp=sharing) for more information.`

`machine-learning/nlp/speech-recognition-transformers/requirements.txt`

Lines changed: 5 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,5 @@`
	`1`	`+transformers==4.11.2`
	`2`	`+soundfile`
	`3`	`+sentencepiece`
	`4`	`+torchaudio`
	`5`	`+pyaudio`
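
Before running the script, it can help to confirm that the packages listed in `requirements.txt` are actually installed and that the pinned `transformers` version matches. The sketch below uses the standard-library `importlib.metadata` module for this. The helper names `parse_requirement` and `check_requirements` are hypothetical, and the parser only handles the plain `name` and `name==version` forms that appear in this file.

```python
from importlib import metadata

# Entries copied from requirements.txt in this commit.
REQUIREMENTS = [
    "transformers==4.11.2",
    "soundfile",
    "sentencepiece",
    "torchaudio",
    "pyaudio",
]


def parse_requirement(line):
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, _, version = line.strip().partition("==")
    return name, (version or None)


def check_requirements(lines):
    """Map each package name to 'ok', 'missing', or 'wrong version (found X)'."""
    report = {}
    for line in lines:
        name, wanted = parse_requirement(line)
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if wanted is not None and found != wanted:
            report[name] = f"wrong version (found {found})"
        else:
            report[name] = "ok"
    return report


for package, status in check_requirements(REQUIREMENTS).items():
    print(f"{package}: {status}")
```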

0 commit comments
