Commit 6963f1a

committed
add speech recognition tutorial with transformers
1 parent 1facc7f commit 6963f1a


8 files changed (+2611, -0 lines)


README.md

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
 - [Conversational AI Chatbot with Transformers in Python](https://www.thepythoncode.com/article/conversational-ai-chatbot-with-huggingface-transformers-in-python). ([code](machine-learning/nlp/chatbot-transformers))
 - [How to Pretrain BERT using Transformers in Python](https://www.thepythoncode.com/article/pretraining-bert-huggingface-transformers-in-python). ([code](machine-learning/nlp/pretraining-bert))
 - [How to Perform Machine Translation using Transformers in Python](https://www.thepythoncode.com/article/machine-translation-using-huggingface-transformers-in-python). ([code](machine-learning/nlp/machine-translation))
+- [Speech Recognition using Transformers in Python](https://www.thepythoncode.com/article/speech-recognition-using-huggingface-transformers-in-python). ([code](machine-learning/nlp/speech-recognition-transformers))
 - ### [Computer Vision](https://www.thepythoncode.com/topic/computer-vision)
 - [How to Detect Human Faces in Python using OpenCV](https://www.thepythoncode.com/article/detect-faces-opencv-python). ([code](machine-learning/face_detection))
 - [How to Make an Image Classifier in Python using TensorFlow and Keras](https://www.thepythoncode.com/article/image-classification-keras-python). ([code](machine-learning/image-classifier))
Binary file not shown.
Binary file not shown.
Binary file not shown.

machine-learning/nlp/speech-recognition-transformers/AutomaticSpeechRecognition_PythonCodeTutorial.ipynb

Lines changed: 2457 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
# %%
# !pip install transformers==4.11.2 datasets soundfile sentencepiece torchaudio pyaudio

# %%
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import soundfile as sf
# import librosa
import os
import torchaudio

# %%
# model_name = "facebook/wav2vec2-base-960h" # 360MB
model_name = "facebook/wav2vec2-large-960h-lv60-self" # 1.18GB

processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)

# %%
# audio_url = "http://www.fit.vutbr.cz/~motlicek/sympatex/f2bjrop1.0.wav"
# audio_url = "http://www.fit.vutbr.cz/~motlicek/sympatex/f2bjrop1.1.wav"
# audio_url = "http://www.fit.vutbr.cz/~motlicek/sympatex/f2btrop6.0.wav"
# audio_url = "https://github.com/x4nth055/pythoncode-tutorials/raw/master/machine-learning/speech-recognition/16-122828-0002.wav"
audio_url = "https://github.com/x4nth055/pythoncode-tutorials/raw/master/machine-learning/speech-recognition/30-4447-0004.wav"
# audio_url = "https://github.com/x4nth055/pythoncode-tutorials/raw/master/machine-learning/speech-recognition/7601-291468-0006.wav"
# audio_url = "https://file-examples-com.github.io/uploads/2017/11/file_example_WAV_1MG.wav"
# audio_url = "http://www0.cs.ucl.ac.uk/teaching/GZ05/samples/lathe.wav"

# %%
# load our wav file
speech, sr = torchaudio.load(audio_url)
speech = speech.squeeze()
# or using librosa
# speech, sr = librosa.load(audio_file, sr=16000)
sr, speech.shape

# %%
# resample from the audio's original sampling rate to 16000 Hz
resampler = torchaudio.transforms.Resample(sr, 16000)
speech = resampler(speech)
speech.shape

# %%
# preprocess the raw waveform into the model's input values
input_values = processor(speech, return_tensors="pt", sampling_rate=16000)["input_values"]
input_values.shape

# %%
# perform inference (no gradients are needed, which saves memory)
with torch.no_grad():
    logits = model(input_values)["logits"]
logits.shape

# %%
# use argmax to get the predicted IDs
predicted_ids = torch.argmax(logits, dim=-1)
predicted_ids.shape

# %%
# decode the IDs to text
transcription = processor.decode(predicted_ids[0])
transcription.lower()

# %%
def get_transcription(audio_path):
    # load our wav file
    speech, sr = torchaudio.load(audio_path)
    speech = speech.squeeze()
    # or using librosa
    # speech, sr = librosa.load(audio_path, sr=16000)
    # resample from the audio's original sampling rate to 16000 Hz
    resampler = torchaudio.transforms.Resample(sr, 16000)
    speech = resampler(speech)
    # preprocess the raw waveform into the model's input values
    input_values = processor(speech, return_tensors="pt", sampling_rate=16000)["input_values"]
    # perform inference (no gradients are needed)
    with torch.no_grad():
        logits = model(input_values)["logits"]
    # use argmax to get the predicted IDs
    predicted_ids = torch.argmax(logits, dim=-1)
    # decode the IDs to text
    transcription = processor.decode(predicted_ids[0])
    return transcription.lower()

# %%
get_transcription(audio_url)

# %%
import pyaudio
import wave

# the file name output you want to record into
filename = "recorded.wav"
# set the chunk size of 1024 samples
chunk = 1024
# sample format
FORMAT = pyaudio.paInt16
# mono; get_transcription() assumes a single channel
channels = 1
# 16000 samples per second, the rate Wav2Vec2 expects
sample_rate = 16000
record_seconds = 10
# initialize PyAudio object
p = pyaudio.PyAudio()
# open stream object as input & output
stream = p.open(format=FORMAT,
                channels=channels,
                rate=sample_rate,
                input=True,
                output=True,
                frames_per_buffer=chunk)
frames = []
print("Recording...")
for i in range(int(sample_rate / chunk * record_seconds)):
    data = stream.read(chunk)
    # if you want to hear your voice while recording
    # stream.write(data)
    frames.append(data)
print("Finished recording.")
# stop and close stream
stream.stop_stream()
stream.close()
# terminate pyaudio object
p.terminate()
# save audio file
# open the file in 'write bytes' mode
wf = wave.open(filename, "wb")
# set the channels
wf.setnchannels(channels)
# set the sample format
wf.setsampwidth(p.get_sample_size(FORMAT))
# set the sample rate
wf.setframerate(sample_rate)
# write the frames as bytes
wf.writeframes(b"".join(frames))
# close the file
wf.close()

# %%
get_transcription("recorded.wav")

# %%
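The loading cell calls `speech.squeeze()`, which only flattens the tensor when the file has a single channel: `torchaudio.load` returns a `(channels, frames)` tensor, so a stereo file stays 2-D. The sketch below, written with NumPy rather than torchaudio so it runs without the model, shows one way to downmix to mono and change the sampling rate. It uses plain linear interpolation as a simplified stand-in; `torchaudio.transforms.Resample` uses band-limited (windowed-sinc) interpolation, which avoids the aliasing that linear interpolation can introduce when downsampling. The function name `to_mono_resampled` is hypothetical.

```python
import numpy as np


def to_mono_resampled(waveform: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Downmix (channels, samples) audio to mono, then resample it.

    Simplified sketch: linear interpolation, not the windowed-sinc
    resampling that torchaudio.transforms.Resample performs.
    """
    x = np.asarray(waveform, dtype=np.float64)
    # average the channels; squeeze() alone would leave stereo audio 2-D
    if x.ndim == 2:
        x = x.mean(axis=0)
    # number of output samples that covers the same duration
    n_out = int(round(len(x) * target_sr / orig_sr))
    # timestamps (in seconds) of the input and output samples
    t_in = np.arange(len(x)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    # interpolate the waveform at the new timestamps
    return np.interp(t_out, t_in, x)


stereo = np.zeros((2, 44100))  # one second of silent stereo audio at 44.1 kHz
print(to_mono_resampled(stereo, 44100).shape)  # (16000,)
```

In the script itself, the torch equivalent of the downmix step would be `speech = speech.mean(dim=0)` before resampling.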
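The preprocessing cell that calls `processor(speech, return_tensors="pt", sampling_rate=16000)` does not produce text tokens. For Wav2Vec2, the processor's feature extractor turns the raw waveform into `input_values` and, assuming normalization is enabled for the checkpoint (the `do_normalize` setting of its feature extractor), rescales each utterance to zero mean and unit variance. A minimal NumPy sketch of that normalization follows; the function name and the `1e-7` epsilon are our assumptions, not a copy of the library code.

```python
import numpy as np


def zero_mean_unit_var(x: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Normalize one utterance to zero mean and (almost exactly) unit variance."""
    x = np.asarray(x, dtype=np.float64)
    # eps keeps the division safe for silent (constant) audio
    return (x - x.mean()) / np.sqrt(x.var() + eps)


rng = np.random.default_rng(0)
wave_like = rng.normal(loc=3.0, scale=5.0, size=16000)  # one second of fake audio at 16 kHz
normed = zero_mean_unit_var(wave_like)
print(round(float(normed.mean()), 6), round(float(normed.std()), 6))
```

Normalizing per utterance makes the model's input independent of recording volume and DC offset, which matters when mixing sources such as the downloaded samples and the microphone recording.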
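`Wav2Vec2ForCTC` emits one score vector per audio frame (roughly one every 20 ms at 16 kHz), so the `torch.argmax` cell yields one token ID per frame, not per character. Turning that frame sequence into text is greedy CTC decoding: merge consecutive repeats, drop the blank token, and map the word delimiter to a space. We believe `processor.decode` does essentially this for Wav2Vec2 (its vocabulary uses `<pad>` as the CTC blank and `|` as the word delimiter), but the function below is an illustrative pure-Python sketch, not the library's implementation; `ctc_greedy_decode` and the toy vocabulary are hypothetical.

```python
from itertools import groupby
from typing import Dict, List


def ctc_greedy_decode(predicted_ids: List[int], id_to_token: Dict[int, str],
                      blank_id: int = 0, word_delimiter: str = "|") -> str:
    """Greedy CTC decoding of per-frame argmax IDs into a string."""
    # 1) merge consecutive duplicates: [4, 4, 0, 4] -> [4, 0, 4]
    collapsed = [token_id for token_id, _ in groupby(predicted_ids)]
    # 2) drop the CTC blank token
    tokens = [id_to_token[t] for t in collapsed if t != blank_id]
    # 3) turn word delimiters into spaces and normalize whitespace
    text = "".join(" " if tok == word_delimiter else tok for tok in tokens)
    return " ".join(text.split())


# toy vocabulary (hypothetical; the real one comes from processor.tokenizer)
vocab = {0: "<pad>", 1: "|", 2: "H", 3: "E", 4: "L", 5: "O"}
frames = [2, 2, 0, 3, 4, 4, 0, 4, 5, 1, 1, 0]
print(ctc_greedy_decode(frames, vocab))  # HELLO
```

The blank between the two `L` runs is what lets CTC spell a genuine double letter; without it, the repeated `4` frames would collapse into a single `L`.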
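The recording cell reads `int(sample_rate / chunk * record_seconds)` chunks, which for 16000 Hz, 1024-sample chunks and 10 seconds is 156 reads, or about 9.98 s of audio. The sketch below rounds up instead and repeats the `wave` calls used to save the file, with a synthetic 440 Hz tone standing in for `stream.read(chunk)` so it runs without a microphone or PyAudio. The helper names `chunks_needed`, `fake_chunk` and `write_wav`, and the output file `tone.wav`, are hypothetical; the 2-byte sample width corresponds to `paInt16`.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit PCM (paInt16)
CHUNK = 1024


def chunks_needed(seconds: float, sample_rate: int = SAMPLE_RATE, chunk: int = CHUNK) -> int:
    # round up so the recording is never shorter than requested
    return math.ceil(sample_rate * seconds / chunk)


def fake_chunk(start: int, chunk: int = CHUNK, freq: float = 440.0) -> bytes:
    # a sine-wave chunk packed as little-endian int16, standing in for stream.read(chunk)
    samples = (int(0.3 * 32767 * math.sin(2 * math.pi * freq * (start + i) / SAMPLE_RATE))
               for i in range(chunk))
    return struct.pack(f"<{chunk}h", *samples)


def write_wav(filename: str, frames: list) -> None:
    # same header settings as the recording cell: channels, sample width, rate
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(b"".join(frames))


frames = [fake_chunk(i * CHUNK) for i in range(chunks_needed(10))]
write_wav("tone.wav", frames)
with wave.open("tone.wav", "rb") as wf:
    info = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
print(info)  # (1, 2, 16000, 160768)
```

Rounding up records 157 chunks (about 10.05 s); the few extra milliseconds are harmless for transcription.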
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# [Speech Recognition using Transformers in Python](https://www.thepythoncode.com/article/speech-recognition-using-huggingface-transformers-in-python)
To get it running:
- `pip3 install -r requirements.txt`

Check [the tutorial](https://www.thepythoncode.com/article/speech-recognition-using-huggingface-transformers-in-python) and the [Colab notebook](https://colab.research.google.com/drive/1-0M8zvQrOzlZ8U8l7KdPOuLBNtzqtlsz?usp=sharing) for more information.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
transformers==4.11.2
soundfile
sentencepiece
torchaudio
pyaudio

0 commit comments


