Speech recognition means converting spoken words into text. It is used in various artificial intelligence applications such as home automation and speech-to-text tools. In this article, you’ll learn how to do basic speech recognition in Python using the Google Speech Recognition API.
Step 1: Install Required Library
We’ll use the SpeechRecognition library in Python. To install it, run the command below in a notebook cell. In a terminal or command prompt, drop the leading exclamation mark:
!pip install SpeechRecognition
Step 2: Upload your Audio File
The code below runs in Google Colab and asks you to upload a file from your computer. Use a clear .wav file for best results. You can download a sample audio file from here.
from google.colab import files

uploaded = files.upload()
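Since a clear .wav file gives the best results, it can help to look at the file before sending it for recognition. The following is a minimal sketch that uses Python's standard `wave` module; the helper name `describe_wav` and the keys of the returned dictionary are assumptions made for this illustration, and the `wave` module only reads uncompressed PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file using the standard library."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            # Duration is the number of frames divided by frames per second.
            "duration_seconds": frames / rate,
        }
```

In the notebook, calling `describe_wav(list(uploaded.keys())[0])` after the upload step would report the channel count, sample rate, and length of the uploaded file. If the call raises `wave.Error`, the file is probably not a plain PCM WAV file.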
Step 3: Convert Audio to Text
Once the .wav file is uploaded, the code below reads it and sends the audio to the Google Speech Recognition API:
import speech_recognition as sr

recognizer = sr.Recognizer()

# Use the name of the first file uploaded in Step 2
filename = list(uploaded.keys())[0]

# Read the whole audio file into memory
with sr.AudioFile(filename) as source:
    print("Reading audio...")
    audio_data = recognizer.record(source)

# Send the audio to the Google Speech Recognition API
try:
    print("\nRecognized Text:")
    text = recognizer.recognize_google(audio_data)
    print(text)
except sr.UnknownValueError:
    print("Sorry, could not understand the audio.")
except sr.RequestError:
    print("Could not connect to Google API.")
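The same steps can be wrapped in a reusable function. This is only a sketch: the function name `transcribe_wav` and its `max_seconds` parameter are assumptions made for this illustration, while `language` and `duration` are existing keyword arguments of `recognize_google` and `record` in the SpeechRecognition library.

```python
def transcribe_wav(path, language="en-US", max_seconds=None):
    """Return the recognized text for a WAV file, or None if recognition fails."""
    # Imported here so the library is only needed when the helper is called.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # duration=None reads the whole file; a number limits it to that many seconds.
        audio_data = recognizer.record(source, duration=max_seconds)

    try:
        return recognizer.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        print("Sorry, could not understand the audio.")
    except sr.RequestError as error:
        print(f"Could not reach the Google API: {error}")
    return None
```

Returning `None` on failure lets the calling code decide what to do next, for example asking for another file. A call such as `transcribe_wav(filename, language="en-GB")` would request British English recognition.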
Output:
Speech to text
Speech recognition in Python is very easy with the help of the Google Speech API. You can use your voice to control programs, take notes or even build voice assistants.
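As a rough illustration of using recognized text to control a program, the sketch below matches trigger phrases against a transcript. The function name `match_command`, the phrases, and the action names are all hypothetical and not part of any library.

```python
def match_command(text, commands):
    """Return the action whose trigger phrase appears in the text, else None."""
    normalized = text.lower()
    for phrase, action in commands.items():
        if phrase in normalized:
            return action
    return None


# Hypothetical trigger phrases and action names, for illustration only.
COMMANDS = {
    "lights on": "turn_lights_on",
    "lights off": "turn_lights_off",
    "take a note": "start_note",
}

print(match_command("Please turn the lights on", COMMANDS))
```

A real voice assistant would need more robust matching, but simple substring checks like this are often enough for a first experiment.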
You can download the source code from here.