QuickStart. Google Cloud Speech-to-Text API with Python
korniichuk/google-speech
You must know the sample rate of your audio files, for example 8000 Hz or 16000 Hz.
In Ubuntu, right-click your audio file and select Properties --> Audio --> Sample rate. See image below:
Example: the audio file in the image above has a sample rate of 8000 Hz.
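The sample rate can also be read programmatically. Below is a minimal sketch that assumes the audio is an uncompressed WAV file; it uses only Python's standard `wave` module and will not work for MP3 or AMR files. The function name `wav_sample_rate` is a hypothetical helper, not part of this repository.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in the header of a WAV file.

    Assumption: `path` points to an uncompressed WAV file. Other formats
    (MP3, AMR, etc.) are not supported by the standard `wave` module.
    """
    with wave.open(path, 'rb') as wav_file:
        return wav_file.getframerate()
```

The returned value could then be used as `sample_rate_hertz` in the examples that follow, for instance `sample_rate_hertz = wav_sample_rate('example.wav')`, where `example.wav` is a placeholder file name.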
See the `local.py` file or the code below:
```python
import io

from google.cloud import speech_v1p1beta1 as speech

speech_file = 'example.mp3'
# Encoding: https://cloud.google.com/
# speech-to-text/docs/reference/rest/v1beta1/RecognitionConfig
encoding = speech.enums.RecognitionConfig.AudioEncoding.AMR
sample_rate_hertz = 8000
# Language: https://cloud.google.com/
# speech-to-text/docs/languages
language_code = 'en-US'

client = speech.SpeechClient()

with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()

audio = speech.types.RecognitionAudio(content=content)

config = speech.types.RecognitionConfig(
    encoding=encoding,
    sample_rate_hertz=sample_rate_hertz,
    language_code=language_code,
    # Enhanced models are only available to projects that
    # opt in for audio data collection.
    use_enhanced=True,
    # A model must be specified to use enhanced model.
    model='phone_call',
    profanity_filter=False,
    enable_automatic_punctuation=True,
    enable_word_confidence=True)

response = client.recognize(config, audio)

for i, result in enumerate(response.results):
    alternative = result.alternatives[0]
    print('-' * 20)
    print('First alternative of result {}'.format(i))
    print('Transcript: {}'.format(alternative.transcript))
```
See the `storage.py` file or the code below:
```python
from google.cloud import speech_v1p1beta1 as speech

uri = 'gs://examplebucket/example.mp3'
# Encoding: https://cloud.google.com/
# speech-to-text/docs/reference/rest/v1beta1/RecognitionConfig
encoding = 'AMR'
sample_rate_hertz = 8000
# Language: https://cloud.google.com/
# speech-to-text/docs/languages
language_code = 'en-US'

client = speech.SpeechClient()

operation = client.long_running_recognize(
    audio=speech.types.RecognitionAudio(uri=uri),
    config=speech.types.RecognitionConfig(
        encoding=encoding,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
        use_enhanced=True,
        model='phone_call',
        profanity_filter=False,
        enable_automatic_punctuation=True,
        enable_word_confidence=True))

op_result = operation.result()
for result in op_result.results:
    for alternative in result.alternatives:
        print('=' * 20)
        print(alternative.transcript)
        print(alternative.confidence)
```
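Instead of printing every alternative, the results can be collected into a list for further processing. The sketch below is an illustration only: `best_transcripts` is a hypothetical helper that is not part of this repository or of the Google client library. It assumes only that each result has an `alternatives` sequence whose items expose `transcript` and `confidence`, as the loops above do.

```python
def best_transcripts(results):
    """Return a list of (transcript, confidence) tuples, one per result.

    For each result, the alternative with the highest reported confidence
    is kept. Results without any alternatives are skipped.
    """
    best = []
    for result in results:
        alternatives = list(result.alternatives)
        if not alternatives:
            continue
        top = max(alternatives, key=lambda alt: alt.confidence)
        best.append((top.transcript, top.confidence))
    return best
```

With the code above, it might be called as `best_transcripts(op_result.results)` or `best_transcripts(response.results)`.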