Google Speech-to-Text Audio Transcripts
The SpeechToTextLoader transcribes audio files with the Google Cloud Speech-to-Text API and loads the transcribed text into documents.
To use it, you need the google-cloud-speech Python package installed and a Google Cloud project with the Speech-to-Text API enabled.
Installation & setup
First, you need to install the google-cloud-speech Python package.
You can find more info about it on the Speech-to-Text client libraries page.
Follow the quickstart guide in the Google Cloud documentation to create a project and enable the API.
%pip install --upgrade --quiet langchain-google-community[speech]
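After installation, you can optionally confirm that the client library imports correctly; a quick sanity check, not required by the loader:

from google.cloud.speech_v2 import SpeechClient  # raises ImportError if google-cloud-speech is missing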
Example
The SpeechToTextLoader must include the project_id and file_path arguments. Audio files can be specified as a Google Cloud Storage URI (gs://...) or a local file path.
Only synchronous requests are supported by the loader, which has a limit of 60 seconds or 10 MB per audio file.
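If you pass a local file, you may want to confirm up front that it stays under that limit. A minimal, optional sketch using only the standard library (local_path is a hypothetical file):

import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MB limit for synchronous recognition, as noted above
local_path = "./audio.wav"  # hypothetical local audio file
if os.path.getsize(local_path) > MAX_BYTES:
    raise ValueError(f"{local_path} exceeds the 10 MB synchronous-request limit")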
from langchain_google_community import SpeechToTextLoader

project_id = "<PROJECT_ID>"
file_path = "gs://cloud-samples-data/speech/audio.flac"
# or a local file path: file_path = "./audio.wav"

loader = SpeechToTextLoader(project_id=project_id, file_path=file_path)

docs = loader.load()
Note: Calling loader.load() blocks until the transcription is finished.
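If the blocking call is a problem (for example in a service that needs to keep handling other work), one option is to run it in a worker thread. A sketch using the standard library, not part of the loader's API:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(loader.load)  # run the blocking load() in a worker thread
    # ... do other work here ...
    docs = future.result()  # waits for the transcription to finish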
The transcribed text is available in the page_content:
docs[0].page_content
"How old is the Brooklyn Bridge?"
The metadata contains the full JSON response with more meta information:
docs[0].metadata
{
'language_code': 'en-US',
'result_end_offset': datetime.timedelta(seconds=1)
}
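Individual fields can be read like any other dictionary entry, using the keys shown above:

docs[0].metadata["language_code"]
'en-US'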
Recognition Config
You can specify the config argument to use different speech recognition models and enable specific features.
Refer to the Speech-to-Text recognizers documentation and the RecognizeRequest API reference for information on how to set a custom configuration.
If you don't specify a config, the following options will be selected automatically:
- Model: Chirp Universal Speech Model
- Language: en-US
- Audio Encoding: Automatically Detected
- Automatic Punctuation: Enabled
from google.cloud.speech_v2 import (
    AutoDetectDecodingConfig,
    RecognitionConfig,
    RecognitionFeatures,
)

from langchain_google_community import SpeechToTextLoader

project_id = "<PROJECT_ID>"
location = "global"
recognizer_id = "<RECOGNIZER_ID>"
file_path = "./audio.wav"

config = RecognitionConfig(
    auto_decoding_config=AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="long",
    features=RecognitionFeatures(
        enable_automatic_punctuation=False,
        profanity_filter=True,
        enable_spoken_punctuation=True,
        enable_spoken_emojis=True,
    ),
)

loader = SpeechToTextLoader(
    project_id=project_id,
    location=location,
    recognizer_id=recognizer_id,
    file_path=file_path,
    config=config,
)
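Loading then works exactly as in the basic example, assuming the audio file and the referenced recognizer exist in your project:

docs = loader.load()
print(docs[0].page_content)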
Related
- Document loader conceptual guide
- Document loader how-to guides