Google Speech-to-Text Audio Transcripts
The SpeechToTextLoader transcribes audio files with the Google Cloud Speech-to-Text API and loads the transcribed text into documents.
To use it, you need the google-cloud-speech Python package installed and a Google Cloud project with the Speech-to-Text API enabled.
Installation & setup
First, you need to install the google-cloud-speech Python package.
You can find more info about it on the Speech-to-Text client libraries page.
Follow the quickstart guide in the Google Cloud documentation to create a project and enable the API.
%pip install --upgrade --quiet langchain-google-community[speech]
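After installation, you can optionally confirm that the client library imports correctly; a quick sanity check, not required by the loader:

from google.cloud.speech_v2 import SpeechClient  # raises ImportError if google-cloud-speech is missing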
Example
The SpeechToTextLoader must include the project_id and file_path arguments. Audio files can be specified as a Google Cloud Storage URI (gs://...) or a local file path.
Only synchronous requests are supported by the loader, which has a limit of 60 seconds or 10 MB per audio file.
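If you pass a local file, you may want to confirm up front that it stays under that limit. A minimal, optional sketch using only the standard library (local_path is a hypothetical file):

import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MB limit for synchronous recognition, as noted above
local_path = "./audio.wav"  # hypothetical local audio file
if os.path.getsize(local_path) > MAX_BYTES:
    raise ValueError(f"{local_path} exceeds the 10 MB synchronous-request limit")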
from langchain_google_community import SpeechToTextLoader

project_id = "<PROJECT_ID>"
file_path = "gs://cloud-samples-data/speech/audio.flac"
# or a local file path: file_path = "./audio.wav"

loader = SpeechToTextLoader(project_id=project_id, file_path=file_path)

docs = loader.load()
Note: Calling loader.load() blocks until the transcription is finished.
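If the blocking call is a problem (for example in a service that needs to keep handling other work), one option is to run it in a worker thread. A sketch using the standard library, not part of the loader's API:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(loader.load)  # run the blocking load() in a worker thread
    # ... do other work here ...
    docs = future.result()  # waits for the transcription to finish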
The transcribed text is available in the page_content:
docs[0].page_content
"How old is the Brooklyn Bridge?"
The metadata contains the full JSON response with more meta information:
docs[0].metadata
{
'language_code': 'en-US',
'result_end_offset': datetime.timedelta(seconds=1)
}
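Individual fields can be read like any other dictionary entry, using the keys shown above:

docs[0].metadata["language_code"]
'en-US'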
Recognition Config
You can specify the config argument to use different speech recognition models and enable specific features.
Refer to the Speech-to-Text recognizers documentation and the RecognizeRequest API reference for information on how to set a custom configuration.
If you don't specify a config, the following options will be selected automatically:
- Model: Chirp Universal Speech Model
- Language: en-US
- Audio Encoding: Automatically Detected
- Automatic Punctuation: Enabled
from google.cloud.speech_v2 import (
    AutoDetectDecodingConfig,
    RecognitionConfig,
    RecognitionFeatures,
)

from langchain_google_community import SpeechToTextLoader

project_id = "<PROJECT_ID>"
location = "global"
recognizer_id = "<RECOGNIZER_ID>"
file_path = "./audio.wav"

config = RecognitionConfig(
    auto_decoding_config=AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="long",
    features=RecognitionFeatures(
        enable_automatic_punctuation=False,
        profanity_filter=True,
        enable_spoken_punctuation=True,
        enable_spoken_emojis=True,
    ),
)

loader = SpeechToTextLoader(
    project_id=project_id,
    location=location,
    recognizer_id=recognizer_id,
    file_path=file_path,
    config=config,
)
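Loading then works exactly as in the basic example, assuming the audio file and the referenced recognizer exist in your project:

docs = loader.load()
print(docs[0].page_content)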
Related
- Document loader conceptual guide
- Document loader how-to guides