Chirp 3 Transcription: Enhanced multilingual accuracy


Chirp 3 is the latest generation of Google's multilingual Automatic Speech Recognition (ASR)-specific generative models, designed to meet user needs based on feedback and experience. Chirp 3 provides enhanced accuracy and speed beyond previous Chirp models, and adds diarization and automatic language detection.

Model details

Chirp 3: Transcription is available exclusively in the Speech-to-Text API V2.

Model identifiers

You can use Chirp 3: Transcription just like any other model: specify the model identifier in your recognition request when using the API, or select the model name in the Google Cloud console.

Model | Model identifier
Chirp 3 | chirp_3
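
For example, in a Speech-to-Text API V2 request the identifier goes in the model field of the recognition config. This is a minimal sketch adapted from the full samples later on this page:

Python

from google.cloud.speech_v2.types import cloud_speech

# Minimal recognition config that selects Chirp 3 by its model identifier.
config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="chirp_3",
)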

API methods

Not all recognition methods support the same set of languages. Because Chirp 3 is available in the Speech-to-Text API V2, it supports the following recognition methods:

API version | API method | Support
V2 | Speech.StreamingRecognize (good for streaming and real-time audio) | Supported
V2 | Speech.Recognize (good for audio shorter than one minute) | Supported
V2 | Speech.BatchRecognize (good for long audio, 1 minute to 1 hour in general, but up to 20 minutes with word-level timestamps enabled) | Supported
Note: You can always find the latest list of supported locales and features for each transcription model using the locations API as explained here.

Regional availability

Chirp 3 is available in the following Google Cloud regions, with more planned:

Google Cloud region | Launch readiness
us (multi-region) | GA
eu (multi-region) | GA
asia-northeast1 | Preview
asia-southeast1 | Preview
asia-south1 | Preview
europe-west2 | Preview
europe-west3 | Preview
northamerica-northeast1 | Preview

Using the locations API as explained here, you can find the latest list of supported Google Cloud regions, languages and locales, and features for each transcription model.
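
As a rough illustration (not an official sample), the following sketch calls that regional locations endpoint and prints the raw JSON it returns. It assumes Application Default Credentials are set up; PROJECT_ID and REGION are placeholders you replace, and the exact shape of the returned metadata can vary, so inspect the output for the model, language, and feature information you need.

Python

import google.auth
from google.auth.transport.requests import AuthorizedSession

PROJECT_ID = "your-project-id"  # placeholder: replace with your project ID
REGION = "us"                   # placeholder: replace with the region to inspect

# Authenticate with Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

# Query the Speech-to-Text V2 locations endpoint for this region. The response
# is expected to include metadata describing the models, languages, and
# features available in that location.
url = (
    f"https://{REGION}-speech.googleapis.com/v2/"
    f"projects/{PROJECT_ID}/locations/{REGION}"
)
response = session.get(url)
response.raise_for_status()
print(response.json())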

Language availability for transcription

Chirp 3 supports transcription in StreamingRecognize, Recognize, and BatchRecognize in the following languages:

Language | BCP-47 code | Launch readiness
Catalan (Spain) | ca-ES | GA
Chinese (Simplified, China) | cmn-Hans-CN | GA
Croatian (Croatia) | hr-HR | GA
Danish (Denmark) | da-DK | GA
Dutch (Netherlands) | nl-NL | GA
English (Australia) | en-AU | GA
English (United Kingdom) | en-GB | GA
English (India) | en-IN | GA
English (United States) | en-US | GA
Finnish (Finland) | fi-FI | GA
French (Canada) | fr-CA | GA
French (France) | fr-FR | GA
German (Germany) | de-DE | GA
Greek (Greece) | el-GR | GA
Hindi (India) | hi-IN | GA
Italian (Italy) | it-IT | GA
Japanese (Japan) | ja-JP | GA
Korean (Korea) | ko-KR | GA
Polish (Poland) | pl-PL | GA
Portuguese (Brazil) | pt-BR | GA
Portuguese (Portugal) | pt-PT | GA
Romanian (Romania) | ro-RO | GA
Russian (Russia) | ru-RU | GA
Spanish (Spain) | es-ES | GA
Spanish (United States) | es-US | GA
Swedish (Sweden) | sv-SE | GA
Turkish (Turkey) | tr-TR | GA
Ukrainian (Ukraine) | uk-UA | GA
Vietnamese (Vietnam) | vi-VN | GA
Arabic | ar-XA | Preview
Arabic (Algeria) | ar-DZ | Preview
Arabic (Bahrain) | ar-BH | Preview
Arabic (Egypt) | ar-EG | Preview
Arabic (Israel) | ar-IL | Preview
Arabic (Jordan) | ar-JO | Preview
Arabic (Kuwait) | ar-KW | Preview
Arabic (Lebanon) | ar-LB | Preview
Arabic (Mauritania) | ar-MR | Preview
Arabic (Morocco) | ar-MA | Preview
Arabic (Oman) | ar-OM | Preview
Arabic (Qatar) | ar-QA | Preview
Arabic (Saudi Arabia) | ar-SA | Preview
Arabic (State of Palestine) | ar-PS | Preview
Arabic (Syria) | ar-SY | Preview
Arabic (Tunisia) | ar-TN | Preview
Arabic (United Arab Emirates) | ar-AE | Preview
Arabic (Yemen) | ar-YE | Preview
Armenian (Armenia) | hy-AM | Preview
Bengali (Bangladesh) | bn-BD | Preview
Bengali (India) | bn-IN | Preview
Bulgarian (Bulgaria) | bg-BG | Preview
Burmese (Myanmar) | my-MM | Preview
Central Kurdish (Iraq) | ar-IQ | Preview
Chinese, Cantonese (Traditional Hong Kong) | yue-Hant-HK | Preview
Chinese, Mandarin (Traditional, Taiwan) | cmn-Hant-TW | Preview
Czech (Czech Republic) | cs-CZ | Preview
English (Philippines) | en-PH | Preview
Estonian (Estonia) | et-EE | Preview
Filipino (Philippines) | fil-PH | Preview
Gujarati (India) | gu-IN | Preview
Hebrew (Israel) | iw-IL | Preview
Hungarian (Hungary) | hu-HU | Preview
Indonesian (Indonesia) | id-ID | Preview
Kannada (India) | kn-IN | Preview
Khmer (Cambodia) | km-KH | Preview
Lao (Laos) | lo-LA | Preview
Latvian (Latvia) | lv-LV | Preview
Lithuanian (Lithuania) | lt-LT | Preview
Malay (Malaysia) | ms-MY | Preview
Malayalam (India) | ml-IN | Preview
Marathi (India) | mr-IN | Preview
Nepali (Nepal) | ne-NP | Preview
Norwegian (Norway) | no-NO | Preview
Persian (Iran) | fa-IR | Preview
Punjabi (Gurmukhi India) | pa-Guru-IN | Preview
Serbian (Serbia) | sr-RS | Preview
Slovak (Slovakia) | sk-SK | Preview
Slovenian (Slovenia) | sl-SI | Preview
Spanish (Mexico) | es-MX | Preview
Swahili | sw | Preview
Tamil (India) | ta-IN | Preview
Telugu (India) | te-IN | Preview
Thai (Thailand) | th-TH | Preview
Uzbek (Uzbekistan) | uz-UZ | Preview

Language availability for diarization

Chirp 3 supports transcription and diarization only in BatchRecognize and Recognize, in the following languages:

Language | BCP-47 code
Chinese (Simplified, China) | cmn-Hans-CN
German (Germany) | de-DE
English (United Kingdom) | en-GB
English (India) | en-IN
English (United States) | en-US
Spanish (Spain) | es-ES
Spanish (United States) | es-US
French (Canada) | fr-CA
French (France) | fr-FR
Hindi (India) | hi-IN
Italian (Italy) | it-IT
Japanese (Japan) | ja-JP
Korean (Korea) | ko-KR
Portuguese (Brazil) | pt-BR

Feature support and limitations

Chirp 3 supports the following features:

Feature | Description | Launch stage
Automatic punctuation | Automatically generated by the model and can be optionally disabled. | GA
Automatic capitalization | Automatically generated by the model and can be optionally disabled. | GA
Utterance-level timestamps | Automatically generated by the model. Available only in Speech.StreamingRecognize. | GA
Speaker diarization | Automatically identifies the different speakers in a single-channel audio sample. Available only in Speech.BatchRecognize. | GA
Speech adaptation (biasing) | Provides hints to the model in the form of phrases or words to improve recognition accuracy for specific terms or proper nouns. | GA
Language-agnostic audio transcription | Automatically infers and transcribes in the most prevalent language. | GA

Chirp 3 doesn't support the following features:

Feature | Description
Word-level timestamps | Automatically generated by the model and can be optionally enabled, though some transcription degradation is expected. Available only in Speech.Recognize and Speech.BatchRecognize.
Word-level confidence scores | The API returns a value, but it isn't truly a confidence score.

Transcribe using Chirp 3

Discover how to use Chirp 3 for transcription tasks.

Perform streaming speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"


def transcribe_streaming_chirp3(audio_file: str) -> cloud_speech.StreamingRecognizeResponse:
    """Transcribes audio from an audio file stream using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"

    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API V2 containing
        the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    # In practice, stream should be a generator yielding chunks of audio data
    chunk_length = len(content) // 5
    stream = [
        content[start : start + chunk_length]
        for start in range(0, len(content), chunk_length)
    ]
    audio_requests = (
        cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
    )

    recognition_config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )
    streaming_config = cloud_speech.StreamingRecognitionConfig(config=recognition_config)
    config_request = cloud_speech.StreamingRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        streaming_config=streaming_config,
    )

    def requests(config: cloud_speech.RecognitionConfig, audio: list) -> list:
        yield config
        yield from audio

    # Transcribes the audio into text
    responses_iterator = client.streaming_recognize(
        requests=requests(config_request, audio_requests)
    )
    responses = []
    for response in responses_iterator:
        responses.append(response)
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")

    return responses

Perform synchronous speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"


def transcribe_sync_chirp3(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"

    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

Perform batch speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"


def transcribe_batch_3(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.

    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://[BUCKET]/[FILE]

    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript

Use Chirp 3 Features

Explore how you can use the latest features, with code examples:

Perform a language-agnostic transcription

Chirp 3 can automatically identify and transcribe the dominant language spoken in the audio, which is essential for multilingual applications. To do this, set language_codes=["auto"] as shown in the code example:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"


def transcribe_sync_chirp3_auto_detect_language(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file and auto-detects the spoken language using Chirp 3.

    Please see https://cloud.google.com/speech-to-text/docs/encoding for more
    information on which audio encodings are supported.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"

    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["auto"],  # Set language code to auto to detect language.
        model="chirp_3",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
        print(f"Detected Language: {result.language_code}")

    return response

Perform a language-restricted transcription

Chirp 3 can automatically identify and transcribe the dominant language in an audio file. You can also condition it on the specific locales you expect, for example ["en-US", "fr-FR"], which focuses the model on the most probable languages for more reliable results, as demonstrated in the code example:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"


def transcribe_sync_3_auto_detect_language(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file and auto-detects the spoken language using Chirp 3.

    Please see https://cloud.google.com/speech-to-text/docs/encoding for more
    information on which audio encodings are supported.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"

    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US", "fr-FR"],  # Set language codes of the expected spoken locales
        model="chirp_3",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
        print(f"Detected Language: {result.language_code}")

    return response

Perform transcription and speaker diarization

Use Chirp 3 for transcription and diarization tasks.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"


def transcribe_batch_chirp3(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.

    Args:
        audio_uri (str): The Google Cloud Storage URI of the input
            audio file. E.g., gs://[BUCKET]/[FILE]

    Returns:
        cloud_speech.RecognizeResponse: The response from the
            Speech-to-Text API containing the transcription results.
    """
    # Instantiates a client.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],  # Use "auto" to detect language.
        model="chirp_3",
        features=cloud_speech.RecognitionFeatures(
            # Enable diarization by setting an empty diarization configuration.
            diarization_config=cloud_speech.SpeakerDiarizationConfig(),
        ),
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Creates audio transcription job.
    operation = client.batch_recognize(request=request)

    print("Waiting for transcription job to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
        print(f"Detected Language: {result.language_code}")
        print(f"Speakers per word: {result.alternatives[0].words}")

    return response.results[audio_uri].transcript
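
To turn the per-word diarization output into a per-speaker view, you can group consecutive words by their speaker label. This is a minimal sketch, assuming each entry of result.alternatives[0].words exposes word and speaker_label attributes (as in cloud_speech.WordInfo); verify the field names against the response returned by your client library version.

Python

from collections import defaultdict


def words_by_speaker(words) -> dict:
    """Groups diarized words by speaker label.

    Assumes each item exposes `word` and `speaker_label` attributes.
    """
    grouped = defaultdict(list)
    for word_info in words:
        grouped[word_info.speaker_label].append(word_info.word)
    return {speaker: " ".join(tokens) for speaker, tokens in grouped.items()}


# Example usage with a result from the diarization sample above:
# print(words_by_speaker(result.alternatives[0].words))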

Improve accuracy with model adaptation

Chirp 3 can improve transcription accuracy for your specific audio using model adaptation. This lets you provide a list of specific words and phrases, increasing the likelihood that the model recognizes them. It's especially useful for domain-specific terms, proper nouns, or unique vocabulary.

Note: chirp_3 supports a dictionary of up to 1,000 phrases for adaptation. We recommend using as few entries as possible to prevent degradation on terms not in the adaptation list.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"


def transcribe_sync_chirp3_model_adaptation(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file using the Chirp 3 model with adaptation, improving accuracy for specific audio characteristics or vocabulary.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"

    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
        # Use model adaptation
        adaptation=cloud_speech.SpeechAdaptation(
            phrase_sets=[
                cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                    inline_phrase_set=cloud_speech.PhraseSet(
                        phrases=[
                            {"value": "alphabet"},
                            {"value": "cell phone service"},
                        ]
                    )
                )
            ]
        ),
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

Enable denoiser

Chirp 3 can enhance audio quality by reducing background noise. You can improve results from noisy environments by enabling the built-in denoiser.

Setting denoise_audio=true helps reduce background music or noises like rain and street traffic.

Note: The denoiser can't remove background human voices.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"


def transcribe_sync_chirp3_with_denoiser(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file using the Chirp 3 model of Google Cloud Speech-to-Text V2 API, with the built-in denoiser enabled to reduce background noise.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"

    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{REGION}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
        denoiser_config={
            "denoise_audio": True,
            # snr_threshold is deprecated in Chirp 3; set to 0.0 to maintain compatibility.
            "snr_threshold": 0.0,
        },
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

Use Chirp 3 in the Google Cloud console

  1. Sign up for a Google Cloud account, and create a project.
  2. Go to Speech in the Google Cloud console.
  3. If the API isn't enabled, enable the API.
  4. Make sure that you have an STT console workspace. If you don't have a workspace, create one:

    1. Go to the Transcriptions page, and click New Transcription.

    2. Open the Workspace drop-down and click New Workspace to create a workspace for transcription.

    3. From the Create a new workspace navigation sidebar, click Browse.

    4. Click to create a new bucket.

    5. Enter a name for your bucket and click Continue.

    6. Click Create to create your Cloud Storage bucket.

    7. After the bucket is created, click Select to select your bucket for use.

    8. Click Create to finish creating your workspace for the Speech-to-Text API V2 console.

  5. Perform a transcription on your actual audio.

    The Speech-to-Text transcription creation page, showing file selection or upload.

    From the New Transcription page, select your audio file either by uploading it (Local upload) or by specifying an existing Cloud Storage file (Cloud storage).

  6. Click Continue to move to the Transcription options.

    1. Select the Spoken language that you plan to use for recognition with Chirp 3 from your previously created recognizer.

    2. In the model drop-down, select chirp_3.

    3. In the Recognizer drop-down, select your newly created recognizer.

    4. Click Submit to run your first recognition request using chirp_3.

  7. View your Chirp 3 transcription result.

    1. From the Transcriptions page, click the name of the transcription to view its result.

    2. In the Transcription details page, view your transcription result, and optionally play back the audio in the browser.

What's next

  • Learn how to transcribe short audio files.
  • Learn how to transcribe streaming audio.
  • Learn how to transcribe long audio files.
  • For best performance, accuracy, and other tips, see the best practices documentation.
