gcloud alpha ml speech recognizers run-batch
- NAME
- gcloud alpha ml speech recognizers run-batch - get transcripts of long (more than 60 seconds) audio from a gcloud URI
- SYNOPSIS
gcloud alpha ml speech recognizers run-batch (RECOGNIZER : --location=LOCATION) --audio=AUDIO [--async] [--hint-boost=HINT_BOOST] [--hint-phrase-sets=[PHRASE_SET,…]] [--hint-phrases=[PHRASE,…]] [--language-codes=[LANGUAGE_CODE,…]] [--model=MODEL] [--audio-channel-count=AUDIO_CHANNEL_COUNT --encoding=ENCODING --sample-rate=SAMPLE_RATE] [--[no-]enable-automatic-punctuation --[no-]enable-spoken-emojis --[no-]enable-spoken-punctuation --[no-]enable-word-confidence --[no-]enable-word-time-offsets --max-alternatives=MAX_ALTERNATIVES --[no-]profanity-filter --[no-]separate-channel-recognition --max-speaker-count=MAX_SPEAKER_COUNT --min-speaker-count=MIN_SPEAKER_COUNT] [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
(ALPHA) Get transcripts of long (more than 60 seconds) audio from a gcloud URI.
- POSITIONAL ARGUMENTS
- Recognizer resource - recognizer. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.
To set the project attribute:
- provide the argument recognizer on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.
This must be specified.
RECOGNIZER - ID of the recognizer or fully qualified identifier for the recognizer.
To set the recognizer attribute:
- provide the argument recognizer on the command line.
This positional argument must be specified if any of the other arguments in this group are specified.
--location=LOCATION - Location of the recognizer.
To set the location attribute:
- provide the argument recognizer on the command line with a fully specified name;
- provide the argument --location on the command line.
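The two ways of identifying the recognizer described above can be sketched as follows. The project, location, recognizer, and bucket names are placeholders, and the layout of the fully specified name (`projects/.../locations/.../recognizers/...`) is an assumption about the resource name format rather than something stated on this page. The commands are only printed, not executed.

```shell
# Placeholder values (assumptions, not taken from this page).
PROJECT="my-project"
LOCATION="us-central1"
RECOGNIZER="my-recognizer"
AUDIO="gs://my-bucket/long-audio.wav"

# Form 1: short recognizer ID plus --location. The project is passed with
# --project here; the core/project property would serve the same purpose.
short_form="gcloud alpha ml speech recognizers run-batch ${RECOGNIZER} --location=${LOCATION} --project=${PROJECT} --audio=${AUDIO}"

# Form 2: a fully specified name that carries project and location itself.
# The path layout below is an assumed resource name format.
full_name="projects/${PROJECT}/locations/${LOCATION}/recognizers/${RECOGNIZER}"
full_form="gcloud alpha ml speech recognizers run-batch ${full_name} --audio=${AUDIO}"

# Dry run: print the command lines instead of executing them.
printf '%s\n' "$short_form" "$full_form"
```

With the fully specified form, neither `--location` nor `--project` is needed, because both attributes are read from the name.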
- REQUIRED FLAGS
--audio=AUDIO - Location of the audio file to transcribe. Must be audio data bytes, a local file, or a Google Cloud Storage URL (in the format gs://bucket/object).
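As a rough illustration of the `--audio` value, the sketch below classifies an argument as a Cloud Storage URL or a local file before printing a minimal invocation. `classify_audio` is a hypothetical helper, and the recognizer, location, and bucket names are placeholders. The command is echoed, not run.

```shell
# classify_audio VALUE
# Prints "gcs" for a gs://bucket/object URL, "local" for an existing file,
# and "other" for anything else. Hypothetical helper for illustration only.
classify_audio() {
  case "$1" in
    gs://?*/?*) echo "gcs" ;;
    *)
      if [ -f "$1" ]; then
        echo "local"
      else
        echo "other"
      fi
      ;;
  esac
}

AUDIO="gs://my-bucket/long-audio.wav"   # placeholder object
kind=$(classify_audio "$AUDIO")
echo "audio kind: $kind"

# Minimal invocation: recognizer, location, and the required --audio flag.
echo "gcloud alpha ml speech recognizers run-batch my-recognizer --location=us-central1 --audio=${AUDIO}"
```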
- OPTIONAL FLAGS
--async - Return immediately, without waiting for the operation in progress to complete. The default is False.
--hint-boost=HINT_BOOST - Boost value for the phrases passed to --hint-phrases. Can have a value between 1 and 20.
--hint-phrase-sets=[PHRASE_SET,…] - A list of phrase set resource names to use for speech recognition.
--hint-phrases=[PHRASE,…] - A list of strings containing word and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See https://cloud.google.com/speech/limits#content.
--language-codes=[LANGUAGE_CODE,…] - Language code is one of en-US, en-GB, fr-FR. Check the documentation for using more than one language code.
--model=MODEL - Which model to use for recognition requests. Select the model best suited to your domain to get best results. Guidance for choosing which model to use can be found in the Transcription Models documentation, and the models supported in each region can be found in the Table of Supported Models.
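A small sketch of how the hint flags might be combined, with a client-side check of the 1 to 20 boost range stated above. `valid_boost` is a hypothetical helper that accepts whole or decimal numbers; whether the command itself accepts decimals is an assumption. The phrases, recognizer, and bucket names are placeholders, and the command is echoed, not run.

```shell
# valid_boost VALUE
# Succeeds when VALUE is a plain number in the range 1..20.
# Hypothetical helper; the range comes from the --hint-boost description.
valid_boost() {
  case "$1" in
    ''|.|*[!0-9.]*|*.*.*) return 1 ;;   # empty, lone dot, non-numeric, two dots
  esac
  awk -v b="$1" 'BEGIN { exit !((b + 0) >= 1 && (b + 0) <= 20) }'
}

BOOST=10
if valid_boost "$BOOST"; then
  # --async returns immediately instead of waiting for the operation.
  echo "gcloud alpha ml speech recognizers run-batch my-recognizer --location=us-central1 --audio=gs://my-bucket/long-audio.wav --hint-phrases='weather forecast,play music' --hint-boost=${BOOST} --language-codes=en-US --async"
else
  echo "hint boost must be between 1 and 20: ${BOOST}" >&2
fi
```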
- Encoding format
--audio-channel-count=AUDIO_CHANNEL_COUNT - Number of channels present in the audio data sent for recognition. Required if the --encoding flag is specified and is not AUTO. Must be set to a value between 1 and 8.
--encoding=ENCODING - Encoding format of the provided audio. For headerless formats, must be set to LINEAR16, MULAW, or ALAW. For other formats, set to AUTO. Overrides the recognizer configuration if present, else uses recognizer encoding.
--sample-rate=SAMPLE_RATE - Sample rate in Hertz of the audio data sent for recognition. Required if the --encoding flag is specified and is not AUTO. Must be set to a value between 8000 and 48000.
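The dependency between the three encoding flags can be sketched as a pre-flight check. `check_encoding` is a hypothetical helper that mirrors the constraints listed above; it does not inspect the audio itself, and it leaves any channel count or sample rate given alongside AUTO unchecked, which is an assumption about how the command treats them.

```shell
# check_encoding ENCODING [CHANNELS] [SAMPLE_RATE]
# Succeeds when the combination satisfies the documented constraints.
# Hypothetical helper; variable names are prefixed to avoid clobbering globals.
check_encoding() {
  ce_encoding=${1:-}
  ce_channels=${2:-}
  ce_rate=${3:-}
  case "$ce_encoding" in
    AUTO) return 0 ;;                 # companion flags are not required
    LINEAR16|MULAW|ALAW) ;;           # headerless formats: keep checking
    *) echo "unsupported encoding: ${ce_encoding}" >&2; return 1 ;;
  esac
  # Explicit encodings need both companion flags as whole numbers.
  case "$ce_channels" in
    ''|*[!0-9]*) echo "--audio-channel-count is required" >&2; return 1 ;;
  esac
  case "$ce_rate" in
    ''|*[!0-9]*) echo "--sample-rate is required" >&2; return 1 ;;
  esac
  [ "$ce_channels" -ge 1 ] && [ "$ce_channels" -le 8 ] || {
    echo "channel count must be between 1 and 8" >&2; return 1; }
  [ "$ce_rate" -ge 8000 ] && [ "$ce_rate" -le 48000 ] || {
    echo "sample rate must be between 8000 and 48000" >&2; return 1; }
}

if check_encoding LINEAR16 1 16000; then
  echo "gcloud alpha ml speech recognizers run-batch my-recognizer --location=us-central1 --audio=gs://my-bucket/long-audio.raw --encoding=LINEAR16 --audio-channel-count=1 --sample-rate=16000"
fi
```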
- ASR Features
--[no-]enable-automatic-punctuation - If set, adds punctuation to recognition result hypotheses. Use --enable-automatic-punctuation to enable and --no-enable-automatic-punctuation to disable.
--[no-]enable-spoken-emojis - If set, adds spoken emoji formatting. Use --enable-spoken-emojis to enable and --no-enable-spoken-emojis to disable.
--[no-]enable-spoken-punctuation - If set, replaces spoken punctuation with the corresponding symbols in the request. Use --enable-spoken-punctuation to enable and --no-enable-spoken-punctuation to disable.
--[no-]enable-word-confidence - If set, the top result includes a list of words and the confidence for those words. Use --enable-word-confidence to enable and --no-enable-word-confidence to disable.
--[no-]enable-word-time-offsets - If set, the top result includes a list of words and their timestamps. Use --enable-word-time-offsets to enable and --no-enable-word-time-offsets to disable.
--max-alternatives=MAX_ALTERNATIVES - Maximum number of recognition hypotheses to be returned. Must be set to a value between 1 and 30.
--[no-]profanity-filter - If set, the server will censor profanities. Use --profanity-filter to enable and --no-profanity-filter to disable.
--[no-]separate-channel-recognition - Mode for recognizing multi-channel audio using Separate Channel Recognition. When set, the service will recognize each channel independently. Use --separate-channel-recognition to enable and --no-separate-channel-recognition to disable.
- Speaker Diarization
--max-speaker-count=MAX_SPEAKER_COUNT - Maximum number of speakers in the conversation. Must be greater than or equal to --min-speaker-count. Must be set to a value between 1 and 6.
This flag argument must be specified if any of the other arguments in this group are specified.
--min-speaker-count=MIN_SPEAKER_COUNT - Minimum number of speakers in the conversation. Must be less than or equal to --max-speaker-count. Must be set to a value between 1 and 6.
This flag argument must be specified if any of the other arguments in this group are specified.
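The speaker-count constraints above (both flags given together, each between 1 and 6, minimum not above maximum) can be sketched as a check run before building the command. `check_speakers` is a hypothetical helper, and the recognizer and bucket names are placeholders. The command is echoed, not run.

```shell
# check_speakers MIN MAX
# Succeeds when both counts are whole numbers in 1..6 and MIN <= MAX.
# Hypothetical helper; variable names are prefixed to avoid clobbering globals.
check_speakers() {
  cs_min=${1:-}
  cs_max=${2:-}
  for cs_n in "$cs_min" "$cs_max"; do
    case "$cs_n" in
      ''|*[!0-9]*) return 1 ;;        # missing or not a whole number
    esac
    [ "$cs_n" -ge 1 ] && [ "$cs_n" -le 6 ] || return 1
  done
  [ "$cs_min" -le "$cs_max" ]
}

MIN_SPEAKERS=2
MAX_SPEAKERS=4
if check_speakers "$MIN_SPEAKERS" "$MAX_SPEAKERS"; then
  echo "gcloud alpha ml speech recognizers run-batch my-recognizer --location=us-central1 --audio=gs://my-bucket/meeting.wav --min-speaker-count=${MIN_SPEAKERS} --max-speaker-count=${MAX_SPEAKERS}"
else
  echo "speaker counts must satisfy 1 <= min <= max <= 6" >&2
fi
```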
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.
Run $ gcloud help for details.
- NOTES
- This command is currently in alpha and might change without notice. If this command fails with API permission errors despite specifying the correct project, you might be trying to access an API with an invitation-only early access allowlist.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2026-01-21 UTC.