gcloud container ai profiles manifests create
- NAME
- gcloud container ai profiles manifests create - generate ready-to-deploy Kubernetes manifests with compute, load balancing, and autoscaling capabilities
- SYNOPSIS
gcloud container ai profiles manifests create --accelerator-type=ACCELERATOR_TYPE --model=MODEL --model-server=MODEL_SERVER [--model-bucket-uri=MODEL_BUCKET_URI] [--model-server-version=MODEL_SERVER_VERSION] [--namespace=NAMESPACE] [--output=OUTPUT; default="all"] [--output-path=OUTPUT_PATH] [--serving-stack=SERVING_STACK] [--serving-stack-version=SERVING_STACK_VERSION] [--target-itl-milliseconds=TARGET_ITL_MILLISECONDS] [--target-ntpot-milliseconds=TARGET_NTPOT_MILLISECONDS] [--target-ttft-milliseconds=TARGET_TTFT_MILLISECONDS] [--use-case=USE_CASE] [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
- To get supported models, model servers, and model server versions, run gcloud alpha container ai profiles model-and-server-combinations list. To get supported accelerators with their performance metrics, run gcloud alpha container ai profiles accelerators list.
- REQUIRED FLAGS
--accelerator-type=ACCELERATOR_TYPE - The accelerator type.
--model=MODEL - The model.
--model-server=MODEL_SERVER - The model server.
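A minimal sketch of an invocation that uses only the three required flags. The values `nvidia-l4`, `example-org/example-model`, and `vllm` are hypothetical placeholders, not values confirmed by this reference; the supported combinations come from the list commands named in DESCRIPTION. The snippet only assembles and prints the command so it can be reviewed before it is run.

```shell
# Hypothetical placeholder values; replace them with supported ones.
ACCELERATOR_TYPE="nvidia-l4"
MODEL="example-org/example-model"
MODEL_SERVER="vllm"

# Assemble the command as positional parameters (POSIX sh, no arrays needed).
set -- gcloud container ai profiles manifests create \
  --accelerator-type="$ACCELERATOR_TYPE" \
  --model="$MODEL" \
  --model-server="$MODEL_SERVER"

# Print the command for review instead of executing it.
CMD_PREVIEW="$*"
echo "$CMD_PREVIEW"

# To actually run it, uncomment the next line:
# "$@"
```

Printing first is a deliberate choice in this sketch: no optional flags are set, so the command falls back to the defaults described under OPTIONAL FLAGS.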
- OPTIONAL FLAGS
--model-bucket-uri=MODEL_BUCKET_URI - The Google Cloud Storage bucket URI to load the model from. This URI must point to the directory containing the model's config file (config.json) and model weights. If unspecified, defaults to loading the model from Hugging Face.
--model-server-version=MODEL_SERVER_VERSION - The model server version. If not specified, this defaults to the latest version.
--namespace=NAMESPACE - The namespace to deploy the manifests in. Default namespace is 'default'.
--output=OUTPUT; default="all" - The output to display. Default is all. OUTPUT must be one of: manifest, comments, all.
--output-path=OUTPUT_PATH - The path to save the output to. If not specified, the output is printed to the terminal.
--serving-stack=SERVING_STACK - The serving stack to filter manifests by. If not provided, will default to none.
--serving-stack-version=SERVING_STACK_VERSION - The serving stack version to filter manifests by. If not provided, manifests for all versions that support the given model and model server will be considered.
--target-itl-milliseconds=TARGET_ITL_MILLISECONDS - The target inter-token latency (ITL) in milliseconds. If this is set, the manifest will include Horizontal Pod Autoscaler (HPA) resources which automatically adjust the model server replica count in response to changes in model server load to keep p50 ITL below the specified threshold. If the provided target-itl-milliseconds is too low to achieve, the HPA manifest will not be generated.
--target-ntpot-milliseconds=TARGET_NTPOT_MILLISECONDS - The maximum normalized time per output token (NTPOT) in milliseconds. NTPOT is measured as request_latency / output_tokens. If this is set, the manifests will include Horizontal Pod Autoscaler (HPA) resources which automatically adjust the model server replica count in response to changes in model server load to keep p50 NTPOT below the specified threshold. If the provided target-ntpot-milliseconds is too low to achieve, the HPA manifest will not be generated.
--target-ttft-milliseconds=TARGET_TTFT_MILLISECONDS - If specified, results will only show accelerators that can meet the latency target and will show their throughput performance at the target time to first token (TTFT). If the provided target-ttft-milliseconds is too low to achieve, the HPA manifest will not be generated.
--use-case=USE_CASE - The manifest will be optimized for this use case. Options are: Advanced Customer Support, Code Completion, Text Summarization, Chatbot (ShareGPT), Code Generation, Deep Research. Will default to Chatbot if not specified.
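The NTPOT definition above (request_latency / output_tokens) can be checked with a short calculation before choosing a value for --target-ntpot-milliseconds. The sketch below uses made-up numbers for a single request, whereas the autoscaler described above tracks the p50 across requests. The namespace `inference`, the file name `manifests.yaml`, the 25 ms target, and the accelerator, model, and model server values are all hypothetical placeholders. As before, the command is only printed, not executed.

```shell
# Hypothetical measurements for one request (illustrative only).
request_latency_ms=4000
output_tokens=200

# NTPOT = request_latency / output_tokens, in milliseconds (integer division).
ntpot_ms=$((request_latency_ms / output_tokens))

# Hypothetical latency target to pass to the command.
target_ntpot_ms=25

if [ "$ntpot_ms" -le "$target_ntpot_ms" ]; then
  verdict="within target"
else
  verdict="over target"
fi
echo "NTPOT: ${ntpot_ms} ms (${verdict})"

# Assemble a command that writes only the manifest to a file and asks for
# HPA resources that keep p50 NTPOT below the chosen threshold.
set -- gcloud container ai profiles manifests create \
  --accelerator-type=nvidia-l4 \
  --model=example-org/example-model \
  --model-server=vllm \
  --namespace=inference \
  --output=manifest \
  --output-path=manifests.yaml \
  --target-ntpot-milliseconds="$target_ntpot_ms"

# Print the command for review; run "$@" to execute it.
CMD_PREVIEW="$*"
echo "$CMD_PREVIEW"
```

If the chosen target turns out to be too low to achieve, the flag description above says the HPA manifest is not generated, so the saved output is worth inspecting before it is applied.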
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity. Run $ gcloud help for details.
- NOTES
- This variant is also available:
gcloud alpha container ai profiles manifests create
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2026-02-18 UTC.