gcloud container ai profiles manifests create

NAME: gcloud container ai profiles manifests create - generate ready-to-deploy Kubernetes manifests with compute, load balancing, and autoscaling capabilities
SYNOPSIS: gcloud container ai profiles manifests create --accelerator-type=ACCELERATOR_TYPE --model=MODEL --model-server=MODEL_SERVER [--model-bucket-uri=MODEL_BUCKET_URI] [--model-server-version=MODEL_SERVER_VERSION] [--namespace=NAMESPACE] [--output=OUTPUT; default="all"] [--output-path=OUTPUT_PATH] [--serving-stack=SERVING_STACK] [--serving-stack-version=SERVING_STACK_VERSION] [--target-itl-milliseconds=TARGET_ITL_MILLISECONDS] [--target-ntpot-milliseconds=TARGET_NTPOT_MILLISECONDS] [--target-ttft-milliseconds=TARGET_TTFT_MILLISECONDS] [--use-case=USE_CASE] [GCLOUD_WIDE_FLAG …]
DESCRIPTION: To get supported models, model servers, and model server versions, run gcloud alpha container ai profiles model-and-server-combinations list. To get supported accelerators with their performance metrics, run gcloud alpha container ai profiles accelerators list.
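The two discovery commands named above can be wrapped in a small dry-run helper, sketched below. The `run` function and the `DRY_RUN` variable are illustrative names, not part of gcloud. The sketch assumes the alpha command group is installed, and the list commands may need additional flags that this page does not describe.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1 (the default here),
# and executes it otherwise.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Supported models, model servers, and model server versions.
run gcloud alpha container ai profiles model-and-server-combinations list

# Supported accelerators with their performance metrics.
run gcloud alpha container ai profiles accelerators list
```

Setting `DRY_RUN=0` before the calls executes the commands instead of printing them.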
REQUIRED FLAGS: --accelerator-type=ACCELERATOR_TYPE
The accelerator type.
--model=MODEL
The model.
--model-server=MODEL_SERVER
The model server.
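A minimal invocation needs only the three required flags. The sketch below assembles the command as positional parameters and prints it for review. The accelerator, model, and model server values are placeholders (assumptions), not real supported values; replace them with values reported by the list commands mentioned in DESCRIPTION.

```shell
# Placeholder values (assumptions): substitute real, supported values.
ACCELERATOR_TYPE="example-accelerator"
MODEL="example-org/example-model"
MODEL_SERVER="example-server"

# Assemble the command as positional parameters so that each flag
# stays a single argument even if a value contains spaces.
set -- gcloud container ai profiles manifests create \
  "--accelerator-type=${ACCELERATOR_TYPE}" \
  "--model=${MODEL}" \
  "--model-server=${MODEL_SERVER}"

# Print the assembled command for review.
printf '%s\n' "$*"
```

Once the values are real, the assembled command can be executed with `"$@"`.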
OPTIONAL FLAGS: --model-bucket-uri=MODEL_BUCKET_URI
The Google Cloud Storage bucket URI to load the model from. This URI must point to the directory containing the model's config file (config.json) and model weights. If unspecified, defaults to loading the model from Hugging Face.
--model-server-version=MODEL_SERVER_VERSION
The model server version. If not specified, this defaults to the latest version.
--namespace=NAMESPACE
The namespace to deploy the manifests in. Default namespace is 'default'.
--output=OUTPUT; default="all"
The output to display. Default is all. OUTPUT must be one of: manifest, comments, all.
--output-path=OUTPUT_PATH
The path to save the output to. If not specified, the output is printed to the terminal.
--serving-stack=SERVING_STACK
The serving stack to filter manifests by. If not provided, defaults to none.
--serving-stack-version=SERVING_STACK_VERSION
The serving stack version to filter manifests by. If not provided, manifests for all versions that support the given model and model server will be considered.
--target-itl-milliseconds=TARGET_ITL_MILLISECONDS
The target inter-token latency (ITL) in milliseconds. If this is set, the manifest will include Horizontal Pod Autoscaler (HPA) resources which automatically adjust the model server replica count in response to changes in model server load to keep p50 ITL below the specified threshold. If the provided target-itl-milliseconds is too low to achieve, the HPA manifest will not be generated.
--target-ntpot-milliseconds=TARGET_NTPOT_MILLISECONDS
The maximum normalized time per output token (NTPOT) in milliseconds. NTPOT is measured as request_latency / output_tokens. If this is set, the manifests will include Horizontal Pod Autoscaler (HPA) resources which automatically adjust the model server replica count in response to changes in model server load to keep p50 NTPOT below the specified threshold. If the provided target-ntpot-milliseconds is too low to achieve, the HPA manifest will not be generated.
--target-ttft-milliseconds=TARGET_TTFT_MILLISECONDS
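The target time to first token (TTFT) in milliseconds. If specified, results will only show accelerators that can meet the latency target and will show their throughput performance at the target TTFT. If the provided target-ttft-milliseconds is too low to achieve, the HPA manifest will not be generated.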
--use-case=USE_CASE
The manifest will be optimized for this use case. Options are: Advanced Customer Support, Code Completion, Text Summarization, Chatbot (ShareGPT), Code Generation, Deep Research. Defaults to Chatbot if not specified.
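The NTPOT definition above (request_latency / output_tokens) can be turned into a target value and combined with the output flags, as sketched below. The `ntpot_ms` helper, the latency budget, the token count, the file path, and the accelerator, model, and model server values are all illustrative assumptions. The command is printed rather than executed.

```shell
# Hypothetical helper: NTPOT in milliseconds = request latency (ms) / output tokens.
# Shell arithmetic is integer-only, so the result is truncated.
ntpot_ms() {
  if [ "$2" -le 0 ]; then
    echo "output token count must be positive" >&2
    return 1
  fi
  echo $(( $1 / $2 ))
}

# Example budget (assumption): 4000 ms total latency for a 200-token response.
TARGET_NTPOT_MS="$(ntpot_ms 4000 200)"

# Request only the manifest and save it to a file instead of the terminal.
set -- gcloud container ai profiles manifests create \
  --accelerator-type=example-accelerator \
  --model=example-org/example-model \
  --model-server=example-server \
  "--target-ntpot-milliseconds=${TARGET_NTPOT_MS}" \
  --output=manifest \
  --output-path=./manifests.yaml

# Print the assembled command for review; execute it with "$@".
printf '%s\n' "$*"
```

After a real run writes the file, the manifests could typically be deployed with `kubectl apply -f ./manifests.yaml`. Whether HPA resources appear in the file depends on the target being achievable, as described for the latency flags.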
GCLOUD WIDE FLAGS: These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.
Run $ gcloud help for details.
NOTES: This variant is also available:
gcloud alpha container ai profiles manifests create


Last updated 2026-02-18 UTC.
