gcloud alpha container ai profiles manifests create
- NAME
- gcloud alpha container ai profiles manifests create - generate ready-to-deploy Kubernetes manifests with compute, load balancing, and autoscaling capabilities
- SYNOPSIS
gcloud alpha container ai profiles manifests create --accelerator-type=ACCELERATOR_TYPE --model=MODEL --model-server=MODEL_SERVER [--model-server-version=MODEL_SERVER_VERSION] [--namespace=NAMESPACE] [--output=OUTPUT; default="all"] [--output-path=OUTPUT_PATH] [--target-ntpot-milliseconds=TARGET_NTPOT_MILLISECONDS] [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
(ALPHA) To get supported models, model servers, and model server versions, run gcloud alpha container ai profiles model-and-server-combinations list. To get supported accelerators with their performance metrics, run gcloud alpha container ai profiles accelerators list.
- REQUIRED FLAGS
--accelerator-type=ACCELERATOR_TYPE
- The accelerator type.
--model=MODEL
- The model.
--model-server=MODEL_SERVER
- The model server.
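As a minimal sketch, an invocation that uses only the three required flags might look like the following. The flag values are hypothetical placeholders, not confirmed supported values; real ones would come from the two list commands named in the DESCRIPTION. The `run` helper and the `DRY_RUN` variable are illustration only and are not part of gcloud: by default the helper prints the command instead of executing it.

```shell
# Hypothetical placeholder values; replace them with values reported by
# `gcloud alpha container ai profiles model-and-server-combinations list`
# and `gcloud alpha container ai profiles accelerators list`.
ACCELERATOR_TYPE="example-accelerator"
MODEL="example-model"
MODEL_SERVER="example-server"

# Illustrative helper (not part of gcloud): prints the command unless
# DRY_RUN=0 is set, in which case it executes the command.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Minimal invocation: only the required flags are passed.
run gcloud alpha container ai profiles manifests create \
  --accelerator-type="$ACCELERATOR_TYPE" \
  --model="$MODEL" \
  --model-server="$MODEL_SERVER"
```

Running the script with `DRY_RUN=0` would execute the command for real, which requires an installed and authenticated gcloud CLI.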
- OPTIONAL FLAGS
--model-server-version=MODEL_SERVER_VERSION
- The model server version. If not specified, this defaults to the latest version.
--namespace=NAMESPACE
- The namespace to deploy the manifests in. The default namespace is 'default'.
--output=OUTPUT; default="all"
- The output to display. Default is all. OUTPUT must be one of: manifest, comments, all.
--output-path=OUTPUT_PATH
- The path to save the output to. If not specified, the output is written to the terminal.
--target-ntpot-milliseconds=TARGET_NTPOT_MILLISECONDS
- The maximum normalized time per output token (NTPOT) in milliseconds. NTPOT is measured as request_latency / output_tokens. If this is set, the manifests will include Horizontal Pod Autoscaler (HPA) resources, which automatically adjust the model server replica count in response to changes in model server load to keep p50 NTPOT below the specified threshold. If the provided target-ntpot-milliseconds is too low to achieve, the HPA manifest will not be generated.
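The NTPOT definition above can be worked through with a small example, followed by a sketch of an invocation that combines the optional flags. The latency and token figures, the namespace `inference`, the file name `manifests.yaml`, and the accelerator, model, and server values are all hypothetical. The `ntpot_ms` and `run` helpers are illustration only and are not part of gcloud; `run` prints the command unless `DRY_RUN=0` is set.

```shell
# NTPOT = request_latency / output_tokens (integer milliseconds here).
ntpot_ms() {
  # $1: request latency in milliseconds, $2: number of output tokens
  echo $(( $1 / $2 ))
}

# Hypothetical measurement: a 6000 ms request that produced 200 output tokens.
TARGET_NTPOT_MS=$(ntpot_ms 6000 200)   # 6000 / 200 = 30 ms per output token

# Illustrative helper (not part of gcloud): prints the command unless
# DRY_RUN=0 is set, in which case it executes the command.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Request only the manifest, save it to a file, target a specific namespace,
# and ask for HPA resources that aim to keep p50 NTPOT under the threshold.
run gcloud alpha container ai profiles manifests create \
  --accelerator-type="example-accelerator" \
  --model="example-model" \
  --model-server="example-server" \
  --namespace="inference" \
  --output=manifest \
  --output-path="manifests.yaml" \
  --target-ntpot-milliseconds="$TARGET_NTPOT_MS"
```

Assuming the saved file contains only Kubernetes YAML, it could then be applied to a cluster with `kubectl apply -f manifests.yaml`. If the threshold turns out to be too low to achieve, the flag description above indicates that no HPA manifest is generated.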
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.
Run $ gcloud help for details.
- NOTES
- This command is currently in alpha and might change without notice. If this command fails with API permission errors despite specifying the correct project, you might be trying to access an API with an invitation-only early access allowlist. This variant is also available:
gcloud container ai profiles manifests create
Last updated 2025-08-13 UTC.