gcloud container ai profiles
- NAME
- gcloud container ai profiles - quickstart engine for GKE AI workloads
- SYNOPSIS
gcloud container ai profiles GROUP | COMMAND [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
- The GKE Inference Quickstart helps simplify deploying AI inference on Google Kubernetes Engine (GKE). It provides tailored profiles based on Google's internal benchmarks. Provide inputs like your preferred open-source model (e.g. Llama, Gemma, or Mistral) and your application's performance target. Based on these inputs, the quickstart generates accelerator choices with performance metrics, and detailed, ready-to-deploy profiles for compute, load balancing, and autoscaling. These profiles are provided as standard Kubernetes YAML manifests, which you can deploy or modify.
To visualize the benchmarking data that support these estimates, see the accompanying Colab notebook: https://colab.research.google.com/github/GoogleCloudPlatform/kubernetes-engine-samples/blob/main/ai-ml/notebooks/giq_visualizations.ipynb
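As a rough sketch of the final step described above: once a generated manifest has been saved to a file, it can be validated and applied with standard kubectl commands. The file name giq-manifests.yaml is a hypothetical placeholder and the run helper is purely illustrative. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical file holding manifests generated by the quickstart.
MANIFEST_FILE="${MANIFEST_FILE:-giq-manifests.yaml}"
# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Illustrative helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

deploy_profile() {
  # Client-side validation first, so a hand-modified manifest fails early.
  run kubectl apply --dry-run=client -f "$MANIFEST_FILE" &&
  # Create or update the resources in the current kubectl context.
  run kubectl apply -f "$MANIFEST_FILE"
}

deploy_profile
```

Validating before applying is a design choice rather than a requirement. It is useful mainly when the generated YAML has been edited by hand.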
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--help.
Run $ gcloud help for details.
- GROUPS
GROUP is one of the following:
benchmarks - Manage benchmarks for GKE Inference Quickstart.
manifests - Generate optimized Kubernetes manifests.
model-server-versions - Manage supported model server versions for GKE Inference Quickstart.
model-servers - Manage supported model servers for GKE Inference Quickstart.
models - Manage supported models for GKE Inference Quickstart.
serving-stack-versions - List supported serving stack versions for GKE Inference Quickstart.
serving-stacks - List supported serving stacks for GKE Inference Quickstart.
use-case - List supported use cases for GKE Inference Quickstart.
- COMMANDS
COMMAND is one of the following:
list - List compatible accelerator profiles.
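Because this page does not list the flags of the individual groups and commands, a reasonable first step is to read the help text for each one. The sketch below uses only the group and command names listed on this page plus the gcloud-wide --help flag. The help_cmd function and the RUN_GCLOUD variable are illustrative names, not part of gcloud. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Group and command names copied from the GROUPS and COMMANDS lists on this page.
GROUPS_AND_COMMANDS="benchmarks manifests model-server-versions model-servers models serving-stack-versions serving-stacks use-case list"

# Build the help invocation for one group or command.
help_cmd() {
  echo "gcloud container ai profiles $1 --help"
}

for name in $GROUPS_AND_COMMANDS; do
  if command -v gcloud >/dev/null 2>&1 && [ "${RUN_GCLOUD:-0}" = "1" ]; then
    # Execute only when gcloud is installed and RUN_GCLOUD=1 is set.
    gcloud container ai profiles "$name" --help
  else
    # Otherwise just print the command that would be run.
    help_cmd "$name"
  fi
done
```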
- NOTES
- This variant is also available:
gcloud alpha container ai profiles
Last updated 2025-12-09 UTC.