gcloud container ai profiles benchmarks list

NAME
gcloud container ai profiles benchmarks list - list benchmarks for a given model and model server
SYNOPSIS
gcloud container ai profiles benchmarks list --model=MODEL --model-server=MODEL_SERVER [--format=FORMAT] [--instance-type=INSTANCE_TYPE] [--model-server-version=MODEL_SERVER_VERSION] [--pricing-model=PRICING_MODEL] [--serving-stack=SERVING_STACK] [--serving-stack-version=SERVING_STACK_VERSION] [--use-case=USE_CASE] [--filter=EXPRESSION] [--limit=LIMIT] [--page-size=PAGE_SIZE] [--sort-by=[FIELD,…]] [--uri] [GCLOUD_WIDE_FLAG]
DESCRIPTION
This command lists all benchmarking data for a given model and model server. By default, the benchmarks are displayed in CSV format.

For examples of visualizing the benchmarking data, see the accompanying Colab notebook: https://colab.research.google.com/github/GoogleCloudPlatform/kubernetes-engine-samples/blob/main/ai-ml/notebooks/giq_visualizations.ipynb

REQUIRED FLAGS
--model=MODEL
The model.
--model-server=MODEL_SERVER
The model server.
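A minimal invocation supplies only the two required flags. The sketch below wraps the command in a shell function; `list_benchmarks` and the `GCLOUD` override variable are hypothetical helpers introduced here for illustration (they are not part of gcloud), and `example-model` and `example-server` are placeholder values rather than names confirmed by this page.

```shell
# Hypothetical wrapper: the first two arguments become the required
# --model and --model-server flags; any remaining arguments are passed
# through unchanged. GCLOUD is an assumed override for dry runs
# (for example GCLOUD=echo), not a gcloud feature.
list_benchmarks() {
  if [ "$#" -lt 2 ]; then
    echo "usage: list_benchmarks MODEL MODEL_SERVER [FLAGS...]" >&2
    return 2
  fi
  model="$1"
  model_server="$2"
  shift 2
  "${GCLOUD:-gcloud}" container ai profiles benchmarks list \
    --model="$model" \
    --model-server="$model_server" \
    "$@"
}

# Usage with placeholder values; the default output is CSV, so it can be
# redirected straight into a file:
#   list_benchmarks example-model example-server > benchmarks.csv
```

Passing the remaining arguments through with `"$@"` keeps the wrapper usable with any of the optional flags described in the following sections.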
FLAGS
--format=FORMAT
The format to print the output in. Default is csvprofile, which displays the profile information in a CSV format, including cost conversions.
--instance-type=INSTANCE_TYPE
The instance type. If not specified, this defaults to any instance type.
--model-server-version=MODEL_SERVER_VERSION
The model server version. Default is latest. Other options include the model server version of a profile, or all, which returns all versions.
--pricing-model=PRICING_MODEL
The pricing model to use to calculate token cost. Currently, this supports on-demand, spot, 3-years-cud, and 1-year-cud.
--serving-stack=SERVING_STACK
The serving stack to filter benchmarking data by. If not provided, benchmarking data for all serving stacks that support the given model and model server will be returned.
--serving-stack-version=SERVING_STACK_VERSION
The serving stack version to filter benchmarking data by. If not provided, benchmarking data for all versions that support the given model and model server will be returned.
--use-case=USE_CASE
If specified, results will only show profiles that match the provided use case. Options are: Advanced Customer Support, Code Completion, Text Summarization, Chatbot (ShareGPT), Code Generation, Deep Research.
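The optional flags above can be combined to narrow the results and to choose how token cost is calculated. The following is a sketch only: `benchmarks_for_use_case` and the `GCLOUD` override are hypothetical helpers, and the model and model server values in the usage comment are placeholders. The pricing model and use case values shown are taken from the option lists above.

```shell
# Hypothetical helper: list benchmarks for one use case under one pricing model.
#   $1 model, $2 model server, $3 pricing model, $4 use case
# GCLOUD is an assumed override for dry runs, not a gcloud feature.
benchmarks_for_use_case() {
  "${GCLOUD:-gcloud}" container ai profiles benchmarks list \
    --model="$1" \
    --model-server="$2" \
    --pricing-model="$3" \
    --use-case="$4"
}

# Usage with placeholder model and model server values. Quote use cases
# that contain spaces or parentheses so the shell passes them as one argument:
#   benchmarks_for_use_case example-model example-server spot "Chatbot (ShareGPT)"
```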
LIST COMMAND FLAGS
--filter=EXPRESSION
Apply a Boolean filter EXPRESSION to each resource item to be listed. If the expression evaluates True, then that item is listed. For more details and examples of filter expressions, run $ gcloud topic filters. This flag interacts with other flags that are applied in this order: --flatten, --sort-by, --filter, --limit.
--limit=LIMIT
Maximum number of resources to list. The default is unlimited. This flag interacts with other flags that are applied in this order: --flatten, --sort-by, --filter, --limit.
--page-size=PAGE_SIZE
Some services group resource list output into pages. This flag specifies the maximum number of resources per page. The default is determined by the service if it supports paging, otherwise it is unlimited (no paging). Paging may be applied before or after --filter and --limit depending on the service.
--sort-by=[FIELD,…]
Comma-separated list of resource field key names to sort by. The default order is ascending. Prefix a field with "~" for descending order on that field. This flag interacts with other flags that are applied in this order: --flatten, --sort-by, --filter, --limit.
--uri
Print a list of resource URIs instead of the default output, and change the command output to a list of URIs. If this flag is used with --format, the formatting is applied on this URI list. To display URIs alongside other keys instead, use the uri() transform.
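Because --sort-by is applied before --limit, combining a descending sort with a limit returns the first N items in that sort order. The sketch below assumes this; `top_benchmarks` and the `GCLOUD` override are hypothetical helpers, and the field name is supplied by the caller because this page does not list the resource field keys.

```shell
# Hypothetical helper: sort descending on one field, then keep the first N.
#   $1 model, $2 model server, $3 field key to sort by, $4 limit
# The "~" prefix requests descending order; it is quoted so the shell
# passes it through literally. GCLOUD is an assumed override for dry runs.
top_benchmarks() {
  "${GCLOUD:-gcloud}" container ai profiles benchmarks list \
    --model="$1" \
    --model-server="$2" \
    --sort-by="~$3" \
    --limit="$4"
}

# Usage with placeholder values; example_field stands in for a real
# resource field key:
#   top_benchmarks example-model example-server example_field 3
```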
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

Last updated 2025-12-09 UTC.