TensorFlow Serving
This document describes how to configure your Google Kubernetes Engine deployment so that you can use Google Cloud Managed Service for Prometheus to collect metrics from TensorFlow Serving. This document shows you how to do the following:
- Set up TF Serving to report metrics.
- Access a predefined dashboard in Cloud Monitoring to view the metrics.
These instructions apply only if you are using managed collection with Managed Service for Prometheus. If you are using self-deployed collection, then see the TF Serving documentation for installation information.
These instructions are provided as an example and are expected to work in most Kubernetes environments. If you are having trouble installing an application or exporter due to restrictive security or organizational policies, then we recommend you consult open-source documentation for support.
For information about TensorFlow Serving, see TF Serving. For information about setting up TF Serving on Google Kubernetes Engine, see the GKE guide for TF Serving.
Prerequisites
To collect metrics from TF Serving by using Managed Service for Prometheus and managed collection, your deployment must meet the following requirements:
- Your cluster must be running Google Kubernetes Engine version 1.28.15-gke.2475000 or later.
- You must be running Managed Service for Prometheus with managed collection enabled. For more information, see Get started with managed collection.
Set up TF Serving

TF Serving exposes Prometheus-format metrics when the --monitoring_config_file flag is used to specify a file containing a MonitoringConfig protocol buffer.
The following is an example of a MonitoringConfig protocol buffer:
```
prometheus_config {
  enable: true,
  path: "/monitoring/prometheus/metrics"
}
```

If you are following the Google Kubernetes Engine setup guide, Serve a model with a single GPU in GKE, then the MonitoringConfig protocol buffer is defined as part of the default setup.
If you are setting up TF Serving yourself, then do the following to specify the MonitoringConfig protocol buffer:

1. Create a file named monitoring_config.txt containing the MonitoringConfig protocol buffer in the model directory, before uploading the directory to the Cloud Storage bucket.

2. Upload the model directory to the Cloud Storage bucket:

   ```
   gcloud storage cp MODEL_DIRECTORY gs://CLOUD_STORAGE_BUCKET_NAME --recursive
   ```

3. Set the environment variable PATH_TO_MONITORING_CONFIG to the path of the uploaded monitoring_config.txt file, for example:

   ```
   export PATH_TO_MONITORING_CONFIG=/data/tfserve-model-repository/monitoring_config.txt
   ```

4. Add the following flag and value to the container's command in your container's deployment YAML file:

   ```
   "--monitoring_config_file=$PATH_TO_MONITORING_CONFIG"
   ```

   For example, your command might look like the following:

   ```
   command: [ "tensorflow_model_server",
              "--model_name=$MODEL_NAME",
              "--model_base_path=/data/tfserve-model-repository/$MODEL_NAME",
              "--rest_api_port=8000",
              "--monitoring_config_file=$PATH_TO_MONITORING_CONFIG" ]
   ```
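The steps above can be sketched end to end in a shell session. This is a hedged example: the model directory path is a local placeholder, and the `gcloud` upload is shown commented out so the sketch runs without a real bucket.

```shell
# Sketch of steps 1-3 above. /tmp/demo-model and the bucket name are
# placeholders; the gcloud upload is commented out so this runs locally.
MODEL_DIRECTORY=/tmp/demo-model
mkdir -p "$MODEL_DIRECTORY"

# Step 1: create monitoring_config.txt inside the model directory.
cat > "$MODEL_DIRECTORY/monitoring_config.txt" <<'EOF'
prometheus_config {
  enable: true,
  path: "/monitoring/prometheus/metrics"
}
EOF

# Step 2 (requires a real bucket; shown for reference):
# gcloud storage cp "$MODEL_DIRECTORY" gs://CLOUD_STORAGE_BUCKET_NAME --recursive

# Step 3: point the flag at the file's path as mounted in the container.
export PATH_TO_MONITORING_CONFIG=/data/tfserve-model-repository/monitoring_config.txt

# Sanity check: the config enables the Prometheus endpoint.
grep -c 'enable: true' "$MODEL_DIRECTORY/monitoring_config.txt"
```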
Modify the TF Serving configuration
Modify the TF Serving configuration as shown in the following example:
```
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tfserve-deployment
  labels:
    app: tfserve-server
spec:
  selector:
    matchLabels:
      app: tfserve
  replicas: 1
  template:
    metadata:
      labels:
        app: tfserve
      annotations:
        gke-gcsfuse/volumes: 'true'
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
        - name: tfserve-server
          image: 'tensorflow/serving:2.13.1-gpu'
          command:
            - tensorflow_model_server
            - '--model_name=$MODEL_NAME'
            - '--model_base_path=/data/tfserve-model-repository/$MODEL_NAME'
            - '--rest_api_port=8000'
+           - '--monitoring_config_file=$PATH_TO_MONITORING_CONFIG'
          ports:
            - name: http
              containerPort: 8000
            - name: grpc
              containerPort: 8500
          resources:
            ...
          volumeMounts:
            - name: gcs-fuse-csi-vol
              mountPath: /data
              readOnly: false
      serviceAccountName: $K8S_SA_NAME
      volumes:
        - name: gcs-fuse-csi-vol
          csi:
            driver: gcsfuse.csi.storage.gke.io
            readOnly: false
            volumeAttributes:
              bucketName: $GSBUCKET
              mountOptions: implicit-dirs
```

You must add any lines preceded by the + symbol to your configuration.
To apply configuration changes from a local file, run the following command:
```
kubectl apply -n NAMESPACE_NAME -f FILE_NAME
```
You can also use Terraform to manage your configurations.
To verify that TF Serving is emitting metrics on the expected endpoints, do the following:

- Set up port forwarding by using the following command:

  ```
  kubectl -n NAMESPACE_NAME port-forward POD_NAME 8000
  ```

- Access the endpoint localhost:8000/monitoring/prometheus/metrics by using the browser or the curl utility in another terminal session.
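A quick spot-check with curl might look like the following. The live request is shown commented out because it needs the port-forward above; the sample file illustrates, as an assumption about your TF Serving version, roughly what the Prometheus exposition format looks like so you know what to expect.

```shell
# With the port-forward running, dump the first exposed metrics
# (uncomment to run against your cluster):
# curl -sf localhost:8000/monitoring/prometheus/metrics | head -n 20

# Illustrative sample of the exposition format; exact metric names
# depend on your TF Serving version and the traffic it has served.
cat <<'EOF' > /tmp/sample_exposition.txt
# TYPE :tensorflow:serving:request_count counter
:tensorflow:serving:request_count{model_name="demo",status="OK"} 42
EOF

# Any non-comment line means the endpoint is serving metric samples.
grep -vc '^#' /tmp/sample_exposition.txt
```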
Define a PodMonitoring resource
For target discovery, the Managed Service for Prometheus Operator requires a PodMonitoring resource that corresponds to TF Serving in the same namespace.
You can use the following PodMonitoring configuration:
```
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: tfserve
  labels:
    app.kubernetes.io/name: tfserve
    app.kubernetes.io/part-of: google-cloud-managed-prometheus
spec:
  endpoints:
    - port: 8000
      scheme: http
      interval: 30s
      path: /monitoring/prometheus/metrics
  selector:
    matchLabels:
      app: tfserve
```

To apply configuration changes from a local file, run the following command:
```
kubectl apply -n NAMESPACE_NAME -f FILE_NAME
```
You can also use Terraform to manage your configurations.
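After applying the resource, you can confirm that the managed collector picked it up. This is a sketch that assumes the tfserve PodMonitoring name from the example above; it requires access to your cluster.

```shell
# List the PodMonitoring resource and inspect its status conditions;
# a healthy scrape target shows no failing conditions in the output.
kubectl -n NAMESPACE_NAME get podmonitorings.monitoring.googleapis.com tfserve
kubectl -n NAMESPACE_NAME describe podmonitorings.monitoring.googleapis.com tfserve
```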
Verify the configuration
You can use Metrics Explorer to verify that you correctly configured TF Serving. It might take one or two minutes for Cloud Monitoring to ingest your metrics.
To verify the metrics are ingested, do the following:
In the Google Cloud console, go to the Metrics explorer page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- In the toolbar of the query-builder pane, select the button whose name is either MQL or PromQL.
- Verify that PromQL is selected in the Language toggle. The language toggle is in the same toolbar that lets you format your query.
- Enter and run the following query:
```
up{job="tfserve", cluster="CLUSTER_NAME", namespace="NAMESPACE_NAME"}
```
View dashboards
The Cloud Monitoring integration includes the TensorFlow Serving Prometheus Overview dashboard. Dashboards are automatically installed when you configure the integration. You can also view static previews of dashboards without installing the integration.
To view an installed dashboard, do the following:
In the Google Cloud console, go to the Dashboards page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Select the Dashboard List tab.
- Choose the Integrations category.
- Click the name of the dashboard, for example, TensorFlow Serving Prometheus Overview.
To view a static preview of the dashboard, do the following:
In the Google Cloud console, go to the Integrations page:

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Click the Kubernetes Engine deployment-platform filter.
- Locate the TensorFlow Serving integration and click View Details.
- Select the Dashboards tab.
Troubleshooting
For information about troubleshooting metric ingestion problems, see Problems with collection from exporters in Troubleshooting ingestion-side problems.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.