Use online inference logging

For AutoML tabular models, AutoML image models, and custom-trained models, you can enable or disable inference logs during model deployment or endpoint creation. This page explains the different types of inference logs available, and how to enable or disable these logs.

Types of inference logs

There are several types of inference logs that you can use to get information from your inference nodes:

  • Container logging, which logs the stdout and stderr streams from your inference nodes to Cloud Logging.

  • Access logging, which logs information such as the timestamp and latency of each request to Cloud Logging.

  • Request-response logging, which logs a sample of online inference requests and responses to a BigQuery table.

You can enable or disable each type of log independently.

Inference log settings

You can enable or disable online inference logs when you create an endpoint, deploy a model to the endpoint, or mutate a deployed model.

To update the settings for access logs, you must undeploy your model and then redeploy it with your new settings. You can update the settings for container logs without redeploying your model.

Online inference at a high rate of queries per second (QPS) can produce a substantial number of logs, which are subject to Cloud Logging pricing. For example, an endpoint serving 100 QPS generates on the order of 8.6 million access-log entries per day. To estimate the pricing for your online inference logs, see Estimating your bills for logging. To reduce this cost, you can disable inference logging.

Enable and disable inference logs

The following examples highlight where to modify the default log settings:

Console

When you deploy a model to an endpoint or create a new endpoint in the Google Cloud console, you can specify which types of inference logs to enable in the Logging step. Select the checkboxes to enable Access logging or Container logging, or clear the checkboxes to disable these logs.

Use the REST API to update the settings for container logs.

Use the REST API to enable request-response logging. The Google Cloud console and gcloud CLI don't support request-response logging configuration.

To see more context about how to deploy models, read Deploy a model using the Google Cloud console.

gcloud

To change the default behavior for which logs are enabled in deployed models, add flags to your gcloud command:

v1 service endpoint

Run gcloud ai endpoints deploy-model:

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --accelerator=count=2,type=nvidia-tesla-t4 \
  --disable-container-logging \
  --enable-access-logging

v1beta1 service endpoint

Run gcloud beta ai endpoints deploy-model:

gcloud beta ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --accelerator=count=2,type=nvidia-tesla-t4 \
  --enable-access-logging \
  --enable-container-logging

Use the REST API to update the settings for container logs.

Use the REST API to enable request-response logging. The Google Cloud console and gcloud CLI don't support request-response logging configuration.

To see more context about how to deploy models, read Deploy a model using the Vertex AI API.

REST

To change the default behavior for which logs are enabled in deployed models, set the relevant fields to True:

v1 service endpoint

To disable container logging, set the disableContainerLogging field to True when you call either projects.locations.endpoints.deployModel or projects.locations.endpoints.mutateDeployedModel.

To enable access logging, set enableAccessLogging to True when deploying your model with projects.locations.endpoints.deployModel, as in the sketch that follows.
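For illustration, here is a minimal deployModel sketch that disables container logging and enables access logging. The placeholder values and the dedicatedResources shape (a single-replica deployment on dedicated resources) are assumptions for this example, not requirements:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel \
  -d '{
    "deployedModel": {
      "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
      "displayName": "DEPLOYED_MODEL_NAME",
      "dedicatedResources": {
        "machineSpec": { "machineType": "MACHINE_TYPE" },
        "minReplicaCount": 1
      },
      "disableContainerLogging": true,
      "enableAccessLogging": true
    },
    "trafficSplit": { "0": 100 }
  }'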

v1beta1 service endpoint

To enable container logging, set the enableContainerLogging field to True when you call either projects.locations.endpoints.deployModel or projects.locations.endpoints.mutateDeployedModel.

To enable access logging, set enableAccessLogging to True when deploying your model with projects.locations.endpoints.deployModel.

To see more context about how to deploy models, read Deploy a model using the Vertex AI API.

Request-response logging

You can only enable request-response logging when you create an endpoint using projects.locations.endpoints.create or patch an existing endpoint using projects.locations.endpoints.patch.

Request-response logging is done at the endpoint level, so requests sent to any deployed models under the same endpoint are logged.

When you create or patch an endpoint, populate the predictRequestResponseLoggingConfig field of the Endpoint resource with the following entries:

  • enabled: set to True to enable request-response logging.

  • samplingRate: a number between 0 and 1 defining the fraction of requests to log. For example, set this value to 1 to log all requests or to 0.1 to log 10% of requests.

  • BigQueryDestination: the BigQuery table to be used for logging. If you only specify a project name, a new dataset is created with the name logging_ENDPOINT_DISPLAY_NAME_ENDPOINT_ID, where ENDPOINT_DISPLAY_NAME follows the BigQuery naming rules. If you don't specify a table name, a new table is created with the name request_response_logging.

    The schema for the BigQuery table should look like the following:

    Field name          Type       Mode
    ------------------  ---------  --------
    endpoint            STRING     NULLABLE
    deployed_model_id   STRING     NULLABLE
    logging_time        TIMESTAMP  NULLABLE
    request_id          NUMERIC    NULLABLE
    request_payload     STRING     REPEATED
    response_payload    STRING     REPEATED

The following is an example configuration:

{   "predict_request_response_logging_config": {     "enabled": true,     "sampling_rate": 0.5,     "bigquery_destination": {       "output_uri": "bq://PROJECT_ID.DATASET_NAME.TABLE_NAME"     }   }}

Inference request-response logging for dedicated endpoints and Private Service Connect endpoints

For dedicated endpoints and Private Service Connect endpoints, you can use request-response logging to record request and response payloads under 10 MB (larger payloads are skipped automatically) for TensorFlow, PyTorch, sklearn, and XGBoost models.

Request-response logging is available only for the predict and rawPredict methods.

To enable request-response logging, populate the predictRequestResponseLoggingConfig field of the Endpoint resource with the following entries:

  • enabled: set to True to enable request-response logging.

  • samplingRate: the fraction of requests and responses to log. Set to a number that is greater than 0 and less than or equal to 1. For example, set this value to 1 to log all requests or to 0.1 to log 10% of requests.

  • BigQueryDestination: the BigQuery location for the output content, as a URI to a project or table.

The following is an example configuration for creating a dedicated endpoint with request-response logging enabled:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints \
  -d '{
    "displayName": "ENDPOINT_NAME",
    "dedicatedEndpointEnabled": true,
    "predictRequestResponseLoggingConfig": {
      "enabled": true,
      "samplingRate": 1.0,
      "bigqueryDestination": {
        "outputUri": "bq://PROJECT_ID"
      }
    }
  }'

Replace the following:

  • LOCATION_ID: The region where you are using Vertex AI.
  • PROJECT_NUMBER: The project number for your Google Cloud project.
  • ENDPOINT_NAME: The display name for the endpoint.
  • PROJECT_ID: The project ID for your Google Cloud project.

The following is an example configuration for creating a Private Service Connect endpoint with request-response logging enabled:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints \
  -d '{
    "displayName": "ENDPOINT_NAME",
    "privateServiceConnectConfig": {
      "enablePrivateServiceConnect": true,
      "projectAllowlist": ["ALLOWED_PROJECTS"]
    },
    "predictRequestResponseLoggingConfig": {
      "enabled": true,
      "samplingRate": 1.0,
      "bigqueryDestination": {
        "outputUri": "bq://PROJECT_ID"
      }
    }
  }'

Replace the following:

  • ALLOWED_PROJECTS: a comma-separated list of Google Cloud project IDs, each enclosed in quotation marks. For example, ["PROJECTID1", "PROJECTID2"]. If a project isn't contained in this list, you won't be able to send inference requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.
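Once requests start flowing, logged samples land in the configured BigQuery table. As a sketch for inspecting them with the bq CLI, assuming the default dataset and table names described earlier (logging_ENDPOINT_DISPLAY_NAME_ENDPOINT_ID and request_response_logging):

bq query --use_legacy_sql=false '
  SELECT logging_time, deployed_model_id, request_payload, response_payload
  FROM `PROJECT_ID.logging_ENDPOINT_DISPLAY_NAME_ENDPOINT_ID.request_response_logging`
  ORDER BY logging_time DESC
  LIMIT 10'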

Request-response logging and Model Monitoring v1

Request-response logging and Model Monitoring v1 use the same BigQuery table on the backend to log incoming requests. To prevent unexpected changes to this BigQuery table, the following limitations are enforced when using both features at the same time:

  • If an endpoint has Model Monitoring enabled, you can't enable request-response logging for the same endpoint.

  • If you enable request-response logging and then Model Monitoring on the same endpoint, you won't be able to change the request-response logging configuration.
