Use online inference logging

For AutoML tabular models, AutoML image models, and custom-trained models, you can enable or disable inference logs during model deployment or endpoint creation. This page explains the different types of inference logs available, and how to enable or disable these logs.

Types of inference logs

There are several types of inference logs that you can use to get information from your inference nodes:

  • Container logging, which logs the stdout and stderr streams from your inference nodes to Cloud Logging.

  • Access logging, which logs information such as the timestamp and latency of each request to Cloud Logging.

  • Request-response logging, which logs a sample of online inference requests and responses to a BigQuery table.

You can enable or disable each type of log independently.

Inference log settings

You can enable or disable online inference logs when you create an endpoint, deploy a model to the endpoint, or mutate a deployed model.

To update the settings for access logs, you must undeploy your model and then redeploy it with your new settings. You can update the settings for container logs without redeploying your model.

Online inference at a high rate of queries per second (QPS) can produce a substantial number of logs, which are subject to Cloud Logging pricing. For example, an endpoint serving 100 QPS generates on the order of 8.6 million access-log entries per day. To estimate the pricing for your online inference logs, see Estimating your bills for logging. To reduce this cost, you can disable inference logging.

Enable and disable inference logs

The following examples highlight where to modify the default log settings:

Console

When you deploy a model to an endpoint or create a new endpoint in the Google Cloud console, you can specify which types of inference logs to enable in the Logging step. Select the checkboxes to enable Access logging or Container logging, or clear the checkboxes to disable these logs.

Use the REST API to update the settings for container logs.

Use the REST API to enable request-response logging. The Google Cloud console and gcloud CLI don't support request-response logging configuration.

To see more context about how to deploy models, read Deploy a model using the Google Cloud console.

gcloud

To change the default behavior for which logs are enabled in deployed models, add flags to your gcloud command:

v1 service endpoint

Run gcloud ai endpoints deploy-model:

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --accelerator=count=2,type=nvidia-tesla-t4 \
  --disable-container-logging \
  --enable-access-logging

v1beta1 service endpoint

Run gcloud beta ai endpoints deploy-model:

gcloud beta ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --accelerator=count=2,type=nvidia-tesla-t4 \
  --enable-access-logging \
  --enable-container-logging

Use the REST API to update the settings for container logs.

Use the REST API to enable request-response logging. The Google Cloud console and gcloud CLI don't support request-response logging configuration.

To see more context about how to deploy models, read Deploy a model using the Vertex AI API.

REST

To change the default behavior for which logs are enabled in deployed models, set the relevant fields to True:

v1 service endpoint

To disable container logging, set the disableContainerLogging field to True when you call either projects.locations.endpoints.deployModel or projects.locations.endpoints.mutateDeployedModel.

To enable access logging, set enableAccessLogging to True when deploying your model with projects.locations.endpoints.deployModel, as in the sketch that follows.
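For illustration, here is a minimal deployModel sketch that disables container logging and enables access logging. The placeholder values and the dedicatedResources shape (a single-replica deployment on dedicated resources) are assumptions for this example, not requirements:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel \
  -d '{
    "deployedModel": {
      "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
      "displayName": "DEPLOYED_MODEL_NAME",
      "dedicatedResources": {
        "machineSpec": { "machineType": "MACHINE_TYPE" },
        "minReplicaCount": 1
      },
      "disableContainerLogging": true,
      "enableAccessLogging": true
    },
    "trafficSplit": { "0": 100 }
  }'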

v1beta1 service endpoint

To enable container logging, set the enableContainerLogging field to True when you call either projects.locations.endpoints.deployModel or projects.locations.endpoints.mutateDeployedModel.

To enable access logging, set enableAccessLogging to True when deploying your model with projects.locations.endpoints.deployModel.

To see more context about how to deploy models, read Deploy a model using the Vertex AI API.

Request-response logging

You can only enable request-response logging when you create an endpoint using projects.locations.endpoints.create or patch an existing endpoint using projects.locations.endpoints.patch.

Request-response logging is done at the endpoint level, so requests sent to any deployed models under the same endpoint are logged.

When you create or patch an endpoint, populate the predictRequestResponseLoggingConfig field of the Endpoint resource with the following entries:

  • enabled: set to True to enable request-response logging.

  • samplingRate: a number between 0 and 1 defining the fraction of requests to log. For example, set this value to 1 to log all requests or to 0.1 to log 10% of requests.

  • BigQueryDestination: the BigQuery table to be used for logging. If you only specify a project name, a new dataset is created with the name logging_ENDPOINT_DISPLAY_NAME_ENDPOINT_ID, where ENDPOINT_DISPLAY_NAME follows the BigQuery naming rules. If you don't specify a table name, a new table is created with the name request_response_logging.

    The schema for the BigQuery table should look like the following:

    Field name          Type       Mode
    ------------------  ---------  --------
    endpoint            STRING     NULLABLE
    deployed_model_id   STRING     NULLABLE
    logging_time        TIMESTAMP  NULLABLE
    request_id          NUMERIC    NULLABLE
    request_payload     STRING     REPEATED
    response_payload    STRING     REPEATED

The following is an example configuration:

{   "predict_request_response_logging_config": {     "enabled": true,     "sampling_rate": 0.5,     "bigquery_destination": {       "output_uri": "bq://PROJECT_ID.DATASET_NAME.TABLE_NAME"     }   }}

Inference request-response logging for dedicated endpoints and Private Service Connect endpoints

For dedicated endpoints and Private Service Connect endpoints, you can use request-response logging to record request and response payloads under 10 MB (larger payloads are skipped automatically) for TensorFlow, PyTorch, sklearn, and XGBoost models.

Request-response logging is available only for the predict and rawPredict methods.

To enable request-response logging, populate the predictRequestResponseLoggingConfig field of the Endpoint resource with the following entries:

  • enabled: set to True to enable request-response logging.

  • samplingRate: the fraction of requests and responses to log. Set to a number that is greater than 0 and less than or equal to 1. For example, set this value to 1 to log all requests or to 0.1 to log 10% of requests.

  • BigQueryDestination: the BigQuery location for the output content, as a URI to a project or table.

The following is an example configuration for creating a dedicated endpoint with request-response logging enabled:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints \
  -d '{
    "displayName": "ENDPOINT_NAME",
    "dedicatedEndpointEnabled": true,
    "predictRequestResponseLoggingConfig": {
      "enabled": true,
      "samplingRate": 1.0,
      "bigqueryDestination": {
        "outputUri": "bq://PROJECT_ID"
      }
    }
  }'

Replace the following:

  • LOCATION_ID: The region where you are using Vertex AI.
  • PROJECT_NUMBER: The project number for your Google Cloud project.
  • ENDPOINT_NAME: The display name for the endpoint.
  • PROJECT_ID: The project ID for your Google Cloud project.

The following is an example configuration for creating a Private Service Connect endpoint with request-response logging enabled:

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints \
  -d '{
    "displayName": "ENDPOINT_NAME",
    "privateServiceConnectConfig": {
      "enablePrivateServiceConnect": true,
      "projectAllowlist": ["ALLOWED_PROJECTS"]
    },
    "predictRequestResponseLoggingConfig": {
      "enabled": true,
      "samplingRate": 1.0,
      "bigqueryDestination": {
        "outputUri": "bq://PROJECT_ID"
      }
    }
  }'

Replace the following:

  • ALLOWED_PROJECTS: a comma-separated list of Google Cloud project IDs, each enclosed in quotation marks. For example, ["PROJECTID1", "PROJECTID2"]. If a project isn't contained in this list, you won't be able to send inference requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.
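Once requests start flowing, logged samples land in the configured BigQuery table. As a sketch for inspecting them with the bq CLI, assuming the default dataset and table names described earlier (logging_ENDPOINT_DISPLAY_NAME_ENDPOINT_ID and request_response_logging):

bq query --use_legacy_sql=false '
  SELECT logging_time, deployed_model_id, request_payload, response_payload
  FROM `PROJECT_ID.logging_ENDPOINT_DISPLAY_NAME_ENDPOINT_ID.request_response_logging`
  ORDER BY logging_time DESC
  LIMIT 10'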

Request-response logging and Model Monitoring v1

Request-response logging and Model Monitoring v1 use the same BigQuery table on the backend to log incoming requests. To prevent unexpected changes to this BigQuery table, the following limitations are enforced when using both features at the same time:

  • If an endpoint has Model Monitoring enabled, you can't enable request-response logging for the same endpoint.

  • If you enable request-response logging and then Model Monitoring on the same endpoint, you won't be able to change the request-response logging configuration.
