Automating responses to integrity validation failures

Learn how to use a Cloud Run functions trigger to automatically act on Shielded VM integrity monitoring events.

Overview

Integrity monitoring collects measurements from Shielded VM instances and surfaces them in Cloud Logging. If integrity measurements change across boots of a Shielded VM instance, integrity validation fails. This failure is captured as a logged event, and is also raised in Cloud Monitoring.

Sometimes, Shielded VM integrity measurements change for a legitimate reason. For example, a system update might cause expected changes to the operating system kernel. Because of this, integrity monitoring lets you prompt a Shielded VM instance to learn a new integrity policy baseline in the case of an expected integrity validation failure.
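The comparison behind a validation failure can be pictured with a minimal sketch. It assumes that measurements are lists of maps with pcrNum and value keys, as they appear in the policyMeasurements and actualMeasurements fields used later in this tutorial. The function name changed_pcrs and the sample values are hypothetical, and the real evaluation happens inside the service, not in your code.

```python
def changed_pcrs(baseline, actual):
    """Return the sorted PCR names whose value differs from the baseline."""
    baseline_by_pcr = {m["pcrNum"]: m["value"] for m in baseline}
    actual_by_pcr = {m["pcrNum"]: m["value"] for m in actual}
    all_pcrs = set(baseline_by_pcr) | set(actual_by_pcr)
    return sorted(
        pcr for pcr in all_pcrs
        if baseline_by_pcr.get(pcr) != actual_by_pcr.get(pcr)
    )


# Hypothetical values, for illustration only.
baseline = [
    {"pcrNum": "PCR_0", "value": "aaaa"},
    {"pcrNum": "PCR_4", "value": "bbbb"},
]
after_kernel_update = [
    {"pcrNum": "PCR_0", "value": "aaaa"},
    {"pcrNum": "PCR_4", "value": "cccc"},
]

print(changed_pcrs(baseline, baseline))             # no differences
print(changed_pcrs(baseline, after_kernel_update))  # PCR_4 differs
```

An empty result corresponds to a boot that matches the baseline; any non-empty result corresponds to the kind of change that makes validation fail until a new baseline is learned.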

In this tutorial, you'll first create a simple automated system that shuts down Shielded VM instances that fail integrity validation:

  1. Export all integrity monitoring events to a Pub/Sub topic.
  2. Create a Cloud Run functions trigger that uses the events in that topic to identify and shut down Shielded VM instances that fail integrity validation.

Next, you can optionally expand the system so that it prompts Shielded VM instances that fail integrity validation to learn the new baseline if it matches a known good measurement, or to shut down otherwise:

  1. Create a Firestore database to maintain a set of known good integrity baseline measurements.
  2. Update the Cloud Run functions trigger so that it prompts Shielded VM instances that fail integrity validation to learn the new baseline if it is in the database, or else to shut down.

If you choose to implement the expanded solution, use it in the following way:

  1. Each time there is an update that is expected to cause validation failure for a legitimate reason, run that update on a single Shielded VM instance in the instance group.
  2. Using the late boot event from the updated VM instance as a source, add the new policy baseline measurements to the database by creating a new document in the known_good_measurements collection. See Creating a database of known good baseline measurements for more information.
  3. Update the remaining Shielded VM instances. The trigger prompts the remaining instances to learn the new baseline, because it can be verified as known good. See Updating the Cloud Run functions trigger to learn known good baseline for more information.

Prerequisites

  • Use a project that has Firestore in Native mode selected as the database service. You make this selection when you create the project, and it can't be changed. If your project doesn't use Firestore in Native mode, you will see the message "This project uses another database service" when you open the Firestore console.
  • Have a Compute Engine Shielded VM instance in that project to serve as the source of integrity baseline measurements. The Shielded VM instance must have been restarted at least once.
  • Have the gcloud command-line tool installed.
  • Enable the Cloud Logging and Cloud Run functions APIs by following these steps:

    1. In the Google Cloud console, go to the APIs & Services page.

    2. See if Cloud Functions API and Stackdriver Logging API appear on the Enabled APIs and services list.

    3. If either of the APIs doesn't appear, click Add APIs and Services.

    4. Search for and enable the APIs, as needed.

Exporting integrity monitoring log entries to a Pub/Sub topic

Use Logging to export all integrity monitoring log entries generated by Shielded VM instances to a Pub/Sub topic. You use this topic as a data source for a Cloud Run functions trigger to automate responses to integrity monitoring events.

Logs Explorer

  1. In the Google Cloud console, go to the Logs Explorer page.

  2. In the Query Builder, enter the following values.

    resource.type="gce_instance" AND logName:  "projects/YOUR_PROJECT_ID/logs/compute.googleapis.com/shielded_vm_integrity"

  3. Click Run Filter.

  4. Click More actions, and then select Create sink.

  5. On the Create logs routing sink page:

    1. In Sink details, for Sink Name, enter integrity-monitoring, and then click Next.
    2. In Sink destination, expand Sink Service, and then select Cloud Pub/Sub.
    3. Expand Select a Cloud Pub/Sub topic, and then select Create a topic.
    4. In the Create a topic dialog, for Topic ID, enter integrity-monitoring, and then click Create topic.
    5. Click Next, and then click Create sink.

Legacy Logs Explorer

  1. In the Google Cloud console, go to the Logs Explorer page.

  2. Click Options, and then select Go back to Legacy Logs Explorer.

  3. Expand Filter by label or text search, and then click Convert to advanced filter.

  4. Enter the following advanced filter:

    resource.type="gce_instance" AND logName:  "projects/YOUR_PROJECT_ID/logs/compute.googleapis.com/shielded_vm_integrity"

    Note that there are two spaces after logName:.

  5. Click Submit Filter.

  6. Click Create Export.

  7. For Sink Name, enter integrity-monitoring.

  8. For Sink Service, select Cloud Pub/Sub.

  9. Expand Sink Destination, and then click Create new Cloud Pub/Sub topic.

  10. For Name, enter integrity-monitoring, and then click Create.

  11. Click Create Sink.
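The values entered in the console steps above can be collected in one place, which may help if you script the setup later. The following is a minimal sketch: the function name build_sink_config is hypothetical, the filter string is copied from this tutorial (including the two spaces after logName:), and the destination string assumes the usual pubsub.googleapis.com/projects/PROJECT_ID/topics/TOPIC_ID form for Pub/Sub sink destinations. It only builds strings and does not call any Google Cloud API.

```python
def build_sink_config(project_id, name="integrity-monitoring"):
    """Return the sink name, filter, and destination used in this tutorial."""
    log_name = (
        f"projects/{project_id}/logs/"
        "compute.googleapis.com/shielded_vm_integrity"
    )
    return {
        "name": name,
        # Same filter as in the console steps; the topic shares the sink's name.
        "filter": f'resource.type="gce_instance" AND logName:  "{log_name}"',
        "destination": (
            f"pubsub.googleapis.com/projects/{project_id}/topics/{name}"
        ),
    }


config = build_sink_config("my-project")  # "my-project" is a placeholder ID
for key, value in config.items():
    print(f"{key}: {value}")
```

These values could then be passed to a Cloud Logging client library or to the gcloud CLI instead of being typed into the console.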

Creating a Cloud Run functions trigger to respond to integrity failures

Create a Cloud Run functions trigger that reads the data in the Pub/Sub topic and that stops any Shielded VM instance that fails integrity validation.

  1. The following code defines the Cloud Run functions trigger. Copy it into a file named main.py.

    import base64
    import json
    import googleapiclient.discovery

    def shutdown_vm(data, context):
        """A Cloud Function that shuts down a VM on failed integrity check."""
        log_entry = json.loads(base64.b64decode(data['data']).decode('utf-8'))
        payload = log_entry.get('jsonPayload', {})
        entry_type = payload.get('@type')
        if entry_type != 'type.googleapis.com/cloud_integrity.IntegrityEvent':
            raise TypeError("Unexpected log entry type: %s" % entry_type)

        report_event = (payload.get('earlyBootReportEvent')
                        or payload.get('lateBootReportEvent'))

        if report_event is None:
            # We received a different event type, ignore.
            return

        policy_passed = report_event['policyEvaluationPassed']
        if not policy_passed:
            print('Integrity evaluation failed: %s' % report_event)
            print('Shutting down the VM')

            instance_id = log_entry['resource']['labels']['instance_id']
            project_id = log_entry['resource']['labels']['project_id']
            zone = log_entry['resource']['labels']['zone']

            # Shut down the instance.
            compute = googleapiclient.discovery.build(
                'compute', 'v1', cache_discovery=False)

            # Get the instance name from instance id.
            list_result = compute.instances().list(
                project=project_id,
                zone=zone,
                filter='id eq %s' % instance_id).execute()
            if len(list_result['items']) != 1:
                raise KeyError('unexpected number of items: %d'
                               % len(list_result['items']))
            instance_name = list_result['items'][0]['name']

            result = compute.instances().stop(project=project_id,
                                              zone=zone,
                                              instance=instance_name).execute()
            print('Instance %s in project %s has been scheduled for shut down.'
                  % (instance_name, project_id))

  2. In the same location as main.py, create a file named requirements.txt and copy in the following dependencies:

    google-api-python-client==1.6.6
    google-auth==1.4.1
    google-auth-httplib2==0.0.3

  3. Open a terminal window and navigate to the directory containing main.py and requirements.txt.

  4. Run the gcloud beta functions deploy command to deploy the trigger:

    gcloud beta functions deploy shutdown_vm \
        --project PROJECT_ID \
        --runtime python37 \
        --trigger-resource integrity-monitoring \
        --trigger-event google.pubsub.topic.publish
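Before deploying, it can be useful to exercise the message-decoding part of the trigger locally. The sketch below repeats only the parsing logic from main.py, without any Compute Engine calls. The helper names make_pubsub_event and failed_instance_labels, and every value in the sample log entry, are hypothetical; the entry contains only the fields that the trigger reads.

```python
import base64
import json

INTEGRITY_EVENT_TYPE = "type.googleapis.com/cloud_integrity.IntegrityEvent"


def make_pubsub_event(log_entry):
    """Wrap a log entry dict the way the trigger receives it: base64 JSON."""
    encoded = base64.b64encode(json.dumps(log_entry).encode("utf-8"))
    return {"data": encoded.decode("utf-8")}


def failed_instance_labels(event):
    """Return the resource labels if the event reports a failed evaluation.

    Returns None for passing evaluations and for other integrity events.
    """
    log_entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    payload = log_entry.get("jsonPayload", {})
    entry_type = payload.get("@type")
    if entry_type != INTEGRITY_EVENT_TYPE:
        raise TypeError("Unexpected log entry type: %s" % entry_type)
    report = (payload.get("earlyBootReportEvent")
              or payload.get("lateBootReportEvent"))
    if report is None or report["policyEvaluationPassed"]:
        return None
    return log_entry["resource"]["labels"]


# Hypothetical log entry, for illustration only.
sample_entry = {
    "jsonPayload": {
        "@type": INTEGRITY_EVENT_TYPE,
        "lateBootReportEvent": {"policyEvaluationPassed": False},
    },
    "resource": {
        "labels": {
            "instance_id": "1234567890",
            "project_id": "my-project",
            "zone": "us-central1-a",
        }
    },
}

print(failed_instance_labels(make_pubsub_event(sample_entry)))
```

A failing event yields the labels that the trigger uses to look up and stop the instance; a passing event yields None, which matches the trigger doing nothing.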

Creating a database of known good baseline measurements

Create a Firestore database to provide a source of known good integrity policy baseline measurements. You must manually add baseline measurements to keep this database up to date.

  1. In the Google Cloud console, go to the VM instances page.

  2. Click the Shielded VM instance ID to open the VM instance details page.

  3. Under Logs, click Stackdriver Logging.

  4. Locate the most recent lateBootReportEvent log entry.

  5. Expand the log entry > jsonPayload > lateBootReportEvent > policyMeasurements.

  6. Note the values for the elements contained in lateBootReportEvent > policyMeasurements.

  7. In the Google Cloud console, go to the Firestore page.

  8. Choose Start collection.

  9. For Collection ID, type known_good_measurements.

  10. For Document ID, type baseline1.

  11. For Field name, type the pcrNum field value from element 0 in lateBootReportEvent > policyMeasurements.

  12. For Field type, select map.

  13. Add three string fields to the map field, named hashAlgo, pcrNum, and value, respectively. Make their values the values of the element 0 fields in lateBootReportEvent > policyMeasurements.

  14. Create more map fields, one for each additional element in lateBootReportEvent > policyMeasurements. Give them the same subfields as the first map field. The values for those subfields should map to those in each of the additional elements.

    For example, if you are using a Linux VM, the collection should look similar to the following when you are done:

    A Firestore database showing a completed known_good_measurements collection for Linux.

    If you are using a Windows VM, you will see more measurements, so the collection should look similar to the following:

    A Firestore database showing a completed known_good_measurements collection for Windows.
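The document layout built in the steps above (one map field per PCR, named after its pcrNum, with hashAlgo, pcrNum, and value subfields) can be sketched as a small conversion. The function names and the sample values are hypothetical. The optional upload_baseline helper assumes that the google-cloud-firestore client library is installed and that application default credentials are configured; it is not called here.

```python
def measurements_to_document(policy_measurements):
    """Build the document body: one map field per PCR, keyed by pcrNum."""
    document = {}
    for measurement in policy_measurements:
        document[measurement["pcrNum"]] = {
            "hashAlgo": measurement["hashAlgo"],
            "pcrNum": measurement["pcrNum"],
            "value": measurement["value"],
        }
    return document


def upload_baseline(document, document_id="baseline1"):
    """Write the document to the known_good_measurements collection.

    Assumption: google-cloud-firestore is installed and credentials exist.
    """
    # Imported lazily so the conversion above runs without the library.
    from google.cloud import firestore

    db = firestore.Client()
    db.collection("known_good_measurements").document(document_id).set(document)


# Hypothetical policyMeasurements values, for illustration only.
policy_measurements = [
    {"hashAlgo": "SHA1", "pcrNum": "PCR_0", "value": "aaaa"},
    {"hashAlgo": "SHA1", "pcrNum": "PCR_4", "value": "bbbb"},
    {"hashAlgo": "SHA1", "pcrNum": "PCR_7", "value": "cccc"},
]

document = measurements_to_document(policy_measurements)
print(sorted(document))
```

A Linux VM would produce three map fields this way, and a Windows VM six, which matches the two collections described above.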

Updating the Cloud Run functions trigger to learn known good baseline

  1. The following code creates a Cloud Run functions trigger that causes any Shielded VM instance that fails integrity validation to learn the new baseline if it is in the database of known good measurements, or else shut down. Copy this code and use it to overwrite the existing code in main.py.

    import base64
    import json
    import googleapiclient.discovery

    import firebase_admin
    from firebase_admin import credentials
    from firebase_admin import firestore

    PROJECT_ID = 'PROJECT_ID'

    firebase_admin.initialize_app(credentials.ApplicationDefault(), {
        'projectId': PROJECT_ID,
    })

    def pcr_values_to_dict(pcr_values):
        """Converts a list of PCR values to a dict, keyed by PCR num"""
        result = {}
        for value in pcr_values:
            result[value['pcrNum']] = value
        return result

    def instance_id_to_instance_name(compute, zone, project_id, instance_id):
        list_result = compute.instances().list(
            project=project_id,
            zone=zone,
            filter='id eq %s' % instance_id).execute()
        if len(list_result['items']) != 1:
            raise KeyError('unexpected number of items: %d'
                           % len(list_result['items']))
        return list_result['items'][0]['name']

    def relearn_if_known_good(data, context):
        """A Cloud Function that shuts down a VM on failed integrity check.
        """
        log_entry = json.loads(base64.b64decode(data['data']).decode('utf-8'))
        payload = log_entry.get('jsonPayload', {})
        entry_type = payload.get('@type')
        if entry_type != 'type.googleapis.com/cloud_integrity.IntegrityEvent':
            raise TypeError("Unexpected log entry type: %s" % entry_type)

        # We only send relearn signal upon receiving late boot report event: if
        # early boot measurements are in a known good database, but late boot
        # measurements aren't, and we send relearn signal upon receiving early boot
        # report event, the VM will also relearn late boot policy baseline, which we
        # don't want, because they aren't known good.
        report_event = payload.get('lateBootReportEvent')
        if report_event is None:
            return

        evaluation_passed = report_event['policyEvaluationPassed']
        if evaluation_passed:
            # Policy evaluation passed, nothing to do.
            return

        # See if the new measurement is known good, and if it is, relearn.
        measurements = pcr_values_to_dict(report_event['actualMeasurements'])

        db = firestore.Client()
        kg_ref = db.collection('known_good_measurements')

        # Check current measurements against known good database.
        relearn = False
        for kg in kg_ref.get():
            kg_map = kg.to_dict()

            # Check PCR values for lateBootReportEvent measurements against the
            # known good measurements stored in the Firestore table
            if ('PCR_0' in kg_map and kg_map['PCR_0'] == measurements['PCR_0'] and
                    'PCR_4' in kg_map and kg_map['PCR_4'] == measurements['PCR_4'] and
                    'PCR_7' in kg_map and kg_map['PCR_7'] == measurements['PCR_7']):

                # Linux VM (3 measurements), only need to check above 3 measurements
                if len(kg_map) == 3:
                    relearn = True

                # Windows VM (6 measurements), need to check 3 additional measurements
                elif len(kg_map) == 6:
                    if ('PCR_11' in kg_map and kg_map['PCR_11'] == measurements['PCR_11'] and
                            'PCR_13' in kg_map and kg_map['PCR_13'] == measurements['PCR_13'] and
                            'PCR_14' in kg_map and kg_map['PCR_14'] == measurements['PCR_14']):
                        relearn = True

        compute = googleapiclient.discovery.build('compute', 'beta',
                                                  cache_discovery=False)
        instance_id = log_entry['resource']['labels']['instance_id']
        project_id = log_entry['resource']['labels']['project_id']
        zone = log_entry['resource']['labels']['zone']
        instance_name = instance_id_to_instance_name(compute, zone, project_id,
                                                     instance_id)

        if not relearn:
            # Issue shutdown API call.
            print('New measurement is not known good. Shutting down a VM.')
            result = compute.instances().stop(project=project_id,
                                              zone=zone,
                                              instance=instance_name).execute()
            print('Instance %s in project %s has been scheduled for shut down.'
                  % (instance_name, project_id))
        else:
            # Issue relearn API call.
            print('New measurement is known good. Relearning...')
            result = compute.instances().setShieldedInstanceIntegrityPolicy(
                project=project_id,
                zone=zone,
                instance=instance_name,
                body={'updateAutoLearnPolicy': True}).execute()
            print('Instance %s in project %s has been scheduled for relearning.'
                  % (instance_name, project_id))

  2. Copy the following dependencies and use them to overwrite the existing code in requirements.txt:

    google-api-python-client==1.6.6
    google-auth==1.4.1
    google-auth-httplib2==0.0.3
    google-cloud-firestore==0.29.0
    firebase-admin==2.13.0

  3. Open a terminal window and navigate to the directory containing main.py and requirements.txt.

  4. Run the gcloud beta functions deploy command to deploy the trigger:

    gcloud beta functions deploy relearn_if_known_good \
        --project PROJECT_ID \
        --runtime python37 \
        --trigger-resource integrity-monitoring \
        --trigger-event google.pubsub.topic.publish

  5. Manually delete the previous shutdown_vm function:

    1. In the Google Cloud console, go to the Cloud Functions page.
    2. Select the shutdown_vm function, and then click Delete.
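The known-good check inside relearn_if_known_good can be isolated as a pure function, which makes the rule easier to test without Firestore or Compute Engine. The sketch below is a hypothetical restatement: matches_known_good and should_relearn are not part of the tutorial code, the sample values are invented, and, unlike the trigger, it treats a PCR that is missing from the reported measurements as a mismatch instead of raising KeyError.

```python
LINUX_PCRS = ("PCR_0", "PCR_4", "PCR_7")
WINDOWS_EXTRA_PCRS = ("PCR_11", "PCR_13", "PCR_14")


def matches_known_good(kg_map, measurements):
    """Apply the trigger's matching rule to one known-good document."""
    def same(pcrs):
        return all(
            pcr in kg_map and kg_map[pcr] == measurements.get(pcr)
            for pcr in pcrs
        )

    if not same(LINUX_PCRS):
        return False
    if len(kg_map) == 3:   # Linux VM: three measurements
        return True
    if len(kg_map) == 6:   # Windows VM: three additional measurements
        return same(WINDOWS_EXTRA_PCRS)
    return False


def should_relearn(known_good_docs, measurements):
    """Return True if any known-good document matches the measurements."""
    return any(matches_known_good(doc, measurements) for doc in known_good_docs)


def pcr(name, value):
    """Build one measurement map (hypothetical values)."""
    return {"hashAlgo": "SHA1", "pcrNum": name, "value": value}


measurements = {name: pcr(name, "v-" + name) for name in LINUX_PCRS}
known_good = {name: pcr(name, "v-" + name) for name in LINUX_PCRS}
unknown = dict(known_good, PCR_4=pcr("PCR_4", "something-else"))

print(should_relearn([unknown, known_good], measurements))
print(should_relearn([unknown], measurements))
```

Because whole maps are compared, a stored document matches only if hashAlgo, pcrNum, and value all equal the reported measurement for each PCR.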

Verify the automated responses to integrity validation failures

  1. First, check if you have a running instance with Secure Boot turned on as a Shielded VM option. If not, you can create a new instance with a Shielded VM image (Ubuntu 18.04 LTS) and turn on the Secure Boot option. You may be charged a few cents for the instance (this step can be finished within an hour).
  2. Now, assume that, for some reason, you want to manually upgrade the kernel.
  3. SSH into the instance, and use the following command to check the current kernel.

    uname -sr

    You should see something like Linux 4.15.0-1028-gcp.

  4. Download a generic kernel from https://kernel.ubuntu.com/~kernel-ppa/mainline/

  5. Use the following command to install the downloaded kernel packages.

    sudo dpkg -i *.deb
  6. Reboot the VM.

  7. You should notice that the VM doesn't boot (you can't SSH into the machine). This is expected, because the signature of the new kernel is not in the Secure Boot whitelist. This also demonstrates how Secure Boot can prevent an unauthorized or malicious kernel modification.

  8. Because you know that this kernel upgrade is not malicious and that you performed it yourself, you can turn off Secure Boot in order to boot the new kernel.

  9. Shut down the VM, clear the Secure Boot option, and then restart the VM.

  10. The boot should fail again. This time, the VM is shut down automatically by the function that you created, because the changed Secure Boot option and the new kernel image cause the measurements to differ from the baseline. (You can check this in the function's Stackdriver log.)

  11. Because you know that this is not a malicious modification and you know the root cause, you can add the current measurement in lateBootReportEvent to the known good measurements Firestore database. (Remember that two things changed: the Secure Boot option and the kernel image.)

    Follow the previous section, Creating a database of known good baseline measurements, to append a new baseline to the Firestore database using the actual measurement in the latest lateBootReportEvent.

    A Firestore database showing a new completed known_good_measurements collection.

  12. Now reboot the machine. When you check the Stackdriver log, you will see that the lateBootReportEvent still shows false, but the machine should now boot successfully, because the function trusted and relearned the new measurement. You can verify this by checking the function's Stackdriver log.

  13. With Secure Boot disabled, the VM can now boot into the new kernel. SSH into the machine and check the kernel again; you will see the new kernel version.

    uname -sr
  14. Finally, clean up the resources and the data used in this section:

    1. To avoid additional charges, shut down the VM if you created one for this section. To do so, in the Google Cloud console, go to the VM instances page.
    2. Remove the known good measurements that you added in this section. To do so, in the Google Cloud console, go to the Firestore page.

Last updated 2025-11-24 UTC.