How Google enforces boot integrity on production machines
This content was last updated in May 2024, and represents the status quo as of the time it was written. Google's security policies and systems may change going forward, as we continually improve protection for our customers.
This document describes the infrastructure controls that Google uses to enforce the integrity of the boot process on production machines that are equipped with Titan. These controls, built on top of a measured boot process, help ensure that Google can recover its data center machines from vulnerabilities throughout their boot stack and return the machines from arbitrary boot states to known good configurations.
Introduction
The security posture of a data center machine is highly dependent on the machine's configuration at boot time. The machine's boot process configures the machine's hardware and initializes its operating system, while keeping the machine safe to run in Google's production environment.
At each step in the boot process, Google implements industry-leading controls to help enforce the boot state that we expect and to help keep customer data safe. These controls help ensure that our machines boot into their intended software, allowing us to remove vulnerabilities that could compromise the initial security posture of the machine.
This document describes the boot process and demonstrates how our controls operate during the boot flow. The key objectives of our controls are the following:
- Establish trust in machine credentials through hardware roots of trust
- Seal machine credentials to a boot policy that specifies allowed firmware and software versions
- Enforce the boot policy on machines through a measured boot process
Background
This section defines and provides context for the terms machine credentials, hardware root of trust, sealed credentials, and cryptographic sealing.
Machine credentials
One of the central components in Google's machine management system is our credential infrastructure, which consists of an internal certificate authority (CA) and other control plane elements that are responsible for coordinating credential rotation flows.
Machines in Google's production fleet perform mutual authentication when establishing secure channels. To perform mutual authentication, each machine possesses Google's CA public keys. Each machine also possesses its own public/private key pair, as well as a certificate for that key pair.
Each machine's public/private key pair, together with the certificate signed by the CA, is known as a machine credential, which the machine uses to authenticate itself to other machines in the fleet. Within the production network, machines check that other machines' public keys are certified by Google's CA before exchanging traffic.
Hardware roots of trust and cryptographic sealing
As computing devices grow more sophisticated, each device's attack surface also grows. To account for this, devices increasingly feature hardware roots of trust (RoTs), which are small, trusted execution environments that safeguard sensitive data for the machine. RoTs also appear in mobile devices like laptops or cell phones, and in more conventional devices like desktop PCs.
Google's data center machines feature custom, Google-designed hardware roots of trust, known as Titan, that are integrated into each machine's deepest layers. We use Titan, along with a mechanism called cryptographic sealing, to ensure that each machine is running the configuration and software versions we expect.
Cryptographic sealing is a service offered by Titan that is used to safeguard secrets. Titan's sealing capabilities are similar to those found in the Trusted Platform Module (TPM) specification, which is published by the Trusted Computing Group. Titan's cryptographic sealing has an additional advantage: Titan is better able to measure and attest to low-level firmware.
Cryptographic sealing comprises the following two controls:
- Encryption of sensitive data
- A policy that must be satisfied before the data can be decrypted
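The two controls can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not Titan's implementation: the names `seal`, `unseal`, and `_keystream` are hypothetical, the policy is modeled as a single digest, and the SHA-256 counter keystream is a toy stand-in for a real authenticated cipher. The point it shows is that the decryption key is derived from both a device-held root key and the policy, so a machine whose observed boot state differs cannot reproduce the key.

```python
import hashlib
import hmac
import os


def _keystream(key, nonce, length):
    """Toy SHA-256 counter keystream; a placeholder for a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _policy_key(root_key, policy):
    """Derive the sealing key from the device root key and the policy digest."""
    return hmac.new(root_key, b"seal" + policy, hashlib.sha256).digest()


def seal(root_key, policy, secret):
    """Control 1: encrypt `secret`. The key depends on the policy (control 2)."""
    nonce = os.urandom(16)
    key = _policy_key(root_key, policy)
    stream = _keystream(key, nonce, len(secret))
    ciphertext = bytes(a ^ b for a, b in zip(secret, stream))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag


def unseal(root_key, observed_policy, blob):
    """Decrypt only if the observed boot state reproduces the sealing key."""
    nonce, ciphertext, tag = blob
    key = _policy_key(root_key, observed_policy)
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("boot state does not satisfy the sealing policy")
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

In this sketch, calling `unseal` with the digest of the intended software returns the original secret, while any other digest raises `PermissionError` before any plaintext is produced.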
Sealed credentials
Google's credential infrastructure uses cryptographic sealing to encrypt machine credentials at rest with a key that is controlled by the machine's hardware root of trust. The encrypted credential private key, together with the corresponding certificate, is known as a sealed credential. In addition to machine credentials, Google uses this sealing mechanism to protect other pieces of sensitive data.
Each machine can decrypt and access its machine credential only if it can satisfy a decryption policy that specifies what software the machine must have booted. For example, sealing a machine's credential to a policy that specifies the intended release of the operating system kernel helps ensure that the machine can't participate in its machine cluster unless it booted the intended kernel version.
The decryption policy is enforced through a process called measured boot. Every layer in the boot stack measures the next layer, and the machine attests to this chain of measurements at the end of the boot. Each measurement is often a cryptographic hash.
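A chain of measurements is commonly accumulated with a hash-extend operation, in the style of TPM platform configuration registers. The sketch below assumes that style; the source does not describe Titan's actual measurement format, and the names `extend` and `measure_boot` are hypothetical. It shows why the final value commits to every component and to the order in which the components ran.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size


def extend(register, component):
    """Fold the hash of `component` into the running measurement register."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(register + measurement).digest()


def measure_boot(components):
    """Measure each boot component in order, starting from an all-zero register."""
    register = bytes(DIGEST_SIZE)
    for component in components:
        register = extend(register, component)
    return register
```

Because each step hashes the previous register value, changing any component, or swapping the order of two components, produces a different final register. A policy can therefore name one expected final value instead of listing every intermediate hash.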
Credential sealing process
This section describes the credential sealing and measured boot process used by Google machines. The following diagram illustrates this flow.
To seal a machine's credentials to a particular boot policy, the following steps happen:
1. Google's machine automation infrastructure initiates a software update on the machine. It passes the intended software versions to the credential infrastructure.
2. Google's credential infrastructure requests a sealing key from Titan. The key is policy-bound, so that Titan uses it only if the machine boots into its intended software.
3. The credential infrastructure compares the returned key's policy with the intent communicated to it by the machine automation infrastructure. If the credential infrastructure is satisfied that the policy matches the intent, it issues a certified machine credential to the machine.
4. The credential infrastructure encrypts this credential using the sealing key that is procured in step 2.
5. The encrypted credential is stored on disk for decryption by Titan on subsequent boots.
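The decision at the heart of this flow is the comparison between the sealing key's policy and the intent from machine automation. The following sketch models only that control flow. Every name in it (`policy_digest`, `SealingKey`, `request_sealing_key`, `issue_credential`) is hypothetical, the policy encoding is an assumption, and both the root of trust and the encryption step are reduced to placeholders.

```python
import hashlib
import hmac
from dataclasses import dataclass


def policy_digest(versions):
    """Canonical digest of a {component: version} mapping (assumed encoding)."""
    canonical = ";".join(f"{name}={versions[name]}" for name in sorted(versions))
    return hashlib.sha256(canonical.encode()).digest()


@dataclass(frozen=True)
class SealingKey:
    """Stand-in for what the root of trust returns: a handle and its bound policy."""
    key_id: str
    policy: bytes


def request_sealing_key(requested_versions):
    """Placeholder for the sealing-key request made to the root of trust."""
    policy = policy_digest(requested_versions)
    return SealingKey(key_id="seal-" + policy.hex()[:8], policy=policy)


def issue_credential(intent_versions, sealing_key):
    """Issue a credential only if the key's policy matches the automation intent."""
    if not hmac.compare_digest(sealing_key.policy, policy_digest(intent_versions)):
        raise ValueError("sealing key policy does not match the intended versions")
    # Certificate issuance and encryption are elided; only the stored record is shown.
    return {
        "sealed_with": sealing_key.key_id,
        "policy": sealing_key.policy,
        "payload": "<credential encrypted under the sealing key>",
    }
```

In this model, a sealing key bound to any policy other than the intended versions is rejected before a credential is issued, so a credential can never be sealed to an unintended boot state.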
Measured boot process
Google machines' boot stack consists of four layers, which are visualized in the following diagram.
The layers are the following:
- Userspace: applications like daemons or workloads.
- System software: a hypervisor or kernel. The lowest level of software that provides an abstraction over hardware features like networking, the file system, or virtual memory to the userspace.
- Boot firmware: the firmware that initializes the kernel, such as a BIOS and bootloader.
- Hardware root of trust: in Google machines, a Titan chip that cryptographically measures the firmware and other low-level CPU services.
Throughout boot, each layer measures the next layer before passing control to that layer. The machine's sealed credential is only made available to the operating system if all measurements that are captured during boot conform to the sealed credential's decryption policy, as specified by Google's credential infrastructure. Therefore, if the machine can perform operations with its sealed credentials, that is evidence that the machine satisfied its measured boot policy. This process is a form of implicit attestation.
If a machine boots software that deviates from the intended state, the machine cannot decrypt and perform operations with the credentials that it needs to operate within the fleet. Such machines cannot participate in workload scheduling until machine management infrastructure triggers automated repair actions.
Recovering from vulnerabilities in the kernel
Suppose that a machine is running kernel version A, but security researchers find that this kernel version has a vulnerability. In this scenario, Google patches the vulnerability and rolls out an updated kernel version B to the fleet.
In addition to patching the vulnerability, Google also issues new machine credentials to each machine in the fleet. As described in Credential sealing process, the new machine credentials are bound to a decryption policy that is only satisfied if kernel version B boots on the machine. Any machine that is not running its intended kernel cannot decrypt its new machine credentials, because the measurements that the boot firmware takes of the kernel won't satisfy the machine's boot policy. As part of this process, the old machine credentials are also revoked.
As a result, these machines are unable to participate in their machine cluster until their kernel is updated to conform to the control plane's intent. These controls help ensure that machines running the vulnerable kernel version A cannot receive jobs or user data until they are upgraded to kernel version B.
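The combined effect of reissuing and revoking credentials can be modeled with a small simulation. This is a simplified sketch: `Credential`, `Machine`, and `can_join_cluster` are hypothetical names, the policy check is reduced to a string comparison that stands in for unsealing, and revocation is modeled as a set of serial numbers.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    serial: int
    required_kernel: str  # kernel version the decryption policy is sealed to


@dataclass
class Machine:
    name: str
    booted_kernel: str
    credentials: list


def usable_credentials(machine, revoked_serials):
    """Credentials the machine can both unseal and still present to its peers."""
    return [
        c for c in machine.credentials
        if c.required_kernel == machine.booted_kernel  # policy satisfied at boot
        and c.serial not in revoked_serials            # not revoked by the CA
    ]


def can_join_cluster(machine, revoked_serials):
    """A machine can participate only if it holds at least one usable credential."""
    return bool(usable_credentials(machine, revoked_serials))
```

With the old credential sealed to kernel A and revoked, and the new credential sealed to kernel B, a machine that still boots kernel A has no usable credential: it can unseal only the revoked one. A machine that boots kernel B can unseal the new credential and rejoin its cluster.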
Recovering from vulnerabilities in boot firmware
Suppose that there is a vulnerability in the boot firmware, instead of the operating system kernel. The same controls described in Recovering from vulnerabilities in the kernel help Google recover from such a vulnerability.
Google's Titan chip measures a machine's boot firmware before it runs, so that Titan can determine whether the boot firmware satisfies the machine credential's boot policy. Any machine that is not running its intended boot firmware cannot obtain new machine credentials, and that machine cannot participate in its machine cluster until its boot firmware conforms to the control plane's intent.
Recovering from vulnerabilities in root-of-trust firmware
RoTs are not immune to vulnerabilities, but Google's boot controls enable recovery from bugs even at this layer of the boot stack, within the RoT's own mutable code.
Titan's boot stack implements a secure and measured boot flow of its own. When a Titan chip powers on, its hardware cryptographically measures Titan's bootloader, which in turn measures Titan's firmware. Like the machine's kernel and boot firmware, Titan firmware is cryptographically signed with a version number. Titan's bootloader validates the signature and extracts the version number of Titan firmware, feeding the version number to Titan's hardware-based key derivation subsystem.
Titan's hardware subsystem implements a versioned key derivation scheme, whereby Titan firmware with version X can obtain chip-unique keys bound to all versions less than or equal to X, but cannot access keys bound to versions greater than X. All secrets sealed to Titan, including the machine credential, are encrypted using a versioned key.
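One well-known way to obtain this "current and older versions only" property is a one-way hash ladder, where the key for version v - 1 is the hash of the key for version v. The source does not say how Titan's hardware implements its scheme, so the construction below is an illustrative assumption. `MAX_VERSION`, `hardware_release_key`, and `derive_older_key` are hypothetical names.

```python
import hashlib

MAX_VERSION = 16  # hypothetical upper bound on firmware version numbers


def _step_down(key):
    """One-way step from the key for version v to the key for version v - 1."""
    return hashlib.sha256(b"version-ladder" + key).digest()


def hardware_release_key(chip_secret, firmware_version):
    """Key the derivation hardware would release to firmware at this version."""
    if not 0 <= firmware_version <= MAX_VERSION:
        raise ValueError("firmware version out of range")
    key = chip_secret  # treated as the key for MAX_VERSION
    for _ in range(MAX_VERSION - firmware_version):
        key = _step_down(key)
    return key


def derive_older_key(current_key, current_version, target_version):
    """Firmware-side derivation: walk down the ladder, never up."""
    if target_version > current_version:
        raise PermissionError("cannot derive a key for a newer firmware version")
    if target_version < 0:
        raise ValueError("target version must be non-negative")
    key = current_key
    for _ in range(current_version - target_version):
        key = _step_down(key)
    return key
```

Because the step function is a cryptographic hash, firmware that holds the version 3 key can compute the keys for versions 2, 1, and 0, but computing the version 4 key would require inverting the hash. The explicit version check in `derive_older_key` only reports that limit early; the one-way function is what enforces it.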
Attestation and sealing keys are unique to each Titan chip. Unique keys let Google trust only those Titan chips that are expected to be running within a Google data center.
The following diagram shows Titan with version keys. The Version X+1 key cannot be accessed by version X firmware, but the keys for version X and all older versions are accessible.
In the event of a severe vulnerability in Titan firmware, Google rolls out a patch with a greater version number, then issues new machine credentials that are bound to the higher Titan firmware version. Any older, vulnerable Titan firmware is unable to decrypt these new credentials. Therefore, if a machine performs operations with its new credentials in production, Google can assert with confidence that the machine's Titan chip booted up-to-date Titan firmware.
Ensuring root of trust authenticity
The controls described in this document all rest on the functionality of the hardware RoT itself. Google's credential infrastructure relies on signatures emitted by these RoTs to know whether the machine is running intended software.
It is critical, therefore, that the credential infrastructure can determine whether a hardware RoT is authentic and whether the RoT is running up-to-date firmware.
When each Titan chip is manufactured, it is programmed with unique entropy. Titan's low-level boot routine turns that entropy into a device-unique key. A secure element on the Titan manufacturing line endorses this chip-unique key such that Google will recognize it as a legitimate Titan chip.
The following diagram illustrates this endorsement process.
When in production, Titan uses its device-unique key to endorse any signature it emits. Titan chips use a flow that is similar to Device Identifier Composition Engine (DICE). The endorsement includes Titan firmware's version information. This attestation helps prevent an attacker from impersonating a signature that is emitted by a Titan chip, and from rolling back to older Titan firmware and impersonating newer Titan firmware. These controls help Google verify that signatures received from Titan were emitted by authentic Titan hardware running authentic Titan firmware.
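In DICE-style designs, each boot stage derives a secret for the next stage from a device-unique secret and a measurement of the code that is about to run. The sketch below illustrates that general idea only. The function name `derive_cdi`, the four-byte version encoding, and the use of HMAC-SHA-256 are assumptions, and Titan's actual derivation is not described in the source. Real designs go on to derive an asymmetric key pair and a certificate from this value, which is omitted here because the Python standard library has no asymmetric signing primitives.

```python
import hashlib
import hmac


def derive_cdi(unique_device_secret, firmware_image, firmware_version):
    """Mix the device secret with the measured firmware identity and version."""
    firmware_digest = hashlib.sha256(firmware_image).digest()
    measurement = firmware_digest + firmware_version.to_bytes(4, "big")
    return hmac.new(unique_device_secret, measurement, hashlib.sha256).digest()
```

Because the firmware digest and version feed into the derivation, firmware that has been rolled back or modified ends up with a different derived secret than the up-to-date firmware on the same chip, and a different chip ends up with a different secret even when it runs identical firmware. Under these assumptions, neither can produce signatures that verify as coming from the expected chip and firmware version.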
Building on boot integrity
This paper described mechanisms for ensuring that machines' application processors boot intended code. These mechanisms rely on a measured boot flow, coupled with a hardware root-of-trust chip.
Google's threat model includes attackers who may physically interpose on the bus between the CPU and RoT, with the goal of improperly obtaining the machine's decrypted credential. To help minimize this risk, Google is driving development of a standards-based approach for defeating active interposers, bringing together the TPM and DPE APIs from Trusted Computing Group and the Caliptra integrated root of trust.
What's next
- For information about how Google helps ensure the integrity of complex disaggregated machines' boot stacks, see Remote attestation of disaggregated machines.
- For overview information about Google's security infrastructure, see Google infrastructure security design overview.
- For more on how Google is contributing Titan security solutions to industry standards, see the TPM Attested Boot in Big, Distributed Environments talk on the Trusted Computing Group YouTube channel.