Google infrastructure security design overview

This content was last updated in June 2024, and represents the status quo as of the time it was written. Google's security policies and systems may change going forward, as we continually improve protection for our customers.


Introduction

This document provides an overview of how security is designed into Google's technical infrastructure. It is intended for security executives, security architects, and auditors.

This document describes the following:

  • Google's global technical infrastructure, which is designed to provide security through the entire information processing lifecycle at Google. This infrastructure helps provide the following:
    • Secure deployment of services
    • Secure storage of data with end-user privacy safeguards
    • Secure communication between services
    • Secure and private communication with customers over the internet
    • Safe operation by Google engineers
  • How we use this infrastructure to build internet services, including consumer services such as Google Search, Gmail, and Google Photos, and enterprise services such as Google Workspace and Google Cloud.
  • The security products and services that are the result of innovations that we implemented internally to meet our security needs. For example, BeyondCorp is the direct result of our internal implementation of the zero-trust security model.
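  • How the security of the infrastructure is designed in progressive layers. These layers include the following:
    • Secure low-level infrastructure
    • Secure service deployment
    • Secure data storage
    • Secure internet communication
    • Operational security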

The remaining sections of this document describe the security layers.

Secure low-level infrastructure

This section describes how we secure the physical premises of our data centers, the hardware in our data centers, and the software stack running on the hardware.

Security of physical premises

We design and build our own data centers, which incorporate multiple layers of physical security. Access to these data centers is tightly controlled. To protect our data center floors, we use biometric identification, metal detection, cameras, vehicle barriers, and laser-based intrusion detection systems. For more information, see Data center security.

Inside the data center, we implement additional controls to ensure that physical access to servers is protected and monitored. For more information, see How Google protects the physical-to-logical space in a data center.

We also host some servers in third-party data centers. In these data centers, we align with the same regulatory standards as we do in our own data centers. We ensure that there are Google-controlled physical security measures and Google-controlled connectivity on top of the security layers that are provided by the data center operator. For example, we operate biometric identification systems, cameras, and metal detectors that are independent from the security layers that the data center operator provides.

Unless otherwise stated, the security controls in this document apply to both Google-owned data centers and third-party data centers.

Hardware design and provenance

Google data centers consist of thousands of servers connected to a local network. We design the server boards and the networking equipment. We vet the component vendors that we work with and choose components with care. We work with vendors to audit and validate the security properties that are provided by the components. We also design custom chips, including a hardware security chip (called Titan), that we deploy on servers, devices, and peripherals. These chips let us identify and authenticate legitimate Google devices at the hardware level and serve as hardware roots of trust.

Note: Variants of the Titan hardware chip are also used in Pixel devices and the Titan Security Key.

Secure boot stack and machine identity

Google servers use various technologies to ensure that they boot the intended software stack. At each step in the boot process, Google implements industry-leading controls to help enforce the boot state that we expect and to help keep customer data safe.

We strive to continually improve our servers with each successive hardware generation, and to bring these improvements to the rest of the industry through engagement in standards processes with Trusted Computing Group and DMTF.

Each server in the data center has its own unique identity. This identity can be tied to the hardware roots of trust and the software that the machine boots. This identity is used to authenticate API calls to and from low-level management services on the machine. This identity is also used for mutual server authentication and transport encryption. We developed the Application Layer Transport Security (ALTS) system for securing remote procedure call (RPC) communications within our infrastructure. These machine identities can be centrally revoked to respond to a security incident. In addition, their certificates and keys are routinely rotated, and old ones revoked.
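The revocation and rotation behavior described above can be pictured as a small credential registry. The following Python sketch is illustrative only and is not how Google implements machine identity: the `IdentityAuthority` class, the 24-hour lifetime, and the random key IDs are all hypothetical. Rotation issues a new key and revokes the old one, and incident response revokes a machine's active key centrally.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MachineCredential:
    machine_id: str
    key_id: str
    expires_at: float  # seconds since an arbitrary epoch


@dataclass
class IdentityAuthority:
    """Hypothetical central authority that issues, rotates, and revokes machine credentials."""

    lifetime_s: float = 24 * 3600  # assumed credential lifetime
    revoked: Set[str] = field(default_factory=set)
    current: Dict[str, MachineCredential] = field(default_factory=dict)

    def issue(self, machine_id: str, now: float) -> MachineCredential:
        # Rotation: issuing a fresh credential revokes the machine's previous key.
        old = self.current.get(machine_id)
        if old is not None:
            self.revoked.add(old.key_id)
        cred = MachineCredential(machine_id, secrets.token_hex(8), now + self.lifetime_s)
        self.current[machine_id] = cred
        return cred

    def revoke_machine(self, machine_id: str) -> None:
        # Incident response: centrally revoke whatever key the machine currently holds.
        cred = self.current.pop(machine_id, None)
        if cred is not None:
            self.revoked.add(cred.key_id)

    def is_valid(self, cred: MachineCredential, now: float) -> bool:
        # A peer accepts a credential only if it is unrevoked and unexpired.
        return cred.key_id not in self.revoked and now < cred.expires_at
```

Short lifetimes bound the damage from a leaked key even if a revocation is missed, while the revocation set handles the urgent case.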

We developed automated systems to do the following:

  • Ensure that servers run up-to-date versions of their software stacks (including security patches).
  • Detect and diagnose hardware and software problems.
  • Ensure the integrity of the machines and peripherals with verified boot and attestation.
  • Ensure that only machines running the intended software and firmware can access credentials that allow them to communicate on the production network.
  • Remove or repair machines if they don’t pass the integrity check or when they're no longer needed.
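As a rough illustration of how verified boot and attestation can gate access to credentials, the following Python sketch uses a measured-boot hash chain in the style of a TPM extend operation. It is a simplification under stated assumptions, not Google's design: the stage names and the `release_credentials` gate are hypothetical, and a real attestation would also have the hardware root of trust sign the reported measurement so that software cannot forge it.

```python
import hashlib
import hmac
from typing import List, Optional


def extend(register: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new_value = SHA-256(old_value || SHA-256(measurement))."""
    return hashlib.sha256(register + hashlib.sha256(measurement).digest()).digest()


def measure_boot(stages: List[bytes]) -> bytes:
    """Fold each boot stage's image into a single register, in boot order."""
    register = bytes(32)  # the register starts zeroed at reset
    for image in stages:
        register = extend(register, image)
    return register


# Hypothetical known-good images; the expected value is computed from them ahead of time.
INTENDED_STAGES = [b"firmware-v7", b"bootloader-v3", b"kernel-patched"]
EXPECTED = measure_boot(INTENDED_STAGES)


def release_credentials(reported: bytes) -> Optional[str]:
    """Hand out production credentials only when the attested measurement matches."""
    if hmac.compare_digest(reported, EXPECTED):
        return "production-credential"  # placeholder for a real credential
    return None
```

Because each extend hashes the previous value, changing any stage or reordering the stages produces a different final value, so the machine never receives credentials.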

For more information about how we secure our boot stack and machine integrity, see How Google enforces boot integrity on production machines and Remote attestation of disaggregated machines.

Secure service deployment

Google services are the application binaries that our developers write and run on our infrastructure. Examples of Google services are Gmail servers, Spanner databases, Cloud Storage servers, YouTube video transcoders, and Compute Engine VMs running customer applications. To handle the required scale of the workload, thousands of machines might be running binaries of the same service. A cluster orchestration service, called Borg, controls the services that are running directly on the infrastructure.

The infrastructure does not assume any trust between the services that are running on the infrastructure. This trust model is referred to as a zero-trust security model. A zero-trust security model means that no devices or users are trusted by default, whether they are inside or outside of the network.

Because the infrastructure is designed to be multi-tenant, data from our customers (consumers and businesses), and even our own data, is distributed across shared infrastructure. This infrastructure is composed of tens of thousands of homogeneous machines. The infrastructure does not segregate customer data onto a single machine or set of machines, except in specific circumstances, such as when you are using Google Cloud to provision VMs on sole-tenant nodes for Compute Engine.

Google Cloud and Google Workspace support regulatory requirements around data residency. For more information about data residency and Google Cloud, see Restricting resource locations. For more information about data residency and Google Workspace, see Data regions: Choose a geographic location for your data.

Service identity, integrity, and isolation

To enable inter-service communication, applications use cryptographic authentication and authorization. Authentication and authorization provide strong access control at an abstraction level and granularity that administrators and services can understand.

Services do not rely on internal network segmentation or firewalling as the primary security mechanism. Ingress and egress filtering at various points in our network helps prevent IP spoofing. This approach also helps us to maximize our network's performance and availability. For Google Cloud, you can add additional security mechanisms such as VPC Service Controls and Cloud Interconnect.

Each service that runs on the infrastructure has an associated service account identity. A service is provided with cryptographic credentials that it can use to prove its identity to other services when making or receiving RPCs. These identities are used in security policies. The security policies ensure that clients are communicating with the intended server, and that servers are limiting the methods and data that particular clients can access.
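The two checks in such a policy, clients confirming the server's identity and servers limiting what each client can call, can be sketched as follows. This is a hypothetical Python illustration that assumes the peer's identity has already been authenticated by the transport (for example, by a mutually authenticated channel such as ALTS); the service names, method names, and `POLICY` table are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Peer:
    # Identity the peer proved with its cryptographic credentials during the handshake.
    service_identity: str


# Hypothetical server-side policy: which caller identities may invoke each RPC method.
POLICY: Dict[str, Set[str]] = {
    "Contacts.GetContacts": {"gmail-frontend"},
    "Contacts.DeleteAllContacts": set(),  # no service may call this method
}


def client_accepts(server: Peer, expected_server: str) -> bool:
    """Client side: proceed only if the authenticated server is the one it meant to reach."""
    return server.service_identity == expected_server


def server_authorizes(caller: Peer, method: str) -> bool:
    """Server side: deny by default and allow only listed caller identities per method."""
    return caller.service_identity in POLICY.get(method, set())
```

Deny-by-default matters here: a method missing from the table, or a caller missing from a method's set, is refused rather than allowed.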

We use various isolation and sandboxing techniques to help protect a service from other services running on the same machine. These techniques include Linux user separation, language-based (such as the Sandboxed API) and kernel-based sandboxes, application kernels for containers (such as gVisor), and hardware-based virtualization. In general, we use more layers of isolation for riskier workloads. Riskier workloads include workloads that process unsanitized input from the internet. For example, riskier workloads include running complex file converters on untrusted input or running arbitrary code as a service for products like Compute Engine.

For extra security, sensitive services, such as the cluster orchestration service and some key management services, run exclusively on dedicated machines.

In Google Cloud, to provide stronger cryptographic isolation for your workloads and to protect data in use, we support Confidential Computing services for Compute Engine virtual machine (VM) instances and Google Kubernetes Engine (GKE) nodes.

Inter-service access management

The owner of a service can manage access by creating a list of other services that can communicate with the service. This access management feature is provided by Google infrastructure. For example, a service can restrict incoming RPCs solely to an allowed list of other services. The owner can also configure the service with an allowed list of service identities, which the infrastructure enforces automatically. Enforcement includes audit logging, justifications, and unilateral access restriction (for engineer requests, for example).

Google engineers who need access to services are also issued individual identities. Services can be configured to allow or deny their access based on their identities. All of these identities (machine, service, and employee) are in a global namespace that the infrastructure maintains.

To manage these identities, the infrastructure provides a workflow system that includes approval chains, logging, and notification. For example, the security policy can enforce multi-party authorization. This system uses the two-person rule to ensure that an engineer acting alone cannot perform sensitive operations without first getting approval from another, authorized engineer. This system allows secure access-management processes to scale to thousands of services running on the infrastructure.
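The two-person rule can be expressed as a small approval workflow. The following Python sketch is a hypothetical model, not Google's workflow system: `AUTHORIZED_APPROVERS`, the request fields, and the log format are invented, and notification is omitted. The key property is that a requester's own approval never counts.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical group of engineers allowed to approve sensitive operations.
AUTHORIZED_APPROVERS = {"alice", "bob", "carol"}


@dataclass
class SensitiveRequest:
    requester: str
    operation: str
    approvals: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)  # audit trail for the workflow


def approve(req: SensitiveRequest, approver: str) -> None:
    if approver == req.requester:
        raise PermissionError("requester cannot approve their own request")
    if approver not in AUTHORIZED_APPROVERS:
        raise PermissionError(f"{approver} is not an authorized approver")
    req.approvals.add(approver)
    req.log.append(f"{approver} approved {req.operation} for {req.requester}")


def execute(req: SensitiveRequest) -> str:
    # Two-person rule: approve() never records the requester, so any approval
    # here comes from a second, authorized engineer.
    if not req.approvals:
        raise PermissionError("operation requires approval from a second engineer")
    req.log.append(f"{req.requester} executed {req.operation}")
    return f"executed {req.operation}"
```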

The infrastructure also provides services with the canonical service for user, group, and membership management so that they can implement custom, fine-grained access control where necessary.

End-user identities are managed separately, as described in Access management of end-user data in Google Workspace.

Encryption of inter-workload communication

The infrastructure provides confidentiality and integrity for RPC data on the network. All Google Cloud virtual networking traffic is encrypted. Communication between Google Cloud infrastructure workloads is encrypted, with exemptions that are granted only for high-performance workloads where traffic doesn't cross the multiple layers of physical security at the edge of a Google data center. Communication between Google Cloud infrastructure services has cryptographic integrity protection.

The infrastructure automatically and efficiently (with the help of hardware offload) provides end-to-end encryption for the infrastructure RPC traffic that goes over the network between data centers.

Access management of end-user data in Google Workspace

A typical Google Workspace service is written to do something for an end user. For example, an end user can store their email on Gmail. The end user's interaction with an application like Gmail might span other services within the infrastructure. For example, Gmail might call a People API to access the end user's address book.

The Inter-service access management section describes how a service (such as Google Contacts) is designed to protect RPC requests from another service (such as Gmail). However, this level of access control is still a broad set of permissions because Gmail is able to request the contacts of any user at any time.

When Gmail makes an RPC request to Google Contacts on behalf of an end user, the infrastructure lets Gmail present an end-user permission ticket in the RPC request. This ticket proves that Gmail is making the RPC request on behalf of that particular end user. The ticket enables Google Contacts to implement a safeguard so that it only returns data for the end user named in the ticket.

The infrastructure provides a central user identity service that issues these end-user context tickets. The identity service verifies the end-user login and then issues a user credential, such as a cookie or OAuth token, to the user's device. Every subsequent request from the device to our infrastructure must present that end-user credential.

When a service receives an end-user credential, the service passes the credential to the identity service for verification. If the end-user credential is verified, the identity service returns a short-lived end-user context ticket that can be used for RPCs related to the user's request. In our example, the service that gets the end-user context ticket is Gmail, which passes the ticket to Google Contacts. From that point on, for any cascading calls, the calling service can send the end-user context ticket to the callee as a part of the RPC.
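The ticket flow above can be sketched with an HMAC-signed, expiring token. This Python example is a hypothetical simplification: the ticket format, the shared `IDENTITY_SERVICE_KEY`, the five-minute lifetime, and the `get_contacts` handler are assumptions, not the actual ticket design. It shows the two safeguards the text describes: the ticket is short-lived, and the callee returns data only for the user named in it.

```python
import hashlib
import hmac
import json
from typing import List

# Hypothetical signing key held by the central identity service. To keep the sketch
# short, callees verify with the same key; a real system might use asymmetric signatures.
IDENTITY_SERVICE_KEY = b"demo-key-not-for-production"
TICKET_TTL_S = 300  # assumed lifetime for a "short-lived" ticket


def _sign(body: str) -> str:
    return hmac.new(IDENTITY_SERVICE_KEY, body.encode(), hashlib.sha256).hexdigest()


def issue_ticket(user_id: str, now: float) -> str:
    """Identity service: after verifying the user's credential, mint a scoped ticket."""
    body = json.dumps({"user": user_id, "exp": now + TICKET_TTL_S}, sort_keys=True)
    return body + "." + _sign(body)


def verify_ticket(ticket: str, now: float) -> str:
    """Return the end user named in a valid, unexpired ticket, or raise."""
    body, _, sig = ticket.rpartition(".")  # the hex signature contains no "."
    if not hmac.compare_digest(sig, _sign(body)):
        raise PermissionError("bad ticket signature")
    claims = json.loads(body)
    if now >= claims["exp"]:
        raise PermissionError("ticket expired")
    return claims["user"]


CONTACTS = {"alice": ["bob@example.com"], "carol": ["dave@example.com"]}


def get_contacts(ticket: str, requested_user: str, now: float) -> List[str]:
    """Contacts service: only return data for the end user named in the ticket."""
    if verify_ticket(ticket, now) != requested_user:
        raise PermissionError("ticket does not cover this user")
    return CONTACTS.get(requested_user, [])
```

In this model, Gmail forwards the ticket string with its RPC, and the Contacts handler refuses requests for any other user even though the calling service itself is authenticated.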

The following diagram shows how Service A and Service B communicate. The infrastructure provides service identity, automatic mutual authentication, encrypted inter-service communication, and enforcement of the access policies that are defined by the service owner. Each service has a service configuration, which the service owner creates. For encrypted inter-service communication, automatic mutual authentication uses caller and callee identities. Communication is only possible when an access rule configuration permits it.

Diagram that shows how Service A and Service B communicate.


Access management of end-user data in Google Cloud

Similar to Access management of end-user data in Google Workspace, the infrastructure provides a central user identity service that authenticates service accounts and issues end-user context tickets after a service account is authenticated. Access management between Google Cloud services is typically done with service agents rather than using end-user context tickets.

Google Cloud uses Identity and Access Management (IAM) and context-aware products such as Identity-Aware Proxy to let you manage access to the resources in your Google Cloud organization. Requests to Google Cloud services go through IAM to verify permissions.

The access management process is as follows:

  1. A request comes in through the Google Front End service or the Cloud Front End service for customer VMs.
  2. The request is routed to the central user identity service that completes the authentication check and issues the end-user context tickets.
  3. The request is also routed through additional checks, such as verification of IAM permissions.
  4. After all of these checks pass, the Google Cloud backend services are called.

For information about access management in Google Cloud, see IAM overview.

Secure data storage

This section describes how we implement security for data that is stored on the infrastructure.

Encryption at rest

Google's infrastructure provides various storage services and distributed file systems (for example, Spanner and Colossus), and a central key management service. Applications at Google access physical storage by using storage infrastructure. We use several layers of encryption to protect data at rest. By default, the storage infrastructure encrypts user data before the user data is written to physical storage.

The infrastructure performs encryption at the application or storage infrastructure layer. The keys for this encryption are managed and owned by Google. Encryption lets the infrastructure isolate itself from potential threats at the lower levels of storage, such as malicious disk firmware. Where applicable, we also enable hardware encryption support in our hard drives and SSDs, and we meticulously track each drive through its lifecycle. Before a decommissioned, encrypted storage device can physically leave our custody, the device is cleaned by using a multi-step process that includes two independent verifications. Devices that do not pass this cleaning process are physically destroyed (that is, shredded) on-premises.

In addition to the encryption done by the infrastructure with Google-owned and Google-managed encryption keys, Google Cloud and Google Workspace provide key management services for keys that you can own and manage. For Google Cloud, Cloud KMS is a cloud service that lets you create your own cryptographic keys, including hardware-based FIPS 140-3 Level 3 certified keys. These keys are specific to you, not to the Google Cloud service, and you can manage the keys according to your policies and procedures. For Google Workspace, you can use client-side encryption. For more information, see Client-side encryption and strengthened collaboration in Google Workspace.

Deletion of data

Deletion of cryptographic material or data typically starts with marking specific keys or data as scheduled for deletion. The process for marking data for deletion takes into account the service-specific policies and the customer’s specific policies.

By scheduling the data for deletion or disabling the keys first, we can recover from unintentional deletions, whether the deletions are customer-initiated, are due to a bug, or are the result of an internal process error.

When an end user deletes their account, the infrastructure notifies the services that are handling the end-user data that the account has been deleted. The services can then schedule the data that is associated with the deleted end-user account for deletion. This feature enables an end user to control their own data.
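Scheduled deletion and account-deletion notification can be combined in one small model. In this hypothetical Python sketch, each service keeps its own `ServiceStore`, an account deletion fans out to every store, and data is purged only after an assumed 30-day window, so a mistaken deletion can still be restored. The class, the window length, and the fan-out function are illustrative assumptions; actual windows depend on service-specific and customer-specific policies.

```python
from dataclasses import dataclass, field
from typing import Dict, List

GRACE_PERIOD_S = 30 * 24 * 3600  # assumed recovery window; real windows depend on policy


@dataclass
class ServiceStore:
    """Hypothetical per-service store that schedules deletion instead of deleting at once."""

    data: Dict[str, str] = field(default_factory=dict)
    scheduled: Dict[str, float] = field(default_factory=dict)  # key -> purge time

    def schedule_deletion(self, key: str, now: float) -> None:
        if key in self.data:
            self.scheduled[key] = now + GRACE_PERIOD_S

    def restore(self, key: str) -> bool:
        # Undo an unintentional deletion while it is still inside the window.
        return self.scheduled.pop(key, None) is not None

    def purge(self, now: float) -> None:
        # Permanently remove data whose window has elapsed.
        for key, purge_at in list(self.scheduled.items()):
            if now >= purge_at:
                del self.data[key]
                del self.scheduled[key]


def delete_account(user_id: str, services: List[ServiceStore], now: float) -> None:
    """Notify every service that holds the user's data; each schedules its own deletion."""
    for store in services:
        store.schedule_deletion(user_id, now)
```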

For more information, see Data deletion on Google Cloud. For information about how to use Cloud Key Management Service to disable your own keys, see Destroy and restore key versions.

Secure internet communication

This section describes how we secure communication between the internet and the services that run on Google infrastructure.

As discussed in Hardware design and provenance, the infrastructure consists of many physical machines that are interconnected over the LAN and WAN. The security of inter-service communication is not dependent on the security of the network. However, we isolate our infrastructure from the internet into a private IP address space. We only expose a subset of the machines directly to external internet traffic so that we can implement additional protections such as defenses against denial of service (DoS) attacks.

Google Front End service

When a service must make itself available on the internet, it can register itself with an infrastructure service called the Google Front End (GFE). The GFE ensures that all TLS connections are terminated with correct certificates and by following best practices such as supporting perfect forward secrecy. The GFE also applies protections against DoS attacks. The GFE then forwards requests for the service by using the RPC security protocol discussed in Access management of end-user data in Google Workspace.

In effect, any internal service that must publish itself externally uses the GFE as a smart reverse-proxy frontend. The GFE provides public IP address hosting of its public DNS name, DoS protection, and TLS termination. GFEs run on the infrastructure like any other service and can scale to match incoming request volumes.

When customer VMs in Google Cloud VPC networks access Google APIs and services that are hosted directly on Borg, the customer VMs communicate with specific GFEs that are called Cloud Front Ends. To minimize latency, Cloud Front Ends are located within the same cloud region as the customer VM. Network routing between customer VMs and Cloud Front Ends doesn’t require that the customer VMs have external IP addresses. When Private Google Access is enabled, customer VMs with only internal IP addresses can communicate with the external IP addresses for Google APIs and services using Cloud Front Ends. All network routing between customer VMs, Google APIs, and services depends on next hops within Google's production network, even for customer VMs that have external IP addresses.

DoS protection

The scale of our infrastructure enables it to absorb many DoS attacks. To further reduce the risk of DoS impact on services, we have multi-tier, multi-layer DoS protections.

When our fiber-optic backbone delivers an external connection to one of our data centers, the connection passes through several layers of hardware and software load balancers. These load balancers report information about incoming traffic to a central DoS service running on the infrastructure. When the central DoS service detects a DoS attack, the service can configure the load balancers to drop or throttle traffic associated with the attack.

The GFE instances also report information about the requests that they are receiving to the central DoS service, including application-layer information that the load balancers don't have access to. The central DoS service can then configure the GFE instances to drop or throttle attack traffic.
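The report-and-configure loop between the enforcement points and the central DoS service can be sketched as follows. This Python model is hypothetical and deliberately simple: sources are keyed by IP address, the threshold is invented, and only dropping is modeled (throttling, for example with a token bucket, is omitted). It shows why aggregation matters: no single reporter sees enough traffic to cross the threshold, but the central service does.

```python
from collections import Counter
from typing import Set

THRESHOLD_RPS = 1000  # hypothetical per-source request rate that triggers mitigation


class CentralDoSService:
    """Aggregates traffic reports from many enforcement points and derives drop rules."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def report(self, source: str, requests_last_second: int) -> None:
        # Several load balancers or GFE instances can each see part of one attack.
        self.counts[source] += requests_last_second

    def drop_rules(self) -> Set[str]:
        return {src for src, n in self.counts.items() if n > THRESHOLD_RPS}


class EnforcementPoint:
    """A load balancer or GFE instance that applies rules pushed by the central service."""

    def __init__(self, dos: CentralDoSService) -> None:
        self.dos = dos
        self.blocked: Set[str] = set()

    def refresh_rules(self) -> None:
        self.blocked = self.dos.drop_rules()

    def admit(self, source: str) -> bool:
        return source not in self.blocked
```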

User authentication

After DoS protection, the next layer of defense for secure communication comes from the central identity service. End users interact with this service through the Google login page. The service asks for a username and password, and it can also challenge users for additional information based on risk factors. Example risk factors include whether the users have logged in from the same device or from a similar location in the past. After authenticating the user, the identity service issues credentials such as cookies and OAuth tokens that can be used for subsequent calls.
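A risk-based challenge of this kind can be sketched as a scoring function over sign-in context. The following Python example is hypothetical: it uses only the two risk factors named in the text (a previously seen device and a similar location, simplified here to the same country), and the scoring and threshold are assumptions. Production systems weigh many more signals.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class LoginAttempt:
    user: str
    device_id: str
    country: str


# Hypothetical record of devices and locations each user has signed in from before.
HISTORY: Dict[str, Dict[str, Set[str]]] = {
    "alice": {"devices": {"laptop-1"}, "countries": {"CH"}},
}


def risk_score(attempt: LoginAttempt) -> int:
    seen = HISTORY.get(attempt.user, {"devices": set(), "countries": set()})
    score = 0
    if attempt.device_id not in seen["devices"]:
        score += 1  # unfamiliar device
    if attempt.country not in seen["countries"]:
        score += 1  # unfamiliar location
    return score


def needs_extra_challenge(attempt: LoginAttempt, password_ok: bool) -> bool:
    """After a correct password, decide whether to ask for additional information."""
    if not password_ok:
        raise PermissionError("invalid username or password")
    return risk_score(attempt) > 0
```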

When users sign in, they can use second factors such as OTPs or phishing-resistant security keys such as the Titan Security Key. The Titan Security Key is a physical token that supports the FIDO Universal 2nd Factor (U2F) standard. We helped develop the U2F open standard with the FIDO Alliance. Most web platforms and browsers have adopted this open authentication standard.

Operational security

This section describes how we develop infrastructure software, protect our employees' machines and credentials, and defend against threats to the infrastructure from both insiders and external actors.

Safe software development

Besides the source control protections and two-party review process described in Source code protections, we use libraries that prevent developers from introducing certain classes of security bugs. For example, we have libraries and frameworks that help eliminate XSS vulnerabilities in web apps. We also use automated tools such as fuzzers, static analysis tools, and web security scanners to automatically detect security bugs.

As a final check, we use manual security reviews that range from quick triages for less risky features to in-depth design and implementation reviews for the riskiest features. The team that conducts these reviews includes experts across web security, cryptography, and operating system security. The reviews can lead to the development of new security library features and new fuzzers that we can use for future products.

In addition, we run a Vulnerability Rewards Program that rewards anyone who discovers and informs us of bugs in our infrastructure or applications. For more information about this program, including the rewards that we've given, see Bug hunters key stats.

We also invest in finding zero-day exploits and other security issues in the open source software that we use. We run Project Zero, which is a team of Google researchers who are dedicated to researching zero-day vulnerabilities, including Spectre and Meltdown. In addition, we are the largest submitter of CVEs and security bug fixes for the Linux KVM hypervisor.

Source code protections

Our source code is stored in repositories with built-in source integrity and governance, where both current and past versions of the service can be audited. The infrastructure requires that a service's binaries be built from specific source code, after it is reviewed, checked in, and tested. Binary Authorization for Borg (BAB) is an internal enforcement check that happens when a service is deployed. BAB does the following:

  • Ensures that the production software and configuration that is deployed at Google is reviewed and authorized, particularly when that code can access user data.
  • Ensures that code and configuration deployments meet certain minimum standards.
  • Limits the ability of an insider or adversary to make malicious modifications to source code and also provides a forensic trail from a service back to its source.
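Deploy-time enforcement of this kind is often described in terms of build provenance. The following Python sketch shows the general idea under stated assumptions; it is not BAB's implementation, which is internal. `REVIEWED_COMMITS`, `BUILD_RECORDS`, and `trusted_build` are hypothetical stand-ins for a code review system and a trusted builder, and binaries are identified by SHA-256 digest.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BuildRecord:
    source_commit: str
    binary_sha256: str


# Hypothetical stores: commits that passed code review, and provenance records
# written only by the trusted build system (keyed by binary digest).
REVIEWED_COMMITS = {"a1b2c3"}
BUILD_RECORDS: Dict[str, BuildRecord] = {}


def trusted_build(commit: str, binary: bytes) -> bytes:
    """Stand-in for the trusted builder: record which commit produced which binary."""
    digest = hashlib.sha256(binary).hexdigest()
    BUILD_RECORDS[digest] = BuildRecord(commit, digest)
    return binary


def admit_deployment(binary: bytes) -> str:
    """Deploy-time check: the binary must trace back to reviewed, checked-in source."""
    digest = hashlib.sha256(binary).hexdigest()
    record = BUILD_RECORDS.get(digest)
    if record is None:
        raise PermissionError("no provenance: binary was not produced by the trusted builder")
    if record.source_commit not in REVIEWED_COMMITS:
        raise PermissionError(f"commit {record.source_commit} was not reviewed")
    return record.source_commit  # forensic trail from the running service back to source
```

A locally modified binary has a different digest and therefore no provenance record, which is what limits an insider's ability to slip in changes that bypassed review.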

Keeping employee devices and credentials safe

We implement safeguards to help protect our employees' devices and credentials from compromise. To help protect our employees against sophisticated phishing attempts, we have replaced OTP second-factor authentication with the mandatory use of U2F-compatible security keys.

We monitor the client devices that our employees use to operate our infrastructure. We ensure that the operating system images for these devices are up to date with security patches and we control the applications that employees can install on their devices. We also have systems that scan user-installed applications, downloads, browser extensions, and web browser content to determine whether they are suitable for corporate devices.

Being connected to the corporate LAN is not our primary mechanism for granting access privileges. Instead, we use zero-trust security to help protect employee access to our resources. Access-management controls at the application level expose internal applications to employees only when employees use a managed device and are connecting from expected networks and geographic locations. A client device is trusted based on a certificate that's issued to the individual machine, and based on assertions about its configuration (such as up-to-date software). For more information, see BeyondCorp.
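The access decision described above combines user identity, device identity, device state, and connection context. The following Python sketch is a hypothetical simplification of that idea, not the BeyondCorp implementation: the certificate inventory, network names, and country list are invented, and certificate validation is reduced to a serial-number lookup. The network is checked against an allowlist like any other signal rather than being treated as a source of trust on its own.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical inventory and policy data.
ISSUED_DEVICE_CERTS = {"SN-1001"}  # serials of certificates issued to managed machines
EXPECTED_NETWORKS = {"corp-wifi", "home-broadband"}
EXPECTED_COUNTRIES = {"US", "IE"}


@dataclass(frozen=True)
class DeviceContext:
    cert_serial: Optional[str]  # machine certificate presented by the device, if any
    os_patched: bool  # assertion about the device's configuration
    network: str
    country: str


def grant_access(user_authenticated: bool, device: DeviceContext) -> bool:
    """Per-request decision; being on the corporate LAN grants nothing by itself."""
    return (
        user_authenticated
        and device.cert_serial in ISSUED_DEVICE_CERTS
        and device.os_patched
        and device.network in EXPECTED_NETWORKS
        and device.country in EXPECTED_COUNTRIES
    )
```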

Reducing insider risk

Insider risk is the potential for a current or former employee, contractor, or other business partner who has or had authorized access to our network, system, or data to misuse that access to undermine the confidentiality, integrity, or availability of our information or information systems.

To help reduce insider risk, we limit and actively monitor the activities of employees who have been granted administrative access to the infrastructure. We continually work to eliminate the need for privileged access for particular tasks by using automation that can accomplish the same tasks in a safe and controlled way. We expose limited APIs that allow debugging without exposing sensitive data, and we require two-party approvals for certain sensitive actions performed by human operators.

Google employee access to end-user information can be logged through low-level infrastructure hooks. Our security team monitors access patterns and investigates unusual events. For more information, see Privileged access in Google Cloud.

We use Binary Authorization for Borg to help protect our supply chain from insider risk. In addition, our investment in BeyondProd helps to protect user data in Google infrastructure and to establish trust in our services.

In Google Cloud, you can monitor access to your data using Access Transparency. Access Transparency logs let you verify that Google personnel are accessing your content only for valid business reasons, such as fixing an outage or attending to your support requests. Access Approval ensures that Cloud Customer Care and engineering require your explicit approval when they need to access your data. The approval is cryptographically verified to ensure the integrity of the access approval.

For more information about production service protections, see How Google protects its production services.

Threat monitoring

The Threat Analysis Group at Google monitors threat actors and the evolution of their tactics and techniques. The goals of this group are to help improve the safety and security of Google products and share this intelligence for the benefit of the online community.

For Google Cloud, you can use Google Cloud Threat Intelligence for Google Security Operations and VirusTotal to monitor and respond to many types of malware. Google Cloud Threat Intelligence for Google Security Operations is a team of threat researchers who develop threat intelligence for use with Google Security Operations. VirusTotal is a malware database and visualization solution that you can use to better understand how malware operates within your enterprise.

For more information about our threat monitoring activities, see the Threat Horizons report.

Intrusion detection

We use sophisticated data processing pipelines to integrate host-based signals on individual devices, network-based signals from various monitoring points in the infrastructure, and signals from infrastructure services. Rules and machine intelligence built on top of these pipelines give operational security engineers warnings of possible incidents. Our investigation and incident-response teams triage, investigate, and respond to these potential incidents 24 hours a day, 365 days a year. We conduct Red Team exercises to measure and improve the effectiveness of our detection and response mechanisms.
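Rule-based detection over merged signals can be sketched as grouping events by entity and evaluating correlation rules on each group. The following Python example is hypothetical: the signal fields, the example rule, and the alert format are invented, and the machine-intelligence component mentioned in the text is not modeled. It shows why integrating sources helps: neither signal alone triggers the rule for `m1`, but together they do.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Signal:
    source: str  # "host", "network", or "service"
    entity: str  # machine or account the signal is about
    kind: str


# A rule inspects all signals for one entity and returns an alert message or None.
Rule = Callable[[List[Signal]], Optional[str]]


def unsigned_binary_with_unusual_egress(signals: List[Signal]) -> Optional[str]:
    """Hypothetical rule correlating a host signal with a network signal."""
    seen = {(s.source, s.kind) for s in signals}
    if ("host", "unsigned_binary_executed") in seen and ("network", "unusual_egress") in seen:
        return "unsigned binary executed and unusual egress observed"
    return None


def correlate(signals: List[Signal], rules: List[Rule]) -> List[str]:
    """Group signals by entity, then run every rule over each group."""
    by_entity: Dict[str, List[Signal]] = defaultdict(list)
    for s in signals:
        by_entity[s.entity].append(s)
    alerts = []
    for entity in sorted(by_entity):
        for rule in rules:
            message = rule(by_entity[entity])
            if message is not None:
                alerts.append(f"{entity}: {message}")
    return alerts
```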


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.