Confidential computing for data analytics, AI, and federated learning

This document provides a general overview of confidential computing, including how you can use it for secure data collaboration, AI model training, and federated learning. The document also provides information about the Confidential Computing services in Google Cloud and architecture references for different use cases.

This document is intended to help technology executives understand the business potential of confidential computing with generative AI and applied AI across various industries, including financial services and healthcare.

What is confidential computing?

Data security practices have conventionally centered on protecting data at rest and in transit through encryption. Confidential computing adds a new layer of protection by addressing the vulnerability of data during its active use. This technology ensures that sensitive information remains confidential even as it is being processed, thus helping to close a critical gap in data security.

A confidential computing environment implements the protection of data in use with a hardware-based trusted execution environment (TEE). A TEE is a secure area within a processor that protects the confidentiality and integrity of code and data loaded inside it. A TEE acts as a safe room for sensitive operations, which mitigates risk to data even if the system is compromised. With confidential computing, data can be kept encrypted in memory during processing.

For example, you can use confidential computing for data analytics and machine learning to help achieve the following:

  • Enhanced privacy: Perform analysis on sensitive datasets (for example, medical records or financial data) without exposing the data to the underlying infrastructure or the parties that are involved in the computation.
  • Secure collaboration: Jointly train machine learning models or perform analytics on the combined datasets of multiple parties without revealing individual data to each other. Confidential computing fosters trust and enables the development of more robust and generalizable models, particularly in sectors like healthcare and finance.
  • Improved data security: Mitigate the risk of data breaches and unauthorized access, ensuring compliance with data protection regulations, such as the General Data Protection Regulation (GDPR) or the Health Insurance Portability and Accountability Act (HIPAA).
  • Increased trust and transparency: Provide verifiable proof that computations are performed on the intended data and in a secure environment, increasing trust among stakeholders.

How a confidential computing environment works

Confidential computing environments have the following properties:

  • Runtime encryption: The processor keeps all confidential computing environment data encrypted in memory. Any system component or hardware attacker that attempts to read confidential computing environment data directly from memory only sees encrypted data. Likewise, encryption prevents the modification of confidential computing environment data through direct access to memory.
  • Isolation: The processor blocks software-based access to the confidential computing environment. The operating system and other applications can only communicate with the confidential computing environment over specific interfaces.
  • Attestation: In the context of confidential computing, attestation verifies the trustworthiness of the confidential computing environment. Using attestation, users can see the evidence that confidential computing is safeguarding their data because attestation lets you authenticate the TEE instance.

    During the attestation process, the CPU chip that supports the TEE produces a cryptographically signed report (known as an attestation report) of the measurement of the instance. The measurement is then sent to an attestation service. An attestation for process isolation authenticates an application. An attestation for VM isolation authenticates a VM, the virtual firmware that is used to launch the VM, or both. (A minimal sketch of this flow follows this list.)

  • Data lifecycle security: Confidential computing creates a secure processing environment to provide hardware-backed protection for data in use.
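To make the attestation flow concrete, here is a minimal Python sketch of the pattern: the TEE measures its workload, signs the measurement with an attestation key, and a verifier checks the signature and the expected measurement before releasing data. The report format, the workload bytes, and the use of a locally generated Ed25519 key are illustrative assumptions; real TEEs (such as AMD SEV-SNP or Intel TDX) produce vendor-specific reports signed by hardware-rooted keys with certificate chains.

```python
# Illustrative attestation flow; all names and formats are assumptions,
# not any vendor's actual report layout.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# --- Inside the TEE: measure the workload and sign the report. ---
attestation_key = Ed25519PrivateKey.generate()  # stand-in for a hardware-rooted key
workload = b"container image + launch configuration"
measurement = hashlib.sha256(workload).digest()
report = b"MEASUREMENT:" + measurement
signature = attestation_key.sign(report)

# --- At the attestation service: verify before releasing data or keys. ---
public_key = attestation_key.public_key()  # normally taken from a vendor cert chain
expected = b"MEASUREMENT:" + hashlib.sha256(workload).digest()
try:
    public_key.verify(signature, report)  # the report really came from the TEE
    if report != expected:                # the TEE is running the expected code
        raise ValueError("unexpected measurement")
    print("Attestation verified: release data to the TEE.")
except (InvalidSignature, ValueError):
    print("Attestation failed: do not release data.")
```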

Confidential computing technology

The following technologies enable confidential computing:

  • Secure enclaves, also known as application-based confidential computing
  • Confidential VMs and GPUs, also known as VM-based confidential computing

Google Cloud uses Confidential VM to enable confidential computing. For more information, see Implement confidential computing on Google Cloud.

Secure enclaves

A secure enclave is a computing environment that isolates code and data from the operating system using hardware-based isolation, or that isolates an entire VM by placing the hypervisor within the trusted computing base (TCB). Secure enclaves are designed to ensure that even users with physical or root access to the machines and operating system can't learn the contents of secure enclave memory or tamper with the execution of code inside the enclave. An example of a secure enclave is Intel Software Guard Extensions (SGX).

Confidential VMs and confidential GPUs

A confidential VM is a type of VM that uses hardware-based memory encryption to help protect data and applications. Confidential VM offers isolation and attestation to improve security. Confidential VM computing technologies include AMD SEV, AMD SEV-SNP, Intel TDX, Arm CCA, IBM Z, IBM LinuxONE, and NVIDIA Confidential GPU.

Confidential GPUs help protect data and accelerate computing, especially in cloud and shared environments. They use hardware-based encryption and isolation techniques to help protect data while it's being processed on the GPU, ensuring that even the cloud provider or malicious actors cannot access sensitive information.

Confidential data analytics, AI, and federated learning use cases

The following sections provide examples of confidential computing use cases for various industries.

Healthcare and life sciences

Confidential computing enables secure data sharing and analysis across organizations while preserving patient privacy. Confidential computing lets healthcare organizations participate in collaborative research, disease modeling, drug discovery, and personalized treatment plans.

The following list describes some example uses for confidential computing in healthcare.

  • Disease prediction and early detection: Hospitals train a federated learning model to detect cancerous lesions from medical imaging data (for example, MRI scans or CT scans across multiple hospitals or hospital regions) while maintaining patient confidentiality.
  • Real-time patient monitoring: Health care providers analyze data from wearable health devices and mobile health apps for real-time monitoring and alerts. For example, wearable devices collect data on glucose levels, physical activity, and dietary habits to provide personalized recommendations and early warnings for blood sugar fluctuations.
  • Collaborative drug discovery: Pharmaceutical companies train models on proprietary datasets to accelerate drug discovery, enhancing collaboration while protecting intellectual property.

Financial services

Confidential computing lets financial institutions create a more secure and resilient financial system.

The following list describes some example uses for confidential computing in financial services.

  • Financial crimes: Financial institutions can collaborate on anti-money laundering (AML) or general fraud model efforts by sharing information about suspicious transactions while protecting customer privacy. Using confidential computing, institutions can analyze this shared data in a secure manner, and train the models to identify and disrupt complex money laundering schemes more effectively.
  • Privacy-preserving credit risk assessment: Lenders can assess credit risk using a wider range of data sources, including data from other financial institutions or even non-financial entities. Using confidential computing, lenders can access and analyze this data without exposing it to unauthorized parties, enhancing the accuracy of credit scoring models while maintaining data privacy.
  • Privacy-preserving pricing discovery: In the financial world, especially in areas like over-the-counter markets or illiquid assets, accurate pricing is crucial. Confidential computing lets multiple institutions calculate accurate prices collaboratively, without revealing their sensitive data to each other.

Public sector

Confidential computing lets governments create more transparent, efficient, and effective services, while retaining control and sovereignty of their data.

The following list describes some example uses for confidential computing in the public sector.

  • Digital sovereignty: Confidential computing ensures that data is always encrypted, even while being processed. It enables secure cloud migrations of citizens' data, with data being protected even when hosted on external infrastructure, across hybrid, public, or multi-cloud environments. Confidential computing supports and empowers digital sovereignty and digital autonomy, with additional data control and protection for data in use so that encryption keys are not accessible by the cloud provider.
  • Multi-agency confidential analytics: Confidential computing enables multi-party data analytics across multiple government agencies (for example, health, tax, and education), or across multiple governments in different regions or countries. Confidential computing helps ensure that trust boundaries and data privacy are protected, while enabling data analytics (using data loss prevention (DLP), large-scale analytics, and policy engines) and AI training and serving.
  • Trusted AI: Government data is critical and can be used to train private AI models in a trusted way to improve internal services as well as citizen interactions. Confidential computing allows for trusted AI frameworks, with confidential prompting or confidential retrieval-augmented generation (RAG) training to keep citizen data and models private and secure.

Supply chain

Confidential computing lets organizations that manage their supply chain and sustainability goals collaborate and share insights while maintaining data privacy.

The following list describes some example uses for confidential computing in supply chains.

  • Demand forecasting and inventory optimization: With confidential computing, each business trains their own demand forecasting model on their own sales and inventory data. These models are then securely aggregated into a global model, providing a more accurate and holistic view of demand patterns across the supply chain.
  • Privacy-preserving supplier risk assessment: Each organization involved in supplier risk assessment (for example, buyers, financial institutions, and auditors) trains their own risk-assessment model on their own data. These models are aggregated to create a comprehensive and privacy-preserving supplier risk profile, thereby enabling early identification of potential supplier risks, improved supply-chain resilience, and better decision making in supplier selection and management.
  • Carbon footprint tracking and reduction: Confidential computing offers a solution for tackling the challenges of data privacy and transparency in carbon footprint tracking and reduction efforts. Confidential computing lets organizations share and analyze data without revealing its raw form, which empowers organizations to make informed decisions and take effective action towards a more sustainable future.

Digital advertising

Digital advertising has moved away from third-party cookies and towards more privacy-safe alternatives, like Privacy Sandbox. Privacy Sandbox supports critical advertising use cases while limiting cross-site and application tracking. Privacy Sandbox uses TEEs to ensure secure processing of users' data by advertising firms.

You can use TEEs in the following digital advertising use cases:

  • Matching algorithms: Finding correspondences or relationships within datasets.
  • Attribution: Linking effects or events back to their likely causes.
  • Aggregation: Calculating summaries or statistics from the raw data (illustrated in the sketch after this list).
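As a hedged illustration of the aggregation case, the following Python sketch mimics what an aggregation service inside a TEE might do: raw per-user ad events enter the enclave, and only aggregate counts at or above a minimum threshold leave it, so no output can be tied to a single user. The event fields and the threshold value are assumptions for this example, not Privacy Sandbox APIs.

```python
# Toy TEE-side aggregation: suppress groups smaller than MIN_COUNT so that
# released statistics can't identify individual users. Field names are
# illustrative, not a real API.
from collections import Counter

MIN_COUNT = 3  # minimum group size released from the TEE

def aggregate_in_tee(events: list[dict]) -> dict[str, int]:
    """Count conversions per campaign; release only sufficiently large groups."""
    counts = Counter(e["campaign_id"] for e in events if e["converted"])
    return {campaign: n for campaign, n in counts.items() if n >= MIN_COUNT}

events = [
    {"user": "u1", "campaign_id": "spring_sale", "converted": True},
    {"user": "u2", "campaign_id": "spring_sale", "converted": True},
    {"user": "u3", "campaign_id": "spring_sale", "converted": True},
    {"user": "u4", "campaign_id": "niche_promo", "converted": True},  # suppressed
]
print(aggregate_in_tee(events))  # {'spring_sale': 3}
```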

Implement confidential computing on Google Cloud

Google Cloud includes the following services that enable confidential computing:

  • Confidential VM: Enable encryption of data in use for workloads that use VMs
  • Confidential GKE: Enable encryption of data in use for workloads that use containers
  • Confidential Dataflow: Enable encryption of data in use for streaming analytics and machine learning
  • Confidential Dataproc: Enable encryption of data in use for data processing
  • Confidential Space: Enable encryption of data in use for joint data analysis and machine learning

These services let you reduce your trust boundary so that fewer resources have access to your confidential data. For example, in a Google Cloud environment without Confidential Computing, the trust boundary includes the Google Cloud infrastructure (hardware, hypervisor, and host OS) and the guest OS. In a Google Cloud environment that includes Confidential Computing (without Confidential Space), the trust boundary includes only the guest OS and the application. In a Google Cloud environment with Confidential Space, the trust boundary is just the application and its associated memory space. The following table shows how the trust boundary is reduced with Confidential Computing and Confidential Space.

| Elements | Within trust boundary without using Confidential Computing | Within trust boundary when using Confidential Computing | Within trust boundary when using Confidential Space |
| --- | --- | --- | --- |
| Cloud stack and administrators | Yes | No | No |
| BIOS and firmware | Yes | No | No |
| Host OS and hypervisor | Yes | No | No |
| VM guest admin | Yes | Yes | No |
| VM guest OS | Yes | Yes | Yes, measured and attested |
| Applications | Yes | Yes | Yes, measured and attested |
| Confidential data | Yes | Yes | Yes |

Confidential Space creates a secure area within a VM to provide the highest level of isolation and protection for sensitive data and applications. The main security benefits of Confidential Space include the following:

  • Defense in depth: Adds an extra layer of security on top of existing confidential computing technologies.
  • Reduced attack surface: Isolates applications from potential vulnerabilities in the guest OS.
  • Enhanced control: Provides granular control over access and permissions within the secure environment.
  • Stronger trust: Offers higher assurance of data confidentiality and integrity.

Confidential Space is designed for handling highly sensitive workloads, especially in regulated industries or scenarios involving multi-party collaborations where data privacy is paramount.

Architecture references for confidential analytics, AI, and federated learning

You can implement confidential computing on Google Cloud to address the following use cases:

  • Confidential analytics
  • Confidential AI
  • Confidential federated learning

The following sections provide more information about the architecture for these use cases, including examples for financial and healthcare businesses.

Confidential analytics architecture for healthcare institutions

The confidential analytics architecture demonstrates how multiple healthcare institutions (such as providers, biopharmaceutical companies, and research institutions) can work together to accelerate drug research. This architecture uses confidential computing techniques to create a digital clean room for running confidential collaborative analytics.

This architecture has the following benefits:

  • Enhanced insights: Collaborative analytics lets health organizations gain broader insights and decrease time to market for enhanced drug discovery.
  • Data privacy: Sensitive patient data remains encrypted and is never exposed to other participants outside the TEE, ensuring confidentiality.
  • Regulatory compliance: The architecture helps health institutions comply with data protection regulations by maintaining strict control over their data.
  • Trust and collaboration: The architecture enables secure collaboration between competing institutions, fostering a collective effort to discover drugs.

The following diagram shows this architecture.

Diagram of confidential analytics architecture for healthcare institutions.

The key components in this architecture include the following:

  • TEE OLAP aggregation server: A secure, isolated environment where machine learning model training and inference occur. Data and code within the TEE are protected from unauthorized access, even from the underlying operating system or cloud provider.
  • Collaboration partners: Each participating health institution has a local environment that acts as an intermediary between the institution's private data and the TEE.
  • Provider-specific encrypted data: Each healthcare institution stores its own private, encrypted patient data that includes electronic health records. This data remains encrypted during the analytics process, which ensures data privacy. The data is only released to the TEE after validating the attestation claims from the individual providers.
  • Analytics client: Participating health institutions can run confidential queries against their data to gain immediate insights.

Confidential AI architecture for financial institutions

This architectural pattern demonstrates how financial institutions can collaboratively train a fraud detection model on transaction data with fraud labels while preserving the confidentiality of that data. The architecture uses confidential computing techniques to enable secure, multi-party machine learning.

This architecture has the following benefits:

  • Enhanced fraud detection: Collaborative training uses a larger, more diverse dataset, leading to a more accurate and effective fraud detection model.
  • Data privacy: Sensitive transaction data remains encrypted and is never exposed to other participants outside the TEE, ensuring confidentiality.
  • Regulatory compliance: The architecture helps financial institutions comply with data protection regulations by maintaining strict control over their data.
  • Trust and collaboration: This architecture enables secure collaboration between competing institutions, fostering a collective effort to combat financial fraud.

The following diagram shows this architecture.

Diagram of confidential AI architecture for financial institutions.

The key components of this architecture include the following:

  • TEE OLAP aggregation server: A secure, isolated environment where machine learning model training and inference occur. Data and code within the TEE are protected from unauthorized access, even from the underlying operating system or cloud provider.
  • TEE model training: The global fraud base model is packaged as containers to run the ML training. Within the TEE, the global model is further trained using the encrypted data from all participating banks. The training process employs techniques like federated learning or secure multi-party computation to ensure that no raw data is exposed.
  • Collaboration partners: Each participating financial institution has a local environment that acts as an intermediary between the institution's private data and the TEE.
  • Bank-specific encrypted data: Each bank holds its own private, encrypted transaction data that includes fraud labels. This data remains encrypted throughout the entire process, ensuring data privacy. The data is only released to the TEE after validating the attestation claims from individual banks (a sketch of this key-release pattern follows the list).
  • Model repository: A pre-trained fraud detection model that serves as the starting point for collaborative training.
  • Global fraud trained model and weights (symbolized by the green line): The improved fraud detection model, along with its learned weights, is securely exchanged back to the participating banks. They can then deploy this enhanced model locally for fraud detection on their own transactions.
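The following Python sketch illustrates the attestation-gated key release mentioned for the bank-specific encrypted data: a bank's key broker hands out the decryption key only when the presented attestation claims match what the bank expects. The BankKeyBroker class, the claim fields, and the use of Fernet symmetric encryption are hypothetical stand-ins for a real key-management service and verified TEE attestation.

```python
# Hypothetical key broker that releases a bank's data key only to a TEE whose
# attestation claims match an allowlist. Claim names are illustrative.
from cryptography.fernet import Fernet

EXPECTED_CLAIMS = {"image_digest": "sha256:abc123", "confidential_space": True}

class BankKeyBroker:
    """Holds a bank's data-encryption key; releases it only to verified TEEs."""

    def __init__(self) -> None:
        self.key = Fernet.generate_key()

    def release_key(self, attestation_claims: dict) -> bytes:
        if attestation_claims != EXPECTED_CLAIMS:
            raise PermissionError("attestation claims rejected")
        return self.key

broker = BankKeyBroker()
ciphertext = Fernet(broker.key).encrypt(b"txn_id,amount,fraud_label\n42,199.99,1")

# Inside the TEE: present (already verified) claims, obtain the key, decrypt, train.
key = broker.release_key({"image_digest": "sha256:abc123", "confidential_space": True})
print(Fernet(key).decrypt(ciphertext).decode())
```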

Confidential federated learning architecture for financial institutions

Federated learning offers an advanced solution for customers who value stringent data privacy and data sovereignty. The confidential federated learning architecture provides a secure, scalable, and efficient way to use data for AI applications. This architecture brings the models to the location where the data is stored, rather than centralizing the data in a single location, thereby reducing the risks associated with data leakage.

This architectural pattern demonstrates how multiple financial institutions can collaboratively train a fraud detection model while preserving the confidentiality of their sensitive transaction data with fraud labels. It uses federated learning along with confidential computing techniques to enable secure, multi-party machine learning without training data movement.

This architecture has the following benefits:

  • Enhanced data privacy and security: Federated learning enables data privacy and data locality by ensuring that sensitive data remains at each site. Additionally, financial institutions can use privacy-preserving techniques such as homomorphic encryption and differential privacy filters to further protect any transferred data (such as the model weights).
  • Improved accuracy and diversity: By training with a variety of data sources across different clients, financial institutions can develop a robust and generalizable global model to better represent heterogeneous datasets.
  • Scalability and network efficiency: With the ability to perform training at the edge, institutions can scale federated learning across the globe. Additionally, institutions only need to transfer the model weights rather than entire datasets, which enables efficient use of network resources.

The following diagram shows this architecture.

Diagram of confidential federated learning architecture.

The key components of this architecture include the following:

  • Federated server in the TEE cluster: A secure, isolated environment where the federated learning server orchestrates the collaboration of multiple clients by first sending an initial model to the federated learning clients. The clients perform training on their local datasets, then send the model updates back to the federated learning server for aggregation to form a global model.
  • Federated learning model repository: A pre-trained fraud detection model that serves as the starting point for federated learning.
  • Local application inference engine: An application that executes tasks, performs local computation and learning with local datasets, and submits results back to the federated learning server for secure aggregation.
  • Local private data: Each bank holds its own private, encrypted transaction data that includes fraud labels. This data remains encrypted throughout the entire process, ensuring data privacy.
  • Secure aggregation protocol (symbolized by the dotted blue line): The federated learning server doesn't need to access any individual bank's update to train the model; it requires only the element-wise weighted averages of the update vectors, taken from a random subset of banks or sites. Using a secure aggregation protocol to compute these weighted averages helps ensure that the server learns only the aggregate of the randomly selected subset, not which bank contributed which update, thereby preserving the privacy of each participant in the federated learning process (a minimal sketch follows this list).
  • Global fraud-trained model and aggregated weights (symbolized by the green line): The improved fraud detection model, along with its learned weights, is securely sent back to the participating banks. The banks can then deploy this enhanced model locally for fraud detection on their own transactions.
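The following minimal Python sketch shows the secure aggregation idea under a toy zero-sum masking scheme, a stand-in for a real cryptographic protocol: each bank masks its weighted update before sending it, the masks cancel when the server sums all contributions, and the server recovers only the element-wise weighted average. The vector values and weights are made up for illustration.

```python
# Toy secure aggregation: masks sum to zero across clients, so the server
# sees only the weighted sum of updates, never any individual update.
import random

def zero_sum_masks(n_clients: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Generate per-client masks that cancel when summed (toy stand-in for a
    cryptographic secure-aggregation protocol)."""
    rng = random.Random(seed)
    masks = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_clients - 1)]
    masks.append([-sum(col) for col in zip(*masks)])  # final mask cancels the rest
    return masks

# Each bank's local model update (for example, a gradient) and example count.
updates = [[0.2, -0.1], [0.4, 0.3], [0.0, 0.6]]
weights = [100, 300, 600]

masks = zero_sum_masks(len(updates), dim=2)
masked = [
    [w * u + m for u, m in zip(update, mask)]
    for update, mask, w in zip(updates, masks, weights)
]

# Server side: sum the masked contributions; the masks cancel out.
totals = [sum(col) for col in zip(*masked)]
average = [t / sum(weights) for t in totals]
print(average)  # element-wise weighted average, approximately [0.14, 0.44]
```

In production protocols, the masks are derived from pairwise key agreement between clients so that they cancel in aggregate without any party ever observing another's update.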
