Use customer-managed encryption keys

By default, Dataflow encrypts customer content at rest. Dataflow handles encryption for you without any additional actions on your part. This option is called Google default encryption.

If you want to control your encryption keys, then you can use customer-managed encryption keys (CMEKs) in Cloud KMS with CMEK-integrated services, including Dataflow. Using Cloud KMS keys gives you control over their protection level, location, rotation schedule, usage and access permissions, and cryptographic boundaries. Using Cloud KMS also lets you track key usage, view audit logs, and control key lifecycles. Instead of Google owning and managing the symmetric key encryption keys (KEKs) that protect your data, you control and manage these keys in Cloud KMS.

After you set up your resources with CMEKs, the experience of accessing your Dataflow resources is similar to using Google default encryption. For more information about your encryption options, see Customer-managed encryption keys (CMEK).

You can create a batch or streaming pipeline that is protected with a CMEK or access CMEK-protected data in sources and sinks.

CMEK with Cloud KMS Autokey

You can either create CMEKs manually to protect your Dataflow resources or use Cloud KMS Autokey. With Autokey, key rings and keys are generated on demand as part of resource creation or update in Dataflow. Service agents that use the keys for encrypt and decrypt operations are created if they don't already exist and are granted the required Identity and Access Management (IAM) roles. For more information, see Autokey overview.

To learn how to use manually created CMEKs to protect your Dataflow resources, see Create a pipeline protected by Cloud KMS.

To use CMEKs created by Cloud KMS Autokey to protect your Dataflow resources, follow the steps for Dataflow in Using Autokey with Dataflow resources to generate a key, and then follow Create a pipeline protected by Cloud KMS.

Cloud KMS quotas and Dataflow

When you use CMEK in Dataflow, your projects can consume Cloud KMS cryptographic request quotas. For example, Dataflow pipelines can consume these quotas when your pipeline accesses CMEK-protected data in sources and sinks or when the state of a CMEK-encrypted pipeline is retrieved. For more information, see the Encryption of pipeline state locations section on this page.

Encryption and decryption operations using CMEK keys affect Cloud KMS quotas in these ways:

  • For software CMEK keys generated in Cloud KMS, no Cloud KMS quota is consumed.
  • For hardware CMEK keys (sometimes called Cloud HSM keys), encryption and decryption operations count against Cloud HSM quotas in the project that contains the key.
  • For external CMEK keys (sometimes called Cloud EKM keys), encryption and decryption operations count against Cloud EKM quotas in the project that contains the key.

For more information, see Cloud KMS quotas.

Support and limitations

  • Cloud KMS is supported in the following Apache Beam SDK versions:

    • Java SDK versions 2.13.0 and later
    • Python SDK versions 2.13.0 and later
    • Go SDK versions 2.40.0 and later
  • Cloud KMS with Dataflow supports regional keys. If you override the worker region or zone of the pipeline to use a region other than the one associated with your keys, regional keys don't work.

  • The region for your CMEK and the region for your Dataflow job must be the same.

  • Multi-region and global locations are not supported. You can't use global and multi-regional keys with Dataflow pipelines.

Encryption of pipeline state artifacts

Data that a Dataflow pipeline reads from user-specified data sources is encrypted, except for the data keys that you specify for key-based transforms in streaming jobs.

For batch jobs, all data, including data keys that you specify for key-based transforms, is always protected by CMEK encryption.

For streaming jobs created after March 7, 2024, all user data is encrypted with CMEK.

For streaming jobs created before March 7, 2024, data keys used in key-based operations, such as windowing, grouping, and joining, are not protected by CMEK encryption. To enable this encryption for your jobs, drain or cancel the job, and then restart it.

Job metadata is not encrypted with Cloud KMS keys. Job metadata includes the following:

  • User-supplied data, such as job names, job parameter values, and the pipeline graph
  • System-generated data, such as job IDs and the IP addresses of workers

Encryption of pipeline state locations

The following storage locations are protected with Cloud KMS keys:

  • Persistent Disks attached to Dataflow workers and used for Persistent Disk-based shuffle and streaming state storage.
  • Dataflow Shuffle state for batch pipelines.
  • Cloud Storage buckets that store temporary export or import data. Dataflow only supports default keys set by the user at the bucket level.
  • Cloud Storage buckets used to store binary files containing pipeline code. Dataflow only supports default keys set by the user at the bucket level.
  • Cloud Storage buckets used to store sampled pipeline data, when data sampling is enabled.
  • Dataflow Streaming Engine state for streaming pipelines.

External keys

You can use Cloud External Key Manager (Cloud EKM) to encrypt data within Google Cloud Platform using external keys that you manage.

When you use a Cloud EKM key, Google has no control over the availability of your externally managed key. If the key becomes unavailable during the job or pipeline creation period, your job or pipeline is canceled.

For more considerations when using external keys, see Cloud External Key Manager.

Before you begin

  1. Verify that you have the Apache Beam SDK for Java 2.13.0 or later, the Apache Beam SDK for Python 2.13.0 or later, or the Apache Beam SDK for Go 2.40.0 or later.

    For more information, see Installing the Apache Beam SDK.

  2. Decide whether you're going to run Dataflow and Cloud KMS in the same Google Cloud Platform project or in different projects. This page uses the following convention:

    • PROJECT_ID is the project ID of the project that is running Dataflow.
    • PROJECT_NUMBER is the project number of the project that is running Dataflow.
    • KMS_PROJECT_ID is the project ID of the project that is running Cloud KMS.

    For information about Google Cloud project IDs and project numbers, see Identifying projects.

  3. In the Google Cloud project where you want to run Cloud KMS:

    1. Enable the Cloud KMS API.
    2. Create a key ring and a key as described in Creating symmetric keys; see the example commands after this list. Cloud KMS and Dataflow are both regionalized services. The region for your CMEK and the region of your Dataflow job must be the same. Don't use global or multi-regional keys with your Dataflow pipelines. Instead, use regional keys.
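For example, the following gcloud CLI commands sketch how you might create a regional key ring and key. KEY_RING and KEY_NAME are hypothetical placeholder names, and REGION is the region that your Dataflow job runs in:

    # Create a key ring in the same region as your Dataflow job.
    gcloud kms keyrings create KEY_RING \
        --project=KMS_PROJECT_ID \
        --location=REGION

    # Create a symmetric encryption key in that key ring.
    gcloud kms keys create KEY_NAME \
        --project=KMS_PROJECT_ID \
        --keyring=KEY_RING \
        --location=REGION \
        --purpose=encryption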

Grant Encrypter/Decrypter permissions

Note: If you use the Google Cloud console and the Create job from template page, a prompt appears to grant the encrypt and decrypt permissions to your Compute Engine service account and Dataflow service account. If you follow that prompt to grant the permissions, you can skip this section, which describes how to manually set these permissions.
  1. Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataflow service account. This grants your Dataflow service account the permission to encrypt and decrypt with the CMEK you specify. If you use the Google Cloud console and the Create job from template page, this permission is granted automatically and you can skip this step.

    Use the Google Cloud CLI to assign the role:

    gcloud projects add-iam-policy-binding KMS_PROJECT_ID \
        --member serviceAccount:service-PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter

    Replace KMS_PROJECT_ID with the ID of your Google Cloud project that is running Cloud KMS, and replace PROJECT_NUMBER with the project number (not project ID) of your Google Cloud project that is running the Dataflow resources.

  2. Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Compute Engine service account. This grants your Compute Engine service account the permission to encrypt and decrypt with the CMEK you specify.

    Use the Google Cloud CLI to assign the role:

    gcloud projects add-iam-policy-binding KMS_PROJECT_ID \
        --member serviceAccount:service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter

    Replace KMS_PROJECT_ID with the ID of your Google Cloud project that is running Cloud KMS, and replace PROJECT_NUMBER with the project number (not project ID) of your Google Cloud project that is running the Compute Engine resources.

Create a pipeline protected by Cloud KMS

When you create a batch or streaming pipeline, you can select a Cloud KMS key to encrypt the pipeline state. The pipeline state is the data that is stored by Dataflow in temporary storage.

Command-line interface

To create a new pipeline with pipeline state that is protected by a Cloud KMS key, add the relevant flag to the pipeline parameters. The following example demonstrates running a word count pipeline with Cloud KMS.

Use Autokey

If you haven't already done so, enable Cloud KMS Autokey.

To use Autokey with pipelines created from the command line, follow Using Autokey with Dataflow resources to provision a key, then use it in place of KMS_KEY.

Java

Dataflow does not support creating default Cloud Storage paths for temporary files when using a Cloud KMS key. Specifying gcpTempLocation is required.

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
    -Dexec.args="--inputFile=gs://dataflow-samples/shakespeare/kinglear.txt \
                 --output=gs://STORAGE_BUCKET/counts \
                 --runner=DataflowRunner --project=PROJECT_ID \
                 --gcpTempLocation=gs://STORAGE_BUCKET/tmp \
                 --dataflowKmsKey=KMS_KEY" \
    -Pdataflow-runner

Python

Dataflow does not support creating default Cloud Storage paths for temporary files when using a Cloud KMS key. Specifying temp_location is required.

python -m apache_beam.examples.wordcount \
    --input gs://dataflow-samples/shakespeare/kinglear.txt \
    --output gs://STORAGE_BUCKET/counts \
    --runner DataflowRunner \
    --region HOST_GCP_REGION \
    --project PROJECT_ID \
    --temp_location gs://STORAGE_BUCKET/tmp/ \
    --dataflow_kms_key=KMS_KEY

Go

Dataflow does not support creating default Cloud Storage paths for temporary files when using a Cloud KMS key. Specifying temp_location is required.

wordcount --project HOST_PROJECT_ID \
    --region HOST_GCP_REGION \
    --runner dataflow \
    --staging_location gs://STORAGE_BUCKET/staging \
    --temp_location gs://STORAGE_BUCKET/temp \
    --input gs://dataflow-samples/shakespeare/kinglear.txt \
    --output gs://STORAGE_BUCKET/output \
    --dataflow_kms_key=KMS_KEY

Google Cloud console

  1. Open the Dataflow monitoring interface.
    Go to the Dataflow Web Interface
  2. Select Create job from template.
  3. In the Encryption section, select Customer-managed key.
Note: The Select a customer-managed key drop-down menu only shows keys whose regional scope is global or matches the region you selected in the Regional endpoint drop-down menu. To minimize Cloud KMS operation latency and improve system availability, we recommend choosing regional keys.

The first time you attempt to run a job with a particular Cloud KMS key, your Compute Engine service account or Dataflow service account might not have been granted the permissions to encrypt and decrypt using that key. In this case, a warning message appears to prompt you to grant the permission to your service account.


Verify Cloud KMS key usage

You can verify whether your pipeline uses a Cloud KMS key by using the Google Cloud console or the Google Cloud CLI.

Console

  1. Open the Dataflow monitoring interface.
    Go to the Dataflow Web Interface
  2. To view job details, select your Dataflow job.
  3. In the Job info side panel, to see the key type, check the Encryption type field. The field shows either "Google-Managed key" or "Customer-Managed key".

CLI

Run the describe command using the gcloud CLI:

gcloud dataflow jobs describe JOB_ID

Search for the line that contains serviceKmsKeyName. This information shows that a Cloud KMS key was used for Dataflow pipeline state encryption.
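For example, to surface just that line, you can pipe the output through grep. This is a sketch that assumes a Unix-like shell; REGION is the region of your job:

    gcloud dataflow jobs describe JOB_ID --region=REGION | grep serviceKmsKeyName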

You can verify Cloud KMS key usage for encrypting sources and sinks by using the Google Cloud console pages and tools of those sources and sinks, including Pub/Sub, Cloud Storage, and BigQuery. You can also verify Cloud KMS key usage by viewing your Cloud KMS audit logs.

Disable or destroy the key

If you need to disable or destroy the key, you can use the Google Cloud console. Both the disable and destroy operations cancel the jobs that use that key. This operation is permanent.

If you're using Cloud EKM, disable or destroy the key in your external key manager.

If you're using the Streaming Engine option, we recommend taking a snapshot of the job before you disable the key.
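As a sketch, assuming the gcloud CLI snapshot commands are available for your job, taking a snapshot might look like the following:

    # Snapshot the streaming job before disabling the key.
    gcloud dataflow snapshots create \
        --job-id=JOB_ID \
        --region=REGION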

Remove Dataflow access to the Cloud KMS key

You can remove Dataflow access to the Cloud KMS key by using the following steps:

  1. Revoke the Cloud KMS CryptoKey Encrypter/Decrypter role from the Dataflow service account by using the Google Cloud console or the gcloud CLI; see the example commands after this list.
  2. Revoke the Cloud KMS CryptoKey Encrypter/Decrypter role from the Compute Engine service account by using the Google Cloud console or the gcloud CLI.
  3. Optionally, you can also destroy the key version material to further prevent Dataflow and other services from accessing the pipeline state.
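For example, with the gcloud CLI, revoking the role from both service accounts mirrors the grant commands shown earlier on this page:

    # Revoke the role from the Dataflow service account.
    gcloud projects remove-iam-policy-binding KMS_PROJECT_ID \
        --member serviceAccount:service-PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter

    # Revoke the role from the Compute Engine service account.
    gcloud projects remove-iam-policy-binding KMS_PROJECT_ID \
        --member serviceAccount:service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter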

Although you can destroy the key version material, you cannot delete keys and key rings. Key rings and keys don't have billable costs or quota limitations, so their continued existence doesn't affect costs or production limits.

Dataflow jobs periodically validate whether the Dataflow service account can successfully use the given Cloud KMS key. If an encrypt or decrypt request fails, the Dataflow service halts all data ingestion and processing as soon as possible. Dataflow immediately begins cleaning up the Google Cloud resources attached to your job.

Use sources and sinks that are protected with Cloud KMS keys

Dataflow can access Google Cloud sources and sinks that are protected by Cloud KMS keys. If you're not creating new objects, you don't need to specify the Cloud KMS key of those sources and sinks. If your Dataflow pipeline might create new objects in a sink, you must define pipeline parameters. These parameters specify the Cloud KMS keys for that sink and pass this Cloud KMS key to appropriate I/O connector methods.

For Dataflow pipeline sources and sinks that don't support CMEK managed by Cloud KMS, the Dataflow CMEK settings are irrelevant.

Cloud KMS key permissions

When accessing services that are protected with Cloud KMS keys, verify that you have assigned the Cloud KMS CryptoKey Encrypter/Decrypter role to that service. You can grant the role with the gcloud CLI, as shown in the example after this list. The accounts are of the following form:

  • Cloud Storage: service-{project_number}@gs-project-accounts.iam.gserviceaccount.com
  • BigQuery: bq-{project_number}@bigquery-encryption.iam.gserviceaccount.com
  • Pub/Sub: service-{project_number}@gcp-sa-pubsub.iam.gserviceaccount.com
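For example, granting the role to the Cloud Storage service agent directly on the key might look like the following sketch, where KEY_RING and KEY_NAME are the hypothetical names used earlier:

    gcloud kms keys add-iam-policy-binding KEY_NAME \
        --project=KMS_PROJECT_ID \
        --keyring=KEY_RING \
        --location=REGION \
        --member serviceAccount:service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter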

Cloud Storage

If you want to protect the temporary and staging buckets that you specified with the tempLocation/temp_location and stagingLocation/staging_location pipeline parameters, see setting up CMEK-protected Cloud Storage buckets.
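As a sketch, one way to set a default Cloud KMS key on an existing bucket with the gcloud CLI, assuming the Cloud Storage service agent can already use the key:

    gcloud storage buckets update gs://STORAGE_BUCKET \
        --default-encryption-key=KMS_KEY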

BigQuery

Java

Use the withKmsKey() method on the return values of BigQueryIO.readTableRows(), BigQueryIO.read(), BigQueryIO.writeTableRows(), and BigQueryIO.write().

You can find an example in the Apache Beam GitHub repository.

Python

Use the kms_key argument in BigQuerySource and BigQuerySink.

You can find an example in the Apache Beam GitHub repository.

Go

The BigQuery I/O connector for Go doesn't support specifying a Cloud KMS key.

Pub/Sub

Dataflow handles access to CMEK-protected topics by using your topic CMEK configuration.

To read from and write to CMEK-protected Pub/Sub topics, see the Pub/Sub instructions for using CMEK.

Audit logging for Cloud KMS key usage

Dataflow enables Cloud KMS to use Cloud Audit Logs for logging key operations, such as encrypt and decrypt. Dataflow provides the job ID as context to a Cloud KMS caller. This ID lets you track each instance in which a specific Cloud KMS key is used for a Dataflow job.

Cloud Audit Logs maintains audit logs for each Google Cloud Platform project, folder, and organization. You have several options for viewing your Cloud KMS audit logs.
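For example, one way to pull recent Cloud KMS audit log entries is with the gcloud CLI. This is a sketch; the filter below is an assumption and might need adjusting for your logging setup:

    gcloud logging read \
        'protoPayload.serviceName="cloudkms.googleapis.com"' \
        --project=KMS_PROJECT_ID \
        --limit=10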

Cloud KMS writes Admin Activity audit logs for your Dataflow jobs with CMEK encryption. These logs record operations that modify the configuration or metadata of a resource. You can't disable Admin Activity audit logs.

If explicitly enabled, Cloud KMS writes Data Access audit logs for your Dataflow jobs with CMEK encryption. Data Access audit logs contain API calls that read the configuration or metadata of resources. These logs also contain user-driven API calls that create, modify, or read user-provided resource data. For instructions on enabling some or all of your Data Access audit logs, see Configuring Data Access audit logs.

Pricing

You can use Cloud KMS encryption keys with Dataflow in all Dataflow regions where Cloud KMS is available.

This integration does not incur additional costs beyond the key operations, which are billed to your Google Cloud project. Each time the Dataflow service account uses your Cloud KMS key, the operation is billed at the rate of Cloud KMS key operations.

For more information, see Cloud KMS pricing details.

Troubleshooting

Use the suggestions in this section to troubleshoot errors.

Cloud KMS cannot be validated

Your workflow might fail with the following error:

Workflow failed. Causes: Cloud KMS key <key-name> cannot be validated.

To fix this issue, verify that you have passed the full key path. It looks like projects/<project-id>/locations/<gcp-region>/keyRings/<key-ring-name>/cryptoKeys/<key-name>. Look for possible typos in the key path.
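To check the exact path, you can list the keys in your key ring with the gcloud CLI; this sketch reuses the hypothetical KEY_RING name from earlier:

    # The NAME column shows the full resource path of each key.
    gcloud kms keys list \
        --project=KMS_PROJECT_ID \
        --keyring=KEY_RING \
        --location=REGION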

Cloud KMS key permission denied

Your workflow might fail with the following error:

Workflow failed. Causes: Cloud KMS key Permission 'cloudkms.cryptoKeyVersions.useToEncrypt' denied on resource 'projects/<project-id>/locations/<gcp-region>/keyRings/<key-ring-name>/cryptoKeys/<key-name>' (or it may not exist). cannot be validated.

To fix this issue, verify that the project ID mentioned in the key path is correct. Also, check that you have the permission to use the key.
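One way to inspect who can use the key is to view its IAM policy, sketched here with the gcloud CLI and the hypothetical names from earlier:

    # Look for roles/cloudkms.cryptoKeyEncrypterDecrypter bindings
    # for the Dataflow and Compute Engine service accounts.
    gcloud kms keys get-iam-policy KEY_NAME \
        --project=KMS_PROJECT_ID \
        --keyring=KEY_RING \
        --location=REGION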

Cloud KMS key location doesn't match Dataflow job location

Your workflow might fail with the following error:

Workflow failed. Causes: Cloud KMS key projects/<project-id>/locations/<gcp-region>/keyRings/<key-ring-name>/cryptoKeys/<key-name> can't protect resources for this job. Make sure the region of the KMS key matches the Dataflow region.

To fix this issue, if you're using a regional key, verify that the Cloud KMS key is in the same region as the Dataflow job.
