Customer managed encryption keys (CMEK)

When you use Dataproc, cluster and job data is stored on persistent disks associated with the Compute Engine VMs in your cluster and in a Cloud Storage staging bucket. By default, this persistent disk and bucket data is encrypted using a Google-generated data encryption key (DEK) and key encryption key (KEK).

If you want to control and manage the key encryption key (KEK), you can use Customer-Managed Encryption Keys (CMEK); Google continues to control the data encryption key (DEK). For more information on Google data encryption keys, see Encryption at Rest.

CMEK is different from Customer-Supplied Encryption Keys (CSEK), which is a feature that lets you specify the contents of the key encryption key. For more information, see Customer-supplied encryption keys.

CMEK cluster data encryption

You can use customer-managed encryption keys (CMEK) to encrypt the following cluster data:

  • Data on persistent disks attached to Dataproc cluster VMs
  • Job argument data submitted to your cluster, such as a query string submitted with a Spark SQL job
  • Cluster metadata, job driver output, and other data written to your Dataproc cluster staging bucket
Note: You can also use CMEK with encryption of workflow template job arguments.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Dataproc, Cloud Key Management Service, Compute Engine, and Cloud Storage APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

  5. Install the Google Cloud CLI.

    Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.
  6. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  7. To initialize the gcloud CLI, run the following command:

    gcloud init

Create keys

To protect your Dataproc resources with CMEK, you can automate thecreation of keys or create keys manually.

Automated key creation

Use Autokey to automate CMEK provisioning and assignment. Autokey generates key rings and keys on demand when resources are created. Service agents use the keys in encrypt and decrypt operations. If needed, Autokey creates the agents and grants the required Identity and Access Management (IAM) roles to them. For more information, see Autokey overview.

Manual key creation

Follow these steps to manually create keys for CMEK encryption ofcluster data:

  1. Create one or more keys using Cloud KMS. The resource name, also called the resource ID of a key, which you use in the next steps, is constructed as follows:

    projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
    Use the Cryptographic Keys page in the Google Cloud console to copy a key resource ID to the clipboard. The key (CMEK) must be located in the same location as the encrypted resource. For example, the CMEK used to encrypt a resource in the us-central1 region must also be located in the us-central1 region.

  2. To ensure that the Compute Engine Service Agent, Cloud Storage Service Agent, and Dataproc Service Agent service accounts have the necessary permissions to protect resources by using Cloud KMS keys, ask your administrator to grant each of these service accounts the Cloud KMS CryptoKey Encrypter/Decrypter (roles/cloudkms.cryptoKeyEncrypterDecrypter) IAM role on your project.

    Example assignment of the Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataproc Service Agent service account using the Google Cloud CLI:

    gcloud projects add-iam-policy-binding KMS_PROJECT_ID \
        --member serviceAccount:service-PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
        --role roles/cloudkms.cryptoKeyEncrypterDecrypter

    Replace the following:

    KMS_PROJECT_ID: the ID of your Google Cloud project that contains the Cloud KMS key.

    PROJECT_NUMBER: the project number (not the project ID) of your Google Cloud project that runs Dataproc resources.

  3. If the Dataproc Service Agent role is not attached to the Dataproc Service Agent service account, add the serviceusage.services.use permission to a custom role attached to the Dataproc Service Agent service account.
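The key resource ID used throughout these steps can be assembled from its components. The following is a minimal sketch with hypothetical project, region, key ring, and key names:

```shell
# All values below are hypothetical placeholders; substitute your own.
PROJECT_ID="my-project"
REGION="us-central1"
KEY_RING_NAME="my-key-ring"
KEY_NAME="my-key"

# Assemble the key resource ID in the format that Dataproc expects.
KEY_RESOURCE_ID="projects/${PROJECT_ID}/locations/${REGION}/keyRings/${KEY_RING_NAME}/cryptoKeys/${KEY_NAME}"
echo "${KEY_RESOURCE_ID}"
```

Keeping the ID in a variable makes it easy to reuse with the flags described in the next sections.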

Create a cluster with CMEK

Pass the resource ID of your key when you create the Dataproc cluster.

gcloud CLI

  • To encrypt cluster persistent disk data using your key, pass the resource ID of your key to the --gce-pd-kms-key flag when you create the cluster.
    gcloud dataproc clusters create CLUSTER_NAME \
        --region=REGION \
        --gce-pd-kms-key='projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME' \
        other arguments ...

    You can verify the key setting with the gcloud command-line tool.

    gcloud dataproc clusters describe CLUSTER_NAME \
        --region=REGION

    Command output snippet:

    ...
    configBucket: dataproc-...
    encryptionConfig:
      gcePdKmsKeyName: projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name
    ...
  • To encrypt cluster persistent disk data and job argument data using your key, pass the resource ID of the key to the --kms-key flag when you create the cluster. See [Cluster.EncryptionConfig.kmsKey](/dataproc/docs/reference/rest/v1/ClusterConfig#EncryptionConfig.FIELDS.kms_key) for a list of job types and arguments that are encrypted with the `--kms-key` flag.
    gcloud dataproc clusters create CLUSTER_NAME \
        --region=REGION \
        --kms-key='projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME' \
        other arguments ...

    You can verify key settings with the gcloud CLI dataproc clusters describe command. The key resource ID is set on both gcePdKmsKeyName and kmsKey, so your key is used to encrypt both cluster persistent disk data and job argument data.

    gcloud dataproc clusters describe CLUSTER_NAME \
        --region=REGION

    Command output snippet:

    ...
    configBucket: dataproc-...
    encryptionConfig:
      gcePdKmsKeyName: projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
      kmsKey: projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
    ...
    You can use either the --gce-pd-kms-key or the --kms-key flag, but not both, to encrypt cluster data using your key.
  • To encrypt cluster metadata, job driver, and other output data written to your Dataproc staging bucket in Cloud Storage:
    gcloud dataproc clusters create CLUSTER_NAME \
        --region=REGION \
        --bucket=CMEK_BUCKET_NAME \
        other arguments ...

    You can also pass CMEK-enabled buckets to the `gcloud dataproc jobs submit` command if your job takes bucket arguments, as shown in the following `cmek-bucket` example:

    gcloud dataproc jobs submit pyspark gs://cmek-bucket/wordcount.py \
        --region=region \
        --cluster=cluster-name \
        -- gs://cmek-bucket/shakespeare.txt gs://cmek-bucket/counts
    • Dataproc doesn't manage customer-managed encryption keys on your Cloud Storage bucket.
    • Using a bucket with a customer-managed encryption key can slow write times for large files.
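To spot-check which key a cluster is using without reading the full describe output, you can project only the encryption fields. This is a sketch with a hypothetical cluster name and region, assuming the gcloud --format=value() projection:

```shell
# Hypothetical cluster name and region; prints only the configured
# persistent disk key and job argument key, if set.
gcloud dataproc clusters describe my-cluster \
    --region=us-central1 \
    --format='value(config.encryptionConfig.gcePdKmsKeyName, config.encryptionConfig.kmsKey)'
```

An empty field in the output means that the corresponding flag was not set when the cluster was created.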

REST API
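As a sketch (not an official sample), a clusters.create request sets the key through the cluster's encryptionConfig. The snippet below only builds and prints a request body, using hypothetical project, region, cluster, and key names:

```shell
# Hypothetical values; this builds and prints the request body only.
REQUEST_BODY='{
  "clusterName": "my-cluster",
  "config": {
    "encryptionConfig": {
      "gcePdKmsKeyName": "projects/my-project/locations/us-central1/keyRings/my-key-ring/cryptoKeys/my-key"
    }
  }
}'
echo "${REQUEST_BODY}"

# The body would be POSTed to the regional clusters.create endpoint, e.g.:
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${REQUEST_BODY}" \
#   "https://dataproc.googleapis.com/v1/projects/my-project/regions/us-central1/clusters"
```

To encrypt job argument data as well, set EncryptionConfig.kmsKey instead, as described for the --kms-key flag above.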

Use CMEK with workflow template data

Dataproc workflow template job argument data, such as the query string of a Spark SQL job, can be encrypted using CMEK. Follow steps 1, 2, and 3 in this section to use CMEK with your Dataproc workflow template. See WorkflowTemplate.EncryptionConfig.kmsKey for a list of workflow template job types and arguments that are encrypted using CMEK when this feature is enabled.

  1. Create a key using Cloud KMS. The resource name of the key, which you use in the next steps, is constructed as follows:
    projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name
    Use the Cryptographic Keys page of the Google Cloud console to copy a key resource ID to the clipboard.
  2. To enable the Dataproc service accounts to use your key:

    1. Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataproc Service Agent service account. You can use the gcloud CLI to assign the role:

       gcloud projects add-iam-policy-binding KMS_PROJECT_ID \
           --member serviceAccount:service-PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
           --role roles/cloudkms.cryptoKeyEncrypterDecrypter

      Replace the following:

      KMS_PROJECT_ID: the ID of your Google Cloud project that runs Cloud KMS. This project can also be the project that runs Dataproc resources.

      PROJECT_NUMBER: the project number (not the project ID) of your Google Cloud project that runs Dataproc resources.

    2. Enable the Cloud KMS API on the project that runs Dataproc resources.

    3. If the Dataproc Service Agent role is not attached to the Dataproc Service Agent service account, add the serviceusage.services.use permission to the custom role attached to the Dataproc Service Agent service account. If the Dataproc Service Agent role is attached, you can skip this step.

  3. You can use the gcloud CLI or the Dataproc API to set the key you created in Step 1 on a workflow. Once the key is set on a workflow, all the workflow job arguments and queries are encrypted using the key, for any of the job types and arguments listed in WorkflowTemplate.EncryptionConfig.kmsKey.

    gcloud CLI

    Pass the resource ID of your key to the --kms-key flag when you create the workflow template with the gcloud dataproc workflow-templates create command.

    Example:

    gcloud dataproc workflow-templates create my-template-name \
        --region=region \
        --kms-key='projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name' \
        other arguments ...
    You can verify the key setting with the gcloud command-line tool.
    gcloud dataproc workflow-templates describe TEMPLATE_NAME \
        --region=REGION
    ...
    id: my-template-name
    encryptionConfig:
      kmsKey: projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
    ...

    REST API

    Use WorkflowTemplate.EncryptionConfig.kmsKey as part of a workflowTemplates.create request.

    You can verify the key setting by issuing a workflowTemplates.get request. The returned JSON contains the kmsKey:

    ...
    "id": "my-template-name",
    "encryptionConfig": {
      "kmsKey": "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
    },

Cloud External Key Manager

Cloud External Key Manager (Cloud EKM) lets you protect Dataproc data using keys managed by a supported external key management partner. The steps you follow to use Cloud EKM in Dataproc are the same as those you use to set up CMEK keys, with the following difference: your key points to a URI for the externally managed key (see Cloud EKM Overview).

Cloud EKM errors

When you use Cloud EKM, an attempt to create a cluster can fail due to errors associated with inputs, Cloud EKM, the external key management partner system, or communications between Cloud EKM and the external system. If you use the REST API or the Google Cloud console, errors are logged in Cloud Logging. You can examine the failed cluster's errors from the View Log tab.

If you use Cloud Shell to create a cluster, errors are displayed in the Cloud Shell terminal and in Logging.
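Because these errors land in Cloud Logging, one way to pull recent Dataproc cluster errors from the command line is a logging read filter. This is a sketch assuming the cloud_dataproc_cluster monitored resource type:

```shell
# Sketch: show the ten most recent error-level log entries for Dataproc clusters.
gcloud logging read \
    'resource.type="cloud_dataproc_cluster" AND severity>=ERROR' \
    --limit=10
```

Narrowing the filter with the failed cluster's name (for example, adding a resource.labels clause) reduces noise when a project runs many clusters.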

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.