Dataproc security configuration

When you create a Dataproc cluster, you can enable Hadoop Secure Mode via Kerberos to provide multi-tenancy through user authentication, isolation, and encryption inside a Dataproc cluster.

User authentication and other Google Cloud services: Per-user authentication using Kerberos only applies within the cluster. Interactions with other Google Cloud services, such as Cloud Storage, continue to be authenticated as the service account for the cluster.

Note: When fs.defaultFS is set to a Cloud Storage location, HDFS requests don't go through the HDFS NameNode, and authentication (authN) and authorization (authZ) are not performed through Kerberos. Therefore, HDFS doesn't require a Kerberos ticket.

Enable Hadoop Secure Mode using Kerberos

Enabling Kerberos and Hadoop Secure Mode for a cluster will include the MIT distribution of Kerberos and configure Apache Hadoop YARN, HDFS, Hive, Spark, and related components to use it for authentication.

Enabling Kerberos creates an on-cluster Key Distribution Center (KDC) that contains service principals and a root principal. The root principal is the account with administrator permissions on the on-cluster KDC. The KDC can also contain standard user principals, or it can be connected through a cross-realm trust to another KDC that contains the user principals.

Create a Kerberos cluster

You can use the Google Cloud CLI, the Dataproc API, or the Google Cloud console to enable Kerberos on clusters that use Dataproc image version 1.3 and later.

gcloud command

To automatically configure a new Kerberos Dataproc cluster (image version 1.3 and later), use the gcloud dataproc clusters create command.

gcloud dataproc clusters create cluster-name \
    --image-version=2.0 \
    --enable-kerberos

Cluster property: Instead of using the --enable-kerberos flag as shown in the preceding example, you can automatically configure Kerberos by passing the --properties "dataproc:kerberos.beta.automatic-config.enable=true" flag to the clusters create command (see Dataproc service properties).

REST API

Kerberos clusters can be created through ClusterConfig.SecurityConfig.KerberosConfig as part of a clusters.create request. You must set enableKerberos to true.
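For illustration, the relevant part of a clusters.create request body could look like the following sketch (the cluster name is a placeholder; field names use the REST API's JSON casing):

```json
{
  "clusterName": "cluster-name",
  "config": {
    "securityConfig": {
      "kerberosConfig": {
        "enableKerberos": true
      }
    }
  }
}
```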

Console

You can automatically configure Kerberos on a new cluster by selecting "Enable" in the Kerberos and Hadoop Secure Mode section of the Manage security panel on the Dataproc Create a cluster page of the Google Cloud console.

Create a Kerberos cluster with your own root principal password

Set up your Kerberos root principal password and then create a cluster.

Set up your Kerberos root principal password

The Kerberos root principal is the account with administrator permissions on the on-cluster KDC. To securely provide the password for the Kerberos root principal, you can encrypt it with a Key Management Service (KMS) key, and then store it in a Cloud Storage bucket that the cluster service account can access. The cluster service account must be granted the cloudkms.cryptoKeyDecrypter IAM role.

  1. Grant the Cloud KMS CryptoKey Decrypter role (roles/cloudkms.cryptoKeyDecrypter) to the cluster service account:

    gcloud projects add-iam-policy-binding project-id \
        --member serviceAccount:project-number-compute@developer.gserviceaccount.com \
        --role roles/cloudkms.cryptoKeyDecrypter

  2. Create a key ring:

    gcloud kms keyrings create my-keyring --location global

  3. Create a key in the key ring:

    gcloud kms keys create my-key \
        --location global \
        --keyring my-keyring \
        --purpose encryption

  4. Encrypt your Kerberos root principal password:

    echo "my-password" | \
      gcloud kms encrypt \
        --location=global \
        --keyring=my-keyring \
        --key=my-key \
        --plaintext-file=- \
        --ciphertext-file=kerberos-root-principal-password.encrypted

  5. Upload the encrypted password to a Cloud Storage bucket in your project:

    gcloud storage cp kerberos-root-principal-password.encrypted gs://my-bucket
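Optionally, you can confirm that the uploaded secret decrypts correctly with the key you created. This round-trip check is not required; it is a sketch that assumes the file was uploaded to my-bucket as in the preceding steps:

```shell
# Stream the encrypted file from Cloud Storage and decrypt it with the same key.
gcloud storage cat gs://my-bucket/kerberos-root-principal-password.encrypted | \
  gcloud kms decrypt \
    --location=global \
    --keyring=my-keyring \
    --key=my-key \
    --ciphertext-file=- \
    --plaintext-file=-
```

If the command prints your original password, the cluster service account will be able to decrypt it at cluster creation time (once granted the decrypter role).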

Create the cluster

You can use the gcloud command or the Dataproc API to enable Kerberos on clusters with your own root principal password.

gcloud command

To create a Kerberos Dataproc cluster (image version 1.3 and later), use the gcloud dataproc clusters create command.

gcloud dataproc clusters create cluster-name \
    --region=region \
    --image-version=2.0 \
    --kerberos-root-principal-password-uri=gs://my-bucket/kerberos-root-principal-password.encrypted \
    --kerberos-kms-key=projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key

Use a YAML (or JSON) config file: Instead of passing kerberos-* flags to the gcloud command as shown above, you can place Kerberos settings in a YAML (or JSON) config file, then reference the config file to create the Kerberos cluster.

  1. Create a config file (see SSL certificates, Additional Kerberos settings, and Cross-realm trust for additional config settings that can be included in the file):

    root_principal_password_uri: gs://my-bucket/kerberos-root-principal-password.encrypted
    kms_key_uri: projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key

  2. Use the following gcloud command to create the Kerberos cluster:

    gcloud dataproc clusters create cluster-name \
        --region=region \
        --kerberos-config-file=local path to config-file \
        --image-version=2.0

Security considerations: Dataproc discards the decrypted form of the password after adding the root principal to the KDC. For security purposes, after creating the cluster you may decide to delete the password file and the key used to decrypt the secret, and remove the service account from the cloudkms.cryptoKeyDecrypter role. Don't do this if you plan on scaling the cluster up, which requires the password file, the key, and the service account role.

REST API

Kerberos clusters can be created through ClusterConfig.SecurityConfig.KerberosConfig as part of a clusters.create request. Set enableKerberos to true and set the rootPrincipalPasswordUri and kmsKeyUri fields.
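A sketch of the relevant part of the request body, using the sample bucket, key ring, and key names from the preceding steps as placeholders:

```json
{
  "config": {
    "securityConfig": {
      "kerberosConfig": {
        "enableKerberos": true,
        "rootPrincipalPasswordUri": "gs://my-bucket/kerberos-root-principal-password.encrypted",
        "kmsKeyUri": "projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key"
      }
    }
  }
}
```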

Console

When creating a cluster with image version 1.3 or later, select "Enable" in the Kerberos and Hadoop Secure Mode section of the Manage security panel on the Dataproc Create a cluster page of the Google Cloud console, then complete the security options (discussed in the following sections).

OS login

On-cluster KDC management can be performed with the kadmin command using the root Kerberos user principal or with sudo kadmin.local. Enable OS Login to control who can run superuser commands.
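For example, to add a standard user principal to the on-cluster KDC from the master node (alice is a hypothetical user name), a minimal sketch using the standard MIT Kerberos admin tool:

```shell
# Run on the cluster master node; requires superuser access to the local KDC.
sudo kadmin.local -q "addprinc alice"
```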

SSL certificates

As part of enabling Hadoop Secure Mode, Dataproc creates a self-signed certificate to enable cluster SSL encryption. As an alternative, you can provide your own certificate for cluster SSL encryption by adding the following settings to the configuration file when you create a Kerberos cluster:

  • ssl:keystore_password_uri: Location in Cloud Storage of the KMS-encrypted file containing the password to the keystore file.
  • ssl:key_password_uri: Location in Cloud Storage of the KMS-encrypted file containing the password to the key in the keystore file.
  • ssl:keystore_uri: Location in Cloud Storage of the keystore file containing the wildcard certificate and the private key used by cluster nodes.
  • ssl:truststore_password_uri: Location in Cloud Storage of the KMS-encrypted file that contains the password to the truststore file.
  • ssl:truststore_uri: Location in Cloud Storage of the truststore file containing trusted certificates.

Sample config file:

root_principal_password_uri: gs://my-bucket/kerberos-root-principal-password.encrypted
kms_key_uri: projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key
ssl:
  key_password_uri: gs://bucket/key_password.encrypted
  keystore_password_uri: gs://bucket/keystore_password.encrypted
  keystore_uri: gs://bucket/keystore.jks
  truststore_password_uri: gs://bucket/truststore_password.encrypted
  truststore_uri: gs://bucket/truststore.jks
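If you provide your own keystore, it must contain the certificate and private key referenced by the settings above. A hedged sketch of generating such a keystore with the JDK keytool and encrypting its password (the domain, alias, and passwords are placeholders; a production setup would use a CA-signed certificate):

```shell
# Generate a keystore holding a key pair with a wildcard common name.
keytool -genkeypair \
    -keystore keystore.jks \
    -storepass keystore-password \
    -keypass key-password \
    -alias cluster \
    -keyalg RSA \
    -dname "CN=*.example.internal"

# Encrypt the keystore password with the same KMS key before uploading it.
echo "keystore-password" | gcloud kms encrypt \
    --location=global --keyring=my-keyring --key=my-key \
    --plaintext-file=- --ciphertext-file=keystore_password.encrypted
```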

Additional Kerberos Settings

To specify a Kerberos realm, create a Kerberos cluster with the following property added in the Kerberos configuration file:

  • realm: The name of the on-cluster Kerberos realm.

If this property is not set, the hostnames' domain (in uppercase) will be the realm.

To specify the master key of the KDC database, create a Kerberos cluster with the following property added in the Kerberos configuration file:

  • kdc_db_key_uri: Location in Cloud Storage of the KMS-encrypted file containing the KDC database master key.

If this property is not set, Dataproc will generate the master key.

To specify the ticket granting ticket's maximum lifetime (in hours), create a Kerberos cluster with the following property added in the Kerberos configuration file:

  • tgt_lifetime_hours: Max lifetime of the ticket granting ticket in hours.

If this property is not set, Dataproc will set the ticket granting ticket's lifetime to 10 hours.
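Putting the three optional settings together, a configuration file might look like the following sketch (the realm name and bucket paths are placeholders):

```yaml
realm: EXAMPLE.REALM
kdc_db_key_uri: gs://bucket/kdc_db_key.encrypted
tgt_lifetime_hours: 4
root_principal_password_uri: gs://my-bucket/kerberos-root-principal-password.encrypted
kms_key_uri: projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key
```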

Cross-realm trust

The KDC on the cluster initially contains only the root administrator principal and service principals. You can add user principals manually or establish a cross-realm trust with an external KDC or Active Directory server that holds user principals. Cloud VPN or Cloud Interconnect is recommended to connect to an on-premises KDC or Active Directory.

To create a Kerberos cluster that supports cross-realm trust, add the following settings to the Kerberos configuration file when you create the cluster. Encrypt the shared password with KMS and store it in a Cloud Storage bucket that the cluster service account can access.

  • cross_realm_trust:admin_server: Hostname or address of the remote admin server.
  • cross_realm_trust:kdc: Hostname or address of the remote KDC.
  • cross_realm_trust:realm: Name of the remote realm to be trusted.
  • cross_realm_trust:shared_password_uri: Location in Cloud Storage of the KMS-encrypted shared password.

Sample config file:

root_principal_password_uri: gs://my-bucket/kerberos-root-principal-password.encrypted
kms_key_uri: projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key
cross_realm_trust:
  admin_server: admin.remote.realm
  kdc: kdc.remote.realm
  realm: REMOTE.REALM
  shared_password_uri: gs://bucket/shared_password.encrypted

To enable cross-realm trust to a remote KDC, follow these steps:

  1. Add the following in the /etc/krb5.conf file on the remote KDC:

    [realms]
    DATAPROC.REALM = {
      kdc = MASTER-NAME-OR-ADDRESS
      admin_server = MASTER-NAME-OR-ADDRESS
    }

  2. Create the trust user:

    kadmin -q "addprinc krbtgt/DATAPROC.REALM@REMOTE.REALM"

  3. When prompted, enter the user's password. The password should match the contents of the encrypted shared password file.

To enable cross-realm trust with Active Directory, run the following commands in PowerShell as Administrator:

  1. Create a KDC definition in Active Directory.

    ksetup /addkdc DATAPROC.REALM DATAPROC-CLUSTER-MASTER-NAME-OR-ADDRESS

  2. Create trust in Active Directory.

    netdom trust DATAPROC.REALM /Domain AD.REALM /add /realm /passwordt:TRUST-PASSWORD

    The password should match the contents of the encrypted shared password file.

dataproc principal

When you submit a job using the Dataproc jobs API to a Dataproc Kerberos cluster, it runs as the dataproc Kerberos principal from the cluster's Kerberos realm.

Multi-tenancy is supported within a Dataproc Kerberos cluster if you submit a job directly to the cluster, for example over SSH. However, if the job reads from or writes to other Google Cloud services, such as Cloud Storage, the job acts as the cluster's service account.

Default and custom cluster properties

Hadoop Secure Mode is configured with properties in config files. Dataproc sets default values for these properties.

You can override the default properties when you create the cluster with the gcloud dataproc clusters create --properties flag or by calling the clusters.create API and setting SoftwareConfig properties (see cluster properties examples).
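For example, a cluster create command that explicitly sets one Hadoop Secure Mode property might look like the following sketch (hadoop.rpc.protection is a standard Hadoop property; the core: prefix is the Dataproc file prefix for core-site.xml, and the cluster name and region are placeholders):

```shell
gcloud dataproc clusters create cluster-name \
    --region=region \
    --image-version=2.0 \
    --enable-kerberos \
    --properties='core:hadoop.rpc.protection=privacy'
```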

High-Availability mode

In High Availability (HA) mode, a Kerberos cluster has three KDCs: one on each master. The KDC running on the "first" master ($CLUSTER_NAME-m-0) is the master KDC and also serves as the admin server. The master KDC's database is synced to the two replica KDCs at 5-minute intervals through a cron job, and all three KDCs serve read traffic.

Kerberos does not natively support real-time replication or automatic failover ifthe master KDC is down. To perform a manual failover:

  1. On all KDC machines, in /etc/krb5.conf, change admin_server to the new master's FQDN (Fully Qualified Domain Name). Remove the old master from the KDC list.
  2. On the new master KDC, set up a cron job to propagate the database.
  3. On the new master KDC, restart the admin_server process (krb5-admin-server).
  4. On all KDC machines, restart the KDC process (krb5-kdc).
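The database propagation in step 2 uses the standard MIT Kerberos tools; a hedged sketch of what the cron job runs on the master KDC (the dump path and replica hostname are placeholders, and details may differ on your image version):

```shell
# Dump the KDC database and push it to a replica KDC.
sudo kdb5_util dump /var/lib/krb5kdc/slave_datatrans
sudo kprop -f /var/lib/krb5kdc/slave_datatrans replica-kdc-fqdn
```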

Network configuration

To make sure that worker nodes can talk to the KDC and Kerberos admin server running on the master, verify that the VPC firewall rules allow ingress TCP and UDP traffic on port 88 and ingress TCP traffic on port 749 on the master. In High Availability mode, also make sure that VPC firewall rules allow ingress TCP traffic on port 754 on the masters to allow the propagation of changes made to the master KDC. Additionally, for host-based service principal canonicalization, make sure reverse DNS is properly set up for the cluster's network.
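A firewall rule matching these requirements could be sketched as follows (the rule name, network, source range, and target tags are placeholders for your environment):

```shell
# Allow Kerberos (88), kadmin (749), and HA database propagation (754) ingress.
gcloud compute firewall-rules create allow-kerberos \
    --network=my-network \
    --direction=INGRESS \
    --allow=tcp:88,udp:88,tcp:749,tcp:754 \
    --source-ranges=10.128.0.0/20 \
    --target-tags=dataproc-master
```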



Last updated 2025-12-15 UTC.