Dataproc security configuration
When you create a Dataproc cluster, you can enable Hadoop Secure Mode via Kerberos to provide multi-tenancy through user authentication, isolation, and encryption inside a Dataproc cluster.
User authentication and other Google Cloud services: Per-user authentication using Kerberos applies only within the cluster. Interactions with other Google Cloud services, such as Cloud Storage, continue to be authenticated as the cluster's service account.
Note: When fs.defaultFS is set to a Cloud Storage location, HDFS requests don't go through the HDFS NameNode, and authentication and authorization are not performed through Kerberos. Therefore, HDFS doesn't require a Kerberos ticket.

Enable Hadoop Secure Mode using Kerberos
Enabling Kerberos and Hadoop Secure Mode for a cluster includes the MIT distribution of Kerberos and configures Apache Hadoop YARN, HDFS, Hive, Spark, and related components to use it for authentication.
Enabling Kerberos creates an on-cluster Key Distribution Center (KDC) that contains service principals and a root principal. The root principal is the account with administrator permissions on the on-cluster KDC. The KDC can also contain standard user principals or be connected via cross-realm trust to another KDC that contains the user principals.
Create a Kerberos cluster
You can use the Google Cloud CLI, the Dataproc API, or the Google Cloud console to enable Kerberos on clusters that use Dataproc image version 1.3 and later.
gcloud command
To automatically configure a new Kerberos Dataproc cluster (image version 1.3 and later), use the gcloud dataproc clusters create command.
gcloud dataproc clusters create cluster-name \
    --image-version=2.0 \
    --enable-kerberos
Cluster property: Instead of using the --enable-kerberos flag as shown in the preceding example, you can automatically configure Kerberos by passing the --properties "dataproc:kerberos.beta.automatic-config.enable=true" flag to the clusters create command (see Dataproc service properties).
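As a sketch, a full clusters create invocation using the service property instead of the flag might look like the following (cluster-name is a placeholder):

```shell
# Create a Kerberos-enabled cluster via the dataproc:kerberos.beta.automatic-config.enable
# service property instead of the --enable-kerberos flag.
gcloud dataproc clusters create cluster-name \
    --image-version=2.0 \
    --properties "dataproc:kerberos.beta.automatic-config.enable=true"
```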
REST API
Kerberos clusters can be created through the ClusterConfig.SecurityConfig.KerberosConfig as part of a clusters.create request. You must set enableKerberos to true.
Console
You can automatically configure Kerberos on a new cluster by selecting "Enable" from the Kerberos and Hadoop Secure Mode section of the Manage security panel on the Dataproc Create a cluster page of the Google Cloud console.
Create a Kerberos cluster with your own root principal password
Set up your Kerberos root principal password and then create a cluster.
Set up your Kerberos root principal password
The Kerberos root principal is the account with administrator permissions on the on-cluster KDC. To securely provide the password for the Kerberos root principal, you can encrypt it with a Key Management Service (KMS) key, and then store it in a Cloud Storage bucket that the cluster service account can access. The cluster service account must be granted the cloudkms.cryptoKeyDecrypter IAM role.
Grant the Cloud KMS CryptoKey Encrypter/Decrypter role to thecluster service account:
gcloud projects add-iam-policy-binding project-id \
    --member serviceAccount:project-number-compute@developer.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyDecrypter
Create a key ring:
gcloud kms keyrings create my-keyring --location global
Create a key in the key ring:
gcloud kms keys create my-key \
    --location global \
    --keyring my-keyring \
    --purpose encryption
Encrypt your Kerberos root principal password:
echo "my-password" | \
    gcloud kms encrypt \
    --location=global \
    --keyring=my-keyring \
    --key=my-key \
    --plaintext-file=- \
    --ciphertext-file=kerberos-root-principal-password.encrypted
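As an optional sanity check, you can decrypt the file again to confirm the ciphertext round-trips before uploading it. This is a sketch; it requires an account holding the cloudkms.cryptoKeyDecrypter role:

```shell
# Decrypt the encrypted password file back to stdout to verify it.
gcloud kms decrypt \
    --location=global \
    --keyring=my-keyring \
    --key=my-key \
    --ciphertext-file=kerberos-root-principal-password.encrypted \
    --plaintext-file=-
```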
- Upload the encrypted password to a Cloud Storage bucket in your project.
- Example:
gcloud storage cp kerberos-root-principal-password.encrypted gs://my-bucket
Create the cluster
You can use the gcloud command or the Dataproc API to enable Kerberos on clusters with your own root principal password.
gcloud command
To create a Kerberos Dataproc cluster (image version 1.3 and later), use the gcloud dataproc clusters create command.
gcloud dataproc clusters create cluster-name \
    --region=region \
    --image-version=2.0 \
    --kerberos-root-principal-password-uri=gs://my-bucket/kerberos-root-principal-password.encrypted \
    --kerberos-kms-key=projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key
Use a YAML (or JSON) config file: Instead of passing kerberos-* flags to the gcloud command as shown above, you can place Kerberos settings in a YAML (or JSON) config file, then reference the config file to create the Kerberos cluster.
- Create a config file (see SSL certificates, Additional Kerberos settings, and Cross-realm trust for additional config settings that can be included in the file):
root_principal_password_uri: gs://my-bucket/kerberos-root-principal-password.encrypted
kms_key_uri: projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key
- Use the following gcloud command to create the Kerberos cluster:

gcloud dataproc clusters create cluster-name \
    --region=region \
    --kerberos-config-file=local path to config-file \
    --image-version=2.0
Security considerations: Dataproc discards the decrypted form of the password after adding the root principal to the KDC. For security purposes, after creating the cluster you may decide to delete the password file and the key used to decrypt the secret, and remove the service account from the cloudkms.cryptoKeyDecrypter role. Don't do this if you plan on scaling the cluster up, since scale-up requires the password file, the key, and the service account role.
REST API
Kerberos clusters can be created through the ClusterConfig.SecurityConfig.KerberosConfig as part of a clusters.create request. Set enableKerberos to true and set the rootPrincipalPasswordUri and kmsKeyUri fields.
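A sketch of the relevant fragment of the clusters.create request body, using the field names above (bucket, key ring, and cluster names are placeholders):

```json
{
  "clusterName": "cluster-name",
  "config": {
    "securityConfig": {
      "kerberosConfig": {
        "enableKerberos": true,
        "rootPrincipalPasswordUri": "gs://my-bucket/kerberos-root-principal-password.encrypted",
        "kmsKeyUri": "projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key"
      }
    }
  }
}
```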
Console
When creating a cluster with image version 1.3+, select "Enable" from the Kerberos and Hadoop Secure Mode section of the Manage security panel on the Dataproc Create a cluster page of the Google Cloud console, then complete the security options (discussed in the following sections).
OS login
On-cluster KDC management can be performed with the kadmin command using the root Kerberos user principal or using sudo kadmin.local. Enable OS Login to control who can run superuser commands.
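For example, a minimal sketch of adding and listing user principals on the cluster's master node over SSH (the principal name "alice" is illustrative):

```shell
# Add a standard user principal to the on-cluster KDC; kadmin.local
# talks to the local KDC database directly, so no Kerberos ticket is needed.
sudo kadmin.local -q "addprinc alice"

# List the principals currently in the KDC database.
sudo kadmin.local -q "listprincs"
```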
SSL certificates
As part of enabling Hadoop Secure Mode, Dataproc creates a self-signed certificate to enable cluster SSL encryption. As an alternative, you can provide a certificate for cluster SSL encryption by adding the following settings to the configuration file when you create a Kerberos cluster:
- ssl:keystore_password_uri: Location in Cloud Storage of the KMS-encrypted file containing the password to the keystore file.
- ssl:key_password_uri: Location in Cloud Storage of the KMS-encrypted file containing the password to the key in the keystore file.
- ssl:keystore_uri: Location in Cloud Storage of the keystore file containing the wildcard certificate and the private key used by cluster nodes.
- ssl:truststore_password_uri: Location in Cloud Storage of the KMS-encrypted file that contains the password to the truststore file.
- ssl:truststore_uri: Location in Cloud Storage of the truststore file containing trusted certificates.
Sample config file:
root_principal_password_uri: gs://my-bucket/kerberos-root-principal-password.encrypted
kms_key_uri: projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key
ssl:
  key_password_uri: gs://bucket/key_password.encrypted
  keystore_password_uri: gs://bucket/keystore_password.encrypted
  keystore_uri: gs://bucket/keystore.jks
  truststore_password_uri: gs://bucket/truststore_password.encrypted
  truststore_uri: gs://bucket/truststore.jks
Additional Kerberos Settings
To specify a Kerberos realm, create a Kerberos cluster with the following property added in the Kerberos configuration file:
realm: The name of the on-cluster Kerberos realm.
If this property is not set, the hostnames' domain (in uppercase) will be the realm.
To specify the master key of the KDC database, create a Kerberos cluster with the following property added in the Kerberos configuration file:
kdc_db_key_uri: Location in Cloud Storage of the KMS-encrypted file containing the KDC database master key.
If this property is not set, Dataproc will generate the master key.
To specify the ticket granting ticket's maximum lifetime (in hours), create a Kerberos cluster with the following property added in the Kerberos configuration file:
tgt_lifetime_hours: Max lifetime of the ticket granting ticket in hours.
If this property is not set, Dataproc will set the ticket granting ticket's lifetime to 10 hours.
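A sketch of a configuration file combining the additional settings above (realm name, bucket, and key names are placeholders):

```yaml
root_principal_password_uri: gs://my-bucket/kerberos-root-principal-password.encrypted
kms_key_uri: projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key
realm: MY.REALM
kdc_db_key_uri: gs://bucket/kdc_db_key.encrypted
tgt_lifetime_hours: 8
```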
Cross-realm trust
The KDC on the cluster initially contains only the root administrator principal and service principals. You can add user principals manually or establish a cross-realm trust with an external KDC or Active Directory server that holds user principals. Cloud VPN or Cloud Interconnect is recommended to connect to an on-premises KDC or Active Directory server.
To create a Kerberos cluster that supports cross-realm trust, add the following settings to the Kerberos configuration file when you create the cluster. Encrypt the shared password with KMS and store it in a Cloud Storage bucket that the cluster service account can access.
- cross_realm_trust:admin_server: Hostname or address of the remote admin server.
- cross_realm_trust:kdc: Hostname or address of the remote KDC.
- cross_realm_trust:realm: Name of the remote realm to be trusted.
- cross_realm_trust:shared_password_uri: Location in Cloud Storage of the KMS-encrypted shared password.
Sample config file:
root_principal_password_uri: gs://my-bucket/kerberos-root-principal-password.encrypted
kms_key_uri: projects/project-id/locations/global/keyRings/my-keyring/cryptoKeys/my-key
cross_realm_trust:
  admin_server: admin.remote.realm
  kdc: kdc.remote.realm
  realm: REMOTE.REALM
  shared_password_uri: gs://bucket/shared_password.encrypted
To enable cross-realm trust to a remote KDC, follow these steps:
Add the following in the /etc/krb5.conf file on the remote KDC:

[realms]
DATAPROC.REALM = {
  kdc = MASTER-NAME-OR-ADDRESS
  admin_server = MASTER-NAME-OR-ADDRESS
}
Create the trust user:
kadmin -q "addprinc krbtgt/DATAPROC.REALM@REMOTE.REALM"
When prompted, enter the user's password. The password should match the contents of the encrypted shared password file.
To enable cross-realm trust with Active Directory, run the following commandsin a PowerShell as Administrator:
Create a KDC definition in Active Directory.
ksetup /addkdc DATAPROC.REALM DATAPROC-CLUSTER-MASTER-NAME-OR-ADDRESS
Create trust in Active Directory.
The password should match the contents of the encrypted shared password file.

netdom trust DATAPROC.REALM /Domain:AD.REALM /add /realm /passwordt:TRUST-PASSWORD
dataproc principal
When you submit a job using the Dataproc jobs API to a Dataproc Kerberos cluster, it runs as the dataproc Kerberos principal from the cluster's Kerberos realm.
Multi-tenancy is supported within a Dataproc Kerberos cluster if you submit a job directly to the cluster, for example using SSH. However, if the job reads from or writes to other Google Cloud services, such as Cloud Storage, the job acts as the cluster's service account.
Default and custom cluster properties
Hadoop secure mode is configured with properties in config files. Dataproc sets default values for these properties.
You can override the default properties when you create the cluster with the gcloud dataproc clusters create --properties flag or by calling the clusters.create API and setting SoftwareConfig properties (see cluster properties examples).
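As an illustration only, a property override at cluster creation might look like the following sketch. The file_prefix:property=value form is the standard Dataproc cluster-properties syntax; hadoop.rpc.protection is a standard Hadoop core-site property and is used here purely as an example of an overridable value:

```shell
# Create a Kerberos cluster while overriding a Hadoop config property
# (the specific property and value here are illustrative assumptions).
gcloud dataproc clusters create cluster-name \
    --image-version=2.0 \
    --enable-kerberos \
    --properties "core:hadoop.rpc.protection=privacy"
```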
High-Availability mode
In High Availability (HA) mode, a Kerberos cluster has three KDCs: one on each master. The KDC running on the "first" master ($CLUSTER_NAME-m-0) is the master KDC and also serves as the admin server. The master KDC's database is synced to the two replica KDCs at 5-minute intervals through a cron job, and all three KDCs serve read traffic.
Kerberos does not natively support real-time replication or automatic failover ifthe master KDC is down. To perform a manual failover:
- On all KDC machines, in /etc/krb5.conf, change admin_server to the new master's FQDN (fully qualified domain name). Remove the old master from the KDC list.
- On the new master KDC, set up a cron job to propagate the database.
- On the new master KDC, restart the admin server process (krb5-admin-server).
- On all KDC machines, restart the KDC process (krb5-kdc).
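The restart steps above can be sketched as follows, assuming the krb5-admin-server and krb5-kdc services are managed by systemd on the cluster images:

```shell
# On the new master KDC: restart the Kerberos admin server.
sudo systemctl restart krb5-admin-server

# On every KDC machine: restart the KDC itself.
sudo systemctl restart krb5-kdc

# Optional: verify both services came back up.
sudo systemctl status krb5-admin-server krb5-kdc
```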
Network configuration
To make sure that worker nodes can talk to the KDC and the Kerberos admin server running on the master, verify that the VPC firewall rules allow ingress TCP and UDP traffic on port 88 and ingress TCP traffic on port 749 on the master. In High-Availability mode, make sure that VPC firewall rules allow ingress TCP traffic on port 754 on the masters to allow propagation of changes made to the master KDC. Also, make sure reverse DNS is properly set up for the cluster's network; Kerberos requires it for host-based service principal canonicalization.
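A sketch of firewall rules covering the ports above (the rule names, network, and source range are illustrative assumptions; scope the rules to your cluster's network and subnets):

```shell
# Allow Kerberos authentication (88) and kadmin (749) traffic to the master.
gcloud compute firewall-rules create allow-kerberos \
    --network=default \
    --allow=tcp:88,udp:88,tcp:749 \
    --source-ranges=10.128.0.0/9

# In HA mode, also allow KDC database propagation traffic on port 754.
gcloud compute firewall-rules create allow-kprop \
    --network=default \
    --allow=tcp:754 \
    --source-ranges=10.128.0.0/9
```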
What's next
- See the MIT Kerberos documentation.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.