Access control with IAM

This page describes access control options in Cloud Data Fusion.

You can control access to resources in Cloud Data Fusion in the following ways:

For information about the architecture and resources involved in Cloud Data Fusion access control, see Networking. For information about granting roles and permissions, see Manage access to projects, folders, and organizations.

About IAM in Cloud Data Fusion

You control access to Cloud Data Fusion features by granting IAM roles and permissions to service accounts and other principals in your Google Cloud project.

To grant fine-grained access to user accounts so that they can use the Cloud Data Fusion web interface, use RBAC.

Key Point: You control access for Cloud Data Fusion at the project level. For example, you can grant access to all Cloud Data Fusion resources within a project to a group of developers.

By default, Cloud Data Fusion uses the following service accounts:

Cloud Data Fusion Service Account

The Cloud Data Fusion Service Account is a Google-managed service agent that can access customer resources at pipeline design time. This service agent is automatically added to a project when you enable the Cloud Data Fusion API. It's used for all instances in your project.

The service agent has the following responsibilities:

  • Communicating with other services, such as Cloud Storage, BigQuery, or Datastream during pipeline design.

  • Enabling execution by provisioning Dataproc clusters and submitting pipeline jobs.

Roles for the Cloud Data Fusion Service Account

By default, the Cloud Data Fusion service account has only the Cloud Data Fusion API Service Agent role (roles/datafusion.serviceAgent).

The principal name for this service agent is service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com.
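As a minimal sketch of the naming pattern above, the following Python builds the service agent's principal from a project number. The helper names and the placeholder project number are hypothetical, and the `serviceAccount:` prefix is the member form that IAM policies generally use for service accounts.

```python
import re

# Domain of the Cloud Data Fusion service agent, as given in the text above.
SERVICE_AGENT_DOMAIN = "gcp-sa-datafusion.iam.gserviceaccount.com"


def data_fusion_service_agent(project_number: str) -> str:
    """Return the service agent email for a project number (hypothetical helper)."""
    # Project numbers are numeric; reject project IDs such as "my-project".
    if not re.fullmatch(r"\d+", project_number):
        raise ValueError(f"expected a numeric project number, got {project_number!r}")
    return f"service-{project_number}@{SERVICE_AGENT_DOMAIN}"


def as_iam_member(email: str) -> str:
    """Format a service account email as an IAM policy member string."""
    return f"serviceAccount:{email}"


if __name__ == "__main__":
    # 123456789012 is a placeholder project number, not a real project.
    print(as_iam_member(data_fusion_service_agent("123456789012")))
```

The validation step is a design choice for the sketch: it guards against passing a project ID where the project number is expected.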

Note: To prevent access control issues, we recommend keeping the default role and permissions on the Cloud Data Fusion Service Account. You can add roles based on your use case.

The following default resources are associated with the Cloud Data Fusion API Service Agent role.

Role: Cloud Data Fusion API Service Agent
Resource: Associated services:
  • BigQuery
  • Bigtable
  • Compute Engine
  • Dataproc
  • Cloud DNS
  • Firebase
  • Cloud Monitoring
  • Network Connectivity
  • Network Security Integration
  • Network Services API
  • Organization Policy
  • Recommender API
  • Cloud Resource Manager API
  • Service Networking
  • Service Usage
  • Spanner
  • Cloud Storage
  • Cloud Service Mesh
Permissions: See the Cloud Data Fusion API Service Agent permissions.

Compute Engine default service account or custom service account

The Compute Engine service account is the default account that Cloud Data Fusion uses to deploy and run jobs that access other Google Cloud resources. By default, it attaches to a Dataproc cluster VM to let Cloud Data Fusion access Dataproc resources during a pipeline run.

You can choose a custom service account to attach to the Dataproc cluster when creating a Cloud Data Fusion instance or by creating new Compute Profiles in the Cloud Data Fusion web interface.

For more information, see Service accounts in Cloud Data Fusion.

Roles for the Compute Engine service account

By default, to access resources (such as sources and sinks) when you run a pipeline, Cloud Data Fusion uses the Compute Engine default service account.

Caution: If your instance uses the Compute Engine default service account, don't remove roles. Removing them might cause problems with other Google Cloud services.

You can set up a user-managed custom service account for Cloud Data Fusion instances and grant a role to this account. Afterwards, you can choose this service account when creating new instances.

Note: If you launch Dataproc clusters in a different Google Cloud project, grant the roles in the project where Dataproc is running. By default, you grant them in the project containing the Cloud Data Fusion instance.

Cloud Data Fusion Runner role

In the project containing the Cloud Data Fusion instance, for both default and user-managed custom service accounts, grant the Cloud Data Fusion Runner role (roles/datafusion.runner).

Role | Description | Permission
Data Fusion Runner (roles/datafusion.runner) | Lets the Compute Engine service account communicate with Cloud Data Fusion services in the tenant project | datafusion.instances.runtime

Service Account User role

On the default or user-managed service account in the project where Dataproc clusters are launched when you run pipelines, grant the Cloud Data Fusion Service Account the Service Account User role (roles/iam.serviceAccountUser).

For more information, see Grant service account permission.

Dataproc Worker role

To run the jobs on Dataproc clusters, grant the Dataproc Worker role (roles/dataproc.worker) to the default or user-managed service accounts used by your Cloud Data Fusion pipelines.
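The three grants described in this section can be sketched as plain IAM-policy-shaped dictionaries. This is an illustration under stated assumptions rather than a deployment script: `add_binding` and `plan_runtime_grants` are hypothetical helpers, the `roles/` prefix on the Runner role follows the usual IAM role ID format, and the sketch assumes the Dataproc clusters run in the same project as the instance.

```python
from typing import Dict, List

# Roles named in this section. The "roles/" prefix on the Runner role is an
# assumption based on the usual IAM role ID format.
RUNNER_ROLE = "roles/datafusion.runner"
DATAPROC_WORKER_ROLE = "roles/dataproc.worker"
SERVICE_ACCOUNT_USER_ROLE = "roles/iam.serviceAccountUser"


def add_binding(policy: Dict, role: str, member: str) -> Dict:
    """Add member to role in an IAM-policy-shaped dict, avoiding duplicates."""
    bindings: List[Dict] = policy.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    bindings.append({"role": role, "members": [member]})
    return policy


def plan_runtime_grants(runtime_sa: str, service_agent: str) -> Dict[str, Dict]:
    """Return the project-level and service-account-level bindings to apply."""
    # Runner and Dataproc Worker go to the runtime service account.
    project_policy: Dict = {}
    add_binding(project_policy, RUNNER_ROLE, f"serviceAccount:{runtime_sa}")
    add_binding(project_policy, DATAPROC_WORKER_ROLE, f"serviceAccount:{runtime_sa}")
    # Service Account User is granted on the runtime service account itself,
    # with the Cloud Data Fusion service agent as the member.
    sa_policy: Dict = {}
    add_binding(sa_policy, SERVICE_ACCOUNT_USER_ROLE, f"serviceAccount:{service_agent}")
    return {"project": project_policy, "service_account": sa_policy}
```

Keeping the two policies separate mirrors the text: two roles are granted to the runtime service account in a project, while the third is granted on that service account as a resource.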

Roles for users

To trigger any operation in Cloud Data Fusion, you (the principal) must have sufficient permissions. Individual permissions are grouped into roles, and you grant roles to the principal.

Note: The rest of this page describes managing permissions using IAM for access control. To use RBAC, see RBAC roles and permissions.

If RBAC isn't enabled, or if you're using a Cloud Data Fusion edition that doesn't support RBAC, users with any Cloud Data Fusion IAM role have full access to the Cloud Data Fusion web interface. The Admin role only allows users to manage the instance, with operations such as Create, Update, Upgrade, and Delete.

Grant the following roles to principals, depending on the permissions they need in Cloud Data Fusion.

Role: Cloud Data Fusion Admin (roles/datafusion.admin)
Description: All viewer permissions, plus permissions to create, update, and delete Cloud Data Fusion instances.
Permissions:
  • datafusion.instances.get
  • datafusion.instances.list
  • datafusion.instances.create
  • datafusion.instances.delete
  • datafusion.instances.update
  • datafusion.operations.get
  • datafusion.operations.list
  • datafusion.operations.cancel
  • resourcemanager.projects.get
  • resourcemanager.projects.list

Role: Cloud Data Fusion Viewer (roles/datafusion.viewer)
Description:
  • Can view the project's Cloud Data Fusion instances in the Google Cloud console.
  • Cannot create, update, or delete Cloud Data Fusion instances.
Permissions:
  • datafusion.instances.get
  • datafusion.instances.list
  • datafusion.operations.get
  • datafusion.operations.list
  • resourcemanager.projects.get
  • resourcemanager.projects.list
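To make the relationship between the two roles concrete, the following sketch encodes the permissions listed above as Python sets and derives which permissions only the Admin role carries. The data structure and helper names are hypothetical. The permission strings are copied from the table.

```python
# Permissions copied from the roles table above.
VIEWER_PERMISSIONS = frozenset({
    "datafusion.instances.get",
    "datafusion.instances.list",
    "datafusion.operations.get",
    "datafusion.operations.list",
    "resourcemanager.projects.get",
    "resourcemanager.projects.list",
})

# Admin holds every viewer permission plus the mutating ones.
ADMIN_PERMISSIONS = VIEWER_PERMISSIONS | frozenset({
    "datafusion.instances.create",
    "datafusion.instances.delete",
    "datafusion.instances.update",
    "datafusion.operations.cancel",
})

ROLE_PERMISSIONS = {
    "roles/datafusion.admin": ADMIN_PERMISSIONS,
    "roles/datafusion.viewer": VIEWER_PERMISSIONS,
}


def roles_granting(permission: str) -> list:
    """Return the roles (sorted by name) that include the given permission."""
    return sorted(role for role, perms in ROLE_PERMISSIONS.items() if permission in perms)


def admin_only_permissions() -> list:
    """Return the permissions that Admin has and Viewer lacks."""
    return sorted(ADMIN_PERMISSIONS - VIEWER_PERMISSIONS)
```

For example, `roles_granting("datafusion.instances.create")` returns only the Admin role, which matches the Viewer description in the table.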

Access resources in another project at design time

This section describes access control on resources that are located in a different Google Cloud project than your Cloud Data Fusion instance at design time.

When you design pipelines in the Cloud Data Fusion web interface, you might use functions, such as Wrangler or Preview, which access resources in other projects.

The following sections describe how you determine the service account in your environment and then give the appropriate permissions.

Determine the service account of your environment

The service account name is Cloud Data Fusion Service Account and the principal for this service agent is service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com.

Give permission to access resources in another project

To grant the roles that give permission to access various resources, follow these steps:

  1. In the project where the target resource exists, add the Cloud Data Fusion Service Account (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com) as a principal.
  2. Grant roles to the Cloud Data Fusion Service Account on the target resource in the project where the target resource exists.
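Before applying the two steps above, it can help to check which roles the service agent already holds in the target project. The sketch below does that against an IAM-policy-shaped dictionary. The helper name, the sample policy, and the example roles (`roles/storage.objectViewer`, `roles/bigquery.dataViewer`) are illustrative assumptions, since the roles you need depend on the target resource.

```python
from typing import Dict, Iterable, List


def missing_roles(policy: Dict, member: str, required_roles: Iterable[str]) -> List[str]:
    """Return the required roles that `member` does not yet hold in `policy`.

    `policy` is a dict shaped like an IAM policy:
    {"bindings": [{"role": "...", "members": ["..."]}]}
    """
    held = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    # Preserve the caller's ordering so the output is predictable.
    return [role for role in required_roles if role not in held]


if __name__ == "__main__":
    # Placeholder project number; replace with the instance project's number.
    agent = "serviceAccount:service-123456789012@gcp-sa-datafusion.iam.gserviceaccount.com"
    # Hypothetical policy of the project that holds the target resource.
    target_policy = {
        "bindings": [
            {"role": "roles/storage.objectViewer", "members": [agent]},
        ]
    }
    wanted = ["roles/storage.objectViewer", "roles/bigquery.dataViewer"]
    print(missing_roles(target_policy, agent, wanted))
```

The result is the list of roles still to be granted in the project where the target resource exists.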

After you grant the roles, you can access resources in a different project at design time in the same way that you access resources in the project where your instance is located.

Access resources in another project at execution time

This section describes access control on resources that are located in a different Google Cloud project than your Cloud Data Fusion instance at execution time.

At execution time, you execute the pipeline on a Dataproc cluster, which might access resources in other projects. By default, the Dataproc cluster itself is launched in the same project as the Cloud Data Fusion instance, but you can use clusters in another project.

To access the resources in other Google Cloud projects, follow these steps:

  1. Determine the service account for your project.
  2. In the project where the resource is, grant IAM roles to the Compute Engine default service account to give it access to resources in another project.

Determine the Compute Engine service account

For more information about the Compute Engine service account, see About IAM in Cloud Data Fusion.

Grant IAM access to resources in another project

The Compute Engine default service account requires permissions to access resources in another project. These roles and permissions can be different depending on the resource you want to access.

To access the resources, follow these steps:

  1. Grant roles and permissions, specifying your Compute Engine service account as a principal in the project where the target resource exists.
  2. Add appropriate roles to access the resource.

Note: For more information, see the related use case for running Dataproc clusters in another project. This use case has a service account that's defined in the other project and has access to BigQuery resources in that project. You can adapt the use case to other scenarios.
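The execution-time steps can be sketched as a small planning function that picks the runtime principal and lists the grants to make in the target project. This is a sketch under assumptions: the names are hypothetical, the default account follows the usual `PROJECT_NUMBER-compute@developer.gserviceaccount.com` pattern for the project where the cluster VMs run, and the roles passed in are examples that depend on the resource.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Grant:
    """One role to grant to one member in one project (hypothetical record)."""
    project: str
    role: str
    member: str


def runtime_principal(cluster_project_number: str,
                      custom_service_account: Optional[str] = None) -> str:
    """Return the service account email that the pipeline runs as.

    Falls back to the Compute Engine default service account of the project
    where the Dataproc cluster runs when no custom account is configured.
    """
    if custom_service_account:
        return custom_service_account
    return f"{cluster_project_number}-compute@developer.gserviceaccount.com"


def cross_project_grants(target_project: str,
                         roles: List[str],
                         cluster_project_number: str,
                         custom_service_account: Optional[str] = None) -> List[Grant]:
    """List the grants to make in the project that holds the target resource."""
    email = runtime_principal(cluster_project_number, custom_service_account)
    member = f"serviceAccount:{email}"
    return [Grant(target_project, role, member) for role in roles]
```

By default the cluster runs in the same project as the Cloud Data Fusion instance, so `cluster_project_number` is that project's number unless you launch clusters elsewhere.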

Cloud Data Fusion API permissions

The following permissions are required to execute the Cloud Data Fusion API.

API call | Permission
instances.create | datafusion.instances.create
instances.delete | datafusion.instances.delete
instances.list | datafusion.instances.list
instances.get | datafusion.instances.get
instances.update | datafusion.instances.update
operations.cancel | datafusion.operations.cancel
operations.list | datafusion.operations.list
operations.get | datafusion.operations.get
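The mapping above lends itself to a simple lookup. The sketch below encodes the table and reports which API calls a given set of permissions cannot make. The function names are hypothetical and the check is local only: it does not query IAM.

```python
from typing import Iterable, List

# API call -> required permission, copied from the table above.
API_PERMISSIONS = {
    "instances.create": "datafusion.instances.create",
    "instances.delete": "datafusion.instances.delete",
    "instances.list": "datafusion.instances.list",
    "instances.get": "datafusion.instances.get",
    "instances.update": "datafusion.instances.update",
    "operations.cancel": "datafusion.operations.cancel",
    "operations.list": "datafusion.operations.list",
    "operations.get": "datafusion.operations.get",
}


def required_permission(api_call: str) -> str:
    """Return the permission needed for an API call listed in the table."""
    if api_call not in API_PERMISSIONS:
        raise ValueError(f"unknown API call: {api_call!r}")
    return API_PERMISSIONS[api_call]


def denied_calls(granted: Iterable[str]) -> List[str]:
    """Return the API calls (sorted) that the granted permissions do not cover."""
    granted_set = set(granted)
    return sorted(
        call for call, permission in API_PERMISSIONS.items()
        if permission not in granted_set
    )
```

Passing a read-only permission set to `denied_calls` returns the create, delete, update, and cancel calls, which is a quick way to sanity-check a custom role against this table.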

Permissions for common tasks

Common tasks in Cloud Data Fusion require the following permissions:

Task | Permissions
Accessing the Cloud Data Fusion web interface | datafusion.instances.get
Accessing the Cloud Data Fusion Instances page in the Google Cloud console | datafusion.instances.list
Accessing the Details page for an instance | datafusion.instances.get
Creating a new instance | datafusion.instances.create
Updating labels and advanced options to customize an instance | datafusion.instances.update
Deleting an instance | datafusion.instances.delete


Last updated 2025-12-15 UTC.