Prerequisites for managed migration
This page shows you how to set up your Google Cloud project to prepare for a Dataproc Metastore managed migration.
Before you begin
Understand how managed migration works.
Set up or have access to the following services:
- A Dataproc Metastore service configured with the Spanner database type.
- A Cloud SQL for MySQL database instance configured with private IP. For the Cloud SQL instance, ensure the following:
  - The Cloud SQL instance is configured with a VPC network that uses the required subnets.
  - The Cloud SQL instance uses a database schema that is compatible with the Hive Metastore version that runs on the Dataproc Metastore service (the service that the data is copied to).
  - The Cloud SQL instance contains the appropriate users to establish connectivity between Datastream and Dataproc Metastore, and between Dataproc Metastore and Cloud SQL.
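Two of these prerequisites can be checked from the command line before you start. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated; check_metastore_db_type and check_cloudsql_private_ip are hypothetical helper names, and the private IP check treats a non-empty settings.ipConfiguration.privateNetwork value as evidence that the instance is attached to a VPC network through private IP.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that check two migration prerequisites with the gcloud CLI.

# Succeeds if the Dataproc Metastore service reports the SPANNER database type.
check_metastore_db_type() {
  local service="${1:?service name required}" location="${2:?location required}"
  local db_type
  db_type="$(gcloud metastore services describe "$service" \
    --location="$location" --format='value(databaseType)')"
  if [[ "$db_type" == "SPANNER" ]]; then
    echo "ok: $service uses the Spanner database type"
  else
    echo "error: $service uses '${db_type:-unknown}', expected SPANNER" >&2
    return 1
  fi
}

# Succeeds if the Cloud SQL instance has a private network configured.
check_cloudsql_private_ip() {
  local instance="${1:?instance name required}" project="${2:?project ID required}"
  local network
  network="$(gcloud sql instances describe "$instance" --project="$project" \
    --format='value(settings.ipConfiguration.privateNetwork)')"
  if [[ -n "$network" ]]; then
    echo "ok: $instance has private IP on $network"
  else
    echo "error: $instance has no private IP network configured" >&2
    return 1
  fi
}
```

Source the file, then run, for example, check_metastore_db_type with your service name and region, and check_cloudsql_private_ip with your instance name and the project that owns it. Both helpers return a non-zero status on failure, so they can be chained with &&.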
Required roles
To get the permissions that you need to create a Dataproc Metastore and start a managed migration, ask your administrator to grant you the following IAM roles:
- To grant full access to all Dataproc Metastore resources, including setting IAM permissions: Dataproc Metastore Admin (roles/metastore.admin) on the Dataproc Metastore user account or service account
- To grant full control of Dataproc Metastore resources: Dataproc Metastore Editor (roles/metastore.editor) on the Dataproc Metastore user account or service account
- To grant permission to start a migration: Migration Admin (roles/metastore.migrationAdmin) on the Dataproc Metastore service agent in the service project
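An administrator can grant one of these roles at the project level with gcloud projects add-iam-policy-binding. The following is a minimal sketch, assuming project-level grants are acceptable in your organization; grant_metastore_role is a hypothetical helper that only accepts the three roles listed above, and the member argument uses the gcloud member syntax (for example user:EMAIL or serviceAccount:EMAIL).

```shell
#!/usr/bin/env bash
# Hypothetical helper: grants one of the Dataproc Metastore roles listed above
# at the project level. MEMBER uses gcloud's member syntax, for example
# "user:alice@example.com" or "serviceAccount:sa@PROJECT.iam.gserviceaccount.com".
grant_metastore_role() {
  local project="${1:?project ID required}" member="${2:?member required}" role="${3:?role required}"
  # Refuse roles that are not part of this page's list.
  case "$role" in
    roles/metastore.admin|roles/metastore.editor|roles/metastore.migrationAdmin) ;;
    *)
      echo "error: $role is not one of the Dataproc Metastore roles listed" >&2
      return 1
      ;;
  esac
  gcloud projects add-iam-policy-binding "$project" --member="$member" --role="$role"
}
```

Restricting the accepted roles keeps a typo from silently granting a broader role than intended.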
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Grant additional roles depending on your project settings
Depending on how your project is configured, you might need to add the following additional roles. Examples of how to grant these roles to the appropriate accounts are shown in the prerequisites section later on this page.
- Grant the Network User (roles/compute.networkUser) role to the Dataproc Metastore service agent and the Google APIs Service Agent on the service project.
- Grant the Network Admin (roles/compute.networkAdmin) role to the Datastream service agent on the host project.
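These roles are granted to service agents, whose email addresses are built from a project number rather than a project ID. The following is a minimal sketch, assuming the gcloud CLI is authenticated, that looks up a project number and prints the three service agent addresses used on this page; the function names are hypothetical, and the address formats are the ones that appear in the commands later on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that derive service agent emails from a project number.

# Prints the numeric project number for a project ID.
project_number() {
  gcloud projects describe "${1:?project ID required}" --format='value(projectNumber)'
}

# Dataproc Metastore service agent.
metastore_service_agent() {
  echo "service-${1:?project number required}@gcp-sa-metastore.iam.gserviceaccount.com"
}

# Google APIs Service Agent.
google_apis_service_agent() {
  echo "${1:?project number required}@cloudservices.gserviceaccount.com"
}

# Datastream service agent.
datastream_service_agent() {
  echo "service-${1:?project number required}@gcp-sa-datastream.iam.gserviceaccount.com"
}
```

For example, store the output of project_number for your service project in a variable and pass it to metastore_service_agent to get the member to use with roles/compute.networkUser.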
If your Cloud SQL instance is in a different project than the Dataproc Metastore service project:
- Grant the roles/cloudsql.client role and the roles/cloudsql.instanceUser role to the Dataproc Metastore service agent on the Cloud SQL instance project.
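For the cross-project case, both Cloud SQL roles go to the Dataproc Metastore service agent of the service project, bound on the project that owns the Cloud SQL instance. The following is a minimal sketch, assuming project-level bindings; grant_cloudsql_roles is a hypothetical helper, and the second argument is the service project number, not its ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grants the two Cloud SQL roles to the Dataproc Metastore
# service agent of the service project, on the project that owns the Cloud SQL instance.
grant_cloudsql_roles() {
  local cloudsql_project="${1:?Cloud SQL project ID required}"
  local service_project_number="${2:?service project number required}"
  local agent="service-${service_project_number}@gcp-sa-metastore.iam.gserviceaccount.com"
  local role
  for role in roles/cloudsql.client roles/cloudsql.instanceUser; do
    # Stop at the first failed binding instead of continuing silently.
    gcloud projects add-iam-policy-binding "$cloudsql_project" \
      --member="serviceAccount:${agent}" --role="$role" || return 1
  done
}
```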
If the Cloud Storage bucket for the change data capture pipeline is in a different project than your Dataproc Metastore service project:
- Make sure your Datastream service agent has the required permissions to write to the bucket. Typically these are the roles/storage.objectViewer, roles/storage.objectCreator, and roles/storage.legacyBucketReader roles.
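These roles can be granted on the bucket itself rather than on its project. The following is a minimal sketch, assuming the Datastream service agent in question is the one in your Dataproc Metastore service project (as in the Datastream commands later on this page); grant_cdc_bucket_roles is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grants the Datastream service agent the bucket-level roles
# listed above. Assumes the agent belongs to the Dataproc Metastore service project.
grant_cdc_bucket_roles() {
  local bucket="${1:?bucket name required}"
  local service_project_number="${2:?service project number required}"
  local agent="service-${service_project_number}@gcp-sa-datastream.iam.gserviceaccount.com"
  local role
  for role in roles/storage.objectViewer roles/storage.objectCreator roles/storage.legacyBucketReader; do
    gcloud storage buckets add-iam-policy-binding "gs://${bucket}" \
      --member="serviceAccount:${agent}" --role="$role" || return 1
  done
}
```

Binding at the bucket level limits the agent's access to the one bucket used by the pipeline.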
Managed migration prerequisites
Dataproc Metastore uses proxies and a change data capture pipeline to facilitate the data transfer. It's important to understand how these work before starting a transfer.
Key terms
- Service Project: A service project is the Google Cloud project where you created your Dataproc Metastore service.
- Host Project: A host project is the Google Cloud project that holds your Shared VPC networks. One or more service projects can be linked to your host project to use these shared networks. For more information, see Shared VPC.
Enable the Datastream API in your service project.
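Enabling the API can be done with gcloud services enable. The following is a minimal sketch, assuming the gcloud CLI is authenticated with permission to enable services; enable_datastream_api is a hypothetical helper that enables datastream.googleapis.com and then prints the service name if it appears in the enabled list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enables the Datastream API in the service project,
# then prints the service name if it is now listed as enabled.
enable_datastream_api() {
  local project="${1:?service project ID required}"
  gcloud services enable datastream.googleapis.com --project="$project" || return 1
  gcloud services list --enabled --project="$project" \
    --filter="config.name=datastream.googleapis.com" --format='value(config.name)'
}
```

An empty result from the second command means the API is not yet listed as enabled for that project.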
Grant the roles/metastore.migrationAdmin role to the Dataproc Metastore service agent in your service project. Service agent addresses use the project number, so replace SERVICE_PROJECT with the project number of your service project:
gcloud projects add-iam-policy-binding SERVICE_PROJECT --role "roles/metastore.migrationAdmin" --member "serviceAccount:service-SERVICE_PROJECT@gcp-sa-metastore.iam.gserviceaccount.com"
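To confirm that the binding was applied, you can list the members that hold a role. The following is a minimal sketch using the common get-iam-policy flatten-and-filter pattern; members_with_role is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every member that holds ROLE on PROJECT,
# for example to confirm that the migrationAdmin binding was applied.
members_with_role() {
  local project="${1:?project ID required}" role="${2:?role required}"
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.role:${role}" \
    --format='value(bindings.members)'
}
```

Running it with your service project and roles/metastore.migrationAdmin should print the Dataproc Metastore service agent.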
Add the following firewall rules.
To establish a connection between Dataproc Metastore and your private IP Cloud SQL instance:
A firewall rule to allow traffic from the health check probes (the Google Cloud health check ranges 35.191.0.0/16 and 130.211.0.0/22) to the network load balancer of the SOCKS5 proxy. For example:
gcloud compute firewall-rules create RULE_NAME --direction=INGRESS --priority=1000 --network=CLOUD_SQL_NETWORK --allow=tcp:1080 --source-ranges=35.191.0.0/16,130.211.0.0/22
Port 1080 is where the SOCKS5 proxy server is running.
A firewall rule to allow traffic from the load balancer to the SOCKS5 proxy managed instance group (MIG). For example:
gcloud compute firewall-rules create RULE_NAME --direction=INGRESS --priority=1000 --network=CLOUD_SQL_NETWORK --action=ALLOW --rules=all --source-ranges=PROXY_SUBNET_RANGE
A firewall rule to allow traffic from the Private Service Connect service attachment to the load balancer. For example:
gcloud compute firewall-rules create RULE_NAME --direction=INGRESS --priority=1000 --network=CLOUD_SQL_NETWORK --allow=tcp:1080 --source-ranges=NAT_SUBNET_RANGE
A firewall rule to allow Datastream to use the /29 CIDR IP range to create a private IP connection. For example:
gcloud compute firewall-rules create RULE_NAME --direction=INGRESS --priority=1000 --network=CLOUD_SQL_NETWORK --action=ALLOW --rules=all --source-ranges=CIDR_RANGE
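The four rules can be created together once the subnet ranges are known. The following is a minimal sketch, assuming the gcloud CLI's active project is the one that owns CLOUD_SQL_NETWORK; subnet_range and create_migration_firewall_rules are hypothetical helpers, and the dpms-migration rule-name prefix is an illustrative default, not a required name. The flags are the same ones used in the individual commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that create the four firewall rules described above.

# Prints the primary CIDR range of a subnet, such as the proxy or NAT subnet.
subnet_range() {
  gcloud compute networks subnets describe "${1:?subnet name required}" \
    --region="${2:?region required}" --project="${3:?project ID required}" \
    --format='value(ipCidrRange)'
}

create_migration_firewall_rules() {
  local network="${1:?Cloud SQL network required}"
  local proxy_range="${2:?proxy subnet range required}"
  local nat_range="${3:?NAT subnet range required}"
  local datastream_range="${4:?Datastream /29 range required}"
  local prefix="${5:-dpms-migration}"  # hypothetical naming prefix

  # Datastream's private connection uses a /29 range.
  if [[ "$datastream_range" != */29 ]]; then
    echo "error: $datastream_range is not a /29 range" >&2
    return 1
  fi

  # Health check probes -> load balancer of the SOCKS5 proxy (port 1080).
  gcloud compute firewall-rules create "${prefix}-health-check" --direction=INGRESS \
    --priority=1000 --network="$network" --allow=tcp:1080 \
    --source-ranges=35.191.0.0/16,130.211.0.0/22 || return 1
  # Load balancer -> SOCKS5 proxy MIG.
  gcloud compute firewall-rules create "${prefix}-lb-to-proxy" --direction=INGRESS \
    --priority=1000 --network="$network" --action=ALLOW --rules=all \
    --source-ranges="$proxy_range" || return 1
  # Private Service Connect service attachment -> load balancer.
  gcloud compute firewall-rules create "${prefix}-psc-to-lb" --direction=INGRESS \
    --priority=1000 --network="$network" --allow=tcp:1080 \
    --source-ranges="$nat_range" || return 1
  # Datastream private IP connection.
  gcloud compute firewall-rules create "${prefix}-datastream" --direction=INGRESS \
    --priority=1000 --network="$network" --action=ALLOW --rules=all \
    --source-ranges="$datastream_range"
}
```

Each rule stops the function on failure, so a partially created set is easy to spot and rerun.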
(Optional) Add roles to Shared VPC
Follow these steps if you use a Shared VPC.
For more details about a Shared VPC, see Service Project Admins.
Note: If you can't assign these roles at the host project level, you can assign them at the individual subnet level.
Grant the roles/compute.networkUser role to the Dataproc Metastore service agent and the Google APIs Service Agent on the host project.
gcloud projects add-iam-policy-binding HOST_PROJECT --role "roles/compute.networkUser" --member "serviceAccount:service-SERVICE_PROJECT@gcp-sa-metastore.iam.gserviceaccount.com"
gcloud projects add-iam-policy-binding HOST_PROJECT --role "roles/compute.networkUser" --member "serviceAccount:SERVICE_PROJECT@cloudservices.gserviceaccount.com"
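For the subnet-level alternative mentioned in the note, the same role can be bound on a single Shared VPC subnet instead of the whole host project. The following is a minimal sketch; grant_subnet_network_user is a hypothetical helper, and the members are the same two service agents used in the project-level commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grants roles/compute.networkUser on one Shared VPC subnet
# to the Dataproc Metastore service agent and the Google APIs Service Agent.
grant_subnet_network_user() {
  local subnet="${1:?subnet name required}" region="${2:?region required}"
  local host_project="${3:?host project ID required}"
  local service_project_number="${4:?service project number required}"
  local member
  for member in \
    "serviceAccount:service-${service_project_number}@gcp-sa-metastore.iam.gserviceaccount.com" \
    "serviceAccount:${service_project_number}@cloudservices.gserviceaccount.com"; do
    gcloud compute networks subnets add-iam-policy-binding "$subnet" \
      --region="$region" --project="$host_project" \
      --member="$member" --role=roles/compute.networkUser || return 1
  done
}
```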
Grant the roles/compute.networkAdmin role to the Datastream service agent on the host project.
gcloud projects add-iam-policy-binding HOST_PROJECT --role "roles/compute.networkAdmin" --member "serviceAccount:service-SERVICE_PROJECT@gcp-sa-datastream.iam.gserviceaccount.com"
If you can't grant the roles/compute.networkAdmin role, create a custom role with the permissions listed in Shared VPC prerequisites.
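A custom role can be created and bound with gcloud iam roles create followed by a normal policy binding. The following is a minimal sketch; create_and_grant_peering_role is a hypothetical helper, the role title is illustrative, and the permission list is deliberately left as an argument because it must come from the Shared VPC prerequisites page, not from this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates a custom role in the host project and grants it to
# the Datastream service agent. PERMISSIONS is the comma-separated list from the
# Shared VPC prerequisites; it is intentionally not hard-coded here.
create_and_grant_peering_role() {
  local host_project="${1:?host project ID required}"
  local role_id="${2:?role ID required}"
  local permissions="${3:?comma-separated permission list required}"
  local service_project_number="${4:?service project number required}"

  # Custom role IDs allow letters, digits, periods, and underscores (3 to 64 characters).
  if [[ ! "$role_id" =~ ^[A-Za-z0-9_.]{3,64}$ ]]; then
    echo "error: invalid custom role ID: $role_id" >&2
    return 1
  fi

  gcloud iam roles create "$role_id" --project="$host_project" \
    --title="Datastream VPC peering for migration" \
    --permissions="$permissions" || return 1
  gcloud projects add-iam-policy-binding "$host_project" \
    --member="serviceAccount:service-${service_project_number}@gcp-sa-datastream.iam.gserviceaccount.com" \
    --role="projects/${host_project}/roles/${role_id}"
}
```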
These permissions are required at the start of the migration to establish peering between the VPC network in the host project and Datastream.
You can remove this role as soon as the migration has started. If you remove the role before the migration is complete, Dataproc Metastore can't clean up the peering job, and you must clean it up yourself.
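Removing the role and inspecting leftover peerings both map to standard gcloud commands. The following is a minimal sketch with hypothetical helper names; this page does not say how the Datastream peering is named, so list the peerings on the host project network and confirm which one belongs to the migration before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for removing the temporary role and cleaning up peering.

# Removes roles/compute.networkAdmin from the Datastream service agent on the
# host project. Run this only after the migration has started.
revoke_datastream_network_admin() {
  local host_project="${1:?host project ID required}"
  local service_project_number="${2:?service project number required}"
  gcloud projects remove-iam-policy-binding "$host_project" \
    --member="serviceAccount:service-${service_project_number}@gcp-sa-datastream.iam.gserviceaccount.com" \
    --role=roles/compute.networkAdmin
}

# Lists the peerings on a host project network.
list_peerings() {
  gcloud compute networks peerings list --network="${1:?network required}" \
    --project="${2:?host project ID required}"
}

# Deletes one peering by name; confirm the name with list_peerings first.
delete_peering() {
  gcloud compute networks peerings delete "${1:?peering name required}" \
    --network="${2:?network required}" --project="${3:?host project ID required}"
}
```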
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.