Manage compute profiles
A compute profile specifies how and where a pipeline is executed. It encapsulates any information required to set up and delete the physical execution environment of a pipeline. A compute profile specifies a provisioner name and the configuration settings for that provisioner.
Each compute profile has a scope: system or user. System compute profiles can be used by pipelines in any namespace. User compute profiles exist within a namespace, and only pipelines in that namespace can use them.
Compute profiles can be assigned to batch pipelines. When a compute profile is assigned to a pipeline, the provisioner specified in the profile is used to create a cluster where the pipeline runs.
For example, an administrator might decide to create small, medium, and large compute profiles. They configure each profile with the Google Cloud credentials required to create and delete Dataproc clusters in the company's Google Cloud account.
- The small profile is configured to create a 5-node cluster.
- The medium profile is configured to create a 20-node cluster.
- The large profile is configured to create a 50-node cluster.
The administrator assigns the small profile to pipelines that are scheduled to run every hour on small amounts of data. They assign the large profile to pipelines that are scheduled to run every day on a large amount of data.
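The small, medium, and large example can be modeled as plain data to make the scope rule concrete. The following is a minimal sketch, not Cloud Data Fusion's internal representation: the `ComputeProfile` class, the `cluster_nodes` property key, the `team-etl` profile, and the `analytics` namespace are hypothetical, and the provisioner name `gcp-dataproc` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComputeProfile:
    """Hypothetical model of a compute profile: a provisioner plus its settings."""
    name: str
    scope: str                      # "system" or "user"
    provisioner: str                # provisioner name (assumed value below)
    properties: dict = field(default_factory=dict)
    namespace: Optional[str] = None  # set only for user-scoped profiles

    def usable_in(self, namespace: str) -> bool:
        # System profiles are usable from any namespace;
        # user profiles only from the namespace that owns them.
        return self.scope == "system" or self.namespace == namespace


# "cluster_nodes" is an illustrative key, not a real provisioner property.
small = ComputeProfile("small", "system", "gcp-dataproc", {"cluster_nodes": 5})
medium = ComputeProfile("medium", "system", "gcp-dataproc", {"cluster_nodes": 20})
large = ComputeProfile("large", "system", "gcp-dataproc", {"cluster_nodes": 50})
team = ComputeProfile("team-etl", "user", "gcp-dataproc",
                      {"cluster_nodes": 20}, namespace="analytics")

for profile in (small, medium, large, team):
    print(profile.name, profile.usable_in("default"))
```

In this sketch, the three system profiles are usable from the `default` namespace, while `team-etl` is usable only from `analytics`.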
Default compute profile
By default, Cloud Data Fusion uses Autoscale as the compute profile. Estimating the appropriate number of cluster workers (nodes) for a workload is difficult, and a single cluster size for an entire pipeline is often not ideal. Dataproc Autoscaling provides a mechanism for automating cluster resource management and enables cluster worker VM autoscaling. For more information, see Autoscaling.
The Compute config page lists the available profiles. Its Total cores column shows the maximum number of vCPUs that a profile can scale up to, such as Up to 84.
System and user compute profiles
A compute profile indicates which provisioner to use when creating a cluster and specifies the provisioner configuration to use for that cluster.
- To create a system compute profile, go to the System admin page in Cloud Data Fusion Studio. This page lists all system compute profiles and lets you create new ones.
- To create a user compute profile, go to the Namespace administration page in Cloud Data Fusion Studio, and then select the namespace to create the profile in. The profile that you create exists only within that namespace.
Compute profile assignment
You can assign compute profiles to batch pipelines in the following ways:
- Assign a default profile for the Cloud Data Fusion instance.
- Assign a default profile for a specific namespace.
- Assign a profile to a batch pipeline to use for runs that are started manually.
- Assign a profile to a pipeline schedule.
If a profile is set in the schedule that triggers a run, or if you manually run a pipeline and there's a profile assigned to that pipeline, Cloud Data Fusion uses that compute profile.
If no profile is set, Cloud Data Fusion uses the default profile for the namespace. If no default profile is set for the namespace, Cloud Data Fusion uses the system default profile. If no system default is set, the built-in profile is used.
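The fallback order described above can be sketched as a small function. This is an illustration of the precedence only, not the product's implementation; `resolve_profile` and the `BUILT_IN_PROFILE` placeholder value are hypothetical names.

```python
from typing import Optional

# Placeholder for whatever built-in profile the instance provides (hypothetical value).
BUILT_IN_PROFILE = "system:built-in"


def resolve_profile(run_profile: Optional[str],
                    namespace_default: Optional[str],
                    system_default: Optional[str],
                    built_in: str = BUILT_IN_PROFILE) -> str:
    """Return the first profile that is set, in order of precedence.

    run_profile is the profile set on the triggering schedule or, for a
    manual run, on the pipeline itself.
    """
    for candidate in (run_profile, namespace_default, system_default):
        if candidate:
            return candidate
    return built_in


print(resolve_profile("user:large", "system:medium", "system:small"))
print(resolve_profile(None, "system:medium", "system:small"))
print(resolve_profile(None, None, None))
```

The three calls print `user:large`, `system:medium`, and `system:built-in`, in that order.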
Assign a default compute profile
To assign default profiles to a Cloud Data Fusion namespace or instance, go to the Cloud Data Fusion Studio and click System admin > Configuration > System compute profiles. To select the default, click the star by the profile name.
Optional: use the Preferences Microservices to set default profiles
- To set the default profile for the instance, set a preference on the Cloud Data Fusion instance with the key system.profile.name and the value system:<profile-name>.
- To set the default profile for a namespace, set a preference on the chosen namespace with the key system.profile.name and the value <scope>:<profile-name>.
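As a rough illustration, the two preference calls might be built as follows. This sketch assumes that the Preferences Microservices accept a JSON map through `PUT /v3/preferences` (instance level) and `PUT /v3/namespaces/<namespace>/preferences` (namespace level), and that requests carry a bearer token. Confirm both against the API reference for your instance. The endpoint URL, the token, and the `default_profile_request` helper are hypothetical. The code only constructs the request and does not send it.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

PROFILE_KEY = "system.profile.name"


def default_profile_request(api_endpoint: str, token: str, profile: str,
                            namespace: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets a default compute profile."""
    base = api_endpoint.rstrip("/")
    if namespace is None:
        # Instance-level default: the value must name a system profile.
        if not profile.startswith("system:"):
            raise ValueError("instance default must be system:<profile-name>")
        url = f"{base}/v3/preferences"  # assumed path
    else:
        url = f"{base}/v3/namespaces/{quote(namespace, safe='')}/preferences"  # assumed path
    # Depending on the API's semantics, a PUT may overwrite other preferences
    # at the same level; read the existing preferences first if in doubt.
    body = json.dumps({PROFILE_KEY: profile}).encode("utf-8")
    headers = {"Authorization": f"Bearer {token}",
               "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Hypothetical endpoint and token, for illustration only.
req = default_profile_request("https://example.invalid/api", "TOKEN", "system:small")
print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen`. Building and sending are kept separate here so that the URL and body can be inspected first.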
Assign a compute profile for manual runs
To assign a profile to use for manual pipeline runs, follow these steps:
- Navigate to the pipeline detail page.
- Click Configure > Compute config.
- Select a profile and click Save. The selected profile is used whenever the pipeline runs manually.
Alternatively, you can use the Preferences Microservices to set the profile for manual runs by setting a preference on the DataPipelineWorkflow entity with the key system.profile.name and the value <scope>:<profile-name>.
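The following sketch shows what the target of such a call might look like. It assumes that program-level preferences live under a path of the form `/v3/namespaces/<namespace>/apps/<pipeline>/workflows/DataPipelineWorkflow/preferences`, which should be checked against the Preferences Microservices reference. The `manual_run_preference` helper, the `analytics` namespace, and the `daily-orders` pipeline are hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import quote


def manual_run_preference(namespace: str, pipeline: str,
                          profile: str) -> Tuple[str, Dict[str, str]]:
    """Return the (assumed) request path and JSON payload for a manual-run profile."""
    scope, _, name = profile.partition(":")
    if scope not in ("system", "user") or not name:
        raise ValueError("profile must be <scope>:<profile-name>")
    path = (f"/v3/namespaces/{quote(namespace, safe='')}"
            f"/apps/{quote(pipeline, safe='')}"
            "/workflows/DataPipelineWorkflow/preferences")
    return path, {"system.profile.name": profile}


path, payload = manual_run_preference("analytics", "daily-orders", "user:large")
print(path)
print(payload)
```

The payload would be sent as the JSON body of a PUT request to the instance's API endpoint followed by the returned path.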
Assign a compute profile to a schedule
Any time you create a schedule for a pipeline, you can assign a profile to it. Whenever the schedule triggers a pipeline run, the run uses that profile. This is true for time schedules and for schedules that other pipelines trigger.
Override a compute profile configuration
When a profile is created, each configuration setting can be made immutable by locking it. However, if configuration settings are not locked, they can be overridden at runtime. To override profile configuration, follow these steps:
- From the Pipeline List page, select the deployed pipeline you want to run.
- From the Pipeline Details page, click Configure.
- Choose a compute profile and click Customize.
- Change any settings and click Save.
You can use runtime arguments and schedule properties to modify the cluster size and other settings.
- To override the profile used, set a runtime argument with the key system.profile.name and the value <scope>:<profile-name>.
- To override a profile property, set a runtime argument with the key system.profile.properties.<property-name> and a value equal to the value for that property.
For example, to override the numWorkers setting to a value of 10, set a preference or runtime argument with the key system.profile.properties.numWorkers and the value 10.
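The key naming above can be captured in a small helper that produces a runtime-argument map. This is a minimal sketch: `profile_runtime_args` is a hypothetical name, and values are converted to strings on the assumption that runtime arguments are passed as string key-value pairs.

```python
from typing import Dict, Mapping, Optional

PROFILE_KEY = "system.profile.name"
PROPERTY_PREFIX = "system.profile.properties."


def profile_runtime_args(profile: Optional[str] = None,
                         properties: Optional[Mapping[str, object]] = None) -> Dict[str, str]:
    """Build runtime arguments that override the profile and/or its properties."""
    args: Dict[str, str] = {}
    if profile:
        args[PROFILE_KEY] = profile  # expected form: <scope>:<profile-name>
    for name, value in (properties or {}).items():
        # Only unlocked settings can be overridden at runtime.
        args[PROPERTY_PREFIX + name] = str(value)
    return args


print(profile_runtime_args(properties={"numWorkers": 10}))
# prints {'system.profile.properties.numWorkers': '10'}
```

Passing both arguments, for example `profile_runtime_args("system:large", {"numWorkers": 10})`, yields a map that switches the profile and overrides one of its properties in the same run.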
What's next
- Learn more about provisioners in Cloud Data Fusion.
- Learn more about Dataproc cluster configuration.
Last updated 2025-12-15 UTC.