Manage compute profiles

A compute profile specifies how and where a pipeline is executed. It encapsulates any information required to set up and delete the physical execution environment of a pipeline. A compute profile specifies a provisioner name and the configuration settings for that provisioner.

Each compute profile has a scope: system or user. System compute profiles can be used in any namespace in the instance. User compute profiles exist within a namespace, and only pipelines in that namespace can use them. Compute profiles can be assigned to batch pipelines. When a compute profile is assigned to a pipeline, the provisioner specified in the profile is used to create the cluster where the pipeline runs.

For example, an administrator might decide to create small, medium, and large compute profiles. They configure each profile with the Google Cloud credentials required to create and delete Dataproc clusters in the company's Google Cloud account.

The small profile is configured to create a 5-node cluster.
The medium profile is configured to create a 20-node cluster.
The large profile is configured to create a 50-node cluster.

The administrator assigns the small profile to pipelines that are scheduled to run every hour on small amounts of data. They assign the large profile to pipelines that are scheduled to run every day on a large amount of data.

Default compute profile

By default, Cloud Data Fusion uses Autoscale as the compute profile. Estimating the appropriate number of cluster workers (nodes) for a workload is difficult, and a single cluster size for an entire pipeline is often not ideal. Dataproc Autoscaling provides a mechanism for automating cluster resource management and enables cluster worker VM autoscaling. For more information, see Autoscaling.

On the Compute config page, where you can see a list of profiles, there is a Total cores column, which shows the maximum number of vCPUs that the profile can scale up to, such as Up to 84.

Note: Autoscaling can increase costs. For example, it's not recommended for real-time pipelines or replication jobs because clusters only scale up and there might be increased costs from the additional clusters.

System and user compute profiles

A compute profile indicates which provisioner to use when creating a cluster and specifies the provisioner configuration to apply when the cluster is created.

To create a system compute profile, go to the System admin page in Cloud Data Fusion Studio. This page lists all system compute profiles and lets you create new system compute profiles.
To create a user compute profile, go to the Namespace administration page in Cloud Data Fusion Studio, and then select the namespace to create the profile in. Then, you can create a profile that exists only within that namespace.

Compute profile assignment

You can assign compute profiles to batch pipelines in the following ways:

Assign a default profile for the Cloud Data Fusion instance.
Assign a default profile for a specific namespace.
Assign a profile to a batch pipeline to use for runs that are started manually.
Assign a profile to a pipeline schedule.

If a profile is set in the schedule that triggers a run, or if you manually run a pipeline and there's a profile assigned to that pipeline, Cloud Data Fusion uses that compute profile.

If no profile is set, Cloud Data Fusion uses the default profile for the namespace. If no default profile is set for the namespace, Cloud Data Fusion uses the system default profile. If no system default is set, the built-in profile is used.
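The fallback order described above can be sketched as a small lookup. This is only an illustration of the precedence, not how Cloud Data Fusion implements it. The function name, its parameters, and the BUILT_IN_PROFILE placeholder are hypothetical.

```python
from typing import Optional

# Placeholder label for the built-in profile; the real identifier is not
# given in this page, so this value is an assumption for illustration.
BUILT_IN_PROFILE = "built-in"


def resolve_profile(
    run_profile: Optional[str] = None,
    namespace_default: Optional[str] = None,
    system_default: Optional[str] = None,
) -> str:
    """Return the first profile that is set, in order of precedence.

    run_profile is the profile set on the schedule that triggered the run,
    or the profile assigned to the pipeline for manual runs.
    """
    for candidate in (run_profile, namespace_default, system_default):
        if candidate:
            return candidate
    # Nothing is configured at any level, so fall back to the built-in profile.
    return BUILT_IN_PROFILE


# A namespace default is used only when the run itself has no profile.
chosen = resolve_profile(run_profile=None, namespace_default="user:medium")
```

Here `chosen` is the namespace default, because no profile was set on the schedule or on the pipeline.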

Assign a default compute profile

To assign default profiles to a Cloud Data Fusion namespace or instance, go to the Cloud Data Fusion Studio and click System admin > Configuration > System compute profiles. To select the default, click the star by the profile name.

Optional: use the Preferences Microservices to set default profiles

To set the default profile, set a preference on the Cloud Data Fusion instance with key system.profile.name and value system:<profile-name>.
To set the default profile for a namespace, set a preference on the chosen namespace with key system.profile.name and value <scope>:<profile-name>.
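As a rough sketch, the preference described above is a single key-value pair. The helper below only builds that pair as a JSON-serializable dictionary. It does not send it, because the endpoint and authentication depend on your instance and should be taken from the Preferences Microservices reference. The function name and the profile names `small` and `nightly-large` are hypothetical, and writing the user scope as `user` is an assumption.

```python
import json

PROFILE_KEY = "system.profile.name"
VALID_SCOPES = ("system", "user")  # the two scopes named in this page


def profile_preference(scope: str, profile_name: str) -> dict:
    """Build the preference that selects a default compute profile."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}, got {scope!r}")
    if not profile_name:
        raise ValueError("profile_name must not be empty")
    return {PROFILE_KEY: f"{scope}:{profile_name}"}


# Instance-level default: always a system profile.
instance_pref = profile_preference("system", "small")

# Namespace-level default: may point at a system or a user profile.
namespace_pref = profile_preference("user", "nightly-large")

# The body you would submit as the preference payload.
payload = json.dumps(namespace_pref)
```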

Assign a compute profile for manual runs

To assign a profile to use for manual pipeline runs, follow these steps:

Navigate to the pipeline detail page.
Click Configure > Compute config.
Select a profile and click Save. The selected profile is used whenever the pipeline runs manually.

Alternatively, you can use the Preferences Microservices to set the profile for manual runs by setting a preference on the DataPipelineWorkflow entity with key system.profile.name and value <scope>:<profile-name>.

Assign a compute profile to a schedule

Any time you create a schedule for a pipeline, you can assign a profile to it. Whenever the schedule triggers a pipeline run, it uses that profile for the run. This is true for time schedules and schedules that other pipelines trigger.

Override a compute profile configuration

When a profile is created, each configuration setting can be made immutable by locking it. However, if configuration settings are not locked, they can be overridden at runtime. To override profile configuration, follow these steps:

From the Pipeline List page, select the deployed pipeline you want to run.
From the Pipeline Details page, click Configure.
Choose a compute profile and click Customize.
Change any settings and click Save.

You can use runtime arguments and schedule properties to modify the cluster size and other settings.

To override the profile used, set a runtime argument with the key system.profile.name and value <scope>:<profile-name>.
To override a profile property, set a runtime argument with key system.profile.properties.<property-name> and value equal to the value for that property.

For example, to override the numWorkers setting to a value of 10, set a preference or runtime argument with the key system.profile.properties.numWorkers and the value 10.
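The two key patterns above can be combined into one set of runtime arguments. The following is a minimal sketch under some assumptions: the helper name is hypothetical, the profile `system:large` is just an example, and values are converted to strings on the assumption that runtime arguments are passed as string key-value pairs.

```python
from typing import Mapping, Optional

PROFILE_KEY = "system.profile.name"
PROPERTY_PREFIX = "system.profile.properties."


def profile_runtime_args(
    profile: Optional[str] = None,
    properties: Optional[Mapping[str, object]] = None,
) -> dict:
    """Build runtime arguments that override the profile and its properties."""
    args = {}
    if profile is not None:
        # Value has the form <scope>:<profile-name>.
        args[PROFILE_KEY] = profile
    for name, value in (properties or {}).items():
        # One argument per overridden property, keyed by the property name.
        args[PROPERTY_PREFIX + name] = str(value)
    return args


# Run with the large profile, but with 10 workers instead of its configured size.
runtime_args = profile_runtime_args(
    profile="system:large",
    properties={"numWorkers": 10},
)
```

An override like this takes effect only for settings that were not locked when the profile was created.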

What's next

Learn more about provisioners in Cloud Data Fusion.
Learn more about Dataproc cluster configuration.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.
