Cloud Profiler

Cloud Profiler continuously gathers and reports application CPU usage and memory-allocation information.

Requirements:

  • Profiler supports only Dataproc Hadoop and Spark job types (Spark, PySpark, SparkSql, and SparkR).

  • Jobs must run longer than 3 minutes to allow Profiler to collect and upload data to your project.

Dataproc recognizes cloud.profiler.enable and the other cloud.profiler.* properties (see Profiler options), and then appends the relevant profiler JVM options to the following configurations:

  • Spark: spark.driver.extraJavaOptions and spark.executor.extraJavaOptions
  • MapReduce: mapreduce.task.profile and other mapreduce.task.profile.* properties
Note: Overriding Spark or MapReduce properties in your job (for example, by manually constructing a SparkConf and setting spark.executor.extraJavaOptions) prevents Dataproc from setting the profiler options. However, if you provide extraJavaOptions using the gcloud dataproc jobs submit (spark|hadoop) --properties flag, Dataproc retains and sets the profiler options, as shown in the example below.
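
For illustration, the following sketch passes a user-supplied JVM option through the --properties flag alongside the profiler options, so Dataproc can still append its profiler JVM options. The cluster, region, class, jar, and job argument values are placeholders, and -verbose:gc is only an example JVM option:

gcloud dataproc jobs submit spark \
    --cluster=cluster-name \
    --region=region \
    --class=main-class \
    --jars=jar-file \
    --properties=cloud.profiler.enable=true,spark.executor.extraJavaOptions=-verbose:gc \
    -- job args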

Enable profiling

Complete the following steps to enable and use the Profiler on your Dataproc Spark and Hadoop jobs.

  1. Enable the Profiler (a sample command follows the gcloud example below).

  2. Create a Dataproc cluster with service account scopes set to monitoring to allow the cluster to talk to the profiler service.

  3. If you are using a custom VM service account, grant the Cloud Profiler Agent role to the custom VM service account. This role contains the required profiler service permissions (a sample command follows the gcloud example below).

gcloud

gcloud dataproc clusters create cluster-name \
    --scopes=cloud-platform \
    --region=region \
    other args ...
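
The following commands sketch steps 1 and 3: enabling the Cloud Profiler API and granting the Cloud Profiler Agent role (roles/cloudprofiler.agent) to a custom VM service account. The project-id and service-account-name values are placeholders; adjust them for your project:

gcloud services enable cloudprofiler.googleapis.com \
    --project=project-id

gcloud projects add-iam-policy-binding project-id \
    --member=serviceAccount:service-account-name@project-id.iam.gserviceaccount.com \
    --role=roles/cloudprofiler.agent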

Submit a Dataproc job with Profiler options

  1. Submit a Dataproc Spark or Hadoop job with one or more of the following Profiler options:
    Option | Description | Value | Required/Optional | Default | Notes
    ------ | ----------- | ----- | ----------------- | ------- | -----
    cloud.profiler.enable | Enable profiling of the job | true or false | Required | false |
    cloud.profiler.name | Name used to create the profile on the Profiler Service | profile-name | Optional | Dataproc job UUID |
    cloud.profiler.service.version | A user-supplied string to identify and distinguish profiler results | Profiler Service Version | Optional | Dataproc job UUID |
    mapreduce.task.profile.maps | Numeric range of map tasks to profile (example: for up to 100, specify "0-100") | number range | Optional | 0-10000 | Applies to Hadoop mapreduce jobs only
    mapreduce.task.profile.reduces | Numeric range of reducer tasks to profile (example: for up to 100, specify "0-100") | number range | Optional | 0-10000 | Applies to Hadoop mapreduce jobs only

PySpark Example

Google Cloud CLI

PySpark job submit with profiling example:

gcloud dataproc jobs submit pyspark python-job-file \
    --cluster=cluster-name \
    --region=region \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- job args

Two profiles will be created:

  1. profiler_name-driver to profile Spark driver tasks
  2. profiler_name-executor to profile Spark executor tasks

For example, if the profiler_name is "spark_word_count_job", spark_word_count_job-driver and spark_word_count_job-executor profiles are created.

Hadoop Example

gcloud CLI

Hadoop (teragen mapreduce) job submit with profiling example:

gcloud dataproc jobs submit hadoop \
    --cluster=cluster-name \
    --region=region \
    --jar=jar-file \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- teragen 100000 gs://bucket-name
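
To profile only a subset of tasks, you can add the mapreduce.task.profile.maps property from the table above (teragen is a map-only job; mapreduce.task.profile.reduces works the same way for jobs that have reduce tasks). The following variation of the teragen example is a sketch with placeholder values; the "0-2" range is only illustrative:

gcloud dataproc jobs submit hadoop \
    --cluster=cluster-name \
    --region=region \
    --jar=jar-file \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,mapreduce.task.profile.maps=0-2 \
    -- teragen 100000 gs://bucket-name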

View profiles

View profiles from the Profiler page in the Google Cloud console.

What's next
