Cloud Profiler

Cloud Profiler continuously gathers and reports application CPU usage and memory-allocation information.

Requirements:

  • Profiler supports only Dataproc Hadoop and Spark job types (Spark, PySpark, SparkSql, and SparkR).

  • Jobs must run longer than 3 minutes to allow Profiler to collect and upload data to your project.

Dataproc recognizes cloud.profiler.enable and the other cloud.profiler.* properties (see Profiler options), and then appends the relevant profiler JVM options to the following configurations:

  • Spark: spark.driver.extraJavaOptions and spark.executor.extraJavaOptions
  • MapReduce: mapreduce.task.profile and other mapreduce.task.profile.* properties
Note: Overriding Spark or MapReduce properties in your job (for example, by manually constructing a SparkConf and setting spark.executor.extraJavaOptions) prevents Dataproc from setting the profiler options. However, if you provide extraJavaOptions using the gcloud dataproc jobs submit (spark|hadoop) --properties flag, Dataproc retains and sets the profiler options, as shown in the example below.
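
For illustration, the following sketch passes a user-supplied JVM option through the --properties flag alongside the profiler options, so Dataproc can still append its profiler JVM options. The cluster, region, class, jar, and job argument values are placeholders, and -verbose:gc is only an example JVM option:

gcloud dataproc jobs submit spark \
    --cluster=cluster-name \
    --region=region \
    --class=main-class \
    --jars=jar-file \
    --properties=cloud.profiler.enable=true,spark.executor.extraJavaOptions=-verbose:gc \
    -- job args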

Enable profiling

Complete the following steps to enable and use the Profiler on your Dataproc Spark and Hadoop jobs.

  1. Enable the Profiler (a sample command follows the gcloud example below).

  2. Create a Dataproc cluster with service account scopes set to monitoring to allow the cluster to talk to the profiler service.

  3. If you are using a custom VM service account, grant the Cloud Profiler Agent role to the custom VM service account. This role contains the required profiler service permissions (a sample command follows the gcloud example below).

gcloud

gcloud dataproc clusters create cluster-name \
    --scopes=cloud-platform \
    --region=region \
    other args ...
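
The following commands sketch steps 1 and 3: enabling the Cloud Profiler API and granting the Cloud Profiler Agent role (roles/cloudprofiler.agent) to a custom VM service account. The project-id and service-account-name values are placeholders; adjust them for your project:

gcloud services enable cloudprofiler.googleapis.com \
    --project=project-id

gcloud projects add-iam-policy-binding project-id \
    --member=serviceAccount:service-account-name@project-id.iam.gserviceaccount.com \
    --role=roles/cloudprofiler.agent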

Submit a Dataproc job with Profiler options

  1. Submit a Dataproc Spark or Hadoop job with one or more of the following Profiler options:
    Option | Description | Value | Required/Optional | Default | Notes
    ------ | ----------- | ----- | ----------------- | ------- | -----
    cloud.profiler.enable | Enable profiling of the job | true or false | Required | false |
    cloud.profiler.name | Name used to create the profile on the Profiler Service | profile-name | Optional | Dataproc job UUID |
    cloud.profiler.service.version | A user-supplied string to identify and distinguish profiler results | Profiler Service Version | Optional | Dataproc job UUID |
    mapreduce.task.profile.maps | Numeric range of map tasks to profile (example: for up to 100, specify "0-100") | number range | Optional | 0-10000 | Applies to Hadoop mapreduce jobs only
    mapreduce.task.profile.reduces | Numeric range of reducer tasks to profile (example: for up to 100, specify "0-100") | number range | Optional | 0-10000 | Applies to Hadoop mapreduce jobs only

PySpark Example

Google Cloud CLI

PySpark job submit with profiling example:

gcloud dataproc jobs submit pyspark python-job-file \
    --cluster=cluster-name \
    --region=region \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- job args

Two profiles will be created:

  1. profiler_name-driver to profile Spark driver tasks
  2. profiler_name-executor to profile Spark executor tasks

For example, if the profiler_name is "spark_word_count_job", spark_word_count_job-driver and spark_word_count_job-executor profiles are created.

Hadoop Example

gcloud CLI

Hadoop (teragen mapreduce) job submit with profiling example:

gcloud dataproc jobs submit hadoop \
    --cluster=cluster-name \
    --region=region \
    --jar=jar-file \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- teragen 100000 gs://bucket-name
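
To profile only a subset of tasks, you can add the mapreduce.task.profile.maps property from the table above (teragen is a map-only job; mapreduce.task.profile.reduces works the same way for jobs that have reduce tasks). The following variation of the teragen example is a sketch with placeholder values; the "0-2" range is only illustrative:

gcloud dataproc jobs submit hadoop \
    --cluster=cluster-name \
    --region=region \
    --jar=jar-file \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,mapreduce.task.profile.maps=0-2 \
    -- teragen 100000 gs://bucket-name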

View profiles

View profiles from the Profiler page in the Google Cloud console.

What's next
