Cloud Profiler
Cloud Profiler continuously gathers and reports application CPU usage and memory-allocation information.
Requirements:
- Profiler supports only Dataproc Hadoop and Spark job types (Spark, PySpark, SparkSql, and SparkR).
- Jobs must run longer than 3 minutes to allow Profiler to collect and upload data to your project.
Dataproc recognizes cloud.profiler.enable and the other cloud.profiler.* properties (see Profiler options), and then appends the relevant profiler JVM options to the following configurations:
- Spark: spark.driver.extraJavaOptions and spark.executor.extraJavaOptions
- MapReduce: mapreduce.task.profile and other mapreduce.task.profile.* properties
Note: Supplying your own extraJavaOptions settings outside of job submission (for example, setting spark.driver.extraJavaOptions or spark.executor.extraJavaOptions as cluster properties) prevents the setting of profiler options. However, if you provide extraJavaOptions using the gcloud dataproc jobs submit (spark|hadoop) --properties flag, Dataproc retains and sets profiler options.
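For example, the following sketch (cluster name, region, main class, and JAR are hypothetical placeholders) passes a user-supplied spark.executor.extraJavaOptions value through the --properties flag together with the profiler properties, so Dataproc retains the user option and appends its profiler JVM options:

# Sketch with hypothetical placeholders; Dataproc appends its profiler JVM
# options to the user-supplied extraJavaOptions value.
gcloud dataproc jobs submit spark \
    --cluster=cluster-name \
    --region=region \
    --class=main-class \
    --jars=jar-file \
    --properties=spark.executor.extraJavaOptions=-verbose:gc,cloud.profiler.enable=true \
    -- job args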
Enable profiling

Complete the following steps to enable and use the Profiler on your Dataproc Spark and Hadoop jobs.
1. Create a Dataproc cluster with service account scopes set to monitoring to allow the cluster to talk to the profiler service. If you are using a custom VM service account, grant the Cloud Profiler Agent role to the custom VM service account. This role contains required profiler service permissions.
gcloud
gcloud dataproc clusters create cluster-name \
    --scopes=cloud-platform \
    --region=region \
    other args ...
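If you use a custom VM service account, a minimal sketch of granting it the Cloud Profiler Agent role (roles/cloudprofiler.agent) follows; the project ID and service account address are hypothetical placeholders:

# Hypothetical placeholders: replace project-id and the service account.
gcloud projects add-iam-policy-binding project-id \
    --member=serviceAccount:sa-name@project-id.iam.gserviceaccount.com \
    --role=roles/cloudprofiler.agent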
2. Submit a Dataproc Spark or Hadoop job with one or more of the following Profiler options:
| Option | Description | Value | Required/Optional | Default | Notes |
|---|---|---|---|---|---|
| cloud.profiler.enable | Enable profiling of the job | true or false | Required | false | |
| cloud.profiler.name | Name used to create the profile on the Profiler Service | profile-name | Optional | Dataproc job UUID | |
| cloud.profiler.service.version | A user-supplied string to identify and distinguish profiler results. | Profiler Service Version | Optional | Dataproc job UUID | |
| mapreduce.task.profile.maps | Numeric range of map tasks to profile (example: for up to 100, specify "0-100") | number range | Optional | 0-10000 | Applies to Hadoop mapreduce jobs only |
| mapreduce.task.profile.reduces | Numeric range of reducer tasks to profile (example: for up to 100, specify "0-100") | number range | Optional | 0-10000 | Applies to Hadoop mapreduce jobs only |
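Because cloud.profiler.enable is the only required option, a minimal submission can rely on the table's defaults, in which case the profile name and service version fall back to the Dataproc job UUID. A sketch with hypothetical placeholders:

# Minimal sketch: only profiling is enabled; the profile name and service
# version default to the Dataproc job UUID.
gcloud dataproc jobs submit spark \
    --cluster=cluster-name \
    --region=region \
    --class=main-class \
    --jars=jar-file \
    --properties=cloud.profiler.enable=true \
    -- job args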
PySpark Example
Google Cloud CLI
PySpark job submit with profiling example:
gcloud dataproc jobs submit pyspark python-job-file \
    --cluster=cluster-name \
    --region=region \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- job args
Two profiles will be created:

- profiler_name-driver to profile Spark driver tasks
- profiler_name-executor to profile Spark executor tasks
For example, if the profiler_name is "spark_word_count_job", spark_word_count_job-driver and spark_word_count_job-executor profiles are created.
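For context, the python-job-file in the example above could be a minimal PySpark script like the following word-count sketch; the script contents and input path are illustrative assumptions, not part of the Profiler setup:

# Hypothetical PySpark word-count job; any PySpark job submitted with
# cloud.profiler.enable=true is profiled the same way.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark_word_count_job").getOrCreate()

# Hypothetical input location; replace with your own Cloud Storage object.
lines = spark.read.text("gs://bucket-name/input.txt")

counts = (
    lines.rdd.flatMap(lambda row: row.value.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)

for word, count in counts.collect():
    print(word, count)

spark.stop()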
Hadoop Example
gcloud CLI
Hadoop (teragen mapreduce) job submit with profiling example:
gcloud dataproc jobs submit hadoop \
    --cluster=cluster-name \
    --region=region \
    --jar=jar-file \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- teragen 100000 gs://bucket-name
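For large MapReduce jobs, the mapreduce.task.profile.maps and mapreduce.task.profile.reduces options from the table above can narrow profiling to a subset of tasks. A sketch with hypothetical placeholders that profiles only the first ten map tasks:

# Sketch: limit profiling to map tasks 0-9; other values are
# hypothetical placeholders.
gcloud dataproc jobs submit hadoop \
    --cluster=cluster-name \
    --region=region \
    --jar=jar-file \
    --properties=cloud.profiler.enable=true,mapreduce.task.profile.maps=0-9 \
    -- teragen 100000 gs://bucket-name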
View profiles
View profiles from the Profiler page in the Google Cloud console.
What's next

- See the Monitoring documentation
- See the Logging documentation
- Explore Google Cloud Observability