gcloud dataproc batches submit spark
- NAME
- gcloud dataproc batches submit spark - submit a Spark batch job
- SYNOPSIS
gcloud dataproc batches submit spark (--class=MAIN_CLASS | --jar=MAIN_JAR) [--archives=[ARCHIVE,…]] [--async] [--batch=BATCH] [--container-image=CONTAINER_IMAGE] [--deps-bucket=DEPS_BUCKET] [--files=[FILE,…]] [--history-server-cluster=HISTORY_SERVER_CLUSTER] [--jars=[JAR,…]] [--kms-key=KMS_KEY] [--labels=[KEY=VALUE,…]] [--metastore-service=METASTORE_SERVICE] [--properties=[PROPERTY=VALUE,…]] [--region=REGION] [--request-id=REQUEST_ID] [--service-account=SERVICE_ACCOUNT] [--staging-bucket=STAGING_BUCKET] [--tags=[TAGS,…]] [--ttl=TTL] [--user-workload-authentication-type=USER_WORKLOAD_AUTHENTICATION_TYPE] [--version=VERSION] [--network=NETWORK | --subnet=SUBNET] [GCLOUD_WIDE_FLAG …] [-- JOB_ARG …]
- DESCRIPTION
- Submit a Spark batch job.
- EXAMPLES
- To submit a Spark job, run:
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket -- ARG1 ARG2
- To submit a Spark job that runs a specific class of a jar, run:
gcloud dataproc batches submit spark --region=us-central1 --class=org.my.main.Class --jars=my_jar1.jar,my_jar2.jar --deps-bucket=gs://my-bucket -- ARG1 ARG2
- To submit a Spark job that runs a jar installed on the cluster, run:
gcloud dataproc batches submit spark --region=us-central1 --class=org.apache.spark.examples.SparkPi --deps-bucket=gs://my-bucket --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 15
- POSITIONAL ARGUMENTS
[-- JOB_ARG …] - Arguments to pass to the driver.
The '--' argument must be specified between gcloud-specific args on the left and JOB_ARG on the right.
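For example, a sketch that forwards two hypothetical driver arguments after the '--' separator (--input and --verbose are consumed by the job's main class, not by gcloud):
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket -- --input=gs://my-bucket/data --verbose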
- REQUIRED FLAGS
- Exactly one of these must be specified:
--class=MAIN_CLASS - Class containing the main method of the job. The jar file that contains the class must be in the classpath or specified in jar_files.
--jar=MAIN_JAR - URI of the main jar file.
- OPTIONAL FLAGS
--archives=[ARCHIVE,…] - Archives to be extracted into the working directory. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
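For example, a sketch that ships a placeholder archive for extraction into the workload's working directory:
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --archives=gs://my-bucket/deps.tar.gz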
--async - Return immediately without waiting for the operation in progress to complete.
--batch=BATCH - The ID of the batch job to submit. The ID must contain only lowercase letters (a-z), numbers (0-9), and hyphens (-). The length of the name must be between 4 and 63 characters. If this argument is not provided, a randomly generated UUID will be used.
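For example, with a hypothetical ID that satisfies these constraints (lowercase letters, numbers, hyphens, 4-63 characters):
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --batch=my-spark-batch-001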
--container-image=CONTAINER_IMAGE - Optional custom container image to use for the batch/session runtime environment. If not specified, a default container image will be used. The value should follow the container image naming format: {registry}/{repository}/{name}:{tag}, for example, gcr.io/my-project/my-image:1.2.3
--deps-bucket=DEPS_BUCKET - A Cloud Storage bucket to upload workload dependencies.
--files=[FILE,…] - Files to be placed in the working directory.
--history-server-cluster=HISTORY_SERVER_CLUSTER - Spark History Server configuration for the batch/session job. Resource name of an existing Dataproc cluster to act as a Spark History Server for the workload in the format: "projects/{project_id}/regions/{region}/clusters/{cluster_name}".
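For example, a sketch pointing at a placeholder history server cluster in the resource-name format above:
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --history-server-cluster=projects/my-project/regions/us-central1/clusters/my-phs-cluster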
--jars=[JAR,…] - Comma-separated list of jar files to be provided to the classpaths.
--kms-key=KMS_KEY - Cloud KMS key to use for encryption.
--labels=[KEY=VALUE,…] - List of label KEY=VALUE pairs to add.
Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
--metastore-service=METASTORE_SERVICE - Name of a Dataproc Metastore service to be used as an external metastore in the format: "projects/{project-id}/locations/{region}/services/{service-name}".
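For example, a hypothetical invocation combining labels that follow the key/value rules above with a placeholder Metastore service name:
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --labels=env=dev,team=data --metastore-service=projects/my-project/locations/us-central1/services/my-metastore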
--properties=[PROPERTY=VALUE,…] - Specifies configuration properties for the workload. See Dataproc Serverless for Spark documentation for the list of supported properties.
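For example, a sketch that sets two standard Spark properties (the values are illustrative, not recommendations):
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --properties=spark.executor.cores=4,spark.executor.memory=4g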
- Region resource - Dataproc region to use. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. This represents a Cloud resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.
To set the project attribute:
- provide the argument --region on the command line with a fully specified name;
- set the property dataproc/region with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.
--region=REGION - ID of the region or fully qualified identifier for the region.
To set the region attribute:
- provide the argument --region on the command line;
- set the property dataproc/region.
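For example, one way to satisfy the region attribute without repeating --region on every invocation is to set the property first (us-central1 is a placeholder):
gcloud config set dataproc/region us-central1
gcloud dataproc batches submit spark --jar=my_jar.jar --deps-bucket=gs://my-bucket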
--request-id=REQUEST_ID - A unique ID that identifies the request. If the service receives two batch create requests with the same request_id, the second request is ignored and the operation that corresponds to the first batch created and stored in the backend is returned. Recommendation: Always set this value to a UUID. The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.
--service-account=SERVICE_ACCOUNT - The IAM service account to be used for a batch/session job.
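For example, a sketch that follows the UUID recommendation (assuming the uuidgen utility is available) with a hypothetical service account:
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --request-id=$(uuidgen) --service-account=my-sa@my-project.iam.gserviceaccount.com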
--staging-bucket=STAGING_BUCKET - The Cloud Storage bucket used to store job dependencies, config files, and job driver console output. If not specified, the default [staging bucket](https://cloud.google.com/dataproc-serverless/docs/concepts/buckets) is used.
--tags=[TAGS,…] - Network tags for traffic control.
--ttl=TTL - The duration after which the workload will be unconditionally terminated, for example, '20m' or '1h'. Run gcloud topic datetimes for information on duration formats.
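For example, to terminate the workload unconditionally after two hours:
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --ttl=2h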
--user-workload-authentication-type=USER_WORKLOAD_AUTHENTICATION_TYPE - Whether to use END_USER_CREDENTIALS or SERVICE_ACCOUNT to run the workload.
--version=VERSION - Optional runtime version. If not specified, a default version will be used.
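For example, pinning a hypothetical runtime version instead of accepting the default (consult the Dataproc Serverless release notes for currently supported versions):
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --version=2.2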
- At most one of these can be specified:
--network=NETWORK - Network URI to connect the workload to.
--subnet=SUBNET - Subnetwork URI to connect the workload to. The subnet must have Private Google Access enabled.
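For example, a sketch using a placeholder subnetwork URI (the subnet is assumed to have Private Google Access enabled):
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --subnet=projects/my-project/regions/us-central1/subnetworks/my-subnet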
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
--access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.
Run $ gcloud help for details.
- NOTES
- This variant is also available:
gcloud beta dataproc batches submit spark