gcloud dataproc batches submit spark

NAME
gcloud dataproc batches submit spark - submit a Spark batch job
SYNOPSIS
gcloud dataproc batches submit spark (--class=MAIN_CLASS | --jar=MAIN_JAR) [--archives=[ARCHIVE,…]] [--async] [--batch=BATCH] [--container-image=CONTAINER_IMAGE] [--deps-bucket=DEPS_BUCKET] [--files=[FILE,…]] [--history-server-cluster=HISTORY_SERVER_CLUSTER] [--jars=[JAR,…]] [--kms-key=KMS_KEY] [--labels=[KEY=VALUE,…]] [--metastore-service=METASTORE_SERVICE] [--properties=[PROPERTY=VALUE,…]] [--region=REGION] [--request-id=REQUEST_ID] [--service-account=SERVICE_ACCOUNT] [--staging-bucket=STAGING_BUCKET] [--tags=[TAGS,…]] [--ttl=TTL] [--user-workload-authentication-type=USER_WORKLOAD_AUTHENTICATION_TYPE] [--version=VERSION] [--network=NETWORK | --subnet=SUBNET] [GCLOUD_WIDE_FLAG …] [-- JOB_ARG …]
DESCRIPTION
Submit a Spark batch job.
EXAMPLES
To submit a Spark job, run:
gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket -- ARG1 ARG2

To submit a Spark job that runs a specific class of a jar, run:

gcloud dataproc batches submit spark --region=us-central1 --class=org.my.main.Class --jars=my_jar1.jar,my_jar2.jar --deps-bucket=gs://my-bucket -- ARG1 ARG2

To submit a Spark job that runs a jar installed on the cluster, run:

gcloud dataproc batches submit spark --region=us-central1 --class=org.apache.spark.examples.SparkPi --deps-bucket=gs://my-bucket --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 15
POSITIONAL ARGUMENTS
[-- JOB_ARG …]
Arguments to pass to the driver.

The '--' argument must be specified between gcloud-specific args on the left and JOB_ARG on the right.
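
For example, everything after the '--' separator is forwarded verbatim to the driver, so flag-like driver arguments are not parsed by gcloud (the jar and the driver arguments here are placeholders):

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket -- --input=gs://my-bucket/in --verbose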

REQUIRED FLAGS
Exactly one of these must be specified:
--class=MAIN_CLASS
The class that contains the main method of the job. The jar file that contains the class must be in the classpath or specified in jar_files.
--jar=MAIN_JAR
URI of the main jar file.
OPTIONAL FLAGS
--archives=[ARCHIVE,…]
Archives to be extracted into the working directory. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
--async
Return immediately without waiting for the operation in progress to complete.
--batch=BATCH
The ID of the batch job to submit. The ID must contain only lowercase letters (a-z), numbers (0-9), and hyphens (-). The length of the name must be between 4 and 63 characters. If this argument is not provided, a randomly generated UUID will be used.
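
For example, a fixed, rule-conforming ID makes the batch easy to look up later (my-spark-batch-001 is a placeholder that satisfies the character and length rules):

gcloud dataproc batches submit spark --batch=my-spark-batch-001 --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket
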
--container-image=CONTAINER_IMAGE
Optional custom container image to use for the batch/session runtime environment. If not specified, a default container image will be used. The value should follow the container image naming format: {registry}/{repository}/{name}:{tag}, for example, gcr.io/my-project/my-image:1.2.3.
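
For example, assuming a hypothetical image pushed to Artifact Registry that follows this naming format:

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --container-image=us-docker.pkg.dev/my-project/my-repo/my-image:1.2.3
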
--deps-bucket=DEPS_BUCKET
A Cloud Storage bucket to which workload dependencies are uploaded.
--files=[FILE,…]
Files to be placed in the working directory.
--history-server-cluster=HISTORY_SERVER_CLUSTER
Spark History Server configuration for the batch/session job. Resource name of an existing Dataproc cluster to act as a Spark History Server for the workload, in the format: "projects/{project_id}/regions/{region}/clusters/{cluster_name}".
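
For example, assuming an existing cluster named my-history-server in the same project and region:

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --history-server-cluster=projects/my-project/regions/us-central1/clusters/my-history-server
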
--jars=[JAR,…]
Comma-separated list of jar files to be provided to the classpaths.
--kms-key=KMS_KEY
Cloud KMS key to use for encryption.
--labels=[KEY=VALUE,…]
List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
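For example, with illustrative keys and values that satisfy these rules:

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --labels=env=dev,team=data-eng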

--metastore-service=METASTORE_SERVICE
Name of a Dataproc Metastore service to be used as an external metastore, in the format: "projects/{project-id}/locations/{region}/services/{service-name}".
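
For example, with a hypothetical Metastore service named my-metastore:

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --metastore-service=projects/my-project/locations/us-central1/services/my-metastore
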
--properties=[PROPERTY=VALUE,…]
Specifies configuration properties for the workload. See the Dataproc Serverless for Spark documentation for the list of supported properties.
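
For example, assuming the selected runtime version supports these standard Spark sizing properties:

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --properties=spark.executor.cores=4,spark.driver.cores=4
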
Region resource - Dataproc region to use. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. This represents a Cloud resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --region on the command line with a fully specified name;
  • set the property dataproc/region with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.
--region=REGION
ID of the region or fully qualified identifier for the region.

To set the region attribute:

  • provide the argument --region on the command line;
  • set the property dataproc/region.
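
For example, to avoid repeating --region on every invocation, the region can be stored in the active gcloud configuration:

gcloud config set dataproc/region us-central1
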
--request-id=REQUEST_ID
A unique ID that identifies the request. If the service receives two batch create requests with the same request_id, the second request is ignored, and the operation that corresponds to the first batch created and stored in the backend is returned. Recommendation: Always set this value to a UUID. The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.
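
For example, on systems that provide uuidgen, a conforming ID can be generated inline (its 36-character output of letters, numbers, and hyphens fits the rules above):

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --request-id="$(uuidgen)"
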
--service-account=SERVICE_ACCOUNT
The IAM service account to be used for a batch/session job.
--staging-bucket=STAGING_BUCKET
The Cloud Storage bucket to use to store job dependencies, config files, and job driver console output. If not specified, the default staging bucket is used (see https://cloud.google.com/dataproc-serverless/docs/concepts/buckets).
--tags=[TAGS,…]
Network tags for traffic control.
--ttl=TTL
The duration after which the workload will be unconditionally terminated, for example, '20m' or '1h'. Run gcloud topic datetimes for information on duration formats.
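
For example, to terminate the workload unconditionally after two hours:

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --ttl=2h
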
--user-workload-authentication-type=USER_WORKLOAD_AUTHENTICATION_TYPE
Whether to use END_USER_CREDENTIALS or SERVICE_ACCOUNT to run the workload.
--version=VERSION
Optional runtime version. If not specified, a default version will be used.
At most one of these can be specified:
--network=NETWORK
Network URI to connect the workload to.
--subnet=SUBNET
Subnetwork URI to connect the workload to. The subnet must have Private Google Access enabled.
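
For example, assuming a hypothetical subnetwork my-subnet that has Private Google Access enabled (only one of --network or --subnet may be passed):

gcloud dataproc batches submit spark --region=us-central1 --jar=my_jar.jar --deps-bucket=gs://my-bucket --subnet=projects/my-project/regions/us-central1/subnetworks/my-subnet
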
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

NOTES
This variant is also available:
gcloud beta dataproc batches submit spark
