gcloud beta dataproc batches submit pyspark

NAME
gcloud beta dataproc batches submit pyspark - submit a PySpark batch job
SYNOPSIS
gcloud beta dataproc batches submit pyspark MAIN_PYTHON_FILE [--archives=[ARCHIVE,…]] [--async] [--batch=BATCH] [--container-image=CONTAINER_IMAGE] [--deps-bucket=DEPS_BUCKET] [--files=[FILE,…]] [--history-server-cluster=HISTORY_SERVER_CLUSTER] [--jars=[JAR,…]] [--kms-key=KMS_KEY] [--labels=[KEY=VALUE,…]] [--metastore-service=METASTORE_SERVICE] [--properties=[PROPERTY=VALUE,…]] [--py-files=[PY,…]] [--region=REGION] [--request-id=REQUEST_ID] [--service-account=SERVICE_ACCOUNT] [--staging-bucket=STAGING_BUCKET] [--tags=[TAGS,…]] [--ttl=TTL] [--user-workload-authentication-type=USER_WORKLOAD_AUTHENTICATION_TYPE] [--version=VERSION] [--network=NETWORK | --subnet=SUBNET] [GCLOUD_WIDE_FLAG …] [-- JOB_ARG …]
DESCRIPTION
(BETA) Submit a PySpark batch job.
EXAMPLES
To submit a PySpark batch job called "my-batch" that runs "my-pyspark.py", run:
gcloud beta dataproc batches submit pyspark my-pyspark.py --batch=my-batch --deps-bucket=gs://my-bucket --region=us-central1 --py-files='path/to/my/python/script.py'
POSITIONAL ARGUMENTS
MAIN_PYTHON_FILE
URI of the main Python file to use as the Spark driver. Must be a .py file.
[-- JOB_ARG …]
Arguments to pass to the driver.

The '--' argument must be specified between gcloud-specific args on the left and JOB_ARG on the right.
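
To illustrate where the '--' separator goes, here is a minimal Python sketch that assembles the argument list for this command. The helper name build_submit_command and the sample values are hypothetical. It uses only flags documented on this page, and it builds the list without running it.

```python
def build_submit_command(main_python_file, region, batch=None, job_args=()):
    """Return the argv list for `gcloud beta dataproc batches submit pyspark`.

    Hypothetical helper for illustration; it only builds the list.
    """
    cmd = [
        "gcloud", "beta", "dataproc", "batches", "submit", "pyspark",
        main_python_file,
        f"--region={region}",
    ]
    if batch is not None:
        cmd.append(f"--batch={batch}")
    job_args = list(job_args)
    if job_args:
        # gcloud-specific args stay on the left of '--';
        # everything on the right is passed to the driver.
        cmd.append("--")
        cmd.extend(job_args)
    return cmd


example = build_submit_command(
    "my-pyspark.py",
    "us-central1",
    batch="my-batch",
    job_args=["--input", "gs://my-bucket/data"],
)
print(" ".join(example))
```

Passing the resulting list to a process launcher, rather than joining it into one shell string, avoids quoting problems when a driver argument contains spaces.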

FLAGS
--archives=[ARCHIVE,…]
Archives to be extracted into the working directory. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
--async
Return immediately without waiting for the operation in progress to complete.
--batch=BATCH
The ID of the batch job to submit. The ID must contain only lowercase letters (a-z), numbers (0-9) and hyphens (-). The length of the name must be between 4 and 63 characters. If this argument is not provided, a randomly generated UUID will be used.
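
The batch ID rules above can be checked locally before submitting. The following Python sketch encodes only what this description states. The function name is hypothetical, and the service may enforce additional rules that are not listed here.

```python
import re

# Derived from the description above: lowercase letters, digits, and
# hyphens, with a total length of 4 to 63 characters. This is an
# assumption-level pattern, not the service's own validation.
BATCH_ID_PATTERN = re.compile(r"[a-z0-9-]{4,63}")


def is_valid_batch_id(batch_id):
    """Return True if batch_id satisfies the documented character and length rules."""
    return BATCH_ID_PATTERN.fullmatch(batch_id) is not None


print(is_valid_batch_id("my-batch"))  # True
print(is_valid_batch_id("My_Batch"))  # False: uppercase letters and underscore
print(is_valid_batch_id("abc"))       # False: shorter than 4 characters
```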
--container-image=CONTAINER_IMAGE
Optional custom container image to use for the batch/session runtime environment. If not specified, a default container image will be used. The value should follow the container image naming format: {registry}/{repository}/{name}:{tag}, for example, gcr.io/my-project/my-image:1.2.3
--deps-bucket=DEPS_BUCKET
A Cloud Storage bucket to upload workload dependencies.
--files=[FILE,…]
Files to be placed in the working directory.
--history-server-cluster=HISTORY_SERVER_CLUSTER
Spark History Server configuration for the batch/session job. Resource name of an existing Dataproc cluster to act as a Spark History Server for the workload in the format: "projects/{project_id}/regions/{region}/clusters/{cluster_name}".
--jars=[JAR,…]
Comma-separated list of jar files to be provided to the classpaths.
--kms-key=KMS_KEY
Cloud KMS key to use for encryption.
--labels=[KEY=VALUE,…]
List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
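
The label rules can be sketched as a small Python check that also formats the flag value. The function name and sample labels are hypothetical. Restricting "lowercase characters" to ASCII a-z and allowing empty values are simplifying assumptions, not statements about what the service accepts.

```python
import re

# Assumption: "lowercase characters" is treated as ASCII a-z only.
LABEL_KEY_PATTERN = re.compile(r"[a-z][a-z0-9_-]*")
LABEL_VALUE_PATTERN = re.compile(r"[a-z0-9_-]*")


def format_labels_flag(labels):
    """Validate a dict of labels and return the --labels flag string."""
    for key, value in labels.items():
        if LABEL_KEY_PATTERN.fullmatch(key) is None:
            raise ValueError(f"invalid label key: {key!r}")
        if LABEL_VALUE_PATTERN.fullmatch(value) is None:
            raise ValueError(f"invalid label value for {key!r}: {value!r}")
    pairs = ",".join(f"{key}={value}" for key, value in labels.items())
    return f"--labels={pairs}"


print(format_labels_flag({"env": "dev", "team": "data-eng"}))
```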

--metastore-service=METASTORE_SERVICE
Name of a Dataproc Metastore service to be used as an external metastore in the format: "projects/{project-id}/locations/{region}/services/{service-name}".
--properties=[PROPERTY=VALUE,…]
Specifies configuration properties for the workload. See the Dataproc Serverless for Spark documentation for the list of supported properties.
--py-files=[PY,…]
Comma-separated list of Python scripts to be passed to the PySpark framework. Supported file types: .py, .egg and .zip.
Region resource - Dataproc region to use. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. This represents a Cloud resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.
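
A quick local check of the supported --py-files types might look like the following Python sketch. The function name and sample paths are hypothetical, and the comparison is case-sensitive by assumption.

```python
import os

# File types listed in the --py-files description above.
SUPPORTED_PY_FILE_SUFFIXES = {".py", ".egg", ".zip"}


def unsupported_py_files(paths):
    """Return the paths whose extension is not a supported --py-files type."""
    return [
        path for path in paths
        if os.path.splitext(path)[1] not in SUPPORTED_PY_FILE_SUFFIXES
    ]


candidates = ["path/to/my/python/script.py", "gs://my-bucket/deps.zip", "notes.txt"]
print(unsupported_py_files(candidates))  # ['notes.txt']
```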

To set the project attribute:

  • provide the argument --region on the command line with a fully specified name;
  • set the property dataproc/region with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.
--region=REGION
ID of the region or fully qualified identifier for the region.

To set the region attribute:

  • provide the argument --region on the command line;
  • set the property dataproc/region.
--request-id=REQUEST_ID
A unique ID that identifies the request. If the service receives two batch create requests with the same request_id, the second request is ignored and the operation that corresponds to the first batch created and stored in the backend is returned. Recommendation: Always set this value to a UUID. The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.
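
Following the UUID recommendation above, a request ID can be generated and checked against the stated constraints as in this Python sketch. The function name is hypothetical, and the pattern reflects only the rules given in this description.

```python
import re
import uuid

# Letters, digits, underscores, and hyphens, at most 40 characters,
# as stated in the description above.
REQUEST_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,40}")


def new_request_id():
    """Return a random UUID string suitable for --request-id."""
    request_id = str(uuid.uuid4())  # 36 characters: hex digits and hyphens
    if REQUEST_ID_PATTERN.fullmatch(request_id) is None:
        raise ValueError(f"unexpected request ID format: {request_id!r}")
    return request_id


print(f"--request-id={new_request_id()}")
```

Reusing the same ID when retrying a submission lets the service return the operation from the first request instead of creating a second batch.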
--service-account=SERVICE_ACCOUNT
The IAM service account to be used for a batch/session job.
--staging-bucket=STAGING_BUCKET
The Cloud Storage bucket to use to store job dependencies, config files, and job driver console output. If not specified, the default [staging bucket](https://cloud.google.com/dataproc-serverless/docs/concepts/buckets) is used.
--tags=[TAGS,…]
Network tags for traffic control.
--ttl=TTL
The duration after which the workload will be unconditionally terminated, for example, '20m' or '1h'. Run gcloud topic datetimes for information on duration formats.
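
For the simple single-unit form shown in the examples above, a duration converts to seconds as in this Python sketch. The function name is hypothetical, and the s and d units are an assumption beyond the m and h shown here. The full set of formats gcloud accepts is described by gcloud topic datetimes and is not covered by this sketch.

```python
import re

# Seconds per unit. Only 'm' and 'h' appear in the description above;
# 's' and 'd' are assumed for illustration.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def simple_duration_to_seconds(text):
    """Convert a single-unit duration such as '20m' or '1h' to seconds."""
    match = re.fullmatch(r"([0-9]+)([smhd])", text)
    if match is None:
        raise ValueError(f"not a simple single-unit duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


print(simple_duration_to_seconds("20m"))  # 1200
print(simple_duration_to_seconds("1h"))   # 3600
```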
--user-workload-authentication-type=USER_WORKLOAD_AUTHENTICATION_TYPE
Whether to use END_USER_CREDENTIALS or SERVICE_ACCOUNT to run the workload.
--version=VERSION
Optional runtime version. If not specified, a default version will be used.
At most one of these can be specified:
--network=NETWORK
Network URI to connect network to.
--subnet=SUBNET
Subnetwork URI to connect network to. Subnet must have Private Google Access enabled.
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

NOTES
This command is currently in beta and might change without notice. This variant is also available:
gcloud dataproc batches submit pyspark

Last updated 2025-05-13 UTC.