gcloud dataplex tasks create
- NAME
- gcloud dataplex tasks create - create a Dataplex task resource
- SYNOPSIS
gcloud dataplex tasks create (TASK : --lake=LAKE --location=LOCATION)
    (--execution-service-account=EXECUTION_SERVICE_ACCOUNT
      : --execution-args=[KEY=VALUE,…]
        --execution-project=EXECUTION_PROJECT
        --kms-key=KMS_KEY
        --max-job-execution-lifetime=MAX_JOB_EXECUTION_LIFETIME)
    ([--notebook=NOTEBOOK
       : --notebook-archive-uris=[NOTEBOOK_ARCHIVE_URIS,…]
         --notebook-file-uris=[NOTEBOOK_FILE_URIS,…]
         --notebook-batch-executors-count=NOTEBOOK_BATCH_EXECUTORS_COUNT
         --notebook-batch-max-executors-count=NOTEBOOK_BATCH_MAX_EXECUTORS_COUNT
         --notebook-container-image=NOTEBOOK_CONTAINER_IMAGE
         --notebook-container-image-java-jars=[NOTEBOOK_CONTAINER_IMAGE_JAVA_JARS,…]
         --notebook-container-image-properties=[KEY=VALUE,…]
         --notebook-vpc-network-tags=[NOTEBOOK_VPC_NETWORK_TAGS,…]
         --notebook-vpc-network-name=NOTEBOOK_VPC_NETWORK_NAME
         | --notebook-vpc-sub-network-name=NOTEBOOK_VPC_SUB_NETWORK_NAME]
     | [(--spark-main-class=SPARK_MAIN_CLASS
         | --spark-main-jar-file-uri=SPARK_MAIN_JAR_FILE_URI
         | --spark-python-script-file=SPARK_PYTHON_SCRIPT_FILE
         | --spark-sql-script=SPARK_SQL_SCRIPT
         | --spark-sql-script-file=SPARK_SQL_SCRIPT_FILE)
        : --spark-archive-uris=[SPARK_ARCHIVE_URIS,…]
          --spark-file-uris=[SPARK_FILE_URIS,…]
          --batch-executors-count=BATCH_EXECUTORS_COUNT
          --batch-max-executors-count=BATCH_MAX_EXECUTORS_COUNT
          --container-image=CONTAINER_IMAGE
          --container-image-java-jars=[CONTAINER_IMAGE_JAVA_JARS,…]
          --container-image-properties=[KEY=VALUE,…]
          --container-image-python-packages=[CONTAINER_IMAGE_PYTHON_PACKAGES,…]
          --vpc-network-tags=[VPC_NETWORK_TAGS,…]
          --vpc-network-name=VPC_NETWORK_NAME
          | --vpc-sub-network-name=VPC_SUB_NETWORK_NAME])
    (--trigger-type=TRIGGER_TYPE
      : --trigger-disabled
        --trigger-max-retires=TRIGGER_MAX_RETIRES
        --trigger-schedule=TRIGGER_SCHEDULE
        --trigger-start-time=TRIGGER_START_TIME)
    [--async] [--description=DESCRIPTION] [--display-name=DISPLAY_NAME]
    [--labels=[KEY=VALUE,…]] [GCLOUD_WIDE_FLAG …]
- DESCRIPTION
- Create a Dataplex task resource.
A task represents a user visible job that you want Dataplex to perform on a schedule. It encapsulates your code, your parameters, and the schedule.
The task ID must follow these rules:
- Must contain only lowercase letters, numbers, and hyphens.
- Must start with a letter.
- Must end with a number or a letter.
- Must be between 1-63 characters.
- Must be unique within the customer project / location.
- EXAMPLES
- To create a Dataplex task test-task with ON_DEMAND trigger type, dataplex-demo-test@test-project.iam.gserviceaccount.com as execution service account, and gs://test-bucket/test-file.py as Spark Python script file within lake test-lake in location us-central1, run:

      gcloud dataplex tasks create test-task --location=us-central1 --lake=test-lake --execution-service-account=dataplex-demo-test@test-project.iam.gserviceaccount.com --spark-python-script-file=gs://test-bucket/test-file.py --trigger-type=ON_DEMAND

  To create a Dataplex task test-task with RECURRING trigger type starting every hour at minute 0, dataplex-demo-test@test-project.iam.gserviceaccount.com as execution service account, and gs://test-bucket/test-file.py as Spark Python script file within lake test-lake in location us-central1, run:

      gcloud dataplex tasks create test-task --location=us-central1 --lake=test-lake --execution-service-account=dataplex-demo-test@test-project.iam.gserviceaccount.com --spark-python-script-file=gs://test-bucket/test-file.py --trigger-type=RECURRING --trigger-schedule="0 * * * *"
- POSITIONAL ARGUMENTS
- Task resource - Arguments and flags that specify the Dataplex Task you want to create. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

  To set the project attribute:
  - provide the argument task on the command line with a fully specified name;
  - provide the argument --project on the command line;
  - set the property core/project.

  This must be specified.

  TASK
      ID of the task or fully qualified identifier for the task.

      To set the task attribute:
      - provide the argument task on the command line.

      This positional argument must be specified if any of the other arguments in this group are specified.

  --lake=LAKE
      Identifier of the Dataplex lake resource.

      To set the lake attribute:
      - provide the argument task on the command line with a fully specified name;
      - provide the argument --lake on the command line.

  --location=LOCATION
      Location of the Dataplex resource.

      To set the location attribute:
      - provide the argument task on the command line with a fully specified name;
      - provide the argument --location on the command line;
      - set the property dataplex/location.
- REQUIRED FLAGS
- Spec related to how a task is executed. This must be specified.

  --execution-service-account=EXECUTION_SERVICE_ACCOUNT
      Service account to use to execute a task.

      This flag argument must be specified if any of the other arguments in this group are specified.

  --execution-args=[KEY=VALUE,…]
      The arguments to pass to the task. The args can use placeholders of the format ${placeholder} as part of a key/value string. These will be interpolated before passing the args to the driver. Currently supported placeholders:
      - ${task_id}
      - ${job_time}

      To pass positional args, set the key as TASK_ARGS. The value should be a comma-separated string of all the positional arguments. See https://cloud.google.com/sdk/gcloud/reference/topic/escaping for details on using a delimiter other than a comma. If other keys are present in the args, TASK_ARGS is passed as the last argument.
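      For example, to pass one named argument that interpolates the ${task_id} placeholder plus a single positional argument, the flag might look like this (the key name and values are illustrative; the single quotes keep the shell from expanding the placeholder):

          --execution-args='output_dir=gs://test-bucket/out/${task_id},TASK_ARGS=arg1'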
  --execution-project=EXECUTION_PROJECT
      The project in which jobs are run. By default, the project containing the lake is used. If a project is provided, the --execution-service-account must belong to this same project.

  --kms-key=KMS_KEY
      The Cloud KMS key to use for encryption, of the form: projects/{project_number}/locations/{location_id}/keyRings/{key-ring-name}/cryptoKeys/{key-name}

  --max-job-execution-lifetime=MAX_JOB_EXECUTION_LIFETIME
      The maximum duration before the job execution expires.
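      The lifetime is expressed as a duration; assuming gcloud's usual duration syntax (see $ gcloud topic datetimes), a two-hour cap might look like this (the value is illustrative):

          --max-job-execution-lifetime=2h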
- Select which task you want to schedule and provide the required arguments for that task. The two types of tasks supported are:
  - Spark tasks
  - notebook tasks
- Config related to running custom notebook tasks.

  --notebook=NOTEBOOK
      Path to the input notebook. This can be the Google Cloud Storage URI of the notebook file or the path to a Notebook Content. The execution args are accessible as environment variables (TASK_key=value).

      This flag argument must be specified if any of the other arguments in this group are specified.

  --notebook-archive-uris=[NOTEBOOK_ARCHIVE_URIS,…]
      Google Cloud Storage URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

  --notebook-file-uris=[NOTEBOOK_FILE_URIS,…]
      Google Cloud Storage URIs of files to be placed in the working directory of each executor.
- Compute resources needed for a task when using Dataproc Serverless.

  --notebook-batch-executors-count=NOTEBOOK_BATCH_EXECUTORS_COUNT
      Total number of job executors.

  --notebook-batch-max-executors-count=NOTEBOOK_BATCH_MAX_EXECUTORS_COUNT
      Maximum configurable executors. If max_executors_count > executors_count, auto-scaling is enabled.
- Container image runtime configuration.

  --notebook-container-image=NOTEBOOK_CONTAINER_IMAGE
      Optional custom container image for the job.

  --notebook-container-image-java-jars=[NOTEBOOK_CONTAINER_IMAGE_JAVA_JARS,…]
      A list of Java JARs to add to the classpath. Valid input includes Cloud Storage URIs to JAR binaries. For example, gs://bucket-name/my/path/to/file.jar

  --notebook-container-image-properties=[KEY=VALUE,…]
      The properties to set on daemon config files. Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. For more information, see Cluster properties (https://cloud.google.com/dataproc/docs/concepts/cluster-properties).
- Cloud VPC network used to run the infrastructure.

  --notebook-vpc-network-tags=[NOTEBOOK_VPC_NETWORK_TAGS,…]
      List of network tags to apply to the job.

  The Cloud VPC network identifier. At most one of these can be specified:

  --notebook-vpc-network-name=NOTEBOOK_VPC_NETWORK_NAME
      The Cloud VPC network in which the job is run. By default, the Cloud VPC network named Default within the project is used.

  --notebook-vpc-sub-network-name=NOTEBOOK_VPC_SUB_NETWORK_NAME
      The Cloud VPC sub-network in which the job is run.
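  Putting the notebook flags together, a minimal notebook task might be created like this (the notebook URI, service account, lake, and location are illustrative):

      gcloud dataplex tasks create test-notebook-task \
          --location=us-central1 --lake=test-lake \
          --execution-service-account=dataplex-demo-test@test-project.iam.gserviceaccount.com \
          --notebook=gs://test-bucket/notebooks/analysis.ipynb \
          --trigger-type=ON_DEMAND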
- Config related to running custom Spark tasks.

  --spark-archive-uris=[SPARK_ARCHIVE_URIS,…]
      Google Cloud Storage URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

  --spark-file-uris=[SPARK_FILE_URIS,…]
      Google Cloud Storage URIs of files to be placed in the working directory of each executor.
- The specification of the main method to call to drive the job. Specify either the jar file that contains the main class or the main class name. Exactly one of these must be specified:

  --spark-main-class=SPARK_MAIN_CLASS
      The name of the driver's main class. The jar file that contains the class must be in the default CLASSPATH or specified in jar_file_uris. The execution args are passed in as a sequence of named process arguments (--key=value).

  --spark-main-jar-file-uri=SPARK_MAIN_JAR_FILE_URI
      The Google Cloud Storage URI of the jar file that contains the main class. The execution args are passed in as a sequence of named process arguments (--key=value).

  --spark-python-script-file=SPARK_PYTHON_SCRIPT_FILE
      The Google Cloud Storage URI of the main Python file to use as the driver. Must be a .py file.

  --spark-sql-script=SPARK_SQL_SCRIPT
      The SQL query text.

  --spark-sql-script-file=SPARK_SQL_SCRIPT_FILE
      A reference to a query file. This can be the Google Cloud Storage URI of the query file or it can be the path to a SqlScript Content.
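  For instance, a Spark task driven by a main class packaged in a JAR might be sketched like this (the JAR URI, service account, lake, and location are illustrative; remember the driver flags above are mutually exclusive):

      gcloud dataplex tasks create test-jar-task \
          --location=us-central1 --lake=test-lake \
          --execution-service-account=dataplex-demo-test@test-project.iam.gserviceaccount.com \
          --spark-main-jar-file-uri=gs://test-bucket/jars/my-spark-job.jar \
          --trigger-type=ON_DEMAND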
- Compute resources needed for a task when using Dataproc Serverless.

  --batch-executors-count=BATCH_EXECUTORS_COUNT
      Total number of job executors.

  --batch-max-executors-count=BATCH_MAX_EXECUTORS_COUNT
      Maximum configurable executors. If max_executors_count > executors_count, auto-scaling is enabled.
- Container image runtime configuration.

  --container-image=CONTAINER_IMAGE
      Optional custom container image for the job.

  --container-image-java-jars=[CONTAINER_IMAGE_JAVA_JARS,…]
      A list of Java JARs to add to the classpath. Valid input includes Cloud Storage URIs to JAR binaries. For example, gs://bucket-name/my/path/to/file.jar

  --container-image-properties=[KEY=VALUE,…]
      The properties to set on daemon config files. Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. For more information, see Cluster properties (https://cloud.google.com/dataproc/docs/concepts/cluster-properties).

  --container-image-python-packages=[CONTAINER_IMAGE_PYTHON_PACKAGES,…]
      A list of Python packages to be installed. Valid formats include a Cloud Storage URI to a pip-installable library. For example, gs://bucket-name/my/path/to/lib.tar.gz
- Cloud VPC network used to run the infrastructure.

  --vpc-network-tags=[VPC_NETWORK_TAGS,…]
      List of network tags to apply to the job.

  The Cloud VPC network identifier. At most one of these can be specified:

  --vpc-network-name=VPC_NETWORK_NAME
      The Cloud VPC network in which the job is run. By default, the Cloud VPC network named Default within the project is used.

  --vpc-sub-network-name=VPC_SUB_NETWORK_NAME
      The Cloud VPC sub-network in which the job is run.
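  Combining the Dataproc Serverless and network flags, an autoscaling Spark task on a specific subnetwork might be sketched as follows (the executor counts, subnetwork name, and other values are illustrative):

      gcloud dataplex tasks create test-task \
          --location=us-central1 --lake=test-lake \
          --execution-service-account=dataplex-demo-test@test-project.iam.gserviceaccount.com \
          --spark-python-script-file=gs://test-bucket/test-file.py \
          --batch-executors-count=2 --batch-max-executors-count=10 \
          --vpc-sub-network-name=test-subnet \
          --trigger-type=ON_DEMAND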
- Spec related to Dataplex task scheduling and frequency settings. This must be specified.

  --trigger-type=TRIGGER_TYPE
      Trigger type of the user-specified Dataplex task. TRIGGER_TYPE must be one of:

      on-demand
          The ON_DEMAND trigger type runs the Dataplex task one time shortly after task creation.
      recurring
          The RECURRING trigger type schedules the task to run periodically.

  --trigger-disabled
      Prevent the task from executing. This does not cancel already running tasks. It is intended to temporarily disable RECURRING tasks.

  --trigger-max-retires=TRIGGER_MAX_RETIRES
      Number of retry attempts before aborting. Set to zero to never attempt to retry a failed task.

  --trigger-schedule=TRIGGER_SCHEDULE
      Cron schedule (https://en.wikipedia.org/wiki/Cron) for running tasks periodically.

  --trigger-start-time=TRIGGER_START_TIME
      The first run of the task begins after this time. If not specified, an ON_DEMAND task runs when it is submitted and a RECURRING task runs based on the trigger schedule.
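  As a sketch, a recurring task that runs daily at midnight, retries up to three times, and does not start before a given time might combine the trigger flags like this (the schedule, retry count, and timestamp are illustrative; the timestamp format is assumed to follow $ gcloud topic datetimes):

      gcloud dataplex tasks create test-task \
          --location=us-central1 --lake=test-lake \
          --execution-service-account=dataplex-demo-test@test-project.iam.gserviceaccount.com \
          --spark-python-script-file=gs://test-bucket/test-file.py \
          --trigger-type=RECURRING --trigger-schedule="0 0 * * *" \
          --trigger-max-retires=3 --trigger-start-time=2026-02-01T00:00:00Z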
- OPTIONAL FLAGS
  --async
      Return immediately, without waiting for the operation in progress to complete.

  --description=DESCRIPTION
      Description of the Dataplex task.

  --display-name=DISPLAY_NAME
      Display name of the Dataplex task.

  --labels=[KEY=VALUE,…]
      List of label KEY=VALUE pairs to add.

      Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.
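      For example, labels satisfying these rules might look like this (the key and value names are illustrative):

          --labels=env=dev,team=data-eng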
- GCLOUD WIDE FLAGS
- These flags are available to all commands:
  --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

  Run $ gcloud help for details.
- API REFERENCE
- This command uses the dataplex/v1 API. The full documentation for this API can be found at: https://cloud.google.com/dataplex/docs
- NOTES
- This variant is also available:

      gcloud alpha dataplex tasks create