Cloud Spanner to Cloud Storage template
Use the Serverless for Apache Spark Cloud Spanner to Cloud Storage template to extractdata from Spanner databases to Cloud Storage.
Use the template
Run the template using the gcloud CLI or DataprocAPI.
gcloud
Before using any of the command data below, make the following replacements:
- PROJECT_ID: Required. Your Google Cloud project ID listed in theIAM Settings.
- REGION: Required.Compute Engine region.
- SUBNET: Optional. If a subnet is not specified, the subnet in the specified REGION in the
defaultnetwork is selected.Example:
projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME - TEMPLATE_VERSION: Required. Specify
latestfor the latest template version, or the date of a specific version, for example,2023-03-17_v0.1.0-beta(visitgs://dataproc-templates-binaries or rungcloud storage ls gs://dataproc-templates-binariesto list available template versions). - INSTANCE: Required. Spanner instance ID.
- DATABASE: Required. Spanner database ID.
- TABLE: Required. Spanner input table name or a SQL query on the Spanner input table.
Example (the SQL query should be within parentheses):
(select * fromTABLE) - SPANNER_JDBC_DIALECT: Required. Spanner JDBC dialect. Options:
googlesqlorpostgresql. Defaults togooglesql. - CLOUD_STORAGE_OUTPUT_PATH: Required. Cloud Storage path where output will be stored.
Example:
gs://example-bucket/example-folder/ - FORMAT: Required. Output data format. Options:
avro,parquet,csv, orjson.Note: Ifavro, you must add "file:///usr/lib/spark/connector/spark-avro.jar" to thejarsgcloud CLI flag or API field.Example (the
file://prefix references a Serverless for Apache Spark jar file):--jars=file:///usr/lib/spark/connector/spark-avro.jar,[ ... other jars] - MODE: Required. Write mode for Cloud Storage output. Options:
append,overwrite,ignore, orerrorifexists. - NUM_PARTITIONS: Optional. The maximum number of partitions that can be used for parallelism of table reads and writes.
- INPUT_PARTITION_COLUMN,LOWERBOUND,UPPERBOUND: Optional. If used, all of the following parameters must be specified:
- INPUT_PARTITION_COLUMN: Spanner input table partition column name.
- LOWERBOUND: Spanner input table partition column lower bound used to determine the partition stride.
- UPPERBOUND: Spanner input table partition column upper bound used to decide the partition stride.
- TEMP_VIEW andTEMP_QUERY: Optional. You can use these two optional parameters to apply a Spark SQL transformation while loading data into Cloud Storage. TEMP_VIEW must be the same as table name used in query, and TEMP_QUERY is the query statement.
- SERVICE_ACCOUNT: Optional. If not provided, thedefault Compute Engine service account is used.
- PROPERTY andPROPERTY_VALUE: Optional. Comma-separated list ofSpark property=
valuepairs. - LABEL andLABEL_VALUE: Optional. Comma-separated list of
label=valuepairs. - LOG_LEVEL: Optional. Level of logging. Can be one of
ALL,DEBUG,ERROR,FATAL,INFO,OFF,TRACE, orWARN. Default:INFO. - KMS_KEY: Optional. The Cloud Key Management Service key to use for encryption. If a key is not specified, data isencrypted at rest using a Google-owned and Google-managed encryption key.
Example:
projects/PROJECT_ID/regions/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
Execute the following command:
Linux, macOS, or Cloud Shell
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running eithergcloud init; orgcloud auth login andgcloud config set project.gclouddataprocbatchessubmitspark\--class=com.google.cloud.dataproc.templates.main.DataProcTemplate\--version="1.2"\--project="PROJECT_ID"\--region="REGION"\--jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar"\--subnet="SUBNET"\--kms-key="KMS_KEY"\--service-account="SERVICE_ACCOUNT"\--properties="PROPERTY=PROPERTY_VALUE"\--labels="LABEL=LABEL_VALUE"\----template=SPANNERTOGCS\--templatePropertylog.level="LOG_LEVEL"\--templatePropertyproject.id="PROJECT_ID"\--templatePropertyspanner.gcs.input.spanner.id="INSTANCE"\--templatePropertyspanner.gcs.input.database.id="DATABASE"\--templatePropertyspanner.gcs.input.table.id="TABLE"\--templatePropertyspanner.gcs.output.gcs.path="CLOUD_STORAGE_OUTPUT_PATH"\--templatePropertyspanner.gcs.output.gcs.saveMode="MODE"\--templatePropertyspanner.gcs.output.gcs.format="FORMAT"\--templatePropertyspanner.gcs.input.sql.partitionColumn="INPUT_PARTITION_COLUMN"\--templatePropertyspanner.gcs.input.sql.lowerBound="LOWERBOUND"\--templatePropertyspanner.gcs.input.sql.upperBound="UPPERBOUND"\--templatePropertyspanner.spanner.gcs.input.sql.numPartitions="NUM_PARTITIONS"\--templatePropertyspanner.gcs.temp.table="TEMP_VIEW"\--templatePropertyspanner.gcs.temp.query="TEMP_QUERY"\--templatePropertyspanner.jdbc.dialect="SPANNER_JDBC_DIALECT"
Windows (PowerShell)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running eithergcloud init; orgcloud auth login andgcloud config set project.gclouddataprocbatchessubmitspark`--class=com.google.cloud.dataproc.templates.main.DataProcTemplate`--version="1.2"`--project="PROJECT_ID"`--region="REGION"`--jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar"`--subnet="SUBNET"`--kms-key="KMS_KEY"`--service-account="SERVICE_ACCOUNT"`--properties="PROPERTY=PROPERTY_VALUE"`--labels="LABEL=LABEL_VALUE"`----template=SPANNERTOGCS`--templatePropertylog.level="LOG_LEVEL"`--templatePropertyproject.id="PROJECT_ID"`--templatePropertyspanner.gcs.input.spanner.id="INSTANCE"`--templatePropertyspanner.gcs.input.database.id="DATABASE"`--templatePropertyspanner.gcs.input.table.id="TABLE"`--templatePropertyspanner.gcs.output.gcs.path="CLOUD_STORAGE_OUTPUT_PATH"`--templatePropertyspanner.gcs.output.gcs.saveMode="MODE"`--templatePropertyspanner.gcs.output.gcs.format="FORMAT"`--templatePropertyspanner.gcs.input.sql.partitionColumn="INPUT_PARTITION_COLUMN"`--templatePropertyspanner.gcs.input.sql.lowerBound="LOWERBOUND"`--templatePropertyspanner.gcs.input.sql.upperBound="UPPERBOUND"`--templatePropertyspanner.spanner.gcs.input.sql.numPartitions="NUM_PARTITIONS"`--templatePropertyspanner.gcs.temp.table="TEMP_VIEW"`--templatePropertyspanner.gcs.temp.query="TEMP_QUERY"`--templatePropertyspanner.jdbc.dialect="SPANNER_JDBC_DIALECT"
Windows (cmd.exe)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running eithergcloud init; orgcloud auth login andgcloud config set project.gclouddataprocbatchessubmitspark^--class=com.google.cloud.dataproc.templates.main.DataProcTemplate^--version="1.2"^--project="PROJECT_ID"^--region="REGION"^--jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar"^--subnet="SUBNET"^--kms-key="KMS_KEY"^--service-account="SERVICE_ACCOUNT"^--properties="PROPERTY=PROPERTY_VALUE"^--labels="LABEL=LABEL_VALUE"^----template=SPANNERTOGCS^--templatePropertylog.level="LOG_LEVEL"^--templatePropertyproject.id="PROJECT_ID"^--templatePropertyspanner.gcs.input.spanner.id="INSTANCE"^--templatePropertyspanner.gcs.input.database.id="DATABASE"^--templatePropertyspanner.gcs.input.table.id="TABLE"^--templatePropertyspanner.gcs.output.gcs.path="CLOUD_STORAGE_OUTPUT_PATH"^--templatePropertyspanner.gcs.output.gcs.saveMode="MODE"^--templatePropertyspanner.gcs.output.gcs.format="FORMAT"^--templatePropertyspanner.gcs.input.sql.partitionColumn="INPUT_PARTITION_COLUMN"^--templatePropertyspanner.gcs.input.sql.lowerBound="LOWERBOUND"^--templatePropertyspanner.gcs.input.sql.upperBound="UPPERBOUND"^--templatePropertyspanner.spanner.gcs.input.sql.numPartitions="NUM_PARTITIONS"^--templatePropertyspanner.gcs.temp.table="TEMP_VIEW"^--templatePropertyspanner.gcs.temp.query="TEMP_QUERY"^--templatePropertyspanner.jdbc.dialect="SPANNER_JDBC_DIALECT"
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Required. Your Google Cloud project ID listed in theIAM Settings.
- REGION: Required.Compute Engine region.
- SUBNET: Optional. If a subnet is not specified, the subnet in the specified REGION in the
defaultnetwork is selected.Example:
projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME - TEMPLATE_VERSION: Required. Specify
latestfor the latest template version, or the date of a specific version, for example,2023-03-17_v0.1.0-beta(visitgs://dataproc-templates-binaries or rungcloud storage ls gs://dataproc-templates-binariesto list available template versions). - INSTANCE: Required. Spanner instance ID.
- DATABASE: Required. Spanner database ID.
- TABLE: Required. Spanner input table name or a SQL query on the Spanner input table.
Example (the SQL query should be within parentheses):
(select * fromTABLE) - SPANNER_JDBC_DIALECT: Required. Spanner JDBC dialect. Options:
googlesqlorpostgresql. Defaults togooglesql. - CLOUD_STORAGE_OUTPUT_PATH: Required. Cloud Storage path where output will be stored.
Example:
gs://example-bucket/example-folder/ - FORMAT: Required. Output data format. Options:
avro,parquet,csv, orjson.Note: Ifavro, you must add "file:///usr/lib/spark/connector/spark-avro.jar" to thejarsgcloud CLI flag or API field.Example (the
file://prefix references a Serverless for Apache Spark jar file):--jars=file:///usr/lib/spark/connector/spark-avro.jar,[ ... other jars] - MODE: Required. Write mode for Cloud Storage output. Options:
append,overwrite,ignore, orerrorifexists. - NUM_PARTITIONS: Optional. The maximum number of partitions that can be used for parallelism of table reads and writes.
- INPUT_PARTITION_COLUMN,LOWERBOUND,UPPERBOUND: Optional. If used, all of the following parameters must be specified:
- INPUT_PARTITION_COLUMN: Spanner input table partition column name.
- LOWERBOUND: Spanner input table partition column lower bound used to determine the partition stride.
- UPPERBOUND: Spanner input table partition column upper bound used to decide the partition stride.
- TEMP_VIEW andTEMP_QUERY: Optional. You can use these two optional parameters to apply a Spark SQL transformation while loading data into Cloud Storage. TEMP_VIEW must be the same as table name used in query, and TEMP_QUERY is the query statement.
- SERVICE_ACCOUNT: Optional. If not provided, thedefault Compute Engine service account is used.
- PROPERTY andPROPERTY_VALUE: Optional. Comma-separated list ofSpark property=
valuepairs. - LABEL andLABEL_VALUE: Optional. Comma-separated list of
label=valuepairs. - LOG_LEVEL: Optional. Level of logging. Can be one of
ALL,DEBUG,ERROR,FATAL,INFO,OFF,TRACE, orWARN. Default:INFO. - KMS_KEY: Optional. The Cloud Key Management Service key to use for encryption. If a key is not specified, data isencrypted at rest using a Google-owned and Google-managed encryption key.
Example:
projects/PROJECT_ID/regions/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
HTTP method and URL:
POST https://dataproc.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/batches
Request JSON body:
{ "environmentConfig":{ "executionConfig":{ "subnetworkUri":"SUBNET", "kmsKey": "KMS_KEY", "serviceAccount": "SERVICE_ACCOUNT" } }, "labels": { "LABEL": "LABEL_VALUE" }, "runtimeConfig": { "version": "1.2", "properties": { "PROPERTY": "PROPERTY_VALUE" } }, "sparkBatch":{ "mainClass":"com.google.cloud.dataproc.templates.main.DataProcTemplate", "args":[ "--template","SPANNERTOGCS", "--templateProperty","log.level=LOG_LEVEL", "--templateProperty","project.id=PROJECT_ID", "--templateProperty","spanner.gcs.input.spanner.id=INSTANCE", "--templateProperty","spanner.gcs.input.database.id=DATABASE", "--templateProperty","spanner.gcs.input.table.id=TABLE", "--templateProperty","spanner.gcs.output.gcs.path=CLOUD_STORAGE_OUTPUT_PATH", "--templateProperty","spanner.gcs.output.gcs.saveMode=MODE", "--templateProperty","spanner.gcs.output.gcs.format=FORMAT", "--templateProperty","spanner.gcs.input.sql.partitionColumn=INPUT_PARTITION_COLUMN", "--templateProperty","spanner.gcs.input.sql.lowerBound=LOWERBOUND", "--templateProperty","spanner.gcs.input.sql.upperBound=UPPERBOUND", "--templateProperty","spanner.gcs.input.sql.numPartitions=NUM_PARTITIONS", "--templateProperty","spanner.gcs.temp.table=TEMP_VIEW", "--templateProperty","spanner.gcs.temp.query=TEMP_QUERY", "--templateProperty spanner.jdbc.dialect=SPANNER_JDBC_DIALECT" ], "jarFileUris":[ "file:///usr/lib/spark/connector/spark-avro.jar", "gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar" ] }}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to thegcloud CLI with your user account by runninggcloud init orgcloud auth login , or by usingCloud Shell, which automatically logs you into thegcloud CLI . You can check the currently active account by runninggcloud auth list. Save the request body in a file namedrequest.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://dataproc.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/batches"
PowerShell (Windows)
Note: The following command assumes that you have logged in to thegcloud CLI with your user account by runninggcloud init orgcloud auth login . You can check the currently active account by runninggcloud auth list. Save the request body in a file namedrequest.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/batches" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/regions/REGION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.dataproc.v1.BatchOperationMetadata", "batch": "projects/PROJECT_ID/locations/REGION/batches/BATCH_ID", "batchUuid": "de8af8d4-3599-4a7c-915c-798201ed1583", "createTime": "2023-02-24T03:31:03.440329Z", "operationType": "BATCH", "description": "Batch" }}Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.