Export databases from Spanner to Avro
This page describes how to export Spanner databases with the Google Cloud console.

To export a Spanner database using the REST API or the Google Cloud CLI, complete the steps in the Before you begin section on this page, and then see the detailed instructions in Spanner to Cloud Storage Avro in the Dataflow documentation.

The export process uses Dataflow and writes data to a folder in a Cloud Storage bucket. The resulting folder contains a set of Avro files and JSON manifest files.
Before you begin
To export a Spanner database, you first need to enable the Spanner, Cloud Storage, Compute Engine, and Dataflow APIs:
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
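If you prefer the gcloud CLI to the console, a minimal sketch of enabling these APIs, assuming your active project is already set (for example with gcloud config set project):

    # Enable the APIs that Spanner export jobs depend on.
    gcloud services enable \
        spanner.googleapis.com \
        storage.googleapis.com \
        compute.googleapis.com \
        dataflow.googleapis.com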
You also need enough quota and the required IAM permissions.
Quota requirements
The quota requirements for export jobs are as follows:
- Spanner: No additional compute capacity is required to export a database, though you might need to add more compute capacity so that your job finishes in a reasonable amount of time. See Optimize jobs for more details.
- Cloud Storage: To export, you must create a bucket for your exported files if you don't already have one. You can do this in the Google Cloud console, either through the Cloud Storage page or while creating your export through the Spanner page, or with the gcloud CLI, as shown in the sketch following these quota requirements. You don't need to set a size for your bucket.
- Dataflow: Export jobs are subject to the same CPU, disk usage, and IP address Compute Engine quotas as other Dataflow jobs.
- Compute Engine: Before running your export job, you must set up initial quotas for Compute Engine, which Dataflow uses. These quotas represent the maximum number of resources that you allow Dataflow to use for your job. Recommended starting values are:
- CPUs: 200
- In-use IP addresses: 200
- Standard persistent disk: 50 TB
Generally, you don't have to make any other adjustments. Dataflow provides autoscaling so that you only pay for the actual resources used during the export. If your job can make use of more resources, the Dataflow UI displays a warning icon. The job should finish even if there is a warning icon.
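If you choose to create the Cloud Storage bucket from the command line instead of the console, here is a minimal sketch; the bucket name and location are placeholders, and you would normally pick a location that matches the region where you plan to run the export job:

    # Create a Cloud Storage bucket to hold the exported Avro files.
    gcloud storage buckets create gs://my-export-bucket \
        --location=us-central1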
Required roles
To get the permissions that you need to export a database, ask your administrator to grant you the following IAM roles on your Dataflow worker service account (a sample grant command follows this list):

- Cloud Spanner Viewer (roles/spanner.viewer)
- Dataflow Worker (roles/dataflow.worker)
- Storage Admin (roles/storage.admin)
- Spanner Database Reader (roles/spanner.databaseReader)
- Database Admin (roles/spanner.databaseAdmin)
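A hedged sketch of granting one of these roles with the gcloud CLI; PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders for your own values, and you would repeat the command for each role in the list:

    # Grant the Dataflow Worker role to the Dataflow worker service account.
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
        --role="roles/dataflow.worker"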
To use the independent compute resources of Spanner Data Boost during an export, you also need the spanner.databases.useDataBoost IAM permission. For more information, see Data Boost overview.
Export a database
After you satisfy the quota and IAM requirements described previously, you can export an existing Spanner database.

Note: It is safe to export a database that is in use. However, any changes you make after Dataflow creates a Spanner transaction aren't captured by the export, because all of the exported data is from the same database snapshot.

To export your Spanner database to a Cloud Storage bucket, follow these steps:
Go to the Spanner Instances page.
Click the name of the instance that contains your database.
Click the Import/Export menu item in the left pane and then click the Export button.

Under Choose where to store your export, click Browse.
If you don't already have a Cloud Storage bucket for your export:

- Click New bucket.
- Enter a name for your bucket. Bucket names must be unique across Cloud Storage.
- Select a default storage class and location, then click Create.
- Click your bucket to select it.

If you already have a bucket, either select the bucket from the initial list or click Search to filter the list, then click your bucket to select it.

Click Select.
Select the database that you want to export in the Choose a database to export drop-down menu.

Optional: To export your database from an earlier point in time, check the box and enter a timestamp.

Select a region in the Choose a region for the export job drop-down menu.

Note: To avoid data transfer charges, choose a region that overlaps with your instance's configuration. For more information, see Choose a region.

Optional: To encrypt the Dataflow pipeline state with a customer-managed encryption key:
- Click Show encryption options.
- Select Use a customer-managed encryption key (CMEK).
- Select your key from the drop-down list.
This option does not affect the destination Cloud Storage bucket-level encryption. To enable CMEK for your Cloud Storage bucket, refer to Use CMEK with Cloud Storage.

Optional: To export using Spanner Data Boost, select the Use Spanner Data Boost checkbox. For more information, see Data Boost overview.

Select the checkbox under Confirm charges to acknowledge that there are charges in addition to those incurred by your existing Spanner instance.

Click Export.
The Google Cloud console displays the Database Import/Export page, which now shows a line item for your export job in the Import/Export jobs list, including the job's elapsed time:

When the job finishes or terminates, the status is updated in the Import/Export list. If the job succeeded, the status Succeeded is displayed:

If the job failed, the status Failed is displayed:

To view the details of the Dataflow operation for your job, click the job's name in the Dataflow job name column.

If your job fails, check the job's Dataflow logs for error details.

To avoid Cloud Storage charges for files your failed export job created, delete the folder and its files. See View your export for information on how to find the folder.
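A hedged sketch of deleting such a folder with the gcloud CLI; the bucket and folder names here are placeholders for the ones your failed job actually created:

    # Recursively delete the folder written by the failed export job.
    gcloud storage rm --recursive \
        gs://my-export-bucket/my-instance-example-db-2024-01-01_00-00-00/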
A note on exporting generated columns and change streams
The values in a stored generated column are not exported. The column definition is exported to the Avro schema as a null-type record field, with the column definition as custom properties of the field. Until the backfill operation of a newly added generated column completes, the generated column is ignored as if it doesn't exist in the schema.

Change streams exported as Avro files contain only the schema of the change streams, and not any data change records.
A note on exporting sequences
Sequences (GoogleSQL, PostgreSQL) are schema objects that you use to generate unique integer values. Spanner exports each of these schema objects to the Avro schema as a record field, with its sequence kind, skipped range, and counter as properties of the field. To prevent a sequence from being reset and generating duplicate values after import, the GET_INTERNAL_SEQUENCE_STATE() (GoogleSQL, PostgreSQL) function captures the sequence counter during schema export. Spanner adds a buffer of 1000 to the counter and writes the new counter value to the record field. This approach avoids duplicate value errors that might happen after import. If there are more writes to the source database during data export, you should adjust the actual sequence counter by using the ALTER SEQUENCE (GoogleSQL, PostgreSQL) statement.
At import, the sequence starts from this new counter instead of the counter found in the schema. Alternatively, you can use the ALTER SEQUENCE (GoogleSQL, PostgreSQL) statement to update the sequence with a new counter.
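If you do need to advance the counter after import, here is a hedged sketch of issuing the statement with the gcloud CLI. The instance, database, and sequence names and the counter value are placeholders, and the option name start_with_counter is an assumption for the GoogleSQL dialect; check the ALTER SEQUENCE reference for your dialect before running it:

    # Advance the sequence counter past any values written during the export
    # (MySequence, 6000, and start_with_counter are assumptions for this sketch).
    gcloud spanner databases ddl update example-db \
        --instance=test-instance \
        --ddl='ALTER SEQUENCE MySequence SET OPTIONS (start_with_counter = 6000)'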
View your export in Cloud Storage
To view the folder that contains your exported database in the Google Cloud console, navigate to the Cloud Storage browser and choose the bucket you selected previously:

The bucket now contains a folder with the exported database inside. The folder name begins with your instance's ID, database name, and the timestamp of your export job. The folder contains the following (you can also list them with the gcloud command shown after this list):
- A spanner-export.json file.
- A TableName-manifest.json file for each table in the database you exported.
- One or more TableName.avro-#####-of-##### files. The first number in the extension .avro-#####-of-##### represents the index of the Avro file, starting at zero, and the second represents the number of Avro files generated for each table. For example, Songs.avro-00001-of-00002 is the second of two files that contain the data for the Songs table.
- A ChangeStreamName-manifest.json file for each change stream in the database you exported.
- A ChangeStreamName.avro-00000-of-00001 file for each change stream. This file contains empty data with only the Avro schema of the change stream.
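A minimal sketch of listing the exported files from the command line; the bucket name and folder prefix are placeholders for the ones your job created:

    # List the contents of the export folder.
    gcloud storage ls gs://my-export-bucket/my-instance-example-db-2024-01-01_00-00-00/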
Choose a region for your export job

You might want to choose a different region based on the location of your Cloud Storage bucket. To avoid outbound data transfer charges, choose a region that matches your Cloud Storage bucket's location (you can check the location with the gcloud command after this list).

- If your Cloud Storage bucket location is a region, you can take advantage of free network usage by choosing the same region for your export job, assuming that region is available.
- If your Cloud Storage bucket location is a dual-region, you can take advantage of free network usage by choosing one of the two regions that make up the dual-region for your export job, assuming one of the regions is available.
- If a co-located region is not available for your export job, or if your Cloud Storage bucket location is a multi-region, outbound data transfer charges apply. Refer to Cloud Storage data transfer pricing to choose a region that incurs the lowest data transfer charges.
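A quick, hedged way to check your bucket's location from the command line; the bucket name is a placeholder, and the location field name in the describe output is an assumption:

    # Print the location of the destination bucket so you can pick a matching region.
    gcloud storage buckets describe gs://my-export-bucket \
        --format="value(location)"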
Export a subset of tables
If you want to export only the data from certain tables, and not the entire database, then you can specify those tables during export. In this case, Spanner exports the database's entire schema, includes the data of the tables you specify, and leaves all other tables present but empty in the exported files.
You can specify a subset of tables to export using either the Dataflow page in the Google Cloud console or the gcloud CLI. (The Spanner page doesn't provide this action.)

If you export the data of a table that is the child of another table, then you should export its parent table's data as well. If parents are not exported, then the export job fails.

To export a subset of tables, start the export using Dataflow's Spanner to Cloud Storage Avro template, and specify the tables using either the Dataflow page in the Google Cloud console or the gcloud CLI, as described:
Console
If you are using the Dataflow page in the Google Cloud console, the Cloud Spanner Table name(s) parameter is located in the Optional parameters section of the Create job from template page. Multiple tables can be specified in a comma-separated format.
gcloud
Run the gcloud dataflow jobs run command, and specify the tableNames argument. For example:
gcloud dataflow jobs run my-export-job \
    --gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
    --region=us-central1 \
    --parameters='instanceId=test-instance,databaseId=example-db,tableNames=table1,outputDir=gs://my-gcs-bucket' \
    --max-workers=10 \
    --network=network-123

Specifying multiple tables in gcloud requires dictionary-type argument escaping. The following example uses '|' as the escape character:
gcloud dataflow jobs run my-export-job \
    --gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
    --region=us-central1 \
    --parameters='^|^instanceId=test-instance|databaseId=example-db|tableNames=table1,table2|outputDir=gs://my-gcs-bucket' \
    --max-workers=10 \
    --network=network-123

The shouldExportRelatedTables parameter is a convenient option to automatically export all parent tables of the chosen tables. For example, in this schema hierarchy with tables Singers, Albums, and Songs, you only need to specify Songs. The shouldExportRelatedTables option will also export Singers and Albums because Songs is a descendant of both.
gcloud dataflow jobs run my-export-job \
    --gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
    --region=us-central1 \
    --parameters='instanceId=test-instance,databaseId=example-db,tableNames=Songs,shouldExportRelatedTables=true,outputDir=gs://my-gcs-bucket' \
    --max-workers=10 \
    --network=network-123

View or troubleshoot jobs in the Dataflow UI
After you start an export job, you can view details of the job, including logs, in the Dataflow section of the Google Cloud console.
View Dataflow job details
To see details for any import or export jobs that you ran within the last week, including any jobs that are running now:

- Navigate to the Database overview page for the database.
- Click the Import/Export left pane menu item. The database Import/Export page displays a list of recent jobs.
- In the database Import/Export page, click the job name in the Dataflow job name column:

The Google Cloud console displays details of the Dataflow job.
To view a job that you ran more than one week ago:
Go to the Dataflow jobs page in the Google Cloud console.
Find your job in the list, then click its name.
The Google Cloud console displays details of the Dataflow job.
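You can also list recent Dataflow jobs from the gcloud CLI; a minimal sketch, where the region is a placeholder for the one you ran the job in:

    # List Dataflow jobs, including import and export jobs, in a region.
    gcloud dataflow jobs list \
        --region=us-central1 \
        --status=all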
View Dataflow logs for your job
To view a Dataflow job's logs, navigate to the job's details page, then click Logs to the right of the job's name.

If a job fails, look for errors in the logs. If there are errors, the error count displays next to Logs:

To view job errors:
Click the error count next to Logs.

The Google Cloud console displays the job's logs. You may need to scroll to see the errors.

Locate entries with the error icon.

Click an individual log entry to expand its contents.
For more information about troubleshooting Dataflow jobs, see Troubleshoot your pipeline.
Troubleshoot failed export jobs
If you see the following errors in your job logs:
com.google.cloud.spanner.SpannerException: NOT_FOUND: Session not found

--or--

com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: Deadline expired before operation could complete.
Check the 99% Read latency in the Monitoring tab of your Spanner database in the Google Cloud console. If it shows high (multiple second) values, the instance is overloaded, causing reads to time out and fail.

One cause of high latency is that the Dataflow job is running with too many workers, putting too much load on the Spanner instance.
To specify a limit on the number of Dataflow workers, instead of using the Import/Export tab in the instance details page of your Spanner database in the Google Cloud console, you must start the export using the Dataflow Spanner to Cloud Storage Avro template and specify the maximum number of workers, as described:

Console

If you are using the Dataflow console, the Max workers parameter is located in the Optional parameters section of the Create job from template page.
gcloud
Run the gcloud dataflow jobs run command, and specify the max-workers argument. For example:
gcloud dataflow jobs run my-export-job \
    --gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
    --region=us-central1 \
    --parameters='instanceId=test-instance,databaseId=example-db,outputDir=gs://my-gcs-bucket' \
    --max-workers=10 \
    --network=network-123

Troubleshoot network error
The following error might occur when you export your Spanner databases:

Workflow failed. Causes: Error: Message: Invalid value for field 'resource.properties.networkInterfaces[0].subnetwork': ''. Network interface must specify a subnet if the network resource is in custom subnet mode. HTTP Code: 400
This error occurs because Spanner assumes that you intend to use an auto mode VPC network named default in the same project as the Dataflow job. If you don't have a default VPC network in the project, or if your VPC network is a custom mode VPC network, then you must create a Dataflow job and specify an alternate network or subnetwork.
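A hedged sketch of starting the export with an explicit subnetwork. The subnetwork path, instance, database, and bucket names are placeholders, and the short regions/REGION/subnetworks/NAME form is an assumption (a full Compute Engine URL for the subnetwork also works):

    # Run the export template with an explicit subnetwork instead of the default network.
    gcloud dataflow jobs run my-export-job \
        --gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
        --region=us-central1 \
        --subnetwork=regions/us-central1/subnetworks/my-subnet \
        --parameters='instanceId=test-instance,databaseId=example-db,outputDir=gs://my-gcs-bucket' \
        --max-workers=10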
Optimize slow running export jobs
If you have followed the suggestions in initial settings, you should generally not have to make any other adjustments. If your job is running slowly, there are a few other optimizations you can try:

Optimize the job and data location: Run your Dataflow job in the same region where your Spanner instance and Cloud Storage bucket are located.

Ensure sufficient Dataflow resources: If the relevant Compute Engine quotas limit your Dataflow job's resources, the job's Dataflow page in the Google Cloud console displays a warning icon and log messages:

In this situation, increasing the quotas for CPUs, in-use IP addresses, and standard persistent disk might shorten the run time of the job, but you might incur more Compute Engine charges.

Check the Spanner CPU utilization: If you see that the CPU utilization for the instance is over 65%, you can increase the compute capacity in that instance. The added capacity provides more Spanner resources and the job should speed up, but you incur more Spanner charges. A sample capacity-increase command follows.
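A hedged sketch of adding compute capacity with the gcloud CLI; the instance name and the target of 2000 processing units are placeholders for your own values (you can also size the instance with --nodes instead):

    # Increase the instance's compute capacity.
    gcloud spanner instances update test-instance \
        --processing-units=2000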
Factors affecting export job performance
Several factors influence the time it takes to complete an export job.
Spanner database size: Processing more data takes more time and resources.
Spanner database schema, including:
- The number of tables
- The size of the rows
- The number of secondary indexes
- The number of foreign keys
- The number of change streams
Data location: Data is transferred between Spanner and Cloud Storage using Dataflow. Ideally, all three components are located in the same region. If the components are not in the same region, moving the data across regions slows the job down.

Number of Dataflow workers: An optimal number of Dataflow workers is necessary for good performance. By using autoscaling, Dataflow chooses the number of workers for the job depending on the amount of work that needs to be done. The number of workers is, however, capped by the quotas for CPUs, in-use IP addresses, and standard persistent disk. The Dataflow UI displays a warning icon if it encounters quota caps. In this situation, progress is slower, but the job should still complete.
Existing load on Spanner: An export job typically adds a light load on a Spanner instance. If the instance already has a substantial existing load, then the job runs more slowly.

Amount of Spanner compute capacity: If the CPU utilization for the instance is over 65%, then the job runs more slowly.