Import and export data in CSV format
This page describes how to export data from Spanner to CSV files or import data from CSV files into Spanner GoogleSQL-dialect databases or PostgreSQL-dialect databases.
- If you want to import a Spanner database that you previously exported to Avro files in Cloud Storage, see Import Spanner Avro files.
- If you want to import Avro files from a non-Spanner database, see Import data from non-Spanner databases.
The process uses Dataflow. You can export data from Spanner to a Cloud Storage bucket, or you can import data into Spanner from a Cloud Storage bucket that contains a JSON manifest file and a set of CSV files.
Note: To explore Spanner using a 90-day free trial instance, see Create a Spanner free trial instance.

Before you begin
To import or export a Spanner database, first you need to enable the Spanner, Cloud Storage, Compute Engine, and Dataflow APIs:
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
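For example, you can enable all four APIs with a single gcloud command. This is a minimal sketch; it assumes the gcloud CLI is already authenticated and pointed at the project that will run the Dataflow job:

```
# Enable the APIs that Spanner import and export jobs depend on.
gcloud services enable \
    spanner.googleapis.com \
    storage.googleapis.com \
    compute.googleapis.com \
    dataflow.googleapis.com
```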
You also need enough quota and the required IAM permissions.
Quota requirements
The quota requirements for import or export jobs are as follows:
- Spanner: You must have enough compute capacity to support the amount of data that you are importing. No additional compute capacity is required to import or export a database, though you might need to add more compute capacity so that your job finishes in a reasonable amount of time. See Optimize jobs for more details.
- Cloud Storage: To import, you must have a bucket containing your previously exported files. To export, you must create a bucket for your exported files if you don't already have one. You can do this in the Google Cloud console, either through the Cloud Storage page or while creating your export through the Spanner page. You don't need to set a size for your bucket.
- Dataflow: Import or export jobs are subject to the same CPU, disk usage, and IP address Compute Engine quotas as other Dataflow jobs.
- Compute Engine: Before running your import or export job, you must set up initial quotas for Compute Engine, which Dataflow uses. These quotas represent the maximum number of resources that you allow Dataflow to use for your job. Recommended starting values are:
- CPUs: 200
- In-use IP addresses: 200
- Standard persistent disk: 50 TB
Generally, you don't have to make any other adjustments. Dataflow provides autoscaling so that you only pay for the actual resources used during the import or export. If your job can make use of more resources, the Dataflow UI displays a warning icon. The job should finish even if there is a warning icon.
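To see the current Compute Engine quota limits and usage for a region before you start a job, you can describe the region with gcloud. This is a sketch, and the region name is only an example:

```
# The output includes a quotas section listing the limit and current usage
# for metrics such as CPUS, IN_USE_ADDRESSES, and DISKS_TOTAL_GB.
gcloud compute regions describe us-central1
```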
Required roles
To get the permissions that you need to export a database, ask your administrator to grant you the following IAM roles on your Dataflow worker service account:
- Cloud Spanner Viewer (roles/spanner.viewer)
- Dataflow Worker (roles/dataflow.worker)
- Storage Admin (roles/storage.admin)
- Spanner Database Reader (roles/spanner.databaseReader)
- Database Admin (roles/spanner.databaseAdmin)
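As a sketch, granting one of these roles to the worker service account looks like the following; the project ID and service account email are placeholders for your own values:

```
# Grant the Dataflow Worker role to the Dataflow worker service account.
# Repeat the command for each of the roles listed above.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/dataflow.worker"
```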
Export Spanner data to CSV files
To export data from Spanner to CSV files in Cloud Storage, follow the instructions for using the Google Cloud CLI to run a job with the Spanner to Cloud Storage Text template.
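The following is a minimal sketch of such an export job. The template path and the parameter names (spannerProjectId, spannerInstanceId, spannerDatabaseId, spannerTable, textWritePrefix) are assumptions based on the Spanner to Cloud Storage Text template, and the project, instance, database, table, and bucket names are placeholders, so verify them against the template reference before running the command:

```
# Export one Spanner table to CSV files in Cloud Storage.
gcloud dataflow jobs run my-export-job \
    --gcs-location='gs://dataflow-templates/latest/Spanner_to_GCS_Text' \
    --region=us-central1 \
    --parameters='spannerProjectId=my-project,spannerInstanceId=test-instance,spannerDatabaseId=example-db,spannerTable=Singers,textWritePrefix=gs://my-gcs-bucket/Singers'
```

Choosing a --region that matches your Cloud Storage bucket's location helps you avoid outbound data transfer charges, as described later on this page.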
You can also refer to the information in this page about optimizing slow jobs and factors affecting job performance.
Note: Neither change stream data nor the values in stored generated columns are exported.

Import data from CSV files into Spanner
The process to import data from CSV files includes the following steps:
- Export your data to CSV files and store those files in Cloud Storage. Don't include a header line.
- Create a JSON manifest file and store the file along with your CSV files.
- Create empty target tables in your Spanner database or ensure that the data types for columns in your CSV files match any corresponding columns in your existing tables.
- Run your import job.
Step 1: Export data from a non-Spanner database to CSV files
The import process brings data in from CSV files located in a Cloud Storage bucket. You can export data in CSV format from any source.
Keep the following things in mind when exporting your data:
- Text files to be imported must be in CSV format.
- Data must match one of the following types:

  GoogleSQL:
  - BOOL
  - INT64
  - FLOAT64
  - NUMERIC
  - STRING
  - DATE
  - TIMESTAMP
  - BYTES
  - JSON

  PostgreSQL:
  - boolean
  - bigint
  - double precision
  - numeric
  - character varying, text
  - date
  - timestamp with time zone
  - bytea

- You don't have to include or generate any metadata when you export the CSV files.
- You don't have to follow any particular naming convention for your files.
- If you don't export your files directly to Cloud Storage, you must upload the CSV files to a Cloud Storage bucket, as sketched after this list.
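A minimal sketch of that upload step with the gcloud CLI, using the bucket and file names from the manifest example in step 2:

```
# Copy locally exported CSV files into the Cloud Storage bucket.
gcloud storage cp Albums_1.csv Albums_2.csv Singers_1.csv gs://bucket1/
```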
Step 2: Create a JSON manifest file
You must also create a manifest file with a JSON description of files to import and place it in the same Cloud Storage bucket where you stored your CSV files. This manifest file contains a tables array that lists the name and data file locations for each table. The file also specifies the receiving database dialect. If the dialect is omitted, it defaults to GoogleSQL.
The format of the manifest file corresponds to the following message type, shown here in protocol buffer format:
```
message ImportManifest {
  // The per-table import manifest.
  message TableManifest {
    // Required. The name of the destination table.
    string table_name = 1;

    // Required. The CSV files to import. This value can be either a file path
    // or a glob pattern.
    repeated string file_patterns = 2;

    // The schema for a table column.
    message Column {
      // Required for each Column that you specify. The name of the column in
      // the destination table.
      string column_name = 1;

      // Required for each Column that you specify. The type of the column.
      string type_name = 2;
    }

    // Optional. The schema for the table columns.
    repeated Column columns = 3;
  }

  // Required. The TableManifest of the tables to be imported.
  repeated TableManifest tables = 1;

  enum ProtoDialect {
    GOOGLE_STANDARD_SQL = 0;
    POSTGRESQL = 1;
  }

  // Optional. The dialect of the receiving database. Defaults to GOOGLE_STANDARD_SQL.
  ProtoDialect dialect = 2;
}
```

The following example shows a manifest file for importing tables called Albums and Singers into a GoogleSQL-dialect database. The Albums table uses the column schema that the job retrieves from the database, and the Singers table uses the schema that the manifest file specifies:
{"tables":[{"table_name":"Albums","file_patterns":["gs://bucket1/Albums_1.csv","gs://bucket1/Albums_2.csv"]},{"table_name":"Singers","file_patterns":["gs://bucket1/Singers*.csv"],"columns":[{"column_name":"SingerId","type_name":"INT64"},{"column_name":"FirstName","type_name":"STRING"},{"column_name":"LastName","type_name":"STRING"}]}]}Step 3: Create the table for your Spanner database
Before you run your import, you must create the target tables in your Spanner database. If the target Spanner table already has a schema, any columns specified in the manifest file must have the same data types as the corresponding columns in the target table's schema.
We recommend that you create secondary indexes, foreign keys, and change streams after you import your data into Spanner, not when you initially create the table. If your table already contains these structures, then we recommend dropping them and re-creating them after you import your data.
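As a sketch, you can create an empty target table with the gcloud CLI. The instance and database names match the examples used elsewhere on this page, and the column definitions mirror the Singers columns from the manifest example; adjust them to your own schema:

```
# Create the empty Singers table in a GoogleSQL-dialect database.
gcloud spanner databases ddl update example-db \
    --instance=test-instance \
    --ddl='CREATE TABLE Singers (
      SingerId INT64 NOT NULL,
      FirstName STRING(MAX),
      LastName STRING(MAX)
    ) PRIMARY KEY (SingerId)'
```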
Step 4: Run a Dataflow import job using gcloud
To start your import job, follow the instructions for using the Google Cloud CLI to run a job with the Cloud Storage Text to Spanner template.
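A minimal sketch of the invocation, modeled on the troubleshooting example later on this page; the instance, database, and bucket names are placeholders, and the bucket is assumed to hold the manifest and CSV files from the previous steps:

```
# Run the Cloud Storage Text to Spanner import template.
gcloud dataflow jobs run my-import-job \
    --gcs-location='gs://dataflow-templates/latest/GCS_Text_to_Cloud_Spanner' \
    --region=us-central1 \
    --parameters='instanceId=test-instance,databaseId=example-db,inputDir=gs://my-gcs-bucket'
```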
After you have started an import job, you can see details about the job in the Google Cloud console.
After the import job is finished, add any necessary secondary indexes, foreign keys, and change streams.
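For example, a sketch of adding a secondary index with the gcloud CLI once the import completes; the index name is only an example:

```
# Add a secondary index after the data has been imported.
gcloud spanner databases ddl update example-db \
    --instance=test-instance \
    --ddl='CREATE INDEX SingersByLastName ON Singers(LastName)'
```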
Note: To avoid outbound data transfer charges, choose a region that overlaps with your Cloud Storage bucket's location.

Choose a region for your import job
You might want to choose a different region based on the location of your Cloud Storage bucket. To avoid outbound data transfer charges, choose a region that matches your Cloud Storage bucket's location.
- If your Cloud Storage bucket location is a region, you can take advantage of free network usage by choosing the same region for your import job, assuming that region is available.
- If your Cloud Storage bucket location is a dual-region, you can take advantage of free network usage by choosing one of the two regions that make up the dual-region for your import job, assuming one of the regions is available.
- If a co-located region is not available for your import job, or if your Cloud Storage bucket location is a multi-region, outbound data transfer charges apply. Refer to Cloud Storage data transfer pricing to choose a region that incurs the lowest data transfer charges.
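If you are unsure where your bucket is located, you can check it with the gcloud CLI. This is a sketch, and the bucket name is a placeholder:

```
# Print the location (region, dual-region, or multi-region) of a bucket.
gcloud storage buckets describe gs://my-gcs-bucket --format="value(location)"
```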
View or troubleshoot jobs in the Dataflow UI
After you start an import or export job, you can view details of the job, including logs, in the Dataflow section of the Google Cloud console.
View Dataflow job details
To see details for any import or export jobs that you ran within the last week,including any jobs that are running now:
- Navigate to the Database overview page for the database.
- Click the Import/Export left pane menu item. The database Import/Export page displays a list of recent jobs.
- In the database Import/Export page, click the job name in the Dataflow job name column.

The Google Cloud console displays details of the Dataflow job.
To view a job that you ran more than one week ago:
Go to the Dataflow jobs page in the Google Cloud console.
Find your job in the list, then click its name.
The Google Cloud console displays details of the Dataflow job.
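You can also list and inspect past jobs from the command line. This is a sketch; the region is only an example, and JOB_ID is a placeholder for the ID shown in the job list:

```
# List recent Dataflow jobs in a region, then show details for one of them.
gcloud dataflow jobs list --region=us-central1
gcloud dataflow jobs describe JOB_ID --region=us-central1
```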
View Dataflow logs for your job
To view a Dataflow job's logs, navigate to the job's details page, then click Logs to the right of the job's name.
If a job fails, look for errors in the logs. If there are errors, the error count displays next to Logs.

To view job errors:
Click the error count next to Logs.
The Google Cloud console displays the job's logs. You may need to scroll to see the errors.
Locate entries with the error icon.
Click an individual log entry to expand its contents.
For more information about troubleshooting Dataflow jobs, see Troubleshoot your pipeline.
Troubleshoot failed import or export jobs
If you see the following errors in your job logs:
```
com.google.cloud.spanner.SpannerException: NOT_FOUND: Session not found
```

--or--

```
com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: Deadline expired before operation could complete.
```
Check the 99% Read/Write latency in the Monitoring tab of your Spanner database in the Google Cloud console. If it is showing high (multiple-second) values, it indicates that the instance is overloaded, causing reads and writes to time out and fail.
One cause of high latency is that the Dataflow job is running using too many workers, putting too much load on the Spanner instance.
To specify a limit on the number of Dataflow workers:

Console
If you are using the Dataflow console, the Max workers parameter is located in the Optional parameters section of the Create job from template page.
gcloud
Run the gcloud dataflow jobs run command, and specify the max-workers argument. For example:
```
gcloud dataflow jobs run my-import-job \
    --gcs-location='gs://dataflow-templates/latest/GCS_Text_to_Cloud_Spanner' \
    --region=us-central1 \
    --parameters='instanceId=test-instance,databaseId=example-db,inputDir=gs://my-gcs-bucket' \
    --max-workers=10 \
    --network=network-123
```

Troubleshoot network error
The following error might occur when you export your Spannerdatabases:
```
Workflow failed. Causes: Error: Message: Invalid value for field
'resource.properties.networkInterfaces[0].subnetwork': ''. Network interface
must specify a subnet if the network resource is in custom subnet mode.
HTTP Code: 400
```
This error occurs because Spanner assumes that you intend to use an auto mode VPC network named default in the same project as the Dataflow job. If you don't have a default VPC network in the project, or if your VPC network is a custom mode VPC network, then you must create a Dataflow job and specify an alternate network or subnetwork.
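As a sketch, you can pass a subnetwork explicitly when you run the template; the project, region, and subnet names are placeholders for your own values:

```
# Run the import template on a specific subnetwork of a custom mode VPC network.
gcloud dataflow jobs run my-import-job \
    --gcs-location='gs://dataflow-templates/latest/GCS_Text_to_Cloud_Spanner' \
    --region=us-central1 \
    --parameters='instanceId=test-instance,databaseId=example-db,inputDir=gs://my-gcs-bucket' \
    --subnetwork=https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1/subnetworks/my-subnet
```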
Optimize slow running import or export jobs
If you have followed the suggestions in initial settings, you should generally not have to make any other adjustments. If your job is running slowly, there are a few other optimizations you can try:
- Optimize the job and data location: Run your Dataflow job in the same region where your Spanner instance and Cloud Storage bucket are located.
- Ensure sufficient Dataflow resources: If the relevant Compute Engine quotas limit your Dataflow job's resources, the job's Dataflow page in the Google Cloud console displays a warning icon and log messages. In this situation, increasing the quotas for CPUs, in-use IP addresses, and standard persistent disk might shorten the run time of the job, but you might incur more Compute Engine charges.
- Check the Spanner CPU utilization: If you see that the CPU utilization for the instance is over 65%, you can increase the compute capacity in that instance, as sketched after this list. The capacity adds more Spanner resources and the job should speed up, but you incur more Spanner charges.
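For example, a sketch of adding compute capacity with the gcloud CLI; the instance name and node count are placeholders:

```
# Increase the compute capacity of the Spanner instance.
gcloud spanner instances update test-instance --nodes=10
```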
Factors affecting import or export job performance
Several factors influence the time it takes to complete an import or export job.
Spanner database size: Processing more data takes more time and resources.
Spanner database schema, including:
- The number of tables
- The size of the rows
- The number of secondary indexes
- The number of foreign keys
- The number of change streams
Data location: Data is transferred between Spanner and Cloud Storage using Dataflow. Ideally, all three components are located in the same region. If the components are not in the same region, moving the data across regions slows the job down.
Number of Dataflow workers: An optimal number of Dataflow workers is necessary for good performance. By using autoscaling, Dataflow chooses the number of workers for the job depending on the amount of work that needs to be done. The number of workers will, however, be capped by the quotas for CPUs, in-use IP addresses, and standard persistent disk. The Dataflow UI displays a warning icon if it encounters quota caps. In this situation, progress is slower, but the job should still complete. Autoscaling can overload Spanner, leading to errors when there is a large amount of data to import.
Existing load on Spanner: An import job adds significant CPU load on a Spanner instance. An export job typically adds a light load on a Spanner instance. If the instance already has a substantial existing load, then the job runs more slowly.
Amount of Spanner compute capacity: If the CPU utilization for the instance is over 65%, then the job runs more slowly.
Tune workers for good import performance
When you start a Spanner import job, the number of Dataflow workers must be set to an optimal value for good performance. Too many workers overload Spanner, and too few workers result in underwhelming import performance.
The maximum number of workers depends heavily on the data size, but ideally, the total Spanner CPU utilization should be between 70% and 90%. This provides a good balance between Spanner efficiency and error-free job completion.
To achieve that utilization target in the majority of schemas and scenarios, we recommend a maximum number of worker vCPUs between 4 and 6 times the number of Spanner nodes.
For example, for a 10-node Spanner instance using n1-standard-2 workers, you would set max workers to 25, giving 50 worker vCPUs.
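Applying that example, a sketch of the corresponding job invocation; the template parameters are the same placeholders as in the earlier import example:

```
# Cap the import job at 25 n1-standard-2 workers (50 worker vCPUs)
# for a 10-node Spanner instance.
gcloud dataflow jobs run my-import-job \
    --gcs-location='gs://dataflow-templates/latest/GCS_Text_to_Cloud_Spanner' \
    --region=us-central1 \
    --parameters='instanceId=test-instance,databaseId=example-db,inputDir=gs://my-gcs-bucket' \
    --max-workers=25 \
    --worker-machine-type=n1-standard-2
```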