Loading data from Datastore exports

BigQuery supports loading data from Datastore exports created using the Datastore managed import and export service. You can use the managed import and export service to export Datastore entities into a Cloud Storage bucket. You can then load the export into BigQuery as a table.

To learn how to create a Datastore export file, see Exporting and importing entities in the Datastore documentation. For information on scheduling exports, see Scheduling an export.

Note: If you intend to load a Datastore export into BigQuery, you must specify an entity filter in your export command. Data exported without specifying an entity filter cannot be loaded into BigQuery.
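For example, an export command with an entity filter might look like the following sketch; the bucket path and kind name are illustrative placeholders:

# Export only entities of the "Book" kind. Exports created without --kinds
# (no entity filter) cannot be loaded into BigQuery.
gcloud datastore export gs://mybucket/datastore-backups \
    --kinds="Book"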

You can control which properties BigQuery should load by setting the projectionFields property in the API or by using the --projection_fields flag in the bq command-line tool.

If you prefer to skip the loading process, you can query the export directly by setting it up as an external data source. For more information, see External data sources.

When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same region or multi-region as the Cloud Storage bucket.

Limitations

When you load data into BigQuery from a Datastore export, note the following restrictions:

  • You cannot use a wildcard in the Cloud Storage URI when you specify a Datastore export file.
  • You can specify only one Cloud Storage URI when loading data from Datastore exports.
  • You cannot append Datastore export data to an existing table with a defined schema.
  • For a Datastore export to load correctly, entities in the export data must share a consistent schema with fewer than 10,000 unique property names.
  • Data exported without specifying an entity filter cannot be loaded into BigQuery. The export request must include one or more kind names in the entity filter.
  • The maximum field size for Datastore exports is 64 KB. When you load a Datastore export, any field larger than 64 KB is truncated.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.

Permissions to load data into BigQuery

To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.tables.update
  • bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:

  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.admin (includes the bigquery.jobs.create permission)
  • roles/bigquery.user (includes the bigquery.jobs.create permission)
  • roles/bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
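For example, an administrator could grant these roles at the project level with the gcloud CLI; the project ID and user email below are placeholders:

# Grant table-level write access and the ability to run load jobs.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/bigquery.jobUser"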

Permissions to load data from Cloud Storage

To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.
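For example, an administrator could grant the role on the bucket with the gcloud CLI; the bucket name and user email are placeholders:

# Grant the Storage Admin role on the bucket that holds the Datastore export.
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member="user:USER_EMAIL" \
    --role="roles/storage.admin"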

This predefined role contains the permissions required to load data from a Cloud Storage bucket. The following permissions are required:

  • storage.buckets.get
  • storage.objects.get
  • storage.objects.list (required if you are using a URI wildcard)

You might also be able to get these permissions with custom roles or other predefined roles.

Loading Datastore export service data

To load data from a Datastore export metadata file:

Console

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the left pane, click Explorer.
  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.
  4. In the Dataset info section, click Create table.
  5. In the Create table pane, specify the following details:
    1. In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
      1. Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
        The URI for your Datastore export file must end with KIND_NAME.export_metadata or export[NUM].export_metadata. For example, in default_namespace_kind_Book.export_metadata, Book is the kind name, and default_namespace_kind_Book is the filename generated by Datastore.
      2. For File format, select Cloud Datastore Backup.
    2. In the Destination section, specify the following details:
      1. For Dataset, select the dataset in which you want to create the table.
      2. In the Table field, enter the name of the table that you want to create.
      3. Verify that the Table type field is set to Native table.
    3. In the Schema section, no action is necessary. The schema is inferred for a Datastore export.
    4. Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables.
    5. Click Advanced options and do the following:
      • For Write preference, leave Write if empty selected. This option creates a new table and loads your data into it.
      • If you want to ignore values in a row that are not present in the table's schema, then select Unknown values.
      • For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
      • For information about the available options, see Datastore options.
    6. Click Create table.

bq

Use the bq load command with source_format set to DATASTORE_BACKUP. Supply the --location flag and set the value to your location.

bq --location=LOCATION load \
    --source_format=FORMAT \
    DATASET.TABLE \
    PATH_TO_SOURCE

Replace the following:

  • LOCATION: your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location by using the .bigqueryrc file.
  • FORMAT: DATASTORE_BACKUP.
  • DATASET: the dataset that contains the table into which you're loading data.
  • TABLE: the table into which you're loading data. If the table does not exist, it is created.
  • PATH_TO_SOURCE: the Cloud Storage URI.

For example, the following command loads the gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata Datastore export file into a table named book_data. mybucket and mydataset were created in the US multi-region location.

bq --location=US load \
    --source_format=DATASTORE_BACKUP \
    mydataset.book_data \
    gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata

API

Set the following properties to load Datastore export data using the API.

  1. Create a load job that points to the source data in Cloud Storage.

  2. Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs must be fully qualified, in the format gs://[BUCKET]/[OBJECT]. The file (object) name must end in [KIND_NAME].export_metadata. Only one URI is allowed for Datastore exports, and you cannot use a wildcard.

  4. Specify the data format by setting the JobConfigurationLoad.sourceFormat property to DATASTORE_BACKUP.
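For example, a jobs.insert request made with curl might look like the following sketch. The project ID is a placeholder, and the dataset, table, and Cloud Storage URI reuse the earlier bq example:

# Write the load job configuration; PROJECT_ID is a placeholder.
cat > request.json <<'EOF'
{
  "jobReference": {
    "projectId": "PROJECT_ID",
    "location": "US"
  },
  "configuration": {
    "load": {
      "sourceUris": [
        "gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata"
      ],
      "sourceFormat": "DATASTORE_BACKUP",
      "destinationTable": {
        "projectId": "PROJECT_ID",
        "datasetId": "mydataset",
        "tableId": "book_data"
      }
    }
  }
}
EOF

# Submit the load job to the BigQuery jobs.insert endpoint.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/jobs"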

Appending to or overwriting a table with Datastore data

When you load Datastore export data into BigQuery, you can create a new table to store the data, or you can overwrite an existing table. You cannot append Datastore export data to an existing table.

If you attempt to append Datastore export data to an existing table, the following error results: Cannot append a datastore backup to a table that already has a schema. Try using the WRITE_TRUNCATE write disposition to replace the existing table.

To overwrite an existing table with Datastore export data:

Console

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the left pane, click Explorer.
  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.
  4. In the Dataset info section, click Create table.
  5. In the Create table pane, specify the following details:
    1. In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
      1. Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
        The URI for your Datastore export file must end with KIND_NAME.export_metadata or export[NUM].export_metadata. For example, in default_namespace_kind_Book.export_metadata, Book is the kind name, and default_namespace_kind_Book is the filename generated by Datastore.
      2. For File format, select Cloud Datastore Backup.
    2. In the Destination section, specify the following details:
      1. For Dataset, select the dataset in which you want to create the table.
      2. In the Table field, enter the name of the table that you want to create.
      3. Verify that the Table type field is set to Native table.
    3. In the Schema section, no action is necessary. The schema is inferred for a Datastore export.
      Note: It is possible to modify the table's schema when you append or overwrite it. For more information about supported schema changes during a load operation, see Modifying table schemas.
    4. Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables. You cannot convert a table to a partitioned or clustered table by appending or overwriting it. The Google Cloud console does not support appending to or overwriting partitioned or clustered tables in a load job.
    5. Click Advanced options and do the following:
      • For Write preference, choose Append to table or Overwrite table.
      • If you want to ignore values in a row that are not present in the table's schema, then select Unknown values.
      • For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
      • For information about the available options, see Datastore options.
    6. Click Create table.

bq

Use the bq load command with the --replace flag and with source_format set to DATASTORE_BACKUP. Supply the --location flag and set the value to your location.

bq --location=LOCATION load \
    --source_format=FORMAT \
    --replace \
    DATASET.TABLE \
    PATH_TO_SOURCE

Replace the following:

  • LOCATION: your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location by using the .bigqueryrc file.
  • FORMAT: DATASTORE_BACKUP.
  • DATASET: the dataset containing the table into which you're loading data.
  • TABLE: the table you're overwriting.
  • PATH_TO_SOURCE: the Cloud Storage URI.

For example, the following command loads the gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata Datastore export file and overwrites a table named book_data:

bq load --source_format=DATASTORE_BACKUP \
    --replace \
    mydataset.book_data \
    gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata

API

Set the following properties to overwrite a table with Datastore export data using the API.

  1. Create a load job that points to the source data in Cloud Storage.

  2. Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs must be fully qualified, in the format gs://[BUCKET]/[OBJECT]. The file (object) name must end in [KIND_NAME].export_metadata. Only one URI is allowed for Datastore exports, and you cannot use a wildcard.

  4. Specify the data format by setting the JobConfigurationLoad.sourceFormat property to DATASTORE_BACKUP.

  5. Specify the write disposition by setting the JobConfigurationLoad.writeDisposition property to WRITE_TRUNCATE.
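For example, the request body would be the same as the jobs.insert sketch in the earlier API section, with the writeDisposition field added to the load configuration; placeholders are as before:

# Load configuration that overwrites the existing table.
cat > request.json <<'EOF'
{
  "jobReference": {
    "projectId": "PROJECT_ID",
    "location": "US"
  },
  "configuration": {
    "load": {
      "sourceUris": [
        "gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata"
      ],
      "sourceFormat": "DATASTORE_BACKUP",
      "writeDisposition": "WRITE_TRUNCATE",
      "destinationTable": {
        "projectId": "PROJECT_ID",
        "datasetId": "mydataset",
        "tableId": "book_data"
      }
    }
  }
}
EOF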

Datastore options

To change how BigQuery parses Datastore export data, specify the following option:

  • Console option: Not available
  • bq tool flag: --projection_fields
  • BigQuery API property: projectionFields
  • Description: A comma-separated list that indicates which entity properties to load into BigQuery from a Datastore export. Property names are case-sensitive and must be top-level properties. If no properties are specified, BigQuery loads all properties. If any named property isn't found in the Datastore export, an invalid error is returned in the job result. The default value is ''.
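For example, to load only selected properties with the bq tool, you might pass the flag as in the following sketch; the property names name and author are illustrative, and the table and URI reuse the earlier example:

# Load only the "name" and "author" properties from the export.
bq --location=US load \
    --source_format=DATASTORE_BACKUP \
    --projection_fields="name,author" \
    mydataset.book_data \
    gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata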

Data type conversion

BigQuery converts data from each entity in Datastore export files to BigQuery data types. The following table describes the conversion between data types.

Datastore data type       BigQuery data type
Array                     ARRAY
Blob                      BYTES
Boolean                   BOOLEAN
Date and time             TIMESTAMP
Embedded entity           RECORD
Floating-point number     FLOAT
Geographical point        RECORD ([{"lat","DOUBLE"}, {"long","DOUBLE"}])
Integer                   INTEGER
Key                       RECORD
Null                      STRING
Text string               STRING (truncated to 64 KB)

Datastore key properties

Each entity in Datastore has a unique key that contains information such as the namespace and the path. BigQuery creates a RECORD data type for the key, with nested fields for each piece of information, as described in the following list.

  • __key__.app (STRING): The Datastore app name.
  • __key__.id (INTEGER): The entity's ID, or null if __key__.name is set.
  • __key__.kind (STRING): The entity's kind.
  • __key__.name (STRING): The entity's name, or null if __key__.id is set.
  • __key__.namespace (STRING): If the Datastore app uses a custom namespace, the entity's namespace. Otherwise, the default namespace is represented by an empty string.
  • __key__.path (STRING): The flattened ancestral path of the entity, consisting of the sequence of kind-identifier pairs from the root entity to the entity itself. For example: "Country", "USA", "PostalCode", 10011, "Route", 1234.
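For example, after loading an export into the book_data table used earlier, you could inspect the key fields with a query like the following sketch; the dataset and table names are assumptions carried over from the bq examples:

# Query the nested __key__ RECORD fields of the loaded table.
bq query --use_legacy_sql=false '
SELECT
  __key__.kind AS kind,
  __key__.name AS name,
  __key__.path AS key_path
FROM
  `mydataset.book_data`
LIMIT 10'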
