Refresh structured and unstructured data
This page describes refreshing structured and unstructured data.
To refresh your website apps, see Refresh your web page.
Refresh structured data
You can refresh the data in a structured data store as long as you use a schema that is the same as, or backward compatible with, the schema in the data store. For example, adding only new fields to an existing schema is backward compatible.
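As an illustration, here is a minimal sketch of a backward-compatible change, assuming a simplified JSON Schema for the data store. The rating property is the newly added field; the existing title and description fields are unchanged:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "description": { "type": "string" },
    "rating": { "type": "number" }
  }
}

Removing title, or changing its type, would by contrast not be backward compatible.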
You can refresh structured data in the Google Cloud console or using the API.
Console
To use the Google Cloud console to refresh structured data from a branch of a data store, follow these steps:
In the Google Cloud console, go to the AI Applications page.
In the navigation menu, click Data Stores.
In the Name column, click the data store that you want to edit.
On the Documents tab, click Import data.
To refresh from Cloud Storage:
- In the Select a data source pane, select Cloud Storage.
- In the Import data from Cloud Storage pane, click Browse, select the bucket that contains your refreshed data, and then click Select. Alternatively, enter the bucket location directly in the gs:// field.
- Under Data Import Options, select an import option.
- Click Import.
To refresh from BigQuery:
- In the Select a data source pane, select BigQuery.
- In the Import data from BigQuery pane, click Browse, select a table that contains your refreshed data, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
- Under Data Import Options, select an import option.
- Click Import.
REST
Use the documents.import method to refresh your data, specifying the appropriate reconciliationMode value.
To refresh structured data from BigQuery or Cloud Storage using the command line, follow these steps:
Find your data store ID. If you already have your data store ID, skip to the next step.
In the Google Cloud console, go to the AI Applications page and, in the navigation menu, click Data Stores.
Click the name of your data store.
On the Data page for your data store, get the data store ID.
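Alternatively, if you prefer the command line, the dataStores.list REST method returns the data stores in a project; a sketch:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores"

The data store ID is the final segment of each returned data store's name field.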
To import your structured data from BigQuery, call the following method. You can import either from BigQuery or Cloud Storage. To import from Cloud Storage, skip to the next step.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "bigquerySource": {
    "projectId": "PROJECT_ID",
    "datasetId": "DATASET_ID",
    "tableId": "TABLE_ID",
    "dataSchema": "DATA_SCHEMA_BQ"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "autoGenerateIds": AUTO_GENERATE_IDS,
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'

Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- DATASET_ID: the name of your BigQuery dataset.
- TABLE_ID: the name of your BigQuery table.
- DATA_SCHEMA_BQ: an optional field to specify the schema to use when parsing data from the BigQuery source. Can have the following values:
  - document: the default value. The BigQuery table that you use must conform to the default BigQuery schema shown below. You can define the ID of each document yourself, wrapping the entire data in the jsonData string.
  - custom: any BigQuery table schema is accepted, and Vertex AI Search automatically generates the IDs for each document that is imported.
- ERROR_DIRECTORY: an optional field to specify a Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.
- RECONCILIATION_MODE: an optional field to specify how the imported documents are reconciled with the existing documents in the destination data store. Can have the following values:
  - INCREMENTAL: the default value. Causes an incremental refresh of data from BigQuery to your data store. This does an upsert operation, which adds new documents and replaces existing documents that have the same ID with the updated documents.
  - FULL: causes a full rebase of the documents in your data store. New and updated documents are added to your data store, and documents that are not in BigQuery are removed from it. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: an optional field to specify whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs. Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: an optional field to specify which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs. Specify idField only when both of these conditions are satisfied; otherwise, an INVALID_ARGUMENT error is returned:
  - bigquerySource.dataSchema is set to custom.
  - auto_generate_ids is set to false or is unspecified.
Additionally, the name of the BigQuery column must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
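For example, a request body that satisfies both conditions might look like the following sketch, where the column name sku_id is hypothetical:

{
  "bigquerySource": {
    "projectId": "PROJECT_ID",
    "datasetId": "DATASET_ID",
    "tableId": "TABLE_ID",
    "dataSchema": "custom"
  },
  "idField": "sku_id",
  "reconciliationMode": "INCREMENTAL"
}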
Here is the default BigQuery schema. Your BigQuery table must conform to this schema when you set dataSchema to document.

[
  {
    "name": "id",
    "mode": "REQUIRED",
    "type": "STRING",
    "fields": []
  },
  {
    "name": "jsonData",
    "mode": "NULLABLE",
    "type": "STRING",
    "fields": []
  }
]
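For example, a conforming row, expressed as a JSON line, might look like this; the values are hypothetical:

{"id": "doc-1", "jsonData": "{\"title\": \"Example document\", \"category\": \"laptops\"}"}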
To import your structured data from Cloud Storage, call the following method. You can import either from BigQuery or Cloud Storage. To import from BigQuery, go to the previous step.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "gcsSource": {
    "inputUris": ["GCS_PATHS"],
    "dataSchema": "DATA_SCHEMA_GCS"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'

Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- GCS_PATHS: a list of comma-separated URIs to Cloud Storage locations from where you want to import. Each URI can be up to 2,000 characters long. URIs can match the full path for a storage object or can match the pattern for one or more objects. For example, gs://bucket/directory/*.json is a valid path.
- DATA_SCHEMA_GCS: an optional field to specify the schema to use when parsing data from the Cloud Storage source. Can have the following values:
  - document: the default value. The JSON data that you provide must conform to the Document schema. You can define the ID of each document yourself, wrapping the entire data in the jsonData string.
  - custom: JSON data in any structure is accepted, and Vertex AI Search automatically generates the IDs for each document that is imported.
- ERROR_DIRECTORY: an optional field to specify a Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.
- RECONCILIATION_MODE: an optional field to specify how the imported documents are reconciled with the existing documents in the destination data store. Can have the following values:
  - INCREMENTAL: the default value. Causes an incremental refresh of data from Cloud Storage to your data store. This does an upsert operation, which adds new documents and replaces existing documents that have the same ID with the updated documents.
  - FULL: causes a full rebase of the documents in your data store. New and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from it. FULL mode is helpful if you want to automatically delete documents that you no longer need.
Python
For more information, see the Vertex AI Search Python API reference documentation.
To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
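For local development, Application Default Credentials are typically created with the gcloud CLI:

gcloud auth application-default login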
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        data_schema="custom",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
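If you also want documents that are no longer in BigQuery removed from the data store, you can request a FULL rebase instead; a one-line variation on the request above:

    # Remove documents absent from the source (see RECONCILIATION_MODE above)
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.FULL,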
Refresh unstructured data

You can refresh unstructured data in the Google Cloud console or using the API.
Console
To use the Google Cloud console to refresh unstructured data from a branch of a data store, follow these steps:
In the Google Cloud console, go to the AI Applications page.
In the navigation menu, click Data Stores.
In the Name column, click the data store that you want to edit.
On the Documents tab, click Import data.
To ingest from a Cloud Storage bucket (with or without metadata):
- In the Select a data source pane, select Cloud Storage.
- In the Import data from Cloud Storage pane, click Browse, select the bucket that contains your refreshed data, and then click Select. Alternatively, enter the bucket location directly in the gs:// field.
- Under Data Import Options, select an import option.
- Click Import.
To ingest from BigQuery:
- In the Select a data source pane, select BigQuery.
- In the Import data from BigQuery pane, click Browse, select a table that contains your refreshed data, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
- Under Data Import Options, select an import option.
- Click Import.
REST
To refresh unstructured data using the API, re-import it using the documents.import method, specifying the appropriate reconciliationMode value. For more information about importing unstructured data, see Unstructured data.
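For example, here is a sketch of a re-import of PDF files from Cloud Storage, assuming a placeholder bucket path and the content data schema for unstructured documents:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "gcsSource": {
    "inputUris": ["gs://BUCKET_NAME/directory/*.pdf"],
    "dataSchema": "content"
  },
  "reconciliationMode": "INCREMENTAL"
}'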
Python
For more information, see the Vertex AI Search Python API reference documentation.
To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Examples:
# - Unstructured documents
#   - `gs://bucket/directory/file.pdf`
#   - `gs://bucket/directory/*.pdf`
# - Unstructured documents with JSONL Metadata
#   - `gs://bucket/directory/file.json`
# - Unstructured documents with CSV Metadata
#   - `gs://bucket/directory/file.csv`
# gcs_uri = "YOUR_GCS_PATH"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Multiple URIs are supported
        input_uris=[gcs_uri],
        # Options:
        # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
        # - `custom` - Unstructured documents with custom JSONL metadata
        # - `document` - Structured documents in the discoveryengine.Document format.
        # - `csv` - Unstructured documents with CSV metadata
        data_schema="content",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)