Create a search data store

This page describes how to create a data store and ingest data for custom search apps in Vertex AI Search. Go to the section for the source you plan to use:

To sync data from a third-party data source instead, see Connect a third-party data source.

For troubleshooting information, see Troubleshoot data ingestion.

To create data stores and connect data for Gemini Enterprise apps, see Introduction to connectors and data stores.

Create a data store using website content

Use the following procedure to create a data store and index websites.

To use a website data store after creating it, you must attach it to an app that has Enterprise features turned on. You can turn on Enterprise Edition for an app when you create it. This incurs additional costs. See Create a search app and About advanced features.

Before you begin

If you use a robots.txt file on your website, update it. For more information, see how to prepare your website's robots.txt file.
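For illustration only, a minimal robots.txt that allows Google's indexing crawler to reach the pages you want indexed might look like the following. The user agent shown is an assumption (Google-CloudVertexBot is the crawler name used for advanced website indexing at the time of writing); confirm the exact user agent in the robots.txt preparation guide linked above.

# Hypothetical robots.txt sketch: allow the Vertex AI Search crawler to index /docs/
User-agent: Google-CloudVertexBot
Allow: /docs/

# Other crawlers keep whatever rules you already have
User-agent: *
Disallow: /private/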

Procedure

Console

To use the Google Cloud console to make a data store and index websites, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. In the navigation menu, click Data Stores.

  3. Click Create data store.

  4. On the Source page, select Website Content.

  5. Choose whether to turn on Advanced website indexing for this data store. If you turn advanced website indexing on now, you can't turn it off later.

    Advanced website indexing provides additional features such as search summarization, search with follow-ups, and extractive answers. Advanced website indexing incurs additional cost, and requires that you verify domain ownership for any website that you index. For more information, see Advanced website indexing and Pricing.

  6. In the Sites to include field, enter the URL patterns matching the websites that you want to include in your data store. Include one URL pattern per line, without comma separators. For example, example.com/docs/*

  7. Optional: In the Sites to exclude field, enter URL patterns that you want to exclude from your data store.

    Excluded sites take priority over included sites. So, if you were to include example.com/docs/* but exclude example.com, then no websites would be indexed. For more information, see Website data.

  8. Click Continue.

  9. Select a location for your data store.

    • When you create a basic website search data store, this is always set to global (Global).
    • When you create a data store with advanced website indexing, you can select a location. Because the websites that are indexed must be public, Google strongly recommends that you select global (Global) as your location. This ensures maximum availability of all search and answering services and eliminates the limitations of regional data stores.
  10. Enter a name for your data store.

  11. Click Create. Vertex AI Search creates your data store and displays your data stores on the Data Stores page.

  12. To view information about your data store, click the name of your data store in the Name column. Your data store page appears.

    • If you turned on Advanced website indexing, a warning appears prompting you to verify the domains in your data store.
    • If you have a quota shortfall (the number of pages in the websites that you specified exceeds the "Number of documents per project" quota for your project), an additional warning appears prompting you to upgrade your quota.
  13. To verify the domains for the URL patterns in your data store, follow the instructions on the Verify website domains page.

  14. To upgrade your quota, follow these steps:

    1. Click Upgrade quota. The IAM and Admin page of the Google Cloud console appears.
    2. Follow the instructions at Request a quota adjustment in the Google Cloud documentation. The quota to increase is Number of documents in the Discovery Engine API service.
    3. After submitting your request for a higher quota limit, go back to the AI Applications page and click Data Stores in the navigation menu.
    4. Click the name of your data store in the Name column. The Status column indicates that indexing is in progress for the websites that had surpassed the quota. When the Status column for a URL shows Indexed, advanced website indexing features are available for that URL or URL pattern.

    For more information, see Quota for web page indexing on the "Quotas and limits" page.

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
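In a local development environment, Application Default Credentials are typically set up with the gcloud CLI. A minimal sketch, assuming the gcloud CLI is installed and your account has access to the project:

# Opens a browser flow and stores Application Default Credentials locally
gcloud auth application-default login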

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import websites

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# NOTE: Do not include http or https protocol in the URI pattern
# uri_pattern = "cloud.google.com/generative-ai-app-builder/docs/*"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.SiteSearchEngineServiceClient(
    client_options=client_options
)

# The full resource name of the data store
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# Target Site to index
target_site = discoveryengine.TargetSite(
    provided_uri_pattern=uri_pattern,
    # Options: INCLUDE, EXCLUDE
    type_=discoveryengine.TargetSite.Type.INCLUDE,
    exact_match=False,
)

# Make the request
operation = client.create_target_site(
    parent=site_search_engine,
    target_site=target_site,
)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.CreateTargetSiteMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Next steps

Import from BigQuery

Vertex AI Search supports searching across BigQuery data.

Note: Natural-language analytical queries are not supported. Semantic search queries are supported. For example, for a query like "flowery dresses", a summarized answer can be returned even if descriptions contain "jasmine" and "orchids" but don't explicitly contain the word "flowery".

You can create data stores from BigQuery tables in two ways:

The following table compares the two ways that you can import BigQuery data into Vertex AI Search data stores.

One-time ingestion | Periodic ingestion
Generally available (GA). | Public preview.
Data must be refreshed manually. | Data updates automatically every 1, 3, or 5 days. Data cannot be manually refreshed.
Vertex AI Search creates a single data store from one table in BigQuery. | Vertex AI Search creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset.
Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table. | Because manual data import is not supported, the data in an entity data store can only be sourced from one BigQuery table.
Data source access control is supported. | Data source access control is not supported. The imported data can contain access controls, but these controls won't be respected.
You can create a data store using either the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores.
CMEK-compliant. | CMEK-compliant.

Before you begin

To import data from a source Google Cloud project that's different from the Google Cloud project with the Vertex AI Search data store, grant the following Identity and Access Management (IAM) roles to the service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com service account in the project that contains the Vertex AI Search data store:

Caution: When you import data from BigQuery into a Vertex AI Search data store, BigQuery permissions aren't imported with the data. After import, any user with sufficient Vertex AI Search permissions can view the data, even if they don't have permission to view the data in BigQuery.
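As an illustrative sketch only, a role can be bound to the Discovery Engine service account with the gcloud CLI. The role shown here, roles/bigquery.dataViewer, is an assumption chosen for illustration (it corresponds to the "BigQuery Data Viewer" permission mentioned later on this page); grant the roles required for your setup.

# Hypothetical example: bind a role to the Discovery Engine service account.
# Replace PROJECT_ID and PROJECT_NUMBER with your own values.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"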

Import once from BigQuery

To ingest data from a BigQuery table, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the Google Cloud console to ingest data from BigQuery, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data Stores page.

  3. Click Create data store.

  4. On the Source page, select BigQuery.

  5. In the What kind of data are you importing section, select the data type that you are importing.

  6. Select One time in the Synchronization frequency section.

  7. In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.

  8. Click Continue.

  9. If you are doing a one-time import of structured data:

    1. Map fields to key properties.

    2. If there are important fields missing from the schema, use Add new field to add them.

      For more information, see About auto-detect and edit.

    3. Click Continue.

  10. Choose a region for your data store.

  11. Enter a name for your data store.

  12. Click Create.

  13. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes to several hours.

REST

To use the command line to create a data store and import data from BigQuery, follow these steps.

Note: If you want to specify a schema instead of letting Vertex AI auto-detect the schema for you, do the steps in Provide your own schema as a JSON object and then begin the following procedure at step 2.
  1. Create a data store.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Goog-User-Project: PROJECT_ID" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
      -d '{
        "displayName": "DATA_STORE_DISPLAY_NAME",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
      }'
    Note: The industry vertical GENERIC is used to create structured, unstructured, and website data stores for custom search apps.

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.

    Optional: If you're uploading unstructured data and want to configure document parsing or to turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. Configuring an OCR parser for PDFs is recommended if you're ingesting scanned PDFs. For how to configure parsing or chunking options, see Parse and chunk documents.

  2. Import data from BigQuery.

    If you defined a schema, make sure the data conforms to that schema.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "bigquerySource": {
          "projectId": "PROJECT_ID",
          "datasetId": "DATASET_ID",
          "tableId": "TABLE_ID",
          "dataSchema": "DATA_SCHEMA",
          "aclEnabled": "BOOLEAN"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD",
        "errorConfig": {
          "gcsPrefix": "ERROR_DIRECTORY"
        }
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • DATASET_ID: the ID of the BigQuery dataset.
    • TABLE_ID: the ID of the BigQuery table.
      • If the BigQuery table is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "BigQuery Data Viewer" permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions for the BigQuery table under project "123".
    • DATA_SCHEMA: optional. Values are document and custom. The default is document.
      • document: the BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, while wrapping all the data in the jsonData string.
      • custom: Any BigQuery table schema is accepted, and Vertex AI Search automatically generates the IDs for each document that is imported.
    • ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import—for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.
    • RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
    • AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.

      Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.

    • ID_FIELD: optional. Specifies which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs.

      Specify idField only when: (1) bigquerySource.dataSchema is set to custom, and (2) auto_generate_ids is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.

      The value of the BigQuery column name must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
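
    The import request returns a long-running operation. As a sketch, you can poll its status with the Discovery Engine operations endpoint until the response contains "done": true. Replace OPERATION_NAME with the value of the name field returned by the import request.

    curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://discoveryengine.googleapis.com/v1/OPERATION_NAME"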

C#

For more information, see the Vertex AI Search C# API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

For more information, see the Vertex AI Search Go API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents

package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

For more information, see the Vertex AI Search Java API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

For more information, see the Vertex AI Search Node.js API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        data_schema="custom",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Ruby

For more information, see the Vertex AI Search Ruby API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

require"google/cloud/discovery_engine/v1"### Snippet for the create_data_store call in the DataStoreService service## This snippet has been automatically generated and should be regarded as a code# template only. It will require modifications to work:# - It may require correct/in-range values for request initialization.# - It may require specifying regional endpoints when creating the service# client as shown in https://cloud.google.com/ruby/docs/reference.## This is an auto-generated example demonstrating basic usage of# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.#defcreate_data_store# Create a client object. The client can be reused for multiple calls.client=Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new# Create a request. To set request fields, pass in keyword arguments.request=Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new# Call the create_data_store method.result=client.create_data_storerequest# The returned object is of type Gapic::Operation. You can use it to# check the status of an operation, cancel it, or wait for results.# Here is how to wait for a response.result.wait_until_done!timeout:60ifresult.response?presult.responseelseputs"No response received."endend

Import documents

require"google/cloud/discovery_engine/v1"### Snippet for the import_documents call in the DocumentService service## This snippet has been automatically generated and should be regarded as a code# template only. It will require modifications to work:# - It may require correct/in-range values for request initialization.# - It may require specifying regional endpoints when creating the service# client as shown in https://cloud.google.com/ruby/docs/reference.## This is an auto-generated example demonstrating basic usage of# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.#defimport_documents# Create a client object. The client can be reused for multiple calls.client=Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new# Create a request. To set request fields, pass in keyword arguments.request=Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new# Call the import_documents method.result=client.import_documentsrequest# The returned object is of type Gapic::Operation. You can use it to# check the status of an operation, cancel it, or wait for results.# Here is how to wait for a response.result.wait_until_done!timeout:60ifresult.response?presult.responseelseputs"No response received."endend

Connect to BigQuery with periodic syncing

Note: This feature is a Preview offering, subject to the "Pre-GA Offerings Terms" of the GCP Service Specific Terms. Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the launch stage descriptions. Further, by using this feature, you agree to the Generative AI Preview terms and conditions ("Preview Terms"). For this feature, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).

Before importing your data, review Prepare data for ingesting.

The following procedure describes how to create a data connector that associates a BigQuery dataset with Vertex AI Search, and how to specify a table in the dataset for each data store that you want to create. Data stores that are children of data connectors are called entity data stores.

Data from the dataset is synced periodically to the entity data stores. You can specify synchronization daily, every three days, or every five days.

Console

To use the Google Cloud console to create a connector that periodically syncs data from a BigQuery dataset to Vertex AI Search, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. In the navigation menu, click Data Stores.

  3. Click Create data store.

  4. On the Source page, select BigQuery.

  5. Select the kind of data that you are importing.

  6. Click Periodic.

  7. Select the Sync frequency, which is how often you want the Vertex AI Search connector to sync with the BigQuery dataset. You can change the frequency later.

  8. In the BigQuery dataset path field, click Browse, and select the dataset that contains the tables that you have prepared for ingesting. Alternatively, enter the dataset location directly in the BigQuery path field. The format for the path is projectname.datasetname.

  9. In the Tables to sync field, click Browse, and then select a table that contains the data that you want for your data store.

    Note: Make sure that the data in the tables matches the kind of data that you selected in step 5.
    If there is a mismatch, you won't know until one of the following happens:
    • You get errors when the connector tries to import data.
    • You see unexpected results. This happens if the selected type was structured but should have been unstructured or structured with metadata. The data is imported, but the content URL or metadata is not recognized and is treated as a string.
  10. If there are additional tables in the dataset that you want to use for data stores, click Add table and specify those tables too.

  11. Click Continue.

  12. Choose a region for your data store, enter a name for your data connector, and click Create.

    You have now created a data connector, which will periodically sync data with the BigQuery dataset. You have also created one or more entity data stores. The data stores have the same names as the BigQuery tables.

  13. To check the status of your ingestion, go to the Data Stores page and click your data connector name to see details about it on its Data page > Data ingestion activity tab. When the status column on the Activity tab changes from In progress to Succeeded, the first ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes to several hours.

After you set up your data source and import data the first time, the data store syncs data from that source at the frequency that you selected during setup. The first sync occurs about an hour after the data connector is created. The next sync then occurs around 24 hours, 72 hours, or 120 hours later.

Next steps

Import from Cloud Storage

Note: Due to a known issue, creating a data store from Cloud Storage by using the Google Cloud console might fail. As a workaround, you can either use the API to create the data store or create a new Cloud Storage bucket in the Google Cloud console before you create the data store.

You can create data stores from Cloud Storage data in two ways:

The following table compares the two ways that you can import Cloud Storage data into Vertex AI Search data stores.

One-time ingestion | Periodic ingestion
Generally available (GA). | Public preview.
Data must be refreshed manually. | Data updates automatically every one, three, or five days. Data cannot be manually refreshed.
Vertex AI Search creates a single data store from one folder or file in Cloud Storage. | Vertex AI Search creates a data connector, and associates a data store (called an entity data store) with it for the file or folder that is specified. Each Cloud Storage data connector can have a single entity data store.
Data from multiple files, folders, and buckets can be combined in one data store by first ingesting data from one Cloud Storage location and then more data from another location. | Because manual data import is not supported, the data in an entity data store can only be sourced from one Cloud Storage file or folder.
Data source access control is supported. For more information, see Data source access control. | Data source access control is not supported. The imported data can contain access controls, but these controls won't be respected.
You can create a data store using either the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores.
CMEK-compliant. | CMEK-compliant.

Before you begin

To import data from a source Google Cloud project that's different from the Google Cloud project with the Vertex AI Search data store, grant the following Identity and Access Management (IAM) roles to the service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com service account in the project that contains the Vertex AI Search data store:

Caution: When you import data from Cloud Storage into a Vertex AI Search data store, Cloud Storage permissions aren't imported with the data. After import, any user with sufficient Vertex AI Search permissions can view the data, even if they don't have permission to view the data in Cloud Storage.
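As an illustrative sketch only, a bucket-level binding for the Discovery Engine service account can be added with the gcloud CLI. The role shown, roles/storage.objectViewer, corresponds to the "Storage Object Viewer" permission mentioned later on this page; grant whichever roles are required for your setup.

# Hypothetical example: grant Storage Object Viewer on the source bucket.
# Replace BUCKET_NAME and PROJECT_NUMBER with your own values.
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"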

Import once from Cloud Storage

To ingest data from Cloud Storage, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Note: Data import is recursive. That is, if there are folders within the bucket or folder that you specify, files within those folders are imported.

Console

To use the console to ingest data from a Cloud Storage bucket, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data Stores page.

  3. Click Create data store.

  4. On the Source page, select Cloud Storage.

  5. In the Select a folder or file you want to import section, select Folder or File.

  6. Click Browse and choose the data you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.

  7. Select what kind of data you are importing.

  8. Click Continue.

  9. If you are doing a one-time import of structured data:

    1. Map fields to key properties.

    2. If there are important fields missing from the schema, use Add new field to add them.

      For more information, see About auto-detect and edit.

    3. Click Continue.

  10. Choose a region for your data store.

  11. Enter a name for your data store.

  12. Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.

    The OCR parser and layout parser can incur additional costs. See Document AI feature pricing.

    To select a parser, expand Document processing options and specify the parser options that you want to use.

  13. Click Create.

  14. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Cloud Storage, follow these steps.

Note: If you are importing structured data and want to specify a schema instead of letting Vertex AI auto-detect the schema for you, do the steps in Provide your own schema as a JSON object and then begin the following procedure at step 2.
  1. Create a data store.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Goog-User-Project: PROJECT_ID" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
      -d '{
        "displayName": "DATA_STORE_DISPLAY_NAME",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
      }'
    Note: The industry vertical GENERIC is used to create structured, unstructured, and website data stores for custom search apps.

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.

    Optional: If you're uploading unstructured data and want to configure document parsing or to turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. Configuring an OCR parser for PDFs is recommended if you're ingesting scanned PDFs. For how to configure parsing or chunking options, see Parse and chunk documents.

  2. Import data from Cloud Storage.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "gcsSource": {
          "inputUris": ["INPUT_FILE_PATTERN_1", "INPUT_FILE_PATTERN_2"],
          "dataSchema": "DATA_SCHEMA"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD",
        "errorConfig": {
          "gcsPrefix": "ERROR_DIRECTORY"
        }
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • INPUT_FILE_PATTERN: a file pattern in Cloud Storage containing your documents.

      For structured data or for unstructured data with metadata, an example of the input file pattern is gs://<your-gcs-bucket>/directory/object.json, and an example of a pattern matching one or more files is gs://<your-gcs-bucket>/directory/*.json.

      For unstructured documents, an example is gs://<your-gcs-bucket>/directory/*.pdf. Each file that is matched by the pattern becomes a document.

      If <your-gcs-bucket> is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "Storage Object Viewer" permissions for the Cloud Storage bucket. For example, if you are importing a Cloud Storage bucket from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions on the Cloud Storage bucket under project "123".

    • DATA_SCHEMA: optional. Values are document, custom, csv, and content. The default is document.

      • document: Upload unstructured data with metadata for unstructured documents. Each line of the file has to follow one of the following formats. You can define the ID of each document:

        • { "id": "<your-id>", "jsonData": "<JSON string>", "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
        • { "id": "<your-id>", "structData": <JSON object>, "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
      • custom: Upload JSON for structured documents. The data is organized according to a schema. You can specify the schema; otherwise it is auto-detected. You can put the JSON string of the document in a consistent format directly in each line, and Vertex AI Search automatically generates the IDs for each document imported.

      • content: Upload unstructured documents (PDF, HTML, DOC, TXT, PPTX). The ID of each document is automatically generated as the first 128 bits of SHA256(GCS_URI) encoded as a hex string. You can specify multiple input file patterns as long as the matched files don't exceed the 100K files limit.

      • csv: Include a header row in your CSV file, with each header mapped to a document field. Specify the path to the CSV file using theinputUris field.

    • ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import—for example,gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.

    • RECONCILIATION_MODE: optional. Values areFULL andINCREMENTAL. Default isINCREMENTAL. SpecifyingINCREMENTAL causes an incremental refresh of data from Cloud Storage to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. SpecifyingFULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from your data store. TheFULL mode is helpful if you want to automatically delete documents that you no longer need.

    • AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.

      Specify autoGenerateIds only when gcsSource.dataSchema is set to custom or csv. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.

    • ID_FIELD: optional. Specifies which fields are the document IDs. For Cloud Storage source documents, idField specifies the name of the JSON field that contains the document IDs. For example, if {"my_id":"some_uuid"} is the document ID field in one of your documents, specify "idField":"my_id". This identifies all JSON fields with the name "my_id" as document IDs.

      Specify this field only when: (1) gcsSource.dataSchema is set to custom or csv, and (2) auto_generate_ids is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.

      Note that the Cloud Storage JSON field specified by id_field must be of string type, and its values must be between 1 and 63 characters and conform to RFC 1034. Otherwise, the documents fail to import.
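For illustration, the following minimal Python sketch writes a metadata file in the document schema format shown above and derives the automatically generated ID that the content schema uses (the first 128 bits of SHA256 of the Cloud Storage URI). The bucket, file names, and metadata values are placeholders, not values from this guide.

import hashlib
import json

# Hypothetical example values; replace with your own bucket, files, and metadata.
documents = [
    {"id": "doc-001", "title": "Q1 report", "uri": "gs://example-bucket/docs/q1-report.pdf"},
    {"id": "doc-002", "title": "Onboarding guide", "uri": "gs://example-bucket/docs/onboarding.html"},
]

# Write one JSON object per line, matching the `document` data schema described above.
with open("metadata.jsonl", "w") as f:
    for doc in documents:
        line = {
            "id": doc["id"],
            # jsonData must be a JSON *string*, so the metadata is serialized separately.
            "jsonData": json.dumps({"title": doc["title"]}),
            "content": {
                "mimeType": "application/pdf" if doc["uri"].endswith(".pdf") else "text/html",
                "uri": doc["uri"],
            },
        }
        f.write(json.dumps(line) + "\n")


# For the `content` schema, IDs are derived from the Cloud Storage URI:
# the first 128 bits (32 hex characters) of SHA256(GCS_URI).
def auto_generated_id(gcs_uri: str) -> str:
    return hashlib.sha256(gcs_uri.encode("utf-8")).hexdigest()[:32]


print(auto_generated_id("gs://example-bucket/docs/q1-report.pdf"))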

C#

For more information, see the Vertex AI Search C# API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

usingGoogle.Cloud.DiscoveryEngine.V1;usingGoogle.LongRunning;publicsealedpartialclassGeneratedDataStoreServiceClientSnippets{/// <summary>Snippet for CreateDataStore</summary>/// <remarks>/// This snippet has been automatically generated and should be regarded as a code template only./// It will require modifications to work:/// - It may require correct/in-range values for request initialization./// - It may require specifying regional endpoints when creating the service client as shown in///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint./// </remarks>publicvoidCreateDataStoreRequestObject(){// Create clientDataStoreServiceClientdataStoreServiceClient=DataStoreServiceClient.Create();// Initialize request argument(s)CreateDataStoreRequestrequest=newCreateDataStoreRequest{ParentAsCollectionName=CollectionName.FromProjectLocationCollection("[PROJECT]","[LOCATION]","[COLLECTION]"),DataStore=newDataStore(),DataStoreId="",CreateAdvancedSiteSearch=false,CmekConfigNameAsCmekConfigName=CmekConfigName.FromProjectLocation("[PROJECT]","[LOCATION]"),SkipDefaultSchemaCreation=false,};// Make the requestOperation<DataStore,CreateDataStoreMetadata>response=dataStoreServiceClient.CreateDataStore(request);// Poll until the returned long-running operation is completeOperation<DataStore,CreateDataStoreMetadata>completedResponse=response.PollUntilCompleted();// Retrieve the operation resultDataStoreresult=completedResponse.Result;// Or get the name of the operationstringoperationName=response.Name;// This name can be stored, then the long-running operation retrieved later by nameOperation<DataStore,CreateDataStoreMetadata>retrievedResponse=dataStoreServiceClient.PollOnceCreateDataStore(operationName);// Check if the retrieved long-running operation has completedif(retrievedResponse.IsCompleted){// If it has completed, then access the resultDataStoreretrievedResult=retrievedResponse.Result;}}}

Import documents

usingGoogle.Cloud.DiscoveryEngine.V1;usingGoogle.LongRunning;usingGoogle.Protobuf.WellKnownTypes;publicsealedpartialclassGeneratedDocumentServiceClientSnippets{/// <summary>Snippet for ImportDocuments</summary>/// <remarks>/// This snippet has been automatically generated and should be regarded as a code template only./// It will require modifications to work:/// - It may require correct/in-range values for request initialization./// - It may require specifying regional endpoints when creating the service client as shown in///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint./// </remarks>publicvoidImportDocumentsRequestObject(){// Create clientDocumentServiceClientdocumentServiceClient=DocumentServiceClient.Create();// Initialize request argument(s)ImportDocumentsRequestrequest=newImportDocumentsRequest{ParentAsBranchName=BranchName.FromProjectLocationDataStoreBranch("[PROJECT]","[LOCATION]","[DATA_STORE]","[BRANCH]"),InlineSource=newImportDocumentsRequest.Types.InlineSource(),ErrorConfig=newImportErrorConfig(),ReconciliationMode=ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,UpdateMask=newFieldMask(),AutoGenerateIds=false,IdField="",ForceRefreshContent=false,};// Make the requestOperation<ImportDocumentsResponse,ImportDocumentsMetadata>response=documentServiceClient.ImportDocuments(request);// Poll until the returned long-running operation is completeOperation<ImportDocumentsResponse,ImportDocumentsMetadata>completedResponse=response.PollUntilCompleted();// Retrieve the operation resultImportDocumentsResponseresult=completedResponse.Result;// Or get the name of the operationstringoperationName=response.Name;// This name can be stored, then the long-running operation retrieved later by nameOperation<ImportDocumentsResponse,ImportDocumentsMetadata>retrievedResponse=documentServiceClient.PollOnceImportDocuments(operationName);// Check if the retrieved long-running operation has completedif(retrievedResponse.IsCompleted){// If it has completed, then access the resultImportDocumentsResponseretrievedResult=retrievedResponse.Result;}}}

Go

For more information, see the Vertex AI Search Go API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

packagemainimport("context"discoveryengine"cloud.google.com/go/discoveryengine/apiv1"discoveryenginepb"cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb")funcmain(){ctx:=context.Background()// This snippet has been automatically generated and should be regarded as a code template only.// It will require modifications to work:// - It may require correct/in-range values for request initialization.// - It may require specifying regional endpoints when creating the service client as shown in://   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Optionsc,err:=discoveryengine.NewDataStoreClient(ctx)iferr!=nil{// TODO: Handle error.}deferc.Close()req:=&discoveryenginepb.CreateDataStoreRequest{// TODO: Fill request struct fields.// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.}op,err:=c.CreateDataStore(ctx,req)iferr!=nil{// TODO: Handle error.}resp,err:=op.Wait(ctx)iferr!=nil{// TODO: Handle error.}// TODO: Use resp._=resp}

Import documents

packagemainimport("context"discoveryengine"cloud.google.com/go/discoveryengine/apiv1"discoveryenginepb"cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb")funcmain(){ctx:=context.Background()// This snippet has been automatically generated and should be regarded as a code template only.// It will require modifications to work:// - It may require correct/in-range values for request initialization.// - It may require specifying regional endpoints when creating the service client as shown in://   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Optionsc,err:=discoveryengine.NewDocumentClient(ctx)iferr!=nil{// TODO: Handle error.}deferc.Close()req:=&discoveryenginepb.ImportDocumentsRequest{// TODO: Fill request struct fields.// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.}op,err:=c.ImportDocuments(ctx,req)iferr!=nil{// TODO: Handle error.}resp,err:=op.Wait(ctx)iferr!=nil{// TODO: Handle error.}// TODO: Use resp._=resp}

Java

For more information, see the Vertex AI Search Java API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

importcom.google.cloud.discoveryengine.v1.CollectionName;importcom.google.cloud.discoveryengine.v1.CreateDataStoreRequest;importcom.google.cloud.discoveryengine.v1.DataStore;importcom.google.cloud.discoveryengine.v1.DataStoreServiceClient;publicclassSyncCreateDataStore{publicstaticvoidmain(String[]args)throwsException{syncCreateDataStore();}publicstaticvoidsyncCreateDataStore()throwsException{// This snippet has been automatically generated and should be regarded as a code template only.// It will require modifications to work:// - It may require correct/in-range values for request initialization.// - It may require specifying regional endpoints when creating the service client as shown in// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_librarytry(DataStoreServiceClientdataStoreServiceClient=DataStoreServiceClient.create()){CreateDataStoreRequestrequest=CreateDataStoreRequest.newBuilder().setParent(CollectionName.of("[PROJECT]","[LOCATION]","[COLLECTION]").toString()).setDataStore(DataStore.newBuilder().build()).setDataStoreId("dataStoreId929489618").setCreateAdvancedSiteSearch(true).setSkipDefaultSchemaCreation(true).build();DataStoreresponse=dataStoreServiceClient.createDataStoreAsync(request).get();}}}

Import documents

importcom.google.cloud.discoveryengine.v1.BranchName;importcom.google.cloud.discoveryengine.v1.DocumentServiceClient;importcom.google.cloud.discoveryengine.v1.ImportDocumentsRequest;importcom.google.cloud.discoveryengine.v1.ImportDocumentsResponse;importcom.google.cloud.discoveryengine.v1.ImportErrorConfig;importcom.google.protobuf.FieldMask;publicclassSyncImportDocuments{publicstaticvoidmain(String[]args)throwsException{syncImportDocuments();}publicstaticvoidsyncImportDocuments()throwsException{// This snippet has been automatically generated and should be regarded as a code template only.// It will require modifications to work:// - It may require correct/in-range values for request initialization.// - It may require specifying regional endpoints when creating the service client as shown in// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_librarytry(DocumentServiceClientdocumentServiceClient=DocumentServiceClient.create()){ImportDocumentsRequestrequest=ImportDocumentsRequest.newBuilder().setParent(BranchName.ofProjectLocationDataStoreBranchName("[PROJECT]","[LOCATION]","[DATA_STORE]","[BRANCH]").toString()).setErrorConfig(ImportErrorConfig.newBuilder().build()).setUpdateMask(FieldMask.newBuilder().build()).setAutoGenerateIds(true).setIdField("idField1629396127").setForceRefreshContent(true).build();ImportDocumentsResponseresponse=documentServiceClient.importDocumentsAsync(request).get();}}}

Node.js

For more information, see the Vertex AI Search Node.js API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

/** * This snippet has been automatically generated and should be regarded as a code template only. * It will require modifications to work. * It may require correct/in-range values for request initialization. * TODO(developer): Uncomment these variables before running the sample. *//** *  Resource name of the CmekConfig to use for protecting this DataStore. */// const cmekConfigName = 'abc123'/** *  DataStore without CMEK protections. If a default CmekConfig is set for *  the project, setting this field will override the default CmekConfig as *  well. */// const disableCmek = true/** *  Required. The parent resource name, such as *  `projects/{project}/locations/{location}/collections/{collection}`. */// const parent = 'abc123'/** *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to *  create. */// const dataStore = {}/** *  Required. The ID to use for the *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become *  the final component of the *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name. *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034) *  standard with a length limit of 63 characters. Otherwise, an *  INVALID_ARGUMENT error is returned. */// const dataStoreId = 'abc123'/** *  A boolean flag indicating whether user want to directly create an advanced *  data store for site search. *  If the data store is not configured as site *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will *  be ignored. */// const createAdvancedSiteSearch = true/** *  A boolean flag indicating whether to skip the default schema creation for *  the data store. Only enable this flag if you are certain that the default *  schema is incompatible with your use case. *  If set to true, you must manually create a schema for the data store before *  any documents can be ingested. *  This flag cannot be specified if `data_store.starting_schema` is specified. */// const skipDefaultSchemaCreation = true// Imports the Discoveryengine libraryconst{DataStoreServiceClient}=require('@google-cloud/discoveryengine').v1;// Instantiates a clientconstdiscoveryengineClient=newDataStoreServiceClient();asyncfunctioncallCreateDataStore(){// Construct requestconstrequest={parent,dataStore,dataStoreId,};// Run requestconst[operation]=awaitdiscoveryengineClient.createDataStore(request);const[response]=awaitoperation.promise();console.log(response);}callCreateDataStore();

Import documents

/** * This snippet has been automatically generated and should be regarded as a code template only. * It will require modifications to work. * It may require correct/in-range values for request initialization. * TODO(developer): Uncomment these variables before running the sample. *//** *  The Inline source for the input content for documents. */// const inlineSource = {}/** *  Cloud Storage location for the input content. */// const gcsSource = {}/** *  BigQuery input source. */// const bigquerySource = {}/** *  FhirStore input source. */// const fhirStoreSource = {}/** *  Spanner input source. */// const spannerSource = {}/** *  Cloud SQL input source. */// const cloudSqlSource = {}/** *  Firestore input source. */// const firestoreSource = {}/** *  AlloyDB input source. */// const alloyDbSource = {}/** *  Cloud Bigtable input source. */// const bigtableSource = {}/** *  Required. The parent branch resource name, such as *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`. *  Requires create/update permission. */// const parent = 'abc123'/** *  The desired location of errors incurred during the Import. */// const errorConfig = {}/** *  The mode of reconciliation between existing documents and the documents to *  be imported. Defaults to *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL. */// const reconciliationMode = {}/** *  Indicates which fields in the provided imported documents to update. If *  not set, the default is to update all fields. */// const updateMask = {}/** *  Whether to automatically generate IDs for the documents if absent. *  If set to `true`, *  Document.id google.cloud.discoveryengine.v1.Document.id s are *  automatically generated based on the hash of the payload, where IDs may not *  be consistent during multiple imports. In which case *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL *  is highly recommended to avoid duplicate contents. If unset or set to *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have *  to be specified using *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field, *  otherwise, documents without IDs fail to be imported. *  Supported data sources: *  * GcsSource google.cloud.discoveryengine.v1.GcsSource. *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown. *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource. *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown. *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource. *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource. *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource. *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource. */// const autoGenerateIds = true/** *  The field indicates the ID field or column to be used as unique IDs of *  the documents. *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`. *  For others, it may be the column name of the table where the unique ids are *  stored. 
*  The values of the JSON field or the table column are used as the *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field *  or the table column must be of string type, and the values must be set as *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034) *  with 1-63 characters. Otherwise, documents without valid IDs fail to be *  imported. *  Only set this field when *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown. *  If it is unset, a default value `_id` is used when importing from the *  allowed data sources. *  Supported data sources: *  * GcsSource google.cloud.discoveryengine.v1.GcsSource. *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown. *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource. *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown. *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource. *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource. *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource. *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource. */// const idField = 'abc123'/** *  Optional. Whether to force refresh the unstructured content of the *  documents. *  If set to `true`, the content part of the documents will be refreshed *  regardless of the update status of the referencing content. */// const forceRefreshContent = true// Imports the Discoveryengine libraryconst{DocumentServiceClient}=require('@google-cloud/discoveryengine').v1;// Instantiates a clientconstdiscoveryengineClient=newDocumentServiceClient();asyncfunctioncallImportDocuments(){// Construct requestconstrequest={parent,};// Run requestconst[operation]=awaitdiscoveryengineClient.importDocuments(request);const[response]=awaitoperation.promise();console.log(response);}callImportDocuments();

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Examples:
# - Unstructured documents
#   - `gs://bucket/directory/file.pdf`
#   - `gs://bucket/directory/*.pdf`
# - Unstructured documents with JSONL Metadata
#   - `gs://bucket/directory/file.json`
# - Unstructured documents with CSV Metadata
#   - `gs://bucket/directory/file.csv`
# gcs_uri = "YOUR_GCS_PATH"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Multiple URIs are supported
        input_uris=[gcs_uri],
        # Options:
        # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
        # - `custom` - Unstructured documents with custom JSONL metadata
        # - `document` - Structured documents in the discoveryengine.Document format.
        # - `csv` - Unstructured documents with CSV metadata
        data_schema="content",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Ruby

For more information, see the Vertex AI Search Ruby API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

require"google/cloud/discovery_engine/v1"### Snippet for the create_data_store call in the DataStoreService service## This snippet has been automatically generated and should be regarded as a code# template only. It will require modifications to work:# - It may require correct/in-range values for request initialization.# - It may require specifying regional endpoints when creating the service# client as shown in https://cloud.google.com/ruby/docs/reference.## This is an auto-generated example demonstrating basic usage of# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.#defcreate_data_store# Create a client object. The client can be reused for multiple calls.client=Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new# Create a request. To set request fields, pass in keyword arguments.request=Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new# Call the create_data_store method.result=client.create_data_storerequest# The returned object is of type Gapic::Operation. You can use it to# check the status of an operation, cancel it, or wait for results.# Here is how to wait for a response.result.wait_until_done!timeout:60ifresult.response?presult.responseelseputs"No response received."endend

Import documents

require"google/cloud/discovery_engine/v1"### Snippet for the import_documents call in the DocumentService service## This snippet has been automatically generated and should be regarded as a code# template only. It will require modifications to work:# - It may require correct/in-range values for request initialization.# - It may require specifying regional endpoints when creating the service# client as shown in https://cloud.google.com/ruby/docs/reference.## This is an auto-generated example demonstrating basic usage of# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.#defimport_documents# Create a client object. The client can be reused for multiple calls.client=Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new# Create a request. To set request fields, pass in keyword arguments.request=Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new# Call the import_documents method.result=client.import_documentsrequest# The returned object is of type Gapic::Operation. You can use it to# check the status of an operation, cancel it, or wait for results.# Here is how to wait for a response.result.wait_until_done!timeout:60ifresult.response?presult.responseelseputs"No response received."endend

Connect to Cloud Storage with periodic syncing

Note: This feature is a Preview offering, subject to the "Pre-GA Offerings Terms" of the GCP Service Specific Terms. Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the launch stage descriptions. Further, by using this feature, you agree to the Generative AI Preview terms and conditions ("Preview Terms"). For this feature, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).

Before importing your data, review Prepare data for ingesting.

The following procedure describes how to create a data connector that associates a Cloud Storage location with Vertex AI Search, and how to specify a folder or file in that location for the data store that you want to create. Data stores that are children of data connectors are called entity data stores.

Data is synced periodically to the entity data store. You can specify synchronization daily, every three days, or every five days.

Console

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data Stores page.

  3. Click Create data store.

  4. On the Source page, select Cloud Storage.

  5. Select what kind of data you are importing.

  6. Click Periodic.

  7. Select the Synchronization frequency, which is how often you want the Vertex AI Search connector to sync with the Cloud Storage location. You can change the frequency later.

  8. In the Select a folder or file you want to import section, select Folder or File.

  9. Click Browse and choose the data you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.

  10. Click Continue.

  11. Choose a region for your data connector.

  12. Enter a name for your data connector.

  13. Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.

    To select a parser, expand Document processing options and specify the parser options that you want to use.

    The OCR parser and layout parser can incur additional costs. See Document AI feature pricing.

  14. Click Create.

    You have now created a data connector, which will periodically sync data with the Cloud Storage location. You have also created an entity data store, which is named gcs_store.

  15. To check the status of your ingestion, go to the Data Stores page and click your data connector name to see details about it on its Data page, on the Data ingestion activity tab. When the status column on the Data ingestion activity tab changes from In progress to Succeeded, the first ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes to several hours.

After you set up your data source and import data the first time, data is synced from that source at a frequency that you select during setup. About an hour after the data connector is created, the first sync occurs. The next sync then occurs around 24 hours, 72 hours, or 120 hours later.

Next steps

Connect to Google Drive

Vertex AI Search can search data from Google Drive using data federation, which directly retrieves information from the specified data source. Because data isn't copied into the Vertex AI Search index, you don't need to worry about data storage.

Before you begin

  • You must be signed into the Google Cloud console with the same account that you use for the Google Drive instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Drive.
  • Verify that all the documents are accessible, either by placing them in a shared drive that is owned by the domain or by assigning the ownership to a user in the domain.

  • Enable Google Workspace smart features in other Google products to connect Google Drive data to Vertex AI Search. For information, see Turn Google Workspace smart features on or off.

Limitation

Semantic and natural-language queries might not work if the Google Drive data store is the only data store connected to the app. Keyword search is not affected by this limitation.

Create a Google Drive data store

Console

To use the console to make Google Drive data searchable, follow these steps:

Note: When using data federation, the Google Drive connector only searches documents that are owned by the domain of the user who performs the search. Before setting up this connector, verify that all documents are accessible, either by placing them in a shared drive that is owned by this domain or by setting ownership to a user in this domain.
  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. In the navigation menu, click Data Stores.

  3. Click Create Data Store.

  4. On the Select a data source page, select Google Drive.

  5. Specify the drive source for your data store.

    • All: To add your entire drive to the data store.
    • Specific shared drive(s): Add the shared drive's folder ID.
    • Specific shared folder(s): Add the IDs of the shared folders.

    To locate the shared drive's folder ID or a specific folder ID, navigate to the shared drive or folder and copy the ID from the URL. The URL follows this format: https://drive.google.com/corp/drive/folders/ID. A small parsing sketch follows this procedure.

    For example, https://drive.google.com/corp/drive/folders/123456789012345678901.

  6. Click Continue.

  7. Choose a region for your data store.

  8. Enter a name for your data store.

  9. Optional: To exclude the data in this data store from being used for generative AI content when you query data using the app, click Generative AI options and select Exclude from generative AI features.

  10. Click Create.
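If you script this step, the folder ID is simply the last path segment of the URL. The following minimal Python sketch shows one way to extract it; the URL value and function name are illustrative only.

from urllib.parse import urlparse


def drive_folder_id(url: str) -> str:
    """Return the last path segment of a Drive folder URL, that is, the ID after /folders/."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


print(drive_folder_id("https://drive.google.com/corp/drive/folders/123456789012345678901"))
# -> 123456789012345678901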

Error messages

The following table describes error messages that you might encounter when working with this Google data source, and includes HTTP error codes and suggested troubleshooting steps.

Error code | Error message | Description | Troubleshooting
403 (Permission Denied) | Searching using service account credentials isn't supported for Google Workspace data stores. | The engine being searched has Google Workspace data stores, and the credentials passed are of a service account. Searching using service account credentials on Google Workspace data stores isn't supported. | Call search using user credentials, or remove Google Workspace data stores from the engine.
403 (Permission Denied) | Consumer accounts aren't supported for Google Workspace data stores. | Search is called using a consumer account (@gmail.com) credential, which isn't supported for Google Workspace data stores. | Remove Google Workspace data stores from the engine or use a managed Google Account.
403 (Permission Denied) | Customer id mismatch for datastore | Search is only allowed for users who belong to the same organization as the Google Workspace data stores. | Remove Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores are meant to be in different organizations.
403 (Permission Denied) | Workspace access for Agentspace disabled by organization admin. | A Google Workspace administrator has disabled access to Google Workspace data for Vertex AI Search. | Contact your Google Workspace administrator to enable access.
400 (Invalid Argument) | Engine cannot contain both default and shared Google Drive data stores. | You cannot connect a data store that has all your drives (default) and a data store that has specific shared drives to the same app. | To connect a new Google Drive data source to your app, first unlink the unneeded data store, then add the new data store that you want to use.

Troubleshooting

If your search doesn't return the file you're looking for, it might be due to one of the following search index limitations of Google Drive:

Next steps

Connect to Gmail

Note: You must enable Google Workspace smart features in other Google products to connect Gmail data to Vertex AI Search. For information, see Turn Google Workspace smart features on or off.

Use the following steps to create a data store that connects to Gmail in the Google Cloud console. After connecting the data store, you can attach the data store to your search app and search over your Gmail data.

Before you begin

  • You must be signed into the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Gmail.

Limitations

Create a Gmail data store

Console

To use the console to make Gmail data searchable, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. In the navigation menu, click Data Stores.

  3. Click Create Data Store.

  4. On the Select a data source page, select Google Gmail.

  5. Choose a region for your data store.

  6. Enter a name for your data store.

  7. Click Create.

  8. Follow the steps in Create a search app and attach the created data store to a Vertex AI Search app. A hedged API sketch of this step follows the procedure.
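The console flow in Create a search app is the documented way to attach the data store to an app. For readers who prefer the API, the following is a minimal, hedged Python sketch assuming the EngineServiceClient of the google-cloud-discoveryengine client library; the project, location, IDs, display name, and search tier are placeholders, not values from this guide.

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# Placeholder values; replace with your own project, location, and IDs.
project_id = "YOUR_PROJECT_ID"
location = "global"
data_store_id = "YOUR_GMAIL_DATA_STORE_ID"
engine_id = "YOUR_APP_ID"

client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)
client = discoveryengine.EngineServiceClient(client_options=client_options)

engine = discoveryengine.Engine(
    display_name="My search app",
    industry_vertical=discoveryengine.IndustryVertical.GENERIC,
    solution_type=discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH,
    # Attach the data store created in the procedure above, by ID.
    data_store_ids=[data_store_id],
    search_engine_config=discoveryengine.Engine.SearchEngineConfig(
        search_tier=discoveryengine.SearchTier.SEARCH_TIER_ENTERPRISE,
    ),
)

# Create the app (engine) under the default collection and wait for the operation.
operation = client.create_engine(
    request=discoveryengine.CreateEngineRequest(
        parent=f"projects/{project_id}/locations/{location}/collections/default_collection",
        engine=engine,
        engine_id=engine_id,
    )
)
print(operation.result())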

Error messages

The following table describes error messages that you might encounter when working with this Google data source, and includes HTTP error codes and suggested troubleshooting steps.

Error code | Error message | Description | Troubleshooting
403 (Permission Denied) | Searching using service account credentials isn't supported for Google Workspace data stores. | The engine being searched has Google Workspace data stores, and the credentials passed are of a service account. Searching using service account credentials on Google Workspace data stores isn't supported. | Call search using user credentials, or remove Google Workspace data stores from the engine.
403 (Permission Denied) | Consumer accounts aren't supported for Google Workspace data stores. | Search is called using a consumer account (@gmail.com) credential, which isn't supported for Google Workspace data stores. | Remove Google Workspace data stores from the engine or use a managed Google Account.
403 (Permission Denied) | Customer id mismatch for datastore | Search is only allowed for users who belong to the same organization as the Google Workspace data stores. | Remove Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores are meant to be in different organizations.
403 (Permission Denied) | Workspace access for Agentspace disabled by organization admin. | A Google Workspace administrator has disabled access to Google Workspace data for Vertex AI Search. | Contact your Google Workspace administrator to enable access.
400 (Invalid Argument) | Engine cannot contain both default and shared Google Drive data stores. | You cannot connect a data store that has all your drives (default) and a data store that has specific shared drives to the same app. | To connect a new Google Drive data source to your app, first unlink the unneeded data store, then add the new data store that you want to use.

Next steps

Connect to Google Sites

Note: This feature is a Preview offering, subject to the "Pre-GA Offerings Terms" of the GCP Service Specific Terms. Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the launch stage descriptions. Further, by using this feature, you agree to the Generative AI Preview terms and conditions ("Preview Terms"). For this feature, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).

To search data from Google Sites, use the following steps to create a connector using the Google Cloud console.

Before you begin:

  • You must be signed into the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Sites.

  • To enforce data source access control and secure data in Vertex AI Search, ensure that you have configured your identity provider.

Console

To use the console to make Google Sites data searchable, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Source page, select Google Sites.

  5. Choose a region for your data store.

  6. Enter a name for your data store.

  7. Click Create.

Next steps

Connect to Google Calendar

Note: This feature is a Preview offering, subject to the "Pre-GA Offerings Terms" of the GCP Service Specific Terms. Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the launch stage descriptions. Further, by using this feature, you agree to the Generative AI Preview terms and conditions ("Preview Terms"). For this feature, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).

Note: You must enable Google Workspace smart features in other Google products to connect Google Calendar data to Vertex AI Search. For information, see Turn Google Workspace smart features on or off.

To search data from Google Calendar, use the following steps to create a data store using the Google Cloud console.

Before you begin

  • You must be signed into the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Calendar.

Create a Google Calendar data store

To use the console to make Google Calendar data searchable, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. In the navigation menu, click Data Stores.

  3. Click Create Data Store.

  4. On the Select a data source page, select Google Calendar.

  5. Choose a region for your data store.

  6. Enter a name for your data store.

  7. Click Create.

Error messages

The following table describes error messages that you might encounter when working with this Google data source, and includes HTTP error codes and suggested troubleshooting steps.

Error code | Error message | Description | Troubleshooting
403 (Permission Denied) | Searching using service account credentials isn't supported for Google Workspace data stores. | The engine being searched has Google Workspace data stores, and the credentials passed are of a service account. Searching using service account credentials on Google Workspace data stores isn't supported. | Call search using user credentials, or remove Google Workspace data stores from the engine.
403 (Permission Denied) | Consumer accounts aren't supported for Google Workspace data stores. | Search is called using a consumer account (@gmail.com) credential, which isn't supported for Google Workspace data stores. | Remove Google Workspace data stores from the engine or use a managed Google Account.
403 (Permission Denied) | Customer id mismatch for datastore | Search is only allowed for users who belong to the same organization as the Google Workspace data stores. | Remove Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores are meant to be in different organizations.
403 (Permission Denied) | Workspace access for Agentspace disabled by organization admin. | A Google Workspace administrator has disabled access to Google Workspace data for Vertex AI Search. | Contact your Google Workspace administrator to enable access.
400 (Invalid Argument) | Engine cannot contain both default and shared Google Drive data stores. | You cannot connect a data store that has all your drives (default) and a data store that has specific shared drives to the same app. | To connect a new Google Drive data source to your app, first unlink the unneeded data store, then add the new data store that you want to use.

Next steps

Connect to Google Groups

Note: This feature is a Preview offering, subject to the "Pre-GA Offerings Terms" of the GCP Service Specific Terms. Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the launch stage descriptions. Further, by using this feature, you agree to the Generative AI Preview terms and conditions ("Preview Terms"). For this feature, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).

To search data from Google Groups, use the following steps to create a connector using the Google Cloud console.

Before you begin:

  • You must be signed into the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Groups.

  • To enforce data source access control and secure data in Vertex AI Search, ensure that you have configured your identity provider.

Console

To use the console to make Google Groups data searchable, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Source page, select Google Groups.

  5. Choose a region for your data store.

  6. Enter a name for your data store.

  7. Click Create. Depending on the size of your data, ingestion can take several minutes to several hours. Wait at least an hour before using your data store for searching.

Next steps

Import from Cloud SQL

To ingest data from Cloud SQL, use the following steps to set up Cloud SQL access, create a data store, and ingest data.

Set up staging bucket access for Cloud SQL instances

When ingesting data from Cloud SQL, data is first staged to a Cloud Storage bucket. Follow these steps to give a Cloud SQL instance access to Cloud Storage buckets; a programmatic sketch of the same kind of grant follows the procedure.

  1. In the Google Cloud console, go to the SQL page.

    SQL

  2. Click the Cloud SQL instance that you plan to import from.

  3. Copy the identifier for the instance's service account, which looks like an email address—for example, p9876-abcd33f@gcp-sa-cloud-sql.iam.gserviceaccount.com.

  4. Go to the IAM & Admin page.

    IAM & Admin

  5. Click Grant access.

  6. For New principals, enter the instance's service account identifier and select the Cloud Storage > Storage Admin role.

  7. Click Save.
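The console steps above grant the role at the project level through IAM & Admin. If you prefer to scope the grant to the staging bucket only, the following hedged Python sketch uses the google-cloud-storage client; the bucket name and service account identifier are placeholders, and bucket-level scoping is an assumption rather than the documented procedure.

from google.cloud import storage

# Placeholder values; replace with your staging bucket and the service account
# identifier copied from the Cloud SQL instance page.
bucket_name = "YOUR_STAGING_BUCKET"
sql_service_account = "p9876-abcd33f@gcp-sa-cloud-sql.iam.gserviceaccount.com"

client = storage.Client()
bucket = client.bucket(bucket_name)

# Read the current bucket policy (version 3 is required for conditional bindings).
policy = bucket.get_iam_policy(requested_policy_version=3)

# Add a binding that grants the Cloud SQL instance's service account access to the bucket.
policy.bindings.append(
    {
        "role": "roles/storage.admin",
        "members": {f"serviceAccount:{sql_service_account}"},
    }
)
bucket.set_iam_policy(policy)
print(f"Granted roles/storage.admin on {bucket_name} to {sql_service_account}")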

Next:

Set up Cloud SQL access from a different project

To give Vertex AI Search access to Cloud SQL data that's in a different project, follow these steps:

  1. Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of the code block. This is your Vertex AI Search service account identifier:

    service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
  2. Go to the IAM & Admin page.

    IAM & Admin

  3. Switch to your Cloud SQL project on the IAM & Admin page and click Grant Access.

  4. For New principals, enter the identifier for the service account and select the Cloud SQL > Cloud SQL Viewer role.

  5. Click Save.

Next, go to Import data from Cloud SQL.

Import data from Cloud SQL

Console

To use the console to ingest data from Cloud SQL, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Source page, select Cloud SQL.

  5. Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.

  6. Click Browse and choose an intermediate Cloud Storage location to export data to, and then click Select. Alternatively, enter the location directly in the gs:// field.

  7. Select whether to turn on serverless export. Serverless export incurs additional cost. For information about serverless export, see Minimize the performance impact of exports in the Cloud SQL documentation.

  8. Click Continue.

  9. Choose a region for your data store.

  10. Enter a name for your data store.

  11. Click Create.

  12. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Cloud SQL, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
    }'

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
    Note: The industry vertical GENERIC is used to create structured, unstructured, and website data stores for custom search apps.
  2. Import data from Cloud SQL.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
    -d '{
      "cloudSqlSource": {
        "projectId": "SQL_PROJECT_ID",
        "instanceId": "INSTANCE_ID",
        "databaseId": "DATABASE_ID",
        "tableId": "TABLE_ID",
        "gcsStagingDir": "STAGING_DIRECTORY"
      },
      "reconciliationMode": "RECONCILIATION_MODE",
      "autoGenerateIds": "AUTO_GENERATE_IDS",
      "idField": "ID_FIELD"
    }'

    Replace the following:

    • PROJECT_ID: the ID of your Vertex AI Search project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • SQL_PROJECT_ID: the ID of your Cloud SQL project.
    • INSTANCE_ID: the ID of your Cloud SQL instance.
    • DATABASE_ID: the ID of your Cloud SQL database.
    • TABLE_ID: the ID of your Cloud SQL table.
    • STAGING_DIRECTORY: optional. A Cloud Storage directory—for example, gs://<your-gcs-bucket>/directory/import_errors.
    • RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud SQL to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud SQL are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION"  # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# sql_project_id = "YOUR_SQL_PROJECT_ID"
# sql_instance_id = "YOUR_SQL_INSTANCE_ID"
# sql_database_id = "YOUR_SQL_DATABASE_ID"
# sql_table_id = "YOUR_SQL_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    cloud_sql_source=discoveryengine.CloudSqlSource(
        project_id=sql_project_id,
        instance_id=sql_instance_id,
        database_id=sql_database_id,
        table_id=sql_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Next steps

Import from Spanner

Note: Importing data from Spanner is in Public preview.

To ingest data from Spanner, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Set up Spanner access from a different project

If your Spanner data is in the same project as Vertex AI Search, skip to Import data from Spanner.

To give Vertex AI Search access to Spanner data that is in a different project, follow these steps:

  1. Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier:

    service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
  2. Go to the IAM & Admin page.

    IAM & Admin

  3. Switch to your Spanner project on the IAM & Admin page and click Grant Access.

  4. For New principals, enter the identifier for the service account and select one of the following:

    • If you won't use Data Boost during import, select the Cloud Spanner > Cloud Spanner Database Reader role.
    • If you plan to use Data Boost during import, select the Cloud Spanner > Cloud Spanner Database Admin role, or a custom role with the permissions of Cloud Spanner Database Reader and spanner.databases.useDataBoost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
  5. Click Save.

Next, go to Import data from Spanner.

Import data from Spanner

Console

To use the console to ingest data from Spanner, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Source page, select Cloud Spanner.

  5. Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.

  6. Select whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.

  7. Click Continue.

  8. Choose a region for your data store.

  9. Enter a name for your data store.

  10. Click Create.

  11. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Spanner, follow these steps:

  1. Create a data store.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Goog-User-Project: PROJECT_ID" \
      "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
      -d '{
        "displayName": "DISPLAY_NAME",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "contentConfig": "CONTENT_REQUIRED"
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Vertex AI Search project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
    Note: The industry vertical GENERIC is used to create structured, unstructured, and website data stores for custom search apps.
  2. Import data from Spanner.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "cloudSpannerSource": {
          "projectId": "SPANNER_PROJECT_ID",
          "instanceId": "INSTANCE_ID",
          "databaseId": "DATABASE_ID",
          "tableId": "TABLE_ID",
          "enableDataBoost": "DATA_BOOST_BOOLEAN"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Vertex AI Search project.
    • DATA_STORE_ID: the ID of the data store.
    • SPANNER_PROJECT_ID: the ID of your Spanner project.
    • INSTANCE_ID: the ID of your Spanner instance.
    • DATABASE_ID: the ID of your Spanner database.
    • TABLE_ID: the ID of your Spanner table.
    • DATA_BOOST_BOOLEAN: optional. Whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
    • RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Spanner to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Spanner are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
    • AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.

    • ID_FIELD: optional. Specifies which fields are the document IDs.
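
The documents:import call returns a long-running operation. If you want to track the import from the command line rather than the console, you can poll that operation until it finishes. The sketch below assumes OPERATION_NAME is the name field returned in the import response (for example, projects/.../operations/import-documents-...).

    # Poll the long-running import operation returned by documents:import.
    # OPERATION_NAME is the "name" value from the import response.
    curl -X GET \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://discoveryengine.googleapis.com/v1/OPERATION_NAME"

When the operation is finished, the response includes "done": true along with the operation metadata.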

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# spanner_project_id = "YOUR_SPANNER_PROJECT_ID"
# spanner_instance_id = "YOUR_SPANNER_INSTANCE_ID"
# spanner_database_id = "YOUR_SPANNER_DATABASE_ID"
# spanner_table_id = "YOUR_SPANNER_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    spanner_source=discoveryengine.SpannerSource(
        project_id=spanner_project_id,
        instance_id=spanner_instance_id,
        database_id=spanner_database_id,
        table_id=spanner_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Next steps

Import from Firestore

To ingest data from Firestore, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

If your Firestore data is in the same project as Vertex AI Search, go to Import data from Firestore.

If your Firestore data is in a different project than your Vertex AI Search project, go to Set up Firestore access.

Set up Firestore access from a different project

To give Vertex AI Search access to Firestore data that's in a different project, follow these steps:

  1. Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier:

    service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
  2. Go to the IAM & Admin page.

    IAM & Admin

  3. Switch to your Firestore project on the IAM & Admin page and click Grant Access.

  4. For New principals, enter the Vertex AI Search service account identifier and select the Datastore > Cloud Datastore Import Export Admin role.

  5. Click Save.

  6. Switch back to your Vertex AI Search project.
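
As with the other sources on this page, you can grant this role from the command line instead of the console. A minimal gcloud sketch, using placeholder project and service account values:

    # Grant the Vertex AI Search service account permission to export
    # Firestore data from the Firestore project (placeholder values shown).
    gcloud projects add-iam-policy-binding FIRESTORE_PROJECT_ID \
      --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com" \
      --role="roles/datastore.importExportAdmin"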

Next, go to Import data from Firestore.

Import data from Firestore

Console

To use the console to ingest data from Firestore, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Source page, select Firestore.

  5. Specify the project ID, database ID, and collection ID of the data that you plan to import.

  6. Click Continue.

  7. Choose a region for your data store.

  8. Enter a name for your data store.

  9. Click Create.

  10. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Firestore, follow these steps:

  1. Create a data store.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Goog-User-Project: PROJECT_ID" \
      "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
      -d '{
        "displayName": "DISPLAY_NAME",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
      }'

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
    Note: The industry vertical GENERIC is used to create structured, unstructured, and website data stores for custom search apps.
  2. Import data from Firestore.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "firestoreSource": {
          "projectId": "FIRESTORE_PROJECT_ID",
          "databaseId": "DATABASE_ID",
          "collectionId": "COLLECTION_ID"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Vertex AI Search project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • FIRESTORE_PROJECT_ID: the ID of your Firestore project.
    • DATABASE_ID: the ID of your Firestore database.
    • COLLECTION_ID: the ID of your Firestore collection.
    • RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Firestore to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Firestore are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
    • AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
    • ID_FIELD: optional. Specifies which fields are the document IDs.

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# firestore_project_id = "YOUR_FIRESTORE_PROJECT_ID"
# firestore_database_id = "YOUR_FIRESTORE_DATABASE_ID"
# firestore_collection_id = "YOUR_FIRESTORE_COLLECTION_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    firestore_source=discoveryengine.FirestoreSource(
        project_id=firestore_project_id,
        database_id=firestore_database_id,
        collection_id=firestore_collection_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Next steps

Import from Bigtable

Note: Importing data from Bigtable is in Public preview.

To ingest data from Bigtable, use the following steps to create a data store and ingest data using the API.

Set up Bigtable access

To give Vertex AI Search access to Bigtable data that's in a different project, follow these steps:

  1. Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier:

    service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
  2. Go to the IAM & Admin page.

    IAM & Admin

  3. Switch to your Bigtable project on the IAM & Admin page and click Grant Access.

  4. For New principals, enter the Vertex AI Search service account identifier and select the Bigtable > Bigtable Reader role.

  5. Click Save.

  6. Switch back to your Vertex AI Search project.
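
The same grant can also be made from the command line. A gcloud sketch with placeholder values:

    # Grant the Vertex AI Search service account read access to the
    # Bigtable project (placeholder values shown).
    gcloud projects add-iam-policy-binding BIGTABLE_PROJECT_ID \
      --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com" \
      --role="roles/bigtable.reader"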

Next, go to Import data from Bigtable.

Import data from Bigtable

REST

To use the command line to create a data store and ingest data from Bigtable, follow these steps:

  1. Create a data store.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Goog-User-Project: PROJECT_ID" \
      "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
      -d '{
        "displayName": "DISPLAY_NAME",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
      }'

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
    Note: The industry vertical GENERIC is used to create structured, unstructured, and website data stores for custom search apps.
  2. Import data from Bigtable.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "bigtableSource": {
          "projectId": "BIGTABLE_PROJECT_ID",
          "instanceId": "INSTANCE_ID",
          "tableId": "TABLE_ID",
          "bigtableOptions": {
            "keyFieldName": "KEY_FIELD_NAME",
            "families": {
              "key": "KEY",
              "value": {
                "fieldName": "FIELD_NAME",
                "encoding": "ENCODING",
                "type": "COLUMN_TYPE",
                "columns": [
                  {
                    "qualifier": "QUALIFIER",
                    "fieldName": "FIELD_NAME",
                    "encoding": "COLUMN_ENCODING",
                    "type": "COLUMN_VALUES_TYPE"
                  }
                ]
              }
              ...
            }
          }
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Vertex AI Search project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • BIGTABLE_PROJECT_ID: the ID of your Bigtable project.
    • INSTANCE_ID: the ID of your Bigtable instance.
    • TABLE_ID: the ID of your Bigtable table.
    • KEY_FIELD_NAME: optional but recommended. The field name to use for the row key value after ingesting to Vertex AI Search.
    • KEY: required. A string value for the column family key.
    • ENCODING: optional. The encoding mode of the values when the type is not STRING. This can be overridden for a specific column by listing that column in columns and specifying an encoding for it.
    • COLUMN_TYPE: optional. The type of values in this column family.
    • QUALIFIER: required. Qualifier of the column.
    • FIELD_NAME: optional but recommended. The field name to use for this column after ingesting to Vertex AI Search.
    • COLUMN_ENCODING: optional. The encoding mode of the values for a specific column when the type is not STRING.
    • RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Bigtable to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Bigtable are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
    • AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.

      Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.

    • ID_FIELD: optional. Specifies which fields are the document IDs.

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigtable_project_id = "YOUR_BIGTABLE_PROJECT_ID"
# bigtable_instance_id = "YOUR_BIGTABLE_INSTANCE_ID"
# bigtable_table_id = "YOUR_BIGTABLE_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

bigtable_options = discoveryengine.BigtableOptions(
    families={
        "family_name_1": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.STRING,
            encoding=discoveryengine.BigtableOptions.Encoding.TEXT,
            columns=[
                discoveryengine.BigtableOptions.BigtableColumn(
                    qualifier="qualifier_1".encode("utf-8"),
                    field_name="field_name_1",
                ),
            ],
        ),
        "family_name_2": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.INTEGER,
            encoding=discoveryengine.BigtableOptions.Encoding.BINARY,
        ),
    }
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigtable_source=discoveryengine.BigtableSource(
        project_id=bigtable_project_id,
        instance_id=bigtable_instance_id,
        table_id=bigtable_table_id,
        bigtable_options=bigtable_options,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Next steps

Import from AlloyDB for PostgreSQL

Note: Importing data from AlloyDB for PostgreSQL is in Public preview.

To ingest data from AlloyDB for PostgreSQL, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

If your AlloyDB for PostgreSQL data is in the same project as Vertex AI Search, go to Import data from AlloyDB for PostgreSQL.

If your AlloyDB for PostgreSQL data is in a different project than your Vertex AI Search project, go to Set up AlloyDB for PostgreSQL access.

Set up AlloyDB for PostgreSQL access from a different project

To give Vertex AI Search access to AlloyDB for PostgreSQL data that's in a different project, follow these steps:

  1. Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier:

    service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
  2. Switch to the Google Cloud project where your AlloyDB for PostgreSQL data resides.

  3. Go to the IAM page.

    IAM

  4. Click Grant Access.

  5. For New principals, enter the Vertex AI Search service account identifier and select the Cloud AlloyDB > Cloud AlloyDB Admin role.

  6. Click Save.

  7. Switch back to your Vertex AI Search project.
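
If you'd rather grant the role from the command line, a gcloud sketch with placeholder values:

    # Grant the Vertex AI Search service account access to the
    # AlloyDB for PostgreSQL project (placeholder values shown).
    gcloud projects add-iam-policy-binding ALLOYDB_PROJECT_ID \
      --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com" \
      --role="roles/alloydb.admin"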

Next, go to Import data from AlloyDB for PostgreSQL.

Import data from AlloyDB for PostgreSQL

Console

To use the console to ingest data from AlloyDB for PostgreSQL, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. In the navigation menu, click Data Stores.

  3. Click Create data store.

  4. On the Source page, select AlloyDB.

  5. Specify the project ID, location ID, cluster ID, database ID, and table ID of the data that you plan to import.

  6. Click Continue.

  7. Choose a region for your data store.

  8. Enter a name for your data store.

  9. Click Create.

  10. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from AlloyDB for PostgreSQL, follow these steps:

  1. Create a data store.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Goog-User-Project: PROJECT_ID" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
      -d '{
        "displayName": "DISPLAY_NAME",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
      }'

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
    Note: The industry vertical GENERIC is used to create structured, unstructured, and website data stores for custom search apps.
  2. Import data from AlloyDB for PostgreSQL.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "alloydbSource": {
          "projectId": "ALLOYDB_PROJECT_ID",
          "locationId": "LOCATION_ID",
          "clusterId": "CLUSTER_ID",
          "databaseId": "DATABASE_ID",
          "tableId": "TABLE_ID"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Vertex AI Search project.
    • DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
    • ALLOYDB_PROJECT_ID: the ID of your AlloyDB for PostgreSQL project.
    • LOCATION_ID: the ID of your AlloyDB for PostgreSQL location.
    • CLUSTER_ID: the ID of your AlloyDB for PostgreSQL cluster.
    • DATABASE_ID: the ID of your AlloyDB for PostgreSQL database.
    • TABLE_ID: the ID of your AlloyDB for PostgreSQL table.
    • RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from AlloyDB for PostgreSQL to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in AlloyDB for PostgreSQL are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
    • AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
    • ID_FIELD: optional. Specifies which fields are the document IDs.
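
Once the import operation finishes, you can spot-check the result from the command line by listing a few documents in the data store branch. This sketch reuses the PROJECT_ID and DATA_STORE_ID placeholders from the preceding commands:

    # List a handful of imported documents to confirm the import worked.
    curl -X GET \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?pageSize=5"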

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# alloy_db_project_id = "YOUR_ALLOY_DB_PROJECT_ID"
# alloy_db_location_id = "YOUR_ALLOY_DB_LOCATION_ID"
# alloy_db_cluster_id = "YOUR_ALLOY_DB_CLUSTER_ID"
# alloy_db_database_id = "YOUR_ALLOY_DB_DATABASE_ID"
# alloy_db_table_id = "YOUR_ALLOY_DB_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    alloy_db_source=discoveryengine.AlloyDbSource(
        project_id=alloy_db_project_id,
        location_id=alloy_db_location_id,
        cluster_id=alloy_db_cluster_id,
        database_id=alloy_db_database_id,
        table_id=alloy_db_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Next steps

Upload structured JSON data with the API

To directly upload a JSON document or object using the API, follow these steps.

Before importing your data, see Prepare data for ingesting.

REST

To use the command line to create a data store and import structured JSON data,follow these steps.

Note: If you want to specify a schema instead of letting Vertex AI auto-detect the schema for you, do the steps in Provide your own schema as a JSON object and then begin the following procedure at step 2.
  1. Create a data store.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Goog-User-Project: PROJECT_ID" \
      "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
      -d '{
        "displayName": "DATA_STORE_DISPLAY_NAME",
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
    Note: The industry vertical GENERIC is used to create structured, unstructured, and website data stores for custom search apps.
  2. Import structured data.

    There are a few approaches that you can use to upload data, including:

    • Upload a JSON document.

      curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
        -d '{
          "jsonData": "JSON_DOCUMENT_STRING"
        }'

      Replace the following:

      • DOCUMENT_ID: a unique ID for the document. This ID can be up to 63 characters long and contain only lowercase letters, digits, underscores, and hyphens.
      • JSON_DOCUMENT_STRING: the JSON document as a single string. This must conform to the JSON schema that you provided in the previous step, for example:

        { \"title\": \"test title\", \"categories\": [\"cat_1\", \"cat_2\"], \"uri\": \"test uri\"}
    • Upload a JSON object.

      curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
        -d '{
          "structData": JSON_DOCUMENT_OBJECT
        }'

      Replace JSON_DOCUMENT_OBJECT with the JSON document as a JSON object. This must conform to the JSON schema that you provided in the previous step, for example:

      {"title": "test title", "categories": ["cat_1", "cat_2"], "uri": "test uri"}
    • Update with a JSON document.

      curl -X PATCH \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
        -d '{
          "jsonData": "JSON_DOCUMENT_STRING"
        }'
    • Update with a JSON object.

      curl -X PATCH \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
        -d '{
          "structData": JSON_DOCUMENT_OBJECT
        }'
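
After an upload or update, you can confirm that the document is in the data store by reading it back with the documents resource. This sketch reuses the PROJECT_ID, DATA_STORE_ID, and DOCUMENT_ID placeholders from the preceding commands:

      # Fetch the document you just uploaded to confirm it was stored.
      curl -X GET \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID"

The response includes the stored jsonData or structData for the document.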

Next steps

Troubleshoot data ingestion

If you are having problems with data ingestion, review these tips:

  • If you're using customer-managed encryption keys and data import fails (with the error message The caller does not have permission), then make sure that the CryptoKey Encrypter/Decrypter IAM role (roles/cloudkms.cryptoKeyEncrypterDecrypter) on the key has been granted to the Cloud Storage service agent. For more information, see Before you begin in "Customer-managed encryption keys". A gcloud sketch for granting this role follows this list.

  • If you are using advanced website indexing and the Document usage for the data store is much lower than you expect, then review the URL patterns that you specified for indexing and make sure that the URL patterns specified cover the pages that you want to index, and expand them if needed. For example, if you used *.en.example.com/*, you might need to add *.example.com/* to the sites you want indexed.
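
The following gcloud sketch grants the CryptoKey Encrypter/Decrypter role on a key to the Cloud Storage service agent. The key, keyring, location, and project number values are placeholders that you replace with your own:

    # Grant the Cloud Storage service agent permission to use the CMEK key
    # (placeholder values shown).
    gcloud kms keys add-iam-policy-binding KMS_KEY \
      --keyring=KMS_KEYRING \
      --location=KMS_LOCATION \
      --member="serviceAccount:service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com" \
      --role="roles/cloudkms.cryptoKeyEncrypterDecrypter"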

Create a data store using Terraform

You can use Terraform to create an empty data store. After the empty data store is created, you can ingest data into the data store using the Google Cloud console or API commands.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

To create an empty data store using Terraform, see google_discovery_engine_data_store.

Connect a third-party data source

Connecting third-party data sources to Vertex AI Search is no longer supported.

For instructions, see Connect a third-party data source in the Gemini Enterprise documentation.
