About apps and data stores

This page describes Vertex AI Search apps and data stores.

With Vertex AI Search, you create a search or recommendations app and connect it to a data store. A Google Cloud project can contain multiple apps.

Relationship between apps and data stores

The relationship between apps and data stores depends on the type of app:

  • Custom search apps have a many-to-many relationship with data stores. When multiple data stores are connected to a single custom search app, this is referred to as blended search. For information about limitations of connecting a search app to more than one data store, see About blended search.

  • A custom recommendations app has a one-to-one connection with its data store.

  • A media app has a many-to-one relationship with its data store. An app can only connect to one data store, whereas a given data store can be connected to several apps. For example, a media search app and a media recommendations app can share a data store.

  • A healthcare search app has a many-to-one relationship with its data store. An app can only connect to one data store, whereas a given data store can be connected to several apps. For example, a patient-facing app and a provider-facing app can connect to the same data store.

    For a batch data import of healthcare data, data is imported into a data store that's within an app. For streaming data import (Preview) of healthcare data, data is imported into an entity, which is a type of data store that's within a data connector. A data connector is also a type of data store that's within an app.

After a data store is connected to an app, it can't be disconnected.
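The relationship rules above can be sketched as a small in-memory model. This is an illustration only: the class, the method, and the app-type labels are hypothetical and are not part of the Vertex AI Search API. It also ignores the blended search restrictions on adding data stores after app creation, and it treats a custom recommendations data store as exclusive to its app, which is one reading of "one-to-one".

```python
from dataclasses import dataclass, field

# Hypothetical labels for the app types described above (not API enum values).
MULTI_STORE_APPS = {"custom_search"}
EXCLUSIVE_STORE_APPS = {"custom_recommendations"}
KNOWN_APPS = MULTI_STORE_APPS | EXCLUSIVE_STORE_APPS | {"media", "healthcare_search"}


@dataclass
class ConnectionModel:
    app_types: dict = field(default_factory=dict)    # app name -> app type
    connections: dict = field(default_factory=dict)  # app name -> list of data stores

    def connect(self, app: str, app_type: str, data_store: str) -> None:
        """Record a connection, or raise ValueError if it breaks a rule."""
        if app_type not in KNOWN_APPS:
            raise ValueError(f"unknown app type: {app_type}")
        if self.app_types.get(app, app_type) != app_type:
            raise ValueError(f"{app} already exists with a different type")
        stores = self.connections.get(app, [])
        if data_store in stores:
            return  # already connected, nothing to do
        # Only custom search apps may connect to more than one data store.
        if stores and app_type not in MULTI_STORE_APPS:
            raise ValueError(f"a {app_type} app can connect to only one data store")
        # A custom recommendations app does not share its data store.
        for other, other_stores in self.connections.items():
            if other != app and data_store in other_stores:
                if EXCLUSIVE_STORE_APPS & {app_type, self.app_types[other]}:
                    raise ValueError("a custom recommendations data store can't be shared")
        self.app_types[app] = app_type
        self.connections[app] = stores + [data_store]
```

The model deliberately has no method for disconnecting a data store, which mirrors the statement that a connected data store can't be disconnected.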

Method of app creation and data ingestion

How you create an app and ingest data depends on the type of data you have:

  • For website data, you can use either the Google Cloud console or the API. To use a website data store created with the API, you must attach it to an app with Enterprise features enabled in the Google Cloud console.

  • For structured or unstructured data, you can use either the Google Cloud console or the API.

  • For healthcare data, you can use either the Google Cloud console or the API.

Documents

Each data store has one or more data records, called documents. What a document represents varies depending on the type of data in the data store:

  • Website. A document is a web page.

  • Structured data. A document is a row in a table or a JSON record that follows a particular schema. You can provide this schema yourself or you can let Vertex AI Search derive the schema from the ingested data.

  • Structured data for media. A document is a row in a table or a JSON record that follows a schema that is specific to media. The documents are records pertaining to media content, such as videos, news articles, music files, and podcasts. A document contains information that describes the media item, at minimum: title, URI to the content location, categories, duration, and available date.

  • Unstructured data. A document is a file in any of the following formats: TXT, PDF, HTML, DOCX, PPTX, XLSX, and XLSM.

  • Healthcare FHIR data. A document is a supported FHIR R4 resource. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference.
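As a rough illustration of the minimum information listed for media documents, the following sketch reports which of the five items are missing from a record before import. The key names (`title`, `uri`, `categories`, `duration`, `available_time`) are assumptions chosen for this example; check the media schema documentation for the actual field names.

```python
# Assumed key names for the five minimum media fields; verify against the
# media schema before relying on them.
REQUIRED_MEDIA_KEYS = ("title", "uri", "categories", "duration", "available_time")


def missing_media_fields(document: dict) -> list:
    """Return the required keys that are absent or empty in the document."""
    return [
        key for key in REQUIRED_MEDIA_KEYS
        if document.get(key) in (None, "", [])
    ]
```

For example, a record that has a title, URI, categories, and duration but no available date would be reported as missing `available_time`.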

Data stores and apps

In Vertex AI Search, there are various kinds of data stores. A data store can contain only one type of data.

Website data

A data store with website data uses data indexed from public websites. You can provide a set of URL patterns that you want to include in your data store. The web pages that fit the URL patterns are called included web pages. You can then set up search over data crawled from the included web pages.

For example, you can provide URL patterns such as example.com/faq/* and example.com/events/* and enable search over the data crawled from these web pages that fit the pattern. This data includes text, images tagged with metadata, and other structured data such as meta tags, PageMap attributes, and schema.org data.

You can also provide URL patterns for portions of websites that you want excluded, for example, example.com/events/members-only/* or example.com/events/past-*. Excluded URLs take priority over included ones.
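The precedence of excluded patterns over included ones can be sketched locally with shell-style wildcards. This is only an approximation for reasoning about pattern sets: the function name is hypothetical, and the matching that Vertex AI Search performs on URL patterns may differ from Python's `fnmatch` semantics.

```python
from fnmatch import fnmatchcase


def is_included(url: str, include_patterns: list, exclude_patterns: list) -> bool:
    """Return True if the URL fits an include pattern and no exclude pattern."""
    # Exclusions are checked first because they take priority.
    if any(fnmatchcase(url, pattern) for pattern in exclude_patterns):
        return False
    return any(fnmatchcase(url, pattern) for pattern in include_patterns)


include = ["example.com/faq/*", "example.com/events/*"]
exclude = ["example.com/events/members-only/*", "example.com/events/past-*"]
```

With these lists, `example.com/events/summer-fair` is included, while `example.com/events/past-2023` is excluded even though it also fits `example.com/events/*`.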

There are two types of website data stores:

  • Basic website search:

    • Provides search capabilities over the existing Google Search index for the included websites.
    • Doesn't require domain verification.
  • Advanced website indexing:

    • Provides advanced search capabilities over an index that's generated based on either of the following:
      • The Vertex AI Search app owners can control which web pages are indexed by submitting sitemaps and maintaining them. For more information, see Index and refresh web pages using sitemaps. This process keeps the index fresh without manual intervention.
      • The Vertex AI Search app owners can perform an initial indexing that mirrors the Google Search index and then expand the index's coverage by recrawling the websites whenever necessary, keeping it fresh. For more information, see Refresh web pages. The advanced capabilities of advanced website indexing are listed in Advanced website indexing.
    • Requires Vertex AI Search data store owners to verify the domains to which the included websites belong. For more information, see Verify website domains.
    • Provides the capability to add structured data to the data store schema. A website contains unstructured data, but you can add structured data in the form of meta tags, PageMap attributes, and schema.org data to your web pages. You can then use this structured data to edit the data store schema as explained in Use structured data for advanced website indexing.

Structured data

A data store with structured data enables semantic search or recommendations over structured data. You can import data from BigQuery or Cloud Storage. You can also manually upload structured JSON data through the API.

For example, you can enable search or recommendations over a product catalog for your ecommerce experience or a directory of doctors for provider search or recommendations.

Vertex AI Search auto-detects the schema from the data that you import. Optionally, you can provide a schema for your data. Providing a schema for your data typically improves the quality of results.
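To get a feel for what a schema derived from imported records might look like, the following sketch collects the JSON types observed for each top-level field. It is not the detection logic that Vertex AI Search uses; the function names and the sample records are invented for illustration. A field that shows up with more than one type is a good candidate for an explicitly provided schema.

```python
def json_type(value) -> str:
    """Map a Python value to a JSON type name."""
    if isinstance(value, bool):  # bool must be tested before int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def derive_schema(records: list) -> dict:
    """Return field name -> sorted list of JSON types seen across records."""
    seen = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(json_type(value))
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical provider-directory records.
records = [
    {"name": "Dr. A", "rating": 4.5, "specialties": ["cardiology"]},
    {"name": "Dr. B", "rating": 5, "accepting_patients": True},
]
```

Here `derive_schema(records)` reports both `integer` and `number` for `rating`, which is the kind of ambiguity an explicit schema removes.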

Structured data for media

Media apps can only be connected to media data stores. Media data stores are structured data stores with a Google-defined schema or with your own custom schema that contains a specific set of five media-related fields. For more information about the schema, see About media documents and data stores.

For example, you can enable recommendations by creating a media recommendations app for a movie catalog or a news site so that your users receive suitable, personalized suggestions.

In addition to media documents, media data stores also contain the user event information that allows Vertex AI Search to customize recommendations and search for your users. User events are required for media apps. For information about user events, see Record real-time user events.

Unstructured data

An unstructured data store enables semantic search over data such as documents and images.

Unstructured data stores support documents in TXT, PDF, HTML, DOCX, PPTX, XLSX, and XLSM formats.
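Before uploading files, it can be convenient to screen them against the format list above. The following is a minimal sketch that checks only the file extension, not the file contents; the function name and sample paths are hypothetical.

```python
from pathlib import PurePosixPath

# Extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".html", ".docx", ".pptx", ".xlsx", ".xlsm"}


def has_supported_extension(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so a path ending in `.PDF` passes the check, while a path ending in `.md` does not.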

Search provides results in the form of 10 URLs and summarized answers for natural language queries. Documents must be uploaded to a Cloud Storage bucket with appropriate access permissions. For example, a financial institution can enable search over their private corpus of financial research publications, or a biotech company can enable search or recommendations over their private repository of medical research.

Healthcare FHIR data

A healthcare search app uses FHIR R4 data imported from a Cloud Healthcare API FHIR store. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference. A FHIR R4 data store must satisfy some requirements before it can be used as a data source for a Vertex AI Search data store. For more information, see how to prepare healthcare FHIR data for ingestion.

About blended search

You can create a blended search app, where multiple data stores can be connected to a single custom search app. This feature lets you use one app to search across multiple sources and types of data.

To make a blended search app, select multiple data stores when creating a new custom search app. If you don't select multiple data stores during creation, then you can't add additional data stores later.

When getting search results, you can either search across all data stores, or filter for results from a single data store.
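One way to picture the difference between searching all data stores and narrowing to one is the shape of the search request body. The sketch below only builds a dictionary and does not call any service. The field names `query`, `pageSize`, `dataStoreSpecs`, and `dataStore` appear in the field lists on this page; the resource name layout and the helper names are assumptions for illustration, so confirm them against the API reference.

```python
def data_store_name(project: str, location: str, data_store_id: str) -> str:
    """Build a data store resource name (assumed layout, verify before use)."""
    return (
        f"projects/{project}/locations/{location}"
        f"/collections/default_collection/dataStores/{data_store_id}"
    )


def build_search_body(query: str, data_store_names=None, page_size: int = 10) -> dict:
    """Build a request body; omit data_store_names to search every data store."""
    body = {"query": query, "pageSize": page_size}
    if data_store_names:
        # One spec per data store that should contribute results.
        body["dataStoreSpecs"] = [{"dataStore": name} for name in data_store_names]
    return body
```

Calling `build_search_body("refund policy")` leaves out `dataStoreSpecs` entirely, while passing a single resource name produces a body that targets just that data store.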

The following limitations apply:

  • Adding and removing data stores:
    • To turn on blended search for an app, you must connect at least two data stores to it during app creation.
    • You can add or remove data stores from a blended search app, but the app can't have fewer than two data stores connected to it at any time.
    • If you connect a single data store to a search app during app creation, then you can't add or remove that data store.
  • Website data stores need to have advanced website indexing turned on in order to be used for blended search. For more information, see Advanced website indexing.
  • Data stores that contain unstructured data imported using BigQuery are not supported.
  • Blended search allows the following fields in search requests:
    • boostSpec
    • contentSearchSpec
    • dataStoreSpecs
    • facetSpecs
    • filter
    • languageCode
    • offset
    • oneBoxPageSize
    • orderBy
    • query
    • pageSize
    • pageToken
    • relevanceScoreSpec
    • relevanceThreshold
    • session
    • sessionSpec
    • spellCorrectionSpec
    • userInfo
    • userPseudoId
  • Blended search allows the following fields in DataStoreSpec:
    • dataStore
    • boostSpec: If there are boost specs specified for both SearchRequest and dataStoreSpecs, both boost specs are applied to search results.
    • filter: If there are filters specified for both SearchRequest and dataStoreSpecs, both filters are applied to search results.
  • Create, Read, Update, and Delete (CRUD) operations on serving configs are supported for blended apps. Only the following fields can be added or updated in a serving config:
    • boostControlIds
    • displayName
    • filterControlIds
    • genericConfig:
      • contentSearchSpec
    • name
    • solutionType
    • synonymsControlIds
  • CRUD operations on the following controls are supported for blended search apps:
    • boostAction
    • synonymAction
    • filterAction
  • There is a limit of 50 data stores per search app.
  • If one data store uses a CMEK configuration, all other data stores must also use the same CMEK configuration.
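Several of the limitations above lend themselves to a local pre-flight check. The following sketch covers three of them: the allowed request fields, the data store count (at least 2 for blended search, at most 50), and the shared CMEK configuration. The function names and the dictionary shapes are hypothetical, and the service remains the authority on what it accepts.

```python
# Field lists copied from the limitations above.
ALLOWED_SEARCH_FIELDS = frozenset({
    "boostSpec", "contentSearchSpec", "dataStoreSpecs", "facetSpecs", "filter",
    "languageCode", "offset", "oneBoxPageSize", "orderBy", "query", "pageSize",
    "pageToken", "relevanceScoreSpec", "relevanceThreshold", "session",
    "sessionSpec", "spellCorrectionSpec", "userInfo", "userPseudoId",
})
ALLOWED_DATA_STORE_SPEC_FIELDS = frozenset({"dataStore", "boostSpec", "filter"})
MIN_BLENDED_DATA_STORES = 2
MAX_DATA_STORES = 50


def unsupported_fields(request: dict) -> list:
    """List request fields that blended search does not allow."""
    problems = sorted(set(request) - ALLOWED_SEARCH_FIELDS)
    for index, spec in enumerate(request.get("dataStoreSpecs", [])):
        extra = sorted(set(spec) - ALLOWED_DATA_STORE_SPEC_FIELDS)
        problems += [f"dataStoreSpecs[{index}].{key}" for key in extra]
    return problems


def check_data_stores(cmek_by_store: dict) -> list:
    """Check count and CMEK rules.

    cmek_by_store maps a data store name to its CMEK key name, or to None
    when the data store has no CMEK configuration (a hypothetical shape).
    """
    errors = []
    count = len(cmek_by_store)
    if not MIN_BLENDED_DATA_STORES <= count <= MAX_DATA_STORES:
        errors.append(f"expected 2 to 50 data stores, got {count}")
    # All stores must share one configuration: the same key, or none at all.
    if len(set(cmek_by_store.values())) > 1:
        errors.append("data stores use different CMEK configurations")
    return errors
```

An empty list from either function means that no problem was detected by these particular checks; it does not cover the other limitations, such as the advanced website indexing requirement.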

Last updated 2026-02-19 UTC.