Provide or auto-detect a schema

When you import structured data using the Google Cloud console, Vertex AI Search auto-detects the schema. You can either use this auto-detected schema in your engine or use the API to provide a schema that indicates the structure of the data.

If you provide a schema and later update it with a new schema, the new schema must be backward compatible with the original. Otherwise, the schema update fails.

Important: If you don't provide a schema, the auto-detect feature can update your schema by incorporating any newly detected fields when you import new data. If any documents in your imported data contain new fields that are not backward compatible with your original schema, those documents fail to import.

For reference information about the schema, see dataStores.schemas.

Key Terms: In the context of schema, the terms field and property are used interchangeably.

Approaches to providing the schema for your data store

There are various approaches to determining the schema for structured data.

  • Auto-detect and edit. Let Vertex AI Search auto-detect and suggest an initial schema. Then, you refine the schema through the console interface. Google highly recommends that, after your fields are auto-detected, you map key properties to all the important fields.

    This is the approach that you'll use when following the Google Cloud console instructions for structured data in Create a search data store and Create a custom recommendations data store.

  • Provide the schema as a JSON object. Provide the schema to Vertex AI Search as a JSON object. You need to have prepared a correct JSON object. For an example of a JSON object, see Example schema as a JSON object. After creating the schema, you upload your data according to that schema.

    This is the approach that you can use when creating a data store through the API using a curl command (or program). For example, see Import once from BigQuery. Also see the following instructions, Provide your own schema.

  • Media: Provide your data in the Google-defined schema. If you create a data store for media, you can choose to use the Google predefined schema. Choosing this option assumes that you have structured your JSON object in the format given in About media documents and data store. By default, auto-detect appends to the schema any new fields that it finds during data ingestion.

    This is the approach that you use when following the instructions in Create a media app and a data store. It is also the approach in the tutorials, Get started with media recommendations and Get started with media search, where the sample data is provided in the Google predefined schema.

  • Media: Auto-detect and edit, making sure to include the required media properties. For media data, you can use auto-detect to suggest the schema and edit to refine it. In your JSON object, you must include fields that can be mapped to the media key properties: title, uri, category, media_duration, and media_available_time.

    This is the approach that you'll use when importing media data through the Google Cloud console if the media data is not in the Google-defined schema.

  • Media: Provide your own schema as a JSON object. Provide the schema to Vertex AI Search as a JSON object. You need to have prepared a correct JSON object. The schema must include fields that can be mapped to the media key properties: title, uri, category, media_duration, and media_available_time.

    For an example of a JSON object, see Example schema as a JSON object. After creating the schema, you upload your media data according to that schema.

    For this approach, you use the API through a curl command (or program). See the following instructions, Provide your own schema.

About auto-detect and edit

When you begin importing data, Vertex AI Search samples the first few documents that are imported. Based on these documents, it proposes a schema for the data, which you can then review or edit.

If fields that you want to map to key properties aren't present in the sampled documents, then you can manually add these fields when you review the schema.

Tip: Make sure that the documents at the beginning of your dataset are of good quality, with all the key fields represented. That way, you don't have to add or edit fields later.

If Vertex AI Search encounters additional fields later in the data import, it still imports these fields and adds them to the schema. If you want to edit the schema after all the data has been imported, see Update your schema.
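The tip above can be checked locally before an import. The following is a minimal sketch, not part of Vertex AI Search: the helper name missing_from_sample and the sample_size parameter are hypothetical, and because the documentation only says "the first few documents", the sample size used here is an assumption.

```python
def missing_from_sample(documents, key_fields, sample_size=5):
    """Return the key fields that appear in none of the first `sample_size` documents.

    `sample_size` is an assumption: the service's real sample size is not documented here.
    """
    seen = set()
    for doc in documents[:sample_size]:
        seen.update(doc.keys())
    # Preserve the caller's ordering of key_fields in the result.
    return [field for field in key_fields if field not in seen]


docs = [
    {"title": "A", "uri": "https://example.com/a"},
    {"title": "B", "uri": "https://example.com/b", "description": "about B"},
    {"title": "C", "uri": "https://example.com/c", "category": "news"},
]

# With a sample of two documents, "category" never appears.
print(missing_from_sample(docs, ["title", "uri", "description", "category"], sample_size=2))
# prints ['category']
```

If the helper reports missing fields, you can either reorder the dataset so that complete documents come first or add the fields manually when you review the schema.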

Example schema as a JSON object

You can define your own schema using the JSON Schema format, which is an open source, declarative language to define, annotate, and validate JSON documents. For example, this is a valid JSON schema annotation:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "dynamic": "true",
  "datetime_detection": true,
  "geolocation_detection": true,
  "properties": {
    "title": {"type": "string", "keyPropertyMapping": "title", "retrievable": true, "completable": true},
    "description": {"type": "string", "keyPropertyMapping": "description"},
    "categories": {"type": "array", "items": {"type": "string", "keyPropertyMapping": "category"}},
    "uri": {"type": "string", "keyPropertyMapping": "uri"},
    "brand": {"type": "string", "indexable": true, "dynamicFacetable": true},
    "location": {"type": "geolocation", "indexable": true, "retrievable": true},
    "creationDate": {"type": "datetime", "indexable": true, "retrievable": true},
    "isCurrent": {"type": "boolean", "indexable": true, "retrievable": true},
    "runtime": {"type": "string", "keyPropertyMapping": "media_duration"},
    "releaseDate": {"type": "string", "keyPropertyMapping": "media_available_time"}
  }
}

If you are defining a media schema, you must include fields that can be mapped to the media key properties. These key properties are shown in this example.
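Before you submit a media schema, it can help to confirm that every required media key property is mapped. The following is a minimal local sketch: the helper names are hypothetical, and it assumes that key properties are declared only through keyPropertyMapping entries, as in the example schema.

```python
REQUIRED_MEDIA_KEYS = {
    "title", "uri", "category", "media_duration", "media_available_time",
}


def mapped_key_properties(node):
    """Recursively collect every keyPropertyMapping value found in a schema node."""
    found = set()
    if isinstance(node, dict):
        mapping = node.get("keyPropertyMapping")
        if isinstance(mapping, str):
            found.add(mapping)
        for value in node.values():
            found |= mapped_key_properties(value)
    elif isinstance(node, list):
        for item in node:
            found |= mapped_key_properties(item)
    return found


def missing_media_keys(schema):
    """Return the required media key properties that the schema does not map."""
    return sorted(REQUIRED_MEDIA_KEYS - mapped_key_properties(schema))


# A schema that maps title, category, and uri, but no media-specific properties.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "keyPropertyMapping": "title"},
        "categories": {
            "type": "array",
            "items": {"type": "string", "keyPropertyMapping": "category"},
        },
        "uri": {"type": "string", "keyPropertyMapping": "uri"},
    },
}

print(missing_media_keys(schema))
# prints ['media_available_time', 'media_duration']
```

The search is recursive because a mapping can sit inside an array's items definition, as with categories in the example.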

Here are some of the fields in this schema example:

  • dynamic. If dynamic is set to the string value "true", then any new properties found in the imported data are added to the schema. If dynamic is set to "false", new properties found in imported data are ignored: the properties are not added to the schema, and their values are not imported.

    For example, a schema has two properties, title and description, and you upload data that contains properties for title, description, and rating. If dynamic is "true", then the rating property and its data are imported. If dynamic is "false", then the rating property is not imported, although title and description are.

    The default is "true".

  • datetime_detection. If datetime_detection is set to the boolean true, then, when data in datetime format are imported, the schema type is set to datetime. The supported formats are RFC 3339 and ISO 8601.

    For example:

    • 2024-08-05 08:30:00 UTC

    • 2024-08-05T08:30:00Z

    • 2024-08-05T01:30:00-07:00

    • 2024-08-05

    • 2024-08-05T08:30:00+00:00

    If datetime_detection is set to the boolean false, then, when data in datetime format are imported, the schema type is set to string.

    The default is true.

  • geolocation_detection. If geolocation_detection is set to the boolean true, then, when data in geolocation format are imported, the schema type is set to geolocation. Data is detected as geolocation if it is an object containing a latitude number and a longitude number or an object containing an address string.

    For example:

    • "myLocation": {"latitude":37.42, "longitude":-122.08}

    • "myLocation": {"address": "1600 Amphitheatre Pkwy, Mountain View, CA 94043"}

    If geolocation_detection is set to the boolean false, then, when data in geolocation format are imported, the schema type is set to object.

    The default is true.

  • keyPropertyMapping. A field that maps predefined keywords to critical fields in your documents, helping to clarify their semantic meaning. Values include title, description, uri, and category. Note that your field name doesn't need to match the keyPropertyMapping value. For example, for a field that you named my_title, you can include a keyPropertyMapping field with a value of title.

    For search data stores, fields marked with keyPropertyMapping are by default indexable and searchable, but not retrievable, completable, or dynamicFacetable. This means that you don't need to include the indexable or searchable fields with a keyPropertyMapping field to get the expected default behavior.

    Note: Key properties can improve the quality of search and recommendations results and search autocomplete accuracy. If you use auto-schema detection, key properties are not automatically added. We highly recommend using the schemas.patch method to mark data fields as key properties, especially title, uri, and description.

  • type. The type of the field. This is a string value that is datetime, geolocation, or one of the primitive types (integer, boolean, object, array, number, or string).
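The effect of the dynamic field described above can be illustrated with a small local simulation. This sketch does not call Vertex AI Search, and the function name apply_dynamic is hypothetical. It only mirrors the documented rule: with "true", new properties are added to the schema and imported; with "false", they are ignored.

```python
def apply_dynamic(schema, document):
    """Simulate the documented effect of `dynamic` on a single document.

    Returns a tuple of (fields that would be imported, resulting property names).
    This is a local illustration only, not the service's implementation.
    """
    known = set(schema.get("properties", {}))
    # Per the documentation, `dynamic` is a string and defaults to "true".
    dynamic = schema.get("dynamic", "true") == "true"
    if dynamic:
        imported = dict(document)
        properties = sorted(known | set(document))
    else:
        imported = {k: v for k, v in document.items() if k in known}
        properties = sorted(known)
    return imported, properties


base = {"properties": {"title": {"type": "string"}, "description": {"type": "string"}}}
doc = {"title": "Up", "description": "A film", "rating": 5}

print(apply_dynamic({**base, "dynamic": "true"}, doc)[1])
# prints ['description', 'rating', 'title']
print(apply_dynamic({**base, "dynamic": "false"}, doc)[0])
# prints {'title': 'Up', 'description': 'A film'}
```

Note that the comparison is against the string "true", not the boolean True, which matches how the example schema sets the field.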

The following property fields apply only for search apps:

Additionally, the following field applies only for recommendations apps:

  • recommendationsFilterable. Indicates that the field can be used in a recommendations filter expression. For general information about filtering recommendations, see Filter recommendations.

    ..."genres":{"type":"string","recommendationsFilterable":true,...},

Note: For auto-detected schemas, newly detected fields are automatically indexable, searchable, and retrievable, as long as they follow the type requirements and stay within the maximum limits.

Provide your own schema as a JSON object

To provide your own schema, you create a data store that contains an empty schema and then you update the schema, supplying your schema as a JSON object. Follow these steps:

  1. Prepare the schema as a JSON object, using the Example schema as a JSON object as a guide.

  2. Create a data store.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Goog-User-Project: PROJECT_ID" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
      -d '{
        "displayName": "DATA_STORE_DISPLAY_NAME",
        "industryVertical": "INDUSTRY_VERTICAL"
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
    • INDUSTRY_VERTICAL: GENERIC or MEDIA

  3. Use the schemas.patch API method to provide your new JSON schema as a JSON object.

    curl -X PATCH \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/schemas/default_schema" \
      -d '{
        "structSchema": JSON_SCHEMA_OBJECT
      }'

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • JSON_SCHEMA_OBJECT: your new JSON schema as a JSON object. For example:

      {"$schema":"https://json-schema.org/draft/2020-12/schema","type":"object","properties":{"title":{"type":"string","keyPropertyMapping":"title"},"categories":{"type":"array","items":{"type":"string","keyPropertyMapping":"category"}},"uri":{"type":"string","keyPropertyMapping":"uri"}}}

      Example command and result

      curl -X PATCH -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "https://discoveryengine.googleapis.com/v1/projects/my-project-123/locations/global/collections/default_collection/dataStores/my-data-store/schemas/default_schema" -d '{"structSchema": {"$schema": "https://json-schema.org/draft/2020-12/schema","type": "object","properties": {"title": {"type": "string","keyPropertyMapping": "title"},"categories": {"type": "array","items": {"type": "string","keyPropertyMapping": "category"}},"uri": {"type": "string","keyPropertyMapping": "uri"}}}}'
      {"name": "projects/123456/locations/global/collections/default_collection/dataStores/my-data-store/schemas/default_schema/operations/update-schema-10569824819404198922","metadata": {"@type": "type.googleapis.com/google.cloud.discoveryengine.v1.UpdateSchemaMetadata"}}
  4. Optional: Review the schema by following the procedure View a schema definition.
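Steps 2 and 3 can also be driven from a program instead of curl. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the URLs and headers are copied from the curl commands above (including the v1beta path for the patch call), and the access token is assumed to come from gcloud auth print-access-token. The sketch only builds the requests; it does not send them.

```python
import json
import urllib.request

BASE_URL = "https://discoveryengine.googleapis.com"


def _collection_path(project_id):
    """Resource path shared by both calls, as used in the curl commands."""
    return f"projects/{project_id}/locations/global/collections/default_collection"


def create_data_store_request(project_id, data_store_id, display_name,
                              industry_vertical, token):
    """Build (but do not send) the POST request that creates a data store."""
    url = (f"{BASE_URL}/v1/{_collection_path(project_id)}"
           f"/dataStores?dataStoreId={data_store_id}")
    body = {"displayName": display_name, "industryVertical": industry_vertical}
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-Goog-User-Project": project_id,
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST")


def patch_schema_request(project_id, data_store_id, json_schema, token):
    """Build (but do not send) the PATCH request that supplies the schema."""
    url = (f"{BASE_URL}/v1beta/{_collection_path(project_id)}"
           f"/dataStores/{data_store_id}/schemas/default_schema")
    body = {"structSchema": json_schema}
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="PATCH")


# Placeholder values for illustration only.
request = patch_schema_request(
    "my-project-123", "my-data-store", {"type": "object"}, "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```

To send a request, pass it to urllib.request.urlopen with a real access token. Building the request separately from sending it makes the URL and body easy to inspect before anything changes in your project.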

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.