Use structured data for advanced website indexing

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

If advanced website indexing is enabled in your data store, you can use the following types of structured data to enrich your indexing:

  • Predefined, Google-inferred page dates
  • Custom structured data attributes

This page introduces both of these types of structured data for your web pages and describes how to add custom structured attributes to your data store schema.

About predefined, Google-inferred page dates

When crawling the web pages in your website data store, Google infers page data using the properties that apply to your content. Vertex AI Search adds these inferred page data properties to your schema. This inferred data includes the following predefined date properties, which are also called byline dates:

  • datePublished: the date and time when the page was first published
  • dateModified: the date and time when the page was most recently modified

These properties are indexed automatically. You can use these date properties directly to enrich your search without adding them to your schema. To add byline dates to your website, see Influence your byline dates in Google Search.

To understand how to include these predefined date properties in your search requests, such as in filter expressions and boost specifications, see Example use case using a Google-inferred page date.

About custom datetime fields on a web page

You can add custom datetime fields to your web pages. Such tags can be used with advanced indexing when you add custom structured data attributes to the data store schema. Here's an example that shows where to add a custom datetime meta tag named lastModified on your web page.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Your web page title</title>
    <!-- Vertex AI Search can use this date. -->
    <meta name="lastModified" content="2022-07-01">
</head>
<body>
</body>
</html>

To understand how to include such custom datetime tags in your search requests, such as in filter expressions and boost specifications, see Example use case using a custom datetime attribute.
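Before you recrawl, it can help to check locally that a page exposes the custom date in a parseable form. The following is a minimal sketch using only the Python standard library. The helper names (MetaTagCollector, read_custom_date) are hypothetical, and the check assumes the date is written in ISO 8601 form as in the example above. It doesn't reproduce how Vertex AI Search parses pages.

```python
from datetime import date
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Tags such as <meta charset="UTF-8"> have no name and are skipped.
        if name is not None and content is not None:
            self.meta[name] = content


def read_custom_date(html, tag_name):
    """Returns the named meta tag's content as a date, or None if absent."""
    collector = MetaTagCollector()
    collector.feed(html)
    raw = collector.meta.get(tag_name)
    return date.fromisoformat(raw) if raw is not None else None


page = """<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="lastModified" content="2022-07-01">
</head>
<body></body>
</html>"""

print(read_custom_date(page, "lastModified"))  # 2022-07-01
```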

About custom structured data attributes

You can add structured data attributes as meta tags and PageMaps to your web pages and use these to enrich your indexing. To use custom structured attributes for indexing, you must update your schema.

Example use case for meta tags

Suppose you have a large number of web pages that are relevant to various departments in your organization. You can use meta tags to label the pages that are relevant for each department. You can then use the indexed tags as filters in your queries. This lets you restrict search results to web pages containing a label that matches any of the specified departments.

This process can be summarized as follows:

  1. Add the following meta tags to a subset of your web pages:

    • Relevant to engineering and IT departments:

      <meta name="department" content="eng, infotech">
      <meta property="og:title" content="Password best practices">
    • Relevant to finance and HR departments:

      <meta name="department" content="finance, human resources">
      <meta property="og:image" content="https://example.com/images/team-training-contractors.jpg">

      For a more elaborate example, see Example meta tags on a web page.

  2. Recrawl the updated pages.

  3. Add department to your data store schema as an indexable array as described in the Add custom structured data attributes to the data store schema section.

After updating your schema, your data store is automatically reindexed. After the reindexing is complete, you can use the department attribute in a filter expression to reorder or filter search results. For example, when users from the finance department issue queries, the search results can be made more relevant for them with the department filter set to finance.
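To see why department is declared as an array, consider how one comma-separated content string turns into several values. The sketch below is a local illustration only: the page URLs are hypothetical, the whitespace trimming is an assumption, and the matching logic imitates "any of the specified departments" rather than showing the search service's behavior.

```python
def split_meta_values(content):
    """Splits a meta tag's comma-separated content into a list of values."""
    return [value.strip() for value in content.split(",") if value.strip()]


def matches_any(page_values, wanted):
    """Returns True if the page carries at least one of the wanted labels."""
    return any(value in wanted for value in page_values)


# Hypothetical pages, keyed by URL, with their department meta tag content.
pages = {
    "https://example.com/password-best-practices": "eng, infotech",
    "https://example.com/team-training": "finance, human resources",
}

indexed = {url: split_meta_values(content) for url, content in pages.items()}
finance_pages = [
    url for url, values in indexed.items() if matches_any(values, {"finance"})
]
print(finance_pages)  # ['https://example.com/team-training']
```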

Example meta tags on a web page

Here's an example of the meta tags that you can add to your web page. Such tags can be used with advanced indexing when you add custom structured data attributes to the data store schema.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Your web page title</title>
    <!-- Robots instructions for crawlers and for Vertex AI Search. -->
    <meta name="robots" content="index,follow">
    <!-- Vertex AI Search can use custom datetime fields to filter, boost, and order. -->
    <meta name="lastModified" content="2024-09-06">
    <!-- Vertex AI Search can filter by category or tags. -->
    <meta name="category" content="archived">
    <meta name="tags" content="legacy,interesting,faq">
    <!-- Vertex AI Search can index these common HTML tags. -->
    <meta name="description" content="A description of your web page's content.">
    <meta name="author" content="Your name or organization">
    <meta name="keywords" content="relevant,keywords,separated,by,commas">
    <link rel="canonical" href="https://www.yourwebsite.com/this-page">
    <meta property="og:title" content="Your Webpage Title">
    <meta property="og:description" content="A description of your webpage's content.">
    <meta property="og:image" content="https://www.yourwebsite.com/image.jpg">
    <meta property="og:url" content="https://www.yourwebsite.com/this-page">
    <meta property="og:type" content="website">
    <meta name="twitter:card" content="summary_large_image">
    <meta name="twitter:title" content="Your customized Webpage Title">
    <meta name="twitter:description" content="A description of your webpage's content.">
    <meta name="twitter:image" content="https://www.yourwebsite.com/image.jpg">
</head>
<body>...</body>
</html>
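Some of the names above, such as twitter:card, contain a colon and so can't be used directly as schema field names. As a rough local check, the following sketch lists which name attributes match the field-name pattern given later on this page and which would need a separate identifier. The helper names are hypothetical, only the name attribute is inspected, and the sketch doesn't check the lists of excluded or unsupported meta tags.

```python
import re
from html.parser import HTMLParser

# Equivalent to the pattern [a-zA-Z0-9][a-zA-Z0-9-_]* given on this page;
# the hyphen is moved to the end of the character class to avoid ambiguity.
FIELD_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_-]*")


class MetaNameCollector(HTMLParser):
    """Records the name attribute of every <meta> tag that has one."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def classify_meta_names(html):
    """Splits meta tag names into pattern-compliant and non-compliant lists."""
    collector = MetaNameCollector()
    collector.feed(html)
    usable, needs_alias = [], []
    for name in collector.names:
        (usable if FIELD_NAME.fullmatch(name) else needs_alias).append(name)
    return usable, needs_alias


page = """<head>
    <meta name="lastModified" content="2024-09-06">
    <meta name="category" content="archived">
    <meta property="og:title" content="Your Webpage Title">
    <meta name="twitter:card" content="summary_large_image">
</head>"""

usable, needs_alias = classify_meta_names(page)
print(usable)       # ['lastModified', 'category']
print(needs_alias)  # ['twitter:card']
```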

Example use case for PageMaps

Suppose you have several web pages that contain food recipes. You can add PageMap data to each page's HTML content. You can then use the indexed PageMap attribute names as filters in your queries. For example, if you intend to boost or bury web pages depending on the recipe ratings, you can follow this process:

  1. Add PageMap data similar to the following to your web pages:

    <html>
    <head>
    ...
    <!--
    <PageMap>
        <DataObject type="document">
            <Attribute name="title">Baked potatoes</Attribute>
            <Attribute name="author">Dana A.</Attribute>
            <Attribute name="description">Homestyle baked potatoes in oven. This
            recipe uses Russet potatoes.</Attribute>
            <Attribute name="rating">4.9</Attribute>
            <Attribute name="lastUpdate">2015-01-01</Attribute>
        </DataObject>
    </PageMap>
    -->
    </head>
    ...
    </html>
  2. Recrawl the updated pages.

  3. Add rating to your data store schema as an indexable array as described in the Add custom structured data attributes to the data store schema section.

After updating your schema, your data store is automatically reindexed. After the reindexing is complete, you can use the rating attribute in a filter expression to reorder or filter search results. For example, when users search for recipes, boost the search results that are top-rated by using rating as a custom numerical attribute.
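Because PageMap data sits inside an HTML comment, a malformed block is easy to miss. The sketch below is one way to check locally that the comment contains well-formed XML and that rating reads as a number. The function name is hypothetical and the regular expression assumes a single PageMap comment per page. It is a local sanity check, not a model of how the crawler reads PageMaps.

```python
import re
import xml.etree.ElementTree as ET


def extract_pagemap_attributes(html):
    """Returns {attribute name: text} for the first PageMap comment found."""
    match = re.search(r"<!--\s*(<PageMap>.*?</PageMap>)\s*-->", html, re.DOTALL)
    if match is None:
        return {}
    root = ET.fromstring(match.group(1))
    return {
        # Collapse line breaks and repeated spaces inside attribute text.
        attribute.get("name"): " ".join((attribute.text or "").split())
        for attribute in root.iter("Attribute")
    }


page = """<html>
<head>
<!--
<PageMap>
    <DataObject type="document">
        <Attribute name="title">Baked potatoes</Attribute>
        <Attribute name="rating">4.9</Attribute>
        <Attribute name="lastUpdate">2015-01-01</Attribute>
    </DataObject>
</PageMap>
-->
</head>
</html>"""

attributes = extract_pagemap_attributes(page)
print(float(attributes["rating"]))  # 4.9
```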

Example use case for schema.org data

Suppose you have a review website and its web pages are annotated with schema.org data in JSON-LD format within an HTML script tag. You can then use the indexed annotations as filters in your queries. For example, if you intend to boost or bury web pages depending on the aggregate ratings, you can follow this process:

  1. Add schema.org annotations for review content similar to the following to your web pages. To view other types of schema.org templates that are available, see Schemas:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Review",
      "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": 3.5,
        "reviewCount": 11
      },
      "description": "Published in 1843, this is the perfect depiction of the Victorian London. A Christmas Carol is the story of Ebenezer Scrooge's transformation.",
      "name": "A Christmas Carol",
      "image": "christmas-carol-first-ed.jpg",
      "review": [
        {
          "@type": "Review",
          "author": "Alex T.",
          "datePublished": "2000-01-01",
          "reviewBody": "Read this in middle school and have loved this ever since.",
          "name": "Worth all the adaptations",
          "reviewRating": {
            "@type": "Rating",
            "bestRating": 5,
            "ratingValue": 5,
            "worstRating": 1
          }
        }
      ]
    }
    </script>
  2. Recrawl the updated pages.

  3. Add the path to ratingValue to your data store schema. Use an identifier as the field name in the data store schema, such as rating_value, as described in the Add custom structured data attributes to the data store schema section.

After updating your schema, your data store is automatically reindexed. After the reindexing is complete, you can use the rating_value attribute in a filter expression to reorder or filter search results. For example, when users search for books, boost the search results that are top-rated by using rating_value as a custom numerical attribute.
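The path to ratingValue uses the dot-separated notation that starts with _root, as described later on this page. The following sketch shows one way to check locally that a path points at the field you expect in a JSON-LD annotation. The function name is hypothetical and the lookup handles nested objects only, not arrays such as review. It illustrates the notation and isn't the service's own path resolver.

```python
import json


def resolve_schema_org_path(annotation, path):
    """Follows a dot-separated path that starts with _root into a JSON-LD object."""
    parts = path.split(".")
    if parts[0] != "_root":
        raise ValueError("path must start with _root")
    node = annotation
    for key in parts[1:]:
        node = node[key]  # raises KeyError if a level is missing
    return node


json_ld = """{
  "@context": "https://schema.org",
  "@type": "Review",
  "aggregateRating": {"ratingValue": 3.5, "reviewCount": 11},
  "name": "A Christmas Carol"
}"""

annotation = json.loads(json_ld)
print(resolve_schema_org_path(annotation, "_root.aggregateRating.ratingValue"))  # 3.5
```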

Before you begin

Before you update the data store schema, do the following:

  • Turn on advanced website indexing for the data store. For more information, see Turn on advanced website indexing.
  • Understand how structured data works.
  • Understand how to use PageMaps. Review the list of recognized DataObjects that can be added to PageMap data.
  • Understand how to use meta tags. Ensure that you don't use any excluded or unsupported meta tags.
  • Ensure that the attribute that needs to be indexed doesn't have any of the following values:
    • datePublished
    • dateModified
    • siteSearch
  • Understand that after you add structured data to your web pages, you must recrawl the pages. This might take several hours.
  • Understand that after you add structured data attributes to the data store schema, the web pages in your data store are reindexed automatically. Reindexing is a long-running operation that might take several hours.

Add custom structured data attributes to the data store schema

To add custom structured data attributes to the data store schema:

  1. Add meta tags, PageMap data, and schema.org data to all the pages in your website that you want to enrich with structured data indexing:

    • For meta tags:

      • Each meta tag must have its name attribute set to the field you want to index and its content attribute to a string of one or more comma-separated values.
      • Vertex AI Search supports meta tags with names that match the pattern [a-zA-Z0-9][a-zA-Z0-9-_]*. Ensure that you don't use any excluded or unsupported meta tags.

        If your meta tag name contains a special character, such as a colon (:), you must choose a different identifier in the schema to represent it and then specify the exact name of the meta tag in the siteSearchMetatagName field of the schema.

    • For PageMap data:

      • PageMap data must consist of recognized DataObjects that contain Attribute names that you want to index. The Attribute names within the DataObjects must be set to the field you want to index.
    • For schema.org data:

      • The annotations must be in valid JSON-LD, Microdata, or RDFa format. For more information, see Supported formats.
  2. Recrawl the updated web pages.

  3. View the schema definition for your data store over REST API.

  4. Update the data store schema using the Google Cloud console or the API. If you choose to do it over the API, learn how to provide your own schema as a JSON object.

    1. Add objects for each custom attribute that you want to make searchable, retrievable, or indexable.
    2. Add the custom attribute and set its type to array.
    3. Add the data type of the custom attribute's value.
    4. In the siteSearchStructuredDataSources field, specify the source where the custom attribute can be found.
    5. For schema.org data: In the siteSearchSchemaOrgPaths field, specify the path of the attribute in the schema.org annotation, starting with the string _root.

    The following is an example of a schema update for a website:

    {
      "type": "object",
      "properties": {
        "CUSTOM_ATTRIBUTE": {
          "type": "array",
          "items": {
            "type": "DATA_TYPE",
            "searchable": true,
            "retrievable": true,
            "indexable": true,
            "siteSearchMetatagName": "METATAG_NAME",
            "siteSearchStructuredDataSources": ["STRUCTURED_DATA_SOURCE_1", "STRUCTURED_DATA_SOURCE_2"]
          }
        },
        "IDENTIFIER_FOR_SCHEMA_ORG_FIELD": {
          "type": "array",
          "items": {
            "type": "DATA_TYPE_SCHEMA_ORG_FIELD",
            "searchable": true,
            "retrievable": true,
            "indexable": true,
            "siteSearchSchemaOrgPaths": ["_root.PATH_TO_THE_SCHEMA_ORG_FIELD"]
          }
        }
      },
      "$schema": "https://json-schema.org/draft/2020-12/schema"
    }

    Replace the following:

    • CUSTOM_ATTRIBUTE: the value of the name attribute. For example:

      • For a meta tag defined as <meta name="department" content="eng, infotech">, use department
      • For a PageMap Attribute defined as <Attribute name="rating">4.9</Attribute>, use rating
    • DATA_TYPE: the data type of the name attribute. Must be either string, number, or datetime. For example:

      • For a meta tag defined as <meta name="department" content="eng, infotech">, use string
      • For a PageMap Attribute defined as <Attribute name="rating">4.9</Attribute>, use number
      • For a PageMap Attribute defined as <Attribute name="lastPublished">2015-01-01</Attribute>, use datetime

      For more information, see FieldType.

    • METATAG_NAME: the value for the siteSearchMetatagName field, which lets you specify the exact name of a meta tag from your web page. You only need to use this field when the meta tag's name attribute contains special characters (such as a colon) and doesn't match the required pattern for CUSTOM_ATTRIBUTE, which is [a-zA-Z0-9][a-zA-Z0-9-_]*.

      For example, if you have a tag <meta name="og:updated_time" ...>, og:updated_time can't be used as the CUSTOM_ATTRIBUTE. Instead, use a compliant identifier for CUSTOM_ATTRIBUTE (like og_updated_time) and then set the value of siteSearchMetatagName to og:updated_time.

      When you use siteSearchMetatagName to update the schema, you must use the v1alpha endpoint instead of the v1 endpoint to call the schema method.

    • STRUCTURED_DATA_SOURCE_N: an array consisting of one or more of the following structured data sources where the CUSTOM_ATTRIBUTE attribute can be found:

      • If the custom attribute can be found as a meta tag, specify METATAGS
      • If the custom attribute can be found as a PageMap attribute, specify PAGEMAP
      • If the custom attribute can be found as schema.org data, specify SCHEMA_ORG
      • If the siteSearchStructuredDataSources field is absent or left empty, the values from all three data sources are merged in an array.
    • IDENTIFIER_FOR_SCHEMA_ORG_FIELD: a custom identifier to denote the schema.org field. It doesn't need to be exactly the same as the field name in the schema.org annotation on your web page. For example, if the path of the field is _root.nutrition.calories, the identifier can be calorific_value or nutrition_value.

    • DATA_TYPE_SCHEMA_ORG_FIELD: the data type of the schema.org field. Must be string, number, or datetime. For example:

      • For a schema.org field defined as "calories": "240 calories", use string
      • For a schema.org field defined as "calories": 240, use number
      • For a schema.org field defined as "foundingDate": "1991-05-01", use datetime

      For more information, see FieldType.

    • PATH_TO_THE_SCHEMA_ORG_FIELD: the path to a single field in the schema.org annotation that needs to be accessed. It is specified using dot separators after each nested level. You must specify the complete path needed to access the required field. For example, if a field ratingValue is nested in the aggregateRating field, you can specify the path as _root.aggregateRating.ratingValue.

After you update the website schema, the website is reindexed automatically. This is a long-running operation that can take several hours.
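If you assemble the schema JSON in a script rather than by hand, the steps above might be sketched as follows. Everything here is illustrative: array_field and build_schema are hypothetical helpers, the name check treats the three reserved values listed under "Before you begin" as disallowed field names (an assumption about how that rule applies), and the searchable, retrievable, and indexable flags simply mirror the template on this page and may need adjusting per field. The sketch only builds the JSON object and doesn't call the API.

```python
import json
import re

# Assumption: the reserved values from "Before you begin" are disallowed names.
RESERVED_NAMES = {"datePublished", "dateModified", "siteSearch"}
# Equivalent to [a-zA-Z0-9][a-zA-Z0-9-_]* with the hyphen moved to the end.
FIELD_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_-]*")


def array_field(data_type, **extra):
    """Builds one array-typed schema property with the flags used on this page."""
    if data_type not in {"string", "number", "datetime"}:
        raise ValueError(f"unsupported data type: {data_type}")
    items = {
        "type": data_type,
        "searchable": True,
        "retrievable": True,
        "indexable": True,
    }
    items.update(extra)  # for example siteSearchStructuredDataSources
    return {"type": "array", "items": items}


def build_schema(fields):
    """fields maps each schema identifier to an already built array_field()."""
    for name in fields:
        if name in RESERVED_NAMES or not FIELD_NAME.fullmatch(name):
            raise ValueError(f"invalid field name: {name}")
    return {
        "type": "object",
        "properties": dict(fields),
        "$schema": "https://json-schema.org/draft/2020-12/schema",
    }


schema = build_schema({
    "department": array_field(
        "string", siteSearchStructuredDataSources=["METATAGS"]
    ),
    "og_updated_time": array_field(
        "datetime",
        siteSearchMetatagName="og:updated_time",
        siteSearchStructuredDataSources=["METATAGS"],
    ),
    "rating_value": array_field(
        "number", siteSearchSchemaOrgPaths=["_root.aggregateRating.ratingValue"]
    ),
})
print(json.dumps(schema, indent=2))
```

Because og_updated_time uses siteSearchMetatagName, a schema like this one would have to be sent through the v1alpha endpoint, as noted in the METATAG_NAME description.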

What's next

Use the indexed metadata for the following:


Last updated 2026-02-19 UTC.