Get an estimate of monthly storage costs

Advanced website indexing incurs monthly data storage charges based on the size of the web data that you import into your data store. To get an estimate of the size of your web data before importing it, you can call the estimateDataSize method and specify the web pages that you want to import. The estimateDataSize method starts a long-running operation that runs until the process for estimating the data size is complete. This can take from a few minutes to over an hour, depending on the number of web pages that you specify. After you have an estimate of the size of your web data, you can get an estimate of your monthly data storage costs using the Vertex AI Search pricing page (see the Data Index pricing section) or Google Cloud's pricing calculator (search for Vertex AI Search).

Important: You are permitted to use the estimateDataSize method only on web domains that your company owns or is authorized to use.

Before you begin

Determine the URL patterns for the websites that you intend to include (and optionally exclude) when you import web data into your data store. You specify these URL patterns when you call the estimateDataSize method.
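As a rough illustration of how include and exclude patterns map onto the request body used in the procedure below, the following Python sketch assembles the JSON payload. The helper name build_estimate_request is hypothetical, and the sketch assumes that multiple patterns are sent as a JSON array under "estimator_uri_patterns"; check the API reference before relying on that shape.

```python
import json


def build_estimate_request(include_patterns, exclude_patterns=(), exact_match=False):
    """Build a JSON-serializable body for an estimateDataSize call.

    Assumption: patterns are sent as an array under "estimator_uri_patterns",
    using the field names shown in the procedure on this page.
    """
    patterns = []
    for uri in include_patterns:
        # "exclusive" is omitted here; it defaults to false (pattern is included).
        patterns.append({"provided_uri_pattern": uri, "exact_match": exact_match})
    for uri in exclude_patterns:
        # "exclusive": true marks the pattern as excluded from the estimate.
        patterns.append(
            {"provided_uri_pattern": uri, "exact_match": exact_match, "exclusive": True}
        )
    return {"website_data_source": {"estimator_uri_patterns": patterns}}


if __name__ == "__main__":
    # Example patterns taken from this page; substitute your own domains.
    body = build_estimate_request(
        include_patterns=["www.mysite.com"],
        exclude_patterns=["www.mysite.com/faq"],
    )
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the request body (the -d argument) of the curl command in step 1 of the procedure.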

Procedure

To get an estimate of the size of your web data, follow these steps:

  1. Call the estimateDataSize method.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global:estimateDataSize" \
      -d '{
        "website_data_source": {
          "estimator_uri_patterns": [
            {
              "provided_uri_pattern": "URI_PATTERN_TO_INCLUDE",
              "exact_match": EXACT_MATCH_BOOLEAN
            },
            {
              "provided_uri_pattern": "URI_PATTERN_TO_EXCLUDE",
              "exact_match": EXACT_MATCH_BOOLEAN,
              "exclusive": EXCLUSIVE_BOOLEAN
            }
          ]
        }
      }'

    Replace the following:

    • PROJECT_ID: the ID of your project.

    • URI_PATTERN_TO_INCLUDE: the URL patterns for the websites that you want to include in your data size estimate.

    • URI_PATTERN_TO_EXCLUDE: (Optional) The URL patterns for the websites that you want to exclude from your data size estimate.

      For URI_PATTERN_TO_INCLUDE and URI_PATTERN_TO_EXCLUDE, you can use patterns similar to the following:

      • Entire website: www.mysite.com
      • Parts of a website: www.mysite.com/faq
      • Entire domain: mysite.com or *.mysite.com

    • EXCLUSIVE_BOOLEAN: (Optional) If true, then the provided URI pattern represents web pages that are excluded from your data size estimate. The default is false, which means that the provided URI pattern represents web pages that are included in your data size estimate.

    • EXACT_MATCH_BOOLEAN: (Optional) If true, then the provided URI pattern represents a single web page, instead of the web page and all of its children. The default is false, which means that the provided URI pattern represents the web page and all of its children.

    The output is similar to the following:

    {
      "name": "projects/PROJECT_ID/locations/global/operations/estimate-data-size-01234567890123456789",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.EstimateDataSizeMetadata"
      }
    }

    This output includes the name field, which is the name of the long-running operation. Save the name value to use in the following step.

  2. Poll the operations.get method.

    curl -X GET \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      "https://discoveryengine.googleapis.com/v1/OPERATION_NAME"

    Replace OPERATION_NAME with the name value that you saved in the previous step. You can also get the operation name by listing long-running operations.

  3. Evaluate each response.

    • If a response does not contain "done": true, then the process for estimating the data size is not complete. Continue polling.

      The output is similar to the following:

      {
        "name": "projects/PROJECT_ID/locations/global/operations/estimate-data-size-01234567890123456789",
        "metadata": {
          "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.EstimateDataSizeMetadata",
          "createTime": "2025-10-29T21:59:59.976752Z"
        }
      }

    • If a response contains "done": true, then the process for estimating the data size is complete. Save the DATA_SIZE_BYTES value from the response to use in the following step.

      The output is similar to the following:

      {
        "name": "projects/PROJECT_ID/locations/global/operations/estimate-data-size-01234567890123456789",
        "metadata": {
          "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.EstimateDataSizeMetadata",
          "createTime": "2025-10-29T21:59:59.976752Z"
        },
        "done": true,
        "response": {
          "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.EstimateDataSizeResponse",
          "dataSizeBytes": DATA_SIZE_BYTES,
          "documentCount": DOCUMENT_COUNT
        }
      }

      This output includes the following values:

      • DATA_SIZE_BYTES: the estimated size of your web data, in bytes.

      • DOCUMENT_COUNT: the estimated number of web pages in your web data.

  4. Divide the DATA_SIZE_BYTES value from the previous step by 1,000,000,000 to get gigabytes. Save this value for the following step.

  5. To get an estimate for your monthly data storage costs:

    1. Go to Google Cloud's pricing calculator.

    2. Click Add to estimate.

    3. Search for Vertex AI Search and then click the Vertex AI Search box.

    4. In the Data Index box, enter the estimated size of your web data, in gigabytes, from the previous step.

      See the Estimated cost box for your estimated data storage cost.
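Steps 2 through 4 of the procedure (poll, evaluate each response, convert bytes to gigabytes) can be sketched in Python roughly as follows. This is a minimal illustration, not an official client: fetch_operation is a hypothetical callable that stands in for the operations.get request and returns the parsed JSON response, the helper names are invented for this example, and the polling interval and limit are arbitrary. The sketch also assumes that a failed operation reports an "error" field, as long-running operations generally do, and that dataSizeBytes may arrive as either a number or a string.

```python
import time


def evaluate_operation(operation):
    """Return (done, data_size_bytes, document_count) for one polled response."""
    if not operation.get("done", False):
        # No "done": true yet, so the estimate is still running.
        return False, None, None
    if "error" in operation:
        # Assumption: a finished but failed operation carries an "error" object.
        raise RuntimeError(f"estimateDataSize failed: {operation['error']}")
    response = operation["response"]
    # int() accepts both JSON numbers and numeric strings.
    return True, int(response["dataSizeBytes"]), int(response["documentCount"])


def bytes_to_gigabytes(size_bytes):
    """Step 4: divide by 1,000,000,000 to get gigabytes."""
    return size_bytes / 1_000_000_000


def wait_for_estimate(fetch_operation, interval_seconds=60.0, max_polls=120,
                      sleep=time.sleep):
    """Poll until the operation is done, then return (gigabytes, document_count)."""
    for _ in range(max_polls):
        done, size_bytes, document_count = evaluate_operation(fetch_operation())
        if done:
            return bytes_to_gigabytes(size_bytes), document_count
        sleep(interval_seconds)
    raise TimeoutError("operation did not finish within the polling budget")


if __name__ == "__main__":
    # Canned responses with made-up values, used here instead of real API calls.
    canned = iter([
        {"name": "OPERATION_NAME", "metadata": {}},
        {"name": "OPERATION_NAME", "done": True,
         "response": {"dataSizeBytes": "2500000000", "documentCount": 1200}},
    ])
    gigabytes, pages = wait_for_estimate(lambda: next(canned), interval_seconds=0)
    print(gigabytes, pages)
```

The gigabytes value returned by wait_for_estimate is the number to enter in the Data Index box of the pricing calculator. Passing the fetch function in as a parameter keeps the evaluation logic separate from authentication and HTTP details.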

Last updated 2026-02-19 UTC.