Choose a document processing function

This document provides a comparison of the document processing functions available in BigQuery ML, which are AI.GENERATE_TEXT and ML.PROCESS_DOCUMENT. You can use the information in this document to help you decide which function to use in cases where the functions have overlapping capabilities.

At a high level, the difference between these functions is as follows:

  • AI.GENERATE_TEXT is a good choice for performing natural language processing (NLP) tasks where some of the content resides in documents. This function offers the following benefits:

    • Lower costs
    • More language support
    • Faster throughput
    • Model tuning capability
    • Availability of multimodal models

    For examples of document processing tasks that work best with this approach, see Explore document processing capabilities with the Gemini API. A minimal query sketch also follows this list.

  • ML.PROCESS_DOCUMENT is a good choice for performing document processing tasks that require document parsing and a predefined, structured response.
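
The following sketch shows the general shape of a prompt-based document question with AI.GENERATE_TEXT, as referenced in the first item above. It is a minimal example rather than a reference: the dataset, model, table, and column names are placeholders, the input query is assumed to produce a prompt column, and the supported STRUCT options and output columns depend on the model that your remote model references.

```sql
-- Minimal sketch; all object names are placeholders.
SELECT *
FROM
  AI.GENERATE_TEXT(
    -- Assumed: a remote model in your dataset that points at a Gemini endpoint.
    MODEL `bqml_demo.gemini_model`,
    (
      SELECT
        uri,
        -- Build the prompt by prepending the question to the document text.
        CONCAT(
          'What is the quarterly revenue for each division? ',
          text_content) AS prompt
      -- Assumed: a table holding extracted document text per file.
      FROM `bqml_demo.document_text`
    ),
    -- Optional model parameters; the supported options vary by model.
    STRUCT(0.2 AS temperature, 1024 AS max_output_tokens));
```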

Function comparison

Use the following comparison of the AI.GENERATE_TEXT and ML.PROCESS_DOCUMENT functions:

Purpose

  • AI.GENERATE_TEXT: Perform any document-related NLP task by passing a prompt to a Gemini or partner model or to an open model. For example, given a financial document for a company, you can retrieve document information by providing a prompt such as "What is the quarterly revenue for each division?".
  • ML.PROCESS_DOCUMENT: Use the Document AI API to perform specialized document processing for different document types, such as invoices, tax forms, and financial statements. You can also perform document chunking.

Billing

  • AI.GENERATE_TEXT: Incurs BigQuery ML charges for data processed (see BigQuery ML pricing), plus Vertex AI charges for calls to the model. If you use a Gemini 2.0 or later model, the call is billed at the batch API rate. For more information, see Cost of building and deploying AI models in Vertex AI.
  • ML.PROCESS_DOCUMENT: Incurs BigQuery ML charges for data processed (see BigQuery ML pricing), plus charges for calls to the Document AI API. For more information, see Document AI API pricing.

Requests per minute (RPM)

  • AI.GENERATE_TEXT: Not applicable for Gemini models; between 25 and 60 for partner models. For more information, see Requests per minute limits.
  • ML.PROCESS_DOCUMENT: 120 RPM per processor type, with an overall limit of 600 RPM per project. For more information, see Quotas list.

Tokens per minute

  • AI.GENERATE_TEXT: Ranges from 8,192 to over 1 million, depending on the model used.
  • ML.PROCESS_DOCUMENT: No token limit; however, different page limits apply depending on the processor you use. For more information, see Limits.

Supervised tuning

  • AI.GENERATE_TEXT: Supported for some models.
  • ML.PROCESS_DOCUMENT: Not supported.

Supported languages

  • AI.GENERATE_TEXT: Support varies based on the LLM you choose.
  • ML.PROCESS_DOCUMENT: Language support depends on the document processor type; most processors support only English. For more information, see Processor list.

Supported regions

  • AI.GENERATE_TEXT: Supported in all Generative AI for Vertex AI regions.
  • ML.PROCESS_DOCUMENT: Supported in the EU and US multi-regions for all processors. Some processors are also available in certain single regions. For more information, see Regional and multi-regional support.
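
To round out the comparison, the following sketch shows the general shape of an ML.PROCESS_DOCUMENT call over an object table of documents stored in Cloud Storage. All names are placeholders, the CREATE MODEL options shown are assumptions that can vary by release and processor type, and the output columns depend on the processor, so treat this as an outline rather than exact syntax.

```sql
-- Sketch only; object names, the connection, and the processor path are
-- placeholders, and the OPTIONS shown are assumptions to illustrate intent.

-- A remote model that wraps a Document AI processor (for example, an
-- invoice parser) through a Cloud resource connection.
CREATE OR REPLACE MODEL `bqml_demo.invoice_parser`
  REMOTE WITH CONNECTION `us.doc_ai_connection`
  OPTIONS (
    remote_service_type = 'CLOUD_AI_DOCUMENT_V1',
    document_processor = 'projects/my-project/locations/us/processors/my-processor-id');

-- Parse every document referenced by the object table; the columns that
-- come back depend on the processor type.
SELECT *
FROM
  ML.PROCESS_DOCUMENT(
    MODEL `bqml_demo.invoice_parser`,
    -- Assumed: an object table over invoice PDFs in Cloud Storage.
    TABLE `bqml_demo.invoice_files`);
```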
