The ML.PROCESS_DOCUMENT function

This document describes the ML.PROCESS_DOCUMENT function, which lets you process unstructured documents from an object table by using the Document AI API.

Syntax

ML.PROCESS_DOCUMENT(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  { TABLE `PROJECT_ID.DATASET.OBJECT_TABLE` | (QUERY_STATEMENT) }
  [, PROCESS_OPTIONS => ( JSON 'PROCESS_OPTIONS')]
)

Arguments

ML.PROCESS_DOCUMENT takes the following arguments:

  • PROJECT_ID: the project that contains the resource.

  • DATASET: the dataset that contains the resource.

  • MODEL: the name of a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_DOCUMENT_V1.

  • OBJECT_TABLE: the name of the object table that contains URIs of the documents.

    The documents in the object table must be of a supported type. An error is returned for any row that contains a document of an unsupported type.

  • QUERY_STATEMENT: a GoogleSQL SELECT query that only references the object table. The query can't contain JOIN operations and can't use aliases to rename columns. You must include the uri and content_type columns from the object table in the SELECT statement. Other columns are optional.

  • PROCESS_OPTIONS: a STRING value that contains a ProcessOptions resource in JSON format. Use this option to configure custom processing options corresponding to the document processor for your use case.

    For example, you might configure process options when using the layout parser to perform document chunking. The JSON configuration would look similar to '{"layout_config": {"chunking_config": {"chunk_size": 250, "include_ancestor_headings": true}}}'.
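As a rough illustration of how the QUERY_STATEMENT and PROCESS_OPTIONS arguments fit together, the following sketch filters the object table to PDF files and passes the chunking configuration described above. The model name `myproject.mydataset.layout_parser` is a hypothetical placeholder that is assumed to reference a layout parser processor, and the `application/pdf` filter is only an example.

```sql
SELECT *
FROM ML.PROCESS_DOCUMENT(
  -- Hypothetical remote model, assumed to wrap a layout parser processor.
  MODEL `myproject.mydataset.layout_parser`,
  -- Query over the object table only: no joins, no column aliases,
  -- and both uri and content_type are selected.
  (
    SELECT uri, content_type
    FROM `myproject.mydataset.documents`
    WHERE content_type = 'application/pdf'
  ),
  PROCESS_OPTIONS => (
    JSON '{"layout_config": {"chunking_config": {"chunk_size": 250, "include_ancestor_headings": true}}}'
  )
);
```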

Output

ML.PROCESS_DOCUMENT returns the following columns:

  • ml_process_document_result: a JSON value that contains the entities returned by the Document AI API.
  • ml_process_document_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
  • The fields returned by the processor specified in the model.
  • The columns from the object table or query referenced in the function input.
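To show how these columns might be consumed, the sketch below pulls one value out of the JSON result and surfaces the status for each document. It reuses the model and table names from the example on this page. The JSON path `$.entities[0].mentionText` is an assumption based on the sample output, and the actual shape depends on the processor.

```sql
SELECT
  uri,
  -- Text of the first entity; this path is an assumption and varies by processor.
  JSON_VALUE(ml_process_document_result, '$.entities[0].mentionText') AS first_entity_text,
  -- Empty when the row was processed successfully.
  ml_process_document_status
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.invoice_parser`,
  TABLE `myproject.mydataset.documents`
);
```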

Quotas

See Cloud AI service functions quotas and limits.

For quick links to update the quotas for specific Document AI API metrics, see Quotas list.

Known issues

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>

This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.

To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
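Before reaching for those tools, it can help to see which rows were affected. The following sketch assumes the function output is first saved to a hypothetical table, `myproject.mydataset.documents_processed`, and then lists the rows whose status contains the retryable error text. Matching the message with LIKE is an assumption about how the status is formatted, not a documented contract.

```sql
-- Save the output once so that failed rows can be inspected afterwards.
-- The destination table name is a hypothetical placeholder.
CREATE OR REPLACE TABLE `myproject.mydataset.documents_processed` AS
SELECT *
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.invoice_parser`,
  TABLE `myproject.mydataset.documents`
);

-- List the documents that hit the quota-related retryable error.
SELECT uri, ml_process_document_status
FROM `myproject.mydataset.documents_processed`
WHERE ml_process_document_status LIKE '%retryable error%';
```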

Locations

ML.PROCESS_DOCUMENT must run in the same region as the remote model that the function references. You can only create models based on Document AI in the US and EU multi-regions.

Limitations

The function can't process documents with more than 100 pages. Any row that contains such a file returns an error.

Example

The following example uses the invoice parser to process the documents represented by the documents table.

Create the model:

# Create model
CREATE OR REPLACE MODEL `myproject.mydataset.invoice_parser`
  REMOTE WITH CONNECTION `myproject.myregion.myconnection`
  OPTIONS (
    remote_service_type = 'cloud_ai_document_v1',
    document_processor = 'processor_id');

Note: For more information about how to specify a processor ID, see Create a model.

Process the documents:

SELECT *
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.invoice_parser`,
  TABLE `myproject.mydataset.documents`
);

The result is similar to the following:

ml_process_document_result | ml_process_document_status | invoice_type | currency | ...
-------|--------|--------|--------|--------
{"entities":[{"confidence":1,"id":"0","mentionText":"10 105,93 10,59","pageAnchor":{"pageRefs":[{"boundingPoly":{"normalizedVertices":[{"x":0.40452111,"y":0.67199326},{"x":0.74776918,"y":0.67199326},{"x":0.74776918,"y":0.68208581},{"x":0.40452111,"y":0.68208581}]}}]},"properties":[{"confidence":0.66... | | | USD |

Last updated 2025-12-15 UTC.