The ML.PROCESS_DOCUMENT function
This document describes the ML.PROCESS_DOCUMENT function, which lets you process unstructured documents from an object table by using the Document AI API.
Syntax
ML.PROCESS_DOCUMENT(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  { TABLE `PROJECT_ID.DATASET.OBJECT_TABLE` | (QUERY_STATEMENT) }
  [, PROCESS_OPTIONS => ( JSON 'PROCESS_OPTIONS')]
)
Arguments
ML.PROCESS_DOCUMENT takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL: the name of a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_DOCUMENT_V1.
- OBJECT_TABLE: the name of the object table that contains URIs of the documents. The documents in the object table must be of a supported type. An error is returned for any row that contains a document of an unsupported type.
- QUERY_STATEMENT: a GoogleSQL SELECT query that only references the object table. The query can't contain JOIN operations and can't use aliases to rename columns. You must include the uri and content_type columns from the object table in the SELECT statement. Other columns are optional.
- PROCESS_OPTIONS: a STRING value that contains a ProcessOptions resource in JSON format. Use this option to configure custom processing options corresponding to the document processor for your use case. For example, you might configure process options when using the layout parser to perform document chunking. The JSON configuration would look similar to
  '{"layout_config": {"chunking_config": {"chunk_size": 250, "include_ancestor_headings": true}}}'.
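The optional arguments can be combined in a single call. The following is a minimal sketch, not a definitive query: it assumes a hypothetical remote model named `myproject.mydataset.layout_parser` that was created over a layout parser processor, and an object table named `myproject.mydataset.documents`. The `WHERE` filter on `content_type` is an illustrative choice; the query keeps the required `uri` and `content_type` columns and avoids joins and column aliases.

```sql
-- Sketch only: the model and table names are placeholders.
SELECT *
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.layout_parser`,
  (
    -- The query may reference only the object table, and must
    -- return uri and content_type without renaming them.
    SELECT uri, content_type
    FROM `myproject.mydataset.documents`
    WHERE content_type = 'application/pdf'
  ),
  PROCESS_OPTIONS => (
    JSON '{"layout_config": {"chunking_config": {"chunk_size": 250, "include_ancestor_headings": true}}}'
  )
);
```

Passing a query instead of `TABLE` is a convenient way to process a subset of the object table, for example a single content type.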
Output
ML.PROCESS_DOCUMENT returns the following columns:
- ml_process_document_result: a JSON value that contains the entities returned by the Document AI API.
- ml_process_document_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
- The fields returned by the processor specified in the model.
- The columns from the object table or query referenced in the function input.
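The output columns listed above can be used to keep only the rows that were processed successfully. The following is a rough sketch that assumes the `invoice_parser` model and `documents` object table from the example later on this page, and assumes that the result JSON has a top-level `entities` field, as in the sample output.

```sql
-- Sketch only: keep successfully processed rows and pull out the entities.
SELECT
  uri,
  -- Extract the entities array from the JSON result.
  JSON_QUERY(ml_process_document_result, '$.entities') AS entities
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.invoice_parser`,
  TABLE `myproject.mydataset.documents`)
-- An empty status means that the operation was successful for the row.
WHERE ml_process_document_status = '';
```

Reversing the filter to `ml_process_document_status != ''` would list the rows that failed, along with the status message for each.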
Quotas
See Cloud AI service functions quotas and limits.
For quick links to update the quotas for specific Document AI API metrics, see Quotas list.
Known issues
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.
To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
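Before reaching for those tools, it can help to measure how many rows were affected. The following is a minimal sketch, assuming the function output is first saved to a hypothetical table named `myproject.mydataset.processed_documents`, and that the status of an affected row contains the phrase "retryable error" as in the message quoted above.

```sql
-- Sketch only: processed_documents is a placeholder table name.
-- Save the output once so that the checks don't call the API again.
CREATE OR REPLACE TABLE `myproject.mydataset.processed_documents` AS
SELECT uri, ml_process_document_status, ml_process_document_result
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.invoice_parser`,
  TABLE `myproject.mydataset.documents`);

-- Count successes and rows that hit the retryable error.
SELECT
  COUNTIF(ml_process_document_status = '') AS succeeded,
  COUNTIF(ml_process_document_status LIKE '%retryable error%') AS retryable_failures,
  COUNT(*) AS total_rows
FROM `myproject.mydataset.processed_documents`;
```

A nonzero `retryable_failures` count suggests that those rows need to be processed again, for example after lowering the number of parallel queries.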
Locations
ML.PROCESS_DOCUMENT must run in the same region as the remote model that the function references. You can only create models based on Document AI in the US and EU multi-regions.
Limitations
The function can't process documents with more than 100 pages. Any row that contains such a file returns an error.
Example
The following example uses the invoice parser to process the documents represented by the documents table.
Create the model:
# Create model
CREATE OR REPLACE MODEL `myproject.mydataset.invoice_parser`
  REMOTE WITH CONNECTION `myproject.myregion.myconnection`
  OPTIONS (
    remote_service_type = 'cloud_ai_document_v1',
    document_processor = 'processor_id');
Process the documents:
SELECT *
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.invoice_parser`,
  TABLE `myproject.mydataset.documents`);
The result is similar to the following:
ml_process_document_result | ml_process_document_status | invoice_type | currency | ...
--- | --- | --- | --- | ---
{"entities":[{"confidence":1,"id":"0","mentionText":"10 105,93 10,59","pageAnchor":{"pageRefs":[{"boundingPoly":{"normalizedVertices":[{"x":0.40452111,"y":0.67199326},{"x":0.74776918,"y":0.67199326},{"x":0.74776918,"y":0.68208581},{"x":0.40452111,"y":0.68208581}]}}]},"properties":[{"confidence":0.66... | | | USD | ...
What's next
- Get step-by-step instructions on how to process documents using the ML.PROCESS_DOCUMENT function.
- To learn more about model inference, including other functions that you can use to analyze BigQuery data, see Model inference overview.
- For more information about supported SQL statements and functions for generative AI models, see End-to-end user journeys for generative AI models.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.