The ML.UNDERSTAND_TEXT function

This document describes the ML.UNDERSTAND_TEXT function, which lets you analyze text that's stored in BigQuery tables by using the Cloud Natural Language API.

Syntax

ML.UNDERSTAND_TEXT(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
  STRUCT('OPTION_NAME' AS nlu_option
    [, FLATTEN_JSON_OUTPUT AS flatten_json_output]
    [, ENCODING_TYPE AS encoding_type])
)

Arguments

ML.UNDERSTAND_TEXT takes the following arguments:

  • PROJECT_ID: the project that contains the resource.

  • DATASET: the dataset that contains the resource.

  • MODEL: the name of a remote model with a REMOTE_SERVICE_TYPE of CLOUD_AI_NATURAL_LANGUAGE_V1.

  • TABLE: the name of the BigQuery table that contains text data. The text analysis is applied to the column named text_content in this table. If your table doesn't have a text_content column, use a SELECT statement in place of this argument to provide the text_content alias for an existing table column, as shown in the following example:

    SELECT *
    FROM ML.UNDERSTAND_TEXT(
      MODEL `mydataset.mymodel`,
      (SELECT comment AS text_content FROM `mydataset.mytable`),
      STRUCT('ANALYZE_SYNTAX' AS nlu_option)
    );

    An error occurs if no text_content column is available.

  • QUERY_STATEMENT: a query whose result contains the text data. The text analysis is applied to the column in the query result named text_content. If necessary, you can alias an existing table column as text_content. For information about the supported SQL syntax of the QUERY_STATEMENT clause, see GoogleSQL query syntax.

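  • OPTION_NAME: a STRING value that specifies the feature name of a supported Natural Language API feature. The supported features are as follows:

    • ANALYZE_ENTITIES

    • ANALYZE_ENTITY_SENTIMENT

    • ANALYZE_SENTIMENT

    • ANALYZE_SYNTAX

    • CLASSIFY_TEXT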

  • FLATTEN_JSON_OUTPUT: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

  • ENCODING_TYPE: a STRING value that specifies the encoding that the Cloud Natural Language API uses to determine encoding-dependent information such as the beginOffset value. For more information, see EncodingType. You can specify this option for any NLU option except CLASSIFY_TEXT. The default value is NONE. The supported types are as follows:

    • NONE

    • UTF8

    • UTF16

    • UTF32
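The optional settings above are passed together in the STRUCT argument. The following is a minimal sketch that requests entity analysis with flattened output and UTF-8 offsets. It reuses the hypothetical mydataset.mymodel and mydataset.mytable names from the TABLE example, and it assumes that mytable has a STRING column named comment and that the model was created with a REMOTE_SERVICE_TYPE of CLOUD_AI_NATURAL_LANGUAGE_V1.

```sql
-- Sketch: analyze entities, flatten the JSON output, and report UTF-8 offsets.
-- mydataset.mymodel, mydataset.mytable, and the comment column are
-- hypothetical names; substitute your own.
SELECT *
FROM ML.UNDERSTAND_TEXT(
  MODEL `mydataset.mymodel`,
  -- Alias the source column as text_content, the column the function reads.
  (SELECT comment AS text_content FROM `mydataset.mytable`),
  STRUCT(
    'ANALYZE_ENTITIES' AS nlu_option,
    TRUE AS flatten_json_output,
    'UTF8' AS encoding_type
  )
);
```

Because flatten_json_output is TRUE, the result is expected to contain separate entities and language columns rather than a single ml_understand_text_result column, as described in the Output section.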

Output

ML.UNDERSTAND_TEXT returns the input table plus the following columns:

  • ml_understand_text_result: a JSON value that contains the text analysis result from the Natural Language API. This column is returned when flatten_json_output is FALSE.
  • entities: a JSON value that contains the recognized entities in the input document. This column is returned when flatten_json_output is TRUE and nlu_option is ANALYZE_ENTITIES or ANALYZE_ENTITY_SENTIMENT.
  • language: a STRING value that gives the language of the text. This column is returned when flatten_json_output is TRUE and nlu_option is ANALYZE_ENTITIES, ANALYZE_ENTITY_SENTIMENT, ANALYZE_SENTIMENT, or ANALYZE_SYNTAX.
  • sentiment: a JSON value that contains the overall sentiment of the input document. This column is returned when flatten_json_output is TRUE and nlu_option is ANALYZE_SENTIMENT.
  • sentences: a JSON value that contains the sentences in the document; for ANALYZE_SENTIMENT, each sentence includes its sentiment. This column is returned when flatten_json_output is TRUE and nlu_option is ANALYZE_SENTIMENT or ANALYZE_SYNTAX.
  • tokens: a JSON value that contains the tokens, along with their syntactic information, in the input document. This column is returned when flatten_json_output is TRUE and nlu_option is ANALYZE_SYNTAX.
  • categories: a JSON value that contains the categories representing the input document. This column is returned when flatten_json_output is TRUE and nlu_option is CLASSIFY_TEXT.
  • ml_understand_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
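When flatten_json_output is FALSE, individual fields can be read from ml_understand_text_result with JSON field access, in the same way that Example 2 reads categories. The following sketch extracts the overall sentiment score and magnitude. It assumes that the result follows the Natural Language API analyzeSentiment response shape, in which the overall sentiment is a documentSentiment object with score and magnitude fields; the model, table, and comment column names are hypothetical.

```sql
-- Sketch: read document-level sentiment out of the raw JSON result.
-- Assumes a response shape like
--   {"documentSentiment": {"score": ..., "magnitude": ...}, ...}
-- mydataset.mymodel, mydataset.mytable, and comment are hypothetical names.
SELECT
  text_content,
  FLOAT64(ml_understand_text_result.documentSentiment.score) AS sentiment_score,
  FLOAT64(ml_understand_text_result.documentSentiment.magnitude) AS sentiment_magnitude
FROM ML.UNDERSTAND_TEXT(
  MODEL `mydataset.mymodel`,
  (SELECT comment AS text_content FROM `mydataset.mytable`),
  STRUCT('ANALYZE_SENTIMENT' AS nlu_option)
)
-- Keep only rows that the API processed successfully (empty status).
WHERE ml_understand_text_status = '';
```

If a field is missing from a row's JSON, field access returns NULL, so the corresponding output column is NULL for that row rather than causing an error.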

Quotas

See Cloud AI service functions quotas and limits.

Known issues

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>

This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.
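Because the job itself succeeds, failed rows are only visible through ml_understand_text_status. The following sketch counts the failed rows by status message. It assumes that the function output was saved to a table; mydataset.understand_text_results is a hypothetical name.

```sql
-- Sketch: count failed rows in previously saved ML.UNDERSTAND_TEXT output.
-- mydataset.understand_text_results is a hypothetical table name.
SELECT
  ml_understand_text_status,
  COUNT(*) AS row_count
FROM `mydataset.understand_text_results`
-- An empty status means the row was processed successfully.
WHERE ml_understand_text_status != ''
GROUP BY ml_understand_text_status
ORDER BY row_count DESC;
```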

To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
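The same retry idea can also be written as a single multi-statement query. The following is a minimal, untested sketch of that idea; it is not the linked scripts or package. It assumes a hypothetical table mydataset.understand_text_results that holds earlier function output together with a unique id column, a hypothetical model mydataset.mymodel, and a limit of three attempts.

```sql
-- Sketch (untested): retry failed rows until none remain or a retry limit is hit.
-- mydataset.understand_text_results, its id column, and mydataset.mymodel are
-- hypothetical; the table is assumed to hold earlier ML.UNDERSTAND_TEXT output
-- (id, text_content, ml_understand_text_result, ml_understand_text_status).
DECLARE attempt INT64 DEFAULT 0;
DECLARE max_attempts INT64 DEFAULT 3;

WHILE attempt < max_attempts AND EXISTS (
  SELECT 1
  FROM `mydataset.understand_text_results`
  WHERE ml_understand_text_status != ''
) DO
  -- Re-run the function on failed rows only; the function passes the id
  -- input column through, so each result can be matched back to its row.
  MERGE `mydataset.understand_text_results` AS t
  USING (
    SELECT id, ml_understand_text_result, ml_understand_text_status
    FROM ML.UNDERSTAND_TEXT(
      MODEL `mydataset.mymodel`,
      (
        SELECT id, text_content
        FROM `mydataset.understand_text_results`
        WHERE ml_understand_text_status != ''
      ),
      STRUCT('ANALYZE_SENTIMENT' AS nlu_option)
    )
  ) AS s
  ON t.id = s.id
  -- Overwrite the previous failed result and status with the new attempt.
  WHEN MATCHED THEN UPDATE SET
    ml_understand_text_result = s.ml_understand_text_result,
    ml_understand_text_status = s.ml_understand_text_status;

  SET attempt = attempt + 1;
END WHILE;
```

The nlu_option in the retry must match the one used for the original run. Rows that still have a non-empty status after the last attempt can be inspected or rerun later.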

Locations

ML.UNDERSTAND_TEXT must run in the same region as the remote model that the function references. For more information about supported locations for models based on the Natural Language API, see Locations for remote models.

Examples

Example 1

The following example creates a remote model and then applies the CLASSIFY_TEXT feature to the BigQuery table mybqtable in mydataset.

# Create Model
CREATE OR REPLACE MODEL
`myproject.mydataset.mynlpmodel`
REMOTE WITH CONNECTION `myproject.myregion.myconnection`
OPTIONS (remote_service_type = 'cloud_ai_natural_language_v1');

# Understand Text
SELECT * FROM ML.UNDERSTAND_TEXT(
  MODEL `mydataset.mynlpmodel`,
  TABLE `mydataset.mybqtable`,
  STRUCT('classify_text' AS nlu_option)
);

The output is similar to the following:

ml_understand_text_result | ml_understand_text_status | text_content
-------- | -------- | --------
{"categories":[{"confidence":0.51999998,"name":"/Arts & Entertainment/TV & Video/TV Shows & Programs"}]} | | That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows.

Example 2

The following example classifies the text in the text_content column of the mybqtable table, selects the rows where the confidence is higher than 0.5, and then returns the results in separate columns.

CREATE TABLE `mydataset.classfied_result` AS (
  SELECT
    text_content AS `OriginalInput`,
    STRING(ml_understand_text_result.categories[0].name) AS `ClassifiedName`,
    FLOAT64(ml_understand_text_result.categories[0].confidence) AS `Confidence`,
    ml_understand_text_status AS `Status`
  FROM ML.UNDERSTAND_TEXT(
    MODEL `mydataset.mynlpmodel`,
    TABLE `mydataset.mybqtable`,
    STRUCT('classify_text' AS nlu_option)
  )
);

SELECT * FROM `mydataset.classfied_result` WHERE Confidence > 0.5;

The output is similar to the following:

OriginalInput | ClassifiedName | Confidence | Status
-------- | -------- | -------- | --------
That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows. | /Arts & Entertainment/TV & Video/TV Shows & Programs | 0.51999998 |

If you get an error like query limit exceeded, you might have exceeded the quota for this function, which can leave you with unprocessed rows. Use the following query to process the remaining rows:

CREATE TABLE `mydataset.classfied_result_next` AS (
  SELECT
    text_content AS `OriginalInput`,
    STRING(ml_understand_text_result.categories[0].name) AS `ClassifiedName`,
    FLOAT64(ml_understand_text_result.categories[0].confidence) AS `Confidence`,
    ml_understand_text_status AS `Status`
  FROM ML.UNDERSTAND_TEXT(
    MODEL `mydataset.mynlpmodel`,
    (
      SELECT `OriginalInput` AS text_content
      FROM `mydataset.classfied_result`
      WHERE Status != ''
    ),
    STRUCT('classify_text' AS nlu_option)
  )
);

SELECT * FROM `mydataset.classfied_result_next`;


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.