The AI.GENERATE_TABLE function

This document describes the AI.GENERATE_TABLE function, which lets you perform generative natural language tasks by using any combination of text and unstructured data from BigQuery standard tables. You can also specify a schema to format the response from the model.

The function works by sending requests to a BigQuery ML remote model that represents a Vertex AI Gemini model, and then returning that model's response. The function supports remote models over any of the generally available or preview Gemini models.

Several of the AI.GENERATE_TABLE function's arguments provide the parameters that shape the Vertex AI model's response.

You can use the AI.GENERATE_TABLE function to perform tasks such as classification, sentiment analysis, image captioning, and transcription.

Prompt design can strongly affect the responses returned by the Vertex AI model. For more information, see Introduction to prompting.

Input

You can use the AI.GENERATE_TABLE function with text data and with unstructured data, such as images and videos, from BigQuery standard tables.

When you analyze unstructured data, that data must meet the following requirements:

  • Content must be in one of the supported formats that are described in the Gemini API model mimeType parameter.
  • If you are analyzing a video, the maximum supported length is two minutes. If the video is longer than two minutes, AI.GENERATE_TABLE only returns results for the first two minutes.

Syntax

AI.GENERATE_TABLE(
  MODEL `PROJECT_ID.DATASET.MODEL`,
  { TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) },
  STRUCT(
    OUTPUT_SCHEMA AS output_schema
    [, MAX_OUTPUT_TOKENS AS max_output_tokens]
    [, TOP_P AS top_p]
    [, TEMPERATURE AS temperature]
    [, STOP_SEQUENCES AS stop_sequences]
    [, SAFETY_SETTINGS AS safety_settings]
    [, REQUEST_TYPE AS request_type]))

Arguments

AI.GENERATE_TABLE takes the following arguments:

  • PROJECT_ID: the project that contains the resource.

  • DATASET: the dataset that contains the resource.

  • MODEL: the name of the remote model. For more information, see The CREATE MODEL statement for remote models over LLMs.

    Note: Using a remote model based on a Gemini 2.5 model incurs charges for the thinking process.
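    For example, you might create a remote model over a Gemini model with a statement similar to the following sketch. The connection name and endpoint shown here are placeholders; use your own Cloud resource connection and a supported Gemini endpoint.

    CREATE OR REPLACE MODEL `mydataset.gemini_model`
      REMOTE WITH CONNECTION `us.my_connection`
      OPTIONS (ENDPOINT = 'gemini-2.0-flash');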
  • TABLE: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the QUERY_STATEMENT argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.
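    For example, if your prompt text is stored in a hypothetical column named review_text, you can alias that column in the query statement so that the function receives a prompt column:

    (SELECT review_text AS prompt FROM `mydataset.my_table`)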

  • QUERY_STATEMENT: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt. Within the query, you can provide the prompt value in the following ways:

    • Specify a STRING value. For example, ('Write a poem about birds').
    • Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the struct value:

      • STRING: a string literal, or the name of a STRING column.

        String literal: 'Is Seattle a US city?'

        String column name: my_string_column

      • ARRAY<STRING>: an array of strings. You can only use string literals in the array.

        Array of string literals: ['Is ', 'Seattle', ' a US city']

      • ObjectRefRuntime: an ObjectRefRuntime value returned by the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

        Function call with an ObjectRef column: OBJ.GET_ACCESS_URL(my_objectref_column, 'r')

        Function call with a constructed ObjectRef value: OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image.jpg', 'myconnection'), 'r')

      • ARRAY<ObjectRefRuntime>: ObjectRefRuntime values returned from multiple calls to the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

        Function calls with ObjectRef columns: [OBJ.GET_ACCESS_URL(my_objectref_column1, 'r'), OBJ.GET_ACCESS_URL(my_objectref_column2, 'r')]

        Function calls with constructed ObjectRef values: [OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image1.jpg', 'myconnection'), 'r'), OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image2.jpg', 'myconnection'), 'r')]

      The function combines STRUCT fields similarly to a CONCAT operation and concatenates the fields in their specified order. The same is true for the elements of any arrays used within the struct. The following examples show some STRUCT prompt values and how they are interpreted:

      • STRUCT<STRING>: ('Describe the city of Seattle') is equivalent to the prompt 'Describe the city of Seattle'.

      • STRUCT<STRING, STRING, STRING>: ('Describe the city ', my_city_column, ' in 15 words') is equivalent to the prompt 'Describe the city my_city_column_value in 15 words'.

      • STRUCT<STRING, ARRAY<STRING>>: ('Describe ', ['the city of', 'Seattle']) is equivalent to the prompt 'Describe the city of Seattle'.

      • STRUCT<STRING, ObjectRefRuntime>: ('Describe this city', OBJ.GET_ACCESS_URL(image_objectref_column, 'r')) is equivalent to the prompt 'Describe this city' followed by the referenced image.

      • STRUCT<STRING, ObjectRefRuntime, ObjectRefRuntime>: ('If the city in the first image is within the country of the second image, provide a ten word description of the city', OBJ.GET_ACCESS_URL(city_image_objectref_column, 'r'), OBJ.GET_ACCESS_URL(country_image_objectref_column, 'r')) is equivalent to the prompt 'If the city in the first image is within the country of the second image, provide a ten word description of the city' followed by the referenced city and country images.
    Note: To minimize Vertex AI charges, write query results to a table and then reference that table in the AI.GENERATE_TABLE function. This can help you ensure that you are sending as few rows as possible to the model.
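    For example, assuming a hypothetical mydataset.reviews table with a review column, you might materialize only the rows that you intend to analyze, and then reference the resulting table in the TABLE argument:

    CREATE OR REPLACE TABLE `mydataset.reviews_to_analyze` AS
    SELECT CONCAT('Extract the key sentiment from the following review: ', review) AS prompt
    FROM `mydataset.reviews`
    WHERE review IS NOT NULL
    LIMIT 1000;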
  • OUTPUT_SCHEMA: a schema to use to format the model's response. The OUTPUT_SCHEMA value must be a SQL schema definition, similar to that used in the CREATE TABLE statement. The following data types are supported:

    • INT64
    • FLOAT64
    • BOOL
    • STRING
    • ARRAY
    • STRUCT

    When using the OUTPUT_SCHEMA argument to generate structured data based on prompts from a table, it is important to understand the prompt data in order to specify an appropriate schema.

    For example, say you are analyzing movie review content from a table that has the following fields:

    • movie_id
    • review
    • prompt

    Then you might create prompt text by running a query similar to the following:

    UPDATE `mydataset.movie_review`
    SET prompt = CONCAT('Extract the key words and key sentiment from the text below: ', review)
    WHERE review IS NOT NULL;

    And you might specify an OUTPUT_SCHEMA value similar to "keywords ARRAY<STRING>, sentiment STRING" AS output_schema.
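    Putting these pieces together, a call similar to the following returns the generated keywords and sentiment columns alongside the input columns. The model name is a placeholder for your own remote model.

    SELECT movie_id, review, keywords, sentiment
    FROM AI.GENERATE_TABLE(
      MODEL `mydataset.gemini_model`,
      (SELECT movie_id, review, prompt FROM `mydataset.movie_review` WHERE prompt IS NOT NULL),
      STRUCT("keywords ARRAY<STRING>, sentiment STRING" AS output_schema));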

The following optional arguments are also supported:

  • MAX_OUTPUT_TOKENS: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. If you don't specify a value, the model determines an appropriate value.
  • TOP_P: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.

    Tokens are selected from the most to least probable until the sum of their probabilities equals the TOP_P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the TOP_P value is 0.5, then the model selects either A or B as the next token by using the TEMPERATURE value and doesn't consider C.

  • TEMPERATURE: a FLOAT64 value in the range [0.0,2.0] that controls the degree of randomness in token selection. Lower TEMPERATURE values are good for prompts that require a more deterministic and less open-ended or creative response, while higher TEMPERATURE values can lead to more diverse or creative results. A TEMPERATURE value of 0 is deterministic, meaning that the highest probability response is always selected. If you don't specify a value, the model determines an appropriate value.
  • STOP_SEQUENCES: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
  • SAFETY_SETTINGS: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.

    Supported categories are as follows:

    • HARM_CATEGORY_HATE_SPEECH
    • HARM_CATEGORY_DANGEROUS_CONTENT
    • HARM_CATEGORY_HARASSMENT
    • HARM_CATEGORY_SEXUALLY_EXPLICIT

    Supported thresholds are as follows:

    • BLOCK_NONE (Restricted)
    • BLOCK_LOW_AND_ABOVE
    • BLOCK_MEDIUM_AND_ABOVE (Default)
    • BLOCK_ONLY_HIGH
    • HARM_BLOCK_THRESHOLD_UNSPECIFIED

    For more information, see Harm categories and How to configure content filters.
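    For example, the following options struct (shown with a hypothetical output schema) blocks only high-probability dangerous content and leaves the other categories at their default thresholds:

    STRUCT(
      "summary STRING" AS output_schema,
      [STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category,
              'BLOCK_ONLY_HIGH' AS threshold)] AS safety_settings)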

  • REQUEST_TYPE: a STRING value that specifies the type of inference request to send to the Gemini model. The request type determines what quota the request uses. Valid values are as follows:
    • DEDICATED: the AI.GENERATE_TABLE function only uses Provisioned Throughput quota. The AI.GENERATE_TABLE function returns the error Provisioned throughput is not purchased or is not active if Provisioned Throughput quota isn't available.
    • SHARED: the AI.GENERATE_TABLE function only uses dynamic shared quota (DSQ), even if you have purchased Provisioned Throughput quota.
    • UNSPECIFIED: the AI.GENERATE_TABLE function uses quota as follows:
      • If you haven't purchased Provisioned Throughput quota, the AI.GENERATE_TABLE function uses DSQ quota.
      • If you have purchased Provisioned Throughput quota, the AI.GENERATE_TABLE function uses the Provisioned Throughput quota first. If requests exceed the Provisioned Throughput quota, the overflow traffic uses DSQ quota.

    The default value is UNSPECIFIED.
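The following call sketches how several of these optional arguments can be combined in the options struct. The model, table, and schema values are placeholders:

SELECT *
FROM AI.GENERATE_TABLE(
  MODEL `mydataset.gemini_model`,
  TABLE `mydataset.prompts`,
  STRUCT(
    "summary STRING" AS output_schema,
    512 AS max_output_tokens,
    0.2 AS temperature,
    0.95 AS top_p,
    'SHARED' AS request_type));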

Examples

The following examples demonstrate how to use AI.GENERATE_TABLE.

Format text input

The following example shows a request that provides a SQL schema to format the model's response:

SELECT address, age, is_married, name, phone_number, weight_in_pounds
FROM AI.GENERATE_TABLE(
  MODEL `mydataset.gemini_model`,
  (
    SELECT 'John Smith is a 20-year old single man living at 1234 NW 45th St, Kirkland WA, 98033. He has two phone numbers 123-123-1234, and 234-234-2345. He is 200.5 pounds.' AS prompt
  ),
  STRUCT(
    "address STRING, age INT64, is_married BOOL, name STRING, phone_number ARRAY<STRING>, weight_in_pounds FLOAT64" AS output_schema,
    8192 AS max_output_tokens));

The results look similar to the following:

+-------------------------------------+-----+------------+------------+---------------+------------------+
| address                             | age | is_married | name       | phone_number  | weight_in_pounds |
+-------------------------------------+-----+------------+------------+---------------+------------------+
| 1234 NW 45th St, Kirkland WA, 98033 | 20  | No         | John Smith | 123-123-1234  | 200.5            |
|                                     |     |            |            | 234-234-2345  |                  |
+-------------------------------------+-----+------------+------------+---------------+------------------+

Create a column based on image data

The following example shows how to create and populate an image_description column by analyzing a product image that is stored as an ObjectRef value in a standard table:

CREATE OR REPLACE TABLE `mydataset.products` AS
SELECT product_id, product_name, image, image_description
FROM AI.GENERATE_TABLE(
  MODEL `mydataset.gemini`,
  (
    SELECT ('Can you describe the following image?', OBJ.GET_ACCESS_URL(image, 'r')) AS prompt, *
    FROM `mydataset.products`
  ),
  STRUCT("image_description STRING" AS output_schema));

Details

The model and input table must be in the same region.

Output

AI.GENERATE_TABLE returns the following columns:

  • All columns in the input table.

  • All columns specified in the output_schema argument.

  • full_response: this is the JSON response from the projects.locations.endpoints.generateContent call to the model. The generated data is in the text element.

  • status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
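For example, if you save the function's output to a table, you can find the rows that weren't processed successfully by filtering on the status column. The table name here is a placeholder:

SELECT *
FROM `mydataset.generate_table_results`
WHERE status != '';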

Locations

AI.GENERATE_TABLE must run in the same region or multi-region as the remote model that the function references.

You can create remote models over Gemini models in the supported regions for the given Gemini model, and also in the US and EU multi-regions.

Quotas

See Vertex AI and Cloud AI service functions quotas and limits.

Known issues

This section contains information about known issues.

Resource exhausted errors

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>

This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.

To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
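If you prefer to script the retries yourself, the following sketch shows one way to re-run inference for failed rows by using BigQuery procedural language. It assumes the mydataset.movie_review table and output schema from the earlier example, a hypothetical remote model named mydataset.gemini_model, and a cap of five attempts; the linked scripts and Dataform package provide more complete implementations.

DECLARE failed INT64 DEFAULT 0;
DECLARE attempts INT64 DEFAULT 0;

-- First pass over all rows that have a prompt.
CREATE OR REPLACE TABLE `mydataset.movie_review_results` AS
SELECT *
FROM AI.GENERATE_TABLE(
  MODEL `mydataset.gemini_model`,
  (SELECT movie_id, prompt FROM `mydataset.movie_review` WHERE prompt IS NOT NULL),
  STRUCT("keywords ARRAY<STRING>, sentiment STRING" AS output_schema));

SET failed = (SELECT COUNT(*) FROM `mydataset.movie_review_results` WHERE status != '');

-- Re-run inference only for rows that returned an error, up to five attempts.
WHILE failed > 0 AND attempts < 5 DO
  CREATE OR REPLACE TEMP TABLE retry_results AS
  SELECT *
  FROM AI.GENERATE_TABLE(
    MODEL `mydataset.gemini_model`,
    (SELECT movie_id, prompt FROM `mydataset.movie_review_results` WHERE status != ''),
    STRUCT("keywords ARRAY<STRING>, sentiment STRING" AS output_schema));

  -- Replace the failed rows with the retried results.
  DELETE FROM `mydataset.movie_review_results` WHERE status != '';

  INSERT INTO `mydataset.movie_review_results`
    (movie_id, prompt, keywords, sentiment, full_response, status)
  SELECT movie_id, prompt, keywords, sentiment, full_response, status
  FROM retry_results;

  SET failed = (SELECT COUNT(*) FROM `mydataset.movie_review_results` WHERE status != '');
  SET attempts = attempts + 1;
END WHILE;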

