Prepare data with Gemini
This document describes how to generate and manage SQL code suggestions for yourdata preparations in BigQuery.
For more information, seeIntroduction to BigQuery data preparation.
Before you begin
Open the data preparation editor in BigQuery
You can open the data preparation editor in BigQuery by creating a newdata preparation, creating one from an existing table or Cloud Storage file,or opening an existing data preparation. For more information about what happenswhen you create a data preparation, seeData preparation entry points.
On theBigQuery page, you can go to the data preparation editor inthe following ways:
Create new
To create a new data preparation in BigQuery, follow these steps:
- In the Google Cloud console, go to theBigQuery page.
Go to BigQuery - Go to theCreate new list and clickData preparation. The data preparation editor is displayed in a new untitled data preparation tab.
- In the editor's search bar, enter your table name or keywords and select a table. The data preparation editor for the table opens, showing a preview of your data on theData tab, and an initial set of data preparation suggestions from Gemini.
- Optional: To simplify your view, turn on full screen mode by clickingfullscreenFull screen.
Optional: To view data preparation details, version history, add new comments, or reply to existing comments, use the toolbar.

TheComments toolbar feature is inPreview. To provide feedback or request support for this feature, send an email tobqui-workspace-pod@google.com.
Create from a table
To create a new data preparation from an existing table, follow these steps:
- In the Google Cloud console, go to theBigQuery page.
Go to BigQuery - In the left pane, clickExplorer.
- In theExplorer pane, expand your project, clickDatasets, and then select a dataset.
- For your table name, clickmore_vertActions> Open in> Data preparation. The data preparation editor for the table opens, showing a preview of your data on theData tab, and an initial set of data preparation suggestions from Gemini.
- Optional: To simplify your view, turn on full screen mode by clickingfullscreenFull screen.
Optional: To view data preparation details, version history, add new comments, or reply to existing comments, use the toolbar.

TheComments toolbar feature is inPreview. To provide feedback or request support for this feature, send an email tobqui-workspace-pod@google.com.
Create from a Cloud Storage file
Preview This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of theService Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see thelaunch stage descriptions.
To create a new data preparation from a file in Cloud Storage, follow these steps:
Load the file
- In the Google Cloud console, go to theBigQuery page.
Go to BigQuery - In theCreate new list, clickData preparation. The data preparation editor is displayed in a new untitled data preparation tab.
- In the list of data sources, clickGoogle Cloud Storage. ThePrepare data dialog opens.
- In theSource section, select the file from a Cloud Storage bucket or enter the path of your source. For example, enter a path to your CSV file:
STORAGE_BUCKET_NAME/FILE_NAME.csv. Wildcard searches, such as*.csv, are supported.
The file format is automatically detected. Supported formats are Avro, CSV, JSONL, ORC, and Parquet. Other compatible file types, such as DAT, TSV, and TXT, are read as the CSV format. - Define the external staging table where you'll upload files. In theStaging table section, enter the project, dataset, and table names for the new table.
- In theSchema section, review the schema. Gemini checks your file for column names. If it doesn't find any, it provides suggestions.
By default, your data preparation file loads data as strings. You can define more specific data types when youprepare the file data. - Optional: InAdvanced options, you can add more information, such as the number of errors allowed before the job fails. Gemini provides additional options based on your file's content.
- ClickCreate. The data preparation editor for the file opens, showing a preview of your data on theData tab, and an initial set of data preparation suggestions from Gemini.
- Optional: To simplify your view, turn on full screen mode by clickingfullscreenFull screen.
Optional: To view data preparation details, version history, add new comments, or reply to existing comments, use the toolbar.

TheComments toolbar feature is inPreview. To provide feedback or request support for this feature, send an email tobqui-workspace-pod@google.com.
Prepare the file
In the data view, prepare the staged Cloud Storage data that you loaded by following these steps:
- Optional: Define stronger data types for relevant columns by browsing the suggestion list for transformation suggestions or selecting a column and generating suggestions for it.
- Optional: Define validation rules. For more information, seeConfigure the error table and add a validation rule.
- Add a destination table.
- To load the Cloud Storage data into the destination table,run the data preparation.
- Optional: Schedule the data preparation run.
- Optional: Optimize data preparation by incrementally processing data.
Open existing
To open the editor for an existing data preparation, follow these steps:
- In the Google Cloud console, go to theBigQuery page.
Go to BigQuery - In the left pane, clickExplorer.
- In theExplorer pane, click your project name, and then clickData preparations.
- Select the existing data preparation. The graph view of the data preparation pipeline is displayed.
- Select one of the nodes in the graph. The data preparation editor for the table opens, showing a preview of your data on theData tab, and an initial set of data preparation suggestions from Gemini.
- Optional: To simplify your view, turn on full screen mode by clickingfullscreenFull screen.
Optional: To view data preparation details, version history, add new comments, or reply to existing comments, use the toolbar.

TheComments toolbar feature is inPreview. To provide feedback or request support for this feature, send an email tobqui-workspace-pod@google.com.
Add data preparation steps
You prepare data in steps. You can preview or apply steps suggested byGemini. You can also improve the suggestions, or apply your ownsteps.
Apply and improve suggestions by Gemini
When you open the data preparation editor for your table, Geminiinspects the data and schema from the table you loaded and generates filter andtransformation suggestions. The suggestions appear on cards in theStepslist.
The following image shows where you can apply and improve steps suggested byGemini:

To apply a suggestion by Gemini as a data preparation step, dothe following:
- In the data view, click a column name or a particular cell.Gemini generates suggestions for filtering and transformingthe data.
Optional: To improve the suggestions, edit the values of one to three cellsin the table to demonstrate what the values in a column should look like.For example, enter a date the way you want to format all dates.Gemini generates new suggestions based on your changes.
Note: Your example change to the data isn't saved.The following image shows how you can edit values to improve the stepssuggested by Gemini:

Select a suggestion card.
- Optional: To preview the result of the suggestion card, clickPreview.
- Optional: To modify the suggestion card using natural language, clickEdit.
ClickApply.
Add steps with natural language or SQL expressions
If existing suggestions don't meet your needs, add a step. Choose columns or astep type, then describe what you want using natural language.
Add a transformation
- In the data or schema view, choose theTransform option. You can alsochoose columns or add examples to help Gemini understand yourdata transformation.
- In theDescription field, enter a prompt, such as
Convert the statecolumn to uppercase. ClicksendSend.
Gemini generates a SQL expression and a new description basedon your prompt.
In theTarget column list, select or enter a column name.
Optional: To update the SQL expression, revise the prompt and clicksendSend, or manually enter a SQL expression.
Optional: ClickPreview and review the step.
ClickApply.
Flatten JSON columns
To make key-value pairs easier to access and analyze, flatten JSON columns. Forexample, if you have a JSON column nameduser_properties that contains thekeyscountry anddevice_type, flattening this column extractscountry anddevice_type into their own top-level columns so you can use them directly inyour analysis.
Gemini for BigQuery suggests operations thatextract fields only from the top level of the JSON. If these extracted fieldscontain more JSON objects, you can flatten them in additional steps to accesstheir contents.
- In the data view for a JSON source table, choose a column or cells.
- ClickFlatten to generate suggestions.
- Optional: To update the SQL expression, you can manually enter a SQLexpression.
- Optional: ClickPreview and review the step.
- ClickApply.
Flattening has the following behaviors:
- TheFlatten option appears in the data view after you select cells orcolumns containing JSON. It doesn't appear by default when you clickAddstep.
- If a JSON key isn't present in the selected rows, the generated suggestiondoesn't contain that key. This issue might cause some columns to be left outwhen data is flattened.
- If column names collide during flattening, the repeated column names end inthis format:
_<i>. For example, if there's already a column namedaddress, the new flattened column name isaddress_1. - Flattened column names follow the BigQuerycolumn naming conventions.
- If you leave the JSON key field empty, the default column name format is
f<i>_.
FlattenRECORD orSTRUCT columns
To make nested fields easier to access and analyze, flatten columns with theRECORD orSTRUCT data type. For example, if you have anevent_log recordthat contains the fieldstimestamp andaction, flattening this recordextractstimestamp andaction into their own top-level columns so you cantransform them directly.
This process extracts all nested columns from the record, up to 10 levels deep,and creates a new column for each. The new column names are created by combiningthe parent column's name with the nested field name, separated by an underscore(for example,PARENT-COLUMN-NAME_FIELD-NAME). The originalcolumn is dropped. To keep the original column, you candelete theDrop column step from theAppliedsteps list.
To flatten records, follow these steps:
- In the data view for a source table, choose a record column.
- ClickFlatten to generate suggestions.
- Optional: To update the SQL expression, you can manually enter a SQLexpression.
- Optional: ClickPreview and review the step.
- ClickApply.
Unnest arrays
Unnesting expands each element in an array into its own row, duplicating theother original column values into each new row. This action is useful foranalyzing columns that contain arrays with a variable number of elements, suchas lists of API responses.
You can unnest the following column types:
ARRAYdata type: Unnests into elements of the array's base type. Forexample, anARRAY<STRUCT<...>>unnesting results in elements of typeSTRUCT.JSONcolumns: Unnests JSON arrays within the column into elements oftypeJSON.
When you unnest an array, a new column is created that contains the unnestedelements. By default, the original array column is dropped. To keep the originalcolumn,delete theDrop column step from theApplied steps list.
To unnest arrays, follow these steps:
- In the data view for a source table, choose an
ARRAYcolumn. - ClickUnnest to generate suggestions.
- Optional: To update the SQL expression, you can manually enter a SQLexpression.
- Optional: ClickPreview and review the step.
- ClickApply.
Filter rows
To add a filter that removes rows, follow these steps:
- In the data or schema view, choose theFilter option. You can alsochoose columns to help Gemini understand your data filter.
- In theDescription field, enter a prompt, such as
Column ID should notbe NULL. - ClickGenerate.Gemini generates a SQL expression and a new description basedon your prompt.
- Optional: To update the SQL expression, revise the prompt and clicksendSend, or enter a SQL expression manually.
- Optional: ClickPreview and review the step.
- ClickApply.
Filter expression format
SQL expressions for filters retain rows that match the specified condition. Thisis equivalent to aSELECT … WHERESQL_EXPRESSIONstatement.
For example, to retain records where the columnyear is greater than or equalto2000, the condition isyear >= 2000.
Expressions must follow the BigQuery SQL syntax for theWHERE clause.
Deduplicate data
To remove duplicate rows from your data, follow these steps:
- In the data or schema view, choose theDeduplicate option.Gemini provides an initial deduplication suggestion.
- Optional: To refine the suggestion, enter a new description and clicksendSend.
- Optional: To manually configure the deduplication step, use the followingoptions:
- In theRecord choosing list, select one of the following strategies:
- First: For each group of rows with the same deduplication keyvalues, this strategy chooses the first row based on the
ORDER BYexpression and removes the rest. - Last: For each group of rows with the same deduplication keyvalues, this strategy chooses the last row based on the
ORDER BYexpression and removes the rest. - Any: For each group of rows with the same deduplication keyvalues, this strategy chooses any row from that group and removesthe rest.
- Distinct: Removes all duplicate rows across all columns in thetable.
- First: For each group of rows with the same deduplication keyvalues, this strategy chooses the first row based on the
- In theDeduplication keys field, choose one or more columns orexpressions to identify duplicate rows. This field is applicablewhen the record choosing strategy isFirst,Last, orAny.
- In theOrder by expression field, enter an expression that definesthe row order. For example, to choose the most recent row, enter
datetime DESC. To choose the first row alphabetically by name, enter acolumn name likelast_name. The expression follows the same rules asthe standardORDER BYclause in BigQuery. This fieldis only applicable when the record choosing strategy isFirst orLast.
- In theRecord choosing list, select one of the following strategies:
- Optional: ClickPreview and review the step.
- ClickApply.
Delete a column
To delete one or more columns from a data preparation, follow these steps:
- In the data or schema view, select the columns you want to drop.
- ClickDrop. A new applied step is added for the deleted columns.
Add a join operation with Gemini
To add a join operation step between two sources in your data preparation,follow these steps:
- In the data view for a node in your data preparation, go to theSuggestions list, and click theJoin option.
- In theAdd join dialog, clickBrowse, and then select the othertable involved in the join operation (referred to as the right side of thejoin).
- Optional: Select the type ofjoin operation that you want to perform, such asInner join.
Review the Gemini-generated join key information in thefollowing fields:
- Join description: The natural language description of the SQLexpression for the join operation. When you edit this description andclicksendSend, Gemini suggests new SQL join conditions.
Join conditions: The SQL expressions within the
Note: If theJoin conditions field is empty, the join operation typeis automatically set toCross join, even if you selected a differentjoin type in the previous step.ONclause for thejoin operation. You can use theLandRqualifiers to refer to theleft and right source tables, respectively. For example, to join thecustomer_idcolumn from the left table to thecustomer_idcolumn fromthe right table, enterL.customerId = R.customerId. These qualifiersaren't case-sensitive.
Optional: To refine the suggestions from Gemini, edit theJoin description field, and then clicksendSend.
Optional: To preview the join operation settings of your data preparation,clickPreview.
ClickApply.
The join operation step is created. The source table that you selected (theright side of the join) and the join operation are reflected in the list ofapplied steps and in the nodes in the graph view of your data preparation.
Aggregate data
- In the data or schema view, choose theAggregate option.
- In theDescription field, enter a prompt, such as
Find the totalrevenue for a region. ClickSend.
Gemini generates grouping keys and aggregation expressionsbased on your prompt.
Optional: Edit the generated grouping keys or aggregation expressions,if needed.
Optional: You can manually add grouping keys and aggregation expressions.
- In theGrouping keys field, enter a column name or expression. If youleave it blank, the resulting table has one row. If you enter anexpression, it must have an alias (an
ASclause)—for exampleEXTRACT(YEAR FROM order_date) AS order_year. No duplicates areallowed. - In theAggregation expressions field, enter an aggregationexpression that has an alias (an
ASclause)—for exampleSUM(quantity) AS total_quantity. You can enter multiple,comma-separated expressions. No duplicates are allowed. For a list ofthe supported aggregation expressions, seeAggregate functions.
- In theGrouping keys field, enter a column name or expression. If youleave it blank, the resulting table has one row. If you enter anexpression, it must have an alias (an
Optional: ClickPreview and review the step.
ClickApply.
Configure the error table and add a validation rule
You can add a filter that creates a validation rule, which sends errors to anerror table or fails the data preparation run.
Configure the error table
To configure your error table, follow these steps:
- In the data preparation editor, go to the toolbar and clickMore> Error table.
- ClickEnable error table.
- Define the table location.
- Optional: Define a maximum duration for keeping errors.
- ClickSave.
Add a validation rule
To add a validation rule, follow these steps:
- In the data or schema view, click theFilter option. You can also choosecolumns to help Gemini understand your data filter.
- Enter a description for the step.
- Enter a SQL expression, in the form of a
WHEREclause. - Optional: If you want the SQL expression to act as a validation rule, selecttheFailed validation rows go to error table checkbox. You can alsochange a filter to a validation in the data preparation toolbar by clickingMore> Error table.
- Optional: ClickPreview and review the step.
- ClickApply.
Add or change a destination table
A destination table is required to run or schedule your data preparation. Toadd or change a destination table for the output of your data preparation,follow these steps:
- In the data or schema view, clickDestination in theSuggestionslist.
- Select the project where the destination table is stored.
- Select one of the datasets, or load a new dataset.
- Enter a destination table. If the table doesn't exist, the data preparationcreates a new table on the first run. For more information, seeWrite mode.
- Select your dataset as the destination dataset.
- ClickSave.
View the data sample and schema for an applied step
To view sample and schema details at a particular step in the data preparation,do the following:
- In the data preparation editor, go to theSteps list and clickAppliedsteps.
- Select a step. TheData andSchema tabs appear, displaying the datasample and schema as of this particular step.
Edit an applied step
To edit an applied step, do the following:
- In the data preparation editor, go to theSteps list and clickAppliedsteps.
- Select a step.
- Next to the step, clickmore_vertMenu> Edit.
- In theEdit Applied Step dialog, you can do the following:
- Edit the description of the step.
- Get suggestions from Gemini by editing the descriptionand clickingsendSend.
- Edit the SQL expression.
- In theTarget column field, select a column.
- Optional: ClickPreview and review the step.
- ClickApply.
Delete an applied step
To delete an applied step, do the following:
- In the data preparation editor, go to theSteps list and clickAppliedsteps.
- Select a step.
- Clickmore_vertMenu> Delete.
Run the data preparation
After you've added your data preparation steps,configured the destination,and fixed any validation errors, you can perform test runs on a sample of thedata, or deploy the steps and schedule data preparation runs. For moreinformation, seeSchedule data preparations.
Refresh data preparation samples
Data in the sample isn't automatically refreshed. If data in the source tablesfor the data preparation has changed, but the changes aren't reflected in thedata sample of the preparation, clickMore> Refresh sample.
What's next
- Learn how toschedule data preparations.
- Learn aboutmanaging data preparations.
- Learn aboutquotas and limits for Gemini in BigQuery.
- ReviewGemini in BigQuery pricing.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.