Prepare data with Gemini

This document describes how to generate and manage SQL code suggestions for yourdata preparations in BigQuery.

For more information, seeIntroduction to BigQuery data preparation.

Before you begin

Open the data preparation editor in BigQuery

You can open the data preparation editor in BigQuery by creating a newdata preparation, creating one from an existing table or Cloud Storage file,or opening an existing data preparation. For more information about what happenswhen you create a data preparation, seeData preparation entry points.

On theBigQuery page, you can go to the data preparation editor inthe following ways:

Create new

To create a new data preparation in BigQuery, follow these steps:

In the Google Cloud console, go to theBigQuery page.
Go to BigQuery
Go to theCreate new list and clickData preparation. The data preparation editor is displayed in a new untitled data preparation tab.
In the editor's search bar, enter your table name or keywords and select a table. The data preparation editor for the table opens, showing a preview of your data on theData tab, and an initial set of data preparation suggestions from Gemini.
Optional: To simplify your view, turn on full screen mode by clickingfullscreenFull screen.
Optional: To view data preparation details, version history, add new comments, or reply to existing comments, use the toolbar.
TheComments toolbar feature is inPreview. To provide feedback or request support for this feature, send an email tobqui-workspace-pod@google.com.

Create from a table

To create a new data preparation from an existing table, follow these steps:

In the Google Cloud console, go to theBigQuery page.
Go to BigQuery
In the left pane, clickExplorer.
In theExplorer pane, expand your project, clickDatasets, and then select a dataset.
For your table name, clickmore_vertActions> Open in> Data preparation. The data preparation editor for the table opens, showing a preview of your data on theData tab, and an initial set of data preparation suggestions from Gemini.
Optional: To simplify your view, turn on full screen mode by clickingfullscreenFull screen.
Optional: To view data preparation details, version history, add new comments, or reply to existing comments, use the toolbar.
TheComments toolbar feature is inPreview. To provide feedback or request support for this feature, send an email tobqui-workspace-pod@google.com.

Create from a Cloud Storage file

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see thelaunch stage descriptions.

Note: To provide feedback or request support for this feature, contact bq-datapreparation-feedback@google.com.

To create a new data preparation from a file in Cloud Storage, follow these steps:

Load the file

In the Google Cloud console, go to theBigQuery page.
Go to BigQuery
In theCreate new list, clickData preparation. The data preparation editor is displayed in a new untitled data preparation tab.
In the list of data sources, clickGoogle Cloud Storage. ThePrepare data dialog opens.
In theSource section, select the file from a Cloud Storage bucket or enter the path of your source. For example, enter a path to your CSV file:STORAGE_BUCKET_NAME/FILE_NAME.csv. Wildcard searches, such as*.csv, are supported.

The file format is automatically detected. Supported formats are Avro, CSV, JSONL, ORC, and Parquet. Other compatible file types, such as DAT, TSV, and TXT, are read as the CSV format.
Define the external staging table where you'll upload files. In theStaging table section, enter the project, dataset, and table names for the new table.
In theSchema section, review the schema. Gemini checks your file for column names. If it doesn't find any, it provides suggestions.

By default, your data preparation file loads data as strings. You can define more specific data types when youprepare the file data.
Optional: InAdvanced options, you can add more information, such as the number of errors allowed before the job fails. Gemini provides additional options based on your file's content.
ClickCreate. The data preparation editor for the file opens, showing a preview of your data on theData tab, and an initial set of data preparation suggestions from Gemini.
Optional: To simplify your view, turn on full screen mode by clickingfullscreenFull screen.
Optional: To view data preparation details, version history, add new comments, or reply to existing comments, use the toolbar.
TheComments toolbar feature is inPreview. To provide feedback or request support for this feature, send an email tobqui-workspace-pod@google.com.

Prepare the file

In the data view, prepare the staged Cloud Storage data that you loaded by following these steps:

Optional: Define stronger data types for relevant columns by browsing the suggestion list for transformation suggestions or selecting a column and generating suggestions for it.
Optional: Define validation rules. For more information, seeConfigure the error table and add a validation rule.
Add a destination table.
To load the Cloud Storage data into the destination table,run the data preparation.
Optional: Schedule the data preparation run.
Optional: Optimize data preparation by incrementally processing data.

Open existing

To open the editor for an existing data preparation, follow these steps:

In the Google Cloud console, go to theBigQuery page.
Go to BigQuery
In the left pane, clickExplorer.
In theExplorer pane, click your project name, and then clickData preparations.
Select the existing data preparation. The graph view of the data preparation pipeline is displayed.
Select one of the nodes in the graph. The data preparation editor for the table opens, showing a preview of your data on theData tab, and an initial set of data preparation suggestions from Gemini.
Optional: To simplify your view, turn on full screen mode by clickingfullscreenFull screen.
Optional: To view data preparation details, version history, add new comments, or reply to existing comments, use the toolbar.
TheComments toolbar feature is inPreview. To provide feedback or request support for this feature, send an email tobqui-workspace-pod@google.com.

Add data preparation steps

You prepare data in steps. You can preview or apply steps suggested byGemini. You can also improve the suggestions, or apply your ownsteps.

Apply and improve suggestions by Gemini

When you open the data preparation editor for your table, Geminiinspects the data and schema from the table you loaded and generates filter andtransformation suggestions. The suggestions appear on cards in theStepslist.

The following image shows where you can apply and improve steps suggested byGemini:

Data view in the data preparation editor showing options to preview, edit, or apply suggestions from Gemini.

To apply a suggestion by Gemini as a data preparation step, dothe following:

In the data view, click a column name or a particular cell.Gemini generates suggestions for filtering and transformingthe data.
Optional: To improve the suggestions, edit the values of one to three cellsin the table to demonstrate what the values in a column should look like.For example, enter a date the way you want to format all dates.Gemini generates new suggestions based on your changes.
Note: Your example change to the data isn't saved.
The following image shows how you can edit values to improve the stepssuggested by Gemini:
Select a suggestion card.
1. Optional: To preview the result of the suggestion card, clickPreview.
2. Optional: To modify the suggestion card using natural language, clickEdit.
ClickApply.

Add steps with natural language or SQL expressions

If existing suggestions don't meet your needs, add a step. Choose columns or astep type, then describe what you want using natural language.

Add a transformation

In the data or schema view, choose theTransform option. You can alsochoose columns or add examples to help Gemini understand yourdata transformation.
In theDescription field, enter a prompt, such asConvert the statecolumn to uppercase.
ClicksendSend.
Gemini generates a SQL expression and a new description basedon your prompt.
In theTarget column list, select or enter a column name.
Optional: To update the SQL expression, revise the prompt and clicksendSend, or manually enter a SQL expression.
Optional: ClickPreview and review the step.
ClickApply.

Flatten JSON columns

To make key-value pairs easier to access and analyze, flatten JSON columns. Forexample, if you have a JSON column nameduser_properties that contains thekeyscountry anddevice_type, flattening this column extractscountry anddevice_type into their own top-level columns so you can use them directly inyour analysis.

Gemini for BigQuery suggests operations thatextract fields only from the top level of the JSON. If these extracted fieldscontain more JSON objects, you can flatten them in additional steps to accesstheir contents.

In the data view for a JSON source table, choose a column or cells.
ClickFlatten to generate suggestions.
Optional: To update the SQL expression, you can manually enter a SQLexpression.
Optional: ClickPreview and review the step.
ClickApply.

Flattening has the following behaviors:

TheFlatten option appears in the data view after you select cells orcolumns containing JSON. It doesn't appear by default when you clickAddstep.
If a JSON key isn't present in the selected rows, the generated suggestiondoesn't contain that key. This issue might cause some columns to be left outwhen data is flattened.
If column names collide during flattening, the repeated column names end inthis format:_<i>. For example, if there's already a column namedaddress, the new flattened column name isaddress_1.
Flattened column names follow the BigQuery column naming conventions.
If you leave the JSON key field empty, the default column name format isf<i>_.

Flatten`RECORD` or`STRUCT` columns

To make nested fields easier to access and analyze, flatten columns with theRECORD orSTRUCT data type. For example, if you have anevent_log recordthat contains the fieldstimestamp andaction, flattening this recordextractstimestamp andaction into their own top-level columns so you cantransform them directly.

This process extracts all nested columns from the record, up to 10 levels deep,and creates a new column for each. The new column names are created by combiningthe parent column's name with the nested field name, separated by an underscore(for example,PARENT-COLUMN-NAME_FIELD-NAME). The originalcolumn is dropped. To keep the original column, you candelete theDrop column step from theAppliedsteps list.

To flatten records, follow these steps:

In the data view for a source table, choose a record column.
ClickFlatten to generate suggestions.
Optional: To update the SQL expression, you can manually enter a SQLexpression.
Optional: ClickPreview and review the step.
ClickApply.

Note: Flattening a record doesn't expand nested JSON objects or repeated fields (arrays) within the record. To access their contents, you must flatten these fields in separate steps.

Unnest arrays

Unnesting expands each element in an array into its own row, duplicating theother original column values into each new row. This action is useful foranalyzing columns that contain arrays with a variable number of elements, suchas lists of API responses.

You can unnest the following column types:

ARRAY data type: Unnests into elements of the array's base type. Forexample, anARRAY<STRUCT<...>> unnesting results in elements of typeSTRUCT.
JSON columns: Unnests JSON arrays within the column into elements oftypeJSON.

When you unnest an array, a new column is created that contains the unnestedelements. By default, the original array column is dropped. To keep the originalcolumn,delete theDrop column step from theApplied steps list.

To unnest arrays, follow these steps:

In the data view for a source table, choose anARRAY column.
ClickUnnest to generate suggestions.
Optional: To update the SQL expression, you can manually enter a SQLexpression.
Optional: ClickPreview and review the step.
ClickApply.

Note: When unnesting an array column, you might encounter an error if a column in your original table has the same name as a column in the unnested array. To work around this, rename the conflicting column in your original table before performing the unnest operation.

Filter rows

To add a filter that removes rows, follow these steps:

In the data or schema view, choose theFilter option. You can alsochoose columns to help Gemini understand your data filter.
In theDescription field, enter a prompt, such asColumn ID should notbe NULL.
ClickGenerate.Gemini generates a SQL expression and a new description basedon your prompt.
Optional: To update the SQL expression, revise the prompt and clicksendSend, or enter a SQL expression manually.
Optional: ClickPreview and review the step.
ClickApply.

Filter expression format

SQL expressions for filters retain rows that match the specified condition. Thisis equivalent to aSELECT … WHERESQL_EXPRESSIONstatement.

For example, to retain records where the columnyear is greater than or equalto2000, the condition isyear >= 2000.

Expressions must follow the BigQuery SQL syntax for the WHERE clause.

Deduplicate data

To remove duplicate rows from your data, follow these steps:

In the data or schema view, choose theDeduplicate option.Gemini provides an initial deduplication suggestion.
Optional: To refine the suggestion, enter a new description and clicksendSend.
Optional: To manually configure the deduplication step, use the followingoptions:
- In theRecord choosing list, select one of the following strategies:
  - First: For each group of rows with the same deduplication keyvalues, this strategy chooses the first row based on theORDER BYexpression and removes the rest.
  - Last: For each group of rows with the same deduplication keyvalues, this strategy chooses the last row based on theORDER BYexpression and removes the rest.
  - Any: For each group of rows with the same deduplication keyvalues, this strategy chooses any row from that group and removesthe rest.
  - Distinct: Removes all duplicate rows across all columns in thetable.
- In theDeduplication keys field, choose one or more columns orexpressions to identify duplicate rows. This field is applicablewhen the record choosing strategy isFirst,Last, orAny.
- In theOrder by expression field, enter an expression that definesthe row order. For example, to choose the most recent row, enterdatetime DESC. To choose the first row alphabetically by name, enter acolumn name likelast_name. The expression follows the same rules asthe standardORDER BY clause in BigQuery. This fieldis only applicable when the record choosing strategy isFirst orLast.
Optional: ClickPreview and review the step.
ClickApply.

Delete a column

To delete one or more columns from a data preparation, follow these steps:

In the data or schema view, select the columns you want to drop.
ClickDrop. A new applied step is added for the deleted columns.

Add a join operation with Gemini

To add a join operation step between two sources in your data preparation,follow these steps:

In the data view for a node in your data preparation, go to theSuggestions list, and click theJoin option.
In theAdd join dialog, clickBrowse, and then select the othertable involved in the join operation (referred to as the right side of thejoin).
Optional: Select the type ofjoin operation that you want to perform, such asInner join.
Review the Gemini-generated join key information in thefollowing fields:
- Join description: The natural language description of the SQLexpression for the join operation. When you edit this description andclicksendSend, Gemini suggests new SQL join conditions.
- Join conditions: The SQL expressions within theON clause for thejoin operation. You can use theL andR qualifiers to refer to theleft and right source tables, respectively. For example, to join thecustomer_id column from the left table to thecustomer_id column fromthe right table, enterL.customerId = R.customerId. These qualifiersaren't case-sensitive.
  Note: If theJoin conditions field is empty, the join operation typeis automatically set toCross join, even if you selected a differentjoin type in the previous step.
Optional: To refine the suggestions from Gemini, edit theJoin description field, and then clicksendSend.
Optional: To preview the join operation settings of your data preparation,clickPreview.
ClickApply.
The join operation step is created. The source table that you selected (theright side of the join) and the join operation are reflected in the list ofapplied steps and in the nodes in the graph view of your data preparation.

Aggregate data

In the data or schema view, choose theAggregate option.
In theDescription field, enter a prompt, such asFind the totalrevenue for a region.
ClickSend.
Gemini generates grouping keys and aggregation expressionsbased on your prompt.
Optional: Edit the generated grouping keys or aggregation expressions,if needed.
Optional: You can manually add grouping keys and aggregation expressions.
- In theGrouping keys field, enter a column name or expression. If youleave it blank, the resulting table has one row. If you enter anexpression, it must have an alias (anAS clause)—for exampleEXTRACT(YEAR FROM order_date) AS order_year. No duplicates areallowed.
- In theAggregation expressions field, enter an aggregationexpression that has an alias (anAS clause)—for exampleSUM(quantity) AS total_quantity. You can enter multiple,comma-separated expressions. No duplicates are allowed. For a list ofthe supported aggregation expressions, see Aggregate functions.
Optional: ClickPreview and review the step.
ClickApply.

Configure the error table and add a validation rule

You can add a filter that creates a validation rule, which sends errors to anerror table or fails the data preparation run.

Configure the error table

To configure your error table, follow these steps:

In the data preparation editor, go to the toolbar and clickMore> Error table.
ClickEnable error table.
Define the table location.
Optional: Define a maximum duration for keeping errors.
ClickSave.

Add a validation rule

To add a validation rule, follow these steps:

In the data or schema view, click theFilter option. You can also choosecolumns to help Gemini understand your data filter.
Enter a description for the step.
Enter a SQL expression, in the form of aWHERE clause.
Optional: If you want the SQL expression to act as a validation rule, selecttheFailed validation rows go to error table checkbox. You can alsochange a filter to a validation in the data preparation toolbar by clickingMore> Error table.
Optional: ClickPreview and review the step.
ClickApply.

Add or change a destination table

A destination table is required to run or schedule your data preparation. Toadd or change a destination table for the output of your data preparation,follow these steps:

In the data or schema view, clickDestination in theSuggestionslist.
Select the project where the destination table is stored.
Select one of the datasets, or load a new dataset.
Enter a destination table. If the table doesn't exist, the data preparationcreates a new table on the first run. For more information, seeWrite mode.
Select your dataset as the destination dataset.
ClickSave.

View the data sample and schema for an applied step

To view sample and schema details at a particular step in the data preparation,do the following:

In the data preparation editor, go to theSteps list and clickAppliedsteps.
Select a step. TheData andSchema tabs appear, displaying the datasample and schema as of this particular step.

Edit an applied step

To edit an applied step, do the following:

In the data preparation editor, go to theSteps list and clickAppliedsteps.
Select a step.
Next to the step, clickmore_vertMenu> Edit.
In theEdit Applied Step dialog, you can do the following:
- Edit the description of the step.
- Get suggestions from Gemini by editing the descriptionand clickingsendSend.
- Edit the SQL expression.
In theTarget column field, select a column.
Optional: ClickPreview and review the step.
ClickApply.

Delete an applied step

To delete an applied step, do the following:

In the data preparation editor, go to theSteps list and clickAppliedsteps.
Select a step.
Clickmore_vertMenu> Delete.

Run the data preparation

After you've added your data preparation steps,configured the destination,and fixed any validation errors, you can perform test runs on a sample of thedata, or deploy the steps and schedule data preparation runs. For moreinformation, seeSchedule data preparations.

Refresh data preparation samples

Data in the sample isn't automatically refreshed. If data in the source tablesfor the data preparation has changed, but the changes aren't reflected in thedata sample of the preparation, clickMore> Refresh sample.

What's next

Learn how toschedule data preparations.
Learn aboutmanaging data preparations.
Learn aboutquotas and limits for Gemini in BigQuery.
ReviewGemini in BigQuery pricing.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.

Movatterモバイル変換