Introduction to data transformation
This document describes the different ways you can transform data in yourBigQuery tables.
For more information about data integrations, seeIntroduction to loading, transforming, and exporting data.
Methods of transforming data
You can transform data in BigQuery in the following ways:
- Usedata manipulation language (DML) to transform data in yourBigQuery tables.
- Usematerialized views to automaticallycache the results of a query for increased performance and efficiency.
- Usecontinuous queries toanalyze incoming data in real time and continuously insert the output rowsinto a BigQuery table or export to Pub/Sub or Bigtable.
- UseBigQuery pipelines orDataform to develop, test, control versions,and schedule pipelines in BigQuery.
- Usedata preparations with context-aware, AI-generatedtransformation recommendations to cleanse data for analysis. Data preparationsare powered by theDataform API.
The following table shows the different characteristics of each transformationmethod.
| Transform method | Transformation target | Definition method | Transformation frequency |
|---|---|---|---|
| Data manipulation language (DML) | Table (in place) | SQL DML | User-initiated or scheduled |
| Materialized views | Materialized view | SQL query | Automatic or manual refresh |
| Continuous queries | Table,Pub/Sub topic,Bigtable table | SQL query with EXPORT DATA | Continuous |
| Dataform | Table | Dataform core (SQLX) | Scheduled (pipelines) |
| BigQuery pipelines | Table | BigQuery pipelines | Scheduled (pipelines) |
| Data preparation | Table | Visual editor | Scheduled |
You can alsoreview the change history of a BigQuery tableto examine the transformations made to a table in a specified time range.
Transform data with DML
You can usedata manipulation language (DML)to transform data in your BigQuery tables. DML statements areGoogleSQL queries thatmanipulate existing table data to add or delete rows, modify datain existing rows, or merge data with values from another table. DMLtransformations are also supported inpartitioned tables.
You can run multiple DML statements concurrently, where BigQueryqueues several DML statements that transform your data one after the other.BigQuery manageshow concurrent DML statements are run,based upon the transformation type.
Transform data with materialized views
Materialized views views areprecomputed views that periodically cache the results of a SQL query for increasedperformance and efficiency. BigQuery leverages precomputedresults from materialized views and whenever possible reads only changes fromthe base tables to compute up-to-date results.
Materialized views are precomputed in the background when the base tables change.Any incremental data changes from the base tables are automatically added to thematerialized views, with no user action required.
Transform data with continuous queries
Preview
This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of theService Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see thelaunch stage descriptions.
Continuous queriesare SQL statements that run continuously. Continuous queries let you analyzeincoming data in BigQuery in real time. You can insert the outputrows produced by a continuous query into a BigQuery table orexport them to Pub/Sub or Bigtable.
Transform data with Dataform
Dataform lets you manage data transformation in the extract,load, and transform (ELT) process for data integration. After extractingraw data from source systems and loading it into BigQuery, you canuse Dataform to transform it into an organized, tested, and documentedsuite of tables. While in DML you take an imperative approach by telling BigQueryhow exactly to transform your data, in Dataform you writedeclarative statements where Dataform then determines thetransformation needed to achieve that state.
In Dataform, you can develop, test, and version controlSQL workflows for data transformationfrom data source declarations to output tables, views, or materialized views.You can develop SQL workflows with Dataform core or pure JavaScript.Dataform core is an open sourcemeta-language that extends SQL with SQLX and JavaScript. You can useDataform core to manage dependencies, set up automated dataquality testing, and document table or column descriptions within the code.
Dataform stores your SQL workflow code inrepositoriesand uses Git to track file changes. Development workspaces in Dataformlet you work on the contents of the repository without affecting the work ofothers who are working in the same repository. You can connect Dataformrepositories to third-party Git providers, including Azure DevOps Services,Bitbucket, GitHub, and GitLab.
You can run or schedule SQL workflowswith Dataform release configurations and workflow configurations.Alternatively, you can schedule executions either with Cloud Composer, orwith Workflows and Cloud Scheduler. During execution,Dataform executes SQL queries in BigQuery inorder of object dependencies in your SQL workflow. After execution, you can useyour defined tables and views for analysis in BigQuery.
To learn more about creating data transformation SQL workflows inDataform, seeDataform overview andDataform features.
Transform data with BigQuery pipelines
BigQuery pipelines are powered by Dataformand let you create and manage data transformationin extract, load, transform (ELT) or extract, transform, load (ETL) processes.
You can create and manage BigQuery pipelines in a visual wayin BigQuery Studio.
To learn more about creating BigQuery pipelines,seeCreate pipelines.
Prepare data in BigQuery
To reduce the toil of data preparation, BigQuery lets you cleandata with Gemini-generated transformation suggestions. Data preparationin BigQuery offers the following assistance:
- Applying transformations and data quality rules
- Standardizing and enriching data
- Automating schema mapping
You can validate the results in a preview of your data before executing thechanges on all your data.
For more information, seeIntroduction to BigQuery data preparation.
What's next
- To learn more about DML, seeTransform data with data manipulation language (DML).
- To learn more about Dataform, seeDataform overview.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.