Use nested and repeated fields
BigQuery can be used with many different data modelling methods, andgenerally provides high performance across many data model methodologies. To furthertune a data model for performance, one method you might consider is data denormalization,which means adding columns of data to a single table to reduce or remove table joins.
Best practice: Use nested and repeated fields to denormalize datastorage and increase query performance.
Denormalization is a common strategy for increasing read performance forrelational datasets that were previously normalized. The recommended way todenormalize data in BigQuery is to use nested and repeatedfields. It's best to use this strategy when the relationships are hierarchicaland frequently queried together, such as in parent-child relationships.
The storage savings from using normalized data has less of an effect in modernsystems. Increases in storage costs are worth the performance gains of usingdenormalized data. Joins require data coordination (communicationbandwidth). Denormalization localizes the data to individualslots, so that execution can be done in parallel.
To maintain relationships while denormalizing your data, you can use nestedand repeated fields instead of completely flattening your data. When relationaldata is completely flattened, network communication (shuffling) can negativelyimpact query performance.
For example, denormalizing an orders schema without using nested and repeatedfields might require you to group the data by a field likeorder_id(when there is a one-to-many relationship). Because of the shuffling involved,grouping the data is less effective than denormalizing the data by usingnested and repeated fields.
In some circumstances, denormalizing your data and using nested and repeatedfields doesn't result in increased performance. For example, star schemas aretypically optimized schemas for analytics, and as a result, performance mightnot be significantly different if you attempt to denormalize further.
Using nested and repeated fields
BigQuery doesn't require a completely flat denormalization. Youcan use nested and repeated fields to maintain relationships.
Nesting data (
STRUCT)- Nesting data lets you represent foreign entities inline.
- Querying nested data uses "dot" syntax to reference leaf fields, which issimilar to the syntax using a join.
- Nested data is represented as a
STRUCTtypein GoogleSQL.
Repeated data (
ARRAY)- Creating a field of type
RECORDwith the mode set toREPEATEDletsyou preserve a one-to-many relationship inline (so long as the relationshipisn't high cardinality). - With repeated data, shuffling is not necessary.
- Repeated data is represented as an
ARRAY. You can use anARRAYfunctionin GoogleSQL when you query the repeated data.
- Creating a field of type
Nested and repeated data (
ARRAYofSTRUCTs)- Nesting and repetition complement each other.
- For example, in a table of transaction records, you could include an arrayof line item
STRUCTs.
For more information, seeSpecify nested and repeated columns in table schemas.
For more information about denormalizing data, seeDenormalization.
Example
Consider anOrders table with a row for each line item sold:
| Order_Id | Item_Name |
|---|---|
| 001 | A1 |
| 001 | B1 |
| 002 | A1 |
| 002 | C1 |
If you wanted to analyze data from this table, you would need to use aGROUP BY clause, similar to the following:
SELECTCOUNT(Item_Name)FROMOrdersGROUPBYOrder_Id;
TheGROUP BY clause involves additional computation overhead, but this can beavoided by nesting repeated data. You can avoid using aGROUP BY clauseby creating a table with one order per row, where the order line items are in anested field:
| Order_Id | Item_Name |
|---|---|
| 001 | A1 B1 |
| 002 | A1 C1 |
In BigQuery, you typically specify a nested schema as anARRAYofSTRUCT objects. You use theUNNEST operatortoflatten the nested data,as shown in the following query:
SELECT*FROMUNNEST([STRUCT('001'ASOrder_Id,['A1','B1']ASItem_Name),STRUCT('002'ASOrder_Id,['A1','C1']ASItem_Name)]);
This query yields results similar to the following:

If this data wasn't nested, you could potentially have several rows foreach order, one for each item sold in that order, which would result in alarge table and an expensiveGROUP BY operation.
Exercise
You can see the performance difference in queries that use nested fields ascompared to those that don't by following the steps in this section.
Create a table based on the
bigquery-public-data.stackoverflow.commentspublic dataset:CREATEORREPLACETABLE`PROJECT.DATASET.stackoverflow`AS(SELECTuser_id,post_id,creation_dateFROM`bigquery-public-data.stackoverflow.comments`);
Using the
stackoverflowtable, run the following query to see theearliest comment for each user:SELECTuser_id,ARRAY_AGG(STRUCT(post_id,creation_dateASearliest_comment)ORDERBYcreation_dateASCLIMIT1)[OFFSET(0)].*FROM`PROJECT.DATASET.stackoverflow`GROUPBYuser_idORDERBYuser_idASC;
This query takes about 25 seconds to run and processes 1.88 GB of data.
Create a second table with identical data that creates a
commentsfieldusing aSTRUCTtype to store thepost_idandcreation_datedata, insteadof two individual fields:CREATEORREPLACETABLE`PROJECT.DATASET.stackoverflow_nested`AS(SELECTuser_id,ARRAY_AGG(STRUCT(post_id,creation_date)ORDERBYcreation_dateASC)AScommentsFROM`bigquery-public-data.stackoverflow.comments`GROUPBYuser_id);
Using the
stackoverflow_nestedtable, run the following queryto see the earliest comment for each user:SELECTuser_id,(SELECTASSTRUCTpost_id,creation_dateasearliest_commentFROMUNNEST(comments)ORDERBYcreation_dateASCLIMIT1).*FROM`PROJECT.DATASET.stackoverflow_nested`ORDERBYuser_idASC;
This query takes about 10 seconds to run and processes 1.28 GB of data.
Delete the
stackoverflowandstackoverflow_nestedtables when you are finished with them.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.