Specify nested and repeated columns in table schemas

This page describes how to define a table schema with nested and repeatedcolumns in BigQuery. For an overview of table schemas, seeSpecifying a schema.

Define nested and repeated columns

To create a column with nested data, set the data type of the column toRECORD in the schema. ARECORD can be accessed as aSTRUCT typein GoogleSQL. ASTRUCT is a container of ordered fields.

To create a column with repeated data, set themode of the column toREPEATED in the schema.A repeated field can be accessed as anARRAY type inGoogleSQL.

ARECORD column can haveREPEATED mode, which is represented as an array ofSTRUCT types. Also, a field within a record can be repeated, which isrepresented as aSTRUCT that contains anARRAY. An array cannot containanother array directly. For more information, seeDeclaring anARRAY type.

Limitations

Nested and repeated schemas are subject to the following limitations:

A schema cannot contain more than 15 levels of nestedRECORD types.
Columns of typeRECORD can contain nestedRECORD types, also calledchild records. The maximum nested depth limit is 15 levels. This limit isindependent of whether theRECORDs are scalar or array-based (repeated).

RECORD type is incompatible withUNION,INTERSECT,EXCEPT DISTINCT, andSELECT DISTINCT.

Example schema

The following example shows sample nested and repeated data. This table containsinformation about people. It consists of the following fields:

  • id
  • first_name
  • last_name
  • dob (date of birth)
  • addresses (a nested and repeated field)
    • addresses.status (current or previous)
    • addresses.address
    • addresses.city
    • addresses.state
    • addresses.zip
    • addresses.numberOfYears (years at the address)

The JSON data file would look like the following. Notice that the addressescolumn contains an array of values (indicated by[ ]). The multiple addressesin the array are the repeated data. The multiple fields within each address arethe nested data.

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

The schema for this table looks like the following:

[{"name":"id","type":"STRING","mode":"NULLABLE"},{"name":"first_name","type":"STRING","mode":"NULLABLE"},{"name":"last_name","type":"STRING","mode":"NULLABLE"},{"name":"dob","type":"DATE","mode":"NULLABLE"},{"name":"addresses","type":"RECORD","mode":"REPEATED","fields":[{"name":"status","type":"STRING","mode":"NULLABLE"},{"name":"address","type":"STRING","mode":"NULLABLE"},{"name":"city","type":"STRING","mode":"NULLABLE"},{"name":"state","type":"STRING","mode":"NULLABLE"},{"name":"zip","type":"STRING","mode":"NULLABLE"},{"name":"numberOfYears","type":"STRING","mode":"NULLABLE"}]}]

Specifying the nested and repeated columns in the example

To create a new table with the previous nested and repeated columns, select oneof the following options:

Console

Specify the nested and repeatedaddresses column:

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. In the left pane, clickExplorer:

    Highlighted button for the Explorer pane.

    If you don't see the left pane, clickExpand left pane to open the pane.

  3. In theExplorer pane, expand your project, clickDatasets, andthen select a dataset.

  4. In the details pane, clickCreate table.

  5. On theCreate table page, specify the following details:

    • ForSource, in theCreate table from field, selectEmptytable.
    • In theDestination section, specify the following fields:

      • ForDataset, select the dataset in which you want to create thetable.
      • ForTable, enter the name of the table that you want to create.
    • ForSchema, clickAdd field and enter the following table schema:

      • ForField name, enteraddresses.
      • ForType, selectRECORD.
      • ForMode, chooseREPEATED.Addresses schema
      • Specify the following fields for a nested field:

        • In theField name field, enterstatus.
        • ForType, chooseSTRING.
        • ForMode, leave the value set toNULLABLE.
        • ClickAdd field to add the following fields:

          Field nameTypeMode
          addressSTRINGNULLABLE
          citySTRINGNULLABLE
          stateSTRINGNULLABLE
          zipSTRINGNULLABLE
          numberOfYearsSTRINGNULLABLE

        Alternatively, clickEdit as text and specify the schema as aJSON array.

SQL

Use theCREATE TABLE statement.Specify the schema using thecolumnoption:

  1. In the Google Cloud console, go to theBigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATETABLEIFNOTEXISTSmydataset.mytable(idSTRING,first_nameSTRING,last_nameSTRING,dobDATE,addressesARRAY<STRUCT<statusSTRING,addressSTRING,citySTRING,stateSTRING,zipSTRING,numberOfYearsSTRING>>)OPTIONS(description='Example name and addresses table');

  3. ClickRun.

For more information about how to run queries, seeRun an interactive query.

bq

To specify the nested and repeatedaddresses column in a JSON schema file,use a text editor to create a new file. Paste in the example schemadefinition shown above.

After you create your JSON schema file, you can provide it through thebq command-line tool. For more information, seeUsing a JSON schema file.

Go

Before trying this sample, follow theGo setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQueryGo API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

import("context""fmt""io""cloud.google.com/go/bigquery")// createTableComplexSchema demonstrates creating a BigQuery table and specifying a complex schema that includes// an array of Struct types.funccreateTableComplexSchema(wio.Writer,projectID,datasetID,tableIDstring)error{// projectID := "my-project-id"// datasetID := "mydatasetid"// tableID := "mytableid"ctx:=context.Background()client,err:=bigquery.NewClient(ctx,projectID)iferr!=nil{returnfmt.Errorf("bigquery.NewClient: %v",err)}deferclient.Close()sampleSchema:=bigquery.Schema{{Name:"id",Type:bigquery.StringFieldType},{Name:"first_name",Type:bigquery.StringFieldType},{Name:"last_name",Type:bigquery.StringFieldType},{Name:"dob",Type:bigquery.DateFieldType},{Name:"addresses",Type:bigquery.RecordFieldType,Repeated:true,Schema:bigquery.Schema{{Name:"status",Type:bigquery.StringFieldType},{Name:"address",Type:bigquery.StringFieldType},{Name:"city",Type:bigquery.StringFieldType},{Name:"state",Type:bigquery.StringFieldType},{Name:"zip",Type:bigquery.StringFieldType},{Name:"numberOfYears",Type:bigquery.StringFieldType},}},}metaData:=&bigquery.TableMetadata{Schema:sampleSchema,}tableRef:=client.Dataset(datasetID).Table(tableID)iferr:=tableRef.Create(ctx,metaData);err!=nil{returnerr}fmt.Fprintf(w,"created table %s\n",tableRef.FullyQualifiedName())returnnil}

Java

Before trying this sample, follow theJava setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQueryJava API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

importcom.google.cloud.bigquery.BigQuery;importcom.google.cloud.bigquery.BigQueryException;importcom.google.cloud.bigquery.BigQueryOptions;importcom.google.cloud.bigquery.Field;importcom.google.cloud.bigquery.Field.Mode;importcom.google.cloud.bigquery.Schema;importcom.google.cloud.bigquery.StandardSQLTypeName;importcom.google.cloud.bigquery.StandardTableDefinition;importcom.google.cloud.bigquery.TableDefinition;importcom.google.cloud.bigquery.TableId;importcom.google.cloud.bigquery.TableInfo;publicclassNestedRepeatedSchema{publicstaticvoidrunNestedRepeatedSchema(){// TODO(developer): Replace these variables before running the sample.StringdatasetName="MY_DATASET_NAME";StringtableName="MY_TABLE_NAME";createTableWithNestedRepeatedSchema(datasetName,tableName);}publicstaticvoidcreateTableWithNestedRepeatedSchema(StringdatasetName,StringtableName){try{// Initialize client that will be used to send requests. This client only needs to be created// once, and can be reused for multiple requests.BigQuerybigquery=BigQueryOptions.getDefaultInstance().getService();TableIdtableId=TableId.of(datasetName,tableName);Schemaschema=Schema.of(Field.of("id",StandardSQLTypeName.STRING),Field.of("first_name",StandardSQLTypeName.STRING),Field.of("last_name",StandardSQLTypeName.STRING),Field.of("dob",StandardSQLTypeName.DATE),// create the nested and repeated fieldField.newBuilder("addresses",StandardSQLTypeName.STRUCT,Field.of("status",StandardSQLTypeName.STRING),Field.of("address",StandardSQLTypeName.STRING),Field.of("city",StandardSQLTypeName.STRING),Field.of("state",StandardSQLTypeName.STRING),Field.of("zip",StandardSQLTypeName.STRING),Field.of("numberOfYears",StandardSQLTypeName.STRING)).setMode(Mode.REPEATED).build());TableDefinitiontableDefinition=StandardTableDefinition.of(schema);TableInfotableInfo=TableInfo.newBuilder(tableId,tableDefinition).build();bigquery.create(tableInfo);System.out.println("Table with nested and repeated schema created successfully");}catch(BigQueryExceptione){System.out.println("Table was not created. \n"+e.toString());}}}

Node.js

Before trying this sample, follow theNode.js setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQueryNode.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

// Import the Google Cloud client library and create a clientconst{BigQuery}=require('@google-cloud/bigquery');constbigquery=newBigQuery();asyncfunctionnestedRepeatedSchema(){// Creates a new table named "my_table" in "my_dataset"// with nested and repeated columns in schema./**   * TODO(developer): Uncomment the following lines before running the sample.   */// const datasetId = "my_dataset";// const tableId = "my_table";// const schema = [//   {name: 'Name', type: 'STRING', mode: 'REQUIRED'},//   {//     name: 'Addresses',//     type: 'RECORD',//     mode: 'REPEATED',//     fields: [//       {name: 'Address', type: 'STRING'},//       {name: 'City', type: 'STRING'},//       {name: 'State', type: 'STRING'},//       {name: 'Zip', type: 'STRING'},//     ],//   },// ];// For all options, see https://cloud.google.com/bigquery/docs/reference/v2/tables#resourceconstoptions={schema:schema,location:'US',};// Create a new table in the datasetconst[table]=awaitbigquery.dataset(datasetId).createTable(tableId,options);console.log(`Table${table.id} created.`);}

Python

Before trying this sample, follow thePython setup instructions in theBigQuery quickstart using client libraries. For more information, see theBigQueryPython API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up authentication for client libraries.

fromgoogle.cloudimportbigqueryclient=bigquery.Client()# TODO(dev): Change table_id to the full name of the table you want to create.table_id="your-project.your_dataset.your_table_name"schema=[bigquery.SchemaField("id","STRING",mode="NULLABLE"),bigquery.SchemaField("first_name","STRING",mode="NULLABLE"),bigquery.SchemaField("last_name","STRING",mode="NULLABLE"),bigquery.SchemaField("dob","DATE",mode="NULLABLE"),bigquery.SchemaField("addresses","RECORD",mode="REPEATED",fields=[bigquery.SchemaField("status","STRING",mode="NULLABLE"),bigquery.SchemaField("address","STRING",mode="NULLABLE"),bigquery.SchemaField("city","STRING",mode="NULLABLE"),bigquery.SchemaField("state","STRING",mode="NULLABLE"),bigquery.SchemaField("zip","STRING",mode="NULLABLE"),bigquery.SchemaField("numberOfYears","STRING",mode="NULLABLE"),],),]table=bigquery.Table(table_id,schema=schema)table=client.create_table(table)# API requestprint(f"Created table{table.project}.{table.dataset_id}.{table.table_id}.")

Insert data in nested columns in the example

Use the following queries to insert nested data records into tables that haveRECORD data type columns.

Example 1

INSERTINTOmydataset.mytable(id,first_name,last_name,dob,addresses)values("1","Johnny","Dawn","1969-01-22",ARRAY<STRUCT<statusSTRING,addressSTRING,citySTRING,stateSTRING,zipSTRING,numberOfYearsSTRING>>[("current","123 First Avenue","Seattle","WA","11111","1")])

Example 2

INSERTINTOmydataset.mytable(id,first_name,last_name,dob,addresses)values("1","Johnny","Dawn","1969-01-22",[("current","123 First Avenue","Seattle","WA","11111","1")])

Query nested and repeated columns

To select the value of anARRAY at a specific position, use anarray subscriptoperator.To access elements in aSTRUCT, use thedot operator.The following example selects the first name, last name, and first addresslisted in theaddresses field:

SELECTfirst_name,last_name,addresses[offset(0)].addressFROMmydataset.mytable;

The result is the following:

+------------+-----------+------------------+| first_name | last_name | address          |+------------+-----------+------------------+| John       | Doe       | 123 First Avenue || Jane       | Doe       | 789 Any Avenue   |+------------+-----------+------------------+

To extract all elements of anARRAY, use theUNNEST operatorwith aCROSS JOIN.The following example selects the first name, last name, address, and state forall addresses not located in New York:

SELECTfirst_name,last_name,a.address,a.stateFROMmydataset.mytableCROSSJOINUNNEST(addresses)ASaWHEREa.state!='NY';

The result is the following:

+------------+-----------+------------------+-------+| first_name | last_name | address          | state |+------------+-----------+------------------+-------+| John       | Doe       | 123 First Avenue | WA    || John       | Doe       | 456 Main Street  | OR    || Jane       | Doe       | 321 Main Street  | NJ    |+------------+-----------+------------------+-------+

Modify nested and repeated columns

After you add a nested column or a nested and repeated column to a table'sschema definition, you can modify the column as you would any other type ofcolumn. BigQuery natively supports several schema changes such asadding a new nested field to a record or relaxing a nested field's mode. Formore information, seeModifying table schemas.

When to use nested and repeated columns

BigQuery performs best when your data is denormalized. Ratherthan preserving a relational schema such as a star or snowflake schema,denormalize your data and take advantage of nested and repeated columns. Nestedand repeated columns can maintain relationships without the performance impactof preserving a relational (normalized) schema.

For example, a relational database used to track library books would likely keepall author information in a separate table. A key such asauthor_id would beused to link the book to the authors.

In BigQuery, you can preserve the relationship between book andauthor without creating a separate author table. Instead, you create an authorcolumn, and you nest fields within it such as the author's first name, lastname, date of birth, and so on. If a book has multiple authors, you can make thenested author column repeated.

Suppose you have the following tablemydataset.books:

+------------------+------------+-----------+| title            | author_ids | num_pages |+------------------+------------+-----------+| Example Book One | [123, 789] | 487       || Example Book Two | [456]      | 89        |+------------------+------------+-----------+

You also have the following table,mydataset.authors, with completeinformation for each author ID:

+-----------+-------------+---------------+| author_id | author_name | date_of_birth |+-----------+-------------+---------------+| 123       | Alex        | 01-01-1960    || 456       | Rosario     | 01-01-1970    || 789       | Kim         | 01-01-1980    |+-----------+-------------+---------------+

If the tables are large, it might be resource intensive to join them regularly.Depending on your situation, it might be beneficial tocreate a single table that contains all the information:

CREATETABLEmydataset.denormalized_books(titleSTRING,authorsARRAY<STRUCT<idINT64,nameSTRING,date_of_birthSTRING>>,num_pagesINT64)AS(SELECTtitle,ARRAY_AGG(STRUCT(author_id,author_name,date_of_birth))ASauthors,ANY_VALUE(num_pages)FROMmydataset.books,UNNEST(author_ids)idJOINmydataset.authorsONid=author_idGROUPBYtitle);

The resulting table looks like the following:

+------------------+-------------------------------+-----------+| title            | authors                       | num_pages |+------------------+-------------------------------+-----------+| Example Book One | [{123, Alex, 01-01-1960},     | 487       ||                  |  {789, Kim, 01-01-1980}]      |           || Example Book Two | [{456, Rosario, 01-01-1970}]  | 89        |+------------------+-------------------------------+-----------+

BigQuery supports loading nested and repeated data from sourceformats that support object-based schemas, such as JSON files, Avro files,Firestore export files, and Datastore export files.

Deduplicate duplicate records in a table

The following query uses therow_number()function to identify duplicate records that have the same values forlast_name andfirst_name in the examples used and sorts them bydob:

CREATEORREPLACETABLEmydataset.mytableAS(SELECT*except(row_num)FROM(SELECT*,row_number()over(partitionbylast_name,first_nameorderbydob)row_numFROMmydataset.mytable)temp_tableWHERErow_num=1)

Table security

To control access to tables in BigQuery, seeControl access to resources with IAM.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.