Movatterモバイル変換


[0]ホーム

URL:


DOCS >REPLICATION >LOADING >UNDERSTANDING LOADING FOR NESTED DATA STRUCTURES IN GOOGLE BIGQUERY DESTINATIONS

Get started with a free trial today

Already have an account? Sign in

Understanding Loading for Nested Data Structures in Google BigQuery Destinations

Note: This guide is applicable only to Google BigQuery v2 destinations.

Google BigQuery supports nested records within tables,whether it’s a single record or repeated values.

Unlike the conventional method todenormalization, in Google BigQuery records are expressed using nested and repeated fields.

Instead of flattening attributes into a table, this approach localizes a record’s subattributes into a single table. Maintaining nested records removes the need for repeating data, creating additional subtables, or using joins during analysis.

For example: Below is a record from a table namedpeople. In this table, each person can only have a singletype, but they might have multiplefriends:

{"id":1,"name":"Finn","details":{"type":"human","has_magic":false},"friends":[{"id":2,"name":"Jake"},{"id":3,"name":"Bubblegum"},{"id":4,"name":"BMO"}]}

In this guide, we’ll cover how this data will be loaded into Google BigQuery, including:


Storing nested maps (JSON objects)

A nested map is also called anobject in JSON. An object is surrounded by curly braces ({ }) and contains a series of key/value pairs. Keys are strings enclosed in double quotes (" ").

For example: This record contains adetails object, which containstype andhas_magic keys:

{"id":1,"name":"Finn","details":{"type":"human","has_magic":false}}

When records containing objects are loaded into Google BigQuery, the object is loaded using theRECORD type and a mode ofNULLABLE.

For example: The above record would create this table schema in Google BigQuery:

Field nameTypeMode
idINTEGERNULLABLE
nameSTRINGNULLABLE
detailsRECORDNULLABLE
details.typeSTRINGNULLABLE
details.has_magicBOOLEANNULLABLE

And the data in the table would be similar to the following:

idnamedetails.typedetails.has_magic
1Finnhumanfalse

To query nested data using thestandard SQL syntax, you can use dot notation to indicate the field(s) you want to reference. For example: The sample query below will return theid,name, anddetails.type fields:

SELECTid,name,details.typeFROMpeople+----+------+--------------+|id|name|details.type|+----+------+--------------+|1|Finn|human|+----+------+--------------+

Storing nested records (JSON arrays)

A nested record is also called anarray in JSON. An array is surrounded by square brackets ([ ]) and contains an ordered list of values. Values can be strings, numbers, booleans, objects, nulls, or more arrays.

When records containing arrays are loaded into Google BigQuery, the array is loaded using theRECORD type and a mode ofREPEATED. By using theREPEATED mode to store nested records, Stitch can avoid repeating data or creating additional subtables. This functionality removes the need for joins when analyzing data, making raw data easier to read and faster to compute.

For items in the array, Stitch will handle each item like an object field. Items will be loaded using theRECORD type, a mode ofNULLABLE, and a field name ofvalue.

How the array is structured determines how the data it contains will be loaded into Google BigQuery. In this section, we’ll cover some examples and demonstrate how the source data for each one would be loaded into Google BigQuery:

Refer toGoogle’s documentation for more info on querying nested records.

Array of strings or numbers

In this example, the record contains two arrays:friends, an array of strings, andfriend_ids, an array of integers:

{"id":1,"name":"Finn","friends":["Jake","Bubblegum","BMO"],"friend_ids":[2,3,4]}

The above record would create this table schema in Google BigQuery:

Field nameTypeMode
idINTEGERNULLABLE
nameSTRINGNULLABLE
friendsRECORDREPEATED
friends.valueSTRINGNULLABLE
friend_idsRECORDREPEATED
friend_ids.valueINTEGERNULLABLE

And the data in the table would be similar to the following:

idnamefriends.valuefriend_ids.value
1FinnJake2
Bubblegum3
BMO4

Array of objects

In this example, the record contains a single array namedfriends, which contains a series of objects:

{"id":1,"name":"Finn","friends":[{"id":2,"name":"Jake"},{"id":3,"name":"Bubblegum"},{"id":4,"name":"BMO"}]}

The above record would create this table schema in Google BigQuery:

Field nameTypeMode
idINTEGERNULLABLE
nameSTRINGNULLABLE
friendsRECORDREPEATED
friends.valueRECORDNULLABLE
friends.value.idINTEGERNULLABLE
friends.value.nameSTRINGNULLABLE

And the data in the table would be similar to the following:

idnamefriends.value.idfriends.value.name
1Finn2Jake
3Bubblegum
4BMO

Array of arrays

In this example, the record contains an array (friend_ids) which contains a series of arrays:

The above record would create this table schema in Google BigQuery:

Field nameTypeMode
idINTEGERNULLABLE
nameSTRINGNULLABLE
friend_idsRECORDREPEATED
friend_ids.valueRECORDREPEATED
friend_ids.value.valueINTEGERNULLABLE

And the data in the table would be similar to the following:

idnamefriend_ids.value.value
1Finn2
3
4
5

Array containing multiple data types

In this example, the record contains a single array namedfriend_ids. Notice that the first two values in the array are strings (ex:"2" versus2), and the last value is an integer (ex:4 versus"4"):

To accommodate the multiple data types, Stitch will create additionalvalue columns, one for each data type, and append a data type suffix to the name of each additional column.

In this example, thefriend_ids.value column will store allINTEGER data, and Stitch will create an additionalfriend_ids.value__st column to store allSTRING data:

Field nameTypeMode
idINTEGERNULLABLE
nameSTRINGNULLABLE
friend_idsRECORDREPEATED
friend_ids.valueINTEGERNULLABLE
friend_ids.value__stSTRINGNULLABLE

And the data in the table would be similar to the following:

idnamefriend_ids.valuefriend_ids.value__st
1Finn2
3
4

Array of nested arrays

In this example, the record contains an array (friend_ids) which contains a series of nested arrays:

{"id":1,"name":"Finn","friend_ids":[[[2,3]],[[4,5]]]}

The above record would create this table schema in Google BigQuery:

Field nameTypeMode
idINTEGERNULLABLE
nameSTRINGNULLABLE
friend_idsRECORDREPEATED
friend_ids.valueRECORDREPEATED
friend_ids.value.valueRECORDREPEATED
friend_ids.value.value.valueINTEGERNULLABLE

And the data in the table would be similar to the following:

idnamefriend_ids.value.value.value
1Finn2
3
4
5

Resources


RelatedTroubleshooting

Questions? Feedback?

Did this article help? If you have questions or feedback, feel free tosubmit a pull request with your suggestions,open an issue on GitHub, orreach out to us.


[8]ページ先頭

©2009-2026 Movatter.jp