GoogleSQL for Bigtable overview
You can use GoogleSQL statements to query your Bigtable data. GoogleSQL is an ANSI-compliant structured query language (SQL) that is also implemented for other Google Cloud services such as BigQuery and Spanner.
This document provides an overview of GoogleSQL for Bigtable. It provides examples of SQL queries that you can use with Bigtable and describes how they relate to a Bigtable table schema. Before you read this document, you should be familiar with the Bigtable storage model and schema design concepts.
You can create and run queries in Bigtable Studio in the Google Cloud console, or you can run them programmatically using the Bigtable client library for Java, Python, or Go. For more information, see Use SQL with a Bigtable client library.
SQL queries are handled by cluster nodes in the same way as NoSQL data requests. Therefore, the same best practices apply when creating SQL queries to run against your Bigtable data, such as avoiding full table scans or complex filters. For more information, see Reads and performance.
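As a minimal sketch of this practice, the following pair of queries contrasts a row-key-bounded read with a full table scan. The table name `myTable` and column family `cf1` are hypothetical:

```sql
-- Efficient: the row key predicate limits the read to one key range.
SELECT _key, cf1
FROM myTable
WHERE _key LIKE 'user1%'

-- Expensive: no row key predicate, so every row in the table is scanned.
SELECT _key, cf1
FROM myTable
WHERE cf1['status'] = 'active'
```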
You can't use Data Boost with GoogleSQL for Bigtable.
Use cases
GoogleSQL for Bigtable is optimal for low-latency application development. Additionally, running SQL queries in the Google Cloud console can be useful for quickly getting a visual representation of a table's schema, verifying that certain data was written, or debugging possible data issues.
The current release of GoogleSQL for Bigtable doesn't support some common SQL constructs, including, but not limited to, the following:
- Data Manipulation Language (DML) statements beyond `SELECT`, such as `INSERT`, `UPDATE`, or `DELETE`
- Data Definition Language (DDL) statements such as `CREATE`, `ALTER`, or `DROP`
- Data access control statements
- Query syntax for subqueries, `JOIN`, `UNION`, and CTEs
For more information, including supported functions, operators, data types, and query syntax, see the GoogleSQL for Bigtable reference documentation.
Views
You can use GoogleSQL for Bigtable to create the following resources:
- Continuous materialized view: a pre-computed result of a continuously running SQL query, including aggregated data, that synchronizes with its source table with incremental updates. This feature is in Preview.
- Logical view: a saved, named query that can be queried like a table.
To compare these types of views as well as authorized views, see Tables and views.
Key concepts
This section discusses key concepts to be aware of when you use GoogleSQL to query your Bigtable data.
Column families in SQL responses
In Bigtable, a table contains one or more column families, which are used to group columns. When you query a Bigtable table with GoogleSQL, the schema for the table consists of the following:
- A special column named `_key` that corresponds to row keys in the queried table
- A single column for each Bigtable column family in the table, which contains the column family's data in that row
Map data type
GoogleSQL for Bigtable includes the data type `MAP<key, value>`, which is designed specifically to accommodate column families.
By default, each row in a map column contains key-value pairs, where a key is the Bigtable column qualifier in the queried table, and the value is the most recent value for that column.
The following example of a SQL query returns a table with the row key value and the latest value of the qualifier from a map named `columnFamily`.

```sql
SELECT _key, columnFamily['qualifier'] FROM myTable
```

If your Bigtable schema involves storing multiple cells, or versions of the data, in columns, you can add a temporal filter, such as `with_history`, to your SQL statement.
In this case, maps representing column families are nested and returned as an array. In the array, each value is itself a map consisting of a timestamp as the key and cell data as the value. The format is `MAP<key, ARRAY<STRUCT<timestamp, value>>>`.
The following example returns all cells in the `info` column family for a single row.

```sql
SELECT _key, info FROM users(with_history => TRUE) WHERE _key = 'user_123';
```

The returned map looks like the following. In the queried table, `info` is the column family, `user_123` is the row key, and `city` and `state` are the column qualifiers. Each timestamp-value pair (`STRUCT`) in an array represents a cell in those columns in that row, and the pairs are sorted by timestamp descending.
```
/*----------+------------------------------------------------------------------+
 | _key     | info                                                             |
 +----------+------------------------------------------------------------------+
 | user_123 | {"city":{<t5>:"Brooklyn", <t0>:"New York"}, "state":{<t0>:"NY"}} |
 +----------+------------------------------------------------------------------*/
```

Sparse tables
A key feature of Bigtable is its flexible data model. In a Bigtable table, if a column is unused in a row, no data is stored for the column. A row might have one column and the next row might have 100 columns. In contrast, in a relational database table, all rows contain all columns, and a `NULL` value is typically stored in the column of a row that has no data for that column.
When you query a Bigtable table with GoogleSQL, however, an unused column is represented with an empty map and is returned as a `NULL` value. These `NULL` values can be used as query predicates. For example, a predicate like `WHERE family['column1'] IS NOT NULL` can be used to return a row only if `column1` is used in the row.
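A full query using that predicate might look like the following sketch, where the table `myTable` and column family `family` are hypothetical:

```sql
-- Rows where column1 is unused produce NULL for family['column1']
-- and are filtered out of the result.
SELECT _key, family['column1']
FROM myTable
WHERE family['column1'] IS NOT NULL
```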
Bytes
When you provide a string, GoogleSQL by default implicitly casts `STRING` values to `BYTES` values. This means, for example, that you can provide the string `'qualifier'` rather than the byte sequence `b'qualifier'`.
Because Bigtable by default treats all data as bytes, most Bigtable columns don't contain type information. However, with GoogleSQL you can define a schema at read time with the `CAST` function. For more information about casting, see Conversion functions.
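For instance, assuming the `users` table from the earlier examples stores city values as bytes, a read-time cast might look like this sketch:

```sql
-- Interpret the raw bytes in info['city'] as a STRING at read time.
SELECT _key, CAST(info['city'] AS STRING) AS city
FROM users
```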
Temporal filters
The following table lists the arguments that you can use when accessing temporal elements of a table. Arguments are listed in the order in which they are filtered. For example, `with_history` is applied before `latest_n`. Arguments that take a timestamp must be given a valid timestamp.
| Argument | Description |
|---|---|
| `as_of` | Timestamp. Returns the latest values with timestamps less than or equal to the provided timestamp. |
| `with_history` | Boolean. Controls whether to return the latest value as a scalar or timestamped values as `STRUCT`. |
| `after` | Timestamp. Returns values with timestamps after the input, exclusive. Requires `with_history => TRUE`. |
| `after_or_equal` | Timestamp. Returns values with timestamps after the input, inclusive. Requires `with_history => TRUE`. |
| `before` | Timestamp. Returns values with timestamps before the input, exclusive. Requires `with_history => TRUE`. |
| `latest_n` | Integer. The number of timestamped values to return per column qualifier (map key). Must be greater than or equal to 1. Requires `with_history => TRUE`. |
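As an illustration, the following sketch (against the `sensorReadings` table used later in this document) combines `with_history` and `latest_n` to return at most the two most recent cells per qualifier:

```sql
-- latest_n requires with_history => TRUE; returns up to 2
-- timestamped values per column qualifier.
SELECT _key, metrics['temperature'] AS recent_temps
FROM sensorReadings(with_history => TRUE, latest_n => 2)
```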
For more examples, see Advanced query patterns.
Foundational queries
This section describes and shows examples of basic Bigtable SQL queries and how they work. For additional sample queries, see GoogleSQL for Bigtable query pattern examples.
Retrieve the latest version
Although Bigtable lets you store multiple versions of data in each column, GoogleSQL for Bigtable by default returns the latest version (the most recent cell) of the data for each row.
Consider the following sample dataset, which shows that user1 relocated twice in the state of NY and once within the city of Brooklyn. In this example, `address` is the column family, and the column qualifiers are `street`, `city`, and `state`. Cells in a column are shown on separate lines.
| address | | | |
|---|---|---|---|
| _key | street | city | state |
| user1 | 2023/01/10-14:10:01.000: '113 Xyz Street'<br>2021/12/20-09:44:31.010: '76 Xyz Street'<br>2005/03/01-11:12:15.112: '123 Abc Street' | 2021/12/20-09:44:31.010: 'Brooklyn'<br>2005/03/01-11:12:15.112: 'Queens' | 2005/03/01-11:12:15.112: 'NY' |
To retrieve the latest version of each column for user1, you can use a `SELECT` statement like the following.
```sql
SELECT _key, address FROM myTable WHERE _key = 'user1'
```

The response contains the current address, which is a combination of the most recent street, city, and state values (written at different times), printed as JSON. Timestamps are not included in the response.
| _key | address |
|---|---|
| user1 | {street:'113 Xyz Street', city:'Brooklyn', state:'NY'} |
Retrieve all versions
To retrieve earlier versions (cells) of the data, use the `with_history` flag. You can also alias columns and expressions, as illustrated in the following example.
```sql
SELECT _key, columnFamily['qualifier'] AS col1 FROM myTable(with_history => TRUE)
```

To better understand the events leading to the current state of a row, you can retrieve the timestamps for each value by retrieving the full history. For example, to understand when user1 moved to their current address and where they moved from, you could run the following query:
```sql
SELECT
  address['street'][0].value AS moved_to,
  address['street'][1].value AS moved_from,
  FORMAT_TIMESTAMP('%Y-%m-%d', address['street'][0].timestamp) AS moved_on,
FROM myTable(with_history => TRUE)
WHERE _key = 'user1'
```

When you use the `with_history` flag in your SQL query, the response is returned as `MAP<key, ARRAY<STRUCT<timestamp, value>>>`. Each item in the array is a timestamped value for the specified row, column family, and column. Timestamps are ordered in reverse chronological order, so the latest data is always the first item returned.
The query response is as follows.
| moved_to | moved_from | moved_on |
|---|---|---|
| 113 Xyz Street | 76 Xyz Street | 2023/01/10 |
You can also retrieve the number of versions in each row using array functions, as demonstrated in the following query:
```sql
SELECT _key, ARRAY_LENGTH(MAP_ENTRIES(address)) AS version_count
FROM myTable(with_history => TRUE)
```

Retrieve data from a specified time
Using an `as_of` filter lets you retrieve the state of a row as it was at a certain point in time. For example, if you want to know the address of user1 as of January 10, 2022 1:14 PM, you can run the following query.
```sql
SELECT address FROM myTable(as_of => TIMESTAMP('2022-01-10T13:14:00.234Z')) WHERE _key = 'user1'
```

The result shows what would have been the last known address on January 10, 2022 1:14 PM, which is the combination of street and city from the 2021/12/20-09:44:31.010 update and the state from the 2005/03/01-11:12:15.112 update.
| address |
|---|
| {street:'76 Xyz Street', city:'Brooklyn', state:'NY'} |
The same result can also be achieved using Unix timestamps.
```sql
SELECT address FROM myTable(as_of => TIMESTAMP_FROM_UNIX_MILLIS(1641820440000)) WHERE _key = 'user1'
```

Consider the following dataset, which shows the on-or-off state of smoke and carbon monoxide alarms. The column family is `alarmType` and the column qualifiers are `smoke` and `carbonMonoxide`. Cells in each column are shown on separate lines.
| alarmType | | |
|---|---|---|
| _key | smoke | carbonMonoxide |
| building1#section1 | 2023/04/01-09:10:15.000: 'off'<br>2023/04/01-08:41:40.000: 'on'<br>2020/07/03-06:25:31.000: 'off'<br>2020/07/03-06:02:04.000: 'on' | 2023/04/01-09:22:08.000: 'off'<br>2023/04/01-08:53:12.000: 'on' |
| building1#section2 | 2021/03/11-07:15:04.000: 'off'<br>2021/03/11-07:00:25.000: 'on' | |
You can find sections of building1 where a smoke alarm was on at 9 AM on April 1, 2023, and the status of the carbon monoxide alarm at that time, using the following query.
```sql
SELECT _key AS location, alarmType['carbonMonoxide'] AS CO_sensor
FROM alarms(as_of => TIMESTAMP('2023-04-01T09:00:00.000Z'))
WHERE _key LIKE 'building1%' AND alarmType['smoke'] = 'on'
```

The result is the following:
| location | CO_sensor |
|---|---|
| building1#section1 | 'on' |
Query time series data
A common use case for Bigtable is the storage of time series data. Consider the following sample dataset, which shows temperature and humidity readings for weather sensors. The column family ID is `metrics` and the column qualifiers are `temperature` and `humidity`. Cells in a column are shown on separate lines, and each cell represents a timestamped sensor reading.
| metrics | | |
|---|---|---|
| _key | temperature | humidity |
| sensorA#20230105 | 2023/01/05-02:00:00.000: 54<br>2023/01/05-01:00:00.000: 56<br>2023/01/05-00:00:00.000: 55 | 2023/01/05-02:00:00.000: 0.89<br>2023/01/05-01:00:00.000: 0.9<br>2023/01/05-00:00:00.000: 0.91 |
| sensorA#20230104 | 2023/01/04-23:00:00.000: 56<br>2023/01/04-22:00:00.000: 57 | 2023/01/04-23:00:00.000: 0.9<br>2023/01/04-22:00:00.000: 0.91 |
You can retrieve a specific range of timestamped values using the temporal filters `after`, `before`, or `after_or_equal`. The following example uses `after`:
```sql
SELECT metrics['temperature'] AS temp_versioned
FROM sensorReadings(
  with_history => true,
  after => TIMESTAMP('2023-01-04T23:00:00.000Z'),
  before => TIMESTAMP('2023-01-05T01:00:00.000Z'))
WHERE _key LIKE 'sensorA%'
```

The query returns the data in this format:
| temp_versioned |
|---|
| [{timestamp: '2023/01/05-01:00:00.000', value: 56}<br>{timestamp: '2023/01/05-00:00:00.000', value: 55}] |
| [{timestamp: '2023/01/04-23:00:00.000', value: 56}] |
UNPACK time series data
When you're analyzing time series data, it's often preferable to work with the data in a tabular format. The Bigtable `UNPACK` function can help.
`UNPACK` is a Bigtable table-valued function (TVF): it returns an entire output table instead of a single scalar value, and it appears in the `FROM` clause like a table subquery. The `UNPACK` TVF expands each timestamped value into multiple rows, one per timestamp, and moves the timestamp into the `_timestamp` column.
The input to `UNPACK` is a subquery where `with_history => true`.

The output is an expanded table with a `_timestamp` column in each row.

An input column family `MAP<key, ARRAY<STRUCT<timestamp, value>>>` unpacks to `MAP<key, value>`, and a column qualifier `ARRAY<STRUCT<timestamp, value>>` unpacks to `value`. Other input column types are unchanged. A column must be selected in the subquery in order to be unpacked and selected. The new `_timestamp` column doesn't need to be selected in order for the timestamps to be unpacked.
Expanding on the time series example in Query time series data, and using the query from that section as the input, your `UNPACK` query is formatted like this:
```sql
SELECT temp_versioned, _timestamp
FROM UNPACK((
  SELECT metrics['temperature'] AS temp_versioned
  FROM sensorReadings(
    with_history => true,
    after => TIMESTAMP('2023-01-04T23:00:00.000Z'),
    before => TIMESTAMP('2023-01-05T01:00:00.000Z'))
  WHERE _key LIKE 'sensorA%'));
```

The query returns the data in this format:
| temp_versioned | _timestamp |
|---|---|
| 56 | 1672898400 |
| 55 | 1672894800 |
| 56 | 1672891200 |
Query JSON
JSON functions let you manipulate JSON stored as Bigtable values for operational workloads.
For example, you can retrieve the value for the JSON element `abc` from the latest cell in the `session` column family, along with the row key, by using the following query.
```sql
SELECT _key, JSON_VALUE(session['payload'], '$.abc') AS abc FROM analytics
```

Escape special characters and reserved words
Bigtable offers high flexibility in naming tables and columns. As a result, in your SQL queries, your table names might need to be escaped due to special characters or reserved words.
For example, the following query is not valid SQL due to the period in the table name.
```sql
-- ERROR: Table name format not supported
SELECT * FROM my.table WHERE _key = 'r1'
```

However, you can resolve this issue by enclosing the identifier in backticks.
```sql
SELECT * FROM `my.table` WHERE _key = 'r1'
```

If a SQL reserved keyword is used as an identifier, it can similarly be escaped.
```sql
SELECT * FROM `select` WHERE _key = 'r1'
```

Use SQL with a Bigtable client library
The Bigtable client libraries for Java, Python, and Go support querying data with SQL using the `executeQuery` API. The following examples show how to issue a query and access the data:
Go
To use this feature, you must use `cloud.google.com/go/bigtable` version 1.36.0 or later. For more information on usage, see the `PrepareStatement`, `Bind`, `Execute`, and `ResultRow` documentation.
```go
import (
	"context"
	"log"

	"cloud.google.com/go/bigtable"
)

func query(ctx context.Context, client *bigtable.Client) {
	// Prepare once for queries that will be run multiple times, and reuse
	// the PreparedStatement for each request. Use query parameters to
	// construct PreparedStatements that can be reused.
	ps, err := client.PrepareStatement(
		"SELECT cf1['bytesCol'] AS bytesCol, CAST(cf2['stringCol'] AS STRING) AS stringCol, cf3 FROM myTable WHERE _key=@keyParam",
		map[string]bigtable.SQLType{
			"keyParam": bigtable.BytesSQLType{},
		})
	if err != nil {
		log.Fatalf("Failed to create PreparedStatement: %v", err)
	}

	// For each request, create a BoundStatement with your query parameters set.
	bs, err := ps.Bind(map[string]any{"keyParam": []byte("mykey")})
	if err != nil {
		log.Fatalf("Failed to bind parameters: %v", err)
	}

	err = bs.Execute(ctx, func(rr bigtable.ResultRow) bool {
		var byteValue []byte
		if err := rr.GetByName("bytesCol", &byteValue); err != nil {
			log.Fatalf("Failed to access bytesCol: %v", err)
		}

		var stringValue string
		if err := rr.GetByName("stringCol", &stringValue); err != nil {
			log.Fatalf("Failed to access stringCol: %v", err)
		}

		// Note that column family maps have byte-valued keys. Go maps don't
		// support []byte keys, so the map will have Base64-encoded string keys.
		var cf3 map[string][]byte
		if err := rr.GetByName("cf3", &cf3); err != nil {
			log.Fatalf("Failed to access cf3: %v", err)
		}

		// Do something with the data
		// ...
		return true
	})
	if err != nil {
		log.Fatalf("Failed to execute query: %v", err)
	}
}
```

Java
To use this feature, you must use `java-bigtable` version 2.57.3 or later. For more information on usage, see `prepareStatement`, `executeQuery`, `BoundStatement`, and `ResultSet` in the Javadoc.
```java
static void query(BigtableDataClient client) {
  // Prepare once for queries that will be run multiple times, and reuse
  // the PreparedStatement for each request. Use query parameters to
  // construct PreparedStatements that can be reused.
  PreparedStatement preparedStatement =
      client.prepareStatement(
          "SELECT cf1['bytesCol'] AS bytesCol, CAST(cf2['stringCol'] AS STRING) AS stringCol, cf3 FROM myTable WHERE _key=@keyParam",
          // For queries with parameters, set the parameter names and types here.
          Map.of("keyParam", SqlType.bytes()));

  // For each request, create a BoundStatement with your query parameters set.
  BoundStatement boundStatement =
      preparedStatement
          .bind()
          .setBytesParam("keyParam", ByteString.copyFromUtf8("mykey"))
          .build();

  try (ResultSet resultSet = client.executeQuery(boundStatement)) {
    while (resultSet.next()) {
      ByteString byteValue = resultSet.getBytes("bytesCol");
      String stringValue = resultSet.getString("stringCol");
      Map<ByteString, ByteString> cf3Value =
          resultSet.getMap("cf3", SqlType.mapOf(SqlType.bytes(), SqlType.bytes()));
      // Do something with the data.
    }
  }
}
```

Python asyncio
To use this feature, you must use `python-bigtable` version 2.30.1 or later.
```python
from google.cloud.bigtable.data import BigtableDataClientAsync


async def execute_query(project_id, instance_id, table_id):
    async with BigtableDataClientAsync(project=project_id) as client:
        query = (
            "SELECT _key, cf1['bytesCol'] AS bytesCol,"
            " CAST(cf2['stringCol'] AS STRING) AS stringCol,"
            f" cf3 FROM {table_id} WHERE _key='mykey'"
        )
        async for row in await client.execute_query(query, instance_id):
            print(row["_key"], row["bytesCol"], row["stringCol"], row["cf3"])
```

SELECT * usage
`SELECT *` queries can experience temporary errors when a column family is added to or deleted from the queried table. For this reason, for production workloads, we recommend that you specify all column family IDs in your query rather than using `SELECT *`. For example, use `SELECT cf1, cf2, cf3` instead of `SELECT *`.
Last updated 2025-12-15 UTC.