GoogleSQL for Bigtable overview
You can use GoogleSQL statements to query your Bigtable data. GoogleSQL is an ANSI-compliant structured query language (SQL) that is also implemented for other Google Cloud services such as BigQuery and Spanner.
This document provides an overview of GoogleSQL for Bigtable. It provides examples of SQL queries that you can use with Bigtable and describes how they relate to a Bigtable table schema. Before you read this document, you should be familiar with the Bigtable storage model and schema design concepts.
You can create and run queries in Bigtable Studio in the Google Cloud console, or you can run them programmatically using the Bigtable client library for Java, Python, or Go. For more information, see Use SQL with a Bigtable client library.
SQL queries are handled by cluster nodes in the same way as NoSQL data requests. Therefore, the same best practices apply when creating SQL queries to run against your Bigtable data, such as avoiding full table scans or complex filters. For more information, see Reads and performance.
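As a minimal sketch of this practice, the following pair of queries contrasts a row-key-bounded read with a full table scan. The table name `myTable` and column family `cf1` are hypothetical:

```sql
-- Efficient: the row key predicate limits the read to one key range.
SELECT _key, cf1
FROM myTable
WHERE _key LIKE 'user1%'

-- Expensive: no row key predicate, so every row in the table is scanned.
SELECT _key, cf1
FROM myTable
WHERE cf1['status'] = 'active'
```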
You can't use Data Boost with GoogleSQL for Bigtable.
Use cases
GoogleSQL for Bigtable is optimal for low-latency application development. Additionally, running SQL queries in the Google Cloud console can be useful for quickly getting a visual representation of a table's schema, verifying that certain data was written, or debugging possible data issues.
The current release of GoogleSQL for Bigtable doesn't support some common SQL constructs, including, but not limited to, the following:
- Data Manipulation Language (DML) statements beyond `SELECT`, such as `INSERT`, `UPDATE`, or `DELETE`
- Data Definition Language (DDL) statements such as `CREATE`, `ALTER`, or `DROP`
- Data access control statements
- Query syntax for subqueries, `JOIN`, `UNION`, and CTEs
For more information, including supported functions, operators, data types, and query syntax, see the GoogleSQL for Bigtable reference documentation.
Views
You can use GoogleSQL for Bigtable to create the following resources:
- Continuous materialized view: a pre-computed result of a continuously running SQL query, including aggregated data, that synchronizes with its source table with incremental updates. This feature is in Preview.
- Logical view: a saved, named query that can be queried like a table.
To compare these types of views as well as authorized views, see Tables and views.
Key concepts
This section discusses key concepts to be aware of when you use GoogleSQL to query your Bigtable data.
Column families in SQL responses
In Bigtable, a table contains one or more column families, which are used to group columns. When you query a Bigtable table with GoogleSQL, the schema for the table consists of the following:
- A special column named `_key` that corresponds to row keys in the queried table
- A single column for each Bigtable column family in the table, which contains the column family's data in that row
Map data type
GoogleSQL for Bigtable includes the data type `MAP<key, value>`, which is designed specifically to accommodate column families.
By default, each row in a map column contains key-value pairs, where a key is the Bigtable column qualifier in the queried table, and the value is the most recent value for that column.
The following example of a SQL query returns a table with the row key value and the latest value of the qualifier from a map named `columnFamily`.

```sql
SELECT _key, columnFamily['qualifier'] FROM myTable
```

If your Bigtable schema involves storing multiple cells, or versions of the data, in columns, you can add a temporal filter, such as `with_history`, to your SQL statement.
In this case, maps representing column families are nested and returned as an array. In the array, each value is itself a map consisting of a timestamp as the key and cell data as the value. The format is `MAP<key, ARRAY<STRUCT<timestamp, value>>>`.
The following example returns all cells in the `info` column family for a single row.

```sql
SELECT _key, info FROM users(with_history => TRUE) WHERE _key = 'user_123';
```

The returned map looks like the following. In the queried table, `info` is the column family, `user_123` is the row key, and `city` and `state` are the column qualifiers. Each timestamp-value pair (`STRUCT`) in an array represents a cell in those columns in that row, and the pairs are sorted by timestamp descending.
```
/*----------+------------------------------------------------------------------+
 | _key     | info                                                             |
 +----------+------------------------------------------------------------------+
 | user_123 | {"city":{<t5>:"Brooklyn", <t0>:"New York"}, "state":{<t0>:"NY"}} |
 +----------+------------------------------------------------------------------*/
```

Sparse tables
A key feature of Bigtable is its flexible data model. In a Bigtable table, if a column is unused in a row, no data is stored for the column. A row might have one column and the next row might have 100 columns. In contrast, in a relational database table, all rows contain all columns, and a `NULL` value is typically stored in the column of a row that has no data for that column.
When you query a Bigtable table with GoogleSQL, however, an unused column is represented with an empty map and is returned as a `NULL` value. These `NULL` values can be used as query predicates. For example, a predicate like `WHERE family['column1'] IS NOT NULL` can be used to return a row only if `column1` is used in the row.
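A full query using that predicate might look like the following sketch, where the table `myTable` and column family `family` are hypothetical:

```sql
-- Rows where column1 is unused produce NULL for family['column1']
-- and are filtered out of the result.
SELECT _key, family['column1']
FROM myTable
WHERE family['column1'] IS NOT NULL
```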
Bytes
When you provide a string, GoogleSQL by default implicitly casts `STRING` values to `BYTES` values. This means, for example, that you can provide the string `'qualifier'` rather than the byte sequence `b'qualifier'`.
Because Bigtable by default treats all data as bytes, most Bigtable columns don't contain type information. However, with GoogleSQL you can define a schema at read time with the `CAST` function. For more information about casting, see Conversion functions.
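For instance, assuming the `users` table from the earlier examples stores city values as bytes, a read-time cast might look like this sketch:

```sql
-- Interpret the raw bytes in info['city'] as a STRING at read time.
SELECT _key, CAST(info['city'] AS STRING) AS city
FROM users
```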
Temporal filters
The following table lists the arguments that you can use when accessing temporal elements of a table. Arguments are listed in the order in which they are filtered. For example, `with_history` is applied before `latest_n`. Arguments that take a timestamp must be given a valid timestamp.
| Argument | Description |
|---|---|
| `as_of` | Timestamp. Returns the latest values with timestamps less than or equal to the provided timestamp. |
| `with_history` | Boolean. Controls whether to return the latest value as a scalar or timestamped values as `STRUCT`. |
| `after` | Timestamp. Returns values with timestamps after the input, exclusive. Requires `with_history => TRUE`. |
| `after_or_equal` | Timestamp. Returns values with timestamps after the input, inclusive. Requires `with_history => TRUE`. |
| `before` | Timestamp. Returns values with timestamps before the input, exclusive. Requires `with_history => TRUE`. |
| `latest_n` | Integer. The number of timestamped values to return per column qualifier (map key). Must be greater than or equal to 1. Requires `with_history => TRUE`. |
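As an illustration, the following sketch (against the `sensorReadings` table used later in this document) combines `with_history` and `latest_n` to return at most the two most recent cells per qualifier:

```sql
-- latest_n requires with_history => TRUE; returns up to 2
-- timestamped values per column qualifier.
SELECT _key, metrics['temperature'] AS recent_temps
FROM sensorReadings(with_history => TRUE, latest_n => 2)
```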
For more examples, see Advanced query patterns.
Foundational queries
This section describes and shows examples of basic Bigtable SQL queries and how they work. For additional sample queries, see GoogleSQL for Bigtable query pattern examples.
Retrieve the latest version
Although Bigtable lets you store multiple versions of data in each column, GoogleSQL for Bigtable by default returns the latest version (the most recent cell) of the data for each row.
Consider the following sample dataset, which shows that user1 relocated twice in the state of NY and once within the city of Brooklyn. In this example, `address` is the column family, and the column qualifiers are `street`, `city`, and `state`. Cells in a column are shown on separate lines.
| address | | | |
|---|---|---|---|
| _key | street | city | state |
| user1 | 2023/01/10-14:10:01.000: '113 Xyz Street'<br>2021/12/20-09:44:31.010: '76 Xyz Street'<br>2005/03/01-11:12:15.112: '123 Abc Street' | 2021/12/20-09:44:31.010: 'Brooklyn'<br>2005/03/01-11:12:15.112: 'Queens' | 2005/03/01-11:12:15.112: 'NY' |
To retrieve the latest version of each column for user1, you can use a `SELECT` statement like the following.
```sql
SELECT _key, address FROM myTable WHERE _key = 'user1'
```

The response contains the current address, which is a combination of the most recent street, city, and state values (written at different times), printed as JSON. Timestamps are not included in the response.
| _key | address |
|---|---|
| user1 | {street:'113 Xyz Street', city:'Brooklyn', state:'NY'} |
Retrieve all versions
To retrieve earlier versions (cells) of the data, use the `with_history` flag. You can also alias columns and expressions, as illustrated in the following example.
```sql
SELECT _key, columnFamily['qualifier'] AS col1 FROM myTable(with_history => TRUE)
```

To better understand the events leading to the current state of a row, you can retrieve the timestamps for each value by retrieving the full history. For example, to understand when user1 moved to their current address and where they moved from, you could run the following query:
```sql
SELECT
  address['street'][0].value AS moved_to,
  address['street'][1].value AS moved_from,
  FORMAT_TIMESTAMP('%Y-%m-%d', address['street'][0].timestamp) AS moved_on,
FROM myTable(with_history => TRUE)
WHERE _key = 'user1'
```

When you use the `with_history` flag in your SQL query, the response is returned as `MAP<key, ARRAY<STRUCT<timestamp, value>>>`. Each item in the array is a timestamped value for the specified row, column family, and column. Timestamps are ordered in reverse chronological order, so the latest data is always the first item returned.
The query response is as follows.
| moved_to | moved_from | moved_on |
|---|---|---|
| 113 Xyz Street | 76 Xyz Street | 2023/01/10 |
You can also retrieve the number of versions in each row using array functions, as demonstrated in the following query:
```sql
SELECT _key, ARRAY_LENGTH(MAP_ENTRIES(address)) AS version_count
FROM myTable(with_history => TRUE)
```

Retrieve data from a specified time
Using an `as_of` filter lets you retrieve the state of a row as it was at a certain point in time. For example, if you want to know the address of user1 as of January 10, 2022 1:14 PM, you can run the following query.
```sql
SELECT address FROM myTable(as_of => TIMESTAMP('2022-01-10T13:14:00.234Z')) WHERE _key = 'user1'
```

The result shows what would have been the last known address on January 10, 2022 1:14 PM, which is the combination of street and city from the 2021/12/20-09:44:31.010 update and the state from the 2005/03/01-11:12:15.112 update.
| address |
|---|
| {street:'76 Xyz Street', city:'Brooklyn', state:'NY'} |
The same result can also be achieved using Unix timestamps.
```sql
SELECT address FROM myTable(as_of => TIMESTAMP_FROM_UNIX_MILLIS(1641820440000)) WHERE _key = 'user1'
```

Consider the following dataset, which shows the on-or-off state of smoke and carbon monoxide alarms. The column family is `alarmType` and the column qualifiers are `smoke` and `carbonMonoxide`. Cells in each column are shown on separate lines.
| alarmType | | |
|---|---|---|
| _key | smoke | carbonMonoxide |
| building1#section1 | 2023/04/01-09:10:15.000: 'off'<br>2023/04/01-08:41:40.000: 'on'<br>2020/07/03-06:25:31.000: 'off'<br>2020/07/03-06:02:04.000: 'on' | 2023/04/01-09:22:08.000: 'off'<br>2023/04/01-08:53:12.000: 'on' |
| building1#section2 | 2021/03/11-07:15:04.000: 'off'<br>2021/03/11-07:00:25.000: 'on' | |
You can find sections of building1 where a smoke alarm was on at 9 AM on April 1, 2023, and the status of the carbon monoxide alarm at that time, using the following query.
```sql
SELECT _key AS location, alarmType['carbonMonoxide'] AS CO_sensor
FROM alarms(as_of => TIMESTAMP('2023-04-01T09:00:00.000Z'))
WHERE _key LIKE 'building1%' AND alarmType['smoke'] = 'on'
```

The result is the following:
| location | CO_sensor |
|---|---|
| building1#section1 | 'on' |
Query time series data
A common use case for Bigtable is the storage of time series data. Consider the following sample dataset, which shows temperature and humidity readings for weather sensors. The column family ID is `metrics` and the column qualifiers are `temperature` and `humidity`. Cells in a column are shown on separate lines, and each cell represents a timestamped sensor reading.
| metrics | | |
|---|---|---|
| _key | temperature | humidity |
| sensorA#20230105 | 2023/01/05-02:00:00.000: 54<br>2023/01/05-01:00:00.000: 56<br>2023/01/05-00:00:00.000: 55 | 2023/01/05-02:00:00.000: 0.89<br>2023/01/05-01:00:00.000: 0.9<br>2023/01/05-00:00:00.000: 0.91 |
| sensorA#20230104 | 2023/01/04-23:00:00.000: 56<br>2023/01/04-22:00:00.000: 57 | 2023/01/04-23:00:00.000: 0.9<br>2023/01/04-22:00:00.000: 0.91 |
You can retrieve a specific range of timestamped values using the temporal filters `after`, `before`, or `after_or_equal`. The following example uses `after`:
```sql
SELECT metrics['temperature'] AS temp_versioned
FROM sensorReadings(
  with_history => true,
  after => TIMESTAMP('2023-01-04T23:00:00.000Z'),
  before => TIMESTAMP('2023-01-05T01:00:00.000Z'))
WHERE _key LIKE 'sensorA%'
```

The query returns the data in this format:
| temp_versioned |
|---|
| [{timestamp: '2023/01/05-01:00:00.000', value: 56}<br>{timestamp: '2023/01/05-00:00:00.000', value: 55}] |
| [{timestamp: '2023/01/04-23:00:00.000', value: 56}] |
UNPACK time series data
When you're analyzing time series data, it's often preferable to work with the data in a tabular format. The Bigtable `UNPACK` function can help.
`UNPACK` is a Bigtable table-valued function (TVF): it returns an entire output table instead of a single scalar value, and it appears in the `FROM` clause like a table subquery. The `UNPACK` TVF expands each timestamped value into multiple rows, one per timestamp, and moves the timestamp into the `_timestamp` column.
The input to `UNPACK` is a subquery where `with_history => true`.

The output is an expanded table with a `_timestamp` column in each row.

An input column family `MAP<key, ARRAY<STRUCT<timestamp, value>>>` unpacks to `MAP<key, value>`, and a column qualifier `ARRAY<STRUCT<timestamp, value>>` unpacks to `value`. Other input column types are unchanged. A column must be selected in the subquery in order to be unpacked and selected. The new `_timestamp` column doesn't need to be selected in order for the timestamps to be unpacked.
Expanding on the time series example in Query time series data, and using the query from that section as the input, your `UNPACK` query is formatted like this:
```sql
SELECT temp_versioned, _timestamp
FROM UNPACK((
  SELECT metrics['temperature'] AS temp_versioned
  FROM sensorReadings(
    with_history => true,
    after => TIMESTAMP('2023-01-04T23:00:00.000Z'),
    before => TIMESTAMP('2023-01-05T01:00:00.000Z'))
  WHERE _key LIKE 'sensorA%'));
```

The query returns the data in this format:
| temp_versioned | _timestamp |
|---|---|
| 56 | 1672898400 |
| 55 | 1672894800 |
| 56 | 1672891200 |
Query JSON
JSON functions let you manipulate JSON stored as Bigtable values for operational workloads.
For example, you can retrieve the value for the JSON element `abc` from the latest cell in the `session` column family, along with the row key, by using the following query.
```sql
SELECT _key, JSON_VALUE(session['payload'], '$.abc') AS abc FROM analytics
```

Escape special characters and reserved words
Bigtable offers high flexibility in naming tables and columns. As a result, in your SQL queries, your table names might need to be escaped due to special characters or reserved words.
For example, the following query is not valid SQL due to the period in the table name.
```sql
-- ERROR: Table name format not supported
SELECT * FROM my.table WHERE _key = 'r1'
```

However, you can resolve this issue by enclosing the identifier in backticks.
```sql
SELECT * FROM `my.table` WHERE _key = 'r1'
```

If a SQL reserved keyword is used as an identifier, it can similarly be escaped.
```sql
SELECT * FROM `select` WHERE _key = 'r1'
```

Use SQL with a Bigtable client library
The Bigtable client libraries for Java, Python, and Go support querying data with SQL using the `executeQuery` API. The following examples show how to issue a query and access the data:
Go
To use this feature, you must use `cloud.google.com/go/bigtable` version 1.36.0 or later. For more information on usage, see the `PrepareStatement`, `Bind`, `Execute`, and `ResultRow` documentation.
```go
import (
	"context"
	"log"

	"cloud.google.com/go/bigtable"
)

func query(ctx context.Context, client *bigtable.Client) {
	// Prepare once for queries that will be run multiple times, and reuse
	// the PreparedStatement for each request. Use query parameters to
	// construct PreparedStatements that can be reused.
	ps, err := client.PrepareStatement(
		"SELECT cf1['bytesCol'] AS bytesCol, CAST(cf2['stringCol'] AS STRING) AS stringCol, cf3 FROM myTable WHERE _key=@keyParam",
		map[string]bigtable.SQLType{
			"keyParam": bigtable.BytesSQLType{},
		})
	if err != nil {
		log.Fatalf("Failed to create PreparedStatement: %v", err)
	}

	// For each request, create a BoundStatement with your query parameters set.
	bs, err := ps.Bind(map[string]any{"keyParam": []byte("mykey")})
	if err != nil {
		log.Fatalf("Failed to bind parameters: %v", err)
	}

	err = bs.Execute(ctx, func(rr bigtable.ResultRow) bool {
		var byteValue []byte
		if err := rr.GetByName("bytesCol", &byteValue); err != nil {
			log.Fatalf("Failed to access bytesCol: %v", err)
		}

		var stringValue string
		if err := rr.GetByName("stringCol", &stringValue); err != nil {
			log.Fatalf("Failed to access stringCol: %v", err)
		}

		// Note that column family maps have byte-valued keys. Go maps don't
		// support []byte keys, so the map will have Base64-encoded string keys.
		var cf3 map[string][]byte
		if err := rr.GetByName("cf3", &cf3); err != nil {
			log.Fatalf("Failed to access cf3: %v", err)
		}

		// Do something with the data
		// ...
		return true
	})
	if err != nil {
		log.Fatalf("Failed to execute query: %v", err)
	}
}
```

Java
To use this feature, you must use `java-bigtable` version 2.57.3 or later. For more information on usage, see `prepareStatement`, `executeQuery`, `BoundStatement`, and `ResultSet` in the Javadoc.
```java
static void query(BigtableDataClient client) {
  // Prepare once for queries that will be run multiple times, and reuse
  // the PreparedStatement for each request. Use query parameters to
  // construct PreparedStatements that can be reused.
  PreparedStatement preparedStatement =
      client.prepareStatement(
          "SELECT cf1['bytesCol'] AS bytesCol, CAST(cf2['stringCol'] AS STRING) AS stringCol, cf3 FROM myTable WHERE _key=@keyParam",
          // For queries with parameters, set the parameter names and types here.
          Map.of("keyParam", SqlType.bytes()));

  // For each request, create a BoundStatement with your query parameters set.
  BoundStatement boundStatement =
      preparedStatement
          .bind()
          .setBytesParam("keyParam", ByteString.copyFromUtf8("mykey"))
          .build();

  try (ResultSet resultSet = client.executeQuery(boundStatement)) {
    while (resultSet.next()) {
      ByteString byteValue = resultSet.getBytes("bytesCol");
      String stringValue = resultSet.getString("stringCol");
      Map<ByteString, ByteString> cf3Value =
          resultSet.getMap("cf3", SqlType.mapOf(SqlType.bytes(), SqlType.bytes()));
      // Do something with the data.
    }
  }
}
```

Python asyncio
To use this feature, you must use `python-bigtable` version 2.30.1 or later.
```python
from google.cloud.bigtable.data import BigtableDataClientAsync


async def execute_query(project_id, instance_id, table_id):
    async with BigtableDataClientAsync(project=project_id) as client:
        query = (
            "SELECT _key, cf1['bytesCol'] AS bytesCol,"
            " CAST(cf2['stringCol'] AS STRING) AS stringCol,"
            f" cf3 FROM {table_id} WHERE _key='mykey'"
        )
        async for row in await client.execute_query(query, instance_id):
            print(row["_key"], row["bytesCol"], row["stringCol"], row["cf3"])
```

SELECT * usage
`SELECT *` queries can experience temporary errors when a column family is added to or deleted from the queried table. For this reason, for production workloads, we recommend that you specify all column family IDs in your query rather than using `SELECT *`. For example, use `SELECT cf1, cf2, cf3` instead of `SELECT *`.
Last updated 2025-12-15 UTC.