Bigtable overview

Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, enabling you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key. Bigtable is ideal for storing large amounts of single-keyed data with low latency. It supports high read and write throughput at low latency, and it's an ideal data source for MapReduce operations.

Bigtable is exposed to applications through multiple client libraries, including a supported extension to the Apache HBase library for Java. As a result, it integrates with the existing Apache ecosystem of open source big data software.

Bigtable's powerful backend servers offer several key advantages over a self-managed HBase installation:

  • Incredible scalability. Bigtable scales in direct proportion to the number of machines in your cluster. A self-managed HBase installation has a design bottleneck that limits performance after a certain threshold is reached. Bigtable does not have this bottleneck, so you can scale your cluster up to handle more reads and writes.
  • Simple administration. Bigtable handles upgrades and restarts transparently, and it automatically maintains high data durability. To replicate your data, add a second cluster to your instance, and replication starts automatically. No more managing replicas or regions; just design your table schemas, and Bigtable will handle the rest for you.
  • Cluster resizing without downtime. You can increase the size of a Bigtable cluster for a few hours to handle a large load, then reduce the size of the cluster again, all without any downtime. After you change a cluster's size, it typically takes just a few minutes under load for Bigtable to balance performance across all of the nodes in your cluster.
  • Tiered storage (Preview). You can store infrequently accessed data in a separate, lower-cost storage tier. Tiered storage lets you choose the storage tier that best suits your Bigtable data access needs.
  • Autoscaling. You can configure Bigtable to continuously monitor cluster CPU capacity and automatically adjust the number of nodes in a cluster when necessary.

What it's good for

Bigtable is ideal for applications that need high throughput and scalability for key-value data, where each value is typically no larger than 10 MB. Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications.

You can use Bigtable to store and query all of the following types of data:

  • Time-series data, such as CPU and memory usage over time for multiple servers.
  • Marketing data, such as purchase histories and customer preferences.
  • Financial data, such as transaction histories, stock prices, and currency exchange rates.
  • Internet of Things data, such as usage reports from energy meters and home appliances.
  • Graph data, such as information about how users are connected to one another.

Bigtable storage model

Bigtable stores data in massively scalable tables, each of which is a sorted key-value map. The table is composed of rows, each of which typically describes a single entity, and columns, which contain individual values for each row. Each row is indexed by a single row key, and columns that are related to one another are typically grouped into a column family. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the column family.

Each intersection of a row and column can contain multiple cells. Each cell contains a unique timestamped version of the data for that row and column. Storing multiple cells in a column provides a record of how the stored data for that row and column has changed over time. Bigtable tables are sparse; if a column is not used in a particular row, it does not take up any space.
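
The storage model described above can be sketched as a nested map. This is an illustration only, not a client API: a table maps row keys to column families, families map qualifiers to lists of timestamped cells, and unused columns simply have no entry.

```python
from collections import defaultdict

def make_table():
    # table[row_key][family][qualifier] -> list of (timestamp, value) cells
    return defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def write_cell(table, row_key, family, qualifier, timestamp, value):
    cells = table[row_key][family][qualifier]
    cells.append((timestamp, value))
    cells.sort(reverse=True)  # keep the newest cell first

table = make_table()
write_cell(table, b"alice", "stats", b"clicks", 1, b"10")
write_cell(table, b"alice", "stats", b"clicks", 2, b"11")

# The most recent cell comes first; older versions remain as history.
latest = table[b"alice"]["stats"][b"clicks"][0]
print(latest)  # (2, b'11')
```

Because rows are sparse, a column that is never written for a row costs nothing: there is simply no key-value entry for it.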

Bigtable storage model diagram

A few things to notice in this illustration:

  • Columns can be unused in a row.
  • Each cell in a given row and column has a unique timestamp (t).

Bigtable architecture

The following diagram shows a simplified version of Bigtable's overall architecture:

Overall architecture of Bigtable.

As the diagram illustrates, all client requests go through a frontend server before they are sent to a Bigtable node. (In the original Bigtable paper, these nodes are called "tablet servers.") The nodes are organized into a Bigtable cluster, which belongs to a Bigtable instance, a container for the cluster.

Note: The diagram shows an instance with a single cluster. You can also add clusters to replicate your data, which improves data availability and durability.

Each node in the cluster handles a subset of the requests to the cluster. By adding nodes to a cluster, you can increase the number of simultaneous requests that the cluster can handle. Adding nodes also increases the maximum throughput for the cluster. If you enable replication by adding additional clusters, you can also send different types of traffic to different clusters. Then if one cluster becomes unavailable, you can fail over to another cluster.

A Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload of queries. (Tablets are similar to HBase regions.) Tablets are stored on Colossus, a Google-developed file system, in SSTable format. An SSTable provides a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. Each tablet is associated with a specific Bigtable node. In addition to the SSTable files, all writes are stored in Colossus's shared log as soon as they are acknowledged by Bigtable, providing increased durability.
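
The SSTable idea above can be illustrated with a minimal sketch: an immutable, sorted map from byte-string keys to byte-string values that supports point lookups and ordered range scans. Real SSTables add block indexes, Bloom filters, and an on-disk encoding; none of that is shown here.

```python
import bisect

class SSTable:
    def __init__(self, items):
        # Sort once at build time; the table is never modified afterward.
        pairs = sorted(items)
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, end):
        # Yield all (key, value) pairs with start <= key < end, in order.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._values[i]
            i += 1

sst = SSTable([(b"b", b"2"), (b"a", b"1"), (b"c", b"3")])
print(sst.get(b"b"))               # b'2'
print(list(sst.scan(b"a", b"c")))  # [(b'a', b'1'), (b'b', b'2')]
```

Immutability is what makes the format cheap to share: a node can reference an SSTable on Colossus without coordinating writes to it.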

Importantly, data is never stored in Bigtable nodes themselves; each node has pointers to a set of tablets that are stored on Colossus. As a result:

  • Rebalancing tablets from one node to another happens quickly, because the actual data is not copied. Bigtable updates the pointers for each node.
  • Recovery from the failure of a Bigtable node is fast, because only metadata must be migrated to the replacement node.
  • When a Bigtable node fails, no data is lost.

See Instances, Clusters, and Nodes for more information about how to work with these fundamental building blocks.

Load balancing

Each Bigtable zone is managed by a primary process, which balances workload and data volume within clusters. This process splits busier or larger tablets in half and merges less-accessed or smaller tablets together, redistributing them between nodes as needed. If a certain tablet gets a spike of traffic, Bigtable splits the tablet in two, then moves one of the new tablets to another node. Bigtable manages the splitting, merging, and rebalancing automatically, saving you the effort of manually administering your tablets. Understand performance provides more details about this process.

To get the best write performance from Bigtable, it's important to distribute writes as evenly as possible across nodes. One way to achieve this goal is by using row keys that don't follow a predictable order. For example, usernames tend to be distributed more or less evenly throughout the alphabet, so including a username at the start of the row key will tend to distribute writes evenly.

At the same time, it's useful to group related rows so they are next to one another, which makes it much more efficient to read several rows at the same time. For example, if you're storing different types of weather data over time, your row key might be the location where the data was collected, followed by a timestamp (for example, WashingtonDC#201803061617). This type of row key would group all of the data from one location into a contiguous range of rows. For other locations, the row would start with a different identifier; with many locations collecting data at the same rate, writes would still be spread evenly across tablets.
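
A quick way to see why this row-key pattern works is to sort some keys built with it. The extra timestamps below extend the WashingtonDC example from the text and are made up for illustration; the point is that lexicographic sorting puts one location's readings into a contiguous range.

```python
# Row keys of the form <location>#<timestamp> sort into per-location runs.
rows = sorted([
    b"WashingtonDC#201803061617",
    b"Seattle#201803061617",
    b"WashingtonDC#201803061700",
    b"Seattle#201803061700",
    b"WashingtonDC#201803061750",
])

# Reading all WashingtonDC data is a prefix scan over adjacent rows.
dc_rows = [k for k in rows if k.startswith(b"WashingtonDC#")]

# The matching keys occupy one contiguous slice of the sorted order.
first = rows.index(dc_rows[0])
assert rows[first:first + len(dc_rows)] == dc_rows
print(dc_rows)
```

Within each location's run, rows are ordered by timestamp, so time-range reads for a single location are also contiguous.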

See Choosing a row key for more details about choosing an appropriate row key for your data.

Compute

By default, Bigtable uses cluster nodes for both storage and compute. For high-throughput read jobs, you can use Data Boost for Bigtable for compute. Data Boost lets you send large read jobs and queries using serverless compute while your core application continues using cluster nodes for compute. For more information, see Data Boost overview.

Supported data types

Bigtable treats all data as raw byte strings for most purposes. The only time Bigtable tries to determine the type is for increment operations, where the target must be a 64-bit integer encoded as an 8-byte big-endian value.
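
The 8-byte big-endian encoding that increment operations expect can be produced with Python's standard `struct` module, where the format `">q"` means a signed 64-bit big-endian integer. The helper names here are our own, shown only to make the encoding concrete.

```python
import struct

def encode_counter(value: int) -> bytes:
    # ">q": big-endian ("">"") signed 64-bit integer ("q") -> exactly 8 bytes.
    return struct.pack(">q", value)

def decode_counter(raw: bytes) -> int:
    return struct.unpack(">q", raw)[0]

raw = encode_counter(41)
print(len(raw))                 # 8
print(decode_counter(raw) + 1)  # 42
```

A value written with any other width or byte order would cause an increment operation to fail or produce the wrong result, since Bigtable interprets exactly these 8 bytes as the integer.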

Memory and disk usage

The following sections describe how several components of Bigtable affect memory and disk usage for your instance.

Unused columns

Columns that are not used in a Bigtable row don't take up any space in that row. Each row is essentially a collection of key-value entries, where the key is a combination of the column family, column qualifier, and timestamp. If a row does not include a value for a specific column, the key-value entry is not present.

Column qualifiers

Column qualifiers take up space in a row, since each column qualifier used in a row is stored in that row. As a result, it's often efficient to use column qualifiers as data.
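
One common illustration of qualifiers-as-data is a follower graph: the qualifier itself carries the followed user's ID, and the cell value can be left empty. The sketch below models this with plain dictionaries; it is not a client API, just the data layout.

```python
# Each row is a user; each column qualifier names a user they follow.
# The value is empty because the qualifier already carries the data.
follows = {
    b"alice": {b"bob": b"", b"carol": b""},  # qualifier -> (empty) value
    b"bob":   {b"carol": b""},
}

# "Who does alice follow?" is just the set of qualifiers in her row.
print(sorted(follows[b"alice"]))  # [b'bob', b'carol']
```

Because rows are sparse, each user's row stores only the qualifiers actually written, so the row's size tracks the number of relationships rather than the total number of users.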

For more information about column qualifiers, see Columns.

Compactions

Bigtable periodically rewrites your tables to remove deleted entries, to reorganize your data so that reads and writes are more efficient, and to move data as part of tiered storage. This process is known as a compaction. There are no configuration settings for compactions; Bigtable compacts your data automatically. On average, it takes a week for a compaction to complete and execute tasks such as data deletion or moving data to tiered storage.

Compaction carries out deletions identified by the garbage collection process. For more information, see Garbage collection. For more information on compactions in tiered storage, see How tiered storage works.

Mutations and deletions

Mutations, or changes, to a row take up extra storage space, because Bigtable stores mutations sequentially and compacts them only periodically. When Bigtable compacts a table, it removes values that are no longer needed. If you update the value in a cell, both the original value and the new value will be stored on disk for some amount of time until the data is compacted.

Deletions also take up extra storage space, at least in the short term, because deletions are actually a specialized type of mutation. Until the table is compacted, a deletion uses extra storage rather than freeing up space.
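
A toy sketch can show why both updates and deletions temporarily cost space: writes and deletes are appended as a sequence of mutations, and obsolete entries are only dropped when the sequence is compacted. This is a simplification of Bigtable's actual process, for illustration only.

```python
def compact(mutations):
    # Keep only the latest mutation per key; drop keys whose latest
    # mutation is a deletion tombstone.
    latest = {}
    for op, key, value in mutations:  # mutations are applied in order
        latest[key] = (op, value)
    return {k: v for k, (op, v) in latest.items() if op == "set"}

mutations = [
    ("set", b"temp", b"20"),
    ("set", b"temp", b"21"),  # the superseded b"20" still occupies space
    ("set", b"wind", b"5"),
    ("del", b"wind", None),   # the tombstone itself also occupies space
]
print(len(mutations))         # 4 entries on disk before compaction
print(compact(mutations))     # {b'temp': b'21'}
```

Before compaction, four entries exist for what is logically one value; after compaction, the superseded write, the deleted value, and the tombstone are all gone.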

Data compression

Bigtable compresses your data automatically using an intelligent algorithm. You cannot configure compression settings for your table. However, it is useful to know how to store data so that it can be compressed efficiently:

  • Random data cannot be compressed as efficiently as patterned data. Patterned data includes text, such as the page you're reading right now.
  • Compression works best if identical values are near each other, either in the same row or in adjoining rows. If you arrange your row keys so that rows with identical chunks of data are next to each other, the data can be compressed efficiently.
  • Bigtable compresses values that are up to 1 MiB in size. If you store values that are larger than 1 MiB, compress them before writing them to Bigtable, so you can save CPU cycles, server memory, and network bandwidth.
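
The last point can be sketched as a small pre-write helper. The 1 MiB threshold comes from the text above; zlib is one reasonable codec choice, not something the service mandates, and `maybe_compress` is a hypothetical helper, not a client-library function.

```python
import zlib

ONE_MIB = 1 << 20

def maybe_compress(value: bytes) -> bytes:
    # Compress client-side only when the value exceeds Bigtable's
    # 1 MiB compression limit; small values are left for Bigtable.
    if len(value) > ONE_MIB:
        return zlib.compress(value)
    return value

big_value = b"patterned data " * 200_000    # ~2.9 MB of repetitive data
stored = maybe_compress(big_value)
print(len(stored) < len(big_value))         # True
print(zlib.decompress(stored) == big_value) # True
```

In a real application you would also need to record (for example, in the column choice or a prefix) which values were pre-compressed, so readers know to decompress them.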

Data durability

When you use Bigtable, your data is stored on Colossus, a Google-developed, highly durable file system, using storage devices in Google Cloud's data centers. You don't need to run an HDFS cluster or any other file system to use Bigtable. Behind the scenes, Colossus uses proprietary storage methods to achieve data durability beyond what standard HDFS three-way replication provides.

Durability is further improved when you use replication. Bigtable maintains a separate copy of your data in the location that you select for each cluster of a replicated instance.

Consistency model

Single-cluster Bigtable instances provide strong consistency. By default, instances that have more than one cluster provide eventual consistency, but for some use cases they can be configured to provide read-your-writes consistency or strong consistency, depending on the workload and app profile settings.

Security

Access to your Bigtable tables is controlled by your Google Cloud project and the Identity and Access Management (IAM) roles that you assign to users. For example, you can assign IAM roles that prevent individual users from reading from tables, writing to tables, or creating new instances. If someone does not have access to your project or does not have an IAM role with appropriate permissions for Bigtable, they cannot access any of your tables.

You can also control access to table data by creating an authorized view of a table that represents a subset of the table data. Then you can grant authorized view-level permissions to some users without granting them table-level permissions. You can manage security at the project, instance, table, or authorized view levels. Bigtable does not support row-level, column-level, or cell-level security restrictions.

Encryption

By default, all data stored within Google Cloud, including the data in Bigtable tables, is encrypted at rest using the same hardened key management systems that we use for our own encrypted data.

If you want more control over the keys used to encrypt your Bigtable data at rest, you can use customer-managed encryption keys (CMEK).

Backups

Bigtable backups let you save a copy of a table's schema and data and then restore to a new table at a later time. Using backups and backup copies, you can restore to a new table in any region or project where you have a Bigtable instance, regardless of where the source table is.

Change data capture

Bigtable provides change data capture (CDC) in the form of change streams. Change streams let you capture and stream out data changes to a table as the changes happen. You can read a change stream using a service such as Dataflow to support use cases including data analytics, audits, archiving requirements, and triggering downstream application logic. For more information, see the Overview of change streams.

Request routing with app profiles

App profile routing policies let you control which clusters handle incoming requests from your applications. Options for routing policies include the following:

  • Single-cluster routing: sends all requests to a single cluster.
  • Multi-cluster routing to any cluster: sends requests to the nearest available cluster in an instance, including the following options:
    • Any cluster: any cluster in the instance can receive requests.
    • Cluster group routing: a specified group of clusters in the instance can receive requests.

Other storage and database options

Bigtable isn't a traditional relational database. While it does support SQL queries, certain use cases might be better suited for another database option.

  • If you must store highly structured objects in a document database, with support for ACID transactions and SQL-like queries, consider Firestore.
  • For in-memory data storage with low latency, consider Memorystore.
  • To sync data between users in real time, consider the Firebase Realtime Database.
  • If you need interactive querying in an online analytical processing (OLAP) system, consider BigQuery.

For more information about other database options, see the overview of database services. Google Cloud also has various storage options.

What's next


Last updated 2025-12-17 UTC.