Migrate data from HBase to Bigtable offline
This page describes considerations and processes for migrating data from an Apache HBase cluster to a Bigtable instance on Google Cloud.
The process described on this page requires you to take your application offline. If you want to migrate with no downtime, see the guidance for online migration at Replicate from HBase to Bigtable.
To migrate data to Bigtable from an HBase cluster that is hosted on a Google Cloud service, such as Dataproc or Compute Engine, see Migrating HBase hosted on Google Cloud to Bigtable.
Before you begin this migration, you should consider performance implications, Bigtable schema design, your approach to authentication and authorization, and the Bigtable feature set.
Pre-migration considerations
This section suggests a few things to review and think about before you begin your migration.
Performance
Under a typical workload, Bigtable delivers highly predictable performance. Make sure that you understand the factors that affect Bigtable performance before you migrate your data.
Bigtable schema design
In most cases, you can use the same schema design in Bigtable as you do in HBase. If you want to change your schema or if your use case is changing, review the concepts laid out in Designing your schema before you migrate your data.
Authentication and authorization
Before you design access control for Bigtable, review the existing HBase authentication and authorization processes.
Bigtable uses Google Cloud's standard mechanisms for authentication and Identity and Access Management to provide access control, so you need to convert your existing HBase authorization to IAM. You can map the existing Hadoop groups that provide access control mechanisms for HBase to different service accounts.
Bigtable allows you to control access at the project, instance, and table levels. For more information, see Access Control.
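For example, one way to map a Hadoop group's access to a service account is to grant that service account a role on the Bigtable instance with the gcloud CLI. This is a minimal sketch; the service account and role shown are placeholders, so substitute the principals and roles that fit your own access model.

gcloud bigtable instances add-iam-policy-binding INSTANCE_ID \
    --member="serviceAccount:hbase-migration@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigtable.user"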
Downtime requirement
The migration approach that is described on this page involves taking your application offline for the duration of the migration. If your business can't tolerate downtime while you migrate to Bigtable, see the guidance for online migration at Replicate from HBase to Bigtable.
Migrate HBase to Bigtable
To migrate your data from HBase to Bigtable, you export an HBase snapshot for each table to Cloud Storage and then import the data into Bigtable. These steps are for a single HBase cluster and are described in detail in the next several sections.
- Stop sending writes to your HBase cluster.
- Take snapshots of the HBase cluster's tables.
- Export the snapshot files to Cloud Storage.
- Compute hashes and export them to Cloud Storage.
- Create destination tables in Bigtable.
- Import the HBase data from Cloud Storage into Bigtable.
- Validate the imported data.
- Route writes to Bigtable.
Before you begin
Create a Cloud Storage bucket to store your snapshots. Create the bucket in the same location that you plan to run your Dataflow job in.
Create a Bigtable instance to store your new tables.
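If you prefer to create these resources from the command line, the following is a minimal sketch using the gcloud CLI and the cbt tool; the display name, cluster ID, zone, node count, and storage type shown are placeholder choices, so adjust them to your needs.

gcloud storage buckets create gs://BUCKET_NAME --location=REGION
cbt -project PROJECT_ID createinstance INSTANCE_ID "HBase migration" CLUSTER_ID ZONE 3 SSD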
Identify the Hadoop cluster that you are exporting from. You can run the jobs for your migration either directly on the HBase cluster or on a separate Hadoop cluster that has network connectivity to the HBase cluster's Namenode and Datanodes.
Install and configure the Cloud Storage connector on every node in the Hadoop cluster, as well as on the host from which the job is initiated. For detailed installation steps, see Installing the Cloud Storage connector.
Open a command shell on a host that can connect to your HBase cluster and your Bigtable project. This is where you'll complete the next steps.
Get the Schema Translation tool:
wget BIGTABLE_HBASE_TOOLS_URL

Replace BIGTABLE_HBASE_TOOLS_URL with the URL of the latest JAR with dependencies available in the tool's Maven repository. The file name is similar to https://repo1.maven.org/maven2/com/google/cloud/bigtable/bigtable-hbase-1.x-tools/1.24.0/bigtable-hbase-1.x-tools-1.24.0-jar-with-dependencies.jar. To find the URL or to manually download the JAR, do the following:
- Go to the repository.
- Click the most recent version number.
- Identify the JAR with dependencies file (usually at the top).
- Either right-click and copy the URL, or click to download the file.
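As a concrete illustration, the following command downloads the 1.24.0 build named in the example file name above; check the repository for the most recent release before you rely on this URL.

wget https://repo1.maven.org/maven2/com/google/cloud/bigtable/bigtable-hbase-1.x-tools/1.24.0/bigtable-hbase-1.x-tools-1.24.0-jar-with-dependencies.jar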
Get the Import tool:
wget BIGTABLE_BEAM_IMPORT_URL

Replace BIGTABLE_BEAM_IMPORT_URL with the URL of the latest shaded JAR available in the tool's Maven repository. The file name is similar to https://repo1.maven.org/maven2/com/google/cloud/bigtable/bigtable-beam-import/1.24.0/bigtable-beam-import-1.24.0-shaded.jar. To find the URL or to manually download the JAR, do the following:
- Go to the repository.
- Click the most recent version number.
- Click Downloads.
- Mouse over shaded.jar.
- Either right-click and copy the URL, or click to download the file.
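Similarly, the following command downloads the 1.24.0 shaded JAR named in the example file name above; a newer release might be available in the repository.

wget https://repo1.maven.org/maven2/com/google/cloud/bigtable/bigtable-beam-import/1.24.0/bigtable-beam-import-1.24.0-shaded.jar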
Set the following environment variables:
# Google Cloud
export PROJECT_ID=PROJECT_ID
export INSTANCE_ID=INSTANCE_ID
export REGION=REGION
export CLUSTER_NUM_NODES=CLUSTER_NUM_NODES

# JAR files
export TRANSLATE_JAR=TRANSLATE_JAR
export IMPORT_JAR=IMPORT_JAR

# Cloud Storage
export BUCKET_NAME="gs://BUCKET_NAME"
export MIGRATION_DESTINATION_DIRECTORY="$BUCKET_NAME/hbase-migration-snap"

# HBase
export ZOOKEEPER_QUORUM=ZOOKEEPER_QUORUM
export ZOOKEEPER_PORT=2181
export ZOOKEEPER_QUORUM_AND_PORT="$ZOOKEEPER_QUORUM:$ZOOKEEPER_PORT"
export MIGRATION_SOURCE_DIRECTORY=MIGRATION_SOURCE_DIRECTORY

Replace the following:
- PROJECT_ID: the Google Cloud project that your instance is in
- INSTANCE_ID: the identifier of the Bigtable instance that you are importing your data to
- REGION: a region that contains one of the clusters in your Bigtable instance. Example: northamerica-northeast2
- CLUSTER_NUM_NODES: the number of nodes in your Bigtable instance
- TRANSLATE_JAR: the name and version number of the bigtable hbase tools JAR file that you downloaded from Maven. The value should look something like bigtable-hbase-1.x-tools-1.24.0-jar-with-dependencies.jar.
- IMPORT_JAR: the name and version number of the bigtable-beam-import JAR file that you downloaded from Maven. The value should look something like bigtable-beam-import-1.24.0-shaded.jar.
- BUCKET_NAME: the name of the Cloud Storage bucket where you are storing your snapshots
- ZOOKEEPER_QUORUM: the ZooKeeper host that the tool will connect to, in the format host1.myownpersonaldomain.com
- MIGRATION_SOURCE_DIRECTORY: the directory on your HBase host that holds the data that you want to migrate, in the format hdfs://host1.myownpersonaldomain.com:8020/hbase
(Optional) To confirm that the variables were set correctly, run the printenv command to view all environment variables.
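As a filled-in illustration only, the values below are hypothetical and show the expected formats of a few of the placeholders:

export PROJECT_ID=my-project
export INSTANCE_ID=my-instance
export REGION=northamerica-northeast2
export CLUSTER_NUM_NODES=3
export ZOOKEEPER_QUORUM=hbase-master.example.com
export MIGRATION_SOURCE_DIRECTORY=hdfs://hbase-master.example.com:8020/hbase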
Stop sending writes to HBase
Before you take snapshots of your HBase tables, stop sending writes to your HBase cluster.
Take HBase table snapshots
When your HBase cluster is no longer ingesting data, take a snapshot of each table that you plan to migrate to Bigtable.
A snapshot has a minimal storage footprint on the HBase cluster at first, but over time it might grow to the same size as the original table. The snapshot does not consume any CPU resources.
Run the following command for each table, using a unique name for each snapshot:
echo "snapshot 'TABLE_NAME', 'SNAPSHOT_NAME'" | hbase shell -nReplace the following:
- TABLE_NAME: the name of the HBase table that you are exporting data from
- SNAPSHOT_NAME: the name for the new snapshot
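If you are migrating many tables, a small shell loop can take one snapshot per table; the table list and the date-based snapshot names here are only an illustration.

for TABLE_NAME in table1 table2 table3; do
  echo "snapshot '$TABLE_NAME', '${TABLE_NAME}-snap-$(date +%Y%m%d)'" | hbase shell -n
done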
Export the HBase snapshots to Cloud Storage
After you create the snapshots, you need to export them. When you're executing export jobs on a production HBase cluster, monitor the cluster and other HBase resources to ensure that the cluster remains in a good state.
For each snapshot that you want to export, run the following:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -Dhbase.zookeeper.quorum=$ZOOKEEPER_QUORUM_AND_PORT \
    -snapshot SNAPSHOT_NAME \
    -copy-from $MIGRATION_SOURCE_DIRECTORY \
    -copy-to $MIGRATION_DESTINATION_DIRECTORY/data

Replace SNAPSHOT_NAME with the name of the snapshot to export.
Note: To limit the load that the export places on the HBase cluster, you can optionally add the -mappers parameter with an integer value that specifies how many mappers to run, and the -bandwidth parameter with an integer value in MB/sec. For example, to limit the concurrency of copy map tasks to 20, set -mappers 20, and to limit the bandwidth that each copy map task uses to 50 MB/sec, add -bandwidth 50 to the command. This makes the total bandwidth 1,000 MB/sec (50 MB/sec * 20 mappers = 1,000 MB/sec).
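Putting those limits together, an export command might look like the following sketch; the 20-mapper and 50 MB/sec values are just the example figures from the note above.

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -Dhbase.zookeeper.quorum=$ZOOKEEPER_QUORUM_AND_PORT \
    -snapshot SNAPSHOT_NAME \
    -copy-from $MIGRATION_SOURCE_DIRECTORY \
    -copy-to $MIGRATION_DESTINATION_DIRECTORY/data \
    -mappers 20 \
    -bandwidth 50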
Compute and export hashes

Next, create hashes to use for validation after the migration is complete. HashTable is a validation tool provided by HBase that computes hashes for row ranges and exports them to files. You can run a sync-table job on the destination table to match the hashes and gain confidence in the integrity of the migrated data.
Run the following command for each table that you exported:
hbase org.apache.hadoop.hbase.mapreduce.HashTable --batchsize=32000 --numhashfiles=20 \
    TABLE_NAME $MIGRATION_DESTINATION_DIRECTORY/hashtable/TABLE_NAME

Replace the following:
- TABLE_NAME: the name of the HBase table that you created a snapshot for and exported
Note: You can run the ExportSnapshot and HashTable jobs in parallel. This is possible because the export job is reading from disk, and the HashTable job is busy computing hashes and scanning HBase. This option can reduce the downtime required for the migration.

Create destination tables
The next step is to create a destination table in your Bigtable instance for each snapshot that you exported. Use an account that has bigtable.tables.create permission for the instance.
This guide uses the Bigtable Schema Translation tool, which automatically creates the table for you. However, if you don't want your Bigtable schema to exactly match the HBase schema, you can create a table by using the cbt command-line tool or the Google Cloud console.
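For example, if you create a table manually instead, a pair of cbt commands along the lines of the following sketch creates a table with one column family and then applies a single-version garbage collection policy; the family name cf1 and the policy are placeholders.

cbt -project $PROJECT_ID -instance $INSTANCE_ID createtable TABLE_NAME "families=cf1"
cbt -project $PROJECT_ID -instance $INSTANCE_ID setgcpolicy TABLE_NAME cf1 maxversions=1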
The Bigtable Schema Translation tool captures the schema of the HBase table, including the table name, column families, garbage collection policies, and splits. Then it creates a similar table in Bigtable.
Note: If your HBase master is in a virtual private cloud or you can't connect to the internet, you can follow the alternative instructions instead of the instructions in this section. When you use the alternative instructions, you export the HBase schema to a file, and then use that file to create tables in Bigtable.

For each table that you want to import, run the following to copy the schema from HBase to Bigtable.
java \
    -Dgoogle.bigtable.project.id=$PROJECT_ID \
    -Dgoogle.bigtable.instance.id=$INSTANCE_ID \
    -Dgoogle.bigtable.table.filter=TABLE_NAME \
    -Dhbase.zookeeper.quorum=$ZOOKEEPER_QUORUM \
    -Dhbase.zookeeper.property.clientPort=$ZOOKEEPER_PORT \
    -jar $TRANSLATE_JAR

Replace TABLE_NAME with the name of the HBase table that you are importing. The Schema Translation tool uses this name for your new Bigtable table.
You can also optionally replace TABLE_NAME with a regular expression, such as ".*", that captures all the tables that you want to create, and then run the command only once.
Import the HBase data into Bigtable using Dataflow
After you have a table ready to migrate your data to, you are ready to import and validate your data.
Uncompressed tables
If your HBase tables are not compressed, run the following command for each table that you want to migrate:
java -jar $IMPORT_JAR importsnapshot \
    --runner=DataflowRunner \
    --project=$PROJECT_ID \
    --bigtableInstanceId=$INSTANCE_ID \
    --bigtableTableId=TABLE_NAME \
    --hbaseSnapshotSourceDir=$MIGRATION_DESTINATION_DIRECTORY/data \
    --snapshotName=SNAPSHOT_NAME \
    --stagingLocation=$MIGRATION_DESTINATION_DIRECTORY/staging \
    --tempLocation=$MIGRATION_DESTINATION_DIRECTORY/temp \
    --maxNumWorkers=$(expr 3 \* $CLUSTER_NUM_NODES) \
    --region=$REGION

Replace the following:
- TABLE_NAME: the name of the HBase table that you are importing. The Schema Translation tool uses this name for your new Bigtable table. New table names are not supported.
- SNAPSHOT_NAME: the name that you assigned to the snapshot of the table that you are importing
After you run the command, the tool restores the HBase snapshot to your Cloud Storage bucket and then starts the import job. It can take several minutes for the process of restoring the snapshot to finish, depending on the size of the snapshot.
Keep the following tips in mind when you import:
- To improve the performance of data loading, be sure to set maxNumWorkers. This value helps to ensure that the import job has enough compute power to complete in a reasonable amount of time, but not so much that it would overwhelm the Bigtable instance.
  - If you are not also using the Bigtable instance for another workload, multiply the number of nodes in your Bigtable instance by 3, and use that number for maxNumWorkers.
  - If you are using the instance for another workload at the same time that you are importing your HBase data, reduce the value of maxNumWorkers appropriately.
- Use the default worker type.
- During the import, you should monitor the Bigtable instance's CPU usage. If the CPU utilization across the Bigtable instance is too high, you might need to add additional nodes. It can take up to 20 minutes for the cluster to provide the performance benefit of additional nodes.
For more information about monitoring the Bigtable instance, see Monitoring.
Snappy compressed tables
If you are importing Snappy compressed tables, you need to use a custom container image in the Dataflow pipeline. The custom container image that you use to import compressed data into Bigtable provides Hadoop native compression library support. You must have the Apache Beam SDK version 2.30.0 or later to use Dataflow Runner v2, and you must have version 2.3.0 or later of the HBase client library for Java.
To import Snappy compressed tables, run the same command that you run for uncompressed tables, but add the following option:
--enableSnappy=true

Validate the imported data in Bigtable
To validate the imported data, you need to run the sync-table job. The sync-table job computes hashes for row ranges in Bigtable and then matches them with the HashTable output that you computed earlier.
To run the sync-table job, run the following in the command shell:
java -jar $IMPORT_JAR sync-table \
    --runner=dataflow \
    --project=$PROJECT_ID \
    --bigtableInstanceId=$INSTANCE_ID \
    --bigtableTableId=TABLE_NAME \
    --outputPrefix=$MIGRATION_DESTINATION_DIRECTORY/sync-table/output-TABLE_NAME-$(date +"%s") \
    --stagingLocation=$MIGRATION_DESTINATION_DIRECTORY/sync-table/staging \
    --hashTableOutputDir=$MIGRATION_DESTINATION_DIRECTORY/hashtable/TABLE_NAME \
    --tempLocation=$MIGRATION_DESTINATION_DIRECTORY/sync-table/dataflow-test/temp \
    --region=$REGION

Replace TABLE_NAME with the name of the HBase table that you are importing.
When the sync-table job is complete, open the Dataflow Job details page and review the Custom counters section for the job. If the import job successfully imported all of the data, ranges_matched shows a value and ranges_not_matched is 0.

If ranges_not_matched shows a value, open the Logs page, choose Worker Logs, and filter by Mismatch on range. The machine-readable output of these logs is stored in Cloud Storage at the output destination that you specified in the sync-table outputPrefix option.

You can try the import job again or write a script to read the output files to determine where the mismatches occurred. Each line in the output file is a serialized JSON record of a mismatched range.
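For instance, a quick way to inspect that output is to list and print the files under the prefix that you passed to outputPrefix; the wildcard below assumes the directory layout produced by the command above, so adjust it to match your paths.

gsutil ls "$MIGRATION_DESTINATION_DIRECTORY/sync-table/output-TABLE_NAME-*"
gsutil cat "$MIGRATION_DESTINATION_DIRECTORY/sync-table/output-TABLE_NAME-*/*" | head -n 20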
Route writes to Bigtable
After you've validated the data for each table in the cluster, you can configure your applications to route all their traffic to Bigtable, and then deprecate the HBase instance.
When your migration is complete, you can delete the snapshots on your HBase instance.
What's next
- Learn more about Cloud Storage.