Bigtable to Cloud Storage SequenceFile template

The Bigtable to Cloud Storage SequenceFile template is a pipeline that reads data from a Bigtable table and writes the data to a Cloud Storage bucket in SequenceFile format. You can use the template to copy data from Bigtable to Cloud Storage.

Pipeline requirements

  • The Bigtable table must exist.
  • The output Cloud Storage bucket must exist before running the pipeline.
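
Before you run the pipeline, you can check both requirements from the command line. The following is a minimal sketch that assumes you have the Google Cloud CLI and gsutil installed; the IDs and bucket name are placeholders, and exact command groups can vary by CLI version:

# Confirm that the Bigtable table exists.
gcloud bigtable instances tables describe TABLE_ID \
    --instance=INSTANCE_ID \
    --project=BIGTABLE_PROJECT_ID

# Confirm that the output Cloud Storage bucket exists.
gsutil ls -b gs://your-bucket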

Template parameters

Required parameters

  • bigtableProject: The ID of the Google Cloud project that contains the Bigtable instance that you want to read data from.
  • bigtableInstanceId: The ID of the Bigtable instance that contains the table.
  • bigtableTableId: The ID of the Bigtable table to export.
  • destinationPath: The Cloud Storage path where data is written. For example, gs://your-bucket/your-path/.
  • filenamePrefix: The prefix of the SequenceFile filename. For example, output-.

Optional parameters

  • bigtableAppProfileId: The ID of the Bigtable application profile to use for the export. If you don't specify an app profile, Bigtable uses the instance's default app profile (https://cloud.google.com/bigtable/docs/app-profiles#default-app-profile).
  • bigtableStartRow: The row key at which to start the export. Defaults to the first row.
  • bigtableStopRow: The row key at which to stop the export. Defaults to the last row.
  • bigtableMaxVersions: The maximum number of cell versions to export. Defaults to 2147483647.
  • bigtableFilter: A filter string. See http://hbase.apache.org/book.html#thrift. Defaults to empty.
  • bigtableReadRpcTimeoutMs: The operation timeout for Bigtable read requests, in milliseconds. Defaults to 12 hours.
  • bigtableReadRpcAttemptTimeoutMs: The timeout for each Bigtable read attempt, in milliseconds. Defaults to 10 minutes.
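
Optional parameters are passed in the same --parameters list as the required ones when you run the template with the gcloud CLI (shown later in this guide). For example, a sketch that limits the export to a key range and a single cell version, with hypothetical row keys:

bigtableStartRow=user#0000,bigtableStopRow=user#9999,bigtableMaxVersions=1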

Run the template

Console

  1. Go to the Dataflow Create job from template page.
  2. Go to Create job from template
  3. In the Job name field, enter a unique job name.
  4. Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1.

    For a list of regions where you can run a Dataflow job, see Dataflow locations.

  5. From the Dataflow template drop-down menu, select the Cloud Bigtable to SequenceFile Files on Cloud Storage template.
  6. In the provided parameter fields, enter your parameter values.
  7. Click Run job.

gcloud

Note: To use the Google Cloud CLI to run classic templates, you must have Google Cloud CLI version 138.0.0 or later.

In your shell or terminal, run the template:

gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/Cloud_Bigtable_to_GCS_SequenceFile \
    --region REGION_NAME \
    --parameters \
bigtableProject=BIGTABLE_PROJECT_ID,\
bigtableInstanceId=INSTANCE_ID,\
bigtableTableId=TABLE_ID,\
bigtableAppProfileId=APPLICATION_PROFILE_ID,\
destinationPath=DESTINATION_PATH,\
filenamePrefix=FILENAME_PREFIX

Replace the following:

  • JOB_NAME: a unique job name of your choice
  • REGION_NAME: the region where you want to deploy your Dataflow job, for example us-central1
  • VERSION: the version of the template that you want to use, for example latest
  • BIGTABLE_PROJECT_ID: the ID of the Google Cloud project that contains the Bigtable instance
  • INSTANCE_ID: the ID of the Bigtable instance that contains the table
  • TABLE_ID: the ID of the Bigtable table to export
  • APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to use for the export
  • DESTINATION_PATH: the Cloud Storage path where data is written, for example gs://your-bucket/your-path/
  • FILENAME_PREFIX: the prefix of the SequenceFile filename, for example output-
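
For illustration only, a filled-in command might look like the following. The job, project, instance, table, and bucket names are hypothetical; default is the ID of the instance's default app profile:

gcloud dataflow jobs run bigtable-sequencefile-export \
    --gcs-location gs://dataflow-templates-us-central1/latest/Cloud_Bigtable_to_GCS_SequenceFile \
    --region us-central1 \
    --parameters \
bigtableProject=my-project,\
bigtableInstanceId=my-instance,\
bigtableTableId=my-table,\
bigtableAppProfileId=default,\
destinationPath=gs://my-bucket/sequencefile-export/,\
filenamePrefix=output-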

API

To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, seeprojects.templates.launch.

POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/Cloud_Bigtable_to_GCS_SequenceFile
{
   "jobName": "JOB_NAME",
   "parameters": {
       "bigtableProject": "BIGTABLE_PROJECT_ID",
       "bigtableInstanceId": "INSTANCE_ID",
       "bigtableTableId": "TABLE_ID",
       "bigtableAppProfileId": "APPLICATION_PROFILE_ID",
       "destinationPath": "DESTINATION_PATH",
       "filenamePrefix": "FILENAME_PREFIX"
   },
   "environment": { "zone": "us-central1-f" }
}

Replace the following:

  • PROJECT_ID: the ID of the Google Cloud project where you want to run the Dataflow job
  • JOB_NAME: a unique job name of your choice
  • LOCATION: the region where you want to deploy your Dataflow job, for example us-central1
  • VERSION: the version of the template that you want to use, for example latest
  • BIGTABLE_PROJECT_ID: the ID of the Google Cloud project that contains the Bigtable instance
  • INSTANCE_ID: the ID of the Bigtable instance that contains the table
  • TABLE_ID: the ID of the Bigtable table to export
  • APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to use for the export
  • DESTINATION_PATH: the Cloud Storage path where data is written, for example gs://your-bucket/your-path/
  • FILENAME_PREFIX: the prefix of the SequenceFile filename, for example output-
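
As an illustration, one way to send the request is with curl, authenticating with an access token from the gcloud CLI. The project, region, and parameter values below are hypothetical:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
        "jobName": "bigtable-sequencefile-export",
        "parameters": {
          "bigtableProject": "my-project",
          "bigtableInstanceId": "my-instance",
          "bigtableTableId": "my-table",
          "destinationPath": "gs://my-bucket/sequencefile-export/",
          "filenamePrefix": "output-"
        }
      }' \
  "https://dataflow.googleapis.com/v1b3/projects/my-project/locations/us-central1/templates:launch?gcsPath=gs://dataflow-templates-us-central1/latest/Cloud_Bigtable_to_GCS_SequenceFile"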

Template source code

Java

This template's source code is in the GoogleCloudPlatform/cloud-bigtable-client repository on GitHub.

What's next
