Read from Bigtable to Dataflow
To read data from Bigtable to Dataflow, use the Apache Beam Bigtable I/O connector.
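For example, the following minimal batch pipeline reads every row from a table with the Java SDK and extracts the row keys. This is a sketch, not a complete sample from the connector documentation: the project, instance, and table IDs are placeholders, the key-extraction step is only illustrative, and it assumes the beam-sdks-java-io-google-cloud-platform dependency is on the classpath.

```java
import com.google.bigtable.v2.Row;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BigtableReadExample {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // Read all rows from the table. The project, instance, and table IDs
        // below are placeholders; replace them with your own values.
        .apply("ReadFromBigtable",
            BigtableIO.read()
                .withProjectId("my-project-id")
                .withInstanceId("my-instance-id")
                .withTableId("my-table-id"))
        // Illustrative step: extract each row key as a UTF-8 string for
        // downstream processing.
        .apply("ExtractRowKeys",
            MapElements.into(TypeDescriptors.strings())
                .via((Row row) -> row.getKey().toStringUtf8()));

    pipeline.run().waitUntilFinish();
  }
}
```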
Note: Depending on your scenario, consider using one of the Google-provided Dataflow templates. Several of these read from Bigtable.

Parallelism
Parallelism is controlled by the number of nodes in the Bigtable cluster. Each node manages one or more key ranges, although key ranges can move between nodes as part of load balancing. For more information, see Reads and performance in the Bigtable documentation.
You are charged for the number of nodes in your instance's clusters. See Bigtable pricing.
Performance
The following table shows performance metrics for Bigtable read operations. The workloads were run on one e2-standard2 worker, using the Apache Beam SDK 2.48.0 for Java. They did not use Runner v2.
| 100 M records, 1 kB, 1 column | Throughput (bytes) | Throughput (elements) |
|---|---|---|
| Read | 180 MBps | 170,000 elements per second |
These metrics are based on simple batch pipelines. They are intended to compare performance between I/O connectors, and are not necessarily representative of real-world pipelines. Dataflow pipeline performance is complex, and is a function of VM type, the data being processed, the performance of external sources and sinks, and user code. Metrics are based on running the Java SDK, and aren't representative of the performance characteristics of other language SDKs. For more information, see Beam IO Performance.
Best practices
- For new pipelines, use the BigtableIO connector, not CloudBigtableIO.
- Create separate app profiles for each type of pipeline. App profiles enable better metrics for differentiating traffic between pipelines, both for support and for tracking usage.
- Monitor the Bigtable nodes. If you experience performance bottlenecks, check whether resources such as CPU utilization are constrained within Bigtable. For more information, see Monitoring.
- In general, the default timeouts are well tuned for most pipelines. If a streaming pipeline appears to get stuck reading from Bigtable, try calling withAttemptTimeout to adjust the attempt timeout, as shown in the sketch after this list.
- Consider enabling Bigtable autoscaling, or resize the Bigtable cluster to scale with the size of your Dataflow jobs.
- Consider setting maxNumWorkers on the Dataflow job to limit load on the Bigtable cluster.
- If significant processing is done on a Bigtable element before a shuffle, calls to Bigtable might time out. In that case, you can call withMaxBufferElementCount to buffer elements. This method converts the read operation from streaming to paginated, which avoids the issue.
- If you use a single Bigtable cluster for both streaming and batch pipelines, and the performance degrades on the Bigtable side, consider setting up replication on the cluster. Then separate the batch and streaming pipelines, so that they read from different replicas. For more information, see Replication overview.
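The timeout and buffering options mentioned above are set directly on the connector. The following sketch shows one way to apply them; the IDs and tuning values are placeholders rather than recommendations, and it assumes the joda-time Duration type used elsewhere in the Beam Java SDK.

```java
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.joda.time.Duration;

// Placeholder tuning values: start from the defaults and adjust only if you
// observe timeouts or excessive load on the Bigtable cluster.
BigtableIO.Read tunedRead =
    BigtableIO.read()
        .withProjectId("my-project-id")
        .withInstanceId("my-instance-id")
        .withTableId("my-table-id")
        // Raise the per-attempt timeout if a streaming pipeline appears to
        // get stuck reading from Bigtable.
        .withAttemptTimeout(Duration.standardSeconds(30))
        // Read rows in pages instead of streaming them, which helps when
        // heavy processing before a shuffle causes reads to time out.
        .withMaxBufferElementCount(10000);

// To limit load on the Bigtable cluster, cap the Dataflow worker count with a
// pipeline option, for example: --maxNumWorkers=20
```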
What's next
- Read the Bigtable I/O connector documentation.
- See the list of Google-provided templates.