Read from Apache Iceberg to Dataflow

To read from Apache Iceberg to Dataflow, use the managed I/O connector.

Managed I/O supports the following capabilities for Apache Iceberg:

Catalogs
  • Hadoop
  • Hive
  • REST-based catalogs
  • BigQuery metastore (requires Apache Beam SDK 2.62.0 or later if not using Runner v2)

Read capabilities
  • Batch read

Write capabilities
  • Batch write
  • Streaming write
  • Dynamic destinations
  • Dynamic table creation
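
Catalogs other than Hadoop are configured the same way: you pass the catalog's Iceberg properties through the connector's catalog_properties map. The following is a minimal sketch for a REST-based catalog; the catalog name, table identifier, and endpoint URI are placeholders, and the exact properties depend on your catalog service:

// Sketch only: the table, catalog name, and URI below are hypothetical placeholders.
ImmutableMap<String, Object> restCatalogConfig = ImmutableMap.<String, Object>builder()
    .put("table", "db.my_table")                 // placeholder table identifier
    .put("catalog_name", "my_rest_catalog")      // placeholder catalog name
    .put("catalog_properties", ImmutableMap.of(
        "type", "rest",                          // Iceberg catalog type
        "uri", "https://example.com/iceberg"))   // placeholder REST endpoint
    .build();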

For BigQuery tables for Apache Iceberg, use the BigQueryIO connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.
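
The following is a minimal sketch of such a read, assuming the beam-sdks-java-io-google-cloud-platform dependency; the project, dataset, and table names are placeholders. DIRECT_READ selects the BigQuery Storage Read API.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class BigQueryIcebergRead {
  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

    // Read an existing BigQuery table for Apache Iceberg with the Storage Read API.
    pipeline.apply(
        BigQueryIO.readTableRows()
            .from("my-project:my_dataset.my_iceberg_table") // placeholder table reference
            .withMethod(BigQueryIO.TypedRead.Method.DIRECT_READ));

    pipeline.run().waitUntilFinish();
  }
}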

Dependencies

Add the following dependencies to your project:

Java

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-managed</artifactId>
  <version>${beam.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-iceberg</artifactId>
  <version>${beam.version}</version>
</dependency>
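
The ${beam.version} placeholder assumes a Maven property defined elsewhere in your pom.xml, for example (2.62.0 is the minimum version noted above for BigQuery metastore):

<properties>
  <beam.version>2.62.0</beam.version>
</properties>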

Example

The following example reads from an Apache Iceberg table and writes the data to text files.

Java

To authenticate to Dataflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.common.collect.ImmutableMap;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollectionRowTuple;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ApacheIcebergRead {

  static final String CATALOG_TYPE = "hadoop";

  public interface Options extends PipelineOptions {
    @Description("The URI of the Apache Iceberg warehouse location")
    String getWarehouseLocation();
    void setWarehouseLocation(String value);

    @Description("Path to write the output file")
    String getOutputPath();
    void setOutputPath(String value);

    @Description("The name of the Apache Iceberg catalog")
    String getCatalogName();
    void setCatalogName(String value);

    @Description("The name of the table to read from")
    String getTableName();
    void setTableName(String value);
  }

  public static void main(String[] args) {
    // Parse the pipeline options passed into the application. Example:
    //   --runner=DirectRunner --warehouseLocation=$LOCATION --catalogName=$CATALOG \
    //   --tableName=$TABLE_NAME --outputPath=$OUTPUT_FILE
    // For more information, see https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    Pipeline pipeline = Pipeline.create(options);

    // Configure the Iceberg source I/O.
    Map<String, Object> catalogConfig = ImmutableMap.<String, Object>builder()
        .put("warehouse", options.getWarehouseLocation())
        .put("type", CATALOG_TYPE)
        .build();

    ImmutableMap<String, Object> config = ImmutableMap.<String, Object>builder()
        .put("table", options.getTableName())
        .put("catalog_name", options.getCatalogName())
        .put("catalog_properties", catalogConfig)
        .build();

    // Build the pipeline.
    pipeline
        .apply(Managed.read(Managed.ICEBERG).withConfig(config))
        .getSinglePCollection()
        // Format each record as a string with the format 'id:name'.
        .apply(MapElements
            .into(TypeDescriptors.strings())
            .via(row -> String.format("%d:%s", row.getInt64("id"), row.getString("name"))))
        // Write to a text file.
        .apply(TextIO.write()
            .to(options.getOutputPath())
            .withNumShards(1)
            .withSuffix(".txt"));

    pipeline.run().waitUntilFinish();
  }
}
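
This example assumes the Iceberg table has an id column of type int64 and a name column of type string: Managed I/O yields Beam Row objects whose schema is derived from the table's Iceberg schema, so adjust the MapElements step to match your table's columns.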

