Dataflow managed I/O for Apache Kafka

Managed I/O supports reading from and writing to Apache Kafka.

Requirements

The following SDKs support managed I/O for Apache Kafka:

  • Apache Beam SDK for Java version 2.58.0 or later
  • Apache Beam SDK for Python version 2.61.0 or later

Configuration

Managed I/O for Apache Kafka supports the following configuration parameters:

KAFKA Read

  • bootstrap_servers (str): A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client uses all servers, irrespective of which servers are specified here for bootstrapping; this list only affects the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...`.
  • topic (str): The Kafka topic to read from.
  • allow_duplicates (boolean): Whether the Kafka read allows duplicate records.
  • confluent_schema_registry_subject (str): The subject in the Confluent Schema Registry from which to fetch the schema.
  • confluent_schema_registry_url (str): The URL of the Confluent Schema Registry.
  • consumer_config_updates (map[str, str]): A list of key-value pairs that act as configuration parameters for Kafka consumers. Most of these configurations are not needed, but you can use this parameter to customize your Kafka consumer. For a detailed list, see https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html
  • file_descriptor_path (str): The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization.
  • format (str): The encoding format for the data stored in Kafka. Valid options are: RAW, STRING, AVRO, JSON, PROTO.
  • message_name (str): The name of the Protocol Buffer message to use for schema extraction and data conversion.
  • offset_deduplication (boolean): Whether the redistribute uses offset deduplication mode.
  • redistribute_by_record_key (boolean): Whether to redistribute by the Kafka record key.
  • redistribute_num_keys (int32): The number of keys to use when redistributing Kafka inputs.
  • redistributed (boolean): Whether the Kafka read should be redistributed.
  • schema (str): The schema in which the data is encoded in the Kafka topic. For AVRO data, this is a schema defined with AVRO schema syntax (https://avro.apache.org/docs/1.10.2/spec.html#schemas). For JSON data, this is a schema defined with JSON-schema syntax (https://json-schema.org/). If a URL to Confluent Schema Registry is provided, this field is ignored and the schema is fetched from the Confluent Schema Registry.
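
The following sketch shows how these parameters fit together in a Java pipeline that reads from Kafka through Managed I/O. It assumes the `beam-sdks-java-managed` module is on the classpath; the broker addresses, topic name, and JSON schema are placeholders for illustration, not values from this page.

```java
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class KafkaManagedReadExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // Configuration map using the read parameters described above.
    // Broker addresses, topic name, and schema are placeholders.
    Map<String, Object> config = Map.of(
        "bootstrap_servers", "broker-1:9092,broker-2:9092",
        "topic", "my-topic",
        "format", "JSON",
        "schema", "{\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"integer\"}}}");

    // Managed.read(Managed.KAFKA) builds the managed Kafka source. The
    // output is a PCollection of Beam Rows matching the declared schema.
    PCollection<Row> rows =
        pipeline
            .apply(Managed.read(Managed.KAFKA).withConfig(config))
            .getSinglePCollection();

    pipeline.run();
  }
}
```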

KAFKA Write

  • bootstrap_servers (str): A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client uses all servers, irrespective of which servers are specified here for bootstrapping; this list only affects the initial hosts used to discover the full set of servers. Format: `host1:port1,host2:port2,...`.
  • format (str): The encoding format for the data stored in Kafka. Valid options are: RAW, JSON, AVRO, PROTO.
  • topic (str): The Kafka topic to write to.
  • file_descriptor_path (str): The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization.
  • message_name (str): The name of the Protocol Buffer message to use for schema extraction and data conversion.
  • producer_config_updates (map[str, str]): A list of key-value pairs that act as configuration parameters for Kafka producers. Most of these configurations are not needed, but you can use this parameter to customize your Kafka producer. For a detailed list, see https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html
  • schema (str): The schema used to encode the data written to the Kafka topic.
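
A corresponding write looks like the following sketch. The in-memory input built with `Create` only stands in for a real upstream transform, and again the broker addresses and topic name are placeholders for illustration.

```java
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class KafkaManagedWriteExample {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // A trivial in-memory input of Beam Rows; a real pipeline would
    // produce these from an upstream transform.
    Schema schema = Schema.builder().addInt64Field("id").build();
    PCollection<Row> rows =
        pipeline.apply(
            Create.of(Row.withSchema(schema).addValue(1L).build())
                .withRowSchema(schema));

    // Write the rows to Kafka using the write parameters described above.
    Map<String, Object> config = Map.of(
        "bootstrap_servers", "broker-1:9092,broker-2:9092",
        "topic", "my-topic",
        "format", "JSON");

    rows.apply(Managed.write(Managed.KAFKA).withConfig(config));

    pipeline.run();
  }
}
```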

