Apache Kafka

From Wikipedia, the free encyclopedia

Software bus for high-volume data feeds

Apache Kafka^[1]

Original author	LinkedIn
Developer	Apache Software Foundation
Initial release	January 2011; 15 years ago^[2]
Stable release	4.1.1^[3] / 12 November 2025
Written in	Scala, Java
Operating system	Cross-platform
Type	Stream processing, Message broker
License	Apache License 2.0
Website	kafka.apache.org
Repository	github.com/apache/kafka

Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect, and provides the Kafka Streams libraries for stream-processing applications. Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a "message set" abstraction that naturally groups messages together to reduce the overhead of the network roundtrip. This "leads to larger network packets, larger sequential disk operations, contiguous memory blocks [...] which allows Kafka to turn a bursty stream of random message writes into linear writes."^[4]
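The effect of grouping messages can be illustrated with a minimal sketch. The Python below is not Kafka's wire protocol: `MessageSetBuffer`, `max_batch`, and the length-prefixed framing are hypothetical names and choices made for illustration. It shows only how collecting several messages into one set replaces many small writes with fewer, larger sequential ones.

```python
import io
import struct


class MessageSetBuffer:
    """Toy accumulator that groups messages into one length-prefixed set.

    Hypothetical illustration only; this is not Kafka's binary format.
    """

    def __init__(self, sink, max_batch=3):
        self.sink = sink            # any object with a write(bytes) method
        self.max_batch = max_batch  # flush once this many messages are pending
        self.pending = []
        self.writes = 0             # number of write calls issued to the sink

    def append(self, payload: bytes) -> None:
        self.pending.append(payload)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # Frame each message as a 4-byte big-endian length plus payload,
        # then issue a single sequential write for the whole set.
        frame = b"".join(struct.pack(">I", len(p)) + p for p in self.pending)
        self.sink.write(frame)
        self.writes += 1
        self.pending.clear()


def decode_message_set(data: bytes) -> list:
    """Split a framed byte string back into the individual messages."""
    messages, pos = [], 0
    while pos < len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        messages.append(data[pos:pos + size])
        pos += size
    return messages


sink = io.BytesIO()
buf = MessageSetBuffer(sink, max_batch=3)
for i in range(7):
    buf.append(f"event-{i}".encode())
buf.flush()  # push out the final partial set
```

In this sketch, seven messages reach the sink in three writes rather than seven, and `decode_message_set(sink.getvalue())` recovers them in their original order.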

History

Kafka was originally developed at LinkedIn and was open sourced in early 2011. Jay Kreps, Neha Narkhede and Jun Rao co-created Kafka.^[5] The project graduated from the Apache Incubator on 23 October 2012.^[6] Jay Kreps chose to name the software after the author Franz Kafka because it is "a system optimized for writing", and he liked Kafka's work.^[7]

Operation

Apache Kafka is a distributed, log-based messaging system that guarantees ordering within each partition rather than across an entire topic. Unlike queue-based systems, Kafka retains messages in a durable, append-only log, so multiple consumers can read the same data at different offsets. Consumers can manage offsets manually, which gives them control over retries and failure handling: a consumer that fails to process a message can delay committing its offset, which halts progress in that partition while other partitions remain unaffected. This partition-based design enables fault isolation and parallel processing, and ordering within a partition is maintained provided the consumer handles records in sequence.^[8]^{[page needed]}
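The interaction between partitions, offsets, and delayed commits can be modelled in a few lines. The following is a simplified in-memory sketch, not the Kafka client API: `Topic`, `Consumer`, the byte-sum partitioner, and the boolean `handler` convention are all assumptions made for illustration.

```python
from collections import defaultdict


class Topic:
    """Toy topic: a fixed number of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> int:
        # Toy partitioner (an assumption): records with the same key always
        # land in the same partition, so their relative order is kept there.
        p = sum(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p


class Consumer:
    """Toy consumer that commits an offset only after successful processing."""

    def __init__(self, topic):
        self.topic = topic
        self.committed = defaultdict(int)  # partition -> next offset to read

    def poll(self, handler):
        for p, log in enumerate(self.topic.partitions):
            while self.committed[p] < len(log):
                record = log[self.committed[p]]
                if not handler(record):
                    break  # offset stays uncommitted; only this partition stalls
                self.committed[p] += 1  # manual commit after success


topic = Topic(num_partitions=2)
for key, value in [("a", "1"), ("b", "1"), ("a", "2"), ("b", "2")]:
    topic.append(key, value)

seen = []


def handler(record):
    if record == ("b", "2"):
        return False  # simulate a processing failure
    seen.append(record)
    return True


consumer = Consumer(topic)
consumer.poll(handler)
```

Here the partition holding key `b` stops at the failed record, while the partition holding key `a` is consumed to the end. A second `Consumer` instance would keep its own `committed` map and could read the same logs from different offsets.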

In 2025, Apache Kafka introduced "Queues for Kafka",^[9] which adds share groups as an alternative to consumer groups. Share groups provide queue-like semantics: consumers cooperatively process records from the same partitions, with individual message acknowledgment and delivery tracking. Whereas traditional consumer groups assign each partition exclusively to one consumer, share groups allow the number of consumers to exceed the partition count, which suits work-queue patterns while retaining Kafka's durability and scalability. This addresses the common problem of "over-partitioning" among Kafka users.^{[citation needed]}
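The difference from exclusive partition assignment can be sketched as follows. This is a conceptual model only, not the share-group protocol or client API: `SharePartition` and its methods are hypothetical, and details such as timeouts and delivery limits are omitted.

```python
from collections import deque


class SharePartition:
    """Toy share-group view of one partition with per-record delivery state."""

    def __init__(self, records):
        self.records = list(records)
        self.available = deque(range(len(self.records)))  # offsets ready to deliver
        self.in_flight = {}        # offset -> consumer currently holding it
        self.acknowledged = set()  # offsets that were processed successfully
        self.delivery_count = [0] * len(self.records)

    def acquire(self, consumer):
        """Hand the next available record to any consumer in the group."""
        if not self.available:
            return None
        offset = self.available.popleft()
        self.in_flight[offset] = consumer
        self.delivery_count[offset] += 1
        return offset, self.records[offset]

    def acknowledge(self, offset):
        """Mark a single record as done, independent of its neighbours."""
        del self.in_flight[offset]
        self.acknowledged.add(offset)

    def release(self, offset):
        """Return a record for redelivery, e.g. after a processing failure."""
        del self.in_flight[offset]
        self.available.appendleft(offset)


# Three consumers share a single partition.
partition = SharePartition(["r0", "r1", "r2", "r3"])
o0, _ = partition.acquire("c1")
o1, _ = partition.acquire("c2")
o2, _ = partition.acquire("c3")
partition.acknowledge(o0)
partition.release(o1)                  # c2 failed; r1 becomes available again
partition.acknowledge(o2)
o1_again, _ = partition.acquire("c3")  # redelivered to a different consumer
partition.acknowledge(o1_again)
```

Three consumers work on one partition at the same time, and the failed record is redelivered without blocking the records after it, which is the work-queue behaviour the paragraph describes.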

Kafka APIs

Connect API

Kafka Connect (or Connect API) is a framework for importing data from, and exporting data to, other systems.^[10] It was added in the Kafka 0.9.0.0 release and uses the Producer and Consumer APIs internally. The Connect framework itself executes so-called "connectors", which implement the actual logic for reading data from or writing data to other systems.^{[citation needed]}
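The division of labour between the framework and its connectors can be sketched in plain Python. This is not the Kafka Connect API: `ListSource`, `DictSink`, `run_source`, and `run_sink` are hypothetical names, and a Python list stands in for a topic.

```python
class ListSource:
    """Hypothetical source connector: reads rows from an external list."""

    def __init__(self, rows):
        self.rows = rows
        self.position = 0  # how far into the external system we have read

    def poll(self):
        batch = self.rows[self.position:]
        self.position = len(self.rows)
        return batch


class DictSink:
    """Hypothetical sink connector: writes records into an external dict."""

    def __init__(self):
        self.store = {}

    def put(self, records):
        for key, value in records:
            self.store[key] = value


def run_source(connector, topic):
    """Framework side: produce whatever the source connector returns."""
    topic.extend(connector.poll())


def run_sink(connector, topic, offset):
    """Framework side: consume from the topic and hand records to the sink."""
    connector.put(topic[offset:])
    return len(topic)  # next offset to consume from


topic = []  # stands in for a Kafka topic
source = ListSource([("user-1", "alice"), ("user-2", "bob")])
sink = DictSink()

run_source(source, topic)
next_offset = run_sink(sink, topic, 0)
```

The connectors contain only the system-specific logic for reading and writing, while the framework functions handle producing to and consuming from the topic.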

Streams API

Kafka Streams (or Streams API) is a stream-processing library written in Java. It was added in the Kafka 0.10.0.0 release. The library allows for the development of stateful stream-processing applications that are scalable, elastic, and fully fault-tolerant. The main API is a stream-processing domain-specific language (DSL) that offers high-level operators like filter, map, grouping, windowing, aggregation, joins, and the notion of tables. Additionally, the Processor API can be used to implement custom operators for a more low-level development approach, and the DSL and Processor API can be mixed. For stateful stream processing, Kafka Streams uses RocksDB to maintain local operator state. Because RocksDB can write to disk, the maintained state can be larger than available main memory. For fault tolerance, all updates to local state stores are also written into a topic in the Kafka cluster. This allows state to be recreated by reading those topics and feeding all data into RocksDB.^[11]
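The changelog-based recovery described above can be sketched with a toy state store. This is a conceptual model in Python rather than the Kafka Streams API (which is a Java library): `StateStore`, `restore`, and `count_words` are hypothetical, a dict stands in for RocksDB, and a list stands in for the changelog topic.

```python
class StateStore:
    """Toy local key-value store whose updates are mirrored to a changelog."""

    def __init__(self, changelog):
        self.data = {}              # stands in for the local RocksDB instance
        self.changelog = changelog  # stands in for a topic in the Kafka cluster

    def put(self, key, value):
        self.data[key] = value
        self.changelog.append((key, value))  # every update is also logged

    def get(self, key, default=None):
        return self.data.get(key, default)

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state by replaying the changelog from the start."""
        store = cls(changelog)
        for key, value in list(changelog):
            store.data[key] = value  # replay without logging again
        return store


def count_words(lines, store):
    """Stateful aggregation: keep a running count per word."""
    for line in lines:
        for word in line.lower().split():
            store.put(word, store.get(word, 0) + 1)


changelog = []
store = StateStore(changelog)
count_words(["kafka streams", "kafka connect"], store)

# Simulate losing the local store and recovering it from the changelog.
recovered = StateStore.restore(changelog)
```

Replaying the changelog applies each update in order, so later values for a key overwrite earlier ones and the recovered store matches the state held before the failure.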

References

1. "Apache Kafka at GitHub". github.com. Archived from the original on 16 January 2023. Retrieved 5 March 2018.
2. "Open-sourcing Kafka, LinkedIn's distributed message queue". Archived from the original on 26 December 2022. Retrieved 27 October 2016.
3. "Release 4.1.1". 12 November 2025. Retrieved 13 November 2025.
4. "Efficiency". kafka.apache.org. Retrieved 19 September 2019.
5. Li, Steven. "He Left His High-Paying Job At LinkedIn And Then Built A $4.5 Billion Business In A Niche You've Never Heard Of". Forbes. Retrieved 2 December 2025.
6. "Apache Incubator: Kafka Incubation Status". Archived from the original on 17 October 2022. Retrieved 17 October 2022.
7. Narkhede, Neha; Shapira, Gwen; Palino, Todd (2017). "Chapter 1". Kafka: The Definitive Guide. O'Reilly. ISBN 978-1-4919-3611-5. People often ask how Kafka got its name and if it has anything to do with the application itself. Jay Kreps offered the following insight: "I thought that since Kafka was a system optimized for writing, using a writer's name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka."
8. Narkhede, Neha; Shapira, Gwen; Palino, Todd (2017). Kafka: the definitive guide: real-time data and stream processing at scale. Sebastopol, CA: O'Reilly Media. ISBN 978-1-4919-3616-0. OCLC 933521388.
9. "KIP-932: Queues for Kafka - Apache Kafka - Apache Software Foundation". cwiki.apache.org. Retrieved 2 December 2025.
10. "Apache Kafka Documentation: Kafka Connect". Apache.
11. "Kafka Connect – Import Export for Apache Kafka". SoftwareMill. Retrieved 8 May 2025.

External links

Official website: kafka.apache.org