segmentio/kafka-go
We rely on both Go and Kafka a lot at Segment. Unfortunately, the state of the Go client libraries for Kafka at the time of this writing was not ideal. The available options were:

- sarama, which is by far the most popular but is quite difficult to work with. It is poorly documented, the API exposes low-level concepts of the Kafka protocol, and it doesn't support recent Go features like contexts. It also passes all values as pointers, which causes large numbers of dynamic memory allocations, more frequent garbage collections, and higher memory usage.
- confluent-kafka-go is a cgo-based wrapper around librdkafka, which means it introduces a dependency on a C library for all Go code that uses the package. It has much better documentation than sarama but still lacks support for Go contexts.
- goka is a more recent Kafka client for Go which focuses on a specific usage pattern. It provides abstractions for using Kafka as a message passing bus between services rather than an ordered log of events, but this is not the typical use case of Kafka for us at Segment. The package also depends on sarama for all interactions with Kafka.
This is where `kafka-go` comes into play. It provides both low- and high-level APIs for interacting with Kafka, mirroring concepts and implementing interfaces of the Go standard library to make it easy to use and integrate with existing software.
In order to better align with our newly adopted Code of Conduct, the kafka-go project has renamed our default branch to `main`. For the full details of our Code of Conduct, see this document.
`kafka-go` is currently tested with Kafka versions 0.10.1.0 to 2.7.1. While it should also be compatible with later versions, newer features available in the Kafka API may not yet be implemented in the client.

`kafka-go` requires Go version 1.15 or later.
The `Conn` type is the core of the `kafka-go` package. It wraps around a raw network connection to expose a low-level API to a Kafka server.
Here are some examples showing typical use of a connection object:
```go
// to produce messages
topic := "my-topic"
partition := 0

conn, err := kafka.DialLeader(context.Background(), "tcp", "localhost:9092", topic, partition)
if err != nil {
	log.Fatal("failed to dial leader:", err)
}

conn.SetWriteDeadline(time.Now().Add(10*time.Second))
_, err = conn.WriteMessages(
	kafka.Message{Value: []byte("one!")},
	kafka.Message{Value: []byte("two!")},
	kafka.Message{Value: []byte("three!")},
)
if err != nil {
	log.Fatal("failed to write messages:", err)
}

if err := conn.Close(); err != nil {
	log.Fatal("failed to close writer:", err)
}
```
```go
// to consume messages
topic := "my-topic"
partition := 0

conn, err := kafka.DialLeader(context.Background(), "tcp", "localhost:9092", topic, partition)
if err != nil {
	log.Fatal("failed to dial leader:", err)
}

conn.SetReadDeadline(time.Now().Add(10*time.Second))
batch := conn.ReadBatch(10e3, 1e6) // fetch 10KB min, 1MB max

b := make([]byte, 10e3) // 10KB max per message
for {
	n, err := batch.Read(b)
	if err != nil {
		break
	}
	fmt.Println(string(b[:n]))
}

if err := batch.Close(); err != nil {
	log.Fatal("failed to close batch:", err)
}

if err := conn.Close(); err != nil {
	log.Fatal("failed to close connection:", err)
}
```
By default Kafka has `auto.create.topics.enable='true'` (`KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE='true'` in the bitnami/kafka Docker image). If this value is set to `'true'`, topics will be created as a side effect of `kafka.DialLeader` like so:
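```go
// to create topics when auto.create.topics.enable='true'
conn, err := kafka.DialLeader(context.Background(), "tcp", "localhost:9092", "my-topic", 0)
if err != nil {
	panic(err.Error())
}
defer conn.Close() // conn must be used (and closed) or the snippet will not compile
```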
If `auto.create.topics.enable='false'`, you will need to create topics explicitly like so:
```go
// to create topics when auto.create.topics.enable='false'
topic := "my-topic"

conn, err := kafka.Dial("tcp", "localhost:9092")
if err != nil {
	panic(err.Error())
}
defer conn.Close()

controller, err := conn.Controller()
if err != nil {
	panic(err.Error())
}
var controllerConn *kafka.Conn
controllerConn, err = kafka.Dial("tcp", net.JoinHostPort(controller.Host, strconv.Itoa(controller.Port)))
if err != nil {
	panic(err.Error())
}
defer controllerConn.Close()

topicConfigs := []kafka.TopicConfig{
	{
		Topic:             topic,
		NumPartitions:     1,
		ReplicationFactor: 1,
	},
}

err = controllerConn.CreateTopics(topicConfigs...)
if err != nil {
	panic(err.Error())
}
```
```go
// to connect to the Kafka controller via an existing connection
// (the controller is not necessarily the leader of a given partition;
// use kafka.DialLeader to reach a partition leader)
conn, err := kafka.Dial("tcp", "localhost:9092")
if err != nil {
	panic(err.Error())
}
defer conn.Close()

controller, err := conn.Controller()
if err != nil {
	panic(err.Error())
}
var controllerConn *kafka.Conn
controllerConn, err = kafka.Dial("tcp", net.JoinHostPort(controller.Host, strconv.Itoa(controller.Port)))
if err != nil {
	panic(err.Error())
}
defer controllerConn.Close()
```
```go
// to list topics
conn, err := kafka.Dial("tcp", "localhost:9092")
if err != nil {
	panic(err.Error())
}
defer conn.Close()

partitions, err := conn.ReadPartitions()
if err != nil {
	panic(err.Error())
}

// collect the distinct topic names across all partitions
m := map[string]struct{}{}
for _, p := range partitions {
	m[p.Topic] = struct{}{}
}
for k := range m {
	fmt.Println(k)
}
```
Because it is low level, the `Conn` type turns out to be a great building block for higher-level abstractions, like the `Reader`, for example.
A `Reader` is another concept exposed by the `kafka-go` package, which intends to make it simpler to implement the typical use case of consuming from a single topic-partition pair. A `Reader` also automatically handles reconnections and offset management, and exposes an API that supports asynchronous cancellations and timeouts using Go contexts.
Note that it is important to call `Close()` on a `Reader` when a process exits. The Kafka server needs a graceful disconnect to stop it from continuing to attempt to send messages to the connected clients. The given example will not call `Close()` if the process is terminated with SIGINT (ctrl-c at the shell) or SIGTERM (as `docker stop` or a Kubernetes restart does). This can result in a delay when a new reader on the same topic connects (e.g. a new process is started or a new container is running). Use a `signal.Notify` handler to close the reader on process shutdown.
```go
// make a new reader that consumes from topic-A, partition 0, at offset 42
r := kafka.NewReader(kafka.ReaderConfig{
	Brokers:   []string{"localhost:9092", "localhost:9093", "localhost:9094"},
	Topic:     "topic-A",
	Partition: 0,
	MaxBytes:  10e6, // 10MB
})
r.SetOffset(42)

for {
	m, err := r.ReadMessage(context.Background())
	if err != nil {
		break
	}
	fmt.Printf("message at offset %d: %s = %s\n", m.Offset, string(m.Key), string(m.Value))
}

if err := r.Close(); err != nil {
	log.Fatal("failed to close reader:", err)
}
```
`kafka-go` also supports Kafka consumer groups, including broker-managed offsets. To enable consumer groups, simply specify the `GroupID` in the `ReaderConfig`.

`ReadMessage` automatically commits offsets when using consumer groups.
```go
// make a new reader that consumes from topic-A
r := kafka.NewReader(kafka.ReaderConfig{
	Brokers:  []string{"localhost:9092", "localhost:9093", "localhost:9094"},
	GroupID:  "consumer-group-id",
	Topic:    "topic-A",
	MaxBytes: 10e6, // 10MB
})

for {
	m, err := r.ReadMessage(context.Background())
	if err != nil {
		break
	}
	fmt.Printf("message at topic/partition/offset %v/%v/%v: %s = %s\n", m.Topic, m.Partition, m.Offset, string(m.Key), string(m.Value))
}

if err := r.Close(); err != nil {
	log.Fatal("failed to close reader:", err)
}
```
There are a number of limitations when using consumer groups:

- `(*Reader).SetOffset` will return an error when GroupID is set
- `(*Reader).Offset` will always return `-1` when GroupID is set
- `(*Reader).Lag` will always return `-1` when GroupID is set
- `(*Reader).ReadLag` will return an error when GroupID is set
- `(*Reader).Stats` will return a partition of `-1` when GroupID is set
`kafka-go` also supports explicit commits. Instead of calling `ReadMessage`, call `FetchMessage` followed by `CommitMessages`.
```go
ctx := context.Background()
for {
	m, err := r.FetchMessage(ctx)
	if err != nil {
		break
	}
	fmt.Printf("message at topic/partition/offset %v/%v/%v: %s = %s\n", m.Topic, m.Partition, m.Offset, string(m.Key), string(m.Value))
	if err := r.CommitMessages(ctx, m); err != nil {
		log.Fatal("failed to commit messages:", err)
	}
}
```
When committing messages in consumer groups, the message with the highest offset for a given topic/partition determines the value of the committed offset for that partition. For example, if messages at offsets 1, 2, and 3 of a single partition were retrieved by calls to `FetchMessage`, calling `CommitMessages` with the message at offset 3 will also result in committing the messages at offsets 1 and 2 for that partition.
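The "highest offset wins" rule can be illustrated with a small standard-library sketch. `message`, `topicPartition`, and `committedOffsets` are hypothetical names, not kafka-go API, and the sketch assumes Kafka's usual convention that the value stored for a partition is the next offset to consume, i.e. the highest processed offset plus one.

```go
package main

import "fmt"

// message is a hypothetical stand-in for the fields of kafka.Message that
// matter for committing.
type message struct {
	Topic     string
	Partition int
	Offset    int64
}

type topicPartition struct {
	Topic     string
	Partition int
}

// committedOffsets returns, per topic/partition, the offset that committing
// msgs would store: the highest message offset plus one (the next offset to
// consume). Lower offsets in the same partition are covered implicitly.
func committedOffsets(msgs ...message) map[topicPartition]int64 {
	out := make(map[topicPartition]int64)
	for _, m := range msgs {
		tp := topicPartition{Topic: m.Topic, Partition: m.Partition}
		if next := m.Offset + 1; next > out[tp] {
			out[tp] = next
		}
	}
	return out
}

func main() {
	// Committing only the message at offset 3 ...
	a := committedOffsets(message{"topic-A", 0, 3})
	// ... stores the same value as committing offsets 1, 2 and 3 together.
	b := committedOffsets(message{"topic-A", 0, 1}, message{"topic-A", 0, 2}, message{"topic-A", 0, 3})
	tp := topicPartition{Topic: "topic-A", Partition: 0}
	fmt.Println(a[tp], b[tp]) // 4 4
}
```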
By default, `CommitMessages` will synchronously commit offsets to Kafka. For improved performance, you can instead periodically commit offsets to Kafka by setting `CommitInterval` on the `ReaderConfig`.
```go
// make a new reader that consumes from topic-A
r := kafka.NewReader(kafka.ReaderConfig{
	Brokers:        []string{"localhost:9092", "localhost:9093", "localhost:9094"},
	GroupID:        "consumer-group-id",
	Topic:          "topic-A",
	MaxBytes:       10e6,        // 10MB
	CommitInterval: time.Second, // flushes commits to Kafka every second
})
```
To produce messages to Kafka, a program may use the low-level `Conn` API, but the package also provides a higher-level `Writer` type which is more appropriate to use in most cases, as it provides additional features:
- Automatic retries and reconnections on errors.
- Configurable distribution of messages across available partitions.
- Synchronous or asynchronous writes of messages to Kafka.
- Asynchronous cancellation using contexts.
- Flushing of pending messages on close to support graceful shutdowns.
- Creation of a missing topic before publishing a message. Note: this was the default behaviour up to version `v0.4.30`.
```go
// make a writer that produces to topic-A, using the least-bytes distribution
w := &kafka.Writer{
	Addr:     kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Topic:    "topic-A",
	Balancer: &kafka.LeastBytes{},
}

err := w.WriteMessages(context.Background(),
	kafka.Message{
		Key:   []byte("Key-A"),
		Value: []byte("Hello World!"),
	},
	kafka.Message{
		Key:   []byte("Key-B"),
		Value: []byte("One!"),
	},
	kafka.Message{
		Key:   []byte("Key-C"),
		Value: []byte("Two!"),
	},
)
if err != nil {
	log.Fatal("failed to write messages:", err)
}

if err := w.Close(); err != nil {
	log.Fatal("failed to close writer:", err)
}
```
```go
// Make a writer that publishes messages to topic-A.
// The topic will be created if it is missing.
w := &kafka.Writer{
	Addr:                   kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Topic:                  "topic-A",
	AllowAutoTopicCreation: true,
}

messages := []kafka.Message{
	{
		Key:   []byte("Key-A"),
		Value: []byte("Hello World!"),
	},
	{
		Key:   []byte("Key-B"),
		Value: []byte("One!"),
	},
	{
		Key:   []byte("Key-C"),
		Value: []byte("Two!"),
	},
}

var err error
const retries = 3
for i := 0; i < retries; i++ {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)

	// attempt to create topic prior to publishing the message
	err = w.WriteMessages(ctx, messages...)
	cancel() // release the timeout now instead of deferring inside the loop
	if errors.Is(err, kafka.LeaderNotAvailable) || errors.Is(err, context.DeadlineExceeded) {
		time.Sleep(time.Millisecond * 250)
		continue
	}

	if err != nil {
		log.Fatalf("unexpected error %v", err)
	}
	break
}

if err := w.Close(); err != nil {
	log.Fatal("failed to close writer:", err)
}
```
Normally, `Writer.Topic` is used to initialize a single-topic writer. By omitting that field, you can instead define the topic on a per-message basis by setting `Message.Topic`.
```go
w := &kafka.Writer{
	Addr: kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	// NOTE: When Topic is not defined here, each Message must define it instead.
	Balancer: &kafka.LeastBytes{},
}

err := w.WriteMessages(context.Background(),
	// NOTE: Each Message has Topic defined, otherwise an error is returned.
	kafka.Message{
		Topic: "topic-A",
		Key:   []byte("Key-A"),
		Value: []byte("Hello World!"),
	},
	kafka.Message{
		Topic: "topic-B",
		Key:   []byte("Key-B"),
		Value: []byte("One!"),
	},
	kafka.Message{
		Topic: "topic-C",
		Key:   []byte("Key-C"),
		Value: []byte("Two!"),
	},
)
if err != nil {
	log.Fatal("failed to write messages:", err)
}

if err := w.Close(); err != nil {
	log.Fatal("failed to close writer:", err)
}
```
NOTE: These 2 patterns are mutually exclusive: if you set `Writer.Topic`, you must not also explicitly define `Message.Topic` on the messages you are writing. The opposite applies when you do not define a topic for the writer. The `Writer` will return an error if it detects this ambiguity.
If you're switching from Sarama and need/want to use the same algorithm for message partitioning, you can use either the `kafka.Hash` balancer or the `kafka.ReferenceHash` balancer:

- `kafka.Hash` = `sarama.NewHashPartitioner`
- `kafka.ReferenceHash` = `sarama.NewReferenceHashPartitioner`

The `kafka.Hash` and `kafka.ReferenceHash` balancers route messages to the same partitions that the two aforementioned Sarama partitioners would route them to.
```go
w := &kafka.Writer{
	Addr:     kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Topic:    "topic-A",
	Balancer: &kafka.Hash{},
}
```
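To make the difference between the two Sarama-compatible balancers concrete, here is a standard-library sketch of the rules as Sarama describes them: both hash the key with 32-bit FNV-1a and reduce it modulo the partition count, and they differ only in how a hash with its sign bit set is made non-negative. Function names are hypothetical, and nil keys (which are not hashed) are out of scope.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv32a returns the 32-bit FNV-1a hash of key.
func fnv32a(key []byte) uint32 {
	h := fnv.New32a()
	h.Write(key) // writes to a hash.Hash never return an error
	return h.Sum32()
}

// hashPartition follows the Hash-style rule: reinterpret the hash as a
// signed 32-bit integer, take the remainder, then flip a negative result.
func hashPartition(key []byte, numPartitions int32) int32 {
	p := int32(fnv32a(key)) % numPartitions
	if p < 0 {
		p = -p
	}
	return p
}

// referenceHashPartition follows the ReferenceHash-style rule: clear the
// sign bit first, then take the remainder.
func referenceHashPartition(key []byte, numPartitions int32) int32 {
	return (int32(fnv32a(key)) & 0x7fffffff) % numPartitions
}

func main() {
	for _, k := range []string{"Key-A", "Key-B", "Key-C"} {
		fmt.Printf("%s -> hash: %d, reference: %d\n",
			k, hashPartition([]byte(k), 6), referenceHashPartition([]byte(k), 6))
	}
}
```

The two rules only diverge when the hash has its top bit set. For instance, with 6 partitions the empty key (whose FNV-1a hash is the offset basis 0x811C9DC5) lands on partition 3 under the first rule and partition 5 under the second.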
Use the `kafka.CRC32Balancer` balancer to get the same behaviour as librdkafka's default `consistent_random` partition strategy.
```go
w := &kafka.Writer{
	Addr:     kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Topic:    "topic-A",
	Balancer: kafka.CRC32Balancer{},
}
```
Use the `kafka.Murmur2Balancer` balancer to get the same behaviour as the canonical Java client's default partitioner. Note: the Java class allows you to specify the partition directly, which this balancer does not permit.
```go
w := &kafka.Writer{
	Addr:     kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Topic:    "topic-A",
	Balancer: kafka.Murmur2Balancer{},
}
```
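As a rough illustration of what "the same behaviour as the Java client's default partitioner" means for keyed messages, the sketch below ports the 32-bit MurmurHash2 variant used by the Java client (seed `0x9747b28c`) and its "clear the sign bit, then modulo" step. It is written from the Java algorithm, not copied from `kafka.Murmur2Balancer`; the function names are hypothetical, and keyless messages are out of scope.

```go
package main

import "fmt"

// murmur2 ports the Java client's 32-bit MurmurHash2. uint32 arithmetic
// wraps exactly like Java int arithmetic, and >> on uint32 matches Java's >>>.
func murmur2(data []byte) uint32 {
	const (
		seed uint32 = 0x9747b28c
		m    uint32 = 0x5bd1e995
		r           = 24
	)
	length := len(data)
	h := seed ^ uint32(length)

	// Mix the input four bytes at a time (little-endian).
	for i := 0; i+4 <= length; i += 4 {
		k := uint32(data[i]) | uint32(data[i+1])<<8 | uint32(data[i+2])<<16 | uint32(data[i+3])<<24
		k *= m
		k ^= k >> r
		k *= m
		h *= m
		h ^= k
	}

	// Mix the remaining 1-3 bytes, mirroring Java's fall-through switch.
	tail := data[length&^3:]
	switch len(tail) {
	case 3:
		h ^= uint32(tail[2]) << 16
		fallthrough
	case 2:
		h ^= uint32(tail[1]) << 8
		fallthrough
	case 1:
		h ^= uint32(tail[0])
		h *= m
	}

	// Final avalanche.
	h ^= h >> 13
	h *= m
	h ^= h >> 15
	return h
}

// murmur2Partition applies the Java client's toPositive (mask the sign bit)
// and then reduces modulo the partition count.
func murmur2Partition(key []byte, numPartitions uint32) uint32 {
	return (murmur2(key) & 0x7fffffff) % numPartitions
}

func main() {
	for _, k := range []string{"Key-A", "Key-B", "Key-C"} {
		fmt.Printf("%s -> partition %d of 12\n", k, murmur2Partition([]byte(k), 12))
	}
}
```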
Compression can be enabled on the `Writer` by setting the `Compression` field:
```go
w := &kafka.Writer{
	Addr:        kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Topic:       "topic-A",
	Compression: kafka.Snappy,
}
```
The `Reader` will determine whether the consumed messages are compressed by examining the message attributes; no extra imports are needed to read compressed messages.

Note: in versions prior to 0.4, programs had to import compression packages to install codecs and support reading compressed messages from Kafka. This is no longer the case, and importing the compression packages is now a no-op.
For a bare-bones `Conn` type, or in the `Reader`/`Writer` configs, you can specify a dialer option for TLS support. If the `TLS` field is `nil`, it will not connect with TLS. Note: connecting to a Kafka cluster with TLS enabled without configuring TLS on the `Conn`/`Reader`/`Writer` can manifest as opaque `io.ErrUnexpectedEOF` errors.
```go
dialer := &kafka.Dialer{
	Timeout:   10 * time.Second,
	DualStack: true,
	TLS:       &tls.Config{ /* ...tlsconfig... */ },
}

conn, err := dialer.DialContext(ctx, "tcp", "localhost:9093")
```
```go
dialer := &kafka.Dialer{
	Timeout:   10 * time.Second,
	DualStack: true,
	TLS:       &tls.Config{ /* ...tlsconfig... */ },
}

r := kafka.NewReader(kafka.ReaderConfig{
	Brokers: []string{"localhost:9092", "localhost:9093", "localhost:9094"},
	GroupID: "consumer-group-id",
	Topic:   "topic-A",
	Dialer:  dialer,
})
```
Direct Writer creation
```go
w := kafka.Writer{
	Addr:     kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Topic:    "topic-A",
	Balancer: &kafka.Hash{},
	Transport: &kafka.Transport{
		TLS: &tls.Config{},
	},
}
```
Using `kafka.NewWriter`
```go
dialer := &kafka.Dialer{
	Timeout:   10 * time.Second,
	DualStack: true,
	TLS:       &tls.Config{ /* ...tlsconfig... */ },
}

w := kafka.NewWriter(kafka.WriterConfig{
	Brokers:  []string{"localhost:9092", "localhost:9093", "localhost:9094"},
	Topic:    "topic-A",
	Balancer: &kafka.Hash{},
	Dialer:   dialer,
})
```
Note that `kafka.NewWriter` and `kafka.WriterConfig` are deprecated and will be removed in a future release.
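The TLS examples above leave the `tls.Config` contents elided. One common shape, sketched here under the assumption that the brokers present certificates signed by a private CA, is a config whose `RootCAs` pool holds that CA; the result is the kind of value that goes in `Dialer.TLS` or `Transport.TLS`. `newTLSConfig` is a hypothetical helper, and the TLS 1.2 minimum is an assumed policy choice, not a kafka-go requirement.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newTLSConfig is a hypothetical helper that builds a client *tls.Config
// trusting the CA certificate(s) contained in caPEM.
func newTLSConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM data")
	}
	return &tls.Config{
		RootCAs:    pool,             // verify broker certificates against this CA
		MinVersion: tls.VersionTLS12, // assumption: refuse anything older than TLS 1.2
	}, nil
}

func main() {
	// A real program would load the PEM bytes from disk (for example with
	// os.ReadFile on a path of your choosing); non-PEM input exercises the
	// error path without needing a certificate on hand.
	cfg, err := newTLSConfig([]byte("not a certificate"))
	fmt.Println(cfg == nil, err)
}
```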
You can specify an option on the `Dialer` to use SASL authentication. The `Dialer` can be used directly to open a `Conn`, or it can be passed to a `Reader` or `Writer` via their respective configs. If the `SASLMechanism` field is `nil`, it will not authenticate with SASL.
```go
// PLAIN
mechanism := plain.Mechanism{
	Username: "username",
	Password: "password",
}
```
```go
// SCRAM
mechanism, err := scram.Mechanism(scram.SHA512, "username", "password")
if err != nil {
	panic(err)
}
```
```go
// SASL with a Conn
mechanism, err := scram.Mechanism(scram.SHA512, "username", "password")
if err != nil {
	panic(err)
}

dialer := &kafka.Dialer{
	Timeout:       10 * time.Second,
	DualStack:     true,
	SASLMechanism: mechanism,
}

conn, err := dialer.DialContext(ctx, "tcp", "localhost:9093")
```
```go
// SASL with a Reader
mechanism, err := scram.Mechanism(scram.SHA512, "username", "password")
if err != nil {
	panic(err)
}

dialer := &kafka.Dialer{
	Timeout:       10 * time.Second,
	DualStack:     true,
	SASLMechanism: mechanism,
}

r := kafka.NewReader(kafka.ReaderConfig{
	Brokers: []string{"localhost:9092", "localhost:9093", "localhost:9094"},
	GroupID: "consumer-group-id",
	Topic:   "topic-A",
	Dialer:  dialer,
})
```
```go
// SASL with a Writer
mechanism, err := scram.Mechanism(scram.SHA512, "username", "password")
if err != nil {
	panic(err)
}

// Transports are responsible for managing connection pools and other resources,
// it's generally best to create a few of these and share them across your
// application.
sharedTransport := &kafka.Transport{
	SASL: mechanism,
}

w := kafka.Writer{
	Addr:      kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Topic:     "topic-A",
	Balancer:  &kafka.Hash{},
	Transport: sharedTransport,
}
```
```go
// SASL with a Client
mechanism, err := scram.Mechanism(scram.SHA512, "username", "password")
if err != nil {
	panic(err)
}

// Transports are responsible for managing connection pools and other resources,
// it's generally best to create a few of these and share them across your
// application.
sharedTransport := &kafka.Transport{
	SASL: mechanism,
}

client := &kafka.Client{
	Addr:      kafka.TCP("localhost:9092", "localhost:9093", "localhost:9094"),
	Timeout:   10 * time.Second,
	Transport: sharedTransport,
}
```
```go
// read all messages within a time range (here, the last hour)
startTime := time.Now().Add(-time.Hour)
endTime := time.Now()
batchSize := int(10e6) // 10MB

r := kafka.NewReader(kafka.ReaderConfig{
	Brokers:   []string{"localhost:9092", "localhost:9093", "localhost:9094"},
	Topic:     "my-topic1",
	Partition: 0,
	MaxBytes:  batchSize,
})

if err := r.SetOffsetAt(context.Background(), startTime); err != nil {
	log.Fatal("failed to set offset:", err)
}

for {
	m, err := r.ReadMessage(context.Background())
	if err != nil {
		break
	}
	if m.Time.After(endTime) {
		break
	}
	// TODO: process message
	fmt.Printf("message at offset %d: %s = %s\n", m.Offset, string(m.Key), string(m.Value))
}

if err := r.Close(); err != nil {
	log.Fatal("failed to close reader:", err)
}
```
For visibility into the operations of the `Reader`/`Writer` types, configure a logger on creation.
```go
func logf(msg string, a ...interface{}) {
	fmt.Printf(msg, a...)
	fmt.Println()
}

r := kafka.NewReader(kafka.ReaderConfig{
	Brokers:     []string{"localhost:9092", "localhost:9093", "localhost:9094"},
	Topic:       "my-topic1",
	Partition:   0,
	Logger:      kafka.LoggerFunc(logf),
	ErrorLogger: kafka.LoggerFunc(logf),
})
```
```go
func logf(msg string, a ...interface{}) {
	fmt.Printf(msg, a...)
	fmt.Println()
}

w := &kafka.Writer{
	Addr:        kafka.TCP("localhost:9092"),
	Topic:       "topic",
	Logger:      kafka.LoggerFunc(logf),
	ErrorLogger: kafka.LoggerFunc(logf),
}
```
Subtle behavior changes in later Kafka versions have caused some historical tests to break. If you are running against Kafka 2.3.1 or later, exporting the `KAFKA_SKIP_NETTEST=1` environment variable will skip those tests.
Run Kafka locally in Docker:

```bash
docker-compose up -d
```

Run tests:

```bash
KAFKA_VERSION=2.3.1 \
  KAFKA_SKIP_NETTEST=1 \
  go test -race ./...
```

(or) to clean up the cached test results and run tests:

```bash
go clean -cache && make test
```
About
Kafka library in Go