Kafka Serialization Playground

fillmore-labs/kafka-sensors


Purpose

This source demonstrates how to process a stream of sensor data using Kafka Streams.

The sensors produce a stream of records, including sensor ID, a timestamp and the current state (on or off). The desired result is a stream of records enriched with the duration the sensor has been in this state.

Example

For example, a stream

Table 1. Sensor Data

| Name     | Timestamp            | State |
| -------- | -------------------- | ----- |
| Sensor 1 | 1984-01-22T15:45:00Z | off   |
| Sensor 1 | 1984-01-22T15:45:10Z | off   |
| Sensor 1 | 1984-01-22T15:45:30Z | on    |
| Sensor 1 | 1984-01-22T15:46:30Z | off   |

should produce

Table 2. Enriched Data

| Name     | Timestamp            | State | Duration |
| -------- | -------------------- | ----- | -------- |
| Sensor 1 | 1984-01-22T15:45:00Z | off   | 10s      |
| Sensor 1 | 1984-01-22T15:45:00Z | off   | 30s      |
| Sensor 1 | 1984-01-22T15:45:30Z | on    | 60s      |

This tells us that “Sensor 1” was “off” from 15:45:00 for 30 seconds and “on” from 15:45:30 for 60 seconds.

Note that the second “off” reading produced an intermediate result.
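The enrichment logic could be sketched, independently of Kafka, roughly as follows. This is an illustrative implementation of the semantics described above, not the repository's actual code; all class and method names are hypothetical, and a plain `HashMap` stands in for the state store:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EnrichSketch {
    record Reading(String id, Instant time, String state) {}
    record Enriched(String id, Instant time, String state, Duration duration) {}

    // Stand-in for the Kafka Streams state store: last reading per sensor.
    private final Map<String, Reading> lastBySensor = new HashMap<>();

    /** Emits the previous state with its duration whenever a newer reading arrives. */
    public List<Enriched> process(Reading reading) {
        List<Enriched> out = new ArrayList<>();
        Reading last = lastBySensor.get(reading.id());
        if (last != null) {
            if (reading.time().isBefore(last.time())) {
                // Delayed readings are treated as errors (see "Design decisions").
                throw new IllegalArgumentException("delayed reading: " + reading);
            }
            // Every newer reading emits a result; a duplicate state thus
            // produces an intermediate result, as in Table 2.
            out.add(new Enriched(last.id(), last.time(), last.state(),
                    Duration.between(last.time(), reading.time())));
            if (last.state().equals(reading.state())) {
                reading = last; // keep the original start of the state
            }
        }
        lastBySensor.put(reading.id(), reading);
        return out;
    }

    public static void main(String[] args) {
        EnrichSketch sketch = new EnrichSketch();
        String[][] readings = {
            {"1984-01-22T15:45:00Z", "off"},
            {"1984-01-22T15:45:10Z", "off"},
            {"1984-01-22T15:45:30Z", "on"},
            {"1984-01-22T15:46:30Z", "off"},
        };
        for (String[] r : readings) {
            for (Enriched e : sketch.process(
                    new Reading("Sensor 1", Instant.parse(r[0]), r[1]))) {
                System.out.printf("%s %s %s %ds%n",
                        e.id(), e.time(), e.state(), e.duration().getSeconds());
            }
        }
    }
}
```

Feeding in the four readings from Table 1 produces the three enriched records of Table 2, with durations of 10, 30 and 60 seconds.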

Design decisions

Duplicate readings of the same state generate intermediate results, and delayed readings (timestamps preceding previously seen values) are treated as errors.

These are deliberate choices and can easily be changed.

Implementation of Business Logic

Care has been taken to keep the business logic independent of implementation details like serialization formats.

The data model is in the `model` directory, the business logic in `logic`.

The tests exercise the topology with nine different formats: Protocol Buffers, JSON, Apache Avro, the Confluent variants of these three, XML, Apache Thrift and Amazon Ion. Different random combinations of input, result, and state store formats are tested.

While this abstraction might not be necessary in practice, it demonstrates two important design considerations:

  • The business logic should only depend on a data model, not on capabilities of the serialization mechanism.

We can simply use `Duration::between`, which is a single call that is easy to understand and test, instead of cluttering our logic with conversions and error-prone calculations.

  • The choice of (de-)serializers should depend on the requirements, not on what happens to be at hand.

While internal processing pipelines tend to (but don’t have to) use a single serialization mechanism, it is perfectly valid, and often a good design decision, to use different mechanisms for the parts interfacing with external components.

Since the business logic is independent of the serialization mechanism, changing it is simple and usually does not require retesting.
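The first point can be illustrated in a few lines: because the model exposes timestamps as `java.time.Instant` rather than some format-specific representation, the duration calculation is a single standard-library call (the timestamps below are taken from the example tables).

```java
import java.time.Duration;
import java.time.Instant;

public class DurationExample {
    public static void main(String[] args) {
        // With a serialization-independent model, computing how long a
        // sensor has been in its state needs no manual arithmetic.
        Instant stateStart = Instant.parse("1984-01-22T15:45:00Z");
        Instant reading = Instant.parse("1984-01-22T15:45:30Z");
        Duration inState = Duration.between(stateStart, reading);
        System.out.println(inState.getSeconds() + "s"); // prints "30s"
    }
}
```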

By refactoring the business logic to depend only on an abstract store, we speed up testing by a factor of seven (`bazel test //src/test/java/com/fillmore_labs/kafka/sensors/logic:all` vs. `bazel test //src/test/java/com/fillmore_labs/kafka/sensors/topology:all`), which demonstrates the potential for improvement in development speed and testability.

Running

Prerequisites

You need Bazelisk installed; see also “Installing Bazel using Bazelisk”.

macOS

Using Homebrew, enter

brew install bazelisk

Windows

Using Chocolatey, enter

choco install bazelisk

Enable developer mode:

  1. Open Windows settings

  2. Go to “Update & security”, then “For developers”

  3. Under the “Developer Mode” section, enable “Install apps from any source, including loose files”,

or run with administrator privileges.

Tests

To run all tests, use

bazel test //src/test/...

To run a single test, use

bazel test //src/test/java/com/fillmore_labs/kafka/sensors/topology:all

The tests run with an embedded Kafka and a mock schema registry when necessary.

Main App

The main app needs Kafka running at localhost, port 9092 (see `application.yaml`). There is a script that starts one:

scripts/kafka-server.sh

When Kafka has finished starting, create the topics in a different terminal:

scripts/kafka-topics.sh

Now start the main app:

bazel run //:kafka-sensors

Open another terminal to watch the results:

scripts/kafka-consume.sh

Publish sensor values:

scripts/kafka-produce.sh

Benchmark

Run theJMH microbenchmarks with

bazel run //:benchmark

Compare deserialization of two formats:

bazel run //:benchmark -- -p "format=proto,thrift" "Bench\\.deserialize"

Generate a flame graph for detailed analysis:

bazel run //:benchmark -- -p "format=proto" "Bench\\.deserialize" \
  -prof "async:output=flamegraph;direction=forward"
open "$(bazel info bazel-bin)/src/main/java/com/fillmore_labs/kafka/sensors/benchmark/benchmark.runfiles/com_fillmore_labs_kafka_sensors/com.fillmore_labs.kafka.sensors.benchmark.Bench.deserialize-AverageTime-format-proto/flame-cpu-forward.html"

Run the latest image on your Kubernetes cluster:

kubectl run serialization-benchmark --image=fillmorelabs/serialization-benchmark \
  --attach --rm --restart=Never -- -p "format=proto,json,json-iso" "Bench\\.serialize"

Notes

Mapping

As noted in “Implementation of Business Logic”, the business logic is independent of the serialization, in the spirit of hexagonal architecture. This of course requires some mapping, for which we mostly use MapStruct. This imposes some limitations on data model naming conventions: MapStruct uses a fixed and quite inflexible accessor naming strategy, so you can’t really decide that Protocol Buffers should follow one convention but Immutables another. Especially for Immutables we are forced to use the JavaBeans-style naming convention, although this is not a JEE application.
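The constraint can be illustrated with a hypothetical pair of value types (the class and property names below are ours, not from the repository): MapStruct’s default accessor naming strategy discovers properties through JavaBeans-style getters like `getState()`, so a fluent accessor like `state()`, as Immutables would generate in its default style, is not recognized as a property.

```java
public class NamingSketch {
    // JavaBeans-style accessor: MapStruct's default strategy recognizes
    // "state" as a property of this type.
    static final class BeanStyle {
        private final String state;
        BeanStyle(String state) { this.state = state; }
        public String getState() { return state; }
    }

    // Fluent accessor: not matched by MapStruct's fixed strategy, which is
    // why the data model is forced onto JavaBeans naming throughout.
    static final class FluentStyle {
        private final String state;
        FluentStyle(String state) { this.state = state; }
        public String state() { return state; }
    }

    public static void main(String[] args) {
        System.out.println(new BeanStyle("off").getState());
        System.out.println(new FluentStyle("on").state());
    }
}
```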

