- Notifications
You must be signed in to change notification settings - Fork91
Python Streaming DataFrames for Kafka
License
quixio/quix-streams
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Quix Streams is an end-to-end framework for real-time Python data engineering, operational analytics and machine learning on Apache Kafka data streams. Extract, transform and load data reliably in fewer lines of code using your favourite Python libraries.
Build data pipelines and event-driven microservice architectures leveraging Kafka's low-level scalability, resiliency and durability features in a lightweight library without server-side clusters to manage.
Quix Streams provides the following features to make your life easier:
- Pure Python, meaning no wrappers around Java and no cross-language debugging.
- Streaming DataFrame API for building tabular data processing pipelines.
- Sources &Sinks API for building custom connectors that integrate data with Kafka.
- Serializers API supporting JSON, Avro, Protobuf &Schema Registry.
- Fault-tolerant stateful operations.
- Operators for common processing tasks likeWindowing,Branching andGroup By.
- Exactly-once processing guarantees via Kafka transactions.
- Streaming Joins.
Use Quix Streams to build simple Kafka producer/consumer applications or leverage stream processing to build complex event-driven systems, real-time data pipelines and AI/ML products.
# PyPIpython -m pip install quixstreams# or condaconda install -c conda-forge quixio::quixstreams
Python 3.9+, Apache Kafka 0.10+
Seerequirements.txt for the full list of requirements
Here's an example of how toprocess data from a Kafka Topic with Quix Streams:
fromquixstreamsimportApplication# A minimal application reading temperature data in Celsius from the Kafka topic,# converting it to Fahrenheit and producing alerts to another topic.# Define an application that will connect to Kafkaapp=Application(broker_address="localhost:9092",# Kafka broker address)# Define the Kafka topicstemperature_topic=app.topic("temperature-celsius",value_deserializer="json")alerts_topic=app.topic("temperature-alerts",value_serializer="json")# Create a Streaming DataFrame connected to the input Kafka topicsdf=app.dataframe(topic=temperature_topic)# Convert temperature to Fahrenheit by transforming the input message (with an anonymous or user-defined function)sdf=sdf.apply(lambdavalue: {"temperature_F": (value["temperature"]*9/5)+32})# Filter values above the thresholdsdf=sdf[sdf["temperature_F"]>150]# Produce alerts to the output topicsdf=sdf.to_topic(alerts_topic)# Run the streaming application (app automatically tracks the sdf!)app.run()
To see Quix Streams in action, check out the Quickstart and Tutorials in the docs:
- Quickstart
- Tutorial - Word Count
- Tutorial - Anomaly Detection
- Tutorial - Purchase Filtering
- Tutorial - Solar Farm Telemetry Enrichment
There are two primary objects:
StreamingDataFrame- a predefined declarative pipeline to process and transform incoming messages.Application- to manage the Kafka-related setup, teardown and message lifecycle (consuming, committing). It processes each message with the dataframe you provide for it to run.
Under the hood, theApplication will:
- Consume and deserialize messages.
- Process them with your
StreamingDataFrame. - Produce it to the output topic.
- Automatically checkpoint processed messages and state for resiliency.
- Scale using Kafka's built-in consumer groups mechanism.
You can run Quix Streams pipelines anywhere Python is installed.
Deploy to your own infrastructure or toQuix Cloud on AWS, Azure, GCP or on-premise for a fully managed platform.
You'll get self-service DevOps, CI/CD and monitoring, all built with best in class engineering practices learned from Formula 1 Racing.
Please see theConnecting to Quix Cloud pageto learn how to use Quix Streams and Quix Cloud together.
- Please useGitHub issues to report bugs and suggest new features.
- Join theQuix Community on Slack, a vibrant group of Kafka Python developers, data engineers and newcomers to Apache Kafka, who are learning and leveraging Quix Streams for real-time data processing.
- Watch and subscribe to@QuixStreams on YouTube for code-along tutorials from scratch and interesting community highlights.
- Follow us onX andLinkedIn where we share our latest tutorials, forthcoming community events and the occasional meme.
- If you have any questions or feedback - write to us atsupport@quix.io!
Quix Streams is licensed under the Apache 2.0 license.
View a copy of the License filehere.
About
Python Streaming DataFrames for Kafka
Topics
Resources
License
Code of conduct
Contributing
Uh oh!
There was an error while loading.Please reload this page.
