Bytewax is a Python framework and Rust-based distributed processing engine for stateful event and stream processing. Inspired by capabilities found in tools like Apache Flink, Spark, and Kafka Streams, Bytewax makes stream processing simpler and more accessible by integrating directly with the Python ecosystem you already know and trust.
Key Features:
- Python-first: Leverage your existing Python libraries, frameworks, and tooling.
- Stateful Stream Processing: Maintain and recover state automatically, enabling advanced online machine learning and complex event-driven applications.
- Scalable & Distributed: Easily scale from local development to multi-node, multi-worker deployments on Kubernetes or other infrastructures.
- Rich Connector Ecosystem: Ingest data from sources like Kafka, filesystems, or WebSockets, and output to data lakes, key-value stores, or other systems.
- Flexible Dataflow API: Compose pipelines using operators (e.g., `map`, `filter`, `join`, `fold_window`) to express complex logic.
- Quick Start
- How Bytewax Works
- Operators Overview
- Connectors (Module Hub)
- Local Development, Testing, and Production
- Deployment Options
- Examples
- Community and Contributing
- License
Install Bytewax from PyPI:
pip install bytewax
Install `waxctl` to manage deployments at scale.
Minimal Example:
```python
from bytewax.dataflow import Dataflow
from bytewax import operators as op
from bytewax.testing import TestingSource

flow = Dataflow("quickstart")

# Input: Local test source for demonstration
inp = op.input("inp", flow, TestingSource([1, 2, 3, 4, 5]))

# Transform: Filter even numbers and multiply by 10
filtered = op.filter("keep_even", inp, lambda x: x % 2 == 0)
results = op.map("multiply_by_10", filtered, lambda x: x * 10)

# Output: Print results to stdout
op.inspect("print_results", results)
```
Run it locally:
python -m bytewax.run quickstart.py
Bytewax uses a dataflow computational model, similar to systems like Flink or Spark, but with a Pythonic interface. You define a dataflow graph of operators and connectors:
- Input: Data sources (Kafka, file systems, S3, WebSockets, custom connectors)
- Operators: Stateful transformations (map, filter, fold_window, join) defined in Python.
- Output: Data sinks (databases, storage systems, message queues).

Stateful operations: Bytewax maintains distributed state with fault tolerance and state recovery, and supports event-time windowing for advanced analytics and machine learning workloads.
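As a concrete illustration, here is a minimal sketch of a stateful step that keeps a running count per key. It assumes the `op.key_on` and `op.stateful_map` shapes from recent Bytewax releases; exact signatures vary between versions:

```python
from bytewax import operators as op
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSource

flow = Dataflow("running_count")
events = op.input("events", flow, TestingSource(["a", "b", "a", "a", "b"]))

# Key the stream so state is partitioned (and recovered) per key.
keyed = op.key_on("key_by_value", events, lambda e: e)

def count(state, _event):
    # state is None the first time a key is seen.
    state = (state or 0) + 1
    return state, state  # (updated state, value emitted downstream)

counts = op.stateful_map("count_per_key", keyed, count)
op.inspect("print_counts", counts)  # emits ("a", 1), ("b", 1), ("a", 2), ...
```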
`waxctl`: Bytewax's CLI tool for deploying and managing dataflows on cloud servers or Kubernetes clusters. Download `waxctl` here.
Operators are the building blocks of Bytewax dataflows:
- Stateless Operators: `map`, `filter`, `inspect`
- Stateful Operators: `reduce`, `fold_window`, `stateful_map`
- Windowing & Aggregations: Event-time, processing-time windows, tumbling, sliding, and session windows.
- Joins & Merges: Combine multiple input streams with
merge
,join
, or advanced join patterns. - Premium Operators:
For a comprehensive list, see the Operators API Documentation.
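As a small illustration of combining streams, here is a hedged sketch using `op.merge`; the step names are placeholders and the operator shape is assumed from recent Bytewax releases:

```python
from bytewax import operators as op
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSource

flow = Dataflow("merge_example")

# Two independent input streams (test sources for illustration).
evens = op.input("evens", flow, TestingSource([2, 4, 6]))
odds = op.input("odds", flow, TestingSource([1, 3, 5]))

# Merge interleaves items from both upstream streams into one stream.
both = op.merge("merge_streams", evens, odds)
op.inspect("print_merged", both)
```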
Bytewax provides built-in connectors for common data sources and sinks such as Kafka, files, and stdout. You can also write your own custom connectors.
Examples of Built-in Connectors:
- Kafka: `bytewax.connectors.kafka`
- StdIn/StdOut: `bytewax.connectors.stdio`
- Redis, S3, and More: See Bytewax connectors.
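For illustration, here is a hedged sketch that reads from Kafka and prints message values to stdout. The broker address and topic name are placeholders, and the exact message type `KafkaSource` emits depends on the Bytewax version:

```python
from bytewax import operators as op
from bytewax.dataflow import Dataflow
from bytewax.connectors.kafka import KafkaSource
from bytewax.connectors.stdio import StdOutSink

flow = Dataflow("kafka_to_stdout")

# Placeholder broker and topic; adjust for your environment.
msgs = op.input("kafka_in", flow, KafkaSource(["localhost:9092"], ["in_topic"]))

# Assumes messages expose a .value attribute (recent Bytewax releases).
vals = op.map("extract_value", msgs, lambda msg: msg.value)

op.output("stdout_out", vals, StdOutSink())
```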
Community & Partner Connectors: Check out theBytewax Module Hub for additional connectors contributed by the community.
Local Development:
- Use `TestingSource` and `inspect` operators for debugging.
- Iterate quickly by running your flow with `python -m bytewax.run my_flow.py`.
- Develop custom connectors and sinks locally with Python tooling you already know.
Testing:
- Integration tests: Use `TestingSource` and run flows directly in CI environments (see the sketch after this list).
- Unit tests: Test individual functions and operators as normal Python code.
- More on Testing
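Here is a minimal test sketch, assuming `TestingSource`, `TestingSink`, and `run_main` from `bytewax.testing` (available in recent releases); the test name and values are illustrative:

```python
from bytewax import operators as op
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSink, TestingSource, run_main

def test_multiply_by_10():
    flow = Dataflow("test_flow")
    inp = op.input("inp", flow, TestingSource([1, 2, 3]))
    out = op.map("times_10", inp, lambda x: x * 10)

    captured = []
    op.output("capture", out, TestingSink(captured))

    # Run the whole dataflow to completion in the current process.
    run_main(flow)

    assert captured == [10, 20, 30]
```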
Production:
- Scale horizontally by running multiple workers on multiple machines.
- Integrate with Kubernetes for dynamic scaling, monitoring, and resilience.
- Utilize `waxctl` for standardized deployments and lifecycle management.
For experimentation and small-scale jobs:
python -m bytewax.run my_dataflow.py
Multiple workers and threads:
python -m bytewax.run my_dataflow.py -w 2
Run Bytewax inside Docker containers for easy integration with container platforms. See the Bytewax Container Guide.
Use `waxctl` to package and deploy Bytewax dataflows to Kubernetes clusters for production workloads:
waxctl df deploy my_dataflow.py --name my-dataflow
Learn more about Kubernetes deployment.
Our commercially licensed Platform
- User Guide: End-to-end tutorials and advanced topics.
- `/examples` folder: Additional sample dataflows and connector usage.
Join us on Slack for support and discussion.
Open issues on GitHub Issues for bug reports and feature requests. (For general help, use Slack.)
Contributions Welcome:
- Check out the Contribution Guide to learn how to get started.
- We follow a Code of Conduct.
Bytewax is licensed under the Apache-2.0 license.
Built with ❤️ by the Bytewax community