data-pipeline
Here are 1,722 public repositories matching this topic...
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Available both self-hosted and cloud-hosted.
- Updated Feb 20, 2026 - Python
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
- Updated Feb 20, 2026 - Java
Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues.
- Updated Feb 20, 2026 - Java
The leader in Customer Data Infrastructure
- Updated Feb 18, 2026 - Scala
Flink CDC is a streaming data integration tool.
- Updated Feb 13, 2026 - Java
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
- Updated Feb 20, 2026 - Python
Privacy- and security-focused Segment alternative, written in Golang and React.
- Updated Feb 20, 2026 - Go
A list of useful resources for learning Data Engineering from scratch
- Updated Jun 19, 2024
Memphis.dev is a highly scalable and effortless data streaming platform
- Updated May 27, 2024 - Go
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
- Updated Feb 20, 2026 - Python
A self-contained, lightweight workflow engine with a built-in Web UI. Define workflows in a simple, declarative YAML format. Execute them anywhere, compose complex pipelines, and distribute tasks. Zero dependencies: runs entirely on the file system and OS without an external database.
- Updated Feb 19, 2026 - Go
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
- Updated Jan 10, 2025 - Jupyter Notebook
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
- Updated Feb 19, 2026 - HTML
A lightweight stream processing library for Go
- Updated Jan 14, 2026 - Go
CLI task management & automation tool
- Updated Feb 12, 2026 - Python
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data records every day.
- Updated Jan 1, 2024 - Java
🔥🔥🔥 Open-source Reverse ETL: an alternative to Hightouch and Census.
- Updated Feb 20, 2026 - Ruby
Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.
- Updated Dec 10, 2025 - Jupyter Notebook
Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
- Updated Oct 24, 2025 - Jupyter Notebook
Example end-to-end data engineering project.
- Updated Dec 8, 2022 - Python