data-pipeline
Here are 1,375 public repositories matching this topic...
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
- Updated Dec 17, 2025 - Java
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
- Updated Dec 18, 2025 - Python
Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues.
- Updated Dec 17, 2025 - Java
The leader in Customer Data Infrastructure
- Updated Jun 4, 2025 - Scala
Flink CDC is a streaming data integration tool
- Updated Dec 16, 2025 - Java
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
- Updated Dec 17, 2025 - Python
Privacy and Security focused Segment-alternative, in Golang and React
- Updated Dec 17, 2025 - Go
A list of useful resources to learn Data Engineering from scratch
- Updated Jun 19, 2024
Memphis.dev is a highly scalable and effortless data streaming platform
- Updated May 27, 2024 - Go
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
- Updated Dec 17, 2025 - Python
Lightweight, local-first workflow engine for enterprise and small teams. 100% open source. No vendor lock-in. Offline or air-gapped environment ready.
- Updated Dec 17, 2025 - Go
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
- Updated Jan 10, 2025 - Jupyter Notebook
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
- Updated Dec 16, 2025 - HTML
A lightweight stream processing library for Go
- Updated Nov 20, 2025 - Go
CLI task management & automation tool
- Updated Jul 4, 2024 - Python
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
- Updated Jan 1, 2024 - Java
🔥🔥🔥 Open-source Reverse ETL, an alternative to Hightouch and Census.
- Updated Dec 17, 2025 - Ruby
Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.
- Updated Dec 10, 2025 - Jupyter Notebook
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
- Updated Oct 24, 2025 - Jupyter Notebook
Example end-to-end data engineering project.
- Updated Dec 8, 2022 - Python