data-pipelines
Here are 307 public repositories matching this topic...
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
- Updated Oct 7, 2025 - Python
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
- Updated Oct 7, 2025 - Python
An orchestration platform for the development, production, and observation of data assets.
- Updated Oct 7, 2025 - Python
Apache DolphinScheduler is the modern data orchestration platform. Agile to create high-performance workflows with low code.
- Updated Sep 30, 2025 - Java
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
- Updated Sep 26, 2025 - HTML
🧙 Build, run, and manage data pipelines for integrating and transforming data.
- Updated Oct 7, 2025 - Python
🦀 Event stream processing for developers to collect and transform data in motion to power responsive, data-intensive applications.
- Updated Oct 1, 2025 - Rust
Preswald is a WASM packager for Python-based interactive data apps: bundle full, complex data workflows, particularly visualizations, into single files that run completely in-browser using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
- Updated Jul 17, 2025 - Python
Build data pipelines, the easy way 🛠️
- Updated Jun 6, 2023 - TypeScript
Maestro: Netflix’s Workflow Orchestrator
- Updated Oct 4, 2025 - Java
A system for agentic LLM-powered data processing and ETL
- Updated Oct 7, 2025 - Python
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
- Updated Oct 8, 2025 - Python
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
- Updated Oct 7, 2025 - HTML
The best place to learn data engineering. Built and maintained by the data engineering community.
- Updated Sep 30, 2025 - CSS
The Feldera Incremental Computation Engine
- Updated Oct 8, 2025 - Rust
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
- Updated Dec 21, 2024 - Rust
MLeap: Deploy ML Pipelines to Production
- Updated Nov 27, 2024 - Scala
Concurrent Python made simple
- Updated Feb 4, 2025 - Python
Easy data preparation with the latest LLM-based operators and pipelines.
- Updated Sep 30, 2025 - Python
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
- Updated Feb 19, 2025 - Java