etl-pipeline
Here are 2,659 public repositories matching this topic...
Streaming data platform. Real-time stream processing, low-latency serving, and Iceberg table management.
Updated Dec 18, 2025 - Rust
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Updated Dec 17, 2025 - Python
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
Updated Nov 5, 2025 - Java
Build data pipelines, the easy way 🛠️
Updated Jun 6, 2023 - TypeScript
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Updated Dec 6, 2025 - Jupyter Notebook
Implementing best practices for PySpark ETL jobs and applications.
Updated Jan 1, 2023 - Python
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Updated Aug 26, 2022 - Python
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Updated Mar 9, 2020 - Python
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
Updated Dec 17, 2025 - Python
A scalable general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Updated Jul 3, 2023 - Python
A high-performance data processing system in Clojure
Updated Dec 17, 2025 - Clojure
A blazingly fast general-purpose blockchain analytics engine specialized in systematic MEV detection
Updated Jul 28, 2025 - Rust
Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support.
Updated Mar 10, 2025 - Python
A simplified, lightweight ETL Framework based on Apache Spark
Updated Jan 24, 2024 - Scala
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
Updated Apr 23, 2025 - TSQL
The Supabase of AI era. A modular, open-source backend for building AI-native software — designed for knowledge, not static data.
Updated Jun 5, 2025 - TypeScript
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Updated Feb 14, 2025 - Python
Fluent concurrent sync/async streams
Updated Dec 17, 2025 - Python
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
Updated Nov 14, 2025 - Go
This is a template you can use for your next data engineering portfolio project.
Updated Sep 10, 2021