data-engineering-pipeline
Here are 225 public repositories matching this topic...
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Updated Aug 26, 2022 - Python
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Updated Mar 9, 2020 - Python
One framework to develop, deploy and operate data workflows with Python and SQL.
Updated Nov 27, 2025 - Python
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Updated Jun 16, 2020 - Python
Data Engineering Project with Hadoop HDFS and Kafka
Updated Nov 4, 2023 - Python
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Updated Jul 21, 2023 - Python
Code examples showing flow deployment to various types of infrastructure
Updated Jan 11, 2023 - Python
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree.
Updated Dec 12, 2023 - Jupyter Notebook
Let your pipe lines flow thru the Python code in xonsh.
Updated Jun 7, 2024 - Python
Agentic Data Integrator that helps you build production-ready data pipelines so you can connect to more systems, faster. You run it in your terminal as a workflow wizard.
Updated Oct 1, 2025 - Python
Deploy a Prefect flow to a serverless AWS Lambda function
Updated Sep 27, 2022 - Python
Apache Spark Guide
Updated Feb 1, 2022 - Python
Distributed Data Processing Pipeline for MCP.
Updated Sep 7, 2025 - C#
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Updated Jun 7, 2023 - Python
A fully serverless, event-driven data pipeline that ingests, enriches, validates, and visualizes real-time news data using AWS services. Designed for cost-efficient, scalable deployment using only free-tier AWS services.
Updated Aug 10, 2025 - Python
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
Updated Nov 19, 2024 - Jupyter Notebook
Analysis of 311 Service Requests for New York City (from 2010 to 2023). Tech: Prefect Cloud, dbt Core, BigQuery, Compute Engine, Cloud Run, Artifact Registry, Terraform, Docker
Updated Apr 5, 2023 - Python
End-to-end data engineering pipeline using various technologies to ingest real-time data.
Updated Nov 3, 2023 - Python
Reusable data engineering toolkit. My personal data infrastructure.
Updated Oct 29, 2025 - Jupyter Notebook