A comprehensive collection of example pipelines built with Mage, the modern data orchestration platform. Use these pipelines to learn, explore, and jumpstart your own data workflows with Mage.
Mage is a modern data orchestration platform that simplifies building, running, and monitoring data pipelines. Unlike traditional orchestration tools, Mage offers:
- 🎯 Python-first approach - Build pipelines using familiar Python syntax
- 📊 Interactive development - Develop and test pipelines in a notebook-style interface
- 🔄 Real-time monitoring - Built-in observability and monitoring capabilities
- 🧩 Modular architecture - Reusable blocks for data loading, transformation, and exporting
- ☁️ Cloud-native - Easy deployment to AWS, GCP, and Azure
- 🤖 ML-focused - Specialized features for machine learning workflows
For enterprise teams and production environments, Mage Pro provides:
- 💻 Enterprise Support - Dedicated support and SLA guarantees
- 📊 Advanced Analytics - Enhanced monitoring, alerting, and performance insights
- 🔒 Security & Compliance - Enterprise-grade security features and compliance tools
- ⚡ High Performance - Optimized for large-scale data processing
- 🌐 Multi-tenant Architecture - Support for multiple teams and projects
- 🔄 Advanced Scheduling - Complex scheduling and dependency management
- 📊 Custom Dashboards - Tailored monitoring and reporting capabilities
This repository contains a comprehensive collection of pipeline examples organized by category:
- API to Database - Extract data from REST APIs and load into databases
- Multi-source Sync - Combine data from multiple APIs, databases, and files
- Database Replication - Real-time database synchronization and replication
- CSV Processing - Process and transform CSV files with data validation
- JSON ETL - Extract, transform, and load JSON data from various sources
- Combine Python and SQL - Hybrid processing using both Python and SQL operations
- Kafka Consumer - Real-time data processing from Kafka streams
- Real-time Analytics - Live analytics and metrics calculation
- Event Processing - Process and route events in real-time
- Model Training - End-to-end ML model training pipeline
- Model Inference - Deploy and serve ML models in production
- Guide to Accuracy, Precision, and Recall - Learn ML evaluation metrics
- Validation Pipeline - Automated data validation and quality checks
- Monitoring Dashboard - Real-time data quality monitoring and alerting
- Anomaly Detection - Detect and handle data anomalies automatically
- S3 to RDS - Transfer data from AWS S3 to RDS PostgreSQL
- Multi-cloud Sync - Cross-cloud data movement and synchronization
- Infrastructure Monitoring - Monitor cloud resources and costs
Prerequisites:
- Python 3.8 or higher
- Docker (recommended)
- Git
- Mage Pro - For enterprise features, advanced monitoring, and production-ready deployments
Clone the repository:
```bash
git clone https://github.com/your-username/mage-pipeline-examples.git
cd mage-pipeline-examples
```

Set up Mage using Docker (Recommended):

```bash
# Clone Mage's quickstart template
git clone https://github.com/mage-ai/compose-quickstart.git mage-setup
cd mage-setup
# Copy environment file
cp dev.env .env
# Start Mage
docker compose up
```
Access Mage UI: Open your browser and navigate to http://localhost:6789.

Import a Pipeline:
Method 1: Zip Upload (Recommended)
a. Prepare the pipeline:

```bash
# Navigate to the pipeline directory you want to import
cd examples/data-integration/api-to-database
# Create a zip file of the pipeline
zip -r api-to-database-pipeline.zip .
```
b. Upload to Mage:

- Open Mage UI at http://localhost:6789
- Click on "Pipelines" in the left sidebar
- Click the "Import" button
- Select "Upload zip file"
- Choose your api-to-database-pipeline.zip file
- Click "Import"
c. Verify import:
- The pipeline should appear in your pipelines list
- Click on the pipeline to view and edit it
- Follow the setup instructions in the pipeline's README
Method 2: Manual Copy
a. Copy pipeline files:

```bash
# Copy the entire pipeline directory to your Mage project
cp -r examples/data-integration/api-to-database/* /path/to/your/mage/project/pipelines/
```
b. Refresh Mage UI:
- The pipeline should appear automatically
- If not, restart your Mage server
Method 3: Git Clone (For Development)
a. Clone into Mage project:

```bash
# Navigate to your Mage project directory
cd /path/to/your/mage/project
# Clone specific pipeline
git clone https://github.com/your-username/mage-pipeline-examples.git temp
cp -r temp/examples/data-integration/api-to-database/* pipelines/
rm -rf temp
```
After importing a pipeline, you'll need to configure it for your environment:
Install Dependencies:

```bash
# Install required Python packages
pip install -r requirements.txt
```

Configure Environment Variables:

```bash
# Create or update .env file in your Mage project root
echo "API_KEY=your_api_key_here" >> .env
echo "DATABASE_URL=your_database_url_here" >> .env
```
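Within a pipeline block, these variables can then be read with Python's standard library; a minimal sketch (the variable names match the .env example above):

```python
import os

# Credentials defined in the project's .env file
api_key = os.getenv('API_KEY')
database_url = os.getenv('DATABASE_URL')
```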
Update IO Configuration:
```bash
# Edit io_config.yaml with your database and API credentials
nano io_config.yaml
```

Test the Pipeline:
- Open the pipeline in Mage UI
- Click "Run" to test the pipeline
- Check logs for any errors
- Verify data output
Alternatively, install and run Mage locally without Docker:

```bash
# Install Mage
pip install mage-ai
# Start Mage server
mage start your_project_name
```
Each pipeline example is organized in its own directory with:
- README.md - Detailed explanation and setup instructions
- Pipeline files - The actual Mage pipeline code
- requirements.txt - Python dependencies
- Sample data (if applicable)
- Data Integration (examples/data-integration/) - Connect and sync data from various sources
- Batch ETL (examples/batch-etl/) - Process large datasets in batches
- Streaming Pipelines (examples/streaming-pipelines/) - Real-time data processing
- ML Models (examples/ml-models/) - Machine learning workflows and MLOps
- Data Quality (examples/data-quality/) - Data validation and monitoring
- Cloud Operations (examples/cloud-ops/) - Cloud infrastructure and data movement
Choose your preferred import method:
- Zip Upload (Recommended) - Upload pipeline as zip file through Mage UI
- Manual Copy - Copy files directly to your Mage project
- Git Clone - Clone specific pipeline for development
Each pipeline includes:
- Prerequisites and dependencies
- Configuration steps
- Sample data setup
- Running instructions
Customize as needed:
- Modify data sources and destinations
- Adjust transformation logic
- Add your own business logic
- Scale for your data volume
Mage pipelines typically consist of three main components:
Data Loaders - Extract data from various sources:
```python
@data_loader
def load_data_from_api(*args, **kwargs):
    # Your data loading logic here
    return data
```
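As a concrete (hedged) illustration, here is a loader that pulls JSON from a REST API into a pandas DataFrame; the endpoint URL is a placeholder, and the import guard follows the pattern Mage uses in its generated block files:

```python
import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_api(*args, **kwargs):
    # Placeholder endpoint: swap in the API your pipeline targets
    url = 'https://api.example.com/v1/records'
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # Flatten the JSON payload into a DataFrame
    return pd.json_normalize(response.json())
```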
Transformers - Process and transform your data:
```python
@transformer
def transform_data(data, *args, **kwargs):
    # Your transformation logic here
    return transformed_data
```
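For instance, a transformer might deduplicate and normalize the loaded DataFrame; a minimal sketch (the 'id' column is hypothetical):

```python
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform_data(data, *args, **kwargs):
    # Drop exact duplicates and rows missing the (hypothetical) 'id' key
    data = data.drop_duplicates().dropna(subset=['id'])
    # Normalize column names to lowercase snake_case
    data.columns = [col.strip().lower().replace(' ', '_') for col in data.columns]
    return data
```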
Data Exporters - Load data to destinations:
```python
@data_exporter
def export_data_to_database(data, *args, **kwargs):
    # Your data export logic here
```
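Filling in that stub, a minimal sketch that writes the incoming DataFrame to a local CSV file; the output path is an assumption, and production pipelines would usually target a database through Mage's IO clients (see the io_config.yaml section below):

```python
import os

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data(data, *args, **kwargs):
    # Hypothetical destination: a local CSV file under ./output
    os.makedirs('output', exist_ok=True)
    data.to_csv('output/export.csv', index=False)
```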
Create a .env file in your Mage project root:
```
# Database Configuration
POSTGRES_DBNAME=your_database
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432

# API Keys
API_KEY=your_api_key
WEATHER_API_KEY=your_weather_api_key

# Cloud Configuration
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
```
Configure data connections in io_config.yaml:
```yaml
dev:
  POSTGRES_CONNECT_TIMEOUT: 10
  POSTGRES_DBNAME: "{{ env_var('POSTGRES_DBNAME') }}"
  POSTGRES_USER: "{{ env_var('POSTGRES_USER') }}"
  POSTGRES_PASSWORD: "{{ env_var('POSTGRES_PASSWORD') }}"
  POSTGRES_HOST: "{{ env_var('POSTGRES_HOST') }}"
  POSTGRES_PORT: "{{ env_var('POSTGRES_PORT') }}"
```
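Blocks consume these profiles through Mage's IO clients. The sketch below follows the pattern in Mage's generated Postgres templates (ConfigFileLoader plus Postgres.with_config); the query and table are hypothetical, and module paths may vary slightly between Mage versions:

```python
from os import path

from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres
# Older Mage versions expose get_repo_path from a different module
from mage_ai.settings.repo import get_repo_path

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_from_postgres(*args, **kwargs):
    query = 'SELECT * FROM public.users LIMIT 100'  # hypothetical table
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'dev'  # matches the profile defined above

    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        return loader.load(query)
```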
Mage provides built-in monitoring capabilities:
- Pipeline Execution History - Track all pipeline runs
- Real-time Logs - Monitor pipeline execution in real-time
- Data Quality Metrics - Built-in data validation and quality checks (see the sketch after this list)
- Performance Metrics - Track execution time and resource usage
- Error Handling - Automatic retry and failure notifications
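Block-level data quality checks are written as @test functions that run automatically after a block executes; a minimal sketch following Mage's generated test template:

```python
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@test
def test_output(output, *args) -> None:
    # A failed assertion marks the block run as failed in monitoring
    assert output is not None, 'The output is undefined'
    assert len(output) > 0, 'The output is empty'
```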
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a new directory for your pipeline example
- Include:
- README.md with detailed instructions
- Pipeline code files
- requirements.txt
- Sample data (if applicable)
- Submit a pull request
- Follow Python best practices (PEP 8)
- Include comprehensive documentation
- Test your pipelines before submitting
- Use descriptive commit messages
- Update this README if adding new categories
A good example includes:
- Clear, well-commented code
- Comprehensive setup instructions
- Error handling and validation
- Sample data or data generation scripts
- Documentation of data sources and destinations
Pipeline fails to start:
- Check your Python dependencies in requirements.txt
- Verify environment variables are set correctly
- Ensure data sources are accessible
Database connection errors:
- Verify database credentials in io_config.yaml
- Check network connectivity
- Ensure database is running and accessible
Import errors:
- Install missing dependencies: pip install -r requirements.txt
- Check Python version compatibility
- Verify import paths
- Check the Mage Documentation
- Search GitHub Issues
- Join the Mage Slack
- Create an issue in this repository
If you find this repository helpful, please:
- ⭐ Star the repository
- 🍴 Fork it for your own use
- 🐛 Report issues
- 💡 Suggest new examples
- 📢 Share with your network
Happy Data Orchestrating! 🎉
Built with ❤️ using Mage