mage-ai/mage-pipeline-examples

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

A comprehensive collection of example pipelines built with Mage, the modern data orchestration platform. Use these pipelines to learn, explore, and jumpstart your own data workflows with Mage.


🌟 What is Mage?

Mage is a modern data orchestration platform that simplifies building, running, and monitoring data pipelines. Unlike traditional orchestration tools, Mage offers:

  • 🎯 Python-first approach - Build pipelines using familiar Python syntax
  • 📊 Interactive development - Develop and test pipelines in a notebook-style interface
  • 🔄 Real-time monitoring - Built-in observability and monitoring capabilities
  • 🧩 Modular architecture - Reusable blocks for data loading, transformation, and exporting
  • ☁️ Cloud-native - Easy deployment to AWS, GCP, and Azure
  • 🤖 ML-focused - Specialized features for machine learning workflows

🚀 Mage Pro Features

For enterprise teams and production environments, Mage Pro provides:

  • 💻 Enterprise Support - Dedicated support and SLA guarantees
  • 📊 Advanced Analytics - Enhanced monitoring, alerting, and performance insights
  • 🔒 Security & Compliance - Enterprise-grade security features and compliance tools
  • ⚡ High Performance - Optimized for large-scale data processing
  • 🌐 Multi-tenant Architecture - Support for multiple teams and projects
  • 🔄 Advanced Scheduling - Complex scheduling and dependency management
  • 📊 Custom Dashboards - Tailored monitoring and reporting capabilities

📚 Pipeline Examples

This repository contains a comprehensive collection of pipeline examples organized by category:

📊 Data Integration

  • API to Database - Extract data from REST APIs and load into databases
  • Multi-source Sync - Combine data from multiple APIs, databases, and files
  • Database Replication - Real-time database synchronization and replication
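The first pattern above (API to database) can be sketched end to end with the standard library alone; a minimal version with the API payload stubbed out and SQLite standing in for the target database (the endpoint, table name, and schema are hypothetical, not taken from the example pipelines):

```python
import json
import sqlite3
from urllib.request import urlopen

def load_users(api_url):
    """Fetch a JSON array of user records from a REST endpoint."""
    with urlopen(api_url) as resp:
        return json.loads(resp.read())

def export_users(rows, conn):
    """Insert records into a users table, creating it if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT OR REPLACE INTO users (id, name) VALUES (:id, :name)", rows)
    conn.commit()

# Stubbed payload standing in for the API response (no network call in this demo)
rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
conn = sqlite3.connect(":memory:")
export_users(rows, conn)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```

In a real pipeline, the load and export steps become separate Mage blocks and the connection comes from io_config.yaml instead of being opened inline.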

📦 Batch ETL

  • CSV Processing - Process and transform CSV files with data validation
  • JSON ETL - Extract, transform, and load JSON data from various sources
  • Combine Python and SQL - Hybrid processing using both Python and SQL operations
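A minimal sketch of the CSV-processing pattern with a simple validation rule (the column names and the skip-incomplete-rows rule are illustrative, not taken from the example pipelines):

```python
import csv
import io

def transform_csv(text):
    """Parse CSV text, drop rows with missing values, and uppercase names."""
    reader = csv.DictReader(io.StringIO(text))
    out = []
    for row in reader:
        if not all(row.values()):  # simple validation: skip incomplete rows
            continue
        row["name"] = row["name"].upper()
        out.append(row)
    return out

sample = "id,name\n1,ada\n2,\n3,grace\n"
print(transform_csv(sample))  # [{'id': '1', 'name': 'ADA'}, {'id': '3', 'name': 'GRACE'}]
```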

🌊 Streaming Pipelines

  • Kafka Consumer - Real-time data processing from Kafka streams
  • Real-time Analytics - Live analytics and metrics calculation
  • Event Processing - Process and route events in real-time
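The real-time analytics idea can be illustrated without a broker; a sketch that aggregates a simulated event stream into fixed time windows (the timestamps, keys, and 60-second window are made up for the example):

```python
from collections import defaultdict

def rolling_counts(events, window=60):
    """Count events per key within fixed time windows (window length in seconds)."""
    counts = defaultdict(int)
    for ts, key in events:
        counts[(ts // window, key)] += 1  # bucket by window index
    return dict(counts)

stream = [(5, "click"), (12, "click"), (70, "view"), (75, "click")]
print(rolling_counts(stream))
```

A Kafka consumer version would feed the same aggregation from a topic instead of an in-memory list.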

🤖 ML Models

  • Model Training - End-to-end ML model training pipeline
  • Model Inference - Deploy and serve ML models in production
  • Guide to Accuracy, Precision, and Recall - Learn ML evaluation metrics
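The evaluation metrics covered in the guide can be computed directly; a small sketch for binary labels (the sample labels are illustrative):

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

print(classification_metrics([1, 0, 1, 1], [1, 0, 0, 1]))  # accuracy 0.75, precision 1.0, recall 2/3
```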

🔍 Data Quality

  • Validation Pipeline - Automated data validation and quality checks
  • Monitoring Dashboard - Real-time data quality monitoring and alerting
  • Anomaly Detection - Detect and handle data anomalies automatically
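A validation check of the kind these pipelines automate might look like this (the required fields and the non-negative-amount rule are hypothetical examples):

```python
def validate_rows(rows, required=("id", "amount")):
    """Split rows into valid and invalid: required fields present, amount non-negative."""
    valid, invalid = [], []
    for row in rows:
        ok = all(row.get(f) is not None for f in required) and row["amount"] >= 0
        (valid if ok else invalid).append(row)
    return valid, invalid

good, bad = validate_rows([
    {"id": 1, "amount": 5},
    {"id": 2, "amount": -1},    # fails the range check
    {"id": None, "amount": 3},  # fails the null check
])
print(len(good), len(bad))  # 1 2
```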

☁️ Cloud Operations

  • S3 to RDS - Transfer data from AWS S3 to RDS PostgreSQL
  • Multi-cloud Sync - Cross-cloud data movement and synchronization
  • Infrastructure Monitoring - Monitor cloud resources and costs

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Docker (recommended)
  • Git
  • Mage Pro - For enterprise features, advanced monitoring, and production-ready deployments

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/mage-pipeline-examples.git
    cd mage-pipeline-examples
  2. Set up Mage using Docker (Recommended):

    # Clone Mage's quickstart template
    git clone https://github.com/mage-ai/compose-quickstart.git mage-setup
    cd mage-setup
    # Copy environment file
    cp dev.env .env
    # Start Mage
    docker compose up
  3. Access the Mage UI: open your browser and navigate to http://localhost:6789

  4. Import a Pipeline:

    Method 1: Zip Upload (Recommended)

    a. Prepare the pipeline:

    # Navigate to the pipeline directory you want to import
    cd examples/data-integration/api-to-database
    # Create a zip file of the pipeline
    zip -r api-to-database-pipeline.zip .

    b. Upload to Mage:

    • Open the Mage UI at http://localhost:6789
    • Click on "Pipelines" in the left sidebar
    • Click the "Import" button
    • Select "Upload zip file"
    • Choose your api-to-database-pipeline.zip file
    • Click "Import"

    c. Verify import:

    • The pipeline should appear in your pipelines list
    • Click on the pipeline to view and edit it
    • Follow the setup instructions in the pipeline's README

    Method 2: Manual Copy

    a. Copy pipeline files:

    # Copy the entire pipeline directory to your Mage project
    cp -r examples/data-integration/api-to-database/* /path/to/your/mage/project/pipelines/

    b. Refresh the Mage UI:

    • The pipeline should appear automatically
    • If not, restart your Mage server

    Method 3: Git Clone (For Development)

    a. Clone into your Mage project:

    # Navigate to your Mage project directory
    cd /path/to/your/mage/project
    # Clone the repository and copy the specific pipeline
    git clone https://github.com/your-username/mage-pipeline-examples.git temp
    cp -r temp/examples/data-integration/api-to-database/* pipelines/
    rm -rf temp

Post-Import Configuration

After importing a pipeline, you'll need to configure it for your environment:

  1. Install Dependencies:

    # Install required Python packages
    pip install -r requirements.txt
  2. Configure Environment Variables:

    # Create or update the .env file in your Mage project root
    echo "API_KEY=your_api_key_here" >> .env
    echo "DATABASE_URL=your_database_url_here" >> .env
  3. Update IO Configuration:

    # Edit io_config.yaml with your database and API credentials
    nano io_config.yaml
  4. Test the Pipeline:

    • Open the pipeline in Mage UI
    • Click "Run" to test the pipeline
    • Check logs for any errors
    • Verify data output
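Inside a pipeline block, the variables configured above are typically read from the environment; a small sketch that fails fast when they are missing (the variable names follow the .env example above):

```python
import os

def get_settings():
    """Read required pipeline settings from the environment, failing fast if absent."""
    settings = {
        "api_key": os.environ.get("API_KEY"),
        "database_url": os.environ.get("DATABASE_URL"),
    }
    missing = [k.upper() for k, v in settings.items() if not v]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return settings

os.environ.setdefault("API_KEY", "example-key")
os.environ.setdefault("DATABASE_URL", "postgresql://localhost/dev")
print(get_settings())
```

Failing fast here surfaces configuration mistakes at the start of a run instead of midway through a load.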

Alternative: Local Installation

# Install Mage
pip install mage-ai
# Start the Mage server
mage start your_project_name

📖 How to Use This Repository

1. Browse Examples

Each pipeline example is organized in its own directory with:

  • README.md - Detailed explanation and setup instructions
  • Pipeline files - The actual Mage pipeline code
  • requirements.txt - Python dependencies
  • Sample data (if applicable)

2. Choose Your Pipeline Category

  • Data Integration (examples/data-integration/) - Connect and sync data from various sources
  • Batch ETL (examples/batch-etl/) - Process large datasets in batches
  • Streaming Pipelines (examples/streaming-pipelines/) - Real-time data processing
  • ML Models (examples/ml-models/) - Machine learning workflows and MLOps
  • Data Quality (examples/data-quality/) - Data validation and monitoring
  • Cloud Operations (examples/cloud-ops/) - Cloud infrastructure and data movement

3. Import the Pipeline

Choose your preferred import method:

  • Zip Upload (Recommended) - Upload pipeline as zip file through Mage UI
  • Manual Copy - Copy files directly to your Mage project
  • Git Clone - Clone specific pipeline for development

4. Configure and Run

Each pipeline includes:

  • Prerequisites and dependencies
  • Configuration steps
  • Sample data setup
  • Running instructions

5. Customize for Your Use Case

  • Modify data sources and destinations
  • Adjust transformation logic
  • Add your own business logic
  • Scale for your data volume

🏗️ Pipeline Structure

Mage pipelines typically consist of three main components:

Data Loaders

Extract data from various sources:

@data_loader
def load_data_from_api(*args, **kwargs):
    # Your data loading logic here
    return data

Transformers

Process and transform your data:

@transformer
def transform_data(data, *args, **kwargs):
    # Your transformation logic here
    return transformed_data

Data Exporters

Load data to destinations:

@data_exporter
def export_data_to_database(data, *args, **kwargs):
    # Your data export logic here
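Outside the Mage runtime, the three blocks chain like ordinary functions; a runnable sketch with stub decorators standing in for Mage's (the payload and the Celsius-to-Fahrenheit transform are illustrative, not from the example pipelines):

```python
# Stub decorators so the blocks run as plain functions outside Mage
def data_loader(fn): return fn
def transformer(fn): return fn
def data_exporter(fn): return fn

@data_loader
def load_data_from_api(*args, **kwargs):
    # Hypothetical payload standing in for an API response
    return [{"city": "Oslo", "temp_c": 5}, {"city": "Lima", "temp_c": 20}]

@transformer
def transform_data(data, *args, **kwargs):
    # Convert Celsius to Fahrenheit
    return [{**row, "temp_f": row["temp_c"] * 9 / 5 + 32} for row in data]

@data_exporter
def export_data_to_database(data, *args, **kwargs):
    # Stand-in for a database write; Mage wires real IO through io_config.yaml
    print(f"exported {len(data)} rows")

export_data_to_database(transform_data(load_data_from_api()))  # exported 2 rows
```

In a Mage project each block lives in its own file, and the UI draws the same loader → transformer → exporter chain as a DAG.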

🔧 Configuration

Environment Variables

Create a .env file in your Mage project root:

# Database Configuration
POSTGRES_DBNAME=your_database
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432

# API Keys
API_KEY=your_api_key
WEATHER_API_KEY=your_weather_api_key

# Cloud Configuration
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret

IO Configuration

Configure data connections in io_config.yaml:

dev:
  POSTGRES_CONNECT_TIMEOUT: 10
  POSTGRES_DBNAME: "{{ env_var('POSTGRES_DBNAME') }}"
  POSTGRES_USER: "{{ env_var('POSTGRES_USER') }}"
  POSTGRES_PASSWORD: "{{ env_var('POSTGRES_PASSWORD') }}"
  POSTGRES_HOST: "{{ env_var('POSTGRES_HOST') }}"
  POSTGRES_PORT: "{{ env_var('POSTGRES_PORT') }}"
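The {{ env_var('NAME') }} placeholders are expanded from the environment when the profile is loaded; a sketch that mimics that substitution (an illustration of the templating, not Mage's actual implementation):

```python
import os
import re

def resolve_env_templates(value):
    """Expand {{ env_var('NAME') }} placeholders from the process environment."""
    pattern = re.compile(r"\{\{\s*env_var\('([^']+)'\)\s*\}\}")
    return pattern.sub(lambda m: os.environ.get(m.group(1), ""), value)

os.environ["POSTGRES_DBNAME"] = "analytics"
print(resolve_env_templates("{{ env_var('POSTGRES_DBNAME') }}"))  # analytics
```

Keeping credentials in the environment and only references in io_config.yaml means the YAML can be committed safely while secrets stay out of version control.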

📊 Monitoring and Observability

Mage provides built-in monitoring capabilities:

  • Pipeline Execution History - Track all pipeline runs
  • Real-time Logs - Monitor pipeline execution in real-time
  • Data Quality Metrics - Built-in data validation and quality checks
  • Performance Metrics - Track execution time and resource usage
  • Error Handling - Automatic retry and failure notifications

🤝 Contributing

We welcome contributions! Here's how you can help:

Adding New Examples

  1. Fork the repository
  2. Create a new directory for your pipeline example
  3. Include:
    • README.md with detailed instructions
    • Pipeline code files
    • requirements.txt
    • Sample data (if applicable)
  4. Submit a pull request

Guidelines

  • Follow Python best practices (PEP 8)
  • Include comprehensive documentation
  • Test your pipelines before submitting
  • Use descriptive commit messages
  • Update this README if adding new categories

Pipeline Requirements

  • Clear, well-commented code
  • Comprehensive setup instructions
  • Error handling and validation
  • Sample data or data generation scripts
  • Documentation of data sources and destinations

📚 Learning Resources

Official Documentation

Tutorials and Guides

Community

🐛 Troubleshooting

Common Issues

Pipeline fails to start:

  • Check your Python dependencies in requirements.txt
  • Verify environment variables are set correctly
  • Ensure data sources are accessible

Database connection errors:

  • Verify database credentials in io_config.yaml
  • Check network connectivity
  • Ensure database is running and accessible

Import errors:

  • Install missing dependencies: pip install -r requirements.txt
  • Check Python version compatibility
  • Verify import paths

Getting Help

📞 Support

If you find this repository helpful, please:

  • ⭐ Star the repository
  • 🍴 Fork it for your own use
  • 🐛 Report issues
  • 💡 Suggest new examples
  • 📢 Share with your network

Happy Data Orchestrating! 🎉

Built with ❤️ using Mage
