mage-ai/mage-ai


🧙 Build, run, and manage data pipelines for integrating and transforming data.

License

Apache-2.0 license


🧙 A modern replacement for Airflow.

Documentation 🌪️ Get a 5 min overview 🌊 Play with live tool 🔥 Get instant help

Give your data team `magical` powers

Integrate and synchronize data from 3rd party sources

Build real-time and batch pipelines to transform data using Python, SQL, and R

Run, monitor, and orchestrate thousands of pipelines without losing sleep

1️⃣ 🏗️

Build

Have you met anyone who said they loved developing in Airflow?
That’s why we designed an easy developer experience that you’ll enjoy.



Easy developer experience
Start developing locally with a single command or launch a dev environment in your cloud using Terraform.

Language of choice
Write code in Python, SQL, or R in the same data pipeline for ultimate flexibility.

Engineering best practices built-in
Each step in your pipeline is a standalone file containing modular code that’s reusable and testable with data validations. No more DAGs with spaghetti code.

↓

2️⃣ 🔮

Preview

Stop wasting time waiting around for your DAGs to finish testing.
Get instant feedback from your code each time you run it.



Interactive code
Immediately see results from your code’s output with an interactive notebook UI.

Data is a first-class citizen
Each block of code in your pipeline produces data that can be versioned, partitioned, and cataloged for future use.

Collaborate on cloud
Develop collaboratively on cloud resources, version control with Git, and test pipelines without waiting for an available shared staging environment.

↓

3️⃣ 🚀

Launch

Don’t have a large team dedicated to Airflow?
Mage makes it easy for a single developer or small team to scale up and manage thousands of pipelines.


Fast deploy
Deploy Mage to AWS, GCP, or Azure with only 2 commands using maintained Terraform templates.

Scaling made simple
Transform very large datasets directly in your data warehouse or through a native integration with Spark.

Observability
Operationalize your pipelines with built-in monitoring, alerting, and observability through an intuitive UI.

🧙 Intro

Mage is an open-source data pipeline tool for transforming and integrating data.

🏃‍♀️ Install

The recommended way to install the latest version of Mage is through Docker with the following command:

docker pull mageai/mageai:latest

You can also install Mage using pip or conda, though this may cause dependency issues without the proper environment.

pip install mage-ai

conda install -c conda-forge mage-ai

Looking for help? The fastest way to get started is by checking out our documentation here.

Looking for quick examples? Open a demo project right in your browser or check out our guides.

🎮 Demo

Live demo

Build and run a data pipeline with our demo app.

WARNING
The live demo is public to everyone, so please don’t save anything sensitive (e.g. passwords, secrets, etc.).

Demo video (5 min)

Click the image to play video

👩‍🏫 Tutorials

🔮 Features


🎶	Orchestration	Schedule and manage data pipelines with observability.
📓	Notebook	Interactive Python, SQL, & R editor for coding data pipelines.
🏗️	Data integrations	Synchronize data from 3rd party sources to your internal destinations.
🚰	Streaming pipelines	Ingest and transform real-time data.
❎	dbt	Build, run, and manage your dbt models with Mage.

A sample data pipeline defined across 3 files ➝

Load data ➝

@data_loader
def load_csv_from_file():
    return pd.read_csv('default_repo/titanic.csv')

Transform data ➝

@transformer
def select_columns_from_df(df, *args):
    return df[['Age', 'Fare', 'Survived']]

Export data ➝

@data_exporter
def export_titanic_data_to_disk(df) -> None:
    df.to_csv('default_repo/titanic_transformed.csv')
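
To see how the three blocks above fit together outside of Mage, here is a minimal sketch in plain Python. The `data_loader`, `transformer`, and `data_exporter` decorators below are no-op stand-ins defined only for this example (inside Mage, the tool supplies the real ones). The small made-up DataFrame, the `path` argument, and the `run_pipeline` helper are likewise assumptions that replace `default_repo/titanic.csv`. It illustrates the data flow only, not how Mage actually executes a pipeline.

```python
import os
import tempfile

import pandas as pd


def _passthrough(func):
    """Hypothetical stand-in for Mage's block decorators; returns func unchanged."""
    return func


# Stand-ins so the sample runs outside Mage (an assumption, not Mage's API).
data_loader = transformer = data_exporter = _passthrough


@data_loader
def load_sample_data():
    # A tiny made-up frame with the columns the sample selects,
    # plus one extra column that the transformer should drop.
    return pd.DataFrame({
        'Name': ['a', 'b', 'c'],
        'Age': [22.0, 38.0, 26.0],
        'Fare': [7.25, 71.28, 7.92],
        'Survived': [0, 1, 1],
    })


@transformer
def select_columns_from_df(df, *args):
    return df[['Age', 'Fare', 'Survived']]


@data_exporter
def export_data_to_disk(df, path) -> None:
    df.to_csv(path, index=False)


def run_pipeline(path):
    """Run the three steps in dependency order: load, transform, export."""
    df = select_columns_from_df(load_sample_data())
    export_data_to_disk(df, path)
    return df


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, 'transformed.csv')
        result = run_pipeline(out)
        print(list(result.columns))
        print(os.path.exists(out))
```

Because each step is an ordinary function, it can be called and checked on its own, which is the property the sample pipeline relies on.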

What the data pipeline looks like in the UI ➝

New? We recommend reading about blocks and learning from a hands-on tutorial.

🏔️ Core design principles

Every user experience and technical design decision adheres to these principles.


💻	Easy developer experience	Open-source engine that comes with a custom notebook UI for building data pipelines.
🚢	Engineering best practices built-in	Build and deploy data pipelines using modular code. No more writing throwaway code or trying to turn notebooks into scripts.
💳	Data is a first-class citizen	Designed from the ground up specifically for running data-intensive workflows.
🪐	Scaling is made simple	Analyze and process large data quickly for rapid iteration.

🛸 Core abstractions

These are the fundamental concepts that Mage uses to operate.


Project	Like a repository on GitHub; this is where you write all your code.
Pipeline	Contains references to all the blocks of code you want to run, charts for visualizing data, and organizes the dependency between each block of code.
Block	A file with code that can be executed independently or within a pipeline.
Data product	Every block produces data after it's been executed. These are called data products in Mage.
Trigger	A set of instructions that determine when or how a pipeline should run.
Run	Stores information about when it was started, its status, when it was completed, any runtime variables used in the execution of the pipeline or block, etc.
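
One way to picture how these abstractions relate is as a small data model. The sketch below is purely illustrative: the class names mirror the table above, but every field, the `execution_order` method, the `run_pipeline` helper, and the idea of storing a block as a Python callable are assumptions made for this example, not Mage's internal classes or API. Project and Trigger are left out to keep it short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Block:
    """A unit of code that can run on its own or inside a pipeline."""
    name: str
    func: Callable[..., Any]
    upstream: List[str] = field(default_factory=list)


@dataclass
class Pipeline:
    """References blocks and the dependencies between them."""
    name: str
    blocks: Dict[str, Block] = field(default_factory=dict)

    def add(self, block: Block) -> None:
        self.blocks[block.name] = block

    def execution_order(self) -> List[str]:
        """Return block names so every block follows its upstream blocks."""
        order: List[str] = []
        visiting: set = set()

        def visit(name: str) -> None:
            if name in order:
                return
            if name in visiting:
                raise ValueError(f'dependency cycle at {name!r}')
            visiting.add(name)
            for parent in self.blocks[name].upstream:
                visit(parent)
            visiting.discard(name)
            order.append(name)

        for name in self.blocks:
            visit(name)
        return order


@dataclass
class Run:
    """Records when a pipeline ran, its status, and the variables used."""
    pipeline: str
    variables: Dict[str, Any]
    started_at: datetime
    status: str = 'running'
    completed_at: Optional[datetime] = None
    data_products: Dict[str, Any] = field(default_factory=dict)


def run_pipeline(pipeline: Pipeline,
                 variables: Optional[Dict[str, Any]] = None) -> Run:
    """Execute each block in order, keeping its output as a data product."""
    run = Run(pipeline.name, dict(variables or {}), datetime.now(timezone.utc))
    try:
        for name in pipeline.execution_order():
            block = pipeline.blocks[name]
            inputs = [run.data_products[p] for p in block.upstream]
            run.data_products[name] = block.func(*inputs, **run.variables)
        run.status = 'completed'
    except Exception:
        run.status = 'failed'
        raise
    finally:
        run.completed_at = datetime.now(timezone.utc)
    return run


if __name__ == '__main__':
    pipeline = Pipeline('demo')
    pipeline.add(Block('export', lambda rows, **kw: len(rows), upstream=['transform']))
    pipeline.add(Block('transform',
                       lambda rows, **kw: [r * kw['factor'] for r in rows],
                       upstream=['load']))
    pipeline.add(Block('load', lambda **kw: [1, 2, 3]))
    result = run_pipeline(pipeline, {'factor': 10})
    print(pipeline.execution_order())
    print(result.status, result.data_products)
```

In this sketch a block's output is stored under the block's name, which is one simple way to model the "data product" idea: downstream blocks receive the stored outputs of their upstream blocks as arguments.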