| Apache Airflow | |
|---|---|
| Original author | Maxime Beauchemin / Airbnb |
| Developer | Apache Software Foundation |
| Initial release | June 3, 2015 |
| Stable release | 3.0.2[1] |
| Written in | Python |
| Operating system | Linux, macOS |
| Type | Workflow management platform |
| License | Apache License 2.0 |
| Website | airflow.apache.org |
| Repository | github.com/apache/airflow |
Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014[2] as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule their workflows and monitor them via the built-in Airflow user interface.[3][4] From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.
Airflow is written in Python, and workflows are created via Python scripts. Airflow is designed under the principle of "configuration as code". While other "configuration as code" workflow platforms exist using markup languages like XML, using Python allows developers to import libraries and classes to help them create their workflows.
Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration. Tasks and dependencies are defined in Python, and Airflow then manages the scheduling and execution. DAGs can be run either on a defined schedule (e.g. hourly or daily) or based on external event triggers (e.g. a file appearing in Hive[5]). Previous DAG-based schedulers like Oozie and Azkaban tended to rely on multiple configuration files and file system trees to create a DAG, whereas in Airflow, DAGs can often be written in one Python file.[6]
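As an illustrative sketch rather than an example from the project documentation, the hypothetical DAG below shows this single-file, configuration-as-code style: three tasks and their dependencies are declared directly in Python, together with a daily schedule. The DAG name, task names, and commands are invented for illustration; import paths and the `schedule` parameter follow the Airflow 2.x-style API and differ slightly between releases.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def _transform() -> None:
    """Placeholder transformation step; real logic would go here."""
    print("transforming data")


# A hypothetical three-task pipeline defined entirely in one Python file.
with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # run once per day; event-based triggers are also possible
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extracting'")
    transform = PythonOperator(task_id="transform", python_callable=_transform)
    load = BashOperator(task_id="load", bash_command="echo 'loading'")

    # The >> operator declares dependencies, forming the graph:
    # extract -> transform -> load
    extract >> transform >> load
```

Because the whole pipeline is ordinary Python, the file can import helper libraries, generate tasks in loops, and be versioned and reviewed like any other source code.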
Three notable providers offer managed services and other commercial offerings built around the core open-source project.