# docker-airflow
This repository contains the **Dockerfile** of apache-airflow for Docker's automated build, published to the public Docker Hub Registry.
## Informations

- Based on the official Python Image `python:3.7-slim-buster`; uses the official Postgres as backend and Redis as queue
- Install Docker
- Install Docker Compose
- Follows the Airflow releases from the Python Package Index
## Installation

Pull the image from the Docker repository.

```shell
docker pull puckel/docker-airflow
```

## Build

Optionally install Extra Airflow Packages and/or python dependencies at build time:

```shell
docker build --rm --build-arg AIRFLOW_DEPS="datadog,dask" -t puckel/docker-airflow .
docker build --rm --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .
```

or combined:

```shell
docker build --rm --build-arg AIRFLOW_DEPS="datadog,dask" --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .
```

Don't forget to update the airflow images in the docker-compose files to puckel/docker-airflow:latest.
## Usage

By default, docker-airflow runs Airflow with **SequentialExecutor**:

```shell
docker run -d -p 8080:8080 puckel/docker-airflow webserver
```

If you want to run another executor, use the other docker-compose.yml files provided in this repository.
For **LocalExecutor**:

```shell
docker-compose -f docker-compose-LocalExecutor.yml up -d
```

For **CeleryExecutor**:

```shell
docker-compose -f docker-compose-CeleryExecutor.yml up -d
```

NB: If you want to have the example DAGs loaded (the default is `LOAD_EX=n`), you have to set the following environment variable:

```shell
docker run -d -p 8080:8080 -e LOAD_EX=y puckel/docker-airflow
```

If you want to use Ad hoc query, make sure you've configured connections: Go to Admin -> Connections, edit "postgres_default", and set these values (equivalent to the values in airflow.cfg/docker-compose*.yml):
- Host : postgres
- Schema : airflow
- Login : airflow
- Password : airflow
For encrypted connection passwords (in LocalExecutor or CeleryExecutor), you must have the same fernet_key in every container. By default, docker-airflow generates the fernet_key at startup, so you have to set an environment variable in the docker-compose file (i.e. docker-compose-LocalExecutor.yml) to use the same key across containers. To generate a fernet_key:

```shell
docker run puckel/docker-airflow python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"
```

## Configuring Airflow

It's possible to set any configuration value for Airflow from environment variables, which take precedence over values from the airflow.cfg.
The general rule is that the environment variable should be named `AIRFLOW__<section>__<key>`; for example, `AIRFLOW__CORE__SQL_ALCHEMY_CONN` sets the `sql_alchemy_conn` config option in the `[core]` section.
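As a minimal sketch of the naming rule above (the `section` and `key` values here are just illustrative):

```shell
# Build the env var name AIRFLOW__<SECTION>__<KEY> from a config section/key pair.
section="core"
key="sql_alchemy_conn"
var="AIRFLOW__$(echo "$section" | tr '[:lower:]' '[:upper:]')__$(echo "$key" | tr '[:lower:]' '[:upper:]')"
echo "$var"
```

This prints `AIRFLOW__CORE__SQL_ALCHEMY_CONN`, the variable that overrides `sql_alchemy_conn` in `[core]`.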
Check out the Airflow documentation for more details.
You can also define connections via environment variables by prefixing them with `AIRFLOW_CONN_` - for example `AIRFLOW_CONN_POSTGRES_MASTER=postgres://user:password@localhost:5432/master` for a connection called "postgres_master". The value is parsed as a URI. This will work for hooks etc., but won't show up in the "Ad-hoc Query" section unless an (empty) connection is also created in the DB.
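Since the value is parsed as a URI, each component must be URL-encoded. A small sketch, assuming Python 3 is available on the host (the credentials here are made up):

```shell
# URL-encode a password that contains reserved characters before embedding it
# in an AIRFLOW_CONN_* URI. "p@ss/word", "user" and "master" are illustrative.
password='p@ss/word'
encoded=$(python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=''))" "$password")
export AIRFLOW_CONN_POSTGRES_MASTER="postgres://user:${encoded}@localhost:5432/master"
echo "$AIRFLOW_CONN_POSTGRES_MASTER"
```

Without the encoding step, the `@` and `/` inside the password would break the URI parsing.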
## Custom Airflow plugins

Airflow allows for custom user-created plugins, which are typically placed in the `${AIRFLOW_HOME}/plugins` folder. Documentation on plugins can be found here.
In order to incorporate plugins into your docker container:

- Create the folder `plugins/` with your custom plugins.
- Mount the folder as a volume by doing either of the following:
  - Include the folder as a volume in the command-line: `-v $(pwd)/plugins/:/usr/local/airflow/plugins`
  - Use docker-compose-LocalExecutor.yml or docker-compose-CeleryExecutor.yml, which contain support for adding the plugins folder as a volume
## Install custom python package

- Create a file "requirements.txt" with the desired python modules
- Mount this file as a volume: `-v $(pwd)/requirements.txt:/requirements.txt` (or add it as a volume in the docker-compose file)
- The entrypoint.sh script executes the `pip install` command (with the `--user` option)
## UI Links

- Airflow: localhost:8080
- Flower: localhost:5555
## Scale the number of workers

Easy scaling using docker-compose:

```shell
docker-compose -f docker-compose-CeleryExecutor.yml scale worker=5
```

This can be used to scale to a multi-node setup using docker swarm.
## Running other airflow commands

If you want to run other airflow sub-commands, such as `list_dags` or `clear`, you can do so like this:

```shell
docker run --rm -ti puckel/docker-airflow airflow list_dags
```

or with your docker-compose set up like this:

```shell
docker-compose -f docker-compose-CeleryExecutor.yml run --rm webserver airflow list_dags
```

You can also use this to run a bash shell or any other command in the same environment that airflow would be run in:

```shell
docker run --rm -ti puckel/docker-airflow bash
docker run --rm -ti puckel/docker-airflow ipython
```

## Simplified SQL database configuration

If the executor type is set to anything else than **SequentialExecutor**, you'll need an SQL database. Here is a list of PostgreSQL configuration variables and their default values. They're used to compute the `AIRFLOW__CORE__SQL_ALCHEMY_CONN` and `AIRFLOW__CELERY__RESULT_BACKEND` variables for you when needed, if you don't provide them explicitly:
| Variable | Default value | Role |
|---|---|---|
| `POSTGRES_HOST` | postgres | Database server host |
| `POSTGRES_PORT` | 5432 | Database server port |
| `POSTGRES_USER` | airflow | Database user |
| `POSTGRES_PASSWORD` | airflow | Database password |
| `POSTGRES_DB` | airflow | Database name |
| `POSTGRES_EXTRAS` | empty | Extra parameters |
You can also use those variables to adapt your compose file to match an existing PostgreSQL instance managed elsewhere.
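The derivation can be sketched in shell; the exact driver prefix (`postgresql+psycopg2://`) is an assumption about what the image's entrypoint uses:

```shell
# Sketch: derive AIRFLOW__CORE__SQL_ALCHEMY_CONN from the POSTGRES_* variables
# when it is not provided explicitly, falling back to the documented defaults.
: "${POSTGRES_HOST:=postgres}"
: "${POSTGRES_PORT:=5432}"
: "${POSTGRES_USER:=airflow}"
: "${POSTGRES_PASSWORD:=airflow}"
: "${POSTGRES_DB:=airflow}"
: "${POSTGRES_EXTRAS:=}"
AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}${POSTGRES_EXTRAS}"
echo "$AIRFLOW__CORE__SQL_ALCHEMY_CONN"
```

With no variables set, this yields `postgresql+psycopg2://airflow:airflow@postgres:5432/airflow`; overriding any `POSTGRES_*` variable (e.g. to point at an external instance) changes the corresponding URI component.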
Please refer to the Airflow documentation to understand the use of extra parameters, for example in order to configure a connection that uses TLS encryption.
Here's an important thing to consider:
When specifying the connection as a URI (in an AIRFLOW_CONN_* variable), you should follow the standard syntax of DB connections, where extras are passed as parameters of the URI (note that all components of the URI should be URL-encoded).
Therefore you must provide the extra parameters URL-encoded, starting with a leading `?`. For example:
```shell
POSTGRES_EXTRAS="?sslmode=verify-full&sslrootcert=%2Fetc%2Fssl%2Fcerts%2Fca-certificates.crt"
```

## Simplified Celery broker configuration

If the executor type is set to **CeleryExecutor**, you'll need a Celery broker. Here is a list of Redis configuration variables and their default values. They're used to compute the `AIRFLOW__CELERY__BROKER_URL` variable for you if you don't provide it explicitly:
| Variable | Default value | Role |
|---|---|---|
| `REDIS_PROTO` | redis:// | Protocol |
| `REDIS_HOST` | redis | Redis server host |
| `REDIS_PORT` | 6379 | Redis server port |
| `REDIS_PASSWORD` | empty | If Redis is password protected |
| `REDIS_DBNUM` | 1 | Database number |
You can also use those variables to adapt your compose file to match an existing Redis instance managed elsewhere.
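A sketch of how the broker URL can be assembled from these variables (the `:password@` insertion for a protected instance is an assumption about the entrypoint's behavior):

```shell
# Sketch: derive AIRFLOW__CELERY__BROKER_URL from the REDIS_* variables,
# falling back to the documented defaults. A non-empty REDIS_PASSWORD is
# inserted into the URL as ":password@".
: "${REDIS_PROTO:=redis://}"
: "${REDIS_HOST:=redis}"
: "${REDIS_PORT:=6379}"
: "${REDIS_PASSWORD:=}"
: "${REDIS_DBNUM:=1}"
if [ -n "$REDIS_PASSWORD" ]; then
    REDIS_PREFIX=":${REDIS_PASSWORD}@"
else
    REDIS_PREFIX=""
fi
AIRFLOW__CELERY__BROKER_URL="${REDIS_PROTO}${REDIS_PREFIX}${REDIS_HOST}:${REDIS_PORT}/${REDIS_DBNUM}"
echo "$AIRFLOW__CELERY__BROKER_URL"
```

With no variables set, this yields `redis://redis:6379/1`.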
## Wanna help?

Fork, improve and PR.