About
Project using data engineering concepts.
The project is an ELT (Extract, Load, Transform) data pipeline, orchestrated with Apache Airflow through Docker containers.
Faker is used as a package to generate data to a mysql database. The data is extracted from mysql, transformed with pandas and Sql and then loaded into an Olap postgres database. A notification is then sent by email once the whole process is completed.
Architecture
Prerequisites
Setup mailtrap
One platform toTest, Send, Control your emails:
Setup
Clone the project to your desired location:
$ git clone https://github.com/Stefen-Taime/airflow_etl.git
fill the AIRFLOW_SMTPSMTP_USER, AIRFLOWSMTPSMTP_PASSWORD, AIRFLOWSMTP_SMTP_MAIL_FROM in .envExample file:
AIRFLOW_ADMIN_MAIL=airflowAIRFLOW_ADMIN_FIRSTNAME=airflowAIRFLOW_ADMIN_NAME=airflowAIRFLOW_ADMIN_PASSWORD=airflowpasswordAIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=FalseAIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres+psycopg2://airflow:airflowpassword@postgres:5432/airflowAIRFLOW__CORE__FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs=AIRFLOW_CONN_METADATA_DB=postgres+psycopg2://airflow:airflowpassword@postgres:5432/airflowAIRFLOW_VAR__METADATA_DB_SCHEMA=airflowAIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC=5AIRFLOW__CORE__EXECUTOR=LocalExecutorAIRFLOW__SMTP__SMTP_HOST=smtp.mailtrap.ioAIRFLOW__SMTP__SMTP_PORT=2525AIRFLOW__SMTP__SMTP_USER=xxxxxxxxxxxAIRFLOW__SMTP__SMTP_PASSWORD=xxxxxxxAIRFLOW__SMTP__SMTP_MAIL_FROM=your_email@gmail.comAIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080POSTGRES_USER=airflowPOSTGRES_PASSWORD=airflowpasswordPOSTGRES_DB=airflowAIRFLOW_UID=1000AIRFLOW_GID=0AIRFLOW_UID=1000AIRFLOW_GID=0AIRFLOW_UID=1000AIRFLOW_GID=0PG_VER=14-alpinePOSTGRES_SRC_PASSWORD=Sup3rS3c3tPORT=5432POSTGRES_USER_OLAP=postgresHOSTNAME=olapONTAINER_NAME=postgresPOSTGRES_DB_OLAP=postgres
grant permissions to the bash script:
chmod a+x build_Services.sh
Bash:
$ ./build_Services.sh
Build Docker:
$ docker-compose up --build -d
When everything is done, you can check all the containers running:
$ docker ps
oltp Interface
Now you can access adminer web interface by going tohttp://localhost:8085 with the default user which is in the docker-compose.yml:
Système MySQL
Serveur oltp
user root
password myrootpassword
Database testdb
olap Interface
Now you can access new adminer web interface by going tohttp://localhost:8085 with the default user which is in the docker-compose.yml:
Système PostgesSQL
Serveur olap
user postgres
password Sup3rS3c3t
Database postgres
Airflow Interface
Now you can access Airflow web interface by going tohttp://localhost:8080 with the default user which is in the docker-compose.yml.Username/Password: airflow/airflowpassword:
Airflow DAG
Now you can run Airflow etl dag:
Check oltp and olap database
:)
Check yourmailtrap.io/inboxes
Shut down or restart Airflow
If you need to make changes or shut down:
$ docker-compose down
References
https://medium.com/@stefentaime_10958/elt-airflow-pipeline-project-dcf834c1be17
Top comments(0)
For further actions, you may consider blocking this person and/orreporting abuse