Automates KBO data collection and deployment with Airflow.
leewr9/kbo-data-pipeline
This repository automates the collection and deployment of KBO data using Apache Airflow. It manages the flow of data from collection to visualization in the Data Portal.
This pipeline runs on Apache Airflow and is deployed using Docker Compose. To set up and run the pipeline, ensure Docker is installed and configured properly.
To run the pipeline locally using Docker Compose:
- Clone this repository and initialize submodules:
git clone --recurse-submodules https://github.com/leewr9/kbo-data-pipeline.git
cd kbo-data-pipeline
- Ensure that your GCP service account key is placed in the `config` folder and renamed to `key.json`:
mv your-service-account-key.json config/key.json
- Start the Airflow services using Docker Compose:
docker-compose up -d
- Access the Airflow web UI at http://localhost:8080
- Log in with Username: `admin`, Password: `admin`
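Before starting the services, it can help to confirm that the key file from the step above is where the pipeline expects it. The following is a minimal sketch, not part of this repository: `check_service_account_key` is a hypothetical helper, and the fields it checks (`type`, `project_id`, `private_key`, `client_email`) are the ones normally present in a GCP service account JSON key.

```python
import json
from pathlib import Path


def check_service_account_key(path="config/key.json"):
    """Return a list of problems with the key file; an empty list means it looks usable.

    Hypothetical pre-flight helper: it only inspects the file locally
    and never contacts Google Cloud.
    """
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{key_path} does not exist"]

    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]

    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]

    problems = []
    # Service account keys identify themselves through the "type" field.
    if data.get("type") != "service_account":
        problems.append('field "type" is not "service_account"')
    # Fields a client library needs in order to authenticate.
    for field in ("project_id", "private_key", "client_email"):
        if not data.get(field):
            problems.append(f'missing field "{field}"')
    return problems


if __name__ == "__main__":
    issues = check_service_account_key()
    print("key looks usable" if not issues else "\n".join(issues))
```

Running the script from the repository root prints either a confirmation or one line per problem found.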
The following DAGs are currently implemented:
- fetch_kbo_games_daily - Runs daily at 00:00, parsing the latest KBO game results.
- fetch_kbo_players_weekly - Runs every Sunday at 00:00, parsing player records up to the current week.
- fetch_kbo_schedules_weekly - Runs every Sunday at 00:00, parsing the schedule for the upcoming week.
- fetch_kbo_historical_data - Runs every year on January 1st at 00:00, parsing the schedule for the upcoming year.
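The schedules above map naturally onto five-field cron expressions, which Airflow accepts as a DAG schedule. The sketch below is illustrative only: the cron strings are inferred from the descriptions in this list and are not copied from the DAG files, and `cron_matches` and `dags_due` are hypothetical helpers that handle only `*` or a single integer per field.

```python
from datetime import datetime

# Cron strings inferred from the schedule descriptions (assumption, not read from the DAG code).
# Field order: minute hour day-of-month month day-of-week (0 = Sunday).
SCHEDULES = {
    "fetch_kbo_games_daily": "0 0 * * *",       # every day at 00:00
    "fetch_kbo_players_weekly": "0 0 * * 0",    # every Sunday at 00:00
    "fetch_kbo_schedules_weekly": "0 0 * * 0",  # every Sunday at 00:00
    "fetch_kbo_historical_data": "0 0 1 1 *",   # January 1st at 00:00
}


def cron_matches(expr: str, when: datetime) -> bool:
    """Check `when` against a cron expression whose fields are '*' or one integer.

    Simplified matcher: ranges, lists and step values are not supported.
    """
    fields = expr.split()
    actual = (
        when.minute,
        when.hour,
        when.day,
        when.month,
        when.isoweekday() % 7,  # isoweekday() gives Sunday = 7; cron uses Sunday = 0
    )
    return all(f == "*" or int(f) == value for f, value in zip(fields, actual))


def dags_due(when: datetime) -> list:
    """Return the DAG ids whose cron expression matches `when`, sorted by name."""
    return sorted(dag_id for dag_id, expr in SCHEDULES.items() if cron_matches(expr, when))


if __name__ == "__main__":
    # 2024-03-24 is a Sunday, so the daily and both weekly DAGs match.
    print(dags_due(datetime(2024, 3, 24, 0, 0)))
```

Note that this only checks which cron expressions match a given timestamp; when Airflow actually triggers a run also depends on its data-interval and timezone settings.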
The collected data is stored in Google Cloud Storage (GCS) under the `kbo-data` bucket with the following structure:
- schedules/
  - weekly/ (Upcoming game schedules, weekly basis)
  - historical/ (Past game schedules by year)
- games/
  - daily/ (Game details collected daily)
  - historical/ (Historical game details by year)
- players/
  - daily/ (Player statistics per game)
  - weekly/ (Aggregated player statistics per week)
  - historical/ (Past player statistics by year)
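The folder layout above can be captured in a small lookup so that code reading from or writing to the bucket does not hard-code paths. This is a minimal sketch under stated assumptions: `gcs_prefix` is a hypothetical helper that is not part of this repository, and the object names stored beneath each prefix are not documented here, so the function stops at the folder level.

```python
BUCKET = "kbo-data"

# Folder layout as described above: category -> allowed granularities.
LAYOUT = {
    "schedules": ("weekly", "historical"),
    "games": ("daily", "historical"),
    "players": ("daily", "weekly", "historical"),
}


def gcs_prefix(category: str, granularity: str) -> str:
    """Return the gs:// prefix for a category/granularity pair in the layout.

    Raises ValueError for combinations that do not exist in the bucket structure.
    """
    if category not in LAYOUT:
        raise ValueError(f"unknown category: {category!r}")
    if granularity not in LAYOUT[category]:
        raise ValueError(f"{category!r} has no {granularity!r} folder")
    return f"gs://{BUCKET}/{category}/{granularity}/"


if __name__ == "__main__":
    print(gcs_prefix("players", "weekly"))
```

Rejecting unknown combinations early (for example `schedules` with `daily`) keeps a typo from silently creating a new folder in the bucket.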
The parsing modules are managed through the kbo-data-collector repository, which is included as a Git submodule in this project.
This project is licensed under the MIT License. See the LICENSE file for details.