This document outlines the setup and workflow of our GitHub Actions data pipeline. The primary goal is to manage and version-control data generated by our CI/CD processes, with a special focus on handling a SQLite database and its schema migrations.
Core Components
_data Branch
A dedicated orphan branch (_data) stores data artifacts. Because it shares no history with the source branches, large data files and frequent data commits stay out of the main code history, so checkouts of the source branches (for example, a single-branch clone) stay small and fast.
actions/pipeline-data ("Setup Pipeline Data Branch")
This composite action manages the interaction with the _data branch by creating a git worktree.
operation: setup: Checks out the _data branch into a .pipeline-data-worktree directory and uses rsync to copy the entire contents of the worktree's data directory into the main workspace's data directory.
operation: update: Uses rsync to sync the data directory from the main workspace to the worktree, then commits and force-pushes the changes to the _data branch.
operation: cleanup: Removes the .pipeline-data-worktree directory. Run it at the end of a workflow, typically with an if: always() condition, so cleanup happens even if earlier steps fail.
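The three operations can be pictured as a fixed sequence of git and rsync commands. The sketch below is hypothetical and is not the action's source. The exact flags are assumptions: whether rsync uses --delete is not stated above, and fetching _data into a local branch of the same name is one way to make it available to the worktree. The sketch only builds the argv lists for each operation so they can be inspected.

```typescript
// Hypothetical sketch of the commands behind each pipeline-data operation.
// Names and flags are illustrative assumptions, not the action's real source.
type Operation = "setup" | "update" | "cleanup";

const WORKTREE = ".pipeline-data-worktree";
const BRANCH = "_data";

function commandsFor(op: Operation, commitMessage = "Update pipeline data"): string[][] {
  switch (op) {
    case "setup":
      return [
        // Fetch the remote data branch into a local branch of the same name.
        ["git", "fetch", "origin", `+${BRANCH}:${BRANCH}`],
        // Check that branch out into a separate working directory.
        ["git", "worktree", "add", WORKTREE, BRANCH],
        // Trailing slashes make rsync copy the directory contents, not the directory.
        ["rsync", "-a", `${WORKTREE}/data/`, "data/"],
      ];
    case "update":
      return [
        ["rsync", "-a", "data/", `${WORKTREE}/data/`],
        ["git", "-C", WORKTREE, "add", "-A"],
        // git commit exits non-zero when nothing changed; a real action must handle that.
        ["git", "-C", WORKTREE, "commit", "-m", commitMessage],
        ["git", "-C", WORKTREE, "push", "--force", "origin", BRANCH],
      ];
    case "cleanup":
      return [["git", "worktree", "remove", "--force", WORKTREE]];
  }
}
```

In a real step, each argv could be executed with Node's execFileSync(cmd[0], cmd.slice(1), { stdio: "inherit" }). Keeping the data in a worktree means the workspace's main checkout is never switched away from the source branch.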
actions/restore-db ("SQLite Database Operations")
This action handles the dumping and restoring of the SQLite database in a way that is compatible with our migration-based schema management.
operation: dump:
Dumps the live SQLite database (e.g., data/db.sqlite) into a diffable format in the specified dump directory (e.g., data/dump) using sqlite-diffable.
Copies the Drizzle migration journal (drizzle/meta/_journal.json) into the dump directory as _journal.json. This is a critical step that versions the database schema state along with the data itself.
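The dump step is essentially two actions: run the sqlite-diffable CLI, then stamp the dump with the current schema journal. The sketch below is a minimal illustration, assuming sqlite-diffable's dump subcommand with --all to export every table. The helper names planDump and runDump are hypothetical and are not part of the restore-db action.

```typescript
import { execFileSync } from "node:child_process";
import { copyFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Hypothetical description of one dump run (names are illustrative).
interface DumpPlan {
  dumpDir: string;
  command: string[];     // argv for the sqlite-diffable CLI
  journalSource: string; // Drizzle's migration journal in the source tree
  journalTarget: string; // copy stored next to the dumped tables
}

function planDump(dbPath: string, dumpDir: string, drizzleDir = "drizzle"): DumpPlan {
  return {
    dumpDir,
    // "--all" asks sqlite-diffable to dump every table in the database.
    command: ["sqlite-diffable", "dump", dbPath, dumpDir, "--all"],
    journalSource: join(drizzleDir, "meta", "_journal.json"),
    journalTarget: join(dumpDir, "_journal.json"),
  };
}

// Executes a plan: dump the tables, then record which schema version they match.
function runDump(plan: DumpPlan): void {
  mkdirSync(plan.dumpDir, { recursive: true });
  execFileSync(plan.command[0], plan.command.slice(1), { stdio: "inherit" });
  copyFileSync(plan.journalSource, plan.journalTarget);
}
```

For example, runDump(planDump("data/db.sqlite", "data/dump")) would dump the database and leave data/dump/_journal.json beside the table files. A later restore can then tell exactly which migrations the data was written under.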
operation: restore:
Reads the latest migration number from the _journal.json file located within the dump directory.
Initializes a new, empty database.
Runs database migrations from the main branch's drizzle directory up to the version specified in the journal file. This creates a database with the exact schema that corresponds to the dumped data.
Loads the data from the diffable dump into the database.
Runs any remaining migrations from the main drizzle folder to bring the database schema fully up to date with the latest code in the main branch.
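The restore logic reduces to splitting the main branch's migrations at the index recorded in the dumped journal. The sketch below assumes only that Drizzle's _journal.json has an entries array whose items carry idx and tag fields, and it ignores the other fields. It also adds one hypothetical safety check that is not described above: every migration in the dump must still exist on main under the same tag.

```typescript
// Minimal shape of drizzle/meta/_journal.json used here (other fields ignored).
interface JournalEntry {
  idx: number;
  tag: string;
}
interface Journal {
  entries: JournalEntry[];
}

interface RestorePlan {
  beforeLoad: string[]; // migrations applied to the empty DB before loading data
  afterLoad: string[];  // migrations added on main after the dump was taken
}

function latestIdx(journal: Journal): number {
  return journal.entries.reduce((max, e) => Math.max(max, e.idx), -1);
}

function planRestore(dumped: Journal, current: Journal): RestorePlan {
  const cutoff = latestIdx(dumped);
  if (cutoff > latestIdx(current)) {
    throw new Error(`Dump is at migration ${cutoff}, newer than the code's migrations`);
  }
  // Hypothetical extra check: the shared prefix must match by index and tag.
  for (const entry of dumped.entries) {
    const match = current.entries.find((c) => c.idx === entry.idx);
    if (!match || match.tag !== entry.tag) {
      throw new Error(`Migration ${entry.idx} (${entry.tag}) is missing or renamed on main`);
    }
  }
  const ordered = [...current.entries].sort((a, b) => a.idx - b.idx);
  return {
    beforeLoad: ordered.filter((e) => e.idx <= cutoff).map((e) => e.tag),
    afterLoad: ordered.filter((e) => e.idx > cutoff).map((e) => e.tag),
  };
}
```

How the action applies only the beforeLoad prefix is not described here. One possibility is to point the migrator at a temporary folder that contains only those migrations and a truncated journal.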
Workflows
This repository uses several GitHub Actions workflows to automate testing, data processing, and deployment.
run-pipelines.yml ("Run Pipelines")
This is the main data processing workflow. It's responsible for fetching the latest data from sources like GitHub, processing it, and generating summaries.
Triggers:
Runs on a daily schedule (cron: "0 23 * * *", i.e., at 23:00 UTC every day).
Can be manually triggered (workflow_dispatch) with various options to control its behavior (e.g., forcing re-ingestion, specifying date ranges).
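GitHub Actions evaluates cron schedules in UTC, so "0 23 * * *" means minute 0 of hour 23 every day. Scheduled runs can also start late when GitHub is under load. The hypothetical helper below computes the next nominal trigger time strictly after a given instant, which can help when reasoning about which calendar day a run belongs to.

```typescript
// Next nominal firing time of a daily "minute hour * * *" schedule, in UTC,
// strictly after `from`. Hypothetical helper for illustration only.
function nextDailyRunUtc(from: Date, hour = 23, minute = 0): Date {
  const next = new Date(
    Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate(), hour, minute),
  );
  // If today's slot has already passed (or is now), move to tomorrow.
  if (next.getTime() <= from.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next;
}
```

Because the computation stays in UTC, a developer in a different time zone may see the run land on what is locally the next morning.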
Key Jobs:
ingest-export:
Checks out the _data branch and restores the database.
Runs the ingest pipeline to fetch new data (issues, PRs, etc.).
Runs the process pipeline to calculate scores and other metrics.
Runs the export pipeline to save processed data.
Dumps the updated database and pushes all new data artifacts to the _data branch.
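By default, job steps run in order and later steps are skipped once one fails. A cleanup step guarded by if: always() still runs. The sketch below models that control flow for the ingest-export job with try/finally. The step names mirror the list above, and the Step type and runJob function are hypothetical.

```typescript
// Hypothetical model of a job: ordered steps, stop at the first failure,
// but always run cleanup (the equivalent of `if: always()`).
type Step = { name: string; run: () => void };

function runJob(steps: Step[], cleanup: () => void, log: string[] = []): string[] {
  try {
    for (const step of steps) {
      log.push(`start ${step.name}`);
      step.run(); // a thrown error skips every remaining step
    }
  } finally {
    cleanup();
    log.push("cleanup");
  }
  return log;
}
```

If process throws, export, dump, and the push to _data never run, but the worktree is still removed. A failed run therefore cannot push partially processed data.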
generate-summaries:
Depends on the successful completion of ingest-export.
Restores the latest database from the _data branch.
Uses an AI service to generate project and contributor summaries.
On scheduled runs, it generates project summaries every day and contributor summaries once a week.
Pushes the generated summaries and updated database state back to the _data branch.
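The daily/weekly split can be expressed as a small decision on the run date. Which weekday triggers contributor summaries is not stated above. The sketch assumes Sunday (UTC) purely for illustration, and the function name is hypothetical.

```typescript
type SummaryKind = "project" | "contributor";

// Hypothetical choice: weekly contributor summaries on Sundays (getUTCDay() === 0).
const WEEKLY_DAY_UTC = 0;

// Which summaries a scheduled run should generate for the given run date.
function summariesForScheduledRun(runDate: Date): SummaryKind[] {
  const kinds: SummaryKind[] = ["project"]; // project summaries every day
  if (runDate.getUTCDay() === WEEKLY_DAY_UTC) {
    kinds.push("contributor"); // contributor summaries once a week
  }
  return kinds;
}
```

Using the UTC weekday keeps the decision consistent with the UTC cron schedule that starts the run.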
pr-checks.yml ("PR Checks")
This workflow runs on every pull request against the main branch to ensure code quality and prevent regressions.
Triggers:
pull_request on the main branch.
Key Jobs:
check: Lints the code and runs type-checking with TypeScript.
build: Ensures the Next.js application builds successfully with the PR changes. It restores the production data to ensure the build process is realistic.
test-pipelines: Runs the core data pipelines (ingest, process, export) in a test mode to verify their integrity.
check-migrations: If the database schema (src/lib/data/schema.ts) is modified, this job verifies that a corresponding Drizzle migration has been generated.
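A simple version of this check compares the PR's changed files, for example the output of git diff --name-only origin/main...HEAD. If the schema file changed, at least one new migration SQL file under drizzle/ should have changed too. This file-based heuristic is an assumption and is not the job's confirmed logic. A stricter alternative is to run drizzle-kit generate in CI and fail if it produces new files.

```typescript
const SCHEMA_FILE = "src/lib/data/schema.ts";
const MIGRATIONS_DIR = "drizzle/";

// Hypothetical heuristic: returns an error message when the schema changed
// without any migration SQL changing alongside it, otherwise null.
function checkMigrations(changedFiles: string[]): string | null {
  const schemaChanged = changedFiles.includes(SCHEMA_FILE);
  const migrationChanged = changedFiles.some(
    (f) => f.startsWith(MIGRATIONS_DIR) && f.endsWith(".sql"),
  );
  if (schemaChanged && !migrationChanged) {
    return `${SCHEMA_FILE} changed but no migration was added under ${MIGRATIONS_DIR}`;
  }
  return null;
}
```

The heuristic can miss a schema edit that comes with an unrelated migration. That gap is why generating migrations and checking for a clean diff is the more reliable option.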
deploy.yml ("Deploy to GitHub Pages")
This workflow handles the deployment of the application to GitHub Pages.
Triggers:
Manually via workflow_dispatch.
Automatically after the Run Pipelines workflow successfully completes on the main branch.
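A workflow_run trigger fires whenever the upstream workflow completes, regardless of its outcome. The "successfully completes" condition therefore has to be checked explicitly, usually with an if: expression on github.event.workflow_run.conclusion. The hypothetical function below mirrors that gate. Its field names follow the workflow_run event payload (name, conclusion, head_branch).

```typescript
// Subset of the workflow_run event payload used by the gate.
interface WorkflowRun {
  name: string;
  conclusion: string | null; // e.g. "success", "failure", "cancelled"
  head_branch: string;
}

interface DeployTrigger {
  eventName: string;         // github.event_name
  workflowRun?: WorkflowRun; // github.event.workflow_run, set for workflow_run events
}

// Hypothetical mirror of the deploy workflow's trigger condition.
function shouldDeploy(trigger: DeployTrigger): boolean {
  if (trigger.eventName === "workflow_dispatch") return true; // manual runs always deploy
  if (trigger.eventName !== "workflow_run" || !trigger.workflowRun) return false;
  const run = trigger.workflowRun;
  return run.name === "Run Pipelines" && run.conclusion === "success" && run.head_branch === "main";
}
```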
Key Steps:
Restores the latest data from the _data branch.
Runs any pending database migrations.
Builds the Next.js application for production.
Copies the data directory into the out directory to be included in the deployment.
Deploys the contents of the out directory to GitHub Pages.