# flat

The GitHub Action which powers Flat.
👉🏽 👉🏽 👉🏽 Full writeup: Flat Data Project 👈🏽 👈🏽 👈🏽
Flat Data is a GitHub action which makes it easy to fetch data and commit it to your repository as flat files. The action is intended to be run on a schedule, retrieving data from any supported target and creating a commit if there is any change to the fetched data. Flat Data builds on the “git scraping” approach pioneered by Simon Willison to offer a simple pattern for bringing working datasets into your repositories and versioning them, because developing against local datasets is faster and easier than working with data over the wire.
✨ Best used in tandem with the Flat Editor VS Code Extension.
Flat Data aims to simplify everyday data acquisition and cleanup tasks. It runs on GitHub Actions, so there's no infrastructure to provision and monitor. Each Flat workflow fetches the data you specify, and optionally executes a postprocessing script on the fetched data. The resulting data is committed to your repository if the new data is different, with a commit message summarizing the changes. Flat workflows usually run on a periodic timer, but can be triggered by a variety of events, such as pushes to your code or a manual dispatch. That's it! No complicated job dependency graphs or orchestrators. No dependencies, libraries, or package managers. No new mental model to learn and incorporate. Just evergreen data, right in your repo.
Check out our example repositories.
The easiest way to get a Flat Data action up and running is with the accompanying Flat Editor VS Code Extension, which helps you author Flat YAML files.
To use it, install the extension and then invoke Flat Editor from the command palette within VS Code (Mac: ⌘⇧P, others: Ctrl+Shift+P).
In the repository where you wish to fetch data, create `.github/workflows/flat.yml`. The following example will fetch a URL every thirty minutes and commit the response, but only if the response has changed since the last commit.
```yaml
name: Flat

on:
  push:
    branches:
      - main
  workflow_dispatch:
  schedule:
    - cron: '*/30 * * * *'

jobs:
  scheduled:
    runs-on: ubuntu-latest
    steps:
      # This step installs Deno, a JavaScript runtime used for the optional postprocessing step
      - name: Setup deno
        uses: denoland/setup-deno@main
        with:
          deno-version: v1.10.x
      # Check out the repository so the action can read the files inside it and do other operations
      - name: Check out repo
        uses: actions/checkout@v2
      # The Flat Action step: fetch the data at http_url and save it as downloaded_filename
      - name: Fetch data
        uses: githubocto/flat@v3
        with:
          http_url: # THE URL YOU WISH TO FETCH GOES HERE
          downloaded_filename: # The name the fetched data is saved under in the repository, e.g. data.json, data.csv, image.png
```
Note that the `schedule` parameter affects the overall workflow, which may contain other jobs and steps beyond Flat.
The `schedule` parameter uses crontab format. There's a library of useful examples and an interactive playground on Crontab guru.
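For reference, a few common schedules expressed in crontab format (GitHub Actions evaluates cron schedules in UTC):

```yaml
on:
  schedule:
    - cron: '*/30 * * * *'  # every 30 minutes
    - cron: '0 6 * * *'     # every day at 06:00 UTC
    - cron: '0 0 * * 1'     # every Monday at midnight UTC
```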
The action currently has two fetching modes:
- `http`: GETs a supplied URL
- `sql`: Queries a SQL datastore
These two modes are mutually exclusive; you cannot mix settings for both in a single Flat step of a workflow job.
### http_url

A URL from which to fetch data. Specifying this input puts Flat into `http` mode.
This can be any endpoint: a json, csv, png, zip, xlsx, etc.
### authorization

A string used for authorizing the HTTP request. The value of this field is passed in as a header with the `authorization` key.
For example, if this field is set to `Bearer abc123`, then the following header is sent with each request:
{"Authorization":"Bearer abc123"}Under the hood, thehttp backend usesAxios for data fetching. By default, Flat assumes you're interested in using theGET method to fetch data, but if you'd like toPOST (e.g., sending a GraphQL query), theaxios_config option allows you to override this behavior.
Specifically, the `axios_config` parameter should reflect a relative path to a `.json` file in your repository. This JSON file should mirror the shape of Axios' request config parameters, with a few notable exceptions:
- `url` and `baseURL` will both be ignored, as the `http_url` specified above will take precedence.
- `headers` will be merged in with the authorization header described by the `authorization` parameter above. Please do not put secret keys here, as they will be stored in plain text!
- All function parameters (e.g., `transformRequest`) will be ignored.
- The response type is always set to `responseType: 'stream'` in the background.
An example `axios_config` might look like this if you were interested in hitting GitHub's GraphQL API (here is a demo) 👇
{"method":"post","data": {"query":"query { repository(owner:\"octocat\", name:\"Hello-World\") { issues(last:20, states:CLOSED) { edges { node { title url labels(first:5) { edges { node { name } } } } } } } }" }}We advise escaping double quotes like\" in your JSON file.
### downloaded_filename

The name of the file in which to store the data fetched by Flat. In `http` mode this can be any filename and extension: json, csv, txt, png, zip, xlsx, etc.
### postprocess

A path to a local Deno JavaScript or TypeScript file for postprocessing the `downloaded_filename` file. Read more in the "Postprocessing" section below.
### mask

If your `http_url` string contains secrets, you can choose to mask it from the commit message. You have two options:
Option 1: use a string boolean

```yaml
mask: true # removes the source entirely from the commit message, defaults to false
```
Option 2: use a string array with each secret to mask

```yaml
mask: '["${{ secrets.SECRET1 }}", "${{ secrets.SECRET2 }}"]'
```
### sql_connstring

A URI-style database connection string. Flat will use this connection string to connect to the database and issue the query.
⚠️ Don't write secrets into your workflow YAML! Most connection strings contain an authentication secret, like a username and password. GitHub provides an encrypted vault for secrets like these, which can be used by the action when it runs. Create a secret on the repository where the Flat action will run, and use that secret in your workflow like so:
```yaml
sql_connstring: ${{ secrets.NAME_OF_THE_CREATED_SECRET }}
```

If you're using the flat-vscode extension, this is handled for you.
### sql_queryfile

The pathname of the file containing the SQL query that will be issued to the database. Defaults to `.github/workflows/query.sql`. This path is relative to the root of your repo.
### downloaded_filename

The name of the file in which to store the data fetched by Flat. In `sql` mode this should be one of `csv` or `json`; SQL query results will be serialized to disk in the specified format.
⚠️ While the JSON is not pretty-printed, CSV is often a more efficient serialization for tabular data.
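A minimal sketch of a `sql`-mode Flat step, assuming the query lives at the default `.github/workflows/query.sql` path and the connection string is stored as a repository secret (the secret name and output filename are illustrative):

```yaml
- name: Fetch data
  uses: githubocto/flat@v3
  with:
    # Connection string stored as an encrypted repository secret
    sql_connstring: ${{ secrets.DATABASE_URL }}
    # Query results are serialized in the format implied by this extension
    downloaded_filename: results.csv
```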
### typeorm_config

A JSON string representing a configuration passed to TypeORM's `createConnection` function.
A common use case for this value is connecting your Flat action to a Heroku database.
For instance, you can pass the following configuration string to your Flat action in order to connect to a Heroku Postgres database.
```yaml
typeorm_config: '{"ssl":true,"extra":{"ssl":{"rejectUnauthorized":false}}}'
```
### postprocess

A path to a local Deno JavaScript or TypeScript file for postprocessing the `downloaded_filename` file. Read more in the "Postprocessing" section below.
Flat also exposes an output: a signed number describing the number of bytes that changed in this run. If the new data is smaller than the existing, committed data, this will be a negative number.
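To branch on this output in a later step, give the Flat step an `id` and reference the output by name. A sketch, assuming the output is exposed as `delta_bytes` (check the action's metadata for the exact name):

```yaml
- name: Fetch data
  id: fetch
  uses: githubocto/flat@v3
  with:
    http_url: https://example.com/data.json
    downloaded_filename: data.json
# Only runs when the fetched data actually changed
- name: Report change
  if: steps.fetch.outputs.delta_bytes != 0
  run: echo "Data changed by ${{ steps.fetch.outputs.delta_bytes }} bytes"
```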
## Postprocessing

You can add a `postprocess` input to the Action, which is a path to a Deno JavaScript or TypeScript script that will be invoked to postprocess your data after it is fetched. This path is relative to the root of your repo.
The script can use either `Deno.args[0]` or the name of the `downloaded_filename` to access the file fetched by Flat Data.
```ts
import { readJSON, writeJSON } from 'https://deno.land/x/flat/mod.ts'

// The filename is the first invocation argument; same name as downloaded_filename
const filename = Deno.args[0]
const data = await readJSON(filename)

// Pluck a specific key off and write it out to a different file
// Careful! Any uncaught errors and the workflow will fail, committing nothing.
const newfile = `subset_of_${filename}`
await writeJSON(newfile, data.path.to.something)
```
You can use `console.log()` as much as you like within your postprocessing script; the results should show up in your Actions log.
Deno's import-by-url model makes it easy to author lightweight scripts that can include dependencies without forcing you to set up a bundler.
The postprocessing script is invoked with `deno run -q -A --unstable {your script} {your fetched data file}`. Note that `-A` grants your script full permissions to access the network, disk, everything! Make sure you trust any dependencies you pull in, as they aren't restricted. We will likely revisit this in the future with another setting that specifies which permissions to grant Deno.
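To wire a script into your workflow, point the `postprocess` input at it. A minimal sketch, assuming the script is saved as `postprocess.ts` at the repository root (the URL and filenames are illustrative):

```yaml
- name: Fetch data
  uses: githubocto/flat@v3
  with:
    http_url: https://example.com/data.json
    downloaded_filename: data.json
    # Invoked with Deno after the download; the downloaded filename is passed as the first argument
    postprocess: postprocess.ts
```

Remember that the workflow also needs the `denoland/setup-deno` step shown in the example at the top of this README.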
To learn more about the possibilities for postprocessing, check out our helper and examples postprocessing repo.
## Cutting a release

- Run `npm run dist` and commit the built output (yes, you read that right)
- Bump whatever you want to bump in the `package.json` version field
- Merge `main` into the `vMAJOR` branch: `git checkout vMAJOR && git merge main`
  - If this is a new major version, create the branch: `git checkout -b vMAJOR`
- Push the branch: `git push --set-upstream origin vMAJOR`
- Create a new tag for the version: `git tag -f vMAJOR.MINOR.PATCH`
- Push main: `git checkout main && git push`
- Navigate to https://github.com/githubocto/flat/tags and cut a new release from the tag you just pushed!
If you run into any trouble or have questions, feel free to open an issue.
❤️ GitHub OCTO