github-scraper used to scan repos owned by an org, clone them locally, look for a Dockerfile, and extract the FROM into a nice CSV for management
psenger/github-scraper
github-scraper is used to scan repos owned by an org, clone them locally, look for a Dockerfile, and extract the FROM (build) value into a nice CSV. Management can use the CSV in its reports, or you can use it to find a container that is running at the wrong version without asking the DevOps guys to do it.
| Script | Purpose |
|---|---|
| `scraper.js` | Pulls all the repo data belonging to the org (as defined by type) and stores the data in the file `./data/<GITHUB-OUTFILE>`. This file drives everything else. |
| `build-masterlist.js` | Reads `./data/<GITHUB-OUTFILE>` and builds a CSV file, `./data/<GITHUB-CSVFILE>`. |
| `build-inventory.js` | Removes the directory `./out/`, which is the clone directory, clones the repos, scans all files for a `Dockerfile`, reads each one, and extracts `^FROM\s+(.*)\s*$` to a report called `./data/<GITHUB-INVENTORY>`. |
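
To illustrate the extraction step, here is a minimal sketch of applying the `^FROM\s+(.*)\s*$` pattern to the text of a Dockerfile. The function name `extractFromValues` and the sample Dockerfile are hypothetical and are not taken from `build-inventory.js`, which may do this differently.

```javascript
// Hypothetical helper, not copied from build-inventory.js.
// The `m` flag makes ^ and $ match at each line, and `g` finds every FROM,
// so multi-stage Dockerfiles yield one entry per stage.
function extractFromValues(dockerfileText) {
  const pattern = /^FROM\s+(.*)\s*$/gm;
  return Array.from(dockerfileText.matchAll(pattern), (match) => match[1].trim());
}

// Sample input made up for this example.
const sample = [
  'FROM node:15-alpine AS build',
  'WORKDIR /app',
  'FROM nginx:1.21',
].join('\n');

// Logs the two FROM values found in the sample.
console.log(extractFromValues(sample));
```

Note that the captured value keeps anything after the image name, such as `AS build`, because the pattern captures the rest of the line.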
Required
- A good internet connection
- Node 15
Steps
- from the command prompt run `npm install`
- create a `.env` file with the environment variables listed in Variables
- from the command prompt run `npm run build-masterlist`
- from the command prompt run `npm run scraper`
- from the command prompt run `npm run build-inventory`
- send your report to your boss, and then drink some coffee or reach out to me, Philip A Senger (philip.a.senger@cngrgroup.com), for a job.
- Refer to OctoKit for the GitHub API.
- Refer to dotenv for a better understanding of `.env` files.
- Refer to GitHub Guides for GitHub.
- Refer to Docker Docs for Docker.
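
As a rough illustration of the kind of call the scraper step relies on, the sketch below lists an org's repositories through an Octokit client. It assumes the `@octokit/rest` package, where `octokit.paginate` and `octokit.rest.repos.listForOrg` are available. The function name `listOrgRepos` and the shape of the returned objects are hypothetical, and `scraper.js` itself may be written differently.

```javascript
// Hypothetical helper, not copied from scraper.js.
// `octokit` is expected to be an @octokit/rest client, for example:
//   const { Octokit } = require('@octokit/rest');
//   const octokit = new Octokit({ auth: process.env['GITHUB-PAL-TOKEN'] });
async function listOrgRepos(octokit, org, type = 'all') {
  // paginate() follows the API's paging and resolves to one flat array of repos.
  const repos = await octokit.paginate(octokit.rest.repos.listForOrg, {
    org,
    type,
    per_page: 100,
  });
  // Keep only the fields needed to clone each repo later.
  return repos.map((repo) => ({ name: repo.name, cloneUrl: repo.clone_url }));
}
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, without a token or a network connection.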
This project uses `.env`
| Variable | Required | Default | Purpose |
|---|---|---|---|
| GITHUB-PAL-TOKEN | true | | Personal access token (create) |
| GITHUB-TIMEZONE | true | | The time zone (list) |
| GITHUB-ORG | true | | The org whose repos are scanned |
| GITHUB-TYPE | true | | Specifies the types of repositories you want returned. Can be one of all, public, private, forks, sources, member, internal. Default: all. If your organization is associated with an enterprise account using GitHub Enterprise Cloud or GitHub Enterprise Server 2.20+, type can also be internal. |
| GITHUB-CSVFILE | false | ./data/data.csv | The CSV master list file that is built when build-masterlist is executed. |
| GITHUB-OUTFILE | false | ./data/data.json | Output from the scraper command, a full listing from GitHub. |
| GITHUB-INVENTORY | false | ./data/inventory.csv | The results of scanning files in GitHub (in this repo it is the Dockerfile FROM command). |
| GITHUB-SKIP-NAMES | false | '' | Any repos you want to skip while building the inventory. |
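
The table above can be read as a small configuration contract: four required variables and four optional ones with defaults. The sketch below shows one way to check it in Node. The `loadConfig` function is hypothetical and is not part of this repo. In real use, `require('dotenv').config()` would populate `process.env` from the `.env` file before this runs.

```javascript
// Hypothetical loader; the project's scripts may read these variables differently.
const REQUIRED = ['GITHUB-PAL-TOKEN', 'GITHUB-TIMEZONE', 'GITHUB-ORG', 'GITHUB-TYPE'];

// Defaults as listed in the Variables table.
const DEFAULTS = {
  'GITHUB-CSVFILE': './data/data.csv',
  'GITHUB-OUTFILE': './data/data.json',
  'GITHUB-INVENTORY': './data/inventory.csv',
  'GITHUB-SKIP-NAMES': '',
};

function loadConfig(env = process.env) {
  // Fail early with a list of every required variable that is missing or empty.
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(', ')}`);
  }
  const config = {};
  for (const name of REQUIRED) {
    config[name] = env[name];
  }
  // Optional variables fall back to their defaults when not set.
  for (const [name, fallback] of Object.entries(DEFAULTS)) {
    config[name] = env[name] !== undefined ? env[name] : fallback;
  }
  return config;
}
```

Because the variable names contain hyphens, they have to be read with bracket notation, for example `config['GITHUB-ORG']`.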
- The environment variables and expected chaining of data files is problematic.
- Might be nice to scan for repos owned by owners and/or orgs.
- I think extracting the shell commands would be good, so you can make the code more reusable
- Naming convention is not so good.
- linting and tests would be good.
- update `build-masterlist` to use the csv module and extract fields to environment variables.
- change `GITHUB-ORG` so it is defaulted to `all`