psenger/github-scraper

Purpose

`github-scraper` is used to scan repos owned by an org, clone them locally, look for a Dockerfile, and extract the `FROM` (build) value into a nice CSV for management to use in its reports, or to find a container that is running at the wrong version without asking the DevOps guys to do it.

| Script | Purpose |
| --- | --- |
| scraper.js | Pulls all the repo data belonging to the org ( as defined by `type` ) and stores the data in a file `./data/<GITHUB-OUTFILE>`. This file drives everything else. |
| build-masterlist.js | Reads `./data/<GITHUB-OUTFILE>` and builds a CSV file `./data/<GITHUB-CSVFILE>`. |
| build-inventory.js | Removes the directory `./out/`, which will be the clone directory; once cloned, scans all files for a Dockerfile, reads them, and extracts `^FROM\s+(.*)\s*$` into a report called `./data/<GITHUB-INVENTORY>`. |

Running

Required

  • A good internet connection
  • Node 15

Steps

  1. from the command prompt run `npm install`
  2. create a `.env` file with the environment variables listed in Variables
  3. from the command prompt run `npm run scraper`
  4. from the command prompt run `npm run build-masterlist`
  5. from the command prompt run `npm run build-inventory`
  6. send your report to your boss, and then drink some coffee, or reach out to me, Philip A Senger philip.a.senger@cngrgroup.com, for a job.

Additional Docs

Refer to OctoKit for the GitHub API.

Refer to dotenv for a better understanding of `.env` files.

Refer to GitHub Guides for GitHub.

Refer to Docker Docs for Docker.

Variables

This project uses `.env`.

| Variable | Required | Default | Purpose |
| --- | --- | --- | --- |
| GITHUB-PAL-TOKEN | true | | Personal access token (create) |
| GITHUB-TIMEZONE | true | | The time zone (list) |
| GITHUB-ORG | true | | The org whose repos to scan |
| GITHUB-TYPE | true | | Specifies the types of repositories you want returned. Can be one of `all`, `public`, `private`, `forks`, `sources`, `member`, `internal`. Default: `all`. If your organization is associated with an enterprise account using GitHub Enterprise Cloud or GitHub Enterprise Server 2.20+, type can also be `internal`. |
| GITHUB-CSVFILE | false | ./data/data.csv | The CSV master list file ( written when `build-masterlist` is executed ) |
| GITHUB-OUTFILE | false | ./data/data.json | Output from the `scraper` command, a full listing from GitHub |
| GITHUB-INVENTORY | false | ./data/inventory.csv | The results of scanning files in GitHub ( in this repo, the Dockerfile `FROM` command ) |
| GITHUB-SKIP-NAMES | false | '' | Any repos you want to skip while building the inventory |
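A minimal `.env` might look like the following. Every value here is a placeholder (the token is not real, and the timezone and org are examples only); the optional variables fall back to their defaults when omitted.

```
# placeholder values -- substitute your own
GITHUB-PAL-TOKEN=ghp_xxxxxxxxxxxxxxxx
GITHUB-TIMEZONE=Australia/Sydney
GITHUB-ORG=my-org
GITHUB-TYPE=all
# comma-separated list of repos to ignore during build-inventory
GITHUB-SKIP-NAMES=sandbox,archived-repo
```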

Todo

  • The environment variables and the expected chaining of data files are problematic.
  • Might be nice to scan for repos owned by owners and/or orgs.
  • I think extracting the shell commands would be good, so the code is more reusable.
  • The naming convention is not so good.
  • Linting and tests would be good.
  • Update `build-masterlist` to use the csv module and extract fields to environment variables.
  • Change `GITHUB-ORG` so it is defaulted to `all`.
