# irma-scrapers
Screen scrapers relating to Hurricane Irma. See their output in https://github.com/simonw/disaster-data/
The Irma Response project at https://www.irmaresponse.org/ is a team of volunteers working together to make information available during and after the storm. There is a huge amount of information out there, on many different websites. The Irma API at https://irma-api.herokuapp.com/ is an attempt to gather key information in one place, verify it, and publish it in a reusable way.
To aid this effort, I've built a collection of screen scrapers that pull data from a number of different websites and APIs. That data is then stored in a Git repository, providing a clear history of changes made to the various sources that are being tracked.
Some of the scrapers also publish their findings to Slack in a format designed to make it obvious when key events happen, such as new shelters being added to or removed from public listings.
A key goal of this screen scraping mechanism is to allow changes to the underlying data sources to be tracked over time. This is achieved using git, via the GitHub API. Each scraper pulls down data from a source (an API or a website) and reformats that data into a sanitized JSON format. That JSON is then written to the git repository. If the data has changed since the last time the scraper ran, those changes will be captured by git and made available in the commit log.
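The core of that loop can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names and the assumption that records carry a `"name"` key are hypothetical, and the real scrapers each have their own normalization logic. The important idea is that serialization is deterministic, so git only records a commit when the underlying data genuinely changes:

```python
import json


def to_sanitized_json(records):
    """Normalize scraped records into a stable, diff-friendly JSON string.

    Sorting both the records and each object's keys keeps the output
    deterministic, so re-running the scraper against unchanged data
    produces a byte-identical file and no new git commit.
    (Hypothetical helper; the sort key "name" is an assumption.)
    """
    normalized = sorted(records, key=lambda r: r.get("name", ""))
    return json.dumps(normalized, indent=2, sort_keys=True)


def has_changed(new_json, previous_json):
    """True if the sanitized JSON differs from the last committed version."""
    return new_json != previous_json
```

In practice the scraper would write `to_sanitized_json(...)` to a file in the data repository and commit via the GitHub API only when `has_changed(...)` is true.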
Recent changes tracked by the scraper collection can be seen here: https://github.com/simonw/disaster-data/commits/master
The most complex code for most of the scrapers isn't in fetching the data: it's in generating useful, human-readable commit messages that summarize the underlying change. For example, here is a commit message generated by the scraper that tracks the http://www.floridadisaster.org/shelters/summary.aspx page:
```
florida-shelters.json: 2 shelters added

Added shelter: Atwater Elementary School (Sarasota County)
Added shelter: DEBARY ELEMENTARY SCHOOL (Volusia County)

Change detected on http://www.floridadisaster.org/shelters/summary.aspx
```
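A message like this falls out of a simple set difference between the previous and current shelter lists. The sketch below shows the general shape; the function name, the `"name"`/`"county"` record keys, and the exact wording are assumptions for illustration, not the scraper's actual implementation:

```python
def shelter_commit_message(old, new, source_url):
    """Build a human-readable commit message from two shelter snapshots.

    old and new are lists of dicts with "name" and "county" keys
    (an assumed JSON shape). Shelters present only in new are "added";
    shelters present only in old are "removed".
    """
    old_names = {s["name"] for s in old}
    new_names = {s["name"] for s in new}
    added = [s for s in new if s["name"] not in old_names]
    removed = [s for s in old if s["name"] not in new_names]

    summary = []
    if added:
        summary.append(f"{len(added)} shelters added")
    if removed:
        summary.append(f"{len(removed)} shelters removed")

    lines = ["florida-shelters.json: " + ", ".join(summary)]
    for s in added:
        lines.append(f'Added shelter: {s["name"]} ({s["county"]} County)')
    for s in removed:
        lines.append(f'Removed shelter: {s["name"]} ({s["county"]} County)')
    lines.append(f"Change detected on {source_url}")
    return "\n".join(lines)
```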
The full commit also shows the changes to the underlying JSON, but the human-readable message provides enough information that people who aren't comfortable reading JSON can still derive value from the commit.
https://github.com/simonw/disaster-data/commit/7919aeff0913ec26d1bea8dc
The Irma Response team use Slack to co-ordinate their efforts. You can join their Slack here: https://irma-response-slack.herokuapp.com/
Some of the scrapers publish detected changes in their data source to Slack, as links to the commits generated for each change. The human-readable message is posted directly to the channel.
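Posting to Slack this way only needs an incoming-webhook URL and a small JSON payload. The sketch below assumes a Slack incoming webhook (the `webhook_url` value and helper names are hypothetical configuration, not something taken from this project's code):

```python
import json
from urllib import request


def build_slack_payload(message, commit_url):
    """Combine the human-readable message and the commit link into
    the simple {"text": ...} payload Slack incoming webhooks accept."""
    return {"text": f"{message}\n{commit_url}"}


def post_to_slack(webhook_url, message, commit_url):
    """POST the payload to a Slack incoming webhook endpoint."""
    req = request.Request(
        webhook_url,
        data=json.dumps(build_slack_payload(message, commit_url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)
```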