Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.
This repo has been replaced by the new pudl-archiver repo, which combines the scraping and archiving processes.
We recommend using conda to create and manage your environment.
Run:
conda env create -f environment.yml
conda activate pudl-scrapers
Logs are collected in: [your home]/Downloads/pudl_scrapers/scraped/
Data from the scrapers is stored in: [your home]/Downloads/pudl_scrapers/scraped/[source_name]/[today #]
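Each run lands in its own dated directory under the source name, so a little scripting can locate the newest output. Below is a minimal sketch, not part of this repo; the latest_scrape helper and the default download location are assumptions based on the paths above.

```python
# Minimal sketch (not part of this repo): find the newest scraped output
# directory for a given source under the default download location above.
# The helper name and exact directory layout are assumptions based on the
# paths described in this README.
from pathlib import Path

def latest_scrape(source_name: str):
    base = Path.home() / "Downloads" / "pudl_scrapers" / "scraped" / source_name
    if not base.exists():
        return None
    runs = sorted(p for p in base.iterdir() if p.is_dir())
    return runs[-1] if runs else None

print(latest_scrape("eia860"))  # e.g. .../scraped/eia860/<most recent run>
```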
The general pattern is scrapy crawl [source_name] for one of the supported sources. Typically an additional "year" argument is available, in the form scrapy crawl [source_name] -a year=[year].
See below for exact commands and available arguments.
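If you prefer to drive a spider from Python rather than the scrapy CLI, Scrapy's CrawlerProcess can run the same crawls programmatically. This is a minimal sketch, not a documented entry point of pudl-scrapers; it assumes it is run from the repository root so Scrapy can locate the project settings and the named spider.

```python
# Minimal sketch: run one of these spiders from Python instead of the
# `scrapy crawl` CLI. Assumes execution from the repository root so the
# project settings (and the spider named "eia860") are discoverable.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
# Mirrors: scrapy crawl eia860 -a year=2007
process.crawl("eia860", year="2007")
process.start()  # blocks until the crawl finishes
```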
To collect the Census DP1 data:
scrapy crawl censusdp1tract
No other options.
The EPA CEMS data is collected with the epacems command rather than scrapy. For full instructions:
epacems --help
To collect the EIA bulk electricity data:
scrapy crawl eia_bulk_elec
No other options.
To collect the EPA CAMD to EIA crosswalk data and field descriptions:
scrapy crawl epacamd_eia
To collect all the data:
scrapy crawl eia860
To collect a specific year (e.g., 2007):
scrapy crawl eia860 -a year=2007
To collect all the data:
scrapy crawl eia860m
To collect a specific month and year (e.g., August 2020):
scrapy crawl eia860m -a month=August -a year=2020
To collect all the data:
scrapy crawl eia861
To collect a specific year (e.g., 2007):
scrapy crawl eia861 -a year=2007
To collect all the data:
scrapy crawl eia923
To collect a specific year (e.g., 2007):
scrapy crawl eia923 -a year=2007
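Because the annual EIA spiders above all take the same -a year= argument, several years can be pulled in one pass by looping over the CLI. A minimal sketch, not a helper provided by this repo, with an illustrative year range:

```python
# Minimal sketch (not provided by this repo): shell out to the same
# `scrapy crawl` commands shown above for a range of years.
import subprocess

for year in range(2005, 2010):
    subprocess.run(["scrapy", "crawl", "eia923", "-a", f"year={year}"], check=True)
```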
To collect all the data:
scrapy crawl ferc1
scrapy crawl ferc2
scrapy crawl ferc6
scrapy crawl ferc60
There are no subsets enabled.
To collect the data:
scrapy crawl ferc714
No other options.