Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.
This repo has been replaced by the new pudl-archiver repo, which combines the scraping and archiving processes.
We recommend using conda to create and manage your environment.
Run:
conda env create -f environment.yml
conda activate pudl-scrapers
Logs are collected in: [your home]/Downloads/pudl_scrapers/scraped/
Data from the scrapers is stored in: [your home]/Downloads/pudl_scrapers/scraped/[source_name]/[today #]
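Each run lands in its own dated directory under the source name, so a little scripting can locate the newest output. Below is a minimal sketch, not part of this repo; the latest_scrape helper and the default download location are assumptions based on the paths above.

```python
# Minimal sketch (not part of this repo): find the newest scraped output
# directory for a given source under the default download location above.
# The helper name and exact directory layout are assumptions based on the
# paths described in this README.
from pathlib import Path

def latest_scrape(source_name: str):
    base = Path.home() / "Downloads" / "pudl_scrapers" / "scraped" / source_name
    if not base.exists():
        return None
    runs = sorted(p for p in base.iterdir() if p.is_dir())
    return runs[-1] if runs else None

print(latest_scrape("eia860"))  # e.g. .../scraped/eia860/<most recent run>
```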
The general pattern is scrapy crawl [source_name] for one of the supported sources. Typically an additional "year" argument is available, in the form scrapy crawl [source_name] -a year=[year].
See below for exact commands and available arguments.
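If you prefer to drive a spider from Python rather than the scrapy CLI, Scrapy's CrawlerProcess can run the same crawls programmatically. This is a minimal sketch, not a documented entry point of pudl-scrapers; it assumes it is run from the repository root so Scrapy can locate the project settings and the named spider.

```python
# Minimal sketch: run one of these spiders from Python instead of the
# `scrapy crawl` CLI. Assumes execution from the repository root so the
# project settings (and the spider named "eia860") are discoverable.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
# Mirrors: scrapy crawl eia860 -a year=2007
process.crawl("eia860", year="2007")
process.start()  # blocks until the crawl finishes
```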
To collect the Census DP1 data:
scrapy crawl censusdp1tract
No other options.
The EPA CEMS data is collected with the epacems command rather than scrapy. For full instructions:
epacems --help
To collect the EIA bulk electricity data:
scrapy crawl eia_bulk_elec
No other options.
To collect the EPA CAMD to EIA crosswalk data and field descriptions:
scrapy crawl epacamd_eia
To collect all the data:
scrapy crawl eia860
To collect a specific year (e.g., 2007):
scrapy crawl eia860 -a year=2007
To collect all the data:
scrapy crawl eia860m
To collect a specific month and year (e.g., August 2020):
scrapy crawl eia860m -a month=August -a year=2020
To collect all the data:
scrapy crawl eia861
To collect a specific year (e.g., 2007):
scrapy crawl eia861 -a year=2007
To collect all the data:
scrapy crawl eia923
To collect a specific year (e.g., 2007):
scrapy crawl eia923 -a year=2007
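Because the annual EIA spiders above all take the same -a year= argument, several years can be pulled in one pass by looping over the CLI. A minimal sketch, not a helper provided by this repo, with an illustrative year range:

```python
# Minimal sketch (not provided by this repo): shell out to the same
# `scrapy crawl` commands shown above for a range of years.
import subprocess

for year in range(2005, 2010):
    subprocess.run(["scrapy", "crawl", "eia923", "-a", f"year={year}"], check=True)
```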
To collect all the data:
scrapy crawl ferc1
scrapy crawl ferc2
scrapy crawl ferc6
scrapy crawl ferc60
There are no subsets enabled.
To collect the data:
scrapy crawl ferc714
No other options.