A tool to quickly and easily create inputs for power systems models

PowerGenome

The code and data for PowerGenome are under active development, and some changes may break existing functions. Keep up to date with major code and data releases by joining PowerGenome on groups.io. Also check out the growing documentation on the Wiki for helpful background information.

Power system optimization models can be used to explore the cost and emission implications of different regulations in future energy systems. One of the most difficult parts of running these models is assembling all the data. A typical model will define several regions, each of which needs data such as:

All existing generating units (perhaps grouped into a few discrete clusters within each region)
Transmission constraints between regions
Hourly load profiles (including new loads from vehicle and building electrification)
Hourly generation profiles for wind & solar
Cost estimates for new generating units

Because computational complexity and run times increase with the number of regions and generating unit clusters, a user might only want to disaggregate regions and generating units close to the primary region of interest. For example, a study focused on clean electricity regulations in New Mexico might combine several states in the Pacific Northwest into a single region while also splitting Arizona combined cycle units into multiple clusters.
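In PowerGenome these choices live in the settings file. The sketch below does not use PowerGenome's API; it is a plain-Python illustration, assuming a hypothetical mapping of placeholder IPM region names to model regions, of what combining regions means for an hourly load profile.

```python
# Hypothetical mapping from model regions to the IPM regions they combine.
# The names are placeholders for illustration, not real PowerGenome settings.
REGION_AGGREGATIONS = {
    "PNW": ["IPM_A", "IPM_B", "IPM_C"],
    "NM": ["IPM_D"],
}


def aggregate_hourly_load(ipm_load, aggregations):
    """Sum hourly load profiles (MW) from IPM regions into model regions.

    ipm_load maps an IPM region name to a list of hourly values.
    IPM regions not listed in any aggregation keep their own name.
    """
    # Invert the mapping so each IPM region points to its model region.
    lookup = {ipm: model for model, members in aggregations.items() for ipm in members}
    model_load = {}
    for ipm_region, profile in ipm_load.items():
        model_region = lookup.get(ipm_region, ipm_region)
        current = model_load.get(model_region, [0.0] * len(profile))
        if len(current) != len(profile):
            raise ValueError(f"profile length mismatch for {ipm_region}")
        # Element-wise sum, hour by hour.
        model_load[model_region] = [a + b for a, b in zip(current, profile)]
    return model_load
```

Fewer model regions means fewer hourly profiles for the optimization model to carry, which is the run-time trade-off described above.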

The goal of PowerGenome is to let a user make all of these choices in a settings file and then run a single script that generates input files for the power system model. PowerGenome currently generates input files for GenX, and we hope to expand to other models in the near future.

Data

PowerGenome uses data from a number of different sources, including EIA, NREL, and EPA. The data are accessed through a combination of sqlite databases, CSV files, and parquet data files. All data files are available here.

EIA data on existing generating units are already compiled into a single sqlite database (PUDL) (see instructions for using it below). This file is available at the link above, or you can download it from the Zenodo repository.
A second sqlite database (pg_misc_tables_efs_2023_2.sqlite) has tables with new resource costs from NREL ATB, transmission constraints between IPM regions from EIA, and hourly demand within each IPM region derived from NREL or FERC data.
A set of files (efs_files_utc) has the hourly incremental demand for different flexible demand technologies and stock values across a range of projection scenarios.
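Both databases are ordinary sqlite files, so their contents can be inspected with Python's standard library before pointing PowerGenome at them. This is a minimal sketch; it assumes nothing about which tables each file contains and simply lists them.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in a sqlite database file."""
    path = Path(db_path)
    if not path.exists():
        # sqlite3.connect would silently create an empty file, so check first.
        raise FileNotFoundError(path)
    # Open read-only through a URI so inspecting the file cannot modify it.
    uri = f"{path.resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables("/path/to/pudl.sqlite")` is a quick way to confirm that the path you plan to use is the right file.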

PUDL Dependency

This project pulls data from PUDL. As such, it requires installation of PUDL to access a normalized sqlite database and some of the convenience PUDL functions.

catalystcoop.pudl is included in the environment.yml file and will be installed automatically in the conda environment (see instructions below). Catalyst Cooperative will be creating versioned data releases of PUDL, which can be accessed on Zenodo. Download the zip file from Zenodo, unzip it, and find the sqlite database under pudl_data/sqlite/pudl.sqlite. Note that the version of the catalystcoop.pudl software may need to change based on the database version you use. Look on the right-hand side of the Zenodo archive to see which software version was used to compile the data. If the version in your conda environment does not match the version used to compile the data, you can change it in the environment.yml file or install a different version using mamba install catalystcoop.pudl=<your_version>.
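To compare the installed software with the version shown on the Zenodo archive, you can query the package metadata from Python. This is a minimal sketch using only the standard library; the version string you pass in is whatever you read off Zenodo, and ignoring a leading "v" is an assumption about how the two strings are written.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "catalystcoop.pudl") -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_database_version(db_version: str, package: str = "catalystcoop.pudl") -> bool:
    """Check whether the installed package version equals the database's version.

    A leading "v" (as in "v2022.11.30") is ignored on both sides.
    """
    installed = installed_version(package)
    if installed is None:
        return False
    return installed.lstrip("v") == db_version.lstrip("v")
```

Run `matches_database_version("v2022.11.30")` inside the powergenome environment; a False result suggests changing the pinned version as described above.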

Installation from GitHub

Clone this repository to your local machine and navigate to the top level (PowerGenome) folder.
Use the provided environment.yml file to create a conda environment named powergenome. If you don't already use conda, it is easiest to download and install Miniconda.

conda env create -f environment.yml

Activate the powergenome environment.

conda activate powergenome

Use pip to install an editable version of this project.

pip install -e .

Download the PUDL database from Zenodo or the PowerGenome data repository, unzip it, and copy /pudl_data/sqlite/pudl.sqlite to wherever you would like to store PowerGenome data on your computer. The zip file contains other data sets that aren't needed for PowerGenome and can be deleted. Note that as of May 2023 the most recent version of this database (v2022.11.30) is compatible with catalystcoop.pudl version v2022.11.30 and may not work if an earlier software version is included in your conda environment.
Download the additional PowerGenome database from the PowerGenome data repository. It includes NREL ATB cost data, transmission constraints between IPM regions, and hourly demand for each IPM region. Hourly demand is based on a 2012 weather year and was constructed either directly from FERC 714 data (load_curves_ferc) or from NREL EFS data (load_curves_nrel_efs), which also sources back to FERC 714. The NREL load curves, which separate hourly demand by sector and subsector, are now the default source for load curves in PowerGenome. See the wiki for more information. These files will eventually be provided through a data repository with citation information.
Download the appropriate renewable resource data files from the PowerGenome data repository. There is a single set of generation profiles and resource group folders specific to different regional aggregations. Read through the included README for more background. This folder contains:

generation_profiles can be saved in a single place and used across multiple studies.
Each of the folders under resource_groups has CSV files that tell PowerGenome the metro that each potential wind/solar site will deliver power to, based on a set of regional aggregations. Use the corresponding regional aggregations in your settings file. You can request new resource group files for different regional aggregations on the PowerGenome repository discussion page.

Download data files derived from NREL's EFS from the PowerGenome data repository. These provide hourly demand profiles for growing electrification technologies like electric vehicles and heat pumps, and they are used both to build up future demand profiles and to create flexible demand resources that can shift their load.
Download distributed generation profiles, compiled from NREL Cambium 2022 scenarios, from the PowerGenome data repository.
Create the file PowerGenome/powergenome/.env. In this file, add:

PUDL_DB=YOUR_PATH_HERE (your path to the PUDL database downloaded above)
PG_DB=YOUR_PATH_HERE (your path to the additional PowerGenome database downloaded above)
RESOURCE_GROUP_PROFILES=YOUR_PATH_HERE (your path to the folder with hourly wind/solar generation parquet files)
EFS_DATA=YOUR_PATH_HERE (your path to the folder with EFS-derived data files)
DISTRIBUTED_GEN_DATA=YOUR_PATH_HERE (your path to the folder with distributed generation profiles)
OPTIONAL: RESOURCE_GROUPS=YOUR_PATH_HERE (your path to the resource groups data for a project; this can be included in a settings file such as env.yml instead of the .env file)

Quotation marks are only needed if your values contain spaces. The .env file is included in .gitignore and will not be synced with the repository.
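PowerGenome reads this file itself, and this README does not describe its loader. As a separate sanity check, the sketch below parses simple `KEY=VALUE` lines with the standard library and reports which of the required variables listed above are missing. It is a hypothetical helper, not part of PowerGenome, and it only handles the plain form shown here (no variable expansion or multi-line values).

```python
from pathlib import Path

# Variable names from the list above; RESOURCE_GROUPS is optional, so it is not required here.
REQUIRED_KEYS = ("PUDL_DB", "PG_DB", "RESOURCE_GROUP_PROFILES", "EFS_DATA", "DISTRIBUTED_GEN_DATA")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Matching single or double quotes around a value are removed.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path):
    """Read and parse a .env file from disk."""
    return parse_env_text(Path(path).read_text())


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in the order listed."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(load_env_file("powergenome/.env"))` returns an empty list when every required path is filled in.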

Installation with a packaged version (pip)

Installing PowerGenome with pip has only been tested within a conda environment, but it should work in other environment management systems. Make sure that you have an updated version of pip installed. If you hit dependency errors, I suggest trying to install the problem packages using conda. PowerGenome has catalystcoop.pudl as a dependency, which has a large number of its own dependencies.

Depending on your operating system you might also have issues installing some other packages from pip. The example code below is what works for me on a Mac, where python-snappy fails to build wheels.

(base) conda create -n powergenome python=3.10 pip python-snappy=0.6.1
(base) conda activate powergenome
(powergenome) pip install powergenome

If you are installing a packaged version of PowerGenome, you won't be able to easily use a .env file. Instead, add the environment parameters (PUDL_DB, PG_DB, etc.) to a YAML file in the same folder as the rest of your settings. It doesn't really matter which file these parameters are included in, but creating a new file such as env.yml will help keep them separate from other settings parameters that might be shared with other PowerGenome users.
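If you keep your paths in Python or want to generate env.yml for several projects, the sketch below writes them as flat `KEY: "value"` lines. It assumes a flat top-level layout, which matches the parameter names above but is not spelled out in this README; the helper names are hypothetical. Each value is quoted with `json.dumps`, since a JSON string is also a valid YAML double-quoted scalar, so paths with spaces or backslashes survive.

```python
import json
from pathlib import Path


def env_yaml_text(paths):
    """Render environment parameters as YAML lines of the form KEY: "value"."""
    return "".join(f"{key}: {json.dumps(str(value))}\n" for key, value in paths.items())


def write_env_yaml(paths, settings_folder, filename="env.yml"):
    """Write the parameters into the settings folder and return the file path."""
    target = Path(settings_folder) / filename
    target.write_text(env_yaml_text(paths))
    return target
```

Usage: `write_env_yaml({"PUDL_DB": "/data/pudl.sqlite", "PG_DB": "/data/pg_misc.sqlite"}, "myproject/settings")`.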

Running code

Suggested folder structure

It is best practice to set up project folders outside of the cloned repository so that git doesn't track any new or changed files within the upper-level PowerGenome folder. Try copying one of the example systems (settings file and extra inputs) and modifying it. Copy the notebooks folder into your project folder, change the path to the settings file as needed, and run code in the notebooks. This can also be a good way to learn how data are created in PowerGenome and to debug problems.

Keeping project folders separate from the cloned PowerGenome folder will also make it easier to pull changes as they are released.
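The copy steps above can be scripted with the standard library. This is a minimal sketch, assuming the repository layout described in this README (`example_systems/<name>` and `notebooks` at the top level of the clone); `start_project` is a hypothetical helper. It refuses to create the project inside the clone, following the advice above.

```python
import shutil
from pathlib import Path


def start_project(repo_root, example_name, project_dir):
    """Copy one example system and the notebooks folder into a new project folder."""
    repo_root = Path(repo_root).resolve()
    project_dir = Path(project_dir).resolve()
    source = repo_root / "example_systems" / example_name
    if not source.is_dir():
        raise FileNotFoundError(source)
    # Keep projects outside the clone so git does not track their files.
    if project_dir == repo_root or repo_root in project_dir.parents:
        raise ValueError("create project folders outside the cloned PowerGenome repository")
    # copytree raises FileExistsError if project_dir already exists, so nothing is overwritten.
    shutil.copytree(source, project_dir)
    shutil.copytree(repo_root / "notebooks", project_dir / "notebooks")
    return project_dir
```

After copying, edit the settings and extra inputs in the new folder and update the settings path used in the notebooks.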

Example systems

A few example systems are included under PowerGenome/example_systems. Each system has settings files in a folder (settings) and a folder with extra user inputs (extra_inputs). The different example systems are not meant to be accurate for real-world analysis, so please do not blindly use the external data files included with them in your own studies!

Settings

Settings are controlled in a set of YAML files within a folder or combined into a single file. An example folder of settings files (settings) and a folder with extra user inputs (extra_inputs) are included in each of the example systems. Scenario options across different planning years are defined in the file test_scenario_inputs.csv. Documentation on extra inputs is included in the folder of each example system.
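To see at a glance which scenario options apply in each planning year, you can group the rows of the scenario CSV by year. This is a minimal sketch using the standard `csv` module; the column names (`year`, and `case_id` and `demand` in the example) are assumptions for illustration, so check the header of test_scenario_inputs.csv in your example system and adjust `year_column`.

```python
import csv
from collections import defaultdict


def scenarios_by_year(csv_file, year_column="year"):
    """Group rows of a scenario-definition CSV by planning year.

    csv_file is an open text file (or file-like object). Each row is returned
    as a dict keyed by the CSV header; the year column name is an assumption.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[int(row[year_column])].append(row)
    return dict(grouped)
```

Usage: `with open("test_scenario_inputs.csv", newline="") as f: by_year = scenarios_by_year(f)`.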

Example notebooks

A series of example notebooks included in PowerGenome/notebooks describes how to access different functions within PowerGenome to create resource clusters, variable generation profiles, fuel costs, hourly demand, and transmission constraints. They describe how the data are compiled and which settings parameters are required for each type of data.

Command line interface

The outputs are all formatted for GenX. We hope to make the data formatting code more modular so that users can easily switch between outputs for different power system models.

Functions from each module can be imported and used in an interactive environment (e.g. JupyterLab). Examples of how to load data in this way are included in PowerGenome/notebooks. To run from the command line, navigate to a project folder that contains a settings file and extra inputs (e.g. myproject/powergenome), activate the powergenome conda environment, and use the command run_powergenome_multiple with flags for the settings file name and where the results should be saved. Since the powergenome package is installed in the powergenome conda environment, you can run the command line function from anywhere on your computer (not just within the cloned PowerGenome folder).

run_powergenome_multiple --settings_file settings --results_folder test_system

The command line arguments --settings_file and --results_folder can be shortened to -sf and -rf, respectively. For all options, run:

run_powergenome_multiple --help
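If you drive several runs from a Python script, the command can be launched with `subprocess`. This is a sketch that only uses the two flags documented above; `build_command` and `run_powergenome` are hypothetical helper names, and the run assumes the powergenome environment is active so the command is on your PATH.

```python
import shutil
import subprocess


def build_command(settings_file, results_folder, short_flags=False):
    """Assemble the run_powergenome_multiple argument list using the documented flags."""
    sf, rf = ("-sf", "-rf") if short_flags else ("--settings_file", "--results_folder")
    return ["run_powergenome_multiple", sf, str(settings_file), rf, str(results_folder)]


def run_powergenome(settings_file, results_folder, cwd="."):
    """Run the CLI from a project folder; raise if the command is missing or fails."""
    if shutil.which("run_powergenome_multiple") is None:
        raise RuntimeError("run_powergenome_multiple not found; is the powergenome environment active?")
    return subprocess.run(build_command(settings_file, results_folder), cwd=cwd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when folder names contain spaces.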

A folder with extra user inputs is required when using the run_powergenome_multiple command. The name of this folder is defined in the settings YAML file with the input_folder parameter. Look at the files in each example system for test cases to follow.
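A quick pre-flight check can catch a misnamed extra inputs folder before a long run. This is a minimal sketch, assuming the settings have already been loaded into a dict (for example with PyYAML) and that input_folder is a path relative to the project folder; both are assumptions, so compare against the example systems. `check_extra_inputs` is a hypothetical helper.

```python
from pathlib import Path


def check_extra_inputs(settings, project_dir="."):
    """Return the extra inputs folder named by the input_folder setting.

    Raises KeyError if input_folder is not set and FileNotFoundError if the
    folder does not exist under project_dir.
    """
    folder_name = settings.get("input_folder")
    if not folder_name:
        raise KeyError("settings must define input_folder")
    folder = Path(project_dir) / folder_name
    if not folder.is_dir():
        raise FileNotFoundError(f"extra inputs folder not found: {folder}")
    return folder
```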

If you have previously installed PowerGenome and the run_powergenome_multiple command doesn't work, try reinstalling it using pip install -e . as described above. If you downloaded the custom PUDL database before May 2020, some errors may be resolved by downloading a new version.

Licensing

PowerGenome is released under the MIT License. Most data inputs are from US government sources (EIA, EPA, FERC, etc.), which should not be subject to copyright in the US. Hourly FERC demand data has been cleaned using techniques developed by Tyler Ruggles and David Farnham, and allocated to IPM regions using methods developed by Catalyst Cooperative. Hourly generation profiles for wind and solar resources were created by Vibrant Clean Energy and provided without usage restrictions. All PowerGenome data outputs are released under the CC-BY-4.0 license.

Contributing

Contributions are welcome! There is significant work to do on this project, and additional perspective on user needs will help make it better. If you see something that needs to be improved, open an issue. If you have questions or need assistance, join PowerGenome on groups.io and post a message there.

Pull requests are always welcome. To start modifying or adding code, make a fork of this repository, create a new branch, and submit a pull request.

All code added to the project should be formatted with black. After making a fork and cloning it to your own computer, run pre-commit install to install the git hook scripts that will run every time you make a commit. These hooks will automatically run black (in case you forgot), fix trailing whitespace, check YAML formatting, etc.