Catalyst Cooperative

Catalyst is a small data engineering cooperative working on electricity regulation and climate change.

Verified
We've verified that the organization catalyst-cooperative controls the domain:
- catalyst.coop

Catalyst Cooperative is a data engineering and analysis consultancy, specializing in energy system and utility financial data. Our current focus is on the US electricity and natural gas sectors. We primarily serve non-profit organizations, academic researchers, journalists, climate policy advocates, public policymakers, and occasionally smaller business users.

We believe public data should be freely available and easy to use by those working in the public interest. Whenever possible, we release our software under the MIT License, and our data products under the Creative Commons Attribution 4.0 License.

If you're interested in hiring us, email hello@catalyst.coop. Our current rate is $200/hr. We can often make accommodations for smaller/grassroots organizations and frequently collaborate with open source contributors.

Contact Us 💌

For general support, questions, or other conversations about our work that might be of interest to others, head over to our GitHub Discussions.
If you'd like to get (very) occasional updates about our work, sign up for our email list.
Want to schedule a time to chat with us one-on-one about our software or data? Have ideas for improvement, or need to get some personalized support? Join us for Office Hours.
Follow us on BlueSky: @catalyst.coop
Connect with us on LinkedIn
Follow us on Mastodon: @CatalystCoop@mastodon.energy
Follow us on Twitter: @CatalystCoop
Subscribe to our channel on YouTube
Play with our data and notebooks on Kaggle
Combine our data with ML models on HuggingFace
Learn more about us on our website: https://catalyst.coop

Services We Provide

Programmatic acquisition, cleaning, and integration of public data sources.
Data-oriented software development.
Compilation of new machine-readable data sources from regulatory filings, legislation, and other public information.
Data warehousing and dashboard development.
Both ad-hoc and replicable production data analysis.
Translation of existing ad-hoc data wrangling workflows into replicable data pipelines written in Python.
Reproducible data pipeline design, implementation, and ongoing maintenance.

Tools We Use 🔨 🔧

Python is our primary language for everything.
Pandas, the Swiss Army knife of tabular data manipulation in Python.
DuckDB as a performant, columnar, analysis-oriented embedded database: the SQLite of analytical databases.
Dask to scale up data wrangling tasks we do with Pandas beyond what can be done in memory.
Dagster for orchestrating and parallelizing our data pipelines.
SQLite for local storage and distribution of tabular, relational data.
Apache Parquet to persist larger data tables to disk.
JupyterLab for interactive data wrangling, exploration, and visualizations.
Pydantic for managing and validating settings and our collection of metadata.
Scikit Learn to construct machine learning pipelines.
Splink for fast, generalized entity matching / record linkage.
MLFlow for ML experiment and artifact tracking, mostly in the context of our entity matching / record linkage work.
Google BigQuery to warehouse finished data products for live access.
Google Batch to minimize the infrastructure we need to manage for our nightly builds.
Pandera to specify dataframe schemas and data validations in conjunction with Dagster.
Hypothesis for more robust data-oriented unit testing.
Zenodo provides long-term, programmatically accessible, versioned archives of all our raw inputs.
Sphinx for building our documentation, incorporating much of our structured metadata directly using Jinja templates.
The Frictionless Framework as a standard interchange model for tabular data.
Tableau for producing dashboards and interactive data visualizations for client projects.
VS Code is our primary code editor, ever more deeply integrated with GitHub.
pre-commit to enforce code formatting and style standards.
We use GitHub Actions to run our continuous integration and coordinate our nightly builds and data scraping jobs.
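
As a rough illustration of the "SQLite for local storage and distribution of tabular, relational data" item above, the sketch below uses only Python's built-in `sqlite3` module to store a small table and run an aggregate query against it. The `plants` table, its columns, the rows, and the `capacity_by_fuel` helper are all hypothetical placeholders invented for this example; they are not taken from PUDL or any other Catalyst data product.

```python
import sqlite3

# Hypothetical example rows: (plant name, fuel type, capacity in MW).
# These values are made up purely for illustration.
ROWS = [
    ("Plant A", "gas", 500.0),
    ("Plant B", "wind", 150.0),
    ("Plant C", "gas", 250.0),
]


def capacity_by_fuel(rows):
    """Load rows into an in-memory SQLite database and total capacity per fuel."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE plants ("
            "name TEXT PRIMARY KEY, fuel TEXT NOT NULL, capacity_mw REAL NOT NULL)"
        )
        # Parameterized inserts keep values separate from the SQL text.
        conn.executemany("INSERT INTO plants VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT fuel, SUM(capacity_mw) FROM plants GROUP BY fuel ORDER BY fuel"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


totals = capacity_by_fuel(ROWS)
print(totals)  # {'gas': 750.0, 'wind': 150.0}
```

Swapping `":memory:"` for a file path would let the same code read a database file distributed on disk.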

Tools We're Studying 🚧

Perspective for in-browser data analytics and visualizations.
Pixi, a fast, ergonomic conda package management command line tool.
Evidence, Rill, Apache Superset, and Streamlit as open source BI tools that play nice with revision control.
SQLModel to more easily unify our metadata and database schema definitions with SQLAlchemy.
dbt to manage pure SQL data transformations where appropriate within our larger Python-based workflows.

Adjacent Projects 🧠

GridStatus
Interconnection.fyi
GridEmissions
PowerGenome from @gschivley
The Open Grid Emissions Initiative from @grgmiller & Singularity Energy
Pangeo Forge
DSIRE at North Carolina State University

Organizational Friends & Allies 💞

Funders & Clients 💰 💵

Business & Employment 🌲 🌲

Catalyst is a democratic workplace and a member of the US Federation of Worker Cooperatives. We exist to help our members earn a decent living while working for a more just, livable, and sustainable world. Our income comes from a mix of grant funding and client work. We only work with mission-aligned clients.

We are an entirely remote organization, and have been since well before the coronavirus pandemic. Our members are scattered all across North America from Alaska to Mexico. We enjoy a great deal of autonomy and flexibility in determining our own work-life balance and schedules. Membership entails working a minimum of 1000 hours each year for the co-op.

As a small 100% employee-owned cooperative, we are able to compensate members through an unusual mix of wages and profit sharing, including:

An hourly wage (currently $36.75/hr)
Tax-deferred employer retirement plan contributions (proportional to wages, up to 25% of wages)
Tax-advantaged patronage dividends (proportional to hours worked, unlimited but subject to profitability)
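
To make these figures concrete, here is a back-of-the-envelope sketch of the fixed parts of annual compensation at the 1000-hour membership minimum. The function name and the choice of 1000 hours are illustrative assumptions. Patronage dividends are left out because they depend on profitability, and the retirement figure is an upper bound implied by the 25% cap, not a typical amount.

```python
HOURLY_WAGE = 36.75         # current hourly wage stated above, in USD
MAX_RETIREMENT_RATE = 0.25  # employer retirement contributions: up to 25% of wages


def fixed_compensation(hours):
    """Return (wages, maximum employer retirement contribution) for hours worked."""
    wages = hours * HOURLY_WAGE
    max_retirement = wages * MAX_RETIREMENT_RATE
    return wages, max_retirement


# Assumption for illustration: a member working exactly the 1000-hour minimum.
wages, max_retirement = fixed_compensation(1000)
print(f"Wages: ${wages:,.2f}")                           # Wages: $36,750.00
print(f"Retirement (at most): ${max_retirement:,.2f}")   # Retirement (at most): $9,187.50
```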

We also reimburse ourselves for expenses related to maintaining a home office, and provide a monthly health insurance stipend.

Candidates must do at least 500 hours of contract work for the cooperative over six months, at which point they will be considered for membership.

Check our website to see if we're recruiting new members.

Pinned

pudl Public
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Python 550 125
ferc-xbrl-extractor Public
A tool for converting FERC filings published in XBRL into SQLite databases
Python 13 3
pudl-archiver Public
A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.
Python 14 6
pudl-examples Public
Example Jupyter notebooks hosted on Kaggle that demonstrate how to work with US energy data from PUDL.
Jupyter Notebook 19 5
catalystcoop-handbook Public
A readthedocs site containing Catalyst Cooperative policies.
Python 2 2

Repositories

Showing 10 of 80 repositories

pudl Public
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Python 550 MIT 125 445 (11 issues need help) 27 Updated Jul 13, 2025
open-energy-data-for-all Public
Jupyter Notebook 10 2 5 Updated Jul 11, 2025
pudl-examples Public
Example Jupyter notebooks hosted on Kaggle that demonstrate how to work with US energy data from PUDL.
Jupyter Notebook 19 MIT 5 0 2 Updated Jul 11, 2025
pudl-archiver Public
A tool for capturing snapshots of public data sources and archiving them on Zenodo for programmatic use.
Python 14 MIT 6 42 4 Updated Jul 11, 2025
pudl-usage-metrics Public
A dagster ETL for collecting and cleaning PUDL usage metrics.
Python 2 MIT 0 27 2 Updated Jul 10, 2025
ferc-xbrl-extractor Public
A tool for converting FERC filings published in XBRL into SQLite databases
Python 13 MIT 3 7 0 Updated Jul 7, 2025
rmi-energy-communities Public
Partnership between Catalyst and RMI to identify energy communities as defined by the Inflation Reduction Act
Python 4 MIT 2 5 9 Updated Jul 7, 2025
cheshire Public template
A template repository to make setting up new Python projects easier and more uniform.
Python 5 MIT 1 2 0 Updated Jul 7, 2025
catalyst-cooperative.github.io Public
A top level GitHub Pages documentation index for Catalyst Cooperative.
0 MIT 0 0 0 Updated Jul 7, 2025
gridstatus Public Forked from gridstatus/gridstatus
API to access energy data
Python 0 BSD-3-Clause 69 0 1 Updated Jul 7, 2025