ProCyon: A multimodal foundation model for protein phenotypes
ProCyon is an open-source model for predicting protein phenotypes across scales. This repository provides the official implementation of the model as described in our overview page and our paper. Our associated HuggingFace collection containing model weights and datasets can be found at the following links (a download sketch follows the list):
- Dataset: ProCyon-Instruct
- Full model: ProCyon-Full
- Benchmarking model: ProCyon-Split
- Binding prediction model: ProCyon-Bind
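
If you prefer a Python API to the `git clone` commands used in the setup section below, the same artifacts can be fetched with `huggingface_hub`. This is a minimal sketch, not part of the official setup: the repo IDs come from the collection links above, and the `local_dir` paths are placeholders you should adjust.

```python
# Minimal sketch: fetch ProCyon artifacts via the Hugging Face Hub API.
# Repo IDs follow the collection links above; local_dir paths are placeholders.
from huggingface_hub import snapshot_download

# ProCyon-Instruct dataset (large download; check your available disk space)
snapshot_download(
    repo_id="mims-harvard/ProCyon-Instruct",
    repo_type="dataset",
    local_dir="/path/to/data/ProCyon-Instruct",
)

# ProCyon-Full model weights (using them also requires LLaMA-3 access; see Requirements)
snapshot_download(
    repo_id="mims-harvard/ProCyon-Full",
    local_dir="/path/to/models/ProCyon-Full",
)
```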
Requirements:
- CUDA toolkit, particularly `nvcc`
- Sign up for Hugging Face permissions for LLaMA-3 at the Meta-Llama-3-8B model page (https://huggingface.co/meta-llama/Meta-Llama-3-8B). You'll need this to use ProCyon-Full and ProCyon-Bind.
We recommend installing with `uv`, but installation can also be done via `pip` alone. The `procyon` package, used to interact with pre-trained models or train new models, can be installed via:
```bash
cd /path/to/ProCyon

# RECOMMENDED: use uv to install. Two options depending on whether
# you want to use the default .venv virtual env that uv will create

# OPTION 1: let uv create and manage the virtual environment, requires
# uv to already be installed
uv sync --extra build
uv sync --extra build --extra compile
uv pip install -e .
source .venv/bin/activate

# OPTION 2: create virtual environment with choice of name and path
python3 -m venv ./procyon_venv
source ./procyon_venv/bin/activate
python3 -m pip install uv
uv pip install -r pyproject.toml --extra build
uv pip install -r pyproject.toml --extra build --extra compile
uv pip install -e .

# OR if omitting uv
python3 -m pip install -e .
```
Installation with `uv` should take less than 10 minutes, depending on the speed of your internet connection for downloading packages.
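
To confirm the editable install worked, a quick import check from the activated environment is enough:

```python
# Quick post-install check: `procyon` should import without errors from the
# active virtual environment.
import procyon

print("procyon imported from:", procyon.__file__)
```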
In addition to the package code, ProCyon also requires pre-trained weights for associated models (e.g. Llama-3, ESM2) as well as access to the ProCyon-Instruct dataset. You'll need to request access to the LLaMA-3 model through its model page at https://huggingface.co/meta-llama/Meta-Llama-3-8B. These dependencies will all be stored in a single directory, which we denote `DATA_DIR`.
```bash
DATA_DIR=/path/to/data
mkdir $DATA_DIR
cd $DATA_DIR

# Clone ProCyon-Instruct dataset from HuggingFace
git clone git@hf.co:datasets/mims-harvard/ProCyon-Instruct

# Clone model weights for associated Llama models from HuggingFace
# Llama-3-8b for ProCyon-Full
cd /path/to/llama3/
# Ensure you've signed up for LLaMA-3 access
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B

# Llama-2-7b for ProCyon-Split
cd ../llama-2-7b-hf
git clone git@hf.co:meta-llama/Llama-2-7b-hf

# Add a `.env` file which the `procyon` package will use to find `DATA_DIR`
# and the Llama weights
cd /path/to/ProCyon
echo "DATA_DIR=\"$DATA_DIR\"" > .env
echo "HOME_DIR=\"$(pwd)\"" >> .env
echo "LLAMA3_PATH=/path/to/llama3/Meta-Llama-3-8B" >> .env
```
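
As an optional sanity check (a sketch, not part of the official setup), you can confirm that the values written to `.env` resolve to existing directories; this assumes `python-dotenv` is installed and that you run it from the repository root where `.env` was created.

```python
# Optional sanity check: confirm the .env entries point at existing paths.
# Assumes python-dotenv is installed and that this is run from the ProCyon
# repository root, where the .env file was written.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

data_dir = os.environ["DATA_DIR"]
print("DATA_DIR:", data_dir)
print("ProCyon-Instruct cloned:",
      os.path.isdir(os.path.join(data_dir, "ProCyon-Instruct")))

llama3_path = os.environ.get("LLAMA3_PATH")
print("LLAMA3_PATH present:",
      llama3_path is not None and os.path.isdir(llama3_path))
```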
Version note: We are aware of a bug where having `transformers` > 4.31.0 changes generated model outputs. Please ensure your `transformers` version is set to 4.31.0 (as in the environment requirements) for ProCyon inference.
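
A quick check like the following (a simple sketch, not part of the package) catches an incompatible `transformers` version before you run inference:

```python
# Guard against the known generation bug: ProCyon inference expects
# transformers==4.31.0.
import transformers

assert transformers.__version__ == "4.31.0", (
    f"expected transformers 4.31.0, found {transformers.__version__}"
)
```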
For the core capabilities of ProCyon models, please see the provided demo notebooks. Both examples should run in less than 5 minutes, depending on the speed of your GPU.
To see how to perform benchmarking runs comparing the performance of ProCyon models to various other baselines and models, please see the example configs and scripts or the evaluation README.
For details on how to reproduce the various experiments and results in our manuscript, please see the reproducibility README.
For details on training a ProCyon model and example scripts, please see the training README.
If you use ProCyon in your work, please cite our preprint:

```bibtex
@article{Queen2024.12.10.627665,
  author = {Queen, Owen and Huang, Yepeng and Calef, Robert and Giunchiglia, Valentina and Chen, Tianlong and Dasoulas, George and Tai, LeAnn and Ektefaie, Yasha and Noori, Ayush and Brown, Joseph and Cobley, Tom and Hrovatin, Karin and Hartvigsen, Tom and Theis, Fabian and Pentelute, Bradley L. and Khurana, Vikram and Kellis, Manolis and Zitnik, Marinka},
  title = {ProCyon: A multimodal foundation model for protein phenotypes},
  elocation-id = {2024.12.10.627665},
  year = {2024},
  doi = {10.1101/2024.12.10.627665},
  URL = {https://www.biorxiv.org/content/early/2024/12/15/2024.12.10.627665},
  eprint = {https://www.biorxiv.org/content/early/2024/12/15/2024.12.10.627665.full.pdf},
  journal = {bioRxiv}
}
```