# ProCyon: A multimodal foundation model for protein phenotypes


ProCyon logo

ProCyon is an open-source model for predicting protein phenotypes across scales. This repository provides the official implementation of the model as described in our overview page and our paper. Model weights and datasets are available in our associated HuggingFace collection.

## Installation

Requirements:

- CUDA toolkit, particularly `nvcc` (see the sanity check below)
- Sign up for Huggingface permissions for LLaMA-3 at this link. You'll need this to use ProCyon-Full and ProCyon-Bind.
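
Before installing, a quick way to confirm that the CUDA toolkit and GPU driver are visible is shown below (a minimal sanity check; the exact CUDA version you need depends on your system and is not prescribed here):

```bash
# Confirm the CUDA compiler is on your PATH (needed for the build extras)
nvcc --version

# Confirm the GPU and driver are visible
nvidia-smi
```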

We recommend installing with `uv`, but installation can also be done via `pip` alone. The `procyon` package, used to interact with pre-trained models or train new models, can be installed via:

```bash
cd /path/to/ProCyon

# RECOMMENDED: use uv to install. Two options depending on whether
#              you want to use the default .venv virtual env that
#              uv will create

# OPTION 1: let uv create and manage the virtual environment, requires
#           uv to already be installed
uv sync --extra build
uv sync --extra build --extra compile
uv pip install -e .
source .venv/bin/activate

# OPTION 2: create virtual environment with choice of name and path
python3 -m venv ./procyon_venv
source ./procyon_venv/bin/activate
python3 -m pip install uv
uv pip install -r pyproject.toml --extra build
uv pip install -r pyproject.toml --extra build --extra compile
uv pip install -e .

# OR if omitting uv
python3 -m pip install -e .
```

Installation with `uv` should take less than 10 minutes, depending on the speed of your internet connection for downloading packages.
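
To confirm that the editable install succeeded, you can try importing the package from the activated environment (a minimal check, not part of the official instructions):

```bash
# Should exit without error if the procyon package is importable
python3 -c "import procyon; print('procyon imported successfully')"
```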

In addition to the package code, ProCyon also requires pre-trained weights for associated models (e.g. Llama-3, ESM2) as well as access to the ProCyon-Instruct dataset. You'll need to request access to the LLaMA-3 model through the model page here. These dependencies will all be stored in a single directory, which we denote `DATA_DIR`.

```bash
DATA_DIR=/path/to/data
mkdir $DATA_DIR
cd $DATA_DIR

# Clone ProCyon-Instruct dataset from HuggingFace
git clone git@hf.co:datasets/mims-harvard/ProCyon-Instruct

# Clone model weights for associated Llama models from HuggingFace
# Llama-3-8b for ProCyon-Full
cd /path/to/llama3/
# Ensure you've signed up for LLaMA-3 access
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B
echo "LLAMA3_PATH=/path/to/llama3/Meta-Llama-3-8B" >> .env

# Llama-2-7b for ProCyon-Split
cd ../llama-2-7b-hf
git clone git@hf.co:meta-llama/Llama-2-7b-hf

# Add a `.env` file which the `procyon` package will use to find the `DATA_DIR`
cd /path/to/ProCyon
echo "DATA_DIR=\"$DATA_DIR\"" > .env
echo "HOME_DIR=\"$(pwd)\"" >> .env
```
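
For reference, after these steps the `.env` file read by the `procyon` package would contain entries along the lines of the sketch below (placeholder paths; this assumes `LLAMA3_PATH` is kept in the same `.env` file in the ProCyon repository root so the package can locate all three paths):

```bash
# Example .env contents (paths are placeholders for your own layout)
DATA_DIR="/path/to/data"
HOME_DIR="/path/to/ProCyon"
LLAMA3_PATH=/path/to/llama3/Meta-Llama-3-8B
```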

Version note: We are aware of a bug where having `transformers>4.31.0` changes generated model outputs. Please ensure your `transformers` version is pinned to 4.31.0 (as in the environment requirements) when running inference with ProCyon.
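
If you are unsure which version your environment resolved, you can check it and, if needed, pin it explicitly (a small sketch using the same `uv`-based workflow as above):

```bash
# Print the installed transformers version
python3 -c "import transformers; print(transformers.__version__)"

# Pin to the required version if it differs
uv pip install "transformers==4.31.0"
```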

## Examples

For the core capabilities of ProCyon models, please see the provided demo notebooks. Both examples should run in less than 5 minutes, depending on the speed of your GPU.

To see how to perform benchmarking runs comparing the performance of ProCyon models to various other baselines and models, please see the example configs and scripts or the evaluation README.

For details on how to reproduce the various experiments and results in our manuscript, please see the reproducibility README.

For details on training a ProCyon model and example scripts, please see the training README.

## Citation

```bibtex
@article{Queen2024.12.10.627665,
  author = {Queen, Owen and Huang, Yepeng and Calef, Robert and Giunchiglia, Valentina and Chen, Tianlong and Dasoulas, George and Tai, LeAnn and Ektefaie, Yasha and Noori, Ayush and Brown, Joseph and Cobley, Tom and Hrovatin, Karin and Hartvigsen, Tom and Theis, Fabian and Pentelute, Bradley L. and Khurana, Vikram and Kellis, Manolis and Zitnik, Marinka},
  title = {ProCyon: A multimodal foundation model for protein phenotypes},
  elocation-id = {2024.12.10.627665},
  year = {2024},
  doi = {10.1101/2024.12.10.627665},
  URL = {https://www.biorxiv.org/content/early/2024/12/15/2024.12.10.627665},
  eprint = {https://www.biorxiv.org/content/early/2024/12/15/2024.12.10.627665.full.pdf},
  journal = {bioRxiv}
}
```
