Machine Learning models for in vitro enzyme kinetic parameter prediction


maranasgroup/CatPred



🚨 Announcements 📢

  • 28th Feb 2025 - Published in Nature Communications.
  • 27th Dec 2024 - Updated repository with scripts to reproduce results from the manuscript.
  • 🚧TODO
    • Add prediction codes for models using 3D-structural features.
    • Add instructions to install CatPred using a Docker image.


🌐 Google Colab Interface

For ease of use without any hardware requirements, a Google Colab interface is available here: tiny.cc/catpred. The Colab notebook contains sample data, instructions, and the installation steps.


💻 Local Installation

If you would like to install the package on a local machine, please follow the instructions below.

🖥️ System Requirements

  • For prediction: Any machine running a Linux-based operating system is recommended.
  • For training: A Linux-based operating system on a GPU-enabled machine is recommended.

Both training and prediction have been tested on Ubuntu 20.04.5 LTS with an NVIDIA A10 and CUDA Version: 12.0.

To train or predict with GPUs, you will need:

  • CUDA >= 11.7
  • cuDNN

📥 Installation

Installation requires conda, so first install Miniconda from https://conda.io/miniconda.html.

Then follow the steps below to complete the installation. If creating the environment with conda takes too long, you can also try running conda install -c conda-forge mamba and then replacing conda with mamba in each of the steps below.

Note for machines with GPUs: You may need to manually install a GPU-enabled version of PyTorch by following the instructions here. If your GPU is not being used after you follow the instructions below, check which version of PyTorch is installed in your environment using conda list | grep torch or similar. If the PyTorch line includes cpu, uninstall it using conda remove pytorch and reinstall a GPU-enabled version using the instructions at the link above.
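The GPU check described in the note above can also be done from Python. The following is a minimal sketch, not part of CatPred: the helper names describe_torch_build and meets_cuda_minimum are hypothetical, and the code assumes that the CUDA version PyTorch was built against (torch.version.cuda) is a reasonable proxy for the "CUDA >= 11.7" requirement listed earlier.

```python
import importlib.util


def describe_torch_build():
    """Return details of the installed PyTorch build, or None if PyTorch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a CUDA device is actually usable from this process.
        "gpu_visible": torch.cuda.is_available(),
    }


def meets_cuda_minimum(cuda_version, minimum=(11, 7)):
    """Compare a 'major.minor' CUDA version string against a minimum (assumed 11.7)."""
    if cuda_version is None:
        return False
    parts = tuple(int(p) for p in cuda_version.split(".")[:2])
    return parts >= minimum


if __name__ == "__main__":
    info = describe_torch_build()
    if info is None:
        print("PyTorch is not installed in this environment.")
    elif info["cuda_build"] is None:
        print("CPU-only PyTorch build detected; consider reinstalling a GPU-enabled build.")
    else:
        print(info, "meets minimum:", meets_cuda_minimum(info["cuda_build"]))
```

If cuda_build is None, the environment most likely has the CPU-only package that the note above asks you to replace.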

Installing and Downloading Pre-trained Models (~10 mins)

mkdir catpred_pipeline catpred_pipeline/results
cd catpred_pipeline
wget https://catpred.s3.us-east-1.amazonaws.com/capsule_data.tar.gz
tar -xzf capsule_data.tar.gz
git clone https://github.com/maranasgroup/catpred.git
cd catpred
conda env create -f environment.yml
conda activate catpred
pip install -e .
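The installation commands rely on wget, tar, git, and conda being available on the PATH. A small pre-flight check, sketched here under that assumption, can report missing tools before the download starts. The names REQUIRED_TOOLS and find_missing_tools are illustrative and do not come from the CatPred codebase.

```python
import shutil

# Command-line tools invoked by the installation steps.
REQUIRED_TOOLS = ("wget", "tar", "git", "conda")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```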

🔮 Prediction

The Jupyter Notebook batch_demo.ipynb and the Python script demo_run.py show how to use the pre-trained models for prediction.

🔄 Reproducing Publication Results

We provide three ways to reproduce the results of the publication.

1. Quick Method ⚡

Estimated run time: A few minutes

Run using:

./reproduce_quick.sh

For all results pertaining to CatPred, UniKP, DLKcat, and Baseline models, this method uses only pre-trained predictions and analyses to reproduce the results of the publication, including all main and supplementary figures.

2. Prediction Method 🛠️

Estimated run time: Up to a day depending on your GPU

Run using:

./reproduce_prediction.sh

For results pertaining to CatPred, this method uses pre-trained models to perform predictions on the test sets. For results pertaining to UniKP, DLKcat, and Baseline, this method uses only pre-trained predictions and analyses to reproduce the results of the publication, including all main and supplementary figures.

3. Training Method 🏋️

Estimated run time: Up to 12-14 days depending on your GPU

Run using:

./reproduce_training.sh

For all results pertaining to CatPred, UniKP, DLKcat, and Baseline models, this method trains everything from scratch. It then uses the trained checkpoints to make predictions and analyzes them to reproduce the results of the publication, including all main and supplementary figures.
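The three methods differ only in which script is launched, so the choice can be wrapped in a small helper. This is a hedged sketch rather than anything shipped with the repository: REPRODUCTION_SCRIPTS, build_command, and run_reproduction are hypothetical names, and the code assumes the scripts are run with bash from the root of the cloned catpred directory.

```python
import subprocess
from pathlib import Path

# Script names as given in the three methods above.
REPRODUCTION_SCRIPTS = {
    "quick": "./reproduce_quick.sh",
    "prediction": "./reproduce_prediction.sh",
    "training": "./reproduce_training.sh",
}


def build_command(method):
    """Map a method name to the command list that launches its script."""
    if method not in REPRODUCTION_SCRIPTS:
        valid = ", ".join(sorted(REPRODUCTION_SCRIPTS))
        raise ValueError(f"Unknown method {method!r}; expected one of: {valid}")
    return ["bash", REPRODUCTION_SCRIPTS[method]]


def run_reproduction(method, repo_dir=".", dry_run=True):
    """Run the chosen script from repo_dir; with dry_run=True only return the command."""
    command = build_command(method)
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(command, cwd=Path(repo_dir), check=True)
    return command


if __name__ == "__main__":
    print(run_reproduction("quick"))
```

The dry_run default is deliberate: the training method may run for days, so printing the command first guards against launching it by accident.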


🙏 Acknowledgements

We thank the authors of the following open-source repositories:

  • Chemprop - The majority of the functionality in this codebase was inspired by the Chemprop library.
  • Rotary PyTorch - The rotary positional embeddings functionality for Seq-Attn. is from Rotary PyTorch.
  • Progres - Protein Graph Embedding Search using pre-trained EGNN models from Progres.

📜 License

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.


📖 Citations

If you find the models useful in your research, we ask that you cite the relevant paper:

@article{Boorla2024.03.10.584340,
  author = {Veda Sheersh Boorla and Costas D. Maranas},
  title = {CatPred: A comprehensive framework for deep learning in vitro enzyme kinetic parameters kcat, Km and Ki},
  elocation-id = {2024.03.10.584340},
  year = {2024},
  doi = {10.1101/2024.03.10.584340},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2024/03/26/2024.03.10.584340},
  eprint = {https://www.biorxiv.org/content/early/2024/03/26/2024.03.10.584340.full.pdf},
  journal = {bioRxiv}
}
