XinhaoLi74/MolPMoFiT


Implementation of Inductive transfer learning for Molecular Activity Prediction: Next-Gen QSAR Models with MolPMoFiT

Molecular Prediction Model Fine-Tuning (MolPMoFiT) is a transfer learning method based on self-supervised pre-training + task-specific fine-tuning for QSPR/QSAR modeling.

MolPMoFiT is adapted from ULMFiT and implemented using PyTorch and Fastai v1. A large-scale molecular structure prediction model is pre-trained on one million unlabeled molecules from ChEMBL in a self-supervised learning manner, and can then be fine-tuned on smaller chemical datasets for various QSPR/QSAR tasks with specific endpoints.

(Figure: MolPMoFiT overview)
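For orientation, here is a minimal sketch of the self-supervised pretraining step in fastai v1. The file name, column name, and encoder name are assumptions for illustration, and the actual notebooks use a custom SMILES tokenizer rather than fastai's default English tokenizer:

```python
# Minimal ULMFiT-style pretraining sketch in fastai v1; names are assumptions.
import pandas as pd
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

df = pd.read_csv('chembl_1M_smiles.csv')        # hypothetical SMILES corpus
data_lm = TextLMDataBunch.from_df(
    '.', train_df=df[:-10000], valid_df=df[-10000:], text_cols='smiles')
data_lm.save('data_lm.pkl')                     # keep the vocab for fine-tuning

learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3,
                               pretrained=False)  # no English LM weights
learn.fit_one_cycle(1, 1e-2)                    # train the molecular "LM"
learn.save_encoder('mspm_encoder')              # encoder reused by QSAR models
```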

Environment

We recommend building the environment with Conda:

conda env create -f molpmofit.yml
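
Then activate it before launching the notebooks (the environment name is defined inside molpmofit.yml; molpmofit is assumed here):

conda activate molpmofit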

Datasets

We provide all the datasets needed to reproduce the experiments in the data folder.

  • data/MSPM contains the dataset to train the general domain molecular structure prediction model.
  • data/QSAR contains the datasets for QSAR tasks.
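
To get a quick look at what is provided (assuming the tables are plain CSV files of SMILES strings with endpoint columns; verify against the actual headers):

```python
from pathlib import Path
import pandas as pd

# List every table shipped with the repo; assumes CSV format.
for p in sorted(Path('data').rglob('*.csv')):
    print(p)

# Peek at one QSAR table to learn its column names (path is an assumption).
df = pd.read_csv(sorted(Path('data/QSAR').rglob('*.csv'))[0])
print(df.head())
```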

Experiments

The code is provided as Jupyter notebooks in the notebooks folder. All code was developed on an Ubuntu 18.04 workstation with two Quadro P4000 GPUs.

  1. 01_MSPM_Pretraining.ipynb: Training the general domain molecular structure prediction model (MSPM).
  2. 02_MSPM_TS_finetuning.ipynb: (1) Fine-tuning the general MSPM on a target dataset to generate a task-specific MSPM. (2) Fine-tuning the task-specific MSPM to train a QSAR model.
  3. 03_QSAR_Classifcation.ipynb: Fine-tuning the general domain MSPM to train a classification model (a condensed sketch of this recipe follows the list).
  4. 04_QSAR_Regression.ipynb: Fine-tuning the general domain MSPM to train a regression model.
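
The classification recipe follows the standard ULMFiT pattern of loading the pretrained encoder and gradually unfreezing. A condensed sketch in fastai v1, reusing the (assumed) artifacts saved in the pretraining sketch above; the task file and column names are hypothetical:

```python
# Condensed ULMFiT-style fine-tuning sketch (fastai v1); names are assumptions.
import pandas as pd
from fastai.text import (TextClasDataBunch, text_classifier_learner,
                         AWD_LSTM, load_data)

data_lm = load_data('.', 'data_lm.pkl')          # vocab from pretraining sketch
df = pd.read_csv('data/QSAR/some_task.csv')      # hypothetical QSAR table
data_clas = TextClasDataBunch.from_df(
    '.', train_df=df[:-500], valid_df=df[-500:],
    text_cols='smiles', label_cols='label', vocab=data_lm.vocab, bs=64)

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5,
                                pretrained=False)
learn.load_encoder('mspm_encoder')               # reuse the pretrained encoder
learn.freeze()                                   # train the classifier head first
learn.fit_one_cycle(1, 2e-2)
learn.freeze_to(-2)                              # gradually unfreeze...
learn.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))
learn.unfreeze()                                 # ...then train the full model
learn.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))
```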

Pre-trained Models Download

  1. Download ChEMBL_1M_atom. See notebooks/05_Pretrained_Models.ipynb for usage instructions.

    • This model is trained on 1M ChEMBL molecules with the atomwise tokenization method (the original MolPMoFiT).
  2. Download ChEMBL_1M_SPE. See notebooks/06_SPE_Pretrained_Models.ipynb for usage instructions.

    • This model is trained on 1M ChEMBL molecules with the SMILES Pair Encoding tokenization method.
    • SMILES Pair Encoding (SmilesPE) is a data-driven substructure tokenization algorithm for deep learning; the two tokenization schemes are contrasted in the sketch below.
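
The two tokenization schemes can be compared with the SmilesPE package (pip install SmilesPE). The vocabulary file below is the ChEMBL-trained code file shipped with SmilesPE and is assumed to be available locally:

```python
# Compare atomwise tokenization with SMILES Pair Encoding (SmilesPE package).
import codecs
from SmilesPE.pretokenizer import atomwise_tokenizer
from SmilesPE.tokenizer import SPE_Tokenizer

smi = 'CC[N+](C)(C)Cc1ccccc1Br'

# One token per atom or bond symbol.
print(atomwise_tokenizer(smi))
# ['C', 'C', '[N+]', '(', 'C', ')', '(', 'C', ')', 'C', 'c', '1', ..., 'Br']

# Frequent multi-atom substrings merged into single tokens (path is assumed).
spe = SPE_Tokenizer(codecs.open('SPE_ChEMBL.txt'))
print(spe.tokenize(smi))
# 'CC [N+](C) (C)C c1ccccc1 Br'
```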

