# MolPMoFiT
Implementation of the paper *Inductive Transfer Learning for Molecular Activity Prediction: Next-Gen QSAR Models with MolPMoFiT*.
Molecular Prediction Model Fine-Tuning (MolPMoFiT) is a transfer learning method based on self-supervised pre-training + task-specific fine-tuning for QSPR/QSAR modeling.
MolPMoFiT is adapted from ULMFiT and implemented with PyTorch and Fastai v1. A large-scale molecular structure prediction model is first pre-trained on one million unlabeled molecules from ChEMBL in a self-supervised manner; it can then be fine-tuned for various QSPR/QSAR tasks on smaller chemical datasets with specific endpoints.
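As a rough illustration of the first, self-supervised stage, the Fastai v1 pipeline looks something like the sketch below. This is a minimal sketch, not the repo's exact code: the file names, column names, and hyperparameters are hypothetical, and the actual notebooks add a custom SMILES tokenizer and SMILES augmentation on top of this.

```python
# Minimal sketch of stage 1 (general-domain MSPM pre-training) with Fastai v1.
# File names, column names, and hyperparameters are illustrative only.
import pandas as pd
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

train_df = pd.read_csv('data/MSPM/train.csv')  # hypothetical file with a 'smiles' column
valid_df = pd.read_csv('data/MSPM/valid.csv')

# Build a language-model DataBunch over raw SMILES strings.
data_lm = TextLMDataBunch.from_df(path='.', train_df=train_df, valid_df=valid_df,
                                  text_cols='smiles')

# AWD-LSTM language model trained from scratch to predict the next SMILES token.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, pretrained=False)
learn.fit_one_cycle(10, 3e-3)
learn.save('mspm_general')          # full language model for later fine-tuning
learn.save_encoder('mspm_encoder')  # encoder reused by downstream QSAR models
```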
We recommend building the environment with Conda:

```bash
conda env create -f molpmofit.yml
```
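Once the environment is created and activated (the environment name comes from the yml; `molpmofit` is assumed here), a quick sanity check that the expected Fastai v1 API is on the path:

```python
# Verify the environment: MolPMoFiT targets Fastai v1, whose API differs
# substantially from Fastai 2.x, so check versions before running the notebooks.
import torch, fastai
print(torch.__version__, fastai.__version__)  # expect a 1.x Fastai release
```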
We provide all the datasets needed to reproduce the experiments in the `data` folder:

- `data/MSPM` contains the dataset used to train the general-domain molecular structure prediction model.
- `data/QSAR` contains the datasets for the QSAR tasks.
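To see exactly which files ship in these folders, a quick listing from the repo root (this assumes nothing about file names or formats):

```python
# Enumerate every data file bundled with the repository.
from pathlib import Path

for f in sorted(Path('data').rglob('*')):
    if f.is_file():
        print(f)
```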
The code is provided as Jupyter notebooks in the `notebooks` folder. All the code was developed on an Ubuntu 18.04 workstation with two Quadro P4000 GPUs.
- `01_MSPM_Pretraining.ipynb`: trains the general-domain molecular structure prediction model (MSPM).
- `02_MSPM_TS_finetuning.ipynb`: (1) fine-tunes the general MSPM on a target dataset to produce a task-specific MSPM; (2) fine-tunes the task-specific MSPM to train a QSAR model (a sketch of this workflow follows the list).
- `03_QSAR_Classifcation.ipynb`: fine-tunes the general-domain MSPM to train a classification model.
- `04_QSAR_Regression.ipynb`: fine-tunes the general-domain MSPM to train a regression model.
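The fine-tuning workflow of notebooks 02 and 03 boils down to the two ULMFiT-style steps below. Again, a minimal sketch under assumptions: the endpoint file, column names, split, and hyperparameters are hypothetical, and vocabulary handling is abbreviated.

```python
# Minimal sketch of stages 2-3: fine-tune the general MSPM on the target
# SMILES, then train a QSAR classifier on top of its encoder.
import pandas as pd
from fastai.text import (TextLMDataBunch, TextClasDataBunch, AWD_LSTM,
                         language_model_learner, text_classifier_learner)

df = pd.read_csv('data/QSAR/my_endpoint.csv')  # hypothetical endpoint file
train_df, valid_df = df[:-200], df[-200:]      # illustrative split

# Stage 2: fine-tune the general MSPM on target-domain SMILES.
data_lm = TextLMDataBunch.from_df('.', train_df, valid_df, text_cols='smiles')
# NOTE: for the saved weights to line up, the vocab from pre-training must be
# reused here (pass vocab=... to from_df); omitted to keep the sketch short.
lm = language_model_learner(data_lm, AWD_LSTM, pretrained=False)
lm.load('mspm_general')            # weights saved during pre-training
lm.fit_one_cycle(5, 1e-3)
lm.save_encoder('mspm_task_specific')

# Stage 3: train the QSAR classifier on the fine-tuned encoder.
data_clas = TextClasDataBunch.from_df('.', train_df, valid_df,
                                      text_cols='smiles', label_cols='active',
                                      vocab=data_lm.vocab)
clf = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clf.load_encoder('mspm_task_specific')
clf.freeze()                       # train only the new classifier head first
clf.fit_one_cycle(4, 1e-2)
clf.unfreeze()                     # then unfreeze the rest, as in ULMFiT
clf.fit_one_cycle(4, slice(1e-4, 1e-2))
```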
- Download `ChEMBL_1M_atom`. See `notebooks/05_Pretrained_Models.ipynb` for usage instructions. This model is trained on 1M ChEMBL molecules with the atomwise tokenization method (original MolPMoFiT).
- Download `ChEMBL_1M_SPE`. See `notebooks/06_SPE_Pretrained_Models.ipynb` for usage instructions. This model is trained on 1M ChEMBL molecules with the SMILES Pair Encoding tokenization method.
  - SMILES Pair Encoding (SmilesPE) is a data-driven substructure tokenization algorithm for deep learning.
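To make the difference between the two tokenization schemes concrete, here is a small example using the SmilesPE package (`pip install SmilesPE`). It assumes you have downloaded the trained `SPE_ChEMBL.txt` vocabulary distributed by the SmilesPE project; the outputs in the comments follow its documentation.

```python
# Contrast of the two tokenization schemes used by the pretrained models.
import codecs
from SmilesPE.pretokenizer import atomwise_tokenizer
from SmilesPE.tokenizer import SPE_Tokenizer

smi = 'CC[N+](C)(C)Cc1ccccc1Br'

# Atomwise: one token per atom/bond/branch symbol.
print(atomwise_tokenizer(smi))
# ['C', 'C', '[N+]', '(', 'C', ')', '(', 'C', ')', 'C', 'c', '1', 'c', ...]

# SMILES Pair Encoding: frequent substructures become single tokens.
spe = SPE_Tokenizer(codecs.open('SPE_ChEMBL.txt'))
print(spe.tokenize(smi))
# 'CC [N+](C) (C)C c1ccccc1 Br'
```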