CYF2000127/MolNexTR (Public)


This is the official code of the paper "MolNexTR: a generalized deep learning model for molecular image recognition"

License

Apache-2.0 license


MolNexTR

This is the official code of the paper "MolNexTR: a generalized deep learning model for molecular image recognition".

✨ Highlights

In this work, we propose MolNexTR, a novel graph generation model. The model follows an encoder-decoder architecture: it takes three-channel molecular images as input and outputs a predicted molecular graph structure, which can be easily converted to SMILES. We aim to improve the robustness and generalization of molecular structure recognition by strengthening both the model's feature extraction ability and the augmentation strategy, so that it can deal with any molecular image that may appear in real literature.

Overview of our MolNexTR model.

🚀 Using the code and the model

Using the code

Clone the repository:

git clone https://github.com/CYF2000127/MolNexTR

Example usage of the model

First, create and activate a conda environment with the following commands on Linux, Windows, or macOS (Linux is recommended):

conda create -n molnextr python=3.8
conda activate molnextr

Then install the requirements:

pip install -r requirements.txt

Alternatively, directly use the following command:

conda env create -f environment.yml

Download the model checkpoint from our Hugging Face Repo or Zenodo Repo and put it in your own path.
Run the following code to predict molecular images:

import torch
from MolNexTR import molnextr

Image = './examples/1.png'
Model = './checkpoints/molnextr_best.pth'
device = torch.device('cpu')
model = molnextr(Model, device)
predictions = model.predict_final_results(Image, return_atoms_bonds=True)
print(predictions)

Alternatively, use prediction.ipynb. You can also change the image and model paths to point to your own images and models.

The input is a molecular image:

Example input molecular image.

The output dictionary includes the atom sets, bond sets, predicted MolFile, and predicted SMILES:

{
    'atom_sets': [
        {'atom_number': '0', 'symbol': 'Ph', 'coords': (0.143, 0.349)},
        {'atom_number': '1', 'symbol': 'C', 'coords': (0.286, 0.413)},
        {'atom_number': '2', 'symbol': 'C', 'coords': (0.429, 0.349)},
        ...
    ],
    'bonds_sets': [
        {'atom_number': '0', 'bond_type': 'single', 'endpoints': (0, 1)},
        {'atom_number': '1', 'bond_type': 'double', 'endpoints': (1, 2)},
        {'atom_number': '1', 'bond_type': 'single', 'endpoints': (1, 5)},
        {'atom_number': '2', 'bond_type': 'single', 'endpoints': (2, 3)},
        ...
    ],
    'predicted_molfile': '2D\n\n 11 12  0  0  0  0  0  0  0  0999 V2000 ...',
    'predicted_smiles': 'COC1CCCc2oc(-c3ccccc3)cc21'
}
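As an illustration of how this dictionary might be consumed downstream, the following minimal sketch builds an adjacency list from the atom and bond sets. The sample data is a subset of the example output above; `build_adjacency` is a hypothetical helper written for this sketch and is not part of the MolNexTR package.

```python
from collections import defaultdict

# Subset of the example output shown above (entries after "..." omitted).
predictions = {
    'atom_sets': [
        {'atom_number': '0', 'symbol': 'Ph', 'coords': (0.143, 0.349)},
        {'atom_number': '1', 'symbol': 'C', 'coords': (0.286, 0.413)},
        {'atom_number': '2', 'symbol': 'C', 'coords': (0.429, 0.349)},
    ],
    'bonds_sets': [
        {'atom_number': '0', 'bond_type': 'single', 'endpoints': (0, 1)},
        {'atom_number': '1', 'bond_type': 'double', 'endpoints': (1, 2)},
    ],
    'predicted_smiles': 'COC1CCCc2oc(-c3ccccc3)cc21',
}


def build_adjacency(pred):
    """Hypothetical helper: map each atom index to [(neighbor, bond_type), ...]."""
    adjacency = defaultdict(list)
    for bond in pred['bonds_sets']:
        a, b = bond['endpoints']
        # Bonds are undirected, so record each one from both endpoints.
        adjacency[a].append((b, bond['bond_type']))
        adjacency[b].append((a, bond['bond_type']))
    return dict(adjacency)


# 'atom_number' is stored as a string in the output, so convert it to int.
symbols = {int(atom['atom_number']): atom['symbol']
           for atom in predictions['atom_sets']}
adjacency = build_adjacency(predictions)

for idx in sorted(adjacency):
    neighbors = ', '.join(f"{symbols.get(n, '?')}{n} ({t})"
                          for n, t in adjacency[idx])
    print(f"{symbols.get(idx, '?')}{idx}: {neighbors}")
```

Keeping the graph as a plain dictionary avoids any extra dependency; for chemistry-aware processing, the predicted MolFile or SMILES string is likely the more convenient starting point.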

🔥 Experiments

Data preparation

For training and inference, please download the following datasets to your own path.

Training datasets

Synthetic: PubChem
Realistic: USPTO

Testing datasets

Synthetic: Indigo, ChemDraw
Realistic: CLEF, UOB, USPTO, JPO, Staker, ACS
Perturbed by image transform: CLEF, UOB, USPTO, JPO, Staker, ACS
Perturbed by curved arrows: CLEF, UOB, USPTO, JPO, Staker, ACS

Train

Note: we recommend training the model on Linux. Run the following command:

sh ./exps/train.sh

The default batch size is 256, and training takes about 20 hours on 10 NVIDIA RTX 3090 GPUs. Please modify the corresponding parameters according to your hardware configuration.

Inference

Run the following command:

sh ./exps/eval.sh

The default batch size is 32 on a single NVIDIA RTX 3090 GPU. Please modify the corresponding parameters according to your hardware configuration. The outputs include the main metrics we used, such as SMILES and graph exact matching accuracy.
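To make the idea of exact matching accuracy concrete, here is a minimal sketch that computes the fraction of predictions identical to their references. It is an assumption-laden illustration, not the logic in evaluate.py: `exact_match_accuracy` and its `canonicalize` parameter are hypothetical names, and by default the strings are compared as-is, whereas a real evaluation would normally canonicalize SMILES with a cheminformatics toolkit first.

```python
def exact_match_accuracy(predicted, reference, canonicalize=None):
    """Fraction of predictions that equal their reference after normalization.

    `canonicalize` is an optional callable (hypothetical parameter) mapping a
    SMILES string to a canonical form; without it, only whitespace is stripped.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    normalize = canonicalize or (lambda s: s.strip())
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predicted, reference))
    return hits / len(reference)


# Illustrative data, not results from the paper.
preds = ['CCO', 'c1ccccc1', 'CC(=O)O']
refs = ['CCO', 'c1ccccc1', 'CC(O)=O']
print(exact_match_accuracy(preds, refs))
```

In the sample data, the third pair describes the same molecule written two ways, so plain string comparison counts it as a miss; this is why a canonicalization step matters for SMILES-level metrics.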

Prediction

Run the following command:

python prediction.py --model_path your_model_path --image_path your_image_path
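The command above handles one image at a time. For a folder of images, a small loop along the following lines may be convenient. This is a sketch under stated assumptions: `predict_folder` and `fake_predict` are hypothetical names introduced here, and the stand-in predictor exists only so the example runs without a checkpoint.

```python
from pathlib import Path


def predict_folder(predict_fn, image_dir, pattern='*.png'):
    """Hypothetical helper: apply predict_fn to every matching image in image_dir.

    Returns a dict mapping each file name to whatever predict_fn returns.
    """
    results = {}
    for image_path in sorted(Path(image_dir).glob(pattern)):
        results[image_path.name] = predict_fn(str(image_path))
    return results


def fake_predict(path):
    """Stand-in predictor so the sketch runs without the model checkpoint."""
    return {'predicted_smiles': None, 'source': path}


# With the real model loaded as in the usage example, one would instead pass
# something like:
#   lambda p: model.predict_final_results(p, return_atoms_bonds=True)
for name, output in predict_folder(fake_predict, './examples').items():
    print(name, output['predicted_smiles'])
```

Passing the predictor in as a callable keeps the loop independent of how the model is loaded, so the same helper works on CPU or GPU.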

Visualization

Use visualization.ipynb to visualize the ground truth and the predictions.

We also show some qualitative results of our method below:

Qualitative results of our method on ACS.

Qualitative results of our method on some hand-drawn molecular images.

✅ Citation

Chen, Yufan, et al. "MolNexTR: a generalized deep learning model for molecular image recognition." Journal of Cheminformatics 16.1 (2024): 141.

@article{chen2024molnextr,
  title={MolNexTR: a generalized deep learning model for molecular image recognition},
  author={Chen, Yufan and Leung, Ching Ting and Huang, Yong and Sun, Jianwei and Chen, Hao and Gao, Hanyu},
  journal={Journal of Cheminformatics},
  volume={16},
  number={1},
  pages={141},
  year={2024},
  publisher={Springer}
}
