This is the official code of the paper "MolNexTR: a generalized deep learning model for molecular image recognition"
CYF2000127/MolNexTR
In this work, we propose MolNexTR, a novel graph generation model. The model follows an encoder-decoder architecture: it takes three-channel molecular images as input and outputs a molecular graph structure prediction that can be easily converted to SMILES. We aim to improve the robustness and generalization of molecular structure recognition by strengthening both the model's feature extraction and the data augmentation strategy, so that it can handle the variety of molecular images that appear in real literature.
Clone the repository:
git clone https://github.com/CYF2000127/MolNexTR
- First, create and activate a conda environment with the following commands on Linux, Windows, or macOS (Linux is recommended):
conda create -n molnextr python=3.8
conda activate molnextr
- Then install the requirements:
pip install -r requirements.txt
Alternatively, directly use the following command:
conda env create -f environment.yml
Download the model checkpoint from our Hugging Face Repo or Zenodo Repo and put it in your own path.
Run the following code to predict molecular images:
import torch
from MolNexTR import molnextr
# Paths to the input image and the downloaded checkpoint
Image = './examples/1.png'
Model = './checkpoints/molnextr_best.pth'
device = torch.device('cpu')
model = molnextr(Model, device)
predictions = model.predict_final_results(Image, return_atoms_bonds=True)
print(predictions)
or use prediction.ipynb. You can also change the image and model paths to point to your own images and models.
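To run the same prediction over a whole folder of images, a small wrapper along the following lines may help. This is only a sketch: `predict_folder` and its parameters are hypothetical names that are not part of this repository, and the prediction function is passed in as a callable so the helper does not assume anything about the model beyond what the snippet above shows.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def predict_folder(
    predict_fn: Callable[[str], Any],
    folder: str,
    extensions: Iterable[str] = (".png", ".jpg", ".jpeg"),
) -> Dict[str, Any]:
    """Apply predict_fn to every image file in folder, keyed by file name.

    Hypothetical helper, not part of the MolNexTR code base.
    """
    allowed = {ext.lower() for ext in extensions}
    results: Dict[str, Any] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-directories and files that are not images
        if path.is_file() and path.suffix.lower() in allowed:
            results[path.name] = predict_fn(str(path))
    return results


# Hypothetical usage with the `model` object created in the snippet above:
# results = predict_folder(
#     lambda p: model.predict_final_results(p, return_atoms_bonds=True),
#     "./examples",
# )
```

Passing the predictor as a callable keeps the loop independent of the model and makes it easy to try out with a stub function first.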
The input is a molecular image, and the output is a dictionary containing the predicted atoms, bonds, molfile, and SMILES:
{
    'atom_sets': [
        {'atom_number': '0', 'symbol': 'Ph', 'coords': (0.143, 0.349)},
        {'atom_number': '1', 'symbol': 'C', 'coords': (0.286, 0.413)},
        {'atom_number': '2', 'symbol': 'C', 'coords': (0.429, 0.349)},
        ...
    ],
    'bonds_sets': [
        {'atom_number': '0', 'bond_type': 'single', 'endpoints': (0, 1)},
        {'atom_number': '1', 'bond_type': 'double', 'endpoints': (1, 2)},
        {'atom_number': '1', 'bond_type': 'single', 'endpoints': (1, 5)},
        {'atom_number': '2', 'bond_type': 'single', 'endpoints': (2, 3)},
        ...
    ],
    'predicted_molfile': '2D\n\n 11 12 0 0 0 0 0 0 0 0999 V2000 ...',
    'predicted_smiles': 'COC1CCCc2oc(-c3ccccc3)cc21'
}
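As an illustration of how this dictionary might be consumed, the sketch below builds a simple adjacency list from `atom_sets` and `bonds_sets`. The sample data is an abbreviated copy of the output above (so the atoms and bonds do not cover the full molecule in `predicted_smiles`), and `build_adjacency` is a hypothetical helper, not a function provided by this repository.

```python
from collections import defaultdict

# Abbreviated sample copied from the example output above
prediction = {
    "atom_sets": [
        {"atom_number": "0", "symbol": "Ph", "coords": (0.143, 0.349)},
        {"atom_number": "1", "symbol": "C", "coords": (0.286, 0.413)},
        {"atom_number": "2", "symbol": "C", "coords": (0.429, 0.349)},
    ],
    "bonds_sets": [
        {"atom_number": "0", "bond_type": "single", "endpoints": (0, 1)},
        {"atom_number": "1", "bond_type": "double", "endpoints": (1, 2)},
    ],
    "predicted_smiles": "COC1CCCc2oc(-c3ccccc3)cc21",
}


def build_adjacency(pred):
    """Return (symbols, adjacency) for a prediction dictionary.

    symbols maps atom index -> symbol; adjacency maps atom index ->
    list of (neighbour index, bond type). Hypothetical helper.
    """
    # 'atom_number' is stored as a string, so convert it to int
    symbols = {int(a["atom_number"]): a["symbol"] for a in pred["atom_sets"]}
    adjacency = defaultdict(list)
    for bond in pred["bonds_sets"]:
        i, j = bond["endpoints"]
        # Bonds are undirected, so record both directions
        adjacency[i].append((j, bond["bond_type"]))
        adjacency[j].append((i, bond["bond_type"]))
    return symbols, dict(adjacency)


symbols, adjacency = build_adjacency(prediction)
print(prediction["predicted_smiles"])
for idx in sorted(adjacency):
    print(idx, symbols[idx], adjacency[idx])
```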
For training and inference, please download the following datasets to your own path.
- Synthetic: Indigo, ChemDraw
- Realistic: CLEF, UOB, USPTO, JPO, Staker, ACS
- Perturbed by image transform: CLEF, UOB, USPTO, JPO, Staker, ACS
- Perturbed by curved arrows: CLEF, UOB, USPTO, JPO, Staker, ACS
Note: we recommend using Linux to train the model. Run the following command:
sh ./exps/train.sh
The default batch size is set to 256, and training takes about 20 hours with 10 NVIDIA RTX 3090 GPUs. Please modify the corresponding parameters according to your hardware configuration.
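When fewer GPUs are available, one common way to keep the same effective batch size is gradient accumulation. The arithmetic is sketched below under two assumptions that the text above does not confirm: that 256 is the total batch size across all GPUs, and that the training script can be configured to accumulate gradients. `accumulation_steps` is a hypothetical helper; check `exps/train.sh` for the options that actually exist.

```python
import math


def accumulation_steps(effective_batch: int, n_gpus: int, per_gpu_batch: int) -> int:
    """Smallest number of accumulation steps such that
    n_gpus * per_gpu_batch * steps >= effective_batch."""
    if min(effective_batch, n_gpus, per_gpu_batch) <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(effective_batch / (n_gpus * per_gpu_batch))


# Example: 2 GPUs that each fit 16 images per step
print(accumulation_steps(256, 2, 16))  # 8
```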
Run the following command:
sh ./exps/eval.sh
The default batch size is set to 32 for a single NVIDIA RTX 3090 GPU. Please modify the corresponding parameters according to your hardware configuration. The outputs include the main metrics we used, such as SMILES and graph exact matching accuracy.
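For readers unfamiliar with the metric, exact matching accuracy is the fraction of predictions that are identical to their reference. The sketch below illustrates the idea for SMILES strings; it is not the evaluation code used by `exps/eval.sh`, and `exact_match_accuracy` and its `canonicalize` parameter are hypothetical names.

```python
from typing import Callable, Optional, Sequence


def exact_match_accuracy(
    predicted: Sequence[str],
    reference: Sequence[str],
    canonicalize: Optional[Callable[[str], str]] = None,
) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    # Without a canonicalizer, only strip surrounding whitespace
    norm = canonicalize or (lambda s: s.strip())
    hits = sum(norm(p) == norm(r) for p, r in zip(predicted, reference))
    return hits / len(reference)


# Plain string comparison: the two benzene spellings do not match
print(exact_match_accuracy(["CCO", "c1ccccc1"], ["CCO", "C1=CC=CC=C1"]))  # 0.5
```

Because the same molecule can be written as different SMILES strings, a meaningful comparison needs a canonicalizer. One option, if RDKit is installed, is to pass a function that returns `Chem.MolToSmiles(Chem.MolFromSmiles(s))`, with extra handling for strings that fail to parse.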
Run the following command:
python prediction.py --model_path your_model_path --image_path your_image_path
Use visualization.ipynb to visualize the ground truth and the predictions.
We also show some qualitative results of our method below:
Qualitative results of our method on some hand-drawn molecular images.
Chen, Yufan, et al. "MolNexTR: a generalized deep learning model for molecular image recognition." Journal of Cheminformatics 16.1 (2024): 141.
@article{chen2024molnextr,
  title={MolNexTR: a generalized deep learning model for molecular image recognition},
  author={Chen, Yufan and Leung, Ching Ting and Huang, Yong and Sun, Jianwei and Chen, Hao and Gao, Hanyu},
  journal={Journal of Cheminformatics},
  volume={16},
  number={1},
  pages={141},
  year={2024},
  publisher={Springer}
}