[ICLR 2022] Data-Efficient Graph Grammar Learning for Molecular Generation


This repository contains the implementation code for the paper Data-Efficient Graph Grammar Learning for Molecular Generation (ICLR 2022, oral).

In this work, we propose a data-efficient generative model (DEG) that can be learned from datasets orders of magnitude smaller than common benchmarks. At the heart of this method is a learnable graph grammar that generates molecules from a sequence of production rules. Our learned graph grammar yields state-of-the-art results on generating high-quality molecules for three monomer datasets that contain only ∼20 samples each.

(Figure: overview of the DEG pipeline)
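To make the "sequence of production rules" idea concrete, here is a minimal sketch (not the paper's implementation) of grammar-based generation: each rule rewrites a nonterminal placeholder in a SMILES-like string. The rule names and fragment strings are hypothetical illustrations only.

```python
# Hypothetical production rules: each maps a nonterminal symbol to a
# replacement fragment containing (possibly) further nonterminals.
RULES = {
    "start":  ("X", "C(=O)X"),  # seed a carbonyl fragment
    "extend": ("X", "CCX"),     # grow the chain by two carbons
    "close":  ("X", "O"),       # terminate with an oxygen
}

def generate(rule_sequence):
    """Apply production rules left to right, rewriting the first
    occurrence of the nonterminal each time."""
    molecule = "X"  # the start symbol
    for name in rule_sequence:
        lhs, rhs = RULES[name]
        molecule = molecule.replace(lhs, rhs, 1)
    return molecule

print(generate(["start", "extend", "close"]))  # C(=O)CCO
```

In DEG the grammar itself is learned from data rather than hand-written as above; this sketch only shows how a rule sequence deterministically expands into a molecule.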

Installation

Prerequisites

  • Retro*: The training of our DEG relies on Retro* to calculate the metric. Follow the instructions here to install.

  • Pretrained GNN: We use this codebase for the pretrained GNN used in our paper. The necessary code & pretrained models are built into the current repo.

Conda

You can use conda to install the dependencies for DEG from the provided environment.yml file, which reproduces the exact Python environment we used for the paper:

```bash
git clone git@github.com:gmh14/data_efficient_grammar.git
cd data_efficient_grammar
conda env create -f environment.yml
conda activate DEG
pip install -e retro_star/packages/mlp_retrosyn
pip install -e retro_star/packages/rdchiral
```

Note: it may take a while for conda to build the necessary wheels.

Install Retro*:

  • Download and unzip the files from this link, and put all the folders (dataset/, one_step_model/, and saved_models/) under the retro_star directory.

  • Install dependencies:

```bash
conda deactivate
conda env create -f retro_star/environment.yml
conda activate retro_star_env
pip install -e retro_star/packages/mlp_retrosyn
pip install -e retro_star/packages/rdchiral
pip install setproctitle
```

Train

For Acrylates, Chain Extenders, and Isocyanates, run

```bash
conda activate DEG
python main.py --training_data=./datasets/**dataset_path**
```

where **dataset_path** can be acrylates.txt, chain_extenders.txt, or isocyanates.txt.
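A hedged note on the expected input: assuming each dataset file contains one SMILES string per line (∼20 lines for the monomer datasets), a loader like the following sketch would read it. This is an illustration of the assumed format, not the repo's own code.

```python
def load_smiles(path):
    """Read one SMILES string per line, skipping blank lines.
    Assumes plain-text files like ./datasets/acrylates.txt."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```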

For the Polymer dataset, run

```bash
conda activate DEG
python main.py --training_data=./datasets/polymers_117.txt --motif
```

Since Retro* is a major bottleneck of the training speed, we separate it from the main process, run multiple Retro* processes, and use file communication to evaluate the generated grammar during training. This works around the inefficiency of Python's built-in multiprocessing package. Run the following command in another terminal window:

```bash
conda activate retro_star_env
bash retro_star_listener.sh **num_processes**
```

Note: opening multiple Retro* processes is EXTREMELY memory-consuming (~5 GB each). We suggest starting with a single process via bash retro_star_listener.sh 1, monitoring the memory usage, and then increasing the number accordingly to maximize efficiency. We use 35 in the paper.
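The file-communication pattern described above can be sketched as follows: the trainer drops a candidate molecule into a request file and polls for a result file that a listener process writes back. This is an illustrative sketch only; the filenames (`request.txt`, `result.txt`) and protocol are assumptions, not the repo's actual interface.

```python
import os
import time

def submit_and_wait(smiles, workdir, timeout=60.0, poll=0.5):
    """Write a request for the listener, then poll for its result.
    Hypothetical protocol: request.txt holds the SMILES string,
    result.txt holds a single float score written by the listener."""
    req = os.path.join(workdir, "request.txt")
    res = os.path.join(workdir, "result.txt")
    with open(req, "w") as f:
        f.write(smiles)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(res):
            with open(res) as f:
                score = float(f.read().strip())
            os.remove(res)  # consume the result so the next poll is clean
            return score
        time.sleep(poll)
    raise TimeoutError("no Retro* listener responded")
```

Compared with passing objects through multiprocessing queues, plain files let the heavyweight Retro* processes live in a separate conda environment and be restarted independently of the trainer.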

After training finishes, to kill all the spawned Retro*-related processes, run

```bash
killall retro_star_listener
```

Use DEG

Download and unzip the log & checkpoint files from this link. See visualization.ipynb for more details.

Acknowledgements

The implementation of DEG is partly based on Molecular Optimization Using Molecular Hypergraph Grammar and Hierarchical Generation of Molecular Graphs using Structural Motifs.

Citation

If you find the idea or code useful for your research, please cite our paper:

```bibtex
@inproceedings{guo2021data,
  title={Data-Efficient Graph Grammar Learning for Molecular Generation},
  author={Guo, Minghao and Thost, Veronika and Li, Beichen and Das, Payel and Chen, Jie and Matusik, Wojciech},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
```

Contact

Please contact guomh2014@gmail.com if you have any questions. Enjoy!
