[ICLR 2024] Domain-Agnostic Molecular Generation with Chemical Feedback
- `2024-2` We've released ChatCell, a new paradigm that leverages natural language to make single-cell analysis more accessible and intuitive. Please visit our homepage and GitHub page for more information.
- `2024-1` Our paper Domain-Agnostic Molecular Generation with Chemical Feedback is accepted by ICLR 2024.
- `2024-1` Our paper Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models is accepted by ICLR 2024.
- `2023-10` We open-source MolGen-7b, which now supports de novo molecule generation!
- `2023-6` We open-source KnowLM, a knowledgeable LLM framework with pre-training and instruction fine-tuning code (supports multi-machine multi-GPU setup).
- `2023-6` We release Mol-Instructions, a large-scale biomolecule instruction dataset for large language models.
- `2023-5` We propose Knowledge graph-enhanced molecular contrAstive learning with fuNctional prOmpt (KANO), published in Nature Machine Intelligence, exploiting fundamental domain knowledge in both pre-training and fine-tuning.
- `2023-4` We provide an NLP-for-science paper list at https://github.com/zjunlp/NLP4Science_Papers.
- `2023-3` We release our pre-trained and fine-tuned models on 🤗 Hugging Face: MolGen-large and MolGen-large-opt.
- `2023-2` We provide a demo on 🤗 Hugging Face at Space.
To run the code, you can configure dependencies by restoring our environment:

```shell
conda env create -f environment.yaml
```

and then activate it:

```shell
conda activate my_env
```
You can download the pre-trained and fine-tuned models via Hugging Face: MolGen-large and MolGen-large-opt.
You can also download the models from the following link: https://drive.google.com/drive/folders/1Eelk_RX1I26qLa9c4SZq6Tv-AAbDXgrW?usp=sharing
Moreover, the dataset used for downstream tasks can be found here.
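As a quick sanity check after downloading, the Hub checkpoints can be loaded with the standard `transformers` API. This is an illustrative sketch, not repo code: the Hub id `zjunlp/MolGen-large` and the seq2seq auto-class are assumptions based on the model names above.

```python
# Illustrative sketch (assumption: MolGen-large is a seq2seq checkpoint hosted
# on the Hugging Face Hub as zjunlp/MolGen-large; adjust the id if it differs).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


def load_molgen(model_id="zjunlp/MolGen-large"):
    """Fetch the tokenizer and model from the Hub (network access required)."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model
```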
The expected structure of files is:
```
moldata
├── checkpoint
│   ├── molgen.pkl           # pre-trained model
│   ├── syn_qed_model.pkl    # fine-tuned model for QED optimization on synthetic data
│   ├── syn_plogp_model.pkl  # fine-tuned model for p-logP optimization on synthetic data
│   ├── np_qed_model.pkl     # fine-tuned model for QED optimization on natural product data
│   └── np_plogp_model.pkl   # fine-tuned model for p-logP optimization on natural product data
├── finetune
│   ├── np_test.csv          # natural product test data
│   ├── np_train.csv         # natural product train data
│   ├── plogp_test.csv       # synthetic test data for p-logP optimization
│   ├── qed_test.csv         # synthetic test data for QED optimization
│   └── zinc250k.csv         # synthetic train data
├── generate                 # generated molecules
├── output                   # molecule candidates
└── vocab_list
    └── zinc.npy             # SELFIES alphabet
```
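If you assemble `moldata` by hand, a short stdlib script (illustrative only, not part of the repo) can create the empty skeleton that the downloaded checkpoints and CSVs drop into; the directory names are taken from the tree above.

```python
# Illustrative helper: scaffold the top-level "moldata" layout shown above.
from pathlib import Path


def scaffold(root="moldata"):
    """Create the expected subdirectories and return their sorted names."""
    for sub in ("checkpoint", "finetune", "generate", "output", "vocab_list"):
        Path(root, sub).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in Path(root).iterdir())
```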
- First, preprocess the fine-tuning dataset by generating candidate molecules with our pre-trained model. The preprocessed data will be stored in the folder `output`.

```shell
cd MolGen
bash preprocess.sh
```

- Then apply the self-feedback paradigm. The fine-tuned model will be stored in the folder `checkpoint`.

```shell
bash finetune.sh
```
To generate molecules, run the script below, and set `checkpoint_path` to choose between the pre-trained model and a fine-tuned model.

```shell
cd MolGen
bash generate.sh
```
We conduct experiments on well-known benchmarks to confirm MolGen's optimization capabilities, encompassing penalized logP, QED, and molecular docking properties. For detailed experimental settings and analysis, please refer to our paper.
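For intuition about the QED objective used in these benchmarks, the score can be computed directly with RDKit (assumed to be installed, as in most cheminformatics environments; the aspirin SMILES below is just an example input, not drawn from our datasets):

```python
# Hedged example: QED (quantitative estimate of drug-likeness) via RDKit.
from rdkit import Chem
from rdkit.Chem import QED

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, as an example
score = QED.qed(mol)  # QED is bounded in (0, 1); higher is more drug-like
print(round(score, 3))
```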
If you use or extend our work, please cite the paper as follows:
```bibtex
@inproceedings{fang2023domain,
  author    = {Yin Fang and Ningyu Zhang and Zhuo Chen and Xiaohui Fan and Huajun Chen},
  title     = {Domain-Agnostic Molecular Generation with Chemical Feedback},
  booktitle = {{ICLR}},
  publisher = {OpenReview.net},
  year      = {2024},
  url       = {https://openreview.net/pdf?id=9rPyHyjfwP}
}
```