Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[NeurIPS 2022] Code for "Retrieve, Reason, and Refine: Generating Accurate and Faithful Discharge/Patient Instructions"

License

NotificationsYou must be signed in to change notification settings

AI-in-Health/Patient-Instructions

Repository files navigation

DOIGitHub Repo starsGitHub Repo forks

Code for our paper published in NeurIPS 2022[arXiv]:

Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions

(a.k.a.,Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine)

Fenglin Liu, Bang Yang, Chenyu You, Xian Wu, Shen Ge, Zhangdaihong Liu, Xu Sun*, Yang Yang*, and David A. Clifton.

Updates

  • [22-10-25]: We release the code and data.

Clone the repo

git clone https://github.com/AI-in-Hospitals/Patient-Instructions.git# clone the following repo to calculate automatic metricscd Patient-Instructiongit clone https://github.com/ruotianluo/coco-caption.git

Environment

conda create -n pi python==3.9conda activate pipip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.htmlpip install transformers==4.10.0pip install pytorch-lightning==1.5.1pip install pandas rouge scipy# if you want to re-produce our data preparation processpip install scikit-learn plotly

Higher version oftorch andcuda can also work.

Download the data

As we can not re-distribute the rawMIMIC-III data, we release only our pre-processed dataset used in the paper atGoogle Drive (data.zip, 132MB). After downloading, unzip the data and place it like the structure below:

Patient-Instructions/ # the root of the repo    data    ├── README.md    ├── prepare_dataset.ipynb    ├── prepare_subtasks.ipynb    ├── diagnose-procedure-medication    │   ├── admDxMap_mimic3.pk      # Source: D_ICD_DIAGNOSES.csv    │   ├── admMedMap_mimic3.pk     # Source: prescriptions.csv    │   ├── admPrMap_mimic3.pk      # Source: D_ICD_PROCEDURES.csv    │   └── readme.txt    ├── splits                      # Source: NOTEEVENTS.csv    │   ├── train.csv               # obtained by data/prepare_dataset.ipynb    │   ├── val.csv                 # obtained by data/prepare_dataset.ipynb    │   ├── test.csv                # obtained by data/prepare_dataset.ipynb    │   └── subtasks                    │       ├── age                 # Source: NOTEEVENTS.csv    │       │   └── ...             # obtained by data/prepare_subtasks.ipynb    │       ├── sex                 # Source: NOTEEVENTS.csv    │       │   └── ...             # obtained by data/prepare_subtasks.ipynb    │       └── diseases            # Source: NOTEEVENTS.csv, D_ICD_DIAGNOSES.csv    │           └── ...             # obtained by data/prepare_subtasks.ipynb    └── vocab                      ├── special_tokens_map.json # obtained by data/prepare_dataset.ipynb       ├── tokenizer_config.json   # obtained by data/prepare_dataset.ipynb       └── vocab.txt               # obtained by data/prepare_dataset.ipynb

We also provide insturctions to re-produce our data preparation process indata/README.md.

Pretreatments

Run the following codes to prepare some necessary files:

# Generate the adjacent matrix of all unique digonosis, medication, and procedure codes# This is essential if we want to use knowledge graph to assist PI-Writerpython pretreatments/prepare_codes_adjacent_matrix.py# Get top-300 most similar admission records from the training set for each query hospital admission# This is essential if we want to retrieve historical PIs to assist PI-Writerpython pretreatments/prepare_relevant_info.py# Use bert-base-uncased to extract sentence-level embeddings of PIs# We apply max pooling on the word embs of the last layerpython pretreatments/extract_instruction_embs.py

Training

Here are some key argument to runtrain.py:

  • gpus: specify the number of gpus;
  • batch_size: specify the number of samples in a batch;
  • accumulate_grad_batches: use it if you don't have much gpu memory;
  • arch: specify the architecture, can be eithersmall (hidden size = 256) orbase (hidden size = 512). Seeconfigs/archs;
  • setup: specify which setup to use. See options inconfig/setups.yaml, where we provide setups for model variants such as Transformer-basedtransformer andtransformer_Full and LSTM-basedlstm andlstm_Full.

Here are some examples:

python train.py --gpus 8 --batch_size 8 --arch base --setup transformerpython train.py --gpus 8 --batch_size 8 --arch base --setup transformer_Fullpython train.py --gpus 8 --batch_size 4 --accumulate_grad_batches 2 --arch base --setup transformer_Fullpython train.py --gpus 8 --batch_size 8 --arch small --setup lstmpython train.py --gpus 8 --batch_size 8 --arch small --setup lstm_Full

Evaluation

  1. The simplest command below can show you results of automatic metrics (Bleu,METEOR, andROUGE), which will be written to./csv_results/overall.csv.
python translate.py $path_to_model
  1. You can save the generated patient instructions by running:
# The ouput file will be saved to `./inference_results/preds_and_scores.json` in this casepython translate.py $path_to_model --save_json --save_base_path ./inference_results --save_folder "" --json_file_name preds_and_scores.json
  1. You can evaluate the model on subtasks (seedata/README.md for details) by passing the augment--subtask_type:
python translate.py $path_to_model --subtask_type agepython translate.py $path_to_model --subtask_type sexpython translate.py $path_to_model --subtask_type disease

Bugs or Questions?

If you encounter any problems when using the code, or want to report a bug, you can open an issue or email {yangbang@pku.edu.cn,fenglinliu98@pku.edu.cn}. Please try to specify the problem with details so we can help you better and quicker!

Citation

Please consider citing our papers if our code or datasets are useful to your work, thanks sincerely!

@inproceedings{liu2022retrieve,title={Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions},author={Liu, Fenglin and Yang, Bang and You, Chenyu and Wu, Xian and Ge, Shen and Liu, Zhangdaihong and Sun, Xu and Yang, Yang and Clifton, David A.},booktitle={Advances in Neural Information Processing Systems},year={2022}}

Acknowledgements

We borrow some codes fromShivanandroy/simpleT5.

About

[NeurIPS 2022] Code for "Retrieve, Reason, and Refine: Generating Accurate and Faithful Discharge/Patient Instructions"

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

[8]ページ先頭

©2009-2025 Movatter.jp