Code for Findings of EMNLP 2022 short paper "CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model".


AndyChiangSH/CDGP

Code for Findings of EMNLP 2022 short paper "CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model".

🗂 Structure

  • paper/: "CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model"
  • models/: models in CDGP
    • CSG/: models used as the Candidate Set Generator
    • DS/: models used as the Distractor Selector
  • datasets/: datasets for fine-tuning and testing
    • CLOTH.zip: CLOTH dataset
    • DGen.zip: DGen dataset
  • fine-tune/: code for fine-tuning
  • test/: code for testing
    • dis_generator(BERT).py: distractor generator based on BERT
    • dis_generator(SciBERT).py: distractor generator based on SciBERT
    • dis_generator(RoBERTa).py: distractor generator based on RoBERTa
    • dis_generator(BART).py: distractor generator based on BART
    • dis_evaluator.py: distractor evaluator
    • results/: output of the distractor generators
    • evaluations/: output of the distractor evaluator
  • demo.ipynb: code for CDGP demo

❤ Models

Models are available at Hugging Face.

Candidate Set Generator (CSG)

It takes the stem and the answer as input and outputs a candidate set of distractors.

| Models | CLOTH | DGen |
|---|---|---|
| BERT | cdgp-csg-bert-cloth | cdgp-csg-bert-dgen |
| SciBERT | cdgp-csg-scibert-cloth | cdgp-csg-scibert-dgen |
| RoBERTa | cdgp-csg-roberta-cloth | cdgp-csg-roberta-dgen |
| BART | cdgp-csg-bart-cloth | cdgp-csg-bart-dgen |
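
The following is a minimal sketch of how a CSG model might be queried through the Hugging Face `transformers` fill-mask pipeline. The repository ID prefix `AndyChiangSH/`, the `stem [SEP] answer` input layout, the `_` blank marker, and all helper names are assumptions made for illustration; check the model card before relying on them.

```python
from typing import Dict, List

# BERT-style special tokens (assumed; RoBERTa and BART use different ones).
MASK, SEP = "[MASK]", "[SEP]"


def build_csg_input(stem: str, answer: str, blank: str = "_") -> str:
    """Replace the first blank in the stem with a mask token and append the answer.

    The "stem [SEP] answer" layout is an assumption for illustration.
    """
    if blank not in stem:
        raise ValueError("stem must contain the blank marker")
    return f"{stem.replace(blank, MASK, 1)} {SEP} {answer}"


def filter_candidates(predictions: List[Dict], answer: str) -> List[str]:
    """Drop the answer itself and case-insensitive duplicates from predictions."""
    seen = {answer.lower()}
    candidates = []
    for pred in predictions:
        token = pred["token_str"].strip()
        if token.lower() not in seen:
            seen.add(token.lower())
            candidates.append(token)
    return candidates


def generate_candidates(stem: str, answer: str, top_k: int = 10) -> List[str]:
    """Query a CSG model; weights are downloaded from Hugging Face on first use."""
    from transformers import pipeline  # imported lazily: heavy dependency

    unmasker = pipeline(
        "fill-mask",
        model="AndyChiangSH/cdgp-csg-bert-cloth",  # assumed repository ID
        top_k=top_k,
    )
    return filter_candidates(unmasker(build_csg_input(stem, answer)), answer)
```

Calling `generate_candidates("I feel _ now.", "happy")` would return up to `top_k` candidate words with the answer removed; the exact words depend on the model.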

Distractor Selector (DS)

It takes the stem, the answer, and the candidate set of distractors as input and outputs the top 3 distractors.

fastText: cdgp-ds-fasttext
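
The selection step can be pictured with a simplified sketch: score each candidate by a weighted sum of the generator's confidence and its cosine similarity to the answer, then keep the top 3. The toy vectors, the weights, and the function names are hypothetical. The real selector uses fastText embeddings and a feature set that this README does not describe.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_distractors(
    answer_vec: Sequence[float],
    confidences: Dict[str, float],
    vectors: Dict[str, Sequence[float]],
    top_n: int = 3,
    w_conf: float = 0.5,  # hypothetical weight for generator confidence
    w_sim: float = 0.5,   # hypothetical weight for similarity to the answer
) -> List[str]:
    """Rank candidates by a weighted score and return the best top_n."""
    scores = {
        word: w_conf * conf + w_sim * cosine(vectors[word], answer_vec)
        for word, conf in confidences.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy example with 2-dimensional stand-ins for word embeddings.
toy_confidences = {"sad": 0.4, "glad": 0.3, "table": 0.2, "angry": 0.1}
toy_vectors = {
    "sad": [0.9, 0.1],
    "glad": [1.0, 0.0],
    "table": [0.0, 1.0],
    "angry": [0.8, 0.2],
}
top3 = select_distractors([1.0, 0.0], toy_confidences, toy_vectors)
```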

📚 Datasets

Datasets are available at Hugging Face and GitHub.

CLOTH

CLOTH is a collection of nearly 100,000 cloze questions from middle school and high school English exams. Details of the CLOTH dataset are shown below.

| Number of questions | Train | Valid | Test |
|---|---|---|---|
| Middle school | 22056 | 3273 | 3198 |
| High school | 54794 | 7794 | 8318 |
| Total | 76850 | 11067 | 11516 |

You can download the CLOTH dataset from Hugging Face or GitHub.

DGen

DGen is a cloze question dataset that covers multiple domains, including science, vocabulary, common sense, and trivia. It is compiled from a wide variety of datasets, including SciQ, MCQL, and AI2 Science Questions. Details of the DGen dataset are shown below.

| DGen dataset | Train | Valid | Test | Total |
|---|---|---|---|---|
| Number of questions | 2321 | 300 | 259 | 2880 |

You can download the DGen dataset from Hugging Face or GitHub.

📝 Evaluations

The evaluation results of these models as the Candidate Set Generator in CDGP are shown below:

CLOTH

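| Models | P@1 | F1@3 | F1@10 | MRR | NDCG@10 |
|---|---|---|---|---|---|
| cdgp-csg-bert-cloth | 18.50 | 13.80 | 15.37 | 29.96 | 37.82 |
| cdgp-csg-scibert-cloth | 8.10 | 9.13 | 12.22 | 19.53 | 28.76 |
| cdgp-csg-roberta-cloth | 10.50 | 9.83 | 10.25 | 20.42 | 28.17 |
| cdgp-csg-bart-cloth | 14.20 | 11.07 | 11.37 | 24.29 | 31.74 |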
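
For readers unfamiliar with the P@1 and F1@k columns, here is a minimal sketch of the standard definitions for a single question. It assumes binary relevance (a generated distractor either matches a gold distractor or it does not). The function names are hypothetical, and `dis_evaluator.py` may differ in details such as text normalization or averaging. The table values appear to be per-question scores averaged and reported as percentages.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k generated distractors that are gold distractors."""
    hits = sum(1 for d in ranked[:k] if d in gold)
    return hits / k


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of gold distractors recovered within the top-k."""
    hits = sum(1 for d in ranked[:k] if d in gold)
    return hits / len(gold) if gold else 0.0


def f1_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Harmonic mean of precision@k and recall@k (0.0 when both are zero)."""
    p = precision_at_k(ranked, gold, k)
    r = recall_at_k(ranked, gold, k)
    return 2 * p * r / (p + r) if p + r else 0.0


# Toy question: generated ranking versus three gold distractors.
ranked = ["sad", "angry", "table", "tired"]
gold = {"sad", "tired", "upset"}
p_at_1 = precision_at_k(ranked, gold, 1)
f1_at_3 = f1_at_k(ranked, gold, 3)
```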

DGen

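| Models | P@1 | F1@3 | MRR | NDCG@10 |
|---|---|---|---|---|
| cdgp-csg-bert-dgen | 10.81 | 7.72 | 18.15 | 24.47 |
| cdgp-csg-scibert-dgen | 13.13 | 12.23 | 25.12 | 34.17 |
| cdgp-csg-roberta-dgen | 13.13 | 9.65 | 19.34 | 24.52 |
| cdgp-csg-bart-dgen | 8.49 | 8.24 | 16.01 | 22.66 |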
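
The two ranking-aware columns, MRR and NDCG@10, can be sketched for a single question as follows. This uses the common binary-relevance formulation with a log2 position discount; it is an assumption that `dis_evaluator.py` follows the same convention, and the function names are hypothetical.

```python
import math
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first gold distractor in the ranking, or 0.0 if none."""
    for position, distractor in enumerate(ranked, start=1):
        if distractor in gold:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(i + 2)  # i is 0-based, so rank 1 gets log2(2) = 1
        for i, distractor in enumerate(ranked[:k])
        if distractor in gold
    )
    ideal_hits = min(k, len(gold))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Toy question: the first gold distractor appears at rank 2.
ranked = ["table", "sad", "tired"]
gold = {"sad", "tired", "upset"}
rr = reciprocal_rank(ranked, gold)
ndcg = ndcg_at_k(ranked, gold, 3)
```

MRR is then the mean of `reciprocal_rank` over all test questions.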

💡 How to use?

Setup environment

  1. Clone or download this repo.
     git clone https://github.com/AndyChiangSH/CDGP.git
  2. Move into this repo.
     cd ./CDGP/
  3. Set up a virtual environment (Python version: 3.8.8).
     python -m venv CDGP-env
  4. Install the required packages with pip.
     pip install -r requirements.txt

Fine-tune

Our models are fine-tuned on Colab, so you can upload these Jupyter Notebooks to Colab and run them yourself!

Test

Testing runs locally, so you need to download the datasets and models first.

  1. Unzip the CLOTH or DGen datasets in /datasets/.
  2. CSG models are downloaded from Hugging Face automatically when you run the code, so no action is needed.
  3. If you want to use your own CSG model, put it in a new directory, /models/CSG/.
  4. However, you have to download the DS models yourself.
  5. Then, move the DS models into a new directory, /models/DS/.
  6. Run /test/dis_generator(BERT).py to generate the distractors based on BERT.
  7. Run /test/dis_generator(SciBERT).py to generate the distractors based on SciBERT.
  8. Run /test/dis_generator(RoBERTa).py to generate the distractors based on RoBERTa.
  9. Run /test/dis_generator(BART).py to generate the distractors based on BART.
  10. Check the generated results, saved as .json files in /test/results/.
  11. Run /test/dis_evaluator.py to evaluate the generated results.
  12. Check the evaluations, saved as a .csv file in /test/evaluations/.
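
The steps above can be scripted. The sketch below unzips a dataset and builds the commands for the generator and evaluator scripts. Passing the command as a list to `subprocess.run` avoids shell quoting problems with the parentheses in file names such as `dis_generator(BERT).py`. The helper names are hypothetical, and the sketch assumes the scripts are launched from the repository root; they may instead expect `test/` as the working directory.

```python
import subprocess
import sys
import zipfile
from pathlib import Path
from typing import List


def unzip_dataset(zip_path: Path, dest_dir: Path) -> List[str]:
    """Extract a dataset archive into dest_dir and return the archived names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def generator_command(model: str, test_dir: Path = Path("test")) -> List[str]:
    """Build the command for one generator script, e.g. model="BERT"."""
    return [sys.executable, str(test_dir / f"dis_generator({model}).py")]


def run_pipeline(model: str = "BERT", dataset: str = "CLOTH") -> None:
    """Unzip one dataset, generate distractors, then evaluate them.

    Assumes the current working directory is the repository root.
    """
    unzip_dataset(Path("datasets") / f"{dataset}.zip", Path("datasets"))
    subprocess.run(generator_command(model), check=True)
    subprocess.run([sys.executable, str(Path("test") / "dis_evaluator.py")], check=True)
```

Calling `run_pipeline("SciBERT", "DGen")` from the repository root would cover steps 1, 7, and 11. The DS models must still be downloaded and placed in /models/DS/ beforehand.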

📌 Citation

@inproceedings{chiang-etal-2022-cdgp,
    title = "{CDGP}: Automatic Cloze Distractor Generation based on Pre-trained Language Model",
    author = "Chiang, Shang-Hsuan  and
      Wang, Ssu-Cheng  and
      Fan, Yao-Chung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.429",
    pages = "5835--5840",
    abstract = "Manually designing cloze test consumes enormous time and efforts. The major challenge lies in wrong option (distractor) selection. Having carefully-design distractors improves the effectiveness of learner ability assessment. As a result, the idea of automatically generating cloze distractor is motivated. In this paper, we investigate cloze distractor generation by exploring the employment of pre-trained language models (PLMs) as an alternative for candidate distractor generation. Experiments show that the PLM-enhanced model brings a substantial performance improvement. Our best performing model advances the state-of-the-art result from 14.94 to 34.17 (NDCG@10 score). Our code and dataset is available at https://github.com/AndyChiangSH/CDGP.",
}

😀 Author
