ku-nlp/bert-based-faqir


FAQ retrieval system that considers the similarity between a user's query and a question, as well as the relevance between the query and an answer. For details, see our paper (arXiv).
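The core idea can be sketched as follows. Here `similarity` and `relevance` are stand-ins for the actual models (TSUBAKI and BERT in the paper), and the additive combination is an illustrative assumption, not the paper's exact formula.

```python
def rank_faq(query, faq_entries, similarity, relevance):
    """Rank (question, answer) pairs by combining query-question
    similarity with query-answer relevance (illustrative sketch)."""
    scored = []
    for question, answer in faq_entries:
        score = similarity(query, question) + relevance(query, answer)
        scored.append((score, question, answer))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(q, a) for _, q, a in scored]
```

With a toy word-overlap scorer plugged in for both components, a query about taxes ranks the tax FAQ entry first.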

Requirements

tensorflow >= 1.11.0

Usage

Download the BERT repository, the BERT Japanese pre-trained model, the QA pairs in the Amagasaki City FAQ, the testset (localgovFAQ), and samples of prediction results.

./download.sh

The resulting data structure is as follows.

data
├── bert: BERT original repository
├── Japanese_L-12_H-768_A-12_E-30_BPE: BERT Japanese pre-trained model
└── localgovfaq
    ├── qas: QA pairs in Amagasaki City FAQ
    ├── testset_segmentation.txt: testset for evaluation
    └── samples: retrieval results by TSUBAKI, BERT, and Joint model

Next, add the following task class to run_classifier.py in the original BERT repository.

class CQAProcessor(DataProcessor):
  """Processor for the CQA data set."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      guid = "%s-%s" % (set_type, i)
      text_a = tokenization.convert_to_unicode(line[1])
      text_b = tokenization.convert_to_unicode(line[2])
      label = tokenization.convert_to_unicode(line[0])
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples


def main(_):
  tf.logging.set_verbosity(tf.logging.INFO)
  processors = {
      "cqa": CQAProcessor,
  }
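`_create_examples` reads each TSV row as (label, text_a, text_b). A hypothetical sketch of a compatible train.tsv row is shown below; the exact columns produced by Makefile.generate_dataset are an assumption here, not taken from the repository.

```python
import csv
import io

# Hypothetical rows: column 0 = label ("1" relevant / "0" not),
# column 1 = query, column 2 = answer candidate
# (whitespace-segmented Japanese text).
rows = [
    ["1", "ゴミ の 収集 日 は いつ です か", "収集 日 は 地区 ごと に 異なり ます"],
    ["0", "ゴミ の 収集 日 は いつ です か", "住民 票 は 窓口 で 発行 でき ます"],
]

buf = io.StringIO()
csv.writer(buf, delimiter="\t").writerows(rows)
train_tsv = buf.getvalue()
```

Each row thus carries a binary relevance label in the first column, matching `get_labels()` returning `["0", "1"]`.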

For Japanese, we need to comment out text = self._tokenize_chinese_chars(text) in tokenization.py in the BERT repository.
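The reason: BERT's default tokenizer puts spaces around every CJK ideograph, which destroys the word segmentation already applied to Japanese input. The minimal re-implementation below (covering only the main CJK block, unlike the original, for brevity) illustrates the effect.

```python
def pad_cjk_chars(text):
    """Mimic BERT's _tokenize_chinese_chars: surround each CJK
    Unified Ideograph with spaces (main block U+4E00-U+9FFF only;
    the original covers additional ranges)."""
    out = []
    for ch in text:
        if 0x4E00 <= ord(ch) <= 0x9FFF:
            out.append(" " + ch + " ")
        else:
            out.append(ch)
    return "".join(out)

# A two-character Japanese word gets split character by character:
# pad_cjk_chars("日本") -> " 日  本 "
```

With this step left enabled, every kanji becomes its own token, defeating the Juman++-style pre-segmentation the Japanese model expects.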

Fine-tune and evaluate.

make -f Makefile.generate_dataset OUTPUT_DIR=/path/to/data_dir
make -f Makefile.run_classifier BERT_DATA_DIR=/path/to/data_dir OUTPUT_DIR=/path/to/somewhere

An example result is below.

Hit@1 : 381, 3 : 524, 5 : 578, all : 784
SR@1 : 0.486, 3 : 0.668, 5 : 0.737
P@1 : 0.486, 3 : 0.349, 5 : 0.286
MAP : 0.550, MRR : 0.596, MDCG : 0.524
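As a sanity check on these numbers, SR@k (success rate) is Hit@k divided by the number of test queries (784), and P@1 coincides with SR@1:

```python
hits = {1: 381, 3: 524, 5: 578}  # Hit@k from the table above
total = 784                      # number of test queries

# Success rate at k: fraction of queries with a correct answer in the top k.
sr = {k: round(h / total, 3) for k, h in hits.items()}
print(sr)  # {1: 0.486, 3: 0.668, 5: 0.737}
```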

TSUBAKI + BERT

TSUBAKI is an open search engine based on BM25 (paper, github). We can obtain a higher score by using TSUBAKI and BERT together.

We can evaluate the joint model with the command below.

python scripts/merge_tsubaki_bert_results.py --bert localgovfaq/samples/bert.txt \
    --tsubaki localgovfaq/samples/tsubaki.txt \
    --threshold 0.3 \
    --tsubaki_ratio 10 > /path/to/resultfile.txt
python scripts/calculate_score.py --testset data/localgovfaq/testset_segmentation.txt \
    --target_qs data/localgovfaq/qas/questions_in_Amagasaki.txt \
    --target_as data/localgovfaq/qas/answers_in_Amagasaki.txt \
    --search_result /path/to/resultfile.txt | tail -n 4

This command uses results pre-computed by TSUBAKI and BERT.
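The exact combination rule lives in scripts/merge_tsubaki_bert_results.py; the sketch below is only a hypothetical reconstruction of how --threshold and --tsubaki_ratio could gate and weight the two pre-computed scores, not the script's verbatim logic.

```python
def merge_scores(tsubaki_score, bert_score, threshold=0.3, tsubaki_ratio=10):
    """Hypothetical merge rule: when TSUBAKI is confident (score at or
    above the threshold), weight it heavily via tsubaki_ratio and add the
    BERT score; otherwise fall back to the BERT score alone."""
    if tsubaki_score >= threshold:
        return tsubaki_score * tsubaki_ratio + bert_score
    return bert_score
```

A gating scheme like this lets the lexical matcher dominate when it finds strong keyword overlap, while BERT covers paraphrased queries that BM25-style matching misses.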

An example result is below.

Hit@1 : 498, 3 : 611, 5 : 661, all : 784
SR@1 : 0.635, 3 : 0.779, 5 : 0.843
P@1 : 0.635, 3 : 0.446, 5 : 0.360
MAP : 0.660, MRR : 0.720, MDCG : 0.625

Reference

Wataru Sakata (LINE Corporation), Tomohide Shibata (Kyoto University), Ribeka Tanaka (Kyoto University) and Sadao Kurohashi (Kyoto University): FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance, Proceedings of SIGIR 2019: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2019. arXiv
