rohithreddy024/Text-Summarizer-Pytorch

Pytorch implementation of "A Deep Reinforced Model for Abstractive Summarization" paper and pointer generator network

Text-Summarizer-Pytorch

Combining "A Deep Reinforced Model for Abstractive Summarization" and "Get To The Point: Summarization with Pointer-Generator Networks"

Model Description

LSTM-based Sequence-to-Sequence model for Abstractive Summarization
Pointer mechanism for handling Out of Vocabulary (OOV) words (See et al., 2017)
Intra-temporal and Intra-decoder attention for handling repeated words (Paulus et al., 2018)
Self-critical policy gradient training along with MLE training (Paulus et al., 2018)
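
The pointer mechanism can be illustrated with a small sketch of the final output distribution described by See et al. (2017): the decoder mixes its vocabulary distribution with the attention distribution over source tokens, so a source word outside the fixed vocabulary can still receive probability. The function name final_distribution, the toy vocabulary and all numbers below are hypothetical, and plain Python lists stand in for the tensors used in model.py.

```python
def final_distribution(p_gen, vocab_dist, attn_dist, src_ids, n_oov):
    """Mix generation and copy probabilities over an extended vocabulary.

    p_gen      : probability of generating from the fixed vocabulary (0..1)
    vocab_dist : softmax over the fixed vocabulary, length V
    attn_dist  : attention weights over source positions, summing to 1
    src_ids    : extended-vocabulary id of each source token; in-article
                 OOV words get temporary ids V, V+1, ...
    n_oov      : number of distinct OOV words in this article
    """
    # Generation part, padded with zeros for the temporary OOV ids.
    final = [p_gen * p for p in vocab_dist] + [0.0] * n_oov
    # Copy part: add attention mass to the id of each source token.
    for attn, token_id in zip(attn_dist, src_ids):
        final[token_id] += (1.0 - p_gen) * attn
    return final


# Toy setup: vocabulary of 4 words, source "zambezi river rising",
# where "zambezi" is OOV and gets the temporary id 4.
vocab_dist = [0.1, 0.2, 0.3, 0.4]
attn_dist = [0.7, 0.2, 0.1]
src_ids = [4, 1, 2]
dist = final_distribution(0.5, vocab_dist, attn_dist, src_ids, n_oov=1)
print(dist)
```

In this toy case the OOV word ends up with the largest share of the probability mass even though the fixed vocabulary assigns it none, which is the behaviour the pointer mechanism is meant to provide.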

Prerequisites

PyTorch
TensorFlow
Python 2 & 3
rouge

Data

Download the train and valid (article, title) pairs of the OpenNMT-provided Gigaword dataset from here
Copy the files train.article.txt, train.title.txt, valid.article.filter.txt and valid.title.filter.txt to the data/unfinished folder
The files are already preprocessed

Creating `.bin` files and vocab file

The model accepts data in the form of .bin files.
To convert the .txt files into .bin files and chunk them further, run (requires Python 2 & Tensorflow):

python make_data_files.py

You will find the data in the data/chunked folder and the vocab file in the data folder
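
As a rough illustration of the two outputs of this step (chunked data and a vocab file), the sketch below splits (article, title) pairs into fixed-size chunks and counts token frequencies. It does not reproduce the binary serialization done by make_data_files.py; the helper names chunk_pairs and build_vocab, the chunk size and the vocabulary size are assumptions made for this example. It is written for Python 3.

```python
from collections import Counter


def chunk_pairs(pairs, chunk_size):
    """Yield consecutive lists of at most chunk_size (article, title) pairs."""
    for start in range(0, len(pairs), chunk_size):
        yield pairs[start:start + chunk_size]


def build_vocab(pairs, max_size):
    """Return the max_size most frequent whitespace tokens with their counts."""
    counts = Counter()
    for article, title in pairs:
        counts.update(article.split())
        counts.update(title.split())
    return counts.most_common(max_size)


# Shortened, lower-cased pairs in the style of the Gigaword data.
pairs = [
    ("water levels in the zambezi river are rising", "floods loom in zambezi valley"),
    ("thousands have fled samarra in recent weeks", "thousands flee samarra fearing battle"),
    ("the festival will kick off saturday", "festival to kick off saturday"),
]
chunks = list(chunk_pairs(pairs, chunk_size=2))
vocab = build_vocab(pairs, max_size=5)
print([len(c) for c in chunks])
print(vocab)
```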

Training

As suggested in Paulus et al. (2018), first pretrain the seq-to-seq model using MLE (with Python 3):

python train.py --train_mle=yes --train_rl=no --mle_weight=1.0

Next, find the best saved model on validation data by running (with Python 3):

python eval.py --task=validate --start_from=0005000.tar
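
The selection step can be sketched as follows, assuming that checkpoints are named by zero-padded iteration count (as in 0005000.tar) and that validation yields one rouge-l f score per checkpoint. The helper names and the scores are placeholders for illustration; they are not part of eval.py and not results from this repository.

```python
def checkpoints_from(names, start_from):
    """Return checkpoint names at or after start_from, in iteration order.

    Names are assumed to be zero-padded iteration counts such as
    "0005000.tar", so sorting the strings sorts by iteration.
    """
    return [n for n in sorted(names) if n >= start_from]


def best_checkpoint(rouge_l_f):
    """Return the checkpoint name with the highest rouge-l f score."""
    return max(rouge_l_f, key=rouge_l_f.get)


# Placeholder scores for illustration only.
scores = {"0005000.tar": 0.31, "0010000.tar": 0.36, "0015000.tar": 0.35}
to_evaluate = checkpoints_from(scores, start_from="0010000.tar")
print(to_evaluate)
print(best_checkpoint(scores))
```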

After finding the best model (let's say 0100000.tar), i.e. the one with the highest rouge-l f score, load it and run (with Python 3):

python train.py --train_mle=yes --train_rl=yes --mle_weight=0.25 --load_model=0100000.tar --new_lr=0.0001

for MLE + RL training (or)

python train.py --train_mle=no --train_rl=yes --mle_weight=0.0 --load_model=0100000.tar --new_lr=0.0001

for RL training
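
The role of --mle_weight in the commands above can be sketched with the self-critical objective from Paulus et al. (2018): a sampled summary is rewarded relative to a greedy-decoded baseline, and the resulting RL loss is combined with the MLE loss. The function names, the toy probabilities and the assumption that the RL term is weighted by 1 - mle_weight are illustrative; train.py may combine the terms differently.

```python
import math


def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """Self-critical policy-gradient loss for one sampled summary.

    sample_log_probs : log-probability of each sampled token
    sample_reward    : reward (e.g. rouge-l f) of the sampled summary
    greedy_reward    : reward of the greedy-decoded summary (the baseline)
    """
    return (greedy_reward - sample_reward) * sum(sample_log_probs)


def mle_loss(target_log_probs):
    """Negative log-likelihood of the reference tokens."""
    return -sum(target_log_probs)


def mixed_loss(mle, rl, mle_weight):
    """Weighted sum of both losses; assumes the RL weight is 1 - mle_weight."""
    return mle_weight * mle + (1.0 - mle_weight) * rl


# Toy token probabilities and rewards.
sample_lp = [math.log(0.5), math.log(0.25)]
target_lp = [math.log(0.5), math.log(0.5)]
rl = self_critical_loss(sample_lp, sample_reward=0.6, greedy_reward=0.4)
mle = mle_loss(target_lp)
print(mixed_loss(mle, rl, mle_weight=0.25))
```

When the sampled summary scores higher than the greedy one, minimizing the RL term raises the log-probability of the sampled tokens; when it scores lower, it pushes that probability down. With mle_weight=1.0 the mixed loss reduces to plain MLE, and with mle_weight=0.0 to RL only.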

Validation

To perform validation of RL training, run (with Python 3):

python eval.py --task=validate --start_from=0100000.tar

Testing

After finding the best model of RL training (let's say 0200000.tar), evaluate it on the test data and get all rouge metrics by running (with Python 3):

python eval.py --task=test --load_model=0200000.tar

Results

Rouge scores obtained by using best MLE trained model on test set:
scores: {
'rouge-1': {'f': 0.4412018559893622, 'p': 0.4814799494024485, 'r': 0.4232331027817015},
'rouge-2': {'f': 0.23238981595683728, 'p': 0.2531296070596062, 'r': 0.22407861554997008},
'rouge-l': {'f': 0.40477682528278364, 'p': 0.4584684491434479, 'r': 0.40351107200202596}
}
Rouge scores obtained by using best MLE + RL trained model on test set:
scores: {
'rouge-1': {'f': 0.4499047033247696, 'p': 0.4853756369556345, 'r': 0.43544461386607497},
'rouge-2': {'f': 0.24037014314625643, 'p': 0.25903387205387235, 'r': 0.23362662645146298},
'rouge-l': {'f': 0.41320241732946406, 'p': 0.4616655167980162, 'r': 0.4144419466382236}
}
Training log file is included in the repository
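
To make the 'f', 'p' and 'r' fields above concrete, here is a simplified, self-contained ROUGE-1 computation based on clipped unigram overlap. It is a sketch for Python 3 and not the rouge package used for the reported scores, which may differ in tokenization and other details; the function name rouge_1 is chosen for this example.

```python
from collections import Counter


def rouge_1(reference, decoded):
    """Unigram-overlap precision, recall and F1 between two token strings."""
    ref_counts = Counter(reference.split())
    dec_counts = Counter(decoded.split())
    # Clipped overlap: a token counts at most as often as it occurs in both.
    overlap = sum((ref_counts & dec_counts).values())
    p = overlap / max(sum(dec_counts.values()), 1)  # share of decoded tokens matched
    r = overlap / max(sum(ref_counts.values()), 1)  # share of reference tokens matched
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"f": f, "p": p, "r": r}


scores_1 = rouge_1(
    reference="floods loom in zambezi valley",
    decoded="water levels rising in zambezi river",
)
print(scores_1)
```

For this pair, two unigrams ("in" and "zambezi") overlap, so precision is 2/6 and recall is 2/5.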

Examples

article: russia 's lower house of parliament was scheduled friday to debate an appeal to the prime minister that challenged the right of u.s.-funded radio liberty to operate in russia following its introduction of broadcasts targeting chechnya .
ref: russia 's lower house of parliament mulls challenge to radio liberty
dec: russian parliament to debate on banning radio liberty

article: continued dialogue with the democratic people 's republic of korea is important although australia 's plan to open its embassy in pyongyang has been shelved because of the crisis over the dprk 's nuclear weapons program , australian foreign minister alexander downer said on friday .
ref: dialogue with dprk important says australian foreign minister
dec: australian fm says dialogue with dprk important

article: water levels in the zambezi river are rising due to heavy rains in its catchment area , prompting zimbabwe 's civil protection unit -lrb- cpu -rrb- to issue a flood alert for people living in the zambezi valley , the herald reported on friday .
ref: floods loom in zambezi valley
dec: water levels rising in zambezi river

article: tens of thousands of people have fled samarra , about ## miles north of baghdad , in recent weeks , expecting a showdown between u.s. troops and heavily armed groups within the city , according to u.s. and iraqi sources .
ref: thousands flee samarra fearing battle
dec: tens of thousands flee samarra expecting showdown with u.s. troops

article: the #### tung blossom festival will kick off saturday with a fun-filled ceremony at the west lake resort in the northern taiwan county of miaoli , a hakka stronghold , the council of hakka affairs -lrb- cha -rrb- announced tuesday .
ref: #### tung blossom festival to kick off saturday
dec: #### tung blossom festival to kick off in miaoli

References

pytorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"
