malllabiisc/NeuralDater

ACL 2018: Dating Documents using Graph Convolution Networks

License

Apache-2.0 license


Dating Documents using Graph Convolution Networks

Source code and dataset for the ACL 2018 paper: Dating Documents using Graph Convolution Networks.

Overview of NeuralDater (proposed method). NeuralDater exploits the syntactic and temporal structure of a document to learn effective representations, which in turn are used to predict the document's time. NeuralDater uses a Bi-directional LSTM (Bi-LSTM) and two Graph Convolution Networks (GCN), one over the dependency tree and the other over the document's temporal graph, along with a softmax classifier, all trained jointly end-to-end. Please refer to the paper for more details.
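To make the graph-convolution idea concrete, the following is a minimal sketch of one simplified GCN step. It is not the implementation in neural_dater.py: it assumes undirected, unlabeled edges with self-loops and mean aggregation, and uses plain Python lists for readability. The function name `gcn_layer` and the toy graph are hypothetical.

```python
def gcn_layer(features, edges, weight):
    """One simplified graph-convolution step.

    Each node averages its own features with those of its neighbours,
    then applies a linear map followed by ReLU.
    Assumption: edges are undirected and carry no labels.
    """
    n = len(features)
    # Neighbour sets with self-loops so each node keeps its own signal.
    neighbours = [{i} for i in range(n)]
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    dim_in = len(features[0])
    dim_out = len(weight[0])
    output = []
    for i in range(n):
        # Mean of the features in the node's neighbourhood.
        agg = [
            sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
            for d in range(dim_in)
        ]
        # Linear transformation followed by ReLU.
        row = [
            max(0.0, sum(agg[d] * weight[d][k] for d in range(dim_in)))
            for k in range(dim_out)
        ]
        output.append(row)
    return output


# Toy graph: three tokens connected in a chain 0 - 1 - 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weight, to isolate the aggregation
hidden = gcn_layer(features, edges, identity)
```

With the identity weight, each output row is simply the neighbourhood average, which shows how information from adjacent nodes flows into a node's representation.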

Dependencies

- Compatible with TensorFlow 1.x and Python 3.x.
- Dependencies can be installed using requirements.txt.

Dataset:

- Download the processed version (includes dependency and temporal graphs of each document) of the NYT and APW datasets.
- Unzip the .pkl file in the data directory.
- Documents are originally taken from the NYT and APW sections of the Gigaword Corpus, 5th ed.
- The structure of the processed input data is as follows.
```
{
    "voc2id":   {"w1": 0, "w2": 1, ...},
    "et2id":    {"NONE": 0, "INCLUDES": 1, "BEFORE": 2, "IS_INCLUDED": 3 ...},
    "de2id":    {"subj": 0, "obj": 1, "conj": 3 ...},
    "train":    {
        "X":        [[s1_w1, s1_w2, ...], [s2_w1, s2_w2, ...], ...],
        "Y":        [s1_time_stamp, s2_time_stamp, s3_time_stamp, ...],
        "DepEdges": [[s1_dep_edges], [s2_dep_edges] ...],
        "ETEdges":  [[s1_et_edges], [s2_et_edges], ...],
        "ETIdx":    [[s1_et1, s1_et2, ...], [s2_et1, s2_et2, ...], ...],
        "ET":       [[s1_et1_type, s1_et2_type, ...], [s2_et1_type, s2_et2_type, ...], ...],
    },
    "test":  {same as "train"},
    "valid": {same as "train"}
}
```
- voc2id is the mapping of words to their unique identifiers.
- et2id is the mapping of temporal graph edge types to their unique identifiers.
- de2id is the mapping of dependency graph edge types to their unique identifiers.
- Each entry of train, test and valid is a bag of sentences, where
  - X denotes the sentences as a list of lists of word indices.
  - Y is the time stamp associated with each sentence.
  - DepEdges is the edge list of the dependency parse for each sentence (required for S-GCN).
  - ETEdges is the edge list of the temporal graph for each sentence (required for T-GCN).
  - ETIdx holds the position indices of each event/time_expression in each sentence.
  - ET is the type of each word in a sentence: 0 denotes a normal word, 1 an event and 2 a time expression.
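The layout above can be sanity-checked with a short script. This is a minimal sketch that assumes the .pkl file is a standard Python pickle holding the dictionary described above; the helper names `load_dataset` and `check_split` are hypothetical, and the toy data (with empty edge lists, since the edge format is not described here) only stands in for the real file.

```python
import os
import pickle
import tempfile

# Per-split fields listed in the structure above.
SPLIT_KEYS = ["X", "Y", "DepEdges", "ETEdges", "ETIdx", "ET"]


def load_dataset(path):
    """Load the processed dataset, assuming it is a plain pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def check_split(split):
    """Verify every field has one item per entry and return the entry count."""
    n = len(split["X"])
    for key in SPLIT_KEYS:
        if len(split[key]) != n:
            raise ValueError(f"{key} has {len(split[key])} items, expected {n}")
    return n


# Toy stand-in for the real data; all values are illustrative only.
toy_split = {
    "X": [[0, 1], [1]],
    "Y": [3, 7],
    "DepEdges": [[], []],
    "ETEdges": [[], []],
    "ETIdx": [[1], []],
    "ET": [[0, 1], [0]],
}
toy = {
    "voc2id": {"w1": 0, "w2": 1},
    "et2id": {"NONE": 0, "INCLUDES": 1},
    "de2id": {"subj": 0, "obj": 1},
    "train": toy_split,
    "test": toy_split,
    "valid": toy_split,
}

# Round-trip through a temporary pickle file to mimic loading from disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")
    with open(path, "wb") as f:
        pickle.dump(toy, f)
    data = load_dataset(path)

sizes = {name: check_split(data[name]) for name in ("train", "test", "valid")}
print(sizes)
```

For the real data, `load_dataset("data/nyt_processed_data.pkl")` would replace the temporary-file round trip.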

Preprocessing:

To obtain the temporal graph of a new document, follow these steps:

- Set up CAEVO and CATENA as explained in their respective repositories.
- Extract the event and time mentions of the document:
  - ./runcaevoraw.sh <path_of_document>
  - The above command generates an .xml file. CATENA uses this file to extract the temporal graph. The file also contains the document's dependency parse, which can be extracted with the following command:
```
python preprocess/read_caevo_out.py <caevo_out_path> <destination_path>
```
To make the generated .xml file compatible with CATENA's input format, use the following script:
```
python preprocess/make_catena_input.py <caevo_out_path> <destination_path>
```

The .xml file generated above is given as input to CATENA to obtain the temporal graph of the document.

    java -Xmx6G -jar ./target/CATENA-1.0.3.jar -i <path_to_xml> \
        --tlinks ./data/TempEval3.TLINK.txt \
        --clinks ./data/Causal-TimeBank.CLINK.txt \
        -l ./models/CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model \
        -g ./models/CoNLL2009-ST-English-ALL.anna-3.3.postagger.model \
        -p ./models/CoNLL2009-ST-English-ALL.anna-3.3.parser.model \
        -x ./tools/TextPro2.0/ -d ./models/catena-event-dct.model \
        -t ./models/catena-event-timex.model \
        -e ./models/catena-event-event.model -c ./models/catena-causal-event-event.model > <destination_path>

The above command outputs the list of links in the temporal graph which are given as input to NeuralDater. The output file can be read using the following command:

    python preprocess/read_catena_out.py <catena_out_path> <destination_path>

Usage:

- After installing the Python dependencies from requirements.txt, execute sh setup.sh to download the GloVe embeddings.
- neural_dater.py contains the TensorFlow (1.x) based implementation of NeuralDater (proposed method).
- To start training:
```
python neural_dater.py -data data/nyt_processed_data.pkl -class 10 -name test_run
```
- -class denotes the number of classes in the dataset: 10 for NYT and 16 for APW.
- -name is an arbitrary name for the run.
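When launching several runs, the command above can be assembled programmatically. This is a minimal sketch that uses only the three flags shown above; the helper `build_train_command` and the `NUM_CLASSES` table are hypothetical conveniences, not part of the repository.

```python
import shlex

# Class counts stated above: 10 for NYT and 16 for APW.
NUM_CLASSES = {"nyt": 10, "apw": 16}


def build_train_command(data_path, num_classes, run_name):
    """Return the training command as an argument list (nothing is executed)."""
    return [
        "python", "neural_dater.py",
        "-data", data_path,
        "-class", str(num_classes),
        "-name", run_name,
    ]


nyt_cmd = build_train_command(
    "data/nyt_processed_data.pkl", NUM_CLASSES["nyt"], "test_run"
)
print(shlex.join(nyt_cmd))
```

The list form can be passed directly to `subprocess.run(nyt_cmd, check=True)` from the repository root to start training.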

Citing:

Please cite the following paper if you use this code in your work.

    @InProceedings{neuraldater2018,
      author    = "Vashishth, Shikhar and Dasgupta, Shib Sankar and Ray, Swayambhu Nath and Talukdar, Partha",
      title     = "Dating Documents using Graph Convolution Networks",
      booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
      year      = "2018",
      publisher = "Association for Computational Linguistics",
      pages     = "1605--1615",
      location  = "Melbourne, Australia",
      url       = "http://aclweb.org/anthology/P18-1149"
    }
