vered1986/HypeNET

Integrated path-based and distributional method for hypernymy detection


This is the code used in the paper:

"Improving Hypernymy Detection with an Integrated Path-based and Distributional Method"
Vered Shwartz, Yoav Goldberg and Ido Dagan. ACL 2016.

It is used to classify hypernymy relations between term pairs, using distributional information on each term and path-based information encoded with an LSTM.


Version 2:

Major features and improvements:

  • Using dynet instead of pycnn (thanks @srajana!)
  • Automating corpus processing with a single bash script, which is more time- and memory-efficient

Bug fixes:

  • Too many paths in parse_wikipedia (see issue #2)

To reproduce the results reported in the paper, please use V1. The current version achieves similar results - the integrated model's performance on the randomly split dataset is: Precision: 0.918, Recall: 0.907, F1: 0.912.


Consider using our new project, LexNET! It supports classification of multiple semantic relations, and contains several model enhancements and detailed documentation.


Prerequisites:

Quick Start:

The repository contains the following directories:

  • common - the knowledge resource class, which is used by other models to save the path data from the corpus.
  • corpus - code for parsing the corpus and extracting paths, including the generalizations made for the baseline method.
  • dataset - code for creating the dataset used in the paper, and the dataset itself.
  • train - code for training and testing both variants of our model (path-based and integrated).

To create a processed corpus, download a Wikipedia dump, and run:

bash create_resource_from_corpus.sh [wiki_dump_file] [resource_prefix]

Where resource_prefix is the file path and prefix of the corpus files, e.g. corpus/wiki, such that the directory corpus will eventually contain the wiki_*.db files created by this script.
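For example, assuming the downloaded dump was saved as enwiki-latest-pages-articles.xml (the file name is only illustrative), the following call would create the wiki_*.db files under the corpus directory:

bash create_resource_from_corpus.sh enwiki-latest-pages-articles.xml corpus/wiki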

To train the integrated model, run:

train_integrated.py [resource_prefix] [dataset_prefix] [model_prefix_file] [embeddings_file] [alpha] [word_dropout_rate]

Where:

  • resource_prefix is the file path and prefix of the corpus files, e.g. corpus/wiki, such that the directory corpus contains the wiki_*.db files created by create_resource_from_corpus.sh.
  • dataset_prefix is the file path of the dataset files, e.g. dataset/rnd, such that this directory contains 3 files: train.tsv, test.tsv and val.tsv.
  • model_prefix_file is the output directory and prefix for the model files. The model is saved in 3 files: .model, .params and .dict. In addition, the test set predictions are saved in .predictions, and the prominent paths are saved to .paths.
  • embeddings_file is the pre-trained word embeddings file, in txt format (i.e., every line consists of the word, followed by a space, and its vector; see GloVe for an example, and the sample line after this list).
  • alpha is the learning rate (default=0.001).
  • word_dropout_rate is the word dropout rate used during training.
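As an illustration only (the embeddings file name and hyperparameter values below are placeholders, not values recommended in the paper), a line of the embeddings txt format looks like:

dog 0.13 -0.07 0.22 0.48 ...

and a training run on the randomly split dataset could then be launched with:

train_integrated.py corpus/wiki dataset/rnd model/integrated glove.6B.300d.txt 0.001 0.5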

Similarly, you can train the path-based model with train_path_based.py, or test either pre-trained model using test_integrated.py and test_path_based.py, respectively.
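As a sketch only, assuming the test scripts take the same resource, dataset, model-prefix and embeddings arguments as their training counterparts (check each script's usage message for the exact argument order), evaluating a saved integrated model might look like:

test_integrated.py corpus/wiki dataset/rnd model/integrated glove.6B.300d.txt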
