This repository was archived by the owner on Jul 24, 2020. It is now read-only.

Distantly Supervised Relation Extraction


INK-USC/USC-DS-RelationExtraction


This repository puts together recent models and data sets for sentence-level relation extraction using knowledge bases (i.e., distant supervision). In particular, it contains the source code for the WWW'17 paper CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases.

Please also check out our new repository on handling shifted label distribution in distant supervision.

Task: Given a text corpus with entity mentions detected and heuristically labeled using distant supervision, the task is to identify the relation types/labels between a pair of entity mentions based on the sentence context in which they co-occur.

Quick Start

Blog Posts

Data

For evaluating sentence-level extraction, we processed (using our data pipeline) three public datasets into our JSON format. We ran Stanford NER on the training set to detect entity mentions, mapped entity names to Freebase entities using DBpedia Spotlight, aligned Freebase facts to sentences, and assigned the entity types of Freebase entities to their mapped names in sentences:

  • PubMed-BioInfer: 100k PubMed paper abstracts as training data and 1,530 manually labeled biomedical paper abstracts from BioInfer (Pyysalo et al., 2007) as test data. It consists of 94 relation types (protein-protein interactions) and over 2,000 entity types (from the MeSH ontology). (Download)

  • NYT-manual: 1.18M sentences sampled from 294K New York Times news articles, which were then aligned with Freebase facts by (Riedel et al., ECML'10) (link to Riedel's data). For the test set, 395 sentences are manually annotated with 24 relation types and 47 entity types (Hoffmann et al., ACL'11) (link to Hoffmann's data). (Download)

  • Wiki-KBP: the training corpus contains 1.5M sentences sampled from 780k Wikipedia articles (Ling & Weld, 2012) plus ~7,000 sentences from the 2013 KBP corpus. Test data consists of 14k system-labeled sentences from the 2013 KBP slot filling assessment results. It has 7 relation types and 126 entity types after filtering out numeric-value relations. (Download)

Please put the data files in the corresponding subdirectories under data/source.
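The alignment step described above (matching knowledge-base facts to sentences that mention both entities) can be illustrated with a minimal Python 3 sketch. It is not the repository's data pipeline: the toy fact table, the `label_sentence` helper, and the `"None"` label spelling are all assumptions made for illustration.

```python
from itertools import permutations

# Toy knowledge base: (head entity, tail entity) -> relation type.
# These facts are illustrative stand-ins, not actual Freebase records.
KB_FACTS = {
    ("Barack Obama", "Honolulu"): "place_of_birth",
    ("Barack Obama", "United States"): "nationality",
}


def label_sentence(linked_mentions):
    """Heuristically label every ordered pair of linked mentions in a sentence.

    linked_mentions: entity names already mapped to knowledge-base entities.
    Returns (head, tail, label) triples; pairs without a matching fact
    receive the label "None" (assumed spelling of the negative label).
    """
    labeled = []
    for head, tail in permutations(linked_mentions, 2):
        labeled.append((head, tail, KB_FACTS.get((head, tail), "None")))
    return labeled


for triple in label_sentence(["Barack Obama", "Honolulu"]):
    print(triple)
```

Because the label comes from the knowledge base rather than from the sentence itself, a sentence that mentions both entities without expressing the relation still receives that label. This is the noise that distant supervision models have to cope with.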

Benchmark

Performance comparison with several relation extraction systems on the KBP 2013 dataset (sentence-level extraction).

| Method | Precision | Recall | F1 |
|---|---|---|---|
| Mintz (our implementation, Mintz et al., 2009) | 0.296 | 0.387 | 0.335 |
| LINE + Dist Sup (Tang et al., 2015) | 0.360 | 0.257 | 0.299 |
| MultiR (Hoffmann et al., 2011) | 0.325 | 0.278 | 0.301 |
| FCM + Dist Sup (Gormley et al., 2015) | 0.151 | 0.498 | 0.300 |
| HypeNet (our implementation, Shwartz et al., 2016) | 0.210 | 0.315 | 0.252 |
| CNN (our implementation, Zeng et al., 2014) | 0.198 | 0.334 | 0.242 |
| PCNN (our implementation, Zeng et al., 2015) | 0.220 | 0.452 | 0.295 |
| LSTM (our implementation) | 0.274 | 0.500 | 0.350 |
| Bi-GRU (our implementation) | 0.301 | 0.465 | 0.362 |
| SDP-LSTM (our implementation, Xu et al., 2015) | 0.300 | 0.436 | 0.356 |
| Position-Aware LSTM (Zhang et al., 2017) | 0.265 | 0.598 | 0.367 |
| CoType-RM (Ren et al., 2017) | 0.303 | 0.407 | 0.347 |
| CoType (Ren et al., 2017) | 0.348 | 0.406 | 0.369 |

Note: for models that are trained on sentences annotated with a single label (HypeNet, CNN/PCNN, LSTM, SDP/PA-LSTMs, Bi-GRU), we form one training instance for each sentence-label pair based on their DS-annotated data.
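As a rough Python 3 illustration of the note above, the sketch below expands a sentence whose entity pairs carry several distantly supervised labels into one instance per sentence-label pair. The record layout (`text`, `pair_labels`) and the function name are hypothetical and are not the repository's JSON schema.

```python
def expand_to_single_label(sentences):
    """Yield one (sentence text, entity pair, label) instance per DS label."""
    for sent in sentences:
        for pair, labels in sent["pair_labels"].items():
            for label in labels:
                yield sent["text"], pair, label


# Hypothetical record: one pair has a single label, the other has two.
sentences = [
    {
        "text": "A met B in C.",
        "pair_labels": {
            ("A", "B"): ["met"],
            ("A", "C"): ["visited", "lived_in"],
        },
    }
]

for instance in expand_to_single_label(sentences):
    print(instance)
```

A pair with two labels therefore contributes two training instances that share the same sentence text.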

Usage

Dependencies

We use Ubuntu as an example.

  • Python 2.7
  • Python library dependencies
$ pip install pexpect ujson tqdm
$ cd code/DataProcessor/
$ git clone git@github.com:stanfordnlp/stanza.git
$ cd stanza
$ pip install -e .
$ wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
$ unzip stanford-corenlp-full-2016-10-31.zip

We have included compiled binaries. If you need to re-compile retype.cpp under your own g++ environment:

$ cd code/Model/retype; make

Default Run

As an example, we show how to run CoType on the Wiki-KBP dataset.

Start the Stanford CoreNLP server for the Python wrapper.

$ java -mx4g -cp "code/DataProcessor/stanford-corenlp-full-2016-10-31/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer

Feature extraction, embedding learning on training data, and evaluation on test data.

$ ./run.sh

For relation classification, the "none"-labeled instances must first be removed from the train/test JSON files. The hyperparameters for embedding learning are included in the run.sh script.
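One way to do this filtering is sketched below in Python 3. It makes several assumptions that should be checked against the actual data files: one JSON object per line, a `relationMentions` list on each object, a `label` field on each mention, and `"None"` as the spelling of the negative label. The helper name is hypothetical.

```python
import json

NONE_LABEL = "None"  # assumed spelling of the negative label


def drop_none_mentions(json_lines):
    """Remove "None"-labeled relation mentions from JSON-lines records.

    Records left without any relation mention are dropped entirely.
    """
    kept = []
    for line in json_lines:
        record = json.loads(line)
        mentions = [
            m for m in record.get("relationMentions", [])
            if m.get("label") != NONE_LABEL
        ]
        if mentions:
            record["relationMentions"] = mentions
            kept.append(json.dumps(record))
    return kept


# Hypothetical input: the first record keeps one mention, the second is dropped.
lines = [
    json.dumps({"sentText": "s1", "relationMentions": [
        {"label": "born_in"}, {"label": "None"}]}),
    json.dumps({"sentText": "s2", "relationMentions": [{"label": "None"}]}),
]
print(drop_none_mentions(lines))
```

Dropping sentences that end up with no relation mention is a design choice of this sketch. Keep them instead if downstream scripts expect every sentence to be present.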

Parameters

Dataset to run on.

Data="KBP"
  • Hyperparameters for relation extraction:
- KBP: -negative 3 -iters 400 -lr 0.02 -transWeight 1.0
- NYT: -negative 5 -iters 700 -lr 0.02 -transWeight 7.0
- BioInfer: -negative 5 -iters 700 -lr 0.02 -transWeight 7.0

Hyperparameters for relation classification are included in the run.sh script.
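When switching between datasets, it can help to keep the relation extraction settings listed above in one place. The Python 3.7+ sketch below stores them in a dictionary and renders the flag string. Only the flag names and values come from this README; the `HYPERPARAMS` table and the `build_flags` helper are hypothetical and not part of run.sh.

```python
# Per-dataset relation extraction settings, copied from the list above.
HYPERPARAMS = {
    "KBP": {"negative": 3, "iters": 400, "lr": 0.02, "transWeight": 1.0},
    "NYT": {"negative": 5, "iters": 700, "lr": 0.02, "transWeight": 7.0},
    "BioInfer": {"negative": 5, "iters": 700, "lr": 0.02, "transWeight": 7.0},
}


def build_flags(dataset):
    """Render the settings for `dataset` as a list of command-line tokens."""
    flags = []
    for name, value in HYPERPARAMS[dataset].items():
        flags += ["-" + name, str(value)]
    return flags


print(" ".join(build_flags("KBP")))
# -negative 3 -iters 400 -lr 0.02 -transWeight 1.0
```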

Evaluation

Evaluate relation extraction performance (precision, recall, F1): first produce predictions along with their confidence scores, then filter the predicted instances by tuning the threshold.

$ python code/Evaluation/emb_test.py extract KBP retype cosine 0.0
$ python code/Evaluation/tune_threshold.py extract KBP emb retype cosine
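The idea behind threshold tuning can be sketched in a few lines of Python 3. This is not the logic of tune_threshold.py, which may differ. The function names, the dictionary-based inputs, and the toy data are assumptions made for illustration.

```python
def precision_recall_f1(predictions, gold, threshold):
    """Score the predictions whose confidence is at least `threshold`.

    predictions: dict of instance id -> (predicted label, confidence)
    gold: dict of instance id -> true label (non-"None" relations only)
    """
    kept = {i: lab for i, (lab, conf) in predictions.items() if conf >= threshold}
    correct = sum(1 for i, lab in kept.items() if gold.get(i) == lab)
    precision = correct / len(kept) if kept else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def tune_threshold(predictions, gold, candidates):
    """Return (threshold, precision, recall, f1) for the best-F1 candidate."""
    rows = [(t,) + precision_recall_f1(predictions, gold, t) for t in candidates]
    return max(rows, key=lambda row: row[3])


# Toy data: instance 2 is a false positive, instance 4 is never predicted.
predictions = {1: ("born_in", 0.9), 2: ("born_in", 0.4), 3: ("works_for", 0.7)}
gold = {1: "born_in", 3: "works_for", 4: "born_in"}

print(tune_threshold(predictions, gold, [0.0, 0.5, 0.8]))
```

On this toy data, raising the threshold from 0.0 to 0.5 discards the low-confidence false positive, which lifts precision without hurting recall. Raising it further to 0.8 starts discarding correct predictions as well.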

In-text Prediction

The last command in run.sh generates a JSON file of predicted results in the same format as test.json in data/source/$DATANAME, except that only the predicted relation mention labels are output. Replace the second parameter with whatever threshold you would like.

$ python code/Evaluation/convertPredictionToJson.py $Data 0.0

Customized Run

Code for producing the JSON files from a raw corpus for running CoType and baseline models is here.

Baselines

You can find our implementation of some recent relation extraction models under the code/Model/ directory.

References

Contributors

  • Ellen Wu
  • Meng Qu
  • Frank Xu
  • Wenqi He
  • Maosen Zhang
  • Qinyuan Ye
  • Xiang Ren
