hankcs/ID-CNN-CWS

Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation"

License

GPL-3.0 license

Folders and files

data
scripts
.gitignore
LICENSE
README.md
bilstm.py
bilstm_char.py
cnn.py
cnn_char.py
convert_corpus.py
data_utils.py
eval_f1.py
official_scorer.py
radical.py
score
tf_utils.py
train.py
tsv_to_tfrecords.py
utils.py

ID-CNN-CWS

Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation", published in the NNW journal.

It implements the following 4 models for Chinese word segmentation (CWS):

Bi-LSTM
Bi-LSTM-CRF
ID-CNN
ID-CNN-CRF
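As a rough illustration of why dilated convolutions suit sequence labeling, the sketch below computes how much context a stack of 1-D dilated convolution layers can see. The kernel size and dilation rates are assumptions chosen for illustration; they are not taken from the paper or from this repository's configuration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in characters) of stacked 1-D dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical settings, for illustration only.
plain = [1, 1, 1]      # three ordinary convolution layers
block = [1, 2, 4]      # three layers with exponentially growing dilation

print(receptive_field(3, plain))      # 7
print(receptive_field(3, block))      # 15
print(receptive_field(3, block * 4))  # 57: the same block applied 4 times
```

With the same number of layers, the dilated stack covers roughly twice the context of the plain one, and iterating the block widens it further without adding new layer shapes.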

Dependencies

Python >= 3.6
TensorFlow >= 1.2

Both CPU and GPU are supported. GPU training is 10 times faster.

Preparation

Run the following script to convert the corpus to a TensorFlow dataset.

$ ./scripts/make.sh

Train and Test

Quick Start

$ ./scripts/run.sh $dataset $model

$dataset can be pku, msr, asSC or cityuSC.
$model can be cnn or bilstm.

For example:

$ ./scripts/run.sh pku cnn

It will train a cnn model on the pku dataset, then evaluate its performance on the test set.
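To run several dataset and model combinations in one go, a small Python wrapper around the script can help. This is a minimal sketch, not part of the repository: `build_command` and `run_all` are hypothetical helper names, and the wrapper assumes it is launched from the repository root so that `./scripts/run.sh` resolves.

```python
import subprocess

# Values accepted by scripts/run.sh according to this README.
DATASETS = ("pku", "msr", "asSC", "cityuSC")
MODELS = ("cnn", "bilstm")


def build_command(dataset, model, *extra_args):
    """Return the argument list for one run; extra flags pass through unchanged."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return ["./scripts/run.sh", dataset, model, *extra_args]


def run_all(*extra_args):
    """Train and evaluate every dataset/model pair, stopping on the first failure."""
    for dataset in DATASETS:
        for model in MODELS:
            subprocess.run(build_command(dataset, model, *extra_args), check=True)


print(build_command("pku", "cnn"))  # ['./scripts/run.sh', 'pku', 'cnn']
```

Calling `run_all()` would launch eight training runs in sequence, so it is left uncalled here.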

CRF Layer

To enable the CRF layer, simply append --viterbi to your command, e.g.

$ ./scripts/run.sh pku cnn --viterbi
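For readers unfamiliar with what a CRF layer adds at prediction time, the sketch below shows Viterbi decoding in plain Python. It is not this repository's implementation. The B/M/E/S tag set, the scores, and the transition table are assumptions made up for illustration, and start/end constraints are omitted for brevity.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr = []
    for emit in emissions[1:]:
        best_prev, new_score = [], []
        for j in range(n_tags):
            i = max(range(n_tags), key=lambda k: score[k] + transitions[k][j])
            best_prev.append(i)
            new_score.append(score[i] + transitions[i][j] + emit[j])
        backptr.append(best_prev)
        score = new_score
    last = max(range(n_tags), key=lambda k: score[k])
    path = [last]
    for best_prev in reversed(backptr):
        last = best_prev[last]
        path.append(last)
    path.reverse()
    return path


# Assumed tag scheme: Begin, Middle, End of a word, or Single-character word.
TAGS = ["B", "M", "E", "S"]
NEG = -1e9  # effectively forbids a transition
LEGAL = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
transitions = [[0.0 if (a, b) in LEGAL else NEG for b in TAGS] for a in TAGS]

# Made-up per-character scores; picking the best tag per position
# independently would give B, B, E, which is not a valid segmentation.
emissions = [
    [2.0, 0.0, 0.0, 1.0],
    [1.5, 1.0, 0.5, 0.0],
    [0.0, 0.0, 2.0, 0.5],
]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)  # ['B', 'M', 'E']
```

The decoder trades a slightly lower score at the second character for a sequence whose transitions are all legal, which is the practical benefit of decoding over whole sequences rather than per character.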

Accuracy

Speed

Acknowledgments

Corpora are from SIGHAN05, converted to Simplified Chinese via HanLP. Note that the SIGHAN datasets should only be used for research purposes.
Model implementations are adapted from https://github.com/iesl/dilated-cnn-ner by Emma Strubell.

Releases

v1.0 Paper Version (latest), Oct 27, 2017
