hankcs/ID-CNN-CWS
Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation", published in the NNW journal.
It implements the following 4 models for Chinese word segmentation (CWS):
- Bi-LSTM
- Bi-LSTM-CRF
- ID-CNN
- ID-CNN-CRF
Requirements:
- Python >= 3.6
- TensorFlow >= 1.2
Both CPU and GPU are supported. GPU training is 10 times faster.
Run the following script to convert the corpora to TensorFlow datasets.
$ ./scripts/make.sh
To train and evaluate a model:
$ ./scripts/run.sh $dataset $model
$dataset can be pku, msr, asSC or cityuSC. $model can be cnn or bilstm.
For example:
$ ./scripts/run.sh pku cnn
It will train a cnn model on the pku dataset, then evaluate its performance on the test set.
To enable the CRF layer, simply append --viterbi to your command, e.g.
$ ./scripts/run.sh pku cnn --viterbi
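To cover every configuration in one pass, the commands above can be enumerated with a small shell function. This is a minimal sketch and not part of the repository: list_runs is a hypothetical helper name, and it only prints the commands instead of executing them. It also assumes that cnn and bilstm, each with and without --viterbi, correspond to the four models listed at the top.

```shell
# Hypothetical helper (not shipped with the repository): print one
# run.sh command per dataset/model/CRF combination, without running it.
list_runs() {
  # Dataset and model names are the ones documented above.
  for dataset in pku msr asSC cityuSC; do
    for model in cnn bilstm; do
      # An empty suffix means no CRF layer; --viterbi enables it.
      for crf in "" " --viterbi"; do
        echo "./scripts/run.sh $dataset $model$crf"
      done
    done
  done
}

list_runs
```

Each printed line is a complete command, so piping the output to sh from the repository root (list_runs | sh) would run the configurations one after another.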
- Corpora are from SIGHAN05, converted to Simplified Chinese via HanLP. Note that the SIGHAN datasets should only be used for research purposes.
- Model implementations are adapted from https://github.com/iesl/dilated-cnn-ner by Emma Strubell.