ku-nlp/jumanpp-jumandic
This repository contains a set of scripts to build a ready-to-use Juman++ model for Jumandic.

Requirements:
- Unix environment (on Windows use WSL or MSYS2/MinGW64)
- Juman++ build environment
- Python 3.6+
- Ruby
- Perl
- SSH authentication configured for GitHub (several repositories are cloned via SSH)
- 32 GB of RAM
- Original texts from Mainichi Shinbun (year 1995) for the Kyoto Corpus (see its page for more information). Without them, the Juman++ model will be trained only on the Leads corpus and its quality will be poor.
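Before configuring, it can help to confirm that the listed tools are on `PATH`. The following is a minimal Python sketch, not part of the repository: the helper names `missing_tools` and `python_ok` are hypothetical, and `git` and `make` are added to the tool list on the assumption that the build needs them to clone repositories and run its targets. It does not check the RAM, the SSH setup, or the Mainichi Shinbun texts.

```python
import shutil
import sys
from typing import Callable, List, Optional

# Tools named in the requirements; git and make are assumed additions
# (the build clones repositories and is driven by make targets).
REQUIRED_TOOLS = ["python3", "ruby", "perl", "git", "make"]


def missing_tools(tools: List[str],
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info) -> bool:
    """The requirements ask for Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("missing tools:", ", ".join(missing))
    print("Python 3.6+:", "ok" if python_ok() else "too old")
```

For the SSH requirement, `ssh -T git@github.com` is a common way to confirm that GitHub accepts your key.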
Run the configuration script: `python3 configure.py`. It will prompt for the location of the Mainichi Shinbun texts.
After that, run `make nornn` to train a model without the RNN component, or `make rnn` to train a model with the RNN component. The resulting models are placed in the `bld/model` folder.
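The two build variants can be driven from a small script. This is a hypothetical wrapper (the names `build_commands` and `build` are illustrative, not part of the repository) that runs the commands from the repository root and lists whatever ends up in `bld/model`. It assumes `python3` and `make` are on `PATH` and makes no assumption about the model file names.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(with_rnn: bool) -> List[List[str]]:
    """Commands in order: configure, then the nornn or rnn make target."""
    target = "rnn" if with_rnn else "nornn"
    return [["python3", "configure.py"], ["make", target]]


def build(repo_root: str, with_rnn: bool = False) -> List[Path]:
    """Run the build in repo_root and return the files found in bld/model."""
    for cmd in build_commands(with_rnn):
        # stdin is inherited from the terminal, so configure.py can still
        # prompt for the location of the Mainichi Shinbun texts.
        subprocess.run(cmd, cwd=repo_root, check=True)
    model_dir = Path(repo_root) / "bld" / "model"
    return sorted(model_dir.iterdir())
```

`check=True` stops at the first failing step instead of continuing to the next target.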
You can add your own words to the model. To do so:
- Perform the configuration as described above: `python3 configure.py`.
- Fetch the repositories: `make repo`.
- Go into the `bld/repos/jumandic` folder; it is a local clone of the JumanDIC repository.
- Create a new file with the `.dic` extension in the `userdic` folder of `bld/repos/jumandic`.
- Put your words into that file in JUMAN dictionary format (refer to the other files for examples).
- Execute `make clean-dic` if you have already built a Juman++ model.
- Build your model as shown above.
If the built model does not contain your words, make sure the binary dictionary was rebuilt after adding them (run `make clean-dic`, then build the model again).
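The file-creation step can also be scripted. The sketch below writes `(surface, reading)` pairs as common nouns into a new `.dic` file under `userdic`. The entry layout (`名詞`/`普通名詞` with `読み`, `見出し語` and a `代表表記` field in `意味情報`) is an assumption based on the usual JUMAN dictionary format, and UTF-8 encoding is assumed as well; compare the output with the existing files in `bld/repos/jumandic` before relying on it. The function name `write_user_dic` is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed JUMAN entry layout for a common noun (check against existing .dic files).
ENTRY_TEMPLATE = ('(名詞 (普通名詞 ((読み {reading})(見出し語 {surface})'
                  '(意味情報 "代表表記:{surface}/{reading}"))))')


def write_user_dic(jumandic_dir: str, name: str,
                   words: List[Tuple[str, str]]) -> Path:
    """Write (surface, reading) pairs as common nouns into userdic/<name>.dic."""
    userdic = Path(jumandic_dir) / "userdic"
    userdic.mkdir(parents=True, exist_ok=True)
    path = userdic / (name + ".dic")
    # One S-expression entry per line.
    lines = [ENTRY_TEMPLATE.format(surface=s, reading=r) for s, r in words]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

For example, `write_user_dic("bld/repos/jumandic", "mywords", [("深層学習", "しんそうがくしゅう")])` creates `bld/repos/jumandic/userdic/mywords.dic`; after that, run `make clean-dic` and rebuild the model.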
Scripts for training Jumandic Juman++ model