Scripts for training Jumandic Juman++ model


ku-nlp/jumanpp-jumandic


This repository contains a set of scripts to build a ready-to-use Juman++ model for Jumandic.

Prerequisites

  • Unix environment (on Windows, use WSL or MSYS2/MinGW64)
  • Juman++ build environment
  • Python 3.6+
  • Ruby
  • Perl
  • SSH authentication configured for GitHub (several repositories are cloned via SSH)
  • 32 GB of RAM
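
A quick pre-flight check can catch a missing prerequisite before a long build starts. The following is a minimal sketch, not part of this repository: the helper names are hypothetical, and the command names `make`, `git` and `ssh` are assumptions inferred from the build and clone steps rather than taken from the list above.

```python
import os
import shutil

# Command names to look for on PATH. "make", "git" and "ssh" are assumptions
# inferred from the build and clone steps; adjust them to your system.
REQUIRED_TOOLS = ["python3", "ruby", "perl", "make", "git", "ssh"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def total_ram_gib():
    """Return physical memory in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are not available on every platform.
        return None
    return page_size * page_count / 2**30
```

Calling `missing_tools()` returns an empty list when every command is available, and `total_ram_gib()` can be compared against the 32 GB listed above.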

Recommended

  • Original texts from Mainichi Shinbun (year 1995) for the Kyoto Corpus (see the page for more information). Otherwise, the Juman++ model will be trained only on the Leads corpus and will have poor quality.

How to Use

Run the configuration script: python3 configure.py. It will prompt for the location of the Mainichi Shinbun texts.

After that, run make nornn to train a model without the RNN component; make rnn produces a model with the RNN component. The models will be placed in the bld/model folder.
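
The two build steps can also be driven from a small script, for example in an automated setup. This is only a sketch under assumptions: `build_commands` and `build` are hypothetical helpers, not part of this repository, and the script is assumed to run from the repository root. Only the commands `python3 configure.py`, `make nornn` and `make rnn` come from the text above.

```python
import subprocess

# The two make targets named in this README.
VARIANTS = ("nornn", "rnn")


def build_commands(variant):
    """Return the command sequence for building the given model variant."""
    if variant not in VARIANTS:
        raise ValueError("variant must be one of %r" % (VARIANTS,))
    return [["python3", "configure.py"], ["make", variant]]


def build(variant, dry_run=True):
    """Print each command; run it as well unless dry_run is set."""
    for command in build_commands(variant):
        print("$", " ".join(command))
        if not dry_run:
            # configure.py prompts on the terminal, so stdin is inherited.
            subprocess.run(command, check=True)
```

`build("nornn")` only prints the commands; pass `dry_run=False` to execute them.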

Adding your words to the model

It is possible to add your own words to the model. To do so:

  1. Perform the configuration as described above: python3 configure.py
  2. Fetch the repositories: make repo
  3. Go into the bld/repos/jumandic folder, which is a local clone of the JumanDIC repository.
  4. Create a new file with the .dic extension in the userdic folder inside bld/repos/jumandic.
  5. Put your words into that file in JUMAN dictionary format (refer to the other files for examples).
  6. Run make clean-dic if you have already built a Juman++ model.
  7. Build your model as shown above.
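
Steps 4 and 5 above can be scripted once the repositories have been fetched. The sketch below assumes the folder layout described in the steps; `add_userdic` and the default file name `mywords.dic` are hypothetical, and UTF-8 is an assumed encoding, so check the existing dictionary files first. The entries themselves must already be in JUMAN dictionary format; the helper does not validate them.

```python
from pathlib import Path


def add_userdic(entries, name="mywords.dic", repo_root="."):
    """Write dictionary entries to a new .dic file in the userdic folder.

    `entries` is a list of strings, each already in JUMAN dictionary format.
    Returns the path of the written file.
    """
    if not name.endswith(".dic"):
        raise ValueError("the file name must have the .dic extension")
    userdic = Path(repo_root) / "bld" / "repos" / "jumandic" / "userdic"
    if not userdic.is_dir():
        # The folder appears only after the repositories have been fetched.
        raise FileNotFoundError("%s not found; fetch the repositories first" % userdic)
    path = userdic / name
    # UTF-8 is an assumption; match the encoding of the existing .dic files.
    path.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return path
```

After the file is written, continue with steps 6 and 7 so that the dictionary is rebuilt with the new words.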

If the built model does not contain your words, ensure that the binary dictionary was rebuilt after adding new words.

