Rakuten MA (Python version)
Rakuten MA Python (morphological analyzer) is a Python version of Rakuten MA (word segmentor + PoS Tagger) for Chinese and Japanese.
For details about Rakuten MA, see https://github.com/rakuten-nlp/rakutenma
See also http://qiita.com/yukinoi/items/925bc238185aa2fad8a7 (in Japanese)
Contributions are welcome!
pip install rakutenma
    import json

    from rakutenma import RakutenMA

    # Initialize a RakutenMA instance with an empty model
    # (the default ja feature set is already set)
    rma = RakutenMA()

    # Let's analyze a sample sentence (from http://tatoeba.org/jpn/sentences/show/103809)
    # With a disastrous result, since the model is empty!
    print(rma.tokenize("彼は新しい仕事できっと成功するだろう。"))

    # Feed the model with ten sample sentences from tatoeba.com
    # "tatoeba.json" is available at https://github.com/rakuten-nlp/rakutenma
    with open("tatoeba.json", encoding="utf-8") as f:
        tatoeba = json.load(f)
    for sentence in tatoeba:
        rma.train_one(sentence)

    # Now what does the result look like?
    print(rma.tokenize("彼は新しい仕事できっと成功するだろう。"))

    # Initialize a RakutenMA instance with a pre-trained model
    # Specify hyperparameters for SCW (for demonstration purposes)
    rma = RakutenMA(phi=1024, c=0.007812)
    rma.load("model_ja.json")

    # Set the feature hash function (15bit)
    rma.hash_func = rma.create_hash_func(15)

    # Tokenize one sample sentence
    print(rma.tokenize("うらにわにはにわにわとりがいる"))

    # Re-train the model feeding the right answer (pairs of [token, PoS tag])
    res = rma.train_one(
        [["うらにわ", "N-nc"], ["に", "P-k"], ["は", "P-rj"], ["にわ", "N-n"],
         ["にわとり", "N-nc"], ["が", "P-k"], ["いる", "V-c"]])

    # The result of train_one contains:
    #   sys: the system output (using the current model)
    #   ans: answer fed by the user
    #   update: whether the model was updated
    print(res)

    # Now what does the result look like?
    print(rma.tokenize("うらにわにはにわにわとりがいる"))
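The example above sets a 15-bit feature hash function. As a rough illustration of what feature hashing does, the following sketch maps a feature (a list of strings) to one of 2^15 buckets. The name `make_hash_func` and the use of CRC32 are assumptions made for this illustration; this is not the hash that `create_hash_func` actually uses.

```python
import zlib


def make_hash_func(bits):
    """Return a function mapping a feature to an integer in [0, 2**bits).

    Hypothetical helper for illustration only; Rakuten MA's own
    create_hash_func may use a different hash and feature encoding.
    """
    mask = (1 << bits) - 1

    def hash_func(feature):
        # Join the feature parts into one key and hash its UTF-8 bytes.
        key = "_".join(feature).encode("utf-8")
        # Keep only the lowest `bits` bits so every feature lands in a bucket.
        return zlib.crc32(key) & mask

    return hash_func


hash15 = make_hash_func(15)
bucket = hash15(["w0", "彼"])
print(0 <= bucket < 2 ** 15)
```

Fewer bits give a smaller model at the cost of more collisions between unrelated features, which is presumably why the pre-trained model is paired with a fixed bit width.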
Compared to the original Rakuten MA, the following methods are added:
- RakutenMA::load(model_path) - Load a model from a JSON file
- RakutenMA::save(model_path) - Save the model to the given path
The following values are set by default:
- rma.featset = CTYPE_JA_PATTERNS # RakutenMA.default_featset_ja
- rma.hash_func = rma.create_hash_func(15)
- rma.tag_scheme = "SBIEO" # if using Chinese, set "IOB2"
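To make the difference between the two tag schemes concrete, here is a minimal sketch that expands [token, PoS tag] pairs into per-character labels under each scheme. The helper `tokens_to_char_tags` is hypothetical and written for this illustration; it assumes the usual meaning of the prefixes (S = single-character token, B = beginning, I = inside, E = end) and is not part of the rakutenma API.

```python
def tokens_to_char_tags(tokens, scheme="SBIEO"):
    """Expand [surface, pos] pairs into (character, label) pairs.

    Hypothetical helper for illustration; not the library's internal code.
    """
    tagged = []
    for surface, pos in tokens:
        last = len(surface) - 1
        for i, ch in enumerate(surface):
            if scheme == "SBIEO":
                if last == 0:
                    prefix = "S"  # single-character token
                elif i == 0:
                    prefix = "B"  # first character of a token
                elif i == last:
                    prefix = "E"  # last character of a token
                else:
                    prefix = "I"  # character inside a token
            elif scheme == "IOB2":
                # IOB2 marks only the token start; all other characters are I.
                prefix = "B" if i == 0 else "I"
            else:
                raise ValueError("unknown tag scheme: " + scheme)
            tagged.append((ch, prefix + "-" + pos))
    return tagged


sample = [["にわ", "N-n"], ["に", "P-k"]]
print(tokens_to_char_tags(sample, "SBIEO"))
# [('に', 'B-N-n'), ('わ', 'E-N-n'), ('に', 'S-P-k')]
print(tokens_to_char_tags(sample, "IOB2"))
# [('に', 'B-N-n'), ('わ', 'I-N-n'), ('に', 'B-P-k')]
```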
Apache License version 2.0
Rakuten MA Python (c) 2015- Yukino Ikegami. All Rights Reserved.
Rakuten MA (original) (c) 2014 Rakuten NLP Project. All Rights Reserved.