Viterbi-based accelerated tokenizer (Python wrapper)
daac-tools/python-vibrato
Vibrato is a fast implementation of tokenization (or morphological analysis) based on the Viterbi algorithm. This is a Python wrapper for Vibrato.
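To illustrate what Viterbi-based tokenization means, here is a minimal sketch in pure Python. It is not Vibrato's implementation: the lexicon and its costs are invented for illustration, and only per-word costs are used, whereas MeCab-style dictionaries also score the connection between adjacent tokens. The function name `viterbi_tokenize` is hypothetical.

```python
import math

# Hypothetical toy lexicon: surface form -> word cost (lower is better).
LEXICON = {"火": 5, "星": 5, "火星": 4, "猫": 3, "星猫": 9}


def viterbi_tokenize(text, lexicon, max_len=4):
    """Return the minimum-cost segmentation of text into lexicon words."""
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: minimum cost of segmenting text[:i]
    back = [None] * (n + 1)      # back[i]: start index of the last word ending at i
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lexicon and best[start] + lexicon[word] < best[end]:
                best[end] = best[start] + lexicon[word]
                back[end] = start
    if best[n] == math.inf:
        raise ValueError("no segmentation covers the input")
    # Follow the back-pointers from the end of the text to recover the path.
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


result = viterbi_tokenize("火星猫", LEXICON)
print(result)  # ['火星', '猫']
```

The dynamic program keeps only the cheapest path into each position, so the work grows linearly with the text length for a bounded maximum word length.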
You need to install the Rust compiler following the documentation beforehand. python-vibrato uses pyproject.toml, so you also need to upgrade pip to version 19 or later.

    $ pip install --upgrade pip

After setting up the environment, you can install python-vibrato as follows:

    $ pip install .

python-vibrato does not contain model files. To perform tokenization, follow the Vibrato documentation to download distribution models or train your own models beforehand.
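The build prerequisites described above can be checked from Python before running the install. This is a rough sketch, not part of python-vibrato: the helper name `check_prerequisites` is hypothetical, and checking for `cargo` alongside `rustc` is an assumption about what the build needs.

```python
import shutil
from importlib import metadata


def check_prerequisites(min_pip_major=19):
    """Report whether the Rust toolchain and a recent enough pip are visible."""
    try:
        pip_major = int(metadata.version("pip").split(".")[0])
    except metadata.PackageNotFoundError:
        pip_major = None  # pip is not installed in this environment
    return {
        "rustc": shutil.which("rustc") is not None,
        "cargo": shutil.which("cargo") is not None,  # assumption: the build uses cargo
        "pip": pip_major is not None and pip_major >= min_pip_major,
    }


status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing or too old'}")
```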
    import vibrato

    with open('path/to/system.dic', 'rb') as fp:
        dict_data = fp.read()
    tokenizer = vibrato.Vibrato(dict_data)

    tokens = tokenizer.tokenize('社長は火星猫だ')

    len(tokens)
    #=> 5

    list(tokens)
    #=> [Token { surface: "社長", feature: "名詞,一般,*,*,*,*,社長,シャチョウ,シャチョー,," },
    #    Token { surface: "は", feature: "助詞,係助詞,*,*,*,*,は,ハ,ワ,," },
    #    Token { surface: "火星", feature: "名詞,一般,*,*,*,*,火星,カセイ,カセイ,," },
    #    Token { surface: "猫", feature: "名詞,一般,*,*,*,*,猫,ネコ,ネコ,," },
    #    Token { surface: "だ", feature: "助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ,," }]

    tokens[0].surface()
    #=> '社長'

    tokens[0].feature()
    #=> '名詞,一般,*,*,*,*,社長,シャチョウ,シャチョー,,'

    tokens[0].start()
    #=> 0

    tokens[0].end()
    #=> 2
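The feature string is a comma-separated record, so it can be split into fields for downstream use. The sketch below works on the values shown for `tokens[0]` in the example, copied in as literals so it runs without a dictionary file. The helper `split_feature` is hypothetical, and the meaning and number of fields depend on the dictionary you load. The last line assumes that `start()` and `end()` are character offsets into the input, which is consistent with the values 0 and 2 shown for a two-character surface.

```python
import csv


def split_feature(feature):
    """Split a comma-separated feature string into a list of fields."""
    # csv also copes with quoted fields containing commas, should a dictionary emit them.
    return next(csv.reader([feature]))


text = '社長は火星猫だ'
# Values copied from the example output for tokens[0].
surface = '社長'
feature = '名詞,一般,*,*,*,*,社長,シャチョウ,シャチョー,,'
start, end = 0, 2

fields = split_feature(feature)
print(fields[0])                    # 名詞
print(len(fields))                  # 11 (the two trailing fields are empty)
print(text[start:end] == surface)   # True
```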
Use the help function to show the API reference.
    import vibrato
    help(vibrato)
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.