Viterbi-based accelerated tokenizer (Python wrapper)

License

Apache-2.0 (LICENSE-APACHE), MIT (LICENSE-MIT)

daac-tools/python-vibrato

Vibrato is a fast implementation of tokenization (or morphological analysis) based on the Viterbi algorithm. This is a Python wrapper for Vibrato.

Installation

Build from source

You need to install the Rust compiler beforehand by following its installation documentation. python-vibrato uses pyproject.toml, so you also need to upgrade pip to version 19 or later.

$ pip install --upgrade pip
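
To confirm that the pip requirement is met before building, a check along the following lines may help. This is a minimal sketch: the helper names `pip_is_new_enough` and `installed_pip_version` are made up for illustration and are not part of python-vibrato, and the threshold of 19 is taken from the requirement stated above.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum major version of pip, taken from the requirement stated above.
MIN_PIP_MAJOR = 19


def pip_is_new_enough(version_string, minimum_major=MIN_PIP_MAJOR):
    """Return True if the major component of a pip version string meets the minimum."""
    major = int(version_string.split(".")[0])
    return major >= minimum_major


def installed_pip_version():
    """Return the installed pip version string, or None if pip is not installed."""
    try:
        return version("pip")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pip_version()
    if found is None:
        print("pip is not installed in this environment")
    elif pip_is_new_enough(found):
        print(f"pip {found} is new enough")
    else:
        print(f"pip {found} is too old; run: pip install --upgrade pip")
```

Only the major version is compared, which is enough for a "19 or later" requirement.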

After setting up the environment, you can install python-vibrato as follows:

$ pip install .

Example Usage

python-vibrato does not contain model files. To perform tokenization, follow the Vibrato documentation to download a distribution model or train your own model beforehand.

import vibrato

with open('path/to/system.dic', 'rb') as fp:
    dict_data = fp.read()
tokenizer = vibrato.Vibrato(dict_data)

tokens = tokenizer.tokenize('社長は火星猫だ')

len(tokens)
#=> 5

list(tokens)
#=> [Token { surface: "社長", feature: "名詞,一般,*,*,*,*,社長,シャチョウ,シャチョー,," },
#    Token { surface: "は", feature: "助詞,係助詞,*,*,*,*,は,ハ,ワ,," },
#    Token { surface: "火星", feature: "名詞,一般,*,*,*,*,火星,カセイ,カセイ,," },
#    Token { surface: "猫", feature: "名詞,一般,*,*,*,*,猫,ネコ,ネコ,," },
#    Token { surface: "だ", feature: "助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ,," }]

tokens[0].surface()
#=> '社長'

tokens[0].feature()
#=> '名詞,一般,*,*,*,*,社長,シャチョウ,シャチョー,,'

tokens[0].start()
#=> 0

tokens[0].end()
#=> 2
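
The feature string is a comma-separated record, so a common next step is to split it and pair each token with its span in the input. The sketch below is illustrative only: `token_rows` and `StubToken` are hypothetical names, not part of python-vibrato. `StubToken` mimics only the four accessors shown above (`surface()`, `feature()`, `start()`, `end()`) so that the snippet runs without a dictionary file. The sketch assumes that the first feature field is the part of speech, which depends on the dictionary in use, and that `start()`/`end()` are character offsets, as the example output above suggests. The offsets given for the second stub token are likewise assumed.

```python
from dataclasses import dataclass


def token_rows(text, tokens):
    """Build one dict per token from the accessors shown in the example above."""
    rows = []
    for token in tokens:
        # Assumption: the first comma-separated field is the part of speech.
        fields = token.feature().split(",")
        start, end = token.start(), token.end()
        rows.append({
            "surface": token.surface(),
            "pos": fields[0],
            "span": (start, end),
            # Assumption: offsets count characters, so slicing recovers the surface.
            "matches_text": text[start:end] == token.surface(),
        })
    return rows


@dataclass
class StubToken:
    """Hypothetical stand-in for a vibrato token, for demonstration only."""
    _surface: str
    _feature: str
    _start: int
    _end: int

    def surface(self):
        return self._surface

    def feature(self):
        return self._feature

    def start(self):
        return self._start

    def end(self):
        return self._end


text = "社長は火星猫だ"
stub_tokens = [
    StubToken("社長", "名詞,一般,*,*,*,*,社長,シャチョウ,シャチョー,,", 0, 2),
    StubToken("は", "助詞,係助詞,*,*,*,*,は,ハ,ワ,,", 2, 3),  # offsets assumed
]
rows = token_rows(text, stub_tokens)
```

With a real dictionary loaded, the same function could be called as `token_rows(text, tokenizer.tokenize(text))`.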

Documentation

Use the help function to show the API reference.

import vibrato
help(vibrato)

License

Licensed under either of

  •  Apache License, Version 2.0 (LICENSE-APACHE)
  •  MIT license (LICENSE-MIT)

at your option.
