👺 tokenizer specified for Japanese

SamuraiT/tinysegmenter

TinySegmenter, a super compact Japanese tokenizer, was originally created for JavaScript by Taku Kudo (c) 2008 under the terms of the new BSD license. For details, see here.

tinysegmenter for Python 2.x was written by Masato Hagiwara. For information about him, see here.

This tinysegmenter was modified to support both Python 3.x and Python 2.x and packaged for distribution by Tatsuro Yasukawa. Additionally, it was modified to run faster, thanks to @chezou, @cocoatomo and @methane.

See info about tinysegmenter

Installation

pip install tinysegmenter3

Usage

import tinysegmenter
statement = '私はpython大好きStanding Engineerです.'
tokenized_statement = tinysegmenter.tokenize(statement)
print(tokenized_statement)
# ['私', 'は', 'python', '大好き', 'Standing', ' Engineer', 'です', '.']
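
Note that in the sample output above the space before "Engineer" stays attached to the token (' Engineer'). If that matters for downstream processing, a small cleanup step can help. The following is a minimal sketch, not part of tinysegmenter: the helper name token_frequencies is hypothetical, and it assumes only that the tokenizer is a callable returning a list of strings, as tinysegmenter.tokenize does in the usage example. It uses a stand-in token list copied from the sample output so it runs without the package installed.

```python
from collections import Counter


def token_frequencies(text, tokenize):
    """Count tokens produced by `tokenize`, ignoring surrounding whitespace.

    `tokenize` is any callable that maps a string to a list of string
    tokens (hypothetical helper; not part of tinysegmenter itself).
    """
    # Strip whitespace so that ' Engineer' is counted as 'Engineer'.
    stripped = (token.strip() for token in tokenize(text))
    # Drop tokens that consisted of whitespace only.
    return Counter(token for token in stripped if token)


# Stand-in for the tokenizer: returns the token list shown in the sample
# output, so this sketch does not need the package to be installed.
statement = '私はpython大好きStanding Engineerです.'
sample_tokens = ['私', 'は', 'python', '大好き', 'Standing', ' Engineer', 'です', '.']
counts = token_frequencies(statement, lambda _text: sample_tokens)
print(counts['Engineer'])  # 1
```

With the package installed, the stand-in can presumably be replaced by passing tinysegmenter.tokenize as the second argument.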

Test Text

The test text (in the tests directory) is The Time Machine by H.G. Wells, translated to Japanese by Hiroo Yamagata under the CC BY-SA 2.0 License.

How to run Test

Install the requirements from requirements.txt with

pip install -r requirements.txt

then run this:

./runtests.sh
