takumakanari/japanese-numbers-python


A parser for Japanese numbers (Kanji and Arabic numerals) in natural language.

License

MIT license


japanese_numbers


The module `japanese_numbers` finds numbers in natural-language text and converts them to Arabic numerals. The following are examples of patterns it can parse:

二千万百一円
5百万
一を聞いて十を知る
五〇六号室
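Converting a numeral such as 二千万百一 means combining three kinds of characters: digits, the small units 十/百/千 that multiply the digit before them, and the large units 万/億/兆 that scale everything accumulated in their group. The sketch below is a minimal, hypothetical `kanji_to_int` helper written only to illustrate those rules; it is not taken from this library, assumes its input contains numeral characters only, and ignores edge cases such as a large unit with no digits before it.

```python
# Hypothetical illustration only; not the library's implementation.
DIGITS = {'〇': 0, '一': 1, '二': 2, '三': 3, '四': 4,
          '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}
SMALL_UNITS = {'十': 10, '百': 100, '千': 1000}
LARGE_UNITS = {'万': 10**4, '億': 10**8, '兆': 10**12}


def kanji_to_int(text: str) -> int:
    """Convert a numeral string such as '二千万百一' or '5百万' to an int."""
    total = 0          # value of completed 万/億/兆 groups
    section = 0        # value accumulated below the next large unit
    digits = 0         # pending digit run, read positionally (五〇六 -> 506)
    has_digits = False
    for ch in text:
        # Kanji digits, or any Unicode decimal digit (e.g. 5 or full-width ５).
        if ch in DIGITS or ch.isdecimal():
            value = DIGITS[ch] if ch in DIGITS else int(ch)
            digits = digits * 10 + value
            has_digits = True
        elif ch in SMALL_UNITS:
            # A bare unit (十, 百, 千) with no digit before it means one of that unit.
            section += (digits if has_digits else 1) * SMALL_UNITS[ch]
            digits, has_digits = 0, False
        elif ch in LARGE_UNITS:
            # Close the current group and scale it by 万, 億 or 兆.
            total += (section + digits) * LARGE_UNITS[ch]
            section, digits, has_digits = 0, 0, False
        else:
            raise ValueError(f'not a numeral character: {ch!r}')
    return total + section + digits


print(kanji_to_int('二千万百一'))  # 20000101
print(kanji_to_int('5百万'))       # 5000000
print(kanji_to_int('五〇六'))      # 506
```

Reading consecutive digits positionally is what lets the sketch handle both 五〇六 and mixed forms like 5百万 with a single rule.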

Installation

```shell
pip install japanese-numbers-python
```

Usage

The functions `to_arabic` and `to_arabic_numbers` are mostly stable.

`to_arabic` returns a list of `japanese_numbers.result.ParsedResult` objects.

```python
import japanese_numbers

japanese_numbers.to_arabic('銀河の向こう、六千三百二十一億千五百十一万二千百八十一光年彼方。')
# => [<ParsedResult 632115112181 : "六千三百二十一億千五百十一万二千百八十一" index=7>]

japanese_numbers.to_arabic('一を聞いて十を知る。')
# => [<ParsedResult 1 : "一" index=0>, <ParsedResult 10 : "十" index=5>]
```
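Each result pairs a number with the matched substring and its start offset. As a rough illustration of the finding step only, the hypothetical `find_numeral_spans` helper below scans for runs of numeral characters with a regular expression and reports `(text, index)` pairs like those in the example output. A naive character-class scan like this will also match numeral characters inside ordinary words (for example 一 in 一緒), which a real parser presumably has to filter out; the sketch makes no attempt to.

```python
import re
from typing import List, Tuple

# Arabic digits (half- and full-width), Kanji digits, and unit characters.
NUMERAL_RUN = re.compile(r'[0-9０-９〇一二三四五六七八九十百千万億兆]+')


def find_numeral_spans(text: str) -> List[Tuple[str, int]]:
    """Return (matched_text, start_index) for each run of numeral characters."""
    return [(m.group(), m.start()) for m in NUMERAL_RUN.finditer(text)]


print(find_numeral_spans('一を聞いて十を知る。'))
# [('一', 0), ('十', 5)]
print(find_numeral_spans('銀河の向こう、六千三百二十一億千五百十一万二千百八十一光年彼方。'))
# [('六千三百二十一億千五百十一万二千百八十一', 7)]
```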

Each `ParsedResult` instance exposes the numeric value, the matched text, and the position where it was found:

```python
import japanese_numbers

result = japanese_numbers.to_arabic('一を聞いて十を知る。')
result[0].number  # => 1
result[0].text    # => '一'
result[0].index   # => 0 (position where the number was found)
result[1].number  # => 10
result[1].text    # => '十'
result[1].index   # => 5
```
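Because `index` is an offset into the original string and `text` is the exact matched substring, the two together locate each number in the input. The sketch below uses them to rewrite a sentence with Arabic numerals. `Result` is a hypothetical stand-in dataclass with the same three attributes as `ParsedResult`, so the example runs without the library; with the real package you would pass the list returned by `to_arabic` instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """Hypothetical stand-in exposing the same attributes as ParsedResult."""
    number: int
    text: str
    index: int


def replace_with_arabic(source: str, results: List[Result]) -> str:
    """Replace each matched numeral in `source` with its Arabic value."""
    out = source
    # Go right to left so the indices of earlier matches stay valid.
    for res in sorted(results, key=lambda r: r.index, reverse=True):
        end = res.index + len(res.text)
        if out[res.index:end] != res.text:
            raise ValueError(f'{res.text!r} not found at index {res.index}')
        out = out[:res.index] + str(res.number) + out[end:]
    return out


results = [Result(1, '一', 0), Result(10, '十', 5)]
print(replace_with_arabic('一を聞いて十を知る。', results))  # 1を聞いて10を知る。
```

Replacing from the rightmost match backwards keeps the remaining indices valid, since each substitution can change the string's length.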

`to_arabic_numbers` returns the numbers directly, as a tuple.

```python
import japanese_numbers

japanese_numbers.to_arabic_numbers('一を聞いて十を知る。')
# => (1, 10)
```
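Judging from the examples, `to_arabic_numbers` appears to return the same values as the `number` fields of `to_arabic`, in order; that is an inference from the examples, not documented behavior. If you already hold a `to_arabic` result list, a hypothetical helper like `numbers_only` recovers the tuple without parsing again (`Result` is a stand-in namedtuple for `ParsedResult`).

```python
from collections import namedtuple

# Hypothetical stand-in for ParsedResult with the same attribute names.
Result = namedtuple('Result', ['number', 'text', 'index'])


def numbers_only(results):
    """Return the number fields in order, as a tuple."""
    return tuple(r.number for r in results)


results = [Result(1, '一', 0), Result(10, '十', 5)]
print(numbers_only(results))  # (1, 10)
```

Reach for `to_arabic` when you need positions or matched text, and `to_arabic_numbers` when only the values matter.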

Charsets

Both `to_arabic_numbers` and `to_arabic` accept an `encode` option that specifies the encoding of the input.

It defaults to `utf8`. If you pass a non-Unicode string, it is first decoded to Unicode using that encoding.

```python
import japanese_numbers

japanese_numbers.to_arabic_numbers('一を聞いて十を知る。')  # utf8 by default
japanese_numbers.to_arabic('一を聞いて十を知る。', encode='eucjp')  # set another charset for non-Unicode input
```
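The `encode` option matters only for non-Unicode input. In Python 3 terms that means `bytes`; the README's wording suggests it was written with Python 2's `str`/`unicode` split in mind. The sketch below mimics the described behavior with a hypothetical `ensure_unicode` helper: byte strings are decoded with the named codec, while text that is already Unicode passes through unchanged. Both `utf8` and `eucjp` are valid Python codec aliases.

```python
def ensure_unicode(text, encode='utf8'):
    """Hypothetical helper mirroring the README's description of `encode`.

    Byte strings are decoded with the given codec; str (already Unicode)
    is returned unchanged.
    """
    if isinstance(text, bytes):
        return text.decode(encode)
    return text


raw = '一を聞いて十を知る。'.encode('eucjp')    # EUC-JP encoded bytes
print(ensure_unicode(raw, encode='eucjp'))    # 一を聞いて十を知る。
print(ensure_unicode('一を聞いて十を知る。'))  # already str: unchanged
```

Decoding EUC-JP bytes with the default `utf8` codec typically raises `UnicodeDecodeError`, which is why the charset has to be stated explicitly for such input.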

TODO

Support floating-point numbers
Support negative numbers

Patch

Patches are welcome!

