🐍 mecab-python. You can find the original version here: http://taku910.github.io/mecab/
SamuraiT/mecab-python3
This is a Python wrapper for the MeCab morphological analyzer for Japanese text. It currently works with Python 3.8 and greater.
Note: If using MacOS Big Sur, you'll need to upgrade pip to version 20.3 or higher to use wheels due to a pip issue.
You do not need to write issues in English.
Note that Windows wheels require a Microsoft Visual C++ Redistributable, so be sure to install that.
    >>> import MeCab
    >>> wakati = MeCab.Tagger("-Owakati")
    >>> wakati.parse("pythonが大好きです").split()
    ['python', 'が', '大好き', 'です']
    >>> tagger = MeCab.Tagger()
    >>> print(tagger.parse("pythonが大好きです"))
    python	python	python	python	名詞-普通名詞-一般
    が	ガ	ガ	が	助詞-格助詞
    大好き	ダイスキ	ダイスキ	大好き	形状詞-一般
    です	デス	デス	です	助動詞	助動詞-デス	終止形-一般
    EOS
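The default output in the session above is plain text: one token per line, followed by a final EOS line. A minimal sketch of splitting that text into Python tuples follows. It assumes the fields are tab-separated, as they are with unidic-lite; other dictionaries may lay out the fields differently. The helper name `parse_tagger_output` is made up for illustration and is not part of mecab-python3.

```python
def parse_tagger_output(text):
    """Split MeCab's default text output into (surface, fields) tuples.

    Assumes one token per line, tab-separated, ending with an "EOS" line.
    """
    tokens = []
    for line in text.splitlines():
        # Skip the end-of-sentence marker and any blank lines.
        if line == "EOS" or not line:
            continue
        surface, *fields = line.split("\t")
        tokens.append((surface, fields))
    return tokens


# Two lines of sample output, with the tabs written out explicitly.
sample = (
    "python\tpython\tpython\tpython\t名詞-普通名詞-一般\n"
    "が\tガ\tガ\tが\t助詞-格助詞\n"
    "EOS\n"
)
for surface, fields in parse_tagger_output(sample):
    print(surface, fields[-1])
```

In real use, the string returned by `tagger.parse(...)` would be passed in place of `sample`.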
The API for mecab-python3 closely follows the API for MeCab itself, even when this makes it not very “Pythonic.” Please consult the official MeCab documentation for more information.
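As an illustration of that MeCab-style API, the sketch below walks the linked list of nodes returned by `parseToNode` and wraps it in a generator. The wrapper name `iter_tokens` is hypothetical. The example also assumes that a dictionary is installed, and it prints a message instead of failing if MeCab cannot be loaded.

```python
def iter_tokens(tagger, text):
    """Yield (surface, feature) pairs by walking MeCab's node linked list."""
    node = tagger.parseToNode(text)
    while node:
        # The BOS/EOS marker nodes have an empty surface, so skip them.
        if node.surface:
            yield node.surface, node.feature
        node = node.next


if __name__ == "__main__":
    try:
        import MeCab
        tagger = MeCab.Tagger()
    except (ImportError, RuntimeError) as exc:
        print("MeCab is not usable here:", exc)
    else:
        for surface, feature in iter_tokens(tagger, "pythonが大好きです"):
            print(surface, feature)
```

The format of `feature` depends on the dictionary, so any further splitting of that string is left to the caller.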
Binary wheels are available for MacOS X, Linux, and Windows (64bit) and are installed by default when you use pip:
pip install mecab-python3
These wheels include a copy of the MeCab library, but not a dictionary. In order to use MeCab you'll need to install a dictionary. unidic-lite is a good one to start with:
pip install unidic-lite
To build from source using pip:
pip install --no-binary :all: mecab-python3
In order to use MeCab, you must install a dictionary. There are many different dictionaries available for MeCab. These UniDic packages, which include slight modifications for ease of use, are recommended:
- unidic: The latest full UniDic.
- unidic-lite: A slightly modified UniDic 2.1.2, chosen for its small size.
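A quick way to see whether either recommended dictionary package is importable is sketched below, using only the standard library. It assumes the import names `unidic` and `unidic_lite` for the two packages listed above. The function name `installed_dictionaries` is made up for this example.

```python
import importlib.util


def installed_dictionaries(candidates=("unidic", "unidic_lite")):
    """Return the candidate dictionary packages that can be imported."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


found = installed_dictionaries()
if found:
    print("Dictionary packages found:", ", ".join(found))
else:
    print("No dictionary package found; try: pip install unidic-lite")
```

This only checks that the Python package is present. It does not verify that MeCab can actually load the dictionary data.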
The dictionaries below are not recommended due to being unmaintained for many years, but they are available for use with legacy applications.
For more details on the differences between dictionaries see here.
If you get a RuntimeError when you try to run MeCab, here are some things to check:
On Windows, you have to install the Microsoft Visual C++ Redistributable to use this package.
Run pip install unidic-lite and confirm that works. If that fixes your problem, you either don't have a dictionary installed, or you need to specify your dictionary path like this:
tagger = MeCab.Tagger('-r /dev/null -d /usr/local/lib/mecab/dic/mydic')
Note: on Windows, use nul instead of /dev/null. Alternately, if you have a mecabrc you can use the path after -r.
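Because the placeholder rc file differs between platforms, one way to avoid hard-coding it is Python's `os.devnull`, which is `nul` on Windows and `/dev/null` on POSIX systems. The sketch below builds the argument string that way. The helper name `tagger_args` is hypothetical and the dictionary path is only a placeholder.

```python
import os


def tagger_args(dicdir, rcfile=os.devnull):
    """Build an argument string for MeCab.Tagger.

    rcfile defaults to the platform's null device, which stands in
    for an empty mecabrc.
    """
    return f"-r {rcfile} -d {dicdir}"


# Placeholder dictionary path; replace it with your own dictionary directory.
args = tagger_args("/usr/local/lib/mecab/dic/mydic")
print(args)
# Then: tagger = MeCab.Tagger(args)
```

This simple version does no quoting, so it assumes neither path contains spaces.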
If you get this error:
error message: [ifs] no such file or directory: /usr/local/etc/mecabrc
You need to specify a mecabrc file. It's OK to specify an empty file; it just has to exist. You can specify a mecabrc with -r. This may be necessary on Debian or Ubuntu, where the mecabrc is in /etc/mecabrc.
You can specify an empty mecabrc like this:
tagger = MeCab.Tagger('-r/dev/null -d/home/hoge/mydic')
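The mecabrc lookup described above can be sketched as a small helper. It checks the two locations mentioned in this section and falls back to the null device, which acts as an empty mecabrc. The function name `find_mecabrc` is hypothetical, and your system may keep the file somewhere else.

```python
import os


def find_mecabrc(candidates=("/usr/local/etc/mecabrc", "/etc/mecabrc")):
    """Return the first existing mecabrc, or the null device as an empty one."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return os.devnull


rc = find_mecabrc()
print("Using mecabrc:", rc)
# Then: tagger = MeCab.Tagger(f"-r {rc}")
```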
Chasen output is not a built-in feature of MeCab; you must specify it in your dicrc or mecabrc. Notably, Unidic does not include Chasen output format. Please see the MeCab documentation.
- fugashi is a Cython wrapper for MeCab with a Pythonic interface, by the current maintainer of this library
- SudachiPy is a modern tokenizer with an actively maintained dictionary
- pymecab-ko is a wrapper of the Korean MeCab fork mecab-ko based on mecab-python3
- KoNLPy is a library for Korean NLP that includes a MeCab wrapper
Like MeCab itself, mecab-python3 is copyrighted free software by Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation, and is distributed under a 3-clause BSD license (see the file BSD). Alternatively, it may be redistributed under the terms of the GNU General Public License, version 2 (see the file GPL) or the GNU Lesser General Public License, version 2.1 (see the file LGPL).