Python binding for J.DepP(C++ implementation of Japanese Dependency Parsers)
lighttransport/jdepp-python
Python binding for J.DepP (C++ implementation of Japanese Dependency Parsers): https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jdepp/
$ python -m pip install jdepp
pip install does not install the model (dictionary).
You can get precompiled model files (MeCab POS tagging, trained on the KNBC corpus) from
https://github.com/lighttransport/jdepp-python/releases/tag/v0.1.0
The precompiled KNBC model file is licensed under the 3-clause BSD license.
- MeCab style POS format: FEATURE_SEP ','
- See jdepp/typedef.h for more info about ifdef macros.
Download the precompiled model file.
$ wget https://github.com/lighttransport/jdepp-python/releases/download/v0.1.0/knbc-mecab-jumandic-2ndpoly.tar.gz
$ tar xvf knbc-mecab-jumandic-2ndpoly.tar.gz
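If wget and tar are not available (for example on Windows), the same two steps can be sketched with the Python standard library. This is a minimal sketch, not part of jdepp-python: the helper names `fetch_archive` and `extract_archive` are hypothetical, and the layout of the unpacked archive is not checked here (the examples in this README assume the model ends up under model/knbc).

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
MODEL_URL = ("https://github.com/lighttransport/jdepp-python/releases/"
             "download/v0.1.0/knbc-mecab-jumandic-2ndpoly.tar.gz")


def fetch_archive(url, archive_path):
    """Download url to archive_path unless the file already exists (hypothetical helper)."""
    archive_path = Path(archive_path)
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path


def extract_archive(archive_path, dest_dir="."):
    """Unpack a tar archive into dest_dir and return the member names (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r" detects gzip compression automatically.
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Usage (performs a network download, so it is left commented out):
# archive = fetch_archive(MODEL_URL, "knbc-mecab-jumandic-2ndpoly.tar.gz")
# print(extract_archive(archive, "."))
```

Printing the returned member names is a quick way to confirm which directory to pass to load_model.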
import jdepp

model_path = "model/knbc"

parser = jdepp.Jdepp()
parser.load_model(model_path)

# NOTE: Mecab format: surface + TAB + feature(comma separated 7 fields)
input_postagged = """吾輩	名詞,普通名詞,*,*,吾輩,わがはい,代表表記:我が輩/わがはい カテゴリ:人
は	助詞,副助詞,*,*,は,は,*
猫	名詞,普通名詞,*,*,猫,ねこ,*
である	判定詞,*,判定詞,デアル列基本形,だ,である,*
。	特殊,句点,*,*,。,。,*
名前	名詞,普通名詞,*,*,名前,なまえ,*
は	助詞,副助詞,*,*,は,は,*
まだ	副詞,*,*,*,まだ,まだ,*
ない	形容詞,*,イ形容詞アウオ段,基本形,ない,ない,*
。	特殊,句点,*,*,。,。,*
EOS
"""

sent = parser.parse_from_postagged(input_postagged)
print(sent)

print(jdepp.to_tree(str(sent)))

# S-ID: 1; J.DepP
  0: 吾輩は━━┓
  1: 猫である。━━┓
  2: 名前は━━┫
  3: まだ━━┫
  4: ない。EOS
jdepp.to_dot is provided to export the graph as dot (Graphviz).

dot_text = jdepp.to_dot(str(sent))
# feed output text to graphviz viewer, e.g. https://dreampuf.github.io/GraphvizOnline/
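Instead of pasting the text into an online viewer, the dot output can be saved and rendered locally. The sketch below assumes the Graphviz `dot` command is installed and on PATH; `save_dot` and `render_png` are hypothetical helpers, not part of jdepp, and they work on any dot string.

```python
import shutil
import subprocess
from pathlib import Path


def save_dot(dot_text, dot_path):
    """Write dot text to a UTF-8 file and return its path (hypothetical helper)."""
    dot_path = Path(dot_path)
    dot_path.write_text(dot_text, encoding="utf-8")
    return dot_path


def render_png(dot_path, png_path):
    """Render a .dot file to PNG with the Graphviz CLI.

    Returns None when the `dot` command is not installed.
    """
    if shutil.which("dot") is None:
        return None
    subprocess.run(["dot", "-Tpng", str(dot_path), "-o", str(png_path)], check=True)
    return Path(png_path)


# Usage, with dot_text coming from jdepp.to_dot(...):
# render_png(save_dot(dot_text, "sentence.dot"), "sentence.png")
```

Rendering Japanese labels correctly may also depend on the fonts available to Graphviz on your system.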
See examples/ for more details.
MeCab style: surface + TAB + feature (7 comma-separated fields).
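To make the format concrete, here is a small sketch that builds such input from (surface, feature) pairs and checks an existing string against the layout described above. Both `to_postagged` and `check_postagged` are hypothetical helpers, not part of jdepp; the 7-field check simply mirrors the description in this README and is not a statement of what the parser itself enforces.

```python
def to_postagged(tokens):
    """Join (surface, feature) pairs into MeCab-style text terminated by EOS."""
    lines = [f"{surface}\t{feature}" for surface, feature in tokens]
    lines.append("EOS")
    return "\n".join(lines) + "\n"


def check_postagged(text, n_fields=7):
    """Return (line_number, message) pairs for lines that do not match the format."""
    problems = []
    lines = text.splitlines()
    for i, line in enumerate(lines, start=1):
        if line == "EOS":
            continue
        surface, tab, feature = line.partition("\t")
        if not tab:
            problems.append((i, "missing TAB between surface and feature"))
        elif len(feature.split(",")) != n_fields:
            problems.append((i, f"expected {n_fields} comma-separated fields"))
    if not lines or lines[-1] != "EOS":
        problems.append((len(lines), "input does not end with EOS"))
    return problems


text = to_postagged([
    ("猫", "名詞,普通名詞,*,*,猫,ねこ,*"),
    ("。", "特殊,句点,*,*,。,。,*"),
])
print(check_postagged(text))  # an empty list means no problems were found
```

A check like this is mainly useful when the tagged text comes from a file or another tool, where a TAB is easily lost or replaced by spaces.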
You can use jagger-python for POS tagging.
import jagger
import jdepp

jagger_model_path = "model/kwdlc/patterns"

tokenizer = jagger.Jagger()
tokenizer.load_model(jagger_model_path)

text = "吾輩は猫である。名前はまだない。"
toks = tokenizer.tokenize(text)

# Build MeCab-style input: surface + TAB + feature, one token per line
pos_tagged_input = ""
for tok in toks:
    pos_tagged_input += tok.surface() + '\t' + tok.feature() + '\n'
pos_tagged_input += "EOS\n"

jdepp_model_path = "model/knbc"

# Create the parser before loading the model
parser = jdepp.Jdepp()
parser.load_model(jdepp_model_path)
parser.parse_from_postagged(pos_tagged_input)
If you just want to use J.DepP from the CLI (e.g. for batch processing), you can build a standalone C++ app using CMake.
We modified the J.DepP source code to improve portability (e.g. ours works well on Windows).
Training a model from the Python binding is not yet supported. For now, you can train a model using the standalone C++ jdepp app.
This is for developer use cases. Use setup.py (pyproject.toml) to build the Python module for end users.
Install pybind11 devkit.
$ python -m pip install pybind11
Then invoke cmake with -DJDEPP_WITH_PYTHON and pybind11_DIR
$ pybind11_DIR=/path/to/pybind11 cmake -DJDEPP_WITH_PYTHON=1 ...
- tag it: git tag vX.Y.Z
- push tag: git push --tags
Versioning is done automatically through setuptools_scm.
- WASM build
- Training API support
- Integrate jagger POS tagger as builtin(standalone) POS tagger in J.DepP
- MMap(or SharedMemory) load of dict data to save memory in Python multiprocessing
jdepp-python is licensed under 2-Clause BSD license.
J.DepP (https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jdepp/) is licensed under a GPLv2/LGPLv2.1/BSD triple license.
- pecco, cedar, opal (subcomponents of J.DepP): GPLv2/LGPLv2.1/BSD triple license. We choose the BSD license.
- io-util: MIT license.
- optparse: Unlicense https://github.com/skeeto/optparse