
A Python wrapper for the Yandex Mystem 3.1 morphological analyzer (http://api.yandex.ru/mystem). The original tool is shipped as a binary, and this library makes it easy to integrate it into Python projects. Let us know in the issues if you would like to be involved in the development or maintenance of this project. If you have any fix or suggest…


nlpub/pymystem3


Introduction

This module contains a wrapper for Yandex Mystem 3.1, an excellent morphological analyzer for the Russian language, released in June 2014. A morphological analyzer can lemmatize text and derive a set of morphological attributes for each token. For more details about the algorithm, see I. Segalovich, «A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine», MLMTA-2003, Las Vegas, Nevada, USA.

Python is the language of choice for many computational linguists, including those working with Russian. The main motivation for this project was the absence of any Python wrapper for Mystem, one of the most popular morphological analyzers for Russian, along with PyMorphy2, TreeTagger and AOT.

The third version of Mystem introduces several important improvements, most importantly part-of-speech (POS) disambiguation. Our wrapper runs Mystem in the mode that performs POS disambiguation.

This wrapper is open source under the MIT license. However, please note that Yandex Mystem itself is not open source and is licensed under the terms of the Yandex License.

System Requirements

The wrapper works with CPython 2.6+/3.3+ and PyPy 1.9+.

The wrapper was tested on Ubuntu Linux 12.04+, Mac OS X 10.9+ and Windows 7+.

For 32-bit architectures and FreeBSD support, use version 0.1.10.

Installation

  1. Stable version: https://pypi.python.org/pypi/pymystem3. You can install it using pip:

    pip install pymystem3

  2. Latest version (recommended), from https://github.com/nlpub/pymystem3:

    pip install git+https://github.com/nlpub/pymystem3

A Quick Example

Lemmatization

    >>> from pymystem3 import Mystem
    >>> text = "Красивая мама красиво мыла раму"
    >>> m = Mystem()
    >>> lemmas = m.lemmatize(text)
    >>> print(''.join(lemmas))
    красивый мама красиво мыть рама
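Joining the result with an empty string yields spaced text, which suggests that the list returned by lemmatize also carries the whitespace between words. The sketch below shows one way to keep only the lemmas. The `raw_lemmas` literal is an assumed stand-in for the real return value (so the snippet runs without the Mystem binary), and `drop_whitespace` is a hypothetical helper, not part of pymystem3.

```python
# Assumed shape of m.lemmatize("Красивая мама красиво мыла раму"):
# lemmas interleaved with the whitespace that separated the words.
raw_lemmas = ["красивый", " ", "мама", " ", "красиво", " ", "мыть", " ", "рама", "\n"]


def drop_whitespace(lemmas):
    """Keep only the items that contain non-whitespace characters."""
    return [lemma for lemma in lemmas if lemma.strip()]


clean = drop_whitespace(raw_lemmas)
print(clean)
```

With a real analyzer you would pass `m.lemmatize(text)` to the helper instead of the literal list.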

Getting grammatical information and lemmas.

    import json
    from pymystem3 import Mystem

    text = "Красивая мама красиво мыла раму"
    m = Mystem()
    lemmas = m.lemmatize(text)

    print("lemmas:", ''.join(lemmas))
    print("full info:", json.dumps(m.analyze(text), ensure_ascii=False))

Output:

    lemmas: красивый мама красиво мыть рама
    full info: [{"text": "Красивая", "analysis": [{"lex": "красивый", "gr": "A=им,ед,полн,жен"}]}, {"text": " "}, {"text": "мама", "analysis": [{"lex": "мама", "gr": "S,жен,од=им,ед"}]}, {"text": " "}, {"text": "красиво", "analysis": [{"lex": "красиво", "gr": "ADV="}]}, {"text": " "}, {"text": "мыла", "analysis": [{"lex": "мыть", "gr": "V,несов,пе=прош,ед,изъяв,жен"}]}, {"text": " "}, {"text": "раму", "analysis": [{"lex": "рама", "gr": "S,жен,неод=вин,ед"}]}, {"text": "\n"}]
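The analysis structure can be reduced to (token, lemma, part of speech) triples. The following is a minimal sketch, not part of the library: `lemma_pos_triples` is a hypothetical helper, `sample` is copied from the output shown above so the snippet runs without the Mystem binary, and it assumes that the part-of-speech tag is the portion of the `gr` string before the first comma or equals sign, as the sample suggests.

```python
import re

# Analysis structure copied from the example output above.
sample = [
    {"text": "Красивая", "analysis": [{"lex": "красивый", "gr": "A=им,ед,полн,жен"}]},
    {"text": " "},
    {"text": "мама", "analysis": [{"lex": "мама", "gr": "S,жен,од=им,ед"}]},
    {"text": " "},
    {"text": "красиво", "analysis": [{"lex": "красиво", "gr": "ADV="}]},
    {"text": " "},
    {"text": "мыла", "analysis": [{"lex": "мыть", "gr": "V,несов,пе=прош,ед,изъяв,жен"}]},
    {"text": " "},
    {"text": "раму", "analysis": [{"lex": "рама", "gr": "S,жен,неод=вин,ед"}]},
    {"text": "\n"},
]


def lemma_pos_triples(analysis):
    """Return (token, lemma, POS) for every token that has an analysis."""
    triples = []
    for token in analysis:
        readings = token.get("analysis")
        if not readings:
            # Whitespace and other unanalyzed tokens carry no readings.
            continue
        first = readings[0]
        # Assumption: the POS tag precedes the first "," or "=" in "gr".
        pos = re.split(r"[,=]", first["gr"], maxsplit=1)[0]
        triples.append((token["text"], first["lex"], pos))
    return triples


for triple in lemma_pos_triples(sample):
    print(triple)
```

Only the first reading of each token is used here. That is a simplification and may need revisiting if the analyzer returns several readings per token.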

Issues

Please report any bugs or requests using the GitHub issue tracker (https://github.com/nlpub/pymystem3/issues)! We have only very limited resources to maintain this project, so please propose a pull request directly if you see an obvious way of fixing the issue. We are very open to accepting bug fixes, and your help is greatly appreciated.

Authors

The full list of contributors is available on GitHub. You can also contact the original contributors of the project via email:

  • Denis Sukhonin (d.sukhonin): development
  • Alexander Panchenko (panchenko.alexander): conception

@ gmail

If you are interested in further development or in becoming a maintainer of this project, please drop us an email: your help is greatly appreciated.
