Thai natural language processing in Python


PyThaiNLP/pythainlp


PyThaiNLP is a Python package for text processing and linguistic analysis, similar to NLTK, with a focus on the Thai language.

For details in Thai, see README_TH.MD.

Quick install

pip install pythainlp

Version | Description | Status
5.1.2 | Stable | Change Log
dev | Release Candidate for 5.2 | Change Log

Getting Started

Capabilities

PyThaiNLP provides standard linguistic analysis for the Thai language and standard Thai locale utility functions. Some of these functions are also available via the command-line interface (run thainlp in your shell).

Partial list of features:

  • Convenient character and word classes, like Thai consonants (pythainlp.thai_consonants), vowels (pythainlp.thai_vowels), digits (pythainlp.thai_digits), and stop words (pythainlp.corpus.thai_stopwords) -- comparable to constants like string.ascii_letters, string.digits, and string.punctuation
  • Linguistic unit segmentation at different levels: sentence (sent_tokenize), word (word_tokenize), and subword (subword_tokenize)
  • Part-of-speech tagging (pos_tag)
  • Spelling suggestion and correction (spell and correct)
  • Phonetic algorithm and transliteration (soundex and transliterate)
  • Collation (sorting in dictionary order) (collate)
  • Number read-out (num_to_thaiword and bahttext)
  • Datetime formatting (thai_strftime)
  • Fix for text typed with the wrong Thai/English keyboard layout (eng_to_thai, thai_to_eng)
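
As a rough illustration of a few of the features listed above, the sketch below calls some of the named functions from Python. The helper name demo_basic_features and the keys of the returned dict are hypothetical. The import paths (pythainlp.tokenize, pythainlp.util) are an assumption, since the list above gives only the function names, and the calls assume default engines and options.

```python
def demo_basic_features(text: str, amount: float = 12.5) -> dict:
    """Run a few of the listed PyThaiNLP features on `text` and `amount`."""
    # Imported inside the function so that this module still loads when
    # PyThaiNLP is not installed; calling the function then raises ImportError.
    # The import paths are an assumption; check the API docs for your version.
    from pythainlp import thai_consonants
    from pythainlp.tokenize import word_tokenize
    from pythainlp.util import bahttext, eng_to_thai, num_to_thaiword

    return {
        # Character class: count how many characters are Thai consonants.
        "consonant_count": sum(ch in thai_consonants for ch in text),
        # Word segmentation with the default engine.
        "words": word_tokenize(text),
        # Number read-out: integer part as Thai words, full amount as baht text.
        "number_in_words": num_to_thaiword(int(amount)),
        "amount_in_baht": bahttext(amount),
        # Keyboard fix: Latin keystrokes typed while the Thai layout was intended.
        "keyboard_fix": eng_to_thai("l;ylfu"),
    }
```

Calling demo_basic_features("ฉันรักภาษาไทย") returns a dict with one entry per feature. The exact tokens depend on the installed version and its dictionary.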

Installation

pip install --upgrade pythainlp

This will install the latest stable release of PyThaiNLP.

Install different releases:

  • Stable release: pip install --upgrade pythainlp
  • Pre-release (nearly ready): pip install --upgrade --pre pythainlp
  • Development (likely to break things): pip install https://github.com/PyThaiNLP/pythainlp/archive/dev.zip

Installation Options

Some functionalities, like Thai WordNet, may require extra packages. To install those requirements, specify a set of [name] immediately after pythainlp:

pip install "pythainlp[extra1,extra2,...]"

Possible extras:

  • full (install everything)
  • compact (install a stable and small subset of dependencies)
  • attacut (to support attacut, a fast and accurate tokenizer)
  • benchmarks (for word tokenization benchmarking)
  • icu (for ICU, International Components for Unicode, support in transliteration and tokenization)
  • ipa (for IPA, International Phonetic Alphabet, support in transliteration)
  • ml (to support ULMFiT models for classification)
  • thai2fit (for Thai word vector)
  • thai2rom (for machine-learnt romanization)
  • wordnet (for Thai WordNet API)

For dependency details, look at the extras variable in setup.py.
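
Because extras are optional, code that relies on one may want to check that its dependency is importable before use. The following is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and which import name corresponds to which extra is an assumption to confirm against setup.py.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if find_spec(name) is None]
```

For example, missing_modules(["attacut"]) returns an empty list when the attacut package is installed, and ["attacut"] otherwise. In the latter case, installing the matching extra should resolve it.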

Data Directory

  • Some additional data, like word lists and language models, may be automatically downloaded during runtime.
  • PyThaiNLP caches these data under the directory ~/pythainlp-data by default.
  • The data directory can be changed by specifying the environment variable PYTHAINLP_DATA_DIR.
  • See the data catalog (db.json) at https://github.com/PyThaiNLP/pythainlp-corpus
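
The lookup rule described above (environment variable first, then the default under the home directory) can be sketched as follows. This is an illustration of the documented behaviour, not the library's own implementation. The function name resolve_data_dir and its environ parameter are hypothetical.

```python
import os
from pathlib import Path

ENV_VAR = "PYTHAINLP_DATA_DIR"          # override named in the list above
DEFAULT_DIR_NAME = "pythainlp-data"     # default directory under the home folder


def resolve_data_dir(environ=None) -> Path:
    """Return the data directory implied by the environment (sketch only)."""
    env = os.environ if environ is None else environ
    override = env.get(ENV_VAR)
    if override:
        # An explicit override wins; expand a leading "~" if present.
        return Path(override).expanduser()
    return Path.home() / DEFAULT_DIR_NAME
```

Passing a plain dict as environ makes the rule easy to try without touching the real environment, for example resolve_data_dir({"PYTHAINLP_DATA_DIR": "/tmp/thai-data"}).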

Command-Line Interface

Some PyThaiNLP functionality can be used from the command line with the thainlp command.

For example, to display a catalog of datasets:

thainlp data catalog

To show usage help:

thainlp help
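
The same commands can be driven from a Python script. The sketch below is one possible wrapper, assuming the thainlp executable is on PATH after installation. The helper name run_thainlp and its executable parameter are hypothetical.

```python
import shutil
import subprocess


def run_thainlp(*args, executable="thainlp"):
    """Run the CLI with `args`; return its stdout, or None if it is not found."""
    path = shutil.which(executable)
    if path is None:
        # The command is not on PATH (for example, PyThaiNLP is not installed).
        return None
    completed = subprocess.run(
        [path, *args], capture_output=True, text=True, check=False
    )
    return completed.stdout
```

For instance, run_thainlp("data", "catalog") corresponds to the catalog command shown above.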

Testing and test suites

We test core functionalities on all officially supported Python versions.

Some functionality requiring extra dependencies may be tested less frequently due to potential version conflicts or incompatibilities between packages.

Test cases are categorized into three groups: core, compact, and extra. You can find these tests in the tests/ directory.

For more detailed information on testing, please refer to the tests README: tests/README.md

Licenses

Item | License
PyThaiNLP source code and notebooks | Apache Software License 2.0
Corpora, datasets, and documentation created by PyThaiNLP | Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0)
Language models created by PyThaiNLP | Creative Commons Attribution 4.0 International Public License (CC-by)
Other corpora and models that may be included in PyThaiNLP | See Corpus License

Contribute to PyThaiNLP

  • Please fork and create a pull request :)
  • For style guides and other information, including references to algorithms we use, please refer to our contributing page.

Who uses PyThaiNLP?

You can read INTHEWILD.md.

Citations

If you use PyThaiNLP in your project or publication, please cite the library as follows:

Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai. “Pythainlp: Thai Natural Language Processing in Python”. Zenodo, 2 June 2024. http://doi.org/10.5281/zenodo.3519354.

or by BibTeX entry:

@software{pythainlp,
    title = "{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython",
    author = "Phatthiyaphaibun, Wannaphong  and
      Chaovavanich, Korakot  and
      Polpanumas, Charin  and
      Suriyawongkul, Arthit  and
      Lowphansirikul, Lalita  and
      Chormai, Pattarawat",
    doi = {10.5281/zenodo.3519354},
    license = {Apache-2.0},
    month = jun,
    url = {https://github.com/PyThaiNLP/pythainlp/},
    version = {v5.0.4},
    year = {2024},
}

Our NLP-OSS 2023 paper:

Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit. 2023. PyThaiNLP: Thai Natural Language Processing in Python. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 25–36, Singapore, Singapore. Empirical Methods in Natural Language Processing.

and its BibTeX entry:

@inproceedings{phatthiyaphaibun-etal-2023-pythainlp,
    title = "{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython",
    author = "Phatthiyaphaibun, Wannaphong  and
      Chaovavanich, Korakot  and
      Polpanumas, Charin  and
      Suriyawongkul, Arthit  and
      Lowphansirikul, Lalita  and
      Chormai, Pattarawat  and
      Limkonchotiwat, Peerat  and
      Suntorntip, Thanathip  and
      Udomcharoenchaikit, Can",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.4",
    pages = "25--36",
    abstract = "We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language. We first provide a brief historical context of tools for Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provided as well as datasets and pre-trained language models. We later summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.",
}

Sponsors

Sponsor | Description
VISTEC-depa Thailand Artificial Intelligence Research Institute | Since 2019, our contributors Korakot Chaovavanich and Lalita Lowphansirikul have been supported by VISTEC-depa Thailand Artificial Intelligence Research Institute.
MacStadium | MacStadium supports us with a free Mac Mini M1 for running CI builds.

Made with ❤️ | PyThaiNLP Team 💻 | "We build Thai NLP" 🇹🇭

We have only one official repository at https://github.com/PyThaiNLP/pythainlp and one mirror at https://gitlab.com/pythainlp/pythainlp
Beware of malware if you use code from mirrors other than the official two on GitHub and GitLab.
