Welcome to Malaya’s documentation!#
Malaya is a Natural-Language-Toolkit library for bahasa Malaysia, powered by PyTorch.
Documentation#
Proper documentation is available at https://malaya.readthedocs.io/
Installing from PyPI#
$ pip install malaya
This installs all dependencies except PyTorch, so you can choose your own PyTorch CPU or GPU version.
Only Python >= 3.6.0 and PyTorch >= 1.10 are supported.
If you are a Windows user, make sure to read https://malaya.readthedocs.io/en/latest/running-on-windows.html
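Because PyTorch is installed separately, it can help to confirm that the environment meets the minimum versions listed above before importing Malaya. The following is a minimal sketch, not part of Malaya itself: the helper names `parse_version` and `meets_minimum` are hypothetical, and the only assumption about PyTorch is that it exposes its version as `torch.__version__`.

```python
import sys

# Minimum versions taken from the installation notes above.
MIN_PYTHON = (3, 6, 0)
MIN_TORCH = (1, 10)


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into a tuple of ints, e.g. (2, 1, 0).

    Hypothetical helper: drops any local suffix after '+' and stops at the
    first component that does not start with a digit.
    """
    parts = []
    for piece in str(text).split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` (a string) is at least `minimum` (a tuple)."""
    found = parse_version(version)
    width = max(len(found), len(minimum))
    # Pad with zeros so that '3.6' compares equal to (3, 6, 0).
    padded_found = found + (0,) * (width - len(found))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


if __name__ == "__main__":
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    status = "ok" if meets_minimum(python_version, MIN_PYTHON) else "too old"
    print("Python", python_version, status)

    try:
        import torch  # not installed by `pip install malaya`
    except ImportError:
        print("PyTorch is not installed; install a CPU or GPU version first")
    else:
        status = "ok" if meets_minimum(torch.__version__, MIN_TORCH) else "too old"
        print("PyTorch", torch.__version__, status)
```

Running the script prints one line per requirement. Comparing integer tuples rather than raw strings avoids treating `1.9` as newer than `1.10`.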
Development Release#
Install from the master branch:
$ pip install git+https://github.com/huseinzol05/malaya.git
We recommend using virtualenv for development.
Documentation is at https://malaya.readthedocs.io/en/latest/
Pretrained Models#
Malaya has also released Malaysian pretrained models; see https://huggingface.co/mesolitica
References#
If you use our software for research, please cite:
@misc{Malaya, Natural-Language-Toolkit library for bahasa Malaysia, powered by PyTorch,
  author = {Husein, Zolkepli},
  title = {Malaya},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/mesolitica/malaya}}
}
Acknowledgement#
Thanks to:
- KeyReply for private V100s cloud.
- Nvidia for Azure credit.
- Tensorflow Research Cloud for free TPUs access.
Contributing#
Thank you for contributing to this library; it really helps a lot. Feel free to contact me with suggestions or to contribute in other ways. We accept everything, not just code!
Contents:#
Getting Started
- Speech Toolkit
- Installation
- Dataset
- Running on Windows
- Contributing
- API
- malaya
- malaya.augmentation.abstractive
- malaya.augmentation.rules
- malaya.dictionary
- malaya.generator.isi_penting
- malaya.keyword.abstractive
- malaya.keyword.extractive
- malaya.normalizer.rules
- malaya.qa.extractive
- malaya.similarity.doc2vec
- malaya.similarity.semantic
- malaya.spelling_correction.jamspell
- malaya.spelling_correction.probability
- malaya.spelling_correction.spylls
- malaya.spelling_correction.symspell
- malaya.summarization.abstractive
- malaya.summarization.extractive
- malaya.topic_model.decomposition
- malaya.topic_model.transformer
- malaya.zero_shot.classification
- malaya.cluster
- malaya.constituency
- malaya.dependency
- malaya.embedding
- malaya.emotion
- malaya.entity
- malaya.jawi
- malaya.knowledge_graph
- malaya.language_detection
- malaya.language_model
- malaya.llm
- malaya.nsfw
- malaya.num2word
- malaya.paraphrase
- malaya.pos
- malaya.preprocessing
- malaya.segmentation
- malaya.sentiment
- malaya.stack
- malaya.stem
- malaya.syllable
- malaya.tatabahasa
- malaya.tokenizer
- malaya.transformer
- malaya.translation
- malaya.true_case
- malaya.word2num
- malaya.wordvector
- malaya.model.extractive_summarization
- malaya.model.ml
- malaya.model.rules
- malaya.torch_model.gpt2_lm
- malaya.torch_model.huggingface
- malaya.torch_model.mask_lm
GPU Environment
Augmentation Module
Dictionary Module
Tokenization Module
Language Model Module
Spelling Correction Module
Normalization Module
- Preprocessing
- Demoji
- Stemmer and Lemmatization
- True Case
- Segmentation
- Num2Word
- Word2Num
- Rules based Normalizer
- Load normalizer
- Use translator
- Use segmenter
- Use stemmer
- Validate uppercase
- Validate non human word
- Skip spelling correction
- Pass kwargs preprocessing
- Normalize text
- Normalize url
- Normalize email
- Normalize year
- Normalize telephone
- Normalize date
- Normalize time
- Normalize emoji
- Normalize elongated
- Normalize hingga
- Normalize pada hari bulan
- Normalize fraction
- Normalize money
- Normalize units
- Normalize percents
- Normalize IC
- Normalize Numbers
- Normalize x kali
- Normalize Cardinals
- Normalize Ordinals
- Normalize entity
Jawi Module
Kesalahan Tatabahasa Module
Generative Module
Classification Module
Embedding Module
Similarity Module
Parsing Module
Summarization Module
Translation Module
Question Answer Module
Zeroshot Module
Topic Modeling Module
Keyword Module
Knowledge Graph