mohabmes/Arabycia
Arabycia is an Arabic NLP tool built with NLTK, Pyaramorph, and Sinai-corpus. It supports:
- Tokenization
- Lemmatization
- Segmentation
- Transliteration
- Reverse Transliteration
- Sentence diacritization
- Text Search
- POS tagging
- Translation
- Ambiguity detection

Example:

    text = 'يستعيد الكاتب في هذه الرواية كيف تحولت من مدينة للانوار الي مدينة للاشباح'
    arabycia = Arabycia()
    arabycia.set_raw_text(text)
    arabycia.analyze()

Output:

    Sentence : يستعيد الكاتب في هذه الرواية كيف تحولت من مدينة للانوار الي مدينة للاشباح
    With Diacritics : يَسْتَعِيد الكاتِب فِي هٰذِهِ الرِوايَة كَيْفَ تَحَوَّلْتُ مِن مَدِينَة لِلأَنْوار إِلَى مَدِينَة لِلأَشْباح
    POS : sotaEiyd/VERB_IMPERFECT kAtib/NOUN fiy/PREP h`*ihi/DEM_PRON_F riwAy/NOUN kayofa/REL_PRON taHaw~al/VERB_PERFECT min/PREP madiyn/NOUN >anowAr/NOUN <ilaY/PREP madiyn/NOUN >a$obAH/NOUN

    Word : يَسْتَعِيد yasotaEiyd {isotaEAd_1
    POS : ya/IV3MS+sotaEiyd/VERB_IMPERFECT
    Gloss : recover;regain;reclaim

    Word : هٰذِهِ h`*ihi h`*A_1
    POS : h`*ihi/DEM_PRON_F
    Gloss : this/these

    Word : لِلأَنْوار lil>anowAr nuwr_2
    POS : li/PREP+Al/DET+>anowAr/NOUN
    Gloss : lights

    Word : لِلأَشْباح lil>a$obAH $abaH_1
    POS : li/PREP+Al/DET+>a$obAH/NOUN
    Gloss : specters;shapes

    Word : الكاتِب AlkAtib kAtib_1
    POS : Al/DET+kAtib/NOUN
    Gloss : writer;author

    Word : فِي fiy fiy_1
    POS : fiy/PREP
    Gloss : in

    Word : الرِوايَة AlriwAyap riwAyap_1
    POS : Al/DET+riwAy/NOUN+ap/NSUFF_FEM_SG
    Gloss : story;novel

    Word : كَيْفَ kayofa kayofa_1
    POS : kayofa/REL_PRON
    Gloss : how

    Word : تَحَوَّلْتُ taHaw~alotu taHaw~al_1
    POS : taHaw~al/VERB_PERFECT+tu/PVSUFF_SUBJ:1S
    Gloss : be changed;be transformed

    Word : مِن min min_1
    POS : min/PREP
    Gloss : from

    Word : مَدِينَة madiynap madiynap_1
    POS : madiyn/NOUN+ap/NSUFF_FEM_SG
    Gloss : city

    Word : إِلَى <ilaY <ilaY_1
    POS : <ilaY/PREP
    Gloss : to;towards

    Word : مَدِينَة madiynap madiynap_1
    POS : madiyn/NOUN+ap/NSUFF_FEM_SG
    Gloss : city

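The Latin-letter forms in the POS and Word entries above (for example `madiynap` or `>a$obAH`) follow the Buckwalter transliteration scheme, which maps each Arabic letter and diacritic to one ASCII character. The sketch below illustrates that mapping and its inverse in plain Python. It is not Arabycia's own code: the names `BUCKWALTER`, `to_buckwalter`, and `from_buckwalter` are hypothetical, and the table covers only the common letters and marks.

```python
# Illustrative Buckwalter table (Arabic code point -> ASCII symbol).
# Hypothetical helper names; this is not Arabycia's implementation.
BUCKWALTER = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",
    "\u0625": "<", "\u0626": "}", "\u0627": "A", "\u0628": "b",
    "\u0629": "p", "\u062A": "t", "\u062B": "v", "\u062C": "j",
    "\u062D": "H", "\u062E": "x", "\u062F": "d", "\u0630": "*",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",
    "\u0639": "E", "\u063A": "g", "\u0640": "_", "\u0641": "f",
    "\u0642": "q", "\u0643": "k", "\u0644": "l", "\u0645": "m",
    "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",
    "\u064A": "y",
    # Diacritics: tanween, short vowels, shadda, sukun, dagger alef, alef wasla.
    "\u064B": "F", "\u064C": "N", "\u064D": "K", "\u064E": "a",
    "\u064F": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
    "\u0670": "`", "\u0671": "{",
}
# Inverse table for reverse transliteration (ASCII symbol -> Arabic).
REVERSE = {latin: arabic for arabic, latin in BUCKWALTER.items()}


def to_buckwalter(text):
    """Replace each mapped Arabic character; leave anything else unchanged."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


def from_buckwalter(text):
    """Replace each mapped ASCII symbol with its Arabic character."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


word = "\u0645\u064E\u062F\u0650\u064A\u0646\u064E\u0629"  # "city", with diacritics
print(to_buckwalter(word))
print(from_buckwalter(to_buckwalter(word)) == word)
```

Because the mapping is one character to one character, the round trip is lossless for any text made only of mapped characters. Note that `from_buckwalter` also rewrites ordinary Latin letters, so it should only be applied to strings that are known to be Buckwalter-encoded.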
Text search example:

    text = 'يستجمع المؤرخ أفكاره'
    arabycia = Arabycia()
    arabycia.set_raw_text(text)
    search_result = arabycia.text_search("جمع")
    print(search_result)

Output:

    ['يستجمع']

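In the search example, the query matches a longer word that contains it. How `text_search` does this internally is not documented here, so the following is only a minimal sketch of one simple approach that yields the same result for this input: a diacritic-insensitive substring match over whitespace-separated tokens. The names `strip_diacritics` and `substring_search` are hypothetical and are not part of Arabycia.

```python
import re

# Arabic tanween, short vowels, shadda, sukun, and dagger alef.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(word):
    """Remove diacritic marks so vocalized and bare spellings compare equal."""
    return DIACRITICS.sub("", word)


def substring_search(text, query):
    """Return the whitespace-separated tokens of `text` that contain `query`.

    Hypothetical helper: a plain substring match, not Arabycia's text_search.
    """
    needle = strip_diacritics(query)
    return [tok for tok in text.split() if needle in strip_diacritics(tok)]


print(substring_search('يستجمع المؤرخ أفكاره', 'جمع'))
```

A plain substring match like this cannot relate forms whose shared root letters are not contiguous, which is where morphological analysis becomes necessary.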
- Arabycia uses a modified version of pyaramorph (rewritten for better data manipulation).
- Arabycia uses Sinai-corpus, a tagged Arabic corpus.
- NLTK
- Sinai-corpus