This project builds part-of-speech (POS) taggers and chunkers for Indian languages.
Languages supported: Telugu (te), Hindi (hi), Tamil (ta), Marathi (mr), Punjabi (pa), Kannada (kn), Malayalam (ml), Urdu (ur), Bengali (bn)
If you reuse this software, please use the following citation:
@inproceedings{PVS:SPSAL2007,
  editor    = {P.V.S., Avinesh and Gali, Karthik},
  title     = {Part of Speech Tagging and Chunking using Conditional Random Fields and Transformation Based Learning},
  booktitle = {Proceedings of the Shallow Parsing for South Asian Languages (SPSAL) Workshop, held at IJCAI-07, Hyderabad, India},
  series    = {{SPSAL} Workshop Proceedings},
  month     = {January},
  year      = {2007},
  pages     = {21--24},
}
Training Data Statistics and System Performances (F1 macro)
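| Language | # Words | # Sentences | CRF POS | CRF Chunk | BI-LSTM-CRF POS | BI-LSTM-CRF Chunk |
|----------|---------|-------------|---------|-----------|-----------------|-------------------|
| te       | 347k    | 30k         | 93%     | 96%       | 92%             | 92%               |
| hi       | 350k    | 16.3k       | 93%     | 97%       | 94%             | 93%               |
| bn       | 298.3k  | 14.6k       | 84%     | 95%       | 85%             | 88%               |
| pa       | 152.5k  | 5.6k        | 92%     | 98%       | 94%             | 96%               |
| mr       | 207.9k  | 8.5k        | 89%     | 95%       | 88%             | 90%               |
| ur       | 158.9k  | 7.6k        | 90%     | 96%       | 92%             | 89%               |
| ta       | 337k    | 14.2k       | 88%     | 92%       | 87%             | 85%               |
| ml       | 192k    | 11.4k       | 96%     | 95%       | 98%             | 98%               |
| kn       | 294.3k  | 16.5k       | 90%     | 98%       | 88%             | 87%               |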
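The scores above are reported as macro F1. This README does not say how that metric was computed, so the following is only a minimal sketch of the usual token-level definition: F1 is computed separately for each label and then averaged without weighting by label frequency. The function name `macro_f1` and the sample tags are illustrative assumptions, not part of this project's code.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred.

    Assumption: scoring is per token. The project's own evaluation may
    differ (for example, chunk-level scoring for the chunker).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it did not belong
            fn[g] += 1  # missed the gold label g

    scores = []
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical five-token example with one tagging error (NN tagged as JJ).
gold = ["NN", "VM", "NN", "PSP", "NN"]
pred = ["NN", "VM", "JJ", "PSP", "NN"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.2f}")
```

Because every label counts equally, a single error on a rare tag lowers macro F1 more than it lowers plain accuracy, which is worth keeping in mind when comparing the columns in the table.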
Training Data Statistics and System Performances (F1 macro) for NER