Codebase for Indic-Transliteration using Seq2Seq RNN. For the latest repo with Transformer-based models, check: https://github.com/AI4Bharat/IndicXlit
AI4Bharat/IndicNLP-Transliteration
Project Website | Demo UI | Python Library
The main goal of this project is to create open source input tools for content creation in under-represented languages in India.
It started in collaboration with Story Weaver, a non-profit working towards foundational literacy education for children, supported by Google's AI for Social Good initiative.
Most languages in India lack a digital presence because of an underdeveloped ecosystem. One of the major bottlenecks in content creation and language adoption is the difficulty of inputting text in several native Indian languages. The lack of stable input tools for underserved languages is a huge barrier to creating digital content and NLP datasets in these languages.
Supported Languages
- Bengali - বাংলা
- Gujarati - ગુજરાતી
- Hindi - हिंदी
- Kannada - ಕನ್ನಡ
- Konkani Goan - कोंकणी
- Maithili - मैथिली
- Malayalam - മലയാളം
- Marathi - मराठी
- Panjabi Eastern - ਪੰਜਾਬੀ
- Sindhi - سنڌي
- Sinhala - සිංහල
- Telugu - తెలుగు
- Tamil - தமிழ்
- Urdu - اُردُو
For Attributions and Contributions lists, check here 🖖
This repository is developed to facilitate easier experimentation with different network architectures and reformulated objectives with minimal effort. It is meant to be highly tinkerable rather than an off-the-shelf library.
A condensed standalone version of simple model training, inference and accuracy computation is available as a Jupyter notebook.
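As a rough illustration of what an accuracy computation for transliteration can look like, here is a minimal sketch of top-k exact-match accuracy. It is not taken from the notebook: the function name `top_k_accuracy`, the dictionary layout and the sample words are all assumptions made for this example, and the notebook may compute its metric differently.

```python
from typing import Dict, List


def top_k_accuracy(
    predictions: Dict[str, List[str]],
    references: Dict[str, List[str]],
    k: int = 1,
) -> float:
    """Fraction of source words whose top-k candidates contain an accepted reference.

    Both arguments map a romanized source word to a list of native-script strings.
    This layout is hypothetical and chosen only for the sketch.
    """
    if not references:
        return 0.0
    hits = 0
    for word, accepted in references.items():
        # Ranked candidates for this word; a missing word counts as a miss.
        candidates = predictions.get(word, [])[:k]
        if any(candidate in accepted for candidate in candidates):
            hits += 1
    return hits / len(references)


# Made-up sample data: romanized Hindi words with accepted spellings.
references = {"namaste": ["नमस्ते"], "bharat": ["भारत"]}
# Made-up ranked model outputs, best candidate first.
predictions = {"namaste": ["नमस्ते", "नमसते"], "bharat": ["भरत", "भारत"]}

top1 = top_k_accuracy(predictions, references, k=1)
top2 = top_k_accuracy(predictions, references, k=2)
print(top1, top2)
```

Allowing several accepted references per word is a deliberate choice here, since a romanized input can have more than one valid native spelling.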
A Pythonic transliteration library is available on the Python Package Index and also under GitHub releases.
For usage, follow the apps readme.
Transliteration models for languages are made available as releases, in an easily deployable form.
All the NN models (along with metadata) of Xlit - Transliteration are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Datasets created as part of the project for the languages Maithili, Konkani and Hindi are made available as JSON files under downloads.
Xlit - Transliteration Datasets by Story Weaver & AI4Bharat are licensed under a Creative Commons Attribution 4.0 International License.
Kindly attribute if you use the dataset for your research or products.
If you have benefited from our datasets/models/services or were motivated by our work, we would like to hear from you.
email: opensource@ai4bharat.org