Turkish deasciifier in Python based on Deniz Yüret's turkish-mode for Emacs
emres/turkish-deasciifier
This is a deasciifier Python library and command line utility for Turkish that solves the problem of diacritics restoration (also known as diacritics reconstruction). It takes a Turkish string containing only ASCII characters (that is, without proper diacritics) and replaces the relevant characters with their corresponding Turkish letters.
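To make the problem concrete, here is a minimal sketch of the opposite direction (stripping diacritics), which is the easy part. The names `TURKISH_TO_ASCII` and `asciify` are hypothetical helpers for illustration and are not part of this library. The sketch shows why restoration is hard: distinct Turkish words can collapse to the same ASCII string, so going back requires a decision per letter.

```python
# Hypothetical helper, not part of turkish-deasciifier: maps each Turkish
# letter with a diacritic (or dotless/dotted i) to its ASCII look-alike.
TURKISH_TO_ASCII = str.maketrans({
    "ç": "c", "ğ": "g", "ı": "i", "ö": "o", "ş": "s", "ü": "u",
    "Ç": "C", "Ğ": "G", "İ": "I", "Ö": "O", "Ş": "S", "Ü": "U",
})


def asciify(text: str) -> str:
    """Replace Turkish-specific letters with their ASCII counterparts."""
    return text.translate(TURKISH_TO_ASCII)


# Two different words ("açı" = angle, "acı" = pain) become identical.
print(asciify("açı"))  # aci
print(asciify("acı"))  # aci
```

A deasciifier has to undo this mapping, choosing for every `c`, `g`, `i`, `o`, `s`, and `u` whether the original letter carried a diacritic.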
The web-based, online version of this system is available at:
Keep in mind that diacritics restoration (deasciification) for Turkish doesn't work 100% of the time; it is an active research topic! Still, this library is good enough for many practical purposes, and it has served many people and projects over the last 10 years.
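Since restoration is not perfect, it can help to measure how well it does on text for which you already have the correct spelling. The following is a rough sketch only; `AMBIGUOUS` and `restoration_accuracy` are hypothetical names, not part of this library, and the metric (share of ambiguous letters restored correctly) is just one simple choice among many.

```python
# Letters for which a deasciifier must decide between the ASCII form
# and the Turkish form.
AMBIGUOUS = set("cgiosuCGIOSU" "çğıöşüÇĞİÖŞÜ")


def restoration_accuracy(reference: str, restored: str) -> float:
    """Fraction of ambiguous letters in `reference` that match in `restored`."""
    if len(reference) != len(restored):
        raise ValueError("texts must have the same length")
    # Keep only the positions where the reference holds an ambiguous letter.
    decisions = [(ref, out) for ref, out in zip(reference, restored)
                 if ref in AMBIGUOUS]
    if not decisions:
        return 1.0  # nothing to decide, so nothing can be wrong
    correct = sum(ref == out for ref, out in decisions)
    return correct / len(decisions)


# "açı" restored as "acı": the "ı" is right, the "ç" is missed.
print(restoration_accuracy("açı", "acı"))  # 0.5
```

The length check assumes both strings use precomposed characters, so that each letter occupies exactly one code point.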
This system is based on the turkish-mode for GNU Emacs by Prof. Deniz Yüret.
- Installation
- Example Python Library Usage
- Example CLI (Command Line Interface) Usage
- Other Programming Languages and Systems
- Advanced Research
For now, the recommended way to install is to use pip and install directly from the project's GitHub repository:

    pip install git+https://github.com/emres/turkish-deasciifier.git

Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, you can install using the following command:

    pip install Turkish-Deasciifier

Example Python library usage:

    from turkish.deasciifier import Deasciifier

    my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
    deasciifier = Deasciifier(my_ascii_turkish_txt)
    my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
    print(my_deasciified_turkish_txt)

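For more than one string, the same two calls can be wrapped in a small helper. This is a sketch under assumptions: `library_convert` and `deasciify_lines` are hypothetical names, and the only library calls used are the ones shown in the example above (`Deasciifier(text)` and `convert_to_turkish()`). The conversion function is passed in as a parameter so the helper can be tried out even without the package installed.

```python
from typing import Callable, Iterable, List


def library_convert(text: str) -> str:
    """Deasciify one string with the calls shown in the README example."""
    # Imported lazily so that deasciify_lines can be exercised with a
    # stand-in converter when the package is not installed.
    from turkish.deasciifier import Deasciifier
    return Deasciifier(text).convert_to_turkish()


def deasciify_lines(
    lines: Iterable[str],
    convert: Callable[[str], str] = library_convert,
) -> List[str]:
    """Apply `convert` to every non-blank line; keep blank lines as they are."""
    return [convert(line) if line.strip() else line for line in lines]
```

With the package installed, something like `deasciify_lines(open("somefile.txt", encoding="utf-8").read().splitlines())` would process a file line by line.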
Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, you can use the library in the following manner:

    from turkish.deasciifier import Deasciifier

    my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
    deasciifier = Deasciifier(my_ascii_turkish_txt.decode("utf-8"))
    my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
    print my_deasciified_turkish_txt.encode("utf-8")

Example tested in a Bash shell:

    $ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify
    $ cat somefile.txt | turkish-deasciify

Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, use the turkish-deasciify-python2 command instead.
Example tested in a Bash shell:

    $ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify-python2
    $ cat somefile.txt | turkish-deasciify-python2

Other programming languages and systems:
- Java: https://github.com/ahmetb/turkish-deasciifier-java
- Perl: https://metacpan.org/pod/release/BURAK/Lingua-TR-ASCII-0.13/lib/Lingua/TR/ASCII.pm
- Haskell: http://hackage.haskell.org/package/turkish-deasciifier
- Node.js: https://github.com/f/deasciifier/
- VIM: https://github.com/joom/turkish-deasciifier.vim
- Emacs Lisp: https://github.com/emres/turkish-mode (also available as a package in MELPA)
For recent advanced scientific research articles, please see the following:
- Diacritic Restoration Using Recurrent Neural Network
- Diacritics Restoration Using Neural Networks
- Diacritic restoration of Turkish tweets with word2vec
- Vowel and Diacritic Restoration for Social Media Texts