Tamil All Character Encoding (TACE16) is a scheme for encoding the Tamil script in the Private Use Area of Unicode, implementing a syllabary-based character model differing from the modified-ISCII model used by Unicode's existing Tamil implementation.[1][2]
The keyboard driver for this encoding scheme is available for free on the Tamil Virtual Academy website.[3][4] It uses the Tamil 99 and Tamil Typewriter keyboard layouts, which are approved by the Government of Tamil Nadu, and maps the input keystrokes to their corresponding characters in the TACE16 scheme.[2] To read files created using TACE16, the corresponding Unicode Tamil fonts are also available on the same website.[3][4] These fonts contain glyphs not only for the TACE16 characters but also for the Unicode blocks covering ASCII and Tamil, so that they provide backward compatibility for reading existing files created using the Tamil Unicode block.
All the characters of this encoding scheme are located in the Private Use Area of the Basic Multilingual Plane of Unicode's Universal Coded Character Set.
| Vowels→ | ∅ | A | Ā | I | Ī | U | Ū | E | Ē | Ai | O | Ō | Au | (Miscellaneous) | |||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Consonants ↓ | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
| (Symbols) | U+E10_ | ௳ | ௴ | ௵ | ௶ | ௷ | ௸ | ௹ | ௺ | ○ | ● | ★ | ராஜ | ௐ | |||
| (Numbers) | U+E18_ | ௦ | ௧ | ௨ | ௩ | ௪ | ௫ | ௬ | ௭ | ௮ | ௯ | ௰ | ௱ | ௲ | |||
| (Fractions) | U+E1A_ | 𑿌 | 𑿐 | 𑿑 | 𑿓 | 𑿅 | 𑿉 | 𑿎 | 𑿄 | 𑿈 | 𑿋 | 𑿍 | 𑿏 | 𑿀 | 𑿁 | 𑿂 | 𑿆 |
| ∅ | U+E1F_ | ் | ா | ி | ீ | ு | ூ | ெ | ே | ை | ொ | ோ | ௌ | ||||
| ∅ | U+E20_ | அ | ஆ | இ | ஈ | உ | ஊ | எ | ஏ | ஐ | ஒ | ஓ | ஔ | ஃ | |||
| K | U+E21_ | க் | க | கா | கி | கீ | கு | கூ | கெ | கே | கை | கொ | கோ | கௌ | |||
| Ng | U+E22_ | ங் | ங | ஙா | ஙி | ஙீ | ஙு | ஙூ | ஙெ | ஙே | ஙை | ஙொ | ஙோ | ஙௌ | |||
| C | U+E23_ | ச் | ச | சா | சி | சீ | சு | சூ | செ | சே | சை | சொ | சோ | சௌ | |||
| Ñ | U+E24_ | ஞ் | ஞ | ஞா | ஞி | ஞீ | ஞு | ஞூ | ஞெ | ஞே | ஞை | ஞொ | ஞோ | ஞௌ | |||
| Ṭ | U+E25_ | ட் | ட | டா | டி | டீ | டு | டூ | டெ | டே | டை | டொ | டோ | டௌ | |||
| Ṇ | U+E26_ | ண் | ண | ணா | ணி | ணீ | ணு | ணூ | ணெ | ணே | ணை | ணொ | ணோ | ணௌ | |||
| T | U+E27_ | த் | த | தா | தி | தீ | து | தூ | தெ | தே | தை | தொ | தோ | தௌ | |||
| N | U+E28_ | ந் | ந | நா | நி | நீ | நு | நூ | நெ | நே | நை | நொ | நோ | நௌ | |||
| P | U+E29_ | ப் | ப | பா | பி | பீ | பு | பூ | பெ | பே | பை | பொ | போ | பௌ | |||
| M | U+E2A_ | ம் | ம | மா | மி | மீ | மு | மூ | மெ | மே | மை | மொ | மோ | மௌ | |||
| Y | U+E2B_ | ய் | ய | யா | யி | யீ | யு | யூ | யெ | யே | யை | யொ | யோ | யௌ | |||
| R | U+E2C_ | ர் | ர | ரா | ரி | ரீ | ரு | ரூ | ரெ | ரே | ரை | ரொ | ரோ | ரௌ | |||
| L | U+E2D_ | ல் | ல | லா | லி | லீ | லு | லூ | லெ | லே | லை | லொ | லோ | லௌ | |||
| V | U+E2E_ | வ் | வ | வா | வி | வீ | வு | வூ | வெ | வே | வை | வொ | வோ | வௌ | |||
| Ḻ | U+E2F_ | ழ் | ழ | ழா | ழி | ழீ | ழு | ழூ | ழெ | ழே | ழை | ழொ | ழோ | ழௌ | |||
| Ḷ | U+E30_ | ள் | ள | ளா | ளி | ளீ | ளு | ளூ | ளெ | ளே | ளை | ளொ | ளோ | ளௌ | |||
| Ṟ | U+E31_ | ற் | ற | றா | றி | றீ | று | றூ | றெ | றே | றை | றொ | றோ | றௌ | |||
| Ṉ | U+E32_ | ன் | ன | னா | னி | னீ | னு | னூ | னெ | னே | னை | னொ | னோ | னௌ | |||
| Grantha characters | |||||||||||||||||
| J | U+E33_ | ஜ் | ஜ | ஜா | ஜி | ஜீ | ஜு | ஜூ | ஜெ | ஜே | ஜை | ஜொ | ஜோ | ஜௌ | |||
| Sh | U+E34_ | ஶ் | ஶ | ஶா | ஶி | ஶீ | ஶு | ஶூ | ஶெ | ஶே | ஶை | ஶொ | ஶோ | ஶௌ | |||
| Ṣ | U+E35_ | ஷ் | ஷ | ஷா | ஷி | ஷீ | ஷு | ஷூ | ஷெ | ஷே | ஷை | ஷொ | ஷோ | ஷௌ | |||
| S | U+E36_ | ஸ் | ஸ | ஸா | ஸி | ஸீ | ஸு | ஸூ | ஸெ | ஸே | ஸை | ஸொ | ஸோ | ஸௌ | |||
| H | U+E37_ | ஹ் | ஹ | ஹா | ஹி | ஹீ | ஹு | ஹூ | ஹெ | ஹே | ஹை | ஹொ | ஹோ | ஹௌ | |||
| Kṣ | U+E38_ | க்ஷ் | க்ஷ | க்ஷா | க்ஷி | க்ஷீ | க்ஷு | க்ஷூ | க்ஷெ | க்ஷே | க்ஷை | க்ஷொ | க்ஷோ | க்ஷௌ | ஶ்ரீ | ||
| Legend: | |
|---|---|
| Syllabograms with irregular glyphs, which inherently need to be handled individually by a font.[a] | |
| Newly added. Not present in Unicode version 6.3. | |
| Corresponds to a character in the Tamil Supplement block, added in Unicode version 12 (2019) | |
| Allocated for research (NLP) | |
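The regular layout of the consonant rows means that a syllable's code point can be computed arithmetically rather than looked up. The following is a minimal sketch that assumes the layout exactly as read off the table above (consonant rows starting at U+E21_, one row of 16 positions per consonant, vowel columns _0 to _C); the names `CONSONANT_ROWS`, `VOWEL_COLUMNS` and `tace16_code_point` are hypothetical and not part of any TACE16 specification or library.

```python
# Consonants in the row order of the table above, starting at row U+E21_.
CONSONANT_ROWS = ["க", "ங", "ச", "ஞ", "ட", "ண", "த", "ந", "ப",
                  "ம", "ய", "ர", "ல", "வ", "ழ", "ள", "ற", "ன"]

# Vowel columns _0 to _C; "∅" is the pure consonant (no vowel).
VOWEL_COLUMNS = ["∅", "a", "ā", "i", "ī", "u", "ū",
                 "e", "ē", "ai", "o", "ō", "au"]

FIRST_CONSONANT_ROW = 0xE210  # code point of க் in the table above


def tace16_code_point(consonant: str, vowel: str) -> int:
    """Return the private-use code point for a consonant + vowel syllable."""
    row = CONSONANT_ROWS.index(consonant)  # 0 for க, 17 for ன
    col = VOWEL_COLUMNS.index(vowel)       # 0 for ∅, 12 for au
    return FIRST_CONSONANT_ROW + 0x10 * row + col


print(hex(tace16_code_point("ம", "i")))  # 0xe2a3, the cell for மி
```

The Grantha rows (U+E33_ onwards) follow the same pattern in the table and could be handled by extending `CONSONANT_ROWS`.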
The existing Unicode character model for Tamil is, like most of Indic Unicode,[b] an abugida-based model derived from ISCII. It has been criticized for several reasons.[1]
Unicode represents only 31 Tamil base characters as single code points, out of 247 grapheme clusters. These include stand-alone vowels, and 23 basic consonant glyphs (which, due to not bearing a virama, nonetheless denote a syllable with both a consonant and a vowel when used on their own). The others are represented as sequences of code points, requiring software support for advanced typography features (such as Apple Advanced Typography, Graphite, or OpenType advanced typography) to render correctly. This also requires the use of invisible zero-width joiner and zero-width non-joiner characters in places where the desired grapheme cluster would otherwise be ambiguous. This complexity can result in security vulnerabilities and ambiguous combinations, can require the use of an exception table to forbid invalid combinations of code points, and can necessitate the use of string normalization to compare two strings for equality.
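The point about string normalization can be illustrated with Python's standard `unicodedata` module. This is only a sketch: it shows one syllable, கொ, written once with the single vowel sign U+0BCA and once with the two-part sequence U+0BC6 U+0BBE. The two strings compare unequal code point by code point but become equal after NFC normalization.

```python
import unicodedata

# The syllable "ko" written two ways in Unicode Tamil.
ko_single_sign = "\u0B95\u0BCA"        # KA + VOWEL SIGN O
ko_two_part = "\u0B95\u0BC6\u0BBE"     # KA + VOWEL SIGN E + VOWEL SIGN AA

# A plain comparison sees two different code point sequences.
equal_raw = ko_single_sign == ko_two_part

# After normalizing both to NFC, the sequences are identical.
equal_nfc = (unicodedata.normalize("NFC", ko_single_sign)
             == unicodedata.normalize("NFC", ko_two_part))

print(equal_raw, equal_nfc)  # False True
```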
Additionally, since syllables with both a consonant and a vowel make up 64 to 70% of Tamil text, an abugida-based model which encodes the consonant and vowel parts as separate code points is inefficient, in terms of how long a string needs to be to contain a given piece of text, in comparison with a syllabary-based model.
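The length argument can be made concrete with a small count. The sketch below compares the number of code points in a Unicode Tamil string with the number of units a one-code-point-per-syllable model would need. As a simplifying assumption, it treats every code point whose general category is a mark (vowel signs and the virama) as belonging to the preceding letter; this holds for the sample word but not for conjuncts such as க்ஷ. The function name `count_syllabary_units` is hypothetical.

```python
import unicodedata


def count_syllabary_units(text: str) -> int:
    """Count code points that are not combining marks (approximate syllables)."""
    return sum(1 for ch in text
               if not unicodedata.category(ch).startswith("M"))


word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # தமிழ் = TA, MA, sign I, LLLA, virama

unicode_length = len(word)                      # code points in Unicode Tamil
syllabary_length = count_syllabary_units(word)  # units in a syllabary model

print(unicode_length, syllabary_length)  # 5 3
```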
Furthermore, ISCII is primarily an encoding of Devanagari, and the ISCII encodings of other Brahmic scripts (including Tamil) encode characters over the code points of the corresponding characters in Devanagari ISCII. Although Unicode encodes the Brahmic scripts separately from one another, the Tamil block mirrors the ISCII layout (with Devanagari-style character ordering, and reserved space in positions corresponding to Devanagari characters with no Tamil equivalent); consequently, the characters are not in the natural sequence order, and strings collated by code point (analogous to "ASCIIbetical" sorting of English text) will not produce the expected sorting order. Arranging them in the natural order requires a complex collation algorithm.
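The ordering problem can be seen by sorting a few consonants by code point and comparing the result with the traditional order. In this sketch the traditional order is taken from the row order of the TACE16 table above, and the variable names are hypothetical. A real collation implementation would handle whole words and vowel signs, which this example does not attempt.

```python
# Traditional consonant order, as given by the rows of the table above.
TRADITIONAL_ORDER = "கஙசஞடணதநபமயரலவழளறன"

letters = ["ன", "ப", "ற", "ல", "ழ", "வ"]

# Default string sorting compares Unicode code points.
by_code_point = sorted(letters)

# Sorting by position in the traditional sequence.
by_tradition = sorted(letters, key=TRADITIONAL_ORDER.index)

print(by_code_point)  # ['ன', 'ப', 'ற', 'ல', 'ழ', 'வ']
print(by_tradition)   # ['ப', 'ல', 'வ', 'ழ', 'ற', 'ன']
```

In the Unicode block, ன (U+0BA9) precedes ப (U+0BAA), although it is the last consonant in the traditional sequence.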
The following data provides a comparison of current Unicode Tamil vs. TACE16 on e-governance and browsing:[1][better source needed]
TACE16 provides performance improvements in processing time and processing space. It encompasses all of the general Tamil text; it is sequential; and it is unambiguous, with any code point corresponding to only one character.[1][better source needed] The TACE16 system takes fewer instruction cycles than Unicode Tamil, and also allows programming based on Tamil grammar[clarification needed], which needs extra framework development in Unicode Tamil.
The Unicode Consortium publishes a dedicated FAQ page on the Tamil script which responds to some of the criticisms. In defence of the ISCII model, the Consortium notes that expert linguists, typographers and programmers were involved in its development, but acknowledges that compromises were made due to ISCII being constrained to single-byte extended ASCII. The Consortium points out that Unicode Tamil is now implemented by all major operating systems and web browsers, and maintains that it should be used in open interchange contexts, such as online, since tools such as search engines would not necessarily be able to identify or interpret a sequence of Unicode private-use code points as Tamil text. However, the Consortium does not object to the use of Private-Use Area schemes, including TACE16, internally to particular processes for which they are useful. In particular, it highlights that both markup schemes and alternative encoding schemes may be used by researchers for specialised purposes such as natural-language processing.[6]
Unicode defines normative named sequences for all Tamil pure consonants and syllables which are represented with sequences of more than one code point, and a dedicated table is published as part of the Unicode Standard listing all of these sequences, in their traditional order, along with their correct glyphs. The Consortium points out that it has been open to accepting proposals for characters for which no existing Unicode representation exists: for example, adding several historical fractions and other symbols as the Tamil Supplement block in version 12.0 in 2019.[6]
Regarding collation, the Consortium argues that obtaining the correct result from sorting by code point is the exception rather than the rule, highlighting that, in unmodified ASCIIbetical ordering, the uppercase Latin letter Z sorts before the lowercase letter a, and also highlighting that collation rules often differ by language (see e.g. ö). Regarding space efficiency, the Consortium argues that storage space and bandwidth taken up by text is usually far overshadowed by other accompanying media such as images and video, and that text content performs well under general-purpose compression methods such as Deflate (originally from the ZIP file format, standardized in RFC 1951 and integrated in the HTTP protocol as a generic encoding scheme).[6]
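Both arguments can be tried out with the Python standard library. The sketch below first sorts two English words by code point, then compresses a Tamil string with `zlib`, which wraps Deflate data in the zlib container format. The sample text is an artificial, highly repetitive string chosen only for illustration, so the sizes it yields say nothing about the compression ratio of real Tamil documents.

```python
import zlib

# Code point ("ASCIIbetical") order puts uppercase Z before lowercase a.
ascii_sorted = sorted(["apple", "Zebra"])
print(ascii_sorted)  # ['Zebra', 'apple']

# Artificial sample: a short Tamil phrase repeated many times.
sample = "தமிழ் மொழி " * 50
raw = sample.encode("utf-8")

# zlib.compress applies Deflate and adds a small zlib header and checksum.
packed = zlib.compress(raw)

print(len(raw), len(packed))  # the compressed size is smaller for this sample
```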
When first published (version 1.0.0), Unicode made only limited stability guarantees. As such, the original Tibetan block was deleted in version 1.0.1 (and its space has since been occupied by the Myanmar block), and the original block for Korean syllables was deleted in version 2.0 (and is now occupied by CJK Unified Ideographs Extension A). Both the current Hangul Syllables block for Korean syllables, and the current Tibetan block, date back to Unicode 2.0. This was done on the assumption that little or no existing content using Unicode for those writing systems existed,[7] since it would break compatibility with all existing Unicode content in, and input methods for, those writing systems. After this so-dubbed "Korean mess", the responsible committees pledged not to make such a compatibility-breaking change ever again,[7] which now forms part of the Unicode Stability Policy.[8]
This stability policy has been upheld ever since, in spite of demands to re-encode or change the character model for both Tibetan and Korean a second time, made by China and North Korea respectively.[9][10][11][12] Likewise in relation to Tamil, the Consortium emphasises the "crucial issue of maintaining the stability of the standard for existing implementations", and argues that "the resulting costs and impact of destabilizing the standard" would substantially outweigh any efficiency benefits in processing speed or storage space.[6]
A proposal to re-encode Tamil[13] was rejected by the Unicode Consortium, which said that the re-encoding would be damaging and that there was no convincing evidence that the Unicode Tamil encoding is deficient.[14]
The Open-Tamil project[15] provides many of the common operations. It claims Level-1 compliance of Tamil text processing without using TACE16, but is written on top of extra programming logic which is needed for Unicode Tamil.