Character Mapping
Synopsis
This section holds functions and structures that are related to mapping character input codes to glyph indices.
Note that for many scripts the simplistic approach used by FreeType of mapping a single character to a single glyph is not valid or possible! In general, a higher-level library like HarfBuzz or ICU should be used for handling text strings.
FT_CharMap
Defined in FT_FREETYPE_H (freetype/freetype.h).
typedef struct FT_CharMapRec_*  FT_CharMap;
A handle to a character map (usually abbreviated to ‘charmap’). A charmap is used to translate character codes in a given encoding into glyph indexes for its parent's face. Some font formats may provide several charmaps per font.
Each face object owns zero or more charmaps, but only one of them can be ‘active’, providing the data used by FT_Get_Char_Index or FT_Load_Char.
The list of available charmaps in a face is available through the face->num_charmaps and face->charmaps fields of FT_FaceRec.
The currently active charmap is available as face->charmap. You should call FT_Set_Charmap to change it.
note
When a new face is created (either through FT_New_Face or FT_Open_Face), the library looks for a Unicode charmap within the list and automatically activates it. If there is no Unicode charmap, FreeType doesn't set an ‘active’ charmap.
also
See FT_CharMapRec for the publicly accessible fields of a given character map.
FT_CharMapRec
Defined in FT_FREETYPE_H (freetype/freetype.h).
typedef struct  FT_CharMapRec_
{
  FT_Face      face;
  FT_Encoding  encoding;
  FT_UShort    platform_id;
  FT_UShort    encoding_id;

} FT_CharMapRec;
The base charmap structure.
fields
| face | A handle to the parent face object. |
| encoding | An FT_Encoding tag identifying the charmap. Use this with FT_Select_Charmap. |
| platform_id | An ID number describing the platform for the following encoding ID. This comes directly from the TrueType specification and gets emulated for other formats. |
| encoding_id | A platform-specific encoding number. This also comes from the TrueType specification and gets emulated similarly. |
FT_Encoding
Defined in FT_FREETYPE_H (freetype/freetype.h).
typedef enum  FT_Encoding_
{
  FT_ENC_TAG( FT_ENCODING_NONE, 0, 0, 0, 0 ),

  FT_ENC_TAG( FT_ENCODING_MS_SYMBOL, 's', 'y', 'm', 'b' ),
  FT_ENC_TAG( FT_ENCODING_UNICODE,   'u', 'n', 'i', 'c' ),

  FT_ENC_TAG( FT_ENCODING_SJIS,    's', 'j', 'i', 's' ),
  FT_ENC_TAG( FT_ENCODING_PRC,     'g', 'b', ' ', ' ' ),
  FT_ENC_TAG( FT_ENCODING_BIG5,    'b', 'i', 'g', '5' ),
  FT_ENC_TAG( FT_ENCODING_WANSUNG, 'w', 'a', 'n', 's' ),
  FT_ENC_TAG( FT_ENCODING_JOHAB,   'j', 'o', 'h', 'a' ),

  /* for backward compatibility */
  FT_ENCODING_GB2312     = FT_ENCODING_PRC,
  FT_ENCODING_MS_SJIS    = FT_ENCODING_SJIS,
  FT_ENCODING_MS_GB2312  = FT_ENCODING_PRC,
  FT_ENCODING_MS_BIG5    = FT_ENCODING_BIG5,
  FT_ENCODING_MS_WANSUNG = FT_ENCODING_WANSUNG,
  FT_ENCODING_MS_JOHAB   = FT_ENCODING_JOHAB,

  FT_ENC_TAG( FT_ENCODING_ADOBE_STANDARD, 'A', 'D', 'O', 'B' ),
  FT_ENC_TAG( FT_ENCODING_ADOBE_EXPERT,   'A', 'D', 'B', 'E' ),
  FT_ENC_TAG( FT_ENCODING_ADOBE_CUSTOM,   'A', 'D', 'B', 'C' ),
  FT_ENC_TAG( FT_ENCODING_ADOBE_LATIN_1,  'l', 'a', 't', '1' ),

  FT_ENC_TAG( FT_ENCODING_OLD_LATIN_2, 'l', 'a', 't', '2' ),

  FT_ENC_TAG( FT_ENCODING_APPLE_ROMAN, 'a', 'r', 'm', 'n' )

} FT_Encoding;

/* these constants are deprecated; use the corresponding `FT_Encoding` */
/* values instead                                                      */
#define ft_encoding_none            FT_ENCODING_NONE
#define ft_encoding_unicode         FT_ENCODING_UNICODE
#define ft_encoding_symbol          FT_ENCODING_MS_SYMBOL
#define ft_encoding_latin_1         FT_ENCODING_ADOBE_LATIN_1
#define ft_encoding_latin_2         FT_ENCODING_OLD_LATIN_2
#define ft_encoding_sjis            FT_ENCODING_SJIS
#define ft_encoding_gb2312          FT_ENCODING_PRC
#define ft_encoding_big5            FT_ENCODING_BIG5
#define ft_encoding_wansung         FT_ENCODING_WANSUNG
#define ft_encoding_johab           FT_ENCODING_JOHAB

#define ft_encoding_adobe_standard  FT_ENCODING_ADOBE_STANDARD
#define ft_encoding_adobe_expert    FT_ENCODING_ADOBE_EXPERT
#define ft_encoding_adobe_custom    FT_ENCODING_ADOBE_CUSTOM
#define ft_encoding_apple_roman     FT_ENCODING_APPLE_ROMAN
An enumeration to specify character sets supported by charmaps. Used in the FT_Select_Charmap API function.
note
Despite the name, this enumeration lists specific character repertoires (i.e., charsets), and not text encoding methods (e.g., UTF-8, UTF-16, etc.).
Other encodings might be defined in the future.
values
| FT_ENCODING_NONE | The encoding value 0 is reserved for all formats except BDF, PCF, and Windows FNT; see below for more information. |
| FT_ENCODING_UNICODE | The Unicode character set. This value covers all versions of the Unicode repertoire, including ASCII and Latin-1. Most fonts include a Unicode charmap, but not all of them. For example, if you want to access Unicode value U+1F028 (and the font contains it), use value 0x1F028 as the input value for FT_Get_Char_Index. |
| FT_ENCODING_MS_SYMBOL | Microsoft Symbol encoding, used to encode mathematical symbols and wingdings. For more information, see ‘https://learn.microsoft.com/typography/opentype/spec/recom#non-standard-symbol-fonts’, ‘http://www.kostis.net/charsets/symbol.htm’, and ‘http://www.kostis.net/charsets/wingding.htm’. This encoding uses character codes from the PUA (Private Unicode Area) in the range U+F020-U+F0FF. |
| FT_ENCODING_SJIS | Shift JIS encoding for Japanese. More info at ‘https://en.wikipedia.org/wiki/Shift_JIS’. See note on multi-byte encodings below. |
| FT_ENCODING_PRC | Corresponds to encoding systems mainly for Simplified Chinese as used in People's Republic of China (PRC). The encoding layout is based on GB 2312 and its supersets GBK and GB 18030. |
| FT_ENCODING_BIG5 | Corresponds to an encoding system for Traditional Chinese as used in Taiwan and Hong Kong. |
| FT_ENCODING_WANSUNG | Corresponds to the Korean encoding system known as Extended Wansung (MS Windows code page 949). For more information see ‘https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit949.txt’. |
| FT_ENCODING_JOHAB | The Korean standard character set (KS C 5601-1992), which corresponds to MS Windows code page 1361. This character set includes all possible Hangul character combinations. |
| FT_ENCODING_ADOBE_LATIN_1 | Corresponds to a Latin-1 encoding as defined in a Type 1 PostScript font. It is limited to 256 character codes. |
| FT_ENCODING_ADOBE_STANDARD | Adobe Standard encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. |
| FT_ENCODING_ADOBE_EXPERT | Adobe Expert encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. |
| FT_ENCODING_ADOBE_CUSTOM | Corresponds to a custom encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. |
| FT_ENCODING_APPLE_ROMAN | Apple roman encoding. Many TrueType and OpenType fonts contain a charmap for this 8-bit encoding, since older versions of Mac OS are able to use it. |
| FT_ENCODING_OLD_LATIN_2 | This value is deprecated and was neither used nor reported by FreeType. Don't use or test for it. |
| FT_ENCODING_MS_SJIS | Same as FT_ENCODING_SJIS. Deprecated. |
| FT_ENCODING_MS_GB2312 | Same as FT_ENCODING_PRC. Deprecated. |
| FT_ENCODING_MS_BIG5 | Same as FT_ENCODING_BIG5. Deprecated. |
| FT_ENCODING_MS_WANSUNG | Same as FT_ENCODING_WANSUNG. Deprecated. |
| FT_ENCODING_MS_JOHAB | Same as FT_ENCODING_JOHAB. Deprecated. |
note
When loading a font, FreeType makes a Unicode charmap active if possible (either if the font provides such a charmap, or if FreeType can synthesize one from PostScript glyph name dictionaries; in either case, the charmap is tagged with FT_ENCODING_UNICODE). If such a charmap is synthesized, it is placed at the first position of the charmap array.
All other encodings are considered legacy and tagged only if explicitly defined in the font file. Otherwise, FT_ENCODING_NONE is used.
FT_ENCODING_NONE is set by the BDF and PCF drivers if the charmap is neither Unicode nor ISO-8859-1 (otherwise it is set to FT_ENCODING_UNICODE). Use FT_Get_BDF_Charset_ID to find out which encoding is really present. If, for example, the cs_registry field is ‘KOI8’ and the cs_encoding field is ‘R’, the font is encoded in KOI8-R.
FT_ENCODING_NONE is always set (with a single exception) by the winfonts driver. Use FT_Get_WinFNT_Header and examine the charset field of the FT_WinFNT_HeaderRec structure to find out which encoding is really present. For example, FT_WinFNT_ID_CP1251 (204) means Windows code page 1251 (for Russian).
FT_ENCODING_NONE is set if platform_id is TT_PLATFORM_MACINTOSH and encoding_id is not TT_MAC_ID_ROMAN (otherwise it is set to FT_ENCODING_APPLE_ROMAN).
If platform_id is TT_PLATFORM_MACINTOSH, use the function FT_Get_CMap_Language_ID to query the Mac language ID that may be needed to be able to distinguish Apple encoding variants. See
https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/Readme.txt
to get an idea how to do that. Basically, if the language ID is 0, don't use it, otherwise subtract 1 from the language ID. Then examine encoding_id. If, for example, encoding_id is TT_MAC_ID_ROMAN and the language ID (minus 1) is TT_MAC_LANGID_GREEK, it is the Greek encoding, not Roman. TT_MAC_ID_ARABIC with TT_MAC_LANGID_FARSI means the Farsi variant of the Arabic encoding.
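The "ignore 0, otherwise subtract 1" rule for Mac language IDs can be written down as a tiny helper. This is only a sketch: the function name mac_language_from_cmap_id and the use of -1 as a "no language" marker are assumptions made for illustration, and the input is expected to be the value returned by FT_Get_CMap_Language_ID.

```c
/* Hypothetical helper implementing the rule described above.      */
/*                                                                 */
/* `cmap_language_id` is the value reported for a Macintosh        */
/* charmap.  A value of 0 means that no language is specified, so  */
/* -1 is returned as a marker (an assumption of this sketch).      */
/* Otherwise, the value minus 1 is returned; this is the number to */
/* compare against the TT_MAC_LANGID_XXX constants.                */
static long
mac_language_from_cmap_id( unsigned long  cmap_language_id )
{
  if ( cmap_language_id == 0 )
    return -1;

  return (long)cmap_language_id - 1;
}
```

The result would then be examined together with encoding_id, for example to tell the Greek variant of a TT_MAC_ID_ROMAN charmap apart from plain Roman.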
FT_ENC_TAG
Defined in FT_FREETYPE_H (freetype/freetype.h).
#ifndef FT_ENC_TAG

#define FT_ENC_TAG( value, a, b, c, d )                             \
          value = ( ( FT_STATIC_BYTE_CAST( FT_UInt32, a ) << 24 ) | \
                    ( FT_STATIC_BYTE_CAST( FT_UInt32, b ) << 16 ) | \
                    ( FT_STATIC_BYTE_CAST( FT_UInt32, c ) <<  8 ) | \
                      FT_STATIC_BYTE_CAST( FT_UInt32, d )         )

#endif /* FT_ENC_TAG */
This macro converts four-letter tags into an unsigned long. It is used to define ‘encoding’ identifiers (see FT_Encoding).
note
Since many 16-bit compilers don't like 32-bit enumerations, you should redefine this macro in case of problems to something like this:
#define FT_ENC_TAG( value, a, b, c, d )  value
to get a simple enumeration without assigning special numbers.
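To make the byte layout of an encoding tag concrete, the following sketch packs and unpacks four-letter tags using plain C, without any FreeType headers. The names pack_tag and unpack_tag are hypothetical. They mirror the shifts used by the default FT_ENC_TAG definition and do not apply if the macro has been redefined to a simple enumeration.

```c
#include <stdint.h>

/* Pack four bytes into a 32-bit value; `a` ends up in the most */
/* significant byte, `d` in the least significant one.          */
static uint32_t
pack_tag( unsigned char  a,
          unsigned char  b,
          unsigned char  c,
          unsigned char  d )
{
  return ( (uint32_t)a << 24 ) |
         ( (uint32_t)b << 16 ) |
         ( (uint32_t)c <<  8 ) |
           (uint32_t)d;
}

/* Reverse operation: write the four tag bytes plus a terminating */
/* NUL byte into `out`, which must hold at least five characters. */
static void
unpack_tag( uint32_t  tag,
            char      out[5] )
{
  out[0] = (char)( ( tag >> 24 ) & 0xFF );
  out[1] = (char)( ( tag >> 16 ) & 0xFF );
  out[2] = (char)( ( tag >>  8 ) & 0xFF );
  out[3] = (char)(   tag         & 0xFF );
  out[4] = '\0';
}
```

With this layout, the tag 'u', 'n', 'i', 'c' packs to 0x756E6963. Unpacking a charmap's encoding value in the same way can be handy when printing diagnostics.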
FT_Select_Charmap
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_Error )
FT_Select_Charmap( FT_Face      face,
                   FT_Encoding  encoding );
Select a given charmap by its encoding tag (as listed in freetype.h).
inout
| face | A handle to the source face object. |
input
| encoding | A handle to the selected encoding. |
return
FreeType error code. 0 means success.
note
This function returns an error if no charmap in the face corresponds to the encoding queried here.
Because many fonts contain more than a single cmap for Unicode encoding, this function has some special code to select the one that covers Unicode best (‘best’ in the sense that a UCS-4 cmap is preferred to a UCS-2 cmap). It is thus preferable to FT_Set_Charmap in this case.
FT_Set_Charmap
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_Error )
FT_Set_Charmap( FT_Face     face,
                FT_CharMap  charmap );
Select a given charmap for character code to glyph index mapping.
inout
| face | A handle to the source face object. |
input
| charmap | A handle to the selected charmap. |
return
FreeType error code. 0 means success.
note
This function returns an error if the charmap is not part of the face (i.e., if it is not listed in the face->charmaps table).
It also fails if an OpenType type 14 charmap is selected (which doesn't map character codes to glyph indices at all).
FT_Get_Charmap_Index
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_Int )
FT_Get_Charmap_Index( FT_CharMap  charmap );
Retrieve index of a given charmap.
input
| charmap | A handle to a charmap. |
return
The index into the array of character maps within the face to which charmap belongs. If an error occurs, -1 is returned.
FT_Get_Char_Index
Defined in FT_FREETYPE_H (freetype/freetype.h).
Return the glyph index of a given character code. This function uses the currently selected charmap to do the mapping.
input
| face | A handle to the source face object. |
| charcode | The character code. |
return
The glyph index. 0 means ‘undefined character code’.
note
If you use FreeType to manipulate the contents of font files directly, be aware that the glyph index returned by this function doesn't always correspond to the internal indices used within the file. This is done to ensure that value 0 always corresponds to the ‘missing glyph’. If the first glyph is not named ‘.notdef’, then for Type 1 and Type 42 fonts, ‘.notdef’ will be moved into the glyph ID 0 position, and whatever was there will be moved to the position ‘.notdef’ had. For Type 1 fonts, if there is no ‘.notdef’ glyph at all, then one will be created at index 0 and whatever was there will be moved to the last index – Type 42 fonts are considered invalid under this condition.
FT_Get_First_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
Return the first character code in the current charmap of a given face, together with its corresponding glyph index.
input
| face | A handle to the source face object. |
output
| agindex | Glyph index of first character code. 0 if charmap is empty. |
return
The charmap's first character code.
note
You should use this function together with FT_Get_Next_Char to parse all character codes available in a given charmap. The code should look like this:
FT_ULong  charcode;
FT_UInt   gindex;

charcode = FT_Get_First_Char( face, &gindex );
while ( gindex != 0 )
{
  ... do something with (charcode,gindex) pair ...

  charcode = FT_Get_Next_Char( face, charcode, &gindex );
}
Be aware that character codes can have values up to 0xFFFFFFFF; this might happen for non-Unicode or malformed cmaps. However, even with regular Unicode encoding, so-called ‘last resort fonts’ (using SFNT cmap format 13, see function FT_Get_CMap_Format) normally have entries for all Unicode characters up to 0x1FFFFF, which can cause a lot of iterations.
Note that *agindex is set to 0 if the charmap is empty. The result itself can be 0 in two cases: if the charmap is empty or if the value 0 is the first valid character code.
FT_Get_Next_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
Return the next character code in the current charmap of a given face following the value char_code, as well as the corresponding glyph index.
input
| face | A handle to the source face object. |
| char_code | The starting character code. |
output
| agindex | Glyph index of next character code. 0 if charmap is empty. |
return
The charmap's next character code.
note
You should use this function with FT_Get_First_Char to walk over all character codes available in a given charmap. See the note for that function for a simple code example.
Note that *agindex is set to 0 when there are no more codes in the charmap.
FT_Load_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
Load a glyph into the glyph slot of a face object, accessed by its character code.
inout
| face | A handle to a target face object where the glyph is loaded. |
input
| char_code | The glyph's character code, according to the current charmap used in the face. |
| load_flags | A flag indicating what to load for this glyph. The FT_LOAD_XXX constants can be used to control the glyph loading process. |
return
FreeType error code. 0 means success.
note
This function simply calls FT_Get_Char_Index and FT_Load_Glyph.
Many fonts contain glyphs that can't be loaded by this function since their glyph indices are not listed in any of the font's charmaps.
If no active cmap is set up (i.e., face->charmap is zero), the call to FT_Get_Char_Index is omitted, and the function behaves identically to FT_Load_Glyph.