MySQL 9.3 Reference Manual


12.13 Adding a Character Set

This section discusses the procedure for adding a character set to MySQL. The proper procedure depends on whether the character set is simple or complex:

  • If the character set does not need special string collating routines for sorting and does not need multibyte character support, it is simple.

  • If the character set needs either of those features, it is complex.

For example, greek and swe7 are simple character sets, whereas big5 and czech are complex character sets.

To use the following instructions, you must have a MySQL source distribution. In the instructions, MYSET represents the name of the character set that you want to add.

  1. Add a <charset> element for MYSET to the sql/share/charsets/Index.xml file. Use the existing contents in the file as a guide to adding new contents. A partial listing for the latin1 <charset> element follows:

    <charset name="latin1">
      <family>Western</family>
      <description>cp1252 West European</description>
      ...
      <collation name="latin1_swedish_ci" order="Finnish, Swedish">
        <flag>primary</flag>
        <flag>compiled</flag>
      </collation>
      <collation name="latin1_danish_ci" order="Danish"/>
      ...
      <collation name="latin1_bin" order="Binary">
        <flag>binary</flag>
        <flag>compiled</flag>
      </collation>
      ...
    </charset>

    The <charset> element must list all the collations for the character set. These must include at least a binary collation and a default (primary) collation. The default collation is often named using a suffix of general_ci (general, case-insensitive). It is possible for the binary collation to be the default collation, but usually they are different. The default collation should have a primary flag. The binary collation should have a binary flag.

    You must assign a unique ID number to each collation. The range of IDs from 1024 to 2047 is reserved for user-defined collations. To find the maximum of the currently used collation IDs, use this query:

    SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;

  2. This step depends on whether you are adding a simple or complex character set. A simple character set requires only a configuration file, whereas a complex character set requires a C source file that defines collation functions, multibyte functions, or both.

    For a simple character set, create a configuration file, MYSET.xml, that describes the character set properties. Create this file in the sql/share/charsets directory. You can use a copy of latin1.xml as the basis for this file. The syntax for the file is very simple:

    • Comments are written as ordinary XML comments (<!-- text -->).

    • Words within <map> array elements are separated by arbitrary amounts of whitespace.

    • Each word within <map> array elements must be a number in hexadecimal format.

    • The <map> array element for the <ctype> element has 257 words. The other <map> array elements after that have 256 words. See Section 12.13.1, “Character Definition Arrays”.

    • For each collation listed in the <charset> element for the character set in Index.xml, MYSET.xml must contain a <collation> element that defines the character ordering.

    For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set:

  3. Modify the configuration information. Use the existing configuration information as a guide to adding information for MYSET. The example here assumes that the character set has default and binary collations, but more lines are needed if MYSET has additional collations.

    1. Edit mysys/charset-def.c and register the collations for the new character set.

      Add these lines to the declaration section:

      #ifdef HAVE_CHARSET_MYSET
      extern CHARSET_INFO my_charset_MYSET_general_ci;
      extern CHARSET_INFO my_charset_MYSET_bin;
      #endif

      Add these lines to the registration section:

      #ifdef HAVE_CHARSET_MYSET
        add_compiled_collation(&my_charset_MYSET_general_ci);
        add_compiled_collation(&my_charset_MYSET_bin);
      #endif

    2. If the character set uses ctype-MYSET.c, edit strings/CMakeLists.txt and add ctype-MYSET.c to the definition of the STRINGS_SOURCES variable.

    3. Edit cmake/character_sets.cmake:

      1. Add MYSET to the value of CHARSETS_AVAILABLE in alphabetic order.

      2. Add MYSET to the value of CHARSETS_COMPLEX in alphabetic order. This is needed even for simple character sets, so that CMake can recognize -DDEFAULT_CHARSET=MYSET.

  4. Reconfigure, recompile, and test.