Unicode® Technical Standard #51

Unicode Emoji

Version	15.1
Editors	Mark Davis (Google Inc.), Ned Holbrook (Apple Inc.)
Date	2023-09-05
This Version	https://www.unicode.org/reports/tr51/tr51-25.html
Previous Version	https://www.unicode.org/reports/tr51/tr51-23.html
Latest Version	https://www.unicode.org/reports/tr51/
Latest Proposed Update	https://www.unicode.org/reports/tr51/proposed.html
Revision	25

Summary

This document defines the structure of Unicode emoji characters and sequences, and provides data to support that structure, such as which characters are considered to be emoji, which emoji should be displayed by default with a text style versus an emoji style, and which can be displayed with a variety of skin tones. It also provides design guidelines for improving the interoperability of emoji characters across platforms and implementations.

Starting with Version 11.0 of this specification, the repertoire of emoji characters is synchronized with the Unicode Standard, and has the same version numbering system. For details, see Section 1.5.2, Versioning.

Status

This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used as reference material or cited as a normative reference by other specifications.

A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.

Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this document is found in the References. For the latest version of the Unicode Standard, see [Unicode]. For a list of current Unicode Technical Reports, see [Reports]. For more information about versions of the Unicode Standard, see [Versions].

1 Introduction
- Table: Emoji Proposals
- Table: Major Sources
- 1.1 Emoticons and Emoji
- 1.2 Encoding Considerations
- 1.3 Goals
- 1.4 Definitions
  - 1.4.1 Emoji Characters
  - 1.4.2 Emoji Presentation
  - 1.4.3 Emoji and Text Presentation Sequences
  - 1.4.4 Emoji Modifiers
  - 1.4.5 Emoji Sequences
  - 1.4.6 Emoji Sets
  - 1.4.7 Notation
  - 1.4.8 Property Stability
  - 1.4.9 EBNF and Regex
- 1.5 Conformance
  - Table: Emoji Capabilities
  - 1.5.1 Collation Conformance
  - 1.5.2 Versioning
    - Table: Emoji Versions
2 Design Guidelines
- 2.1 Names
- 2.2 Display
- 2.3 Gender
  - Table: Emoji With Explicit Gender Appearance
  - Table: Emoji Changed to Gender-Neutral in Emoji 13.0+
  - 2.3.1 Gender-Neutral Emoji
  - 2.3.2 Marking Gender in Emoji Input
- 2.4 Diversity
  - Table: Emoji Modifiers
  - 2.4.1 Implementations
    - Table: Sample Emoji Modifier Bases
    - Table: Expected Emoji Modifiers Display
  - 2.4.2 Emoji Modifiers in Text
    - Table: Minipalettes
- 2.5 Emoji ZWJ Sequences
  - Table: ZWJ Sequence Display
- 2.6 Multi-Person Groupings
  - Table: Multi-Person Groupings
  - 2.6.1 Multi-Person Gender
    - Table: Gender with Multi-Person Groupings
  - 2.6.2 Multi-Person Skin Tones
    - Table: Skin Tones for Multi-Person Groupings Using Sequences
    - Table: Skin Tones for Multi-Person Groupings Using Single Characters
- 2.7 Emoji Implementation Notes
  - 2.7.1 Emoji and Text Presentation Selectors
  - 2.7.2 Handling Tag Characters
- 2.8 Hair Components
- 2.9 Color
  - Table: Emoji Glyph Color Examples
- 2.10 Emoji Glyph Facing Direction
  - Table: Emoji Glyph Direction Examples
- 2.11 Order of Emoji ZWJ Sequences
3 Which Characters are Emoji
4 Presentation Style
5 Ordering and Grouping
6 Input
- Table: Palette Input
7 Searching
8 Longer Term Solutions
Annex A: Emoji Properties and Data Files
- Table: Emoji Character Properties
- A.1 Data Files
  - Table: Data Files
Annex B: Valid Emoji Flag Sequences
- B.1 Presentation
- B.2 Ordering
Annex C: Valid Emoji Tag Sequences
- C.1 Flag Emoji Tag Sequences
Acknowledgments
Rights to Emoji Images
References
Modifications

1 Introduction

Emoji are pictographs (pictorial symbols) that are typically presented in a colorful cartoon form and used inline in text. They represent things such as faces, weather, vehicles and buildings, food and drink, animals and plants, or icons that represent emotions, feelings, or activities.

Emoji on smartphones and in chat and email applications have become extremely popular worldwide. As of March 2015, for example, Instagram reported that “nearly half of text [on Instagram] contained emoji.” Individual emoji also vary greatly in popularity (and even by country), as described in the SwiftKey Emoji Report. See the emoji press page for details about these reports and others.

Emoji are most often used in quick, short social media messages, where they connect with the reader and add flavor, color, and emotion. Emoji do not have the grammar or vocabulary to substitute for written language. In social media, emoji make up for the lack of gestures, facial expressions, and intonation that are found in speech. They also add useful ambiguity to messages, allowing the writer to convey many different possible concepts at the same time. Many people are also attracted by the challenge of composing messages in emoji, and puzzling out emoji messages.

The word emoji comes from Japanese:

絵 (e ≅ picture) 文字 (moji ≅ written character).

Emoji may be represented internally as graphics or they may be represented by normal glyphs encoded in fonts like other characters. These latter are called emoji characters for clarity. Some Unicode characters are normally displayed as emoji; some are normally displayed as ordinary text, and some can be displayed both ways.

There has been considerable media attention to emoji since they appeared in the Unicode Standard, with increased attention starting in late 2013. For example, there were some 6,000 articles on the emoji appearing in Unicode 7.0, according to Google News. See the emoji press page for many samples of such articles, and also the Keynote from the 38th Internationalization & Unicode Conference.

Emoji became available in 1999 on Japanese mobile phones. There was an early proposal in 2000 to encode DoCoMo emoji in the Unicode Standard. At that time, it was unclear whether these characters would come into widespread use, and the Japanese mobile phone carriers did not support adding them to Unicode, so no action was taken.

The emoji turned out to be quite popular in Japan, but each mobile phone carrier developed different (but partially overlapping) sets, and each mobile phone vendor used their own text encoding extensions, which were incompatible with one another. The vendors developed cross-mapping tables to allow limited interchange of emoji characters with phones from other vendors, including email. Characters from other platforms that could not be displayed were represented with 〓 (U+3013 GETA MARK), but it was all too easy for the characters to get corrupted or dropped.

When non-Japanese email and mobile phone vendors started to support email exchange with the Japanese carriers, they ran into those problems. Moreover, there was no way to represent these characters in Unicode, which was the basis for text in all modern programs. In 2006, Google started work on converting Japanese emoji to Unicode private-use codes, leading to the development of internal mapping tables for supporting the carrier emoji via Unicode characters in 2007.

There are, however, many problems with a private-use approach, and thus a proposal was made to the Unicode Consortium to expand the scope of symbols to encompass emoji. This proposal was approved in May 2007, leading to the formation of a symbols subcommittee, and in August 2007 the technical committee agreed to support the encoding of emoji in Unicode based on a set of principles developed by the subcommittee. The following are a few of the documents tracking the progression of Unicode emoji characters.

Emoji Proposals

Date	Doc No.	Title	Authors
2000-04-26	L2/00-152	NTT DoCoMo Pictographs	Graham Asher (Symbian)
2006-11-01	L2/06-369	Symbols (scope extension)	Mark Davis (Google)
2007-08-03	L2/07-257	Working Draft Proposal for Encoding Emoji Symbols	Kat Momoi, Mark Davis, Markus Scherer (Google)
2007-08-09	L2/07-274R	Symbols draft resolution	Mark Davis (Google)
2007-09-18	L2/07-391	Japanese TV Symbols (ARIB)	Michel Suignard (Microsoft)
2009-01-30	L2/09-026	Emoji Symbols Proposed for New Encoding	Markus Scherer, Mark Davis, Kat Momoi, Darick Tong (Google); Yasuo Kida, Peter Edberg (Apple)
2009-03-05	L2/09-025R2	Proposal for Encoding Emoji Symbols
2010-04-27	L2/10-132	Emoji Symbols: Background Data
2011-02-15	L2/11-052R	Wingdings and Webdings Symbols	Michel Suignard

To find the documents in this table, see UTC Documents.

In 2009, the first Unicode characters explicitly intended as emoji were added to Unicode 5.2 for interoperability with the ARIB (Association of Radio Industries and Businesses) set. A set of 722 characters was defined as the union of emoji characters used by Japanese mobile phone carriers: 114 of these characters were already in Unicode 5.2. In 2010, the remaining 608 emoji characters were added to Unicode 6.0, along with some other emoji characters. In 2012, a few more emoji were added to Unicode 6.1, and in 2014 a larger number were added to Unicode 7.0. Additional characters have been added since then, based on the Selection Factors found in Guidelines for Submitting Unicode Emoji Proposals.

Here is a summary of when some of the major sources of pictographs used as emoji were encoded in Unicode. Each source may include other characters in addition to emoji, and Unicode characters can correspond to multiple sources. The L column contains single-letter abbreviations of the various sources for use in charts [emoji-charts] and data files [emoji-data]. Characters that do not correspond to any of these sources can be marked with Other (x).

Major Sources

Source	Abbr	L	Dev. Starts	Released	Unicode Version	Sample Code	CLDR Short Name
Zapf Dingbats	ZDings	z	1989	1991-10	1.0	U+270F	pencil
ARIB	ARIB	a	2007	2008-10-01	5.2	U+2614	umbrella with rain drops
Japanese carriers	JCarrier	j	2007	2010-10-11	6.0	U+1F60E	smiling face with sunglasses
Wingdings & Webdings	WDings	w	2010	2014-06-16	7.0	U+1F336	hot pepper

For a detailed view of when various source sets of emoji were added to Unicode, see Emoji Version Sources [emoji-charts]. The data file [JSources] shows the correspondence to the original Japanese carrier symbols.

People often ask how many emoji are in the Unicode Standard. This question does not have a simple answer, because there is no clear line separating which pictographic characters should be displayed with a typical emoji style. For a complete picture, see Which Characters are Emoji.

The colored images used in this document and associated charts [emoji-charts] are for illustration only. They do not appear in the Unicode Standard, which has only black and white images. They are either made available by the respective vendors for use in this document, or are believed to be available for non-commercial reuse. Inquiries for permission to use vendor images should be directed to those vendors, not to the Unicode Consortium. For more information, see Rights to Emoji Images.

1.1 Emoticons and Emoji

The term emoticon refers to a series of text characters (typically punctuation or symbols) that is meant to represent a facial expression or gesture (sometimes when viewed sideways), such as the following.

;-)

Emoticons predate Unicode and emoji, but were later adapted to include Unicode characters. The following examples use not only ASCII characters, but also U+203F ( ‿ ), U+FE35 ( ︵ ), U+25C9 ( ◉ ), and U+0CA0 ( ಠ ).

^‿^

◉︵◉

ಠ_ಠ

Often implementations allow emoticons to be used to input emoji. For example, the emoticon ;-) can be mapped to 😉 in a chat window. The term emoticon is sometimes used in a broader sense, to also include the emoji for facial expressions and gestures. That broad sense is used in the Unicode block name Emoticons, covering the code points from U+1F600 to U+1F64F.

1.2 Encoding Considerations

Unicode is the foundation for text in all modern software: it’s how all mobile phones, desktops, and other computers represent the text of every language. People are using Unicode every time they type a key on their phone or desktop computer, and every time they look at a web page or text in an application. It is very important that the standard be stable, and that every character that goes into it be scrutinized carefully. This requires a formal process with a long development cycle. For example, the dark sunglasses character was first proposed years before it was released in Unicode 7.0.

Characters considered for encoding must normally be in widespread use as elements of text. The emoji and various symbols were added to Unicode because of their use as characters for text-messaging in a number of Japanese manufacturers’ corporate standards, and other places, or in long-standing use in widely distributed fonts such as Wingdings and Webdings. In many cases, the characters were added for complete round-tripping to and from a source set, not because they were inherently of more importance than other characters. For example, the clamshell phone character was included because it was in Wingdings and Webdings, not because it is more important than, say, a “skunk” character.

In some cases, a character was added to complete a set: for example, a rugby football character was added to Unicode 6.0 to complement the american football character (the soccer ball had been added back in Unicode 5.2). Similarly, a mechanism was added that could be used to represent all country flags (those corresponding to a two-letter unicode_region_subtag), such as the flag for Canada, even though the Japanese carrier set only had 10 country flags.

The data does not include non-pictographs, except for those in Unicode that are used to represent characters from emoji sources, for compatibility, such as:

Game pieces, such as the dominos (🀰 🀱 🀲 ... 🂑 🂒), are currently not included as emoji, with the exceptions of U+1F0CF ( 🃏 ) PLAYING CARD BLACK JOKER and U+1F004 ( 🀄 ) MAHJONG TILE RED DRAGON. These are included because they correspond each to an emoji character from one of the carrier sets.

The selection factors used to weigh the encoding of prospective candidates are found in Selection Factors in Guidelines for Submitting Unicode Emoji Proposals. That document also provides instructions for submitting proposals for new emoji.

For a list of frequently asked questions on emoji, see the Unicode Emoji FAQ.

1.3 Goals

This document provides:

- design guidelines for improving interoperability across platforms and implementations
- background information about emoji characters, and long-term alternatives
- data indicating:
  - which characters normally can be considered to be emoji
  - which emoji characters should be displayed by default in text style versus emoji style
  - which emoji characters may be displayed using a variety of skin tones, with implementation details
- pointers to [CLDR] data for
  - sorting emoji characters more naturally
  - annotations for searching and grouping emoji characters

It also provides background information about emoji, and discusses longer-term approaches to emoji.

As new Unicode characters are added or the “common practice” for emoji usage changes, the data and recommendations supplied by this document may change accordingly. Thus the recommendations and data will change across versions of this document.

1.4 Definitions

The following provide more formal definitions of some of the terms used in this document. Readers who are more interested in other features of the document may choose to continue from Section 2, Design Guidelines.

ED-1. emoji — A colorful pictograph that can be used inline in text. Internally the representation is either (a) an image, (b) an encoded character, or (c) a sequence of encoded characters.
For (a) the term emoji image is used in this document. The term sticker may also be used.
For (b) the term emoji character is used where necessary for clarity.
For (c) the term emoji sequence is used for clarity.
ED-2. emoticon — (1) A series of text characters (typically punctuation or symbols) that is meant to represent a facial expression or gesture such as ;-) and (2) in a broader sense, also includes emoji for facial expressions and gestures.

1.4.1 Emoji Characters

ED-3. emoji character — A character that has the Emoji property.
emoji_character := \p{Emoji}
These characters are recommended for use as emoji.
ED-4. extended pictographic character — A character that has the Extended_Pictographic property.
These characters are pictographic, or otherwise similar in kind to characters with the Emoji property.
The Extended_Pictographic property is used to customize segmentation (as described in [UAX29] and [UAX14]) so that possible future emoji ZWJ sequences will not break grapheme clusters, words, or lines. Unassigned code points with Line_Break=ID in some blocks are also assigned the Extended_Pictographic property. Those blocks are intended for future allocation of emoji characters.
ED-5. emoji component — A character that has the Emoji_Component property.
These characters are used in emoji sequences but normally do not appear on emoji keyboards as separate choices, such as keycap base characters or Regional_Indicator characters.
Some emoji components are emoji characters, and others (such as tag characters and ZWJ) are not.

For more information, see Section 3, Which Characters are Emoji. For information on data files which define emoji properties, see Annex A: Emoji Properties and Data Files.

1.4.2 Emoji Presentation

ED-6. default emoji presentation character — A character that, by default, should appear with an emoji presentation, rather than a text presentation.
default_emoji_presentation_character := \p{Emoji_Presentation}
These characters have the Emoji_Presentation property. See Annex A: Emoji Properties and Data Files.
ED-7. default text presentation character — A character that, by default, should appear with a text presentation, rather than an emoji presentation.
default_text_presentation_character := \P{Emoji_Presentation}
These characters do not have the Emoji_Presentation property; that is, their Emoji_Presentation property value is No. See Annex A: Emoji Properties and Data Files.

For more details about emoji and text presentation, see Section 2, Design Guidelines and Section 4, Presentation Style.

1.4.3 Emoji and Text Presentation Sequences

ED-8. text presentation selector — The character U+FE0E VARIATION SELECTOR-15 (VS15), used to request a text presentation for an emoji character. (Also known as text variation selector in prior versions of this specification.)
text_presentation_selector := \x{FE0E}
ED-8a. text presentation sequence — A variation sequence consisting of an emoji character followed by a text presentation selector.
text_presentation_sequence := emoji_character text_presentation_selector
The only valid text presentation sequences are those listed in emoji-variation-sequences.txt [emoji-data].
ED-9. emoji presentation selector — The character U+FE0F VARIATION SELECTOR-16 (VS16), used to request an emoji presentation for an emoji character. (Also known as emoji variation selector in prior versions of this specification.)
emoji_presentation_selector := \x{FE0F}
ED-9a. emoji presentation sequence — A variation sequence consisting of an emoji character followed by an emoji presentation selector.
emoji_presentation_sequence := emoji_character emoji_presentation_selector
The only valid emoji presentation sequences are those listed in emoji-variation-sequences.txt [emoji-data].
ED-10. (This definition has been removed.)
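
As a minimal sketch of ED-8a and ED-9a, the following Python appends VS15 or VS16 to a base character to request text or emoji presentation. The function names and the three-entry SAMPLE_VALID_BASES set are assumptions made for illustration; an implementation would take the valid bases from emoji-variation-sequences.txt rather than hard-coding them.

```python
VS15 = "\uFE0E"  # text presentation selector (ED-8)
VS16 = "\uFE0F"  # emoji presentation selector (ED-9)

# Assumption: a hand-picked sample of base characters. The authoritative
# list of valid bases is emoji-variation-sequences.txt.
SAMPLE_VALID_BASES = {
    "\u2764",  # HEAVY BLACK HEART
    "\u260E",  # BLACK TELEPHONE
    "\u24C2",  # CIRCLED LATIN CAPITAL LETTER M
}


def with_presentation(base: str, emoji_style: bool) -> str:
    """Return base followed by VS16 (emoji style) or VS15 (text style)."""
    if len(base) != 1 or base not in SAMPLE_VALID_BASES:
        raise ValueError(f"{base!r} is not in the sample of valid bases")
    return base + (VS16 if emoji_style else VS15)


def strip_presentation(text: str) -> str:
    """Remove every presentation selector, leaving default presentation."""
    return text.replace(VS15, "").replace(VS16, "")


heart_emoji = with_presentation("\u2764", emoji_style=True)
heart_text = with_presentation("\u2764", emoji_style=False)
print([f"U+{ord(c):04X}" for c in heart_emoji])
print([f"U+{ord(c):04X}" for c in heart_text])
```

Rejecting bases outside the listed set mirrors the rule that only the listed variation sequences are valid; a selector appended to any other character does not form a valid sequence.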

1.4.4 Emoji Modifiers

ED-11. emoji modifier — A character that can be used to modify the appearance of a preceding emoji in an emoji modifier sequence.
emoji_modifier := \p{Emoji_Modifier}
These characters have the Emoji_Modifier property. See Annex A: Emoji Properties and Data Files.
ED-12. emoji modifier base — A character whose appearance can be modified by a subsequent emoji modifier in an emoji modifier sequence.
emoji_modifier_base := \p{Emoji_Modifier_Base}
These characters have the Emoji_Modifier_Base property. See Annex A: Emoji Properties and Data Files.
They are also listed in Characters Subject to Emoji Modifiers.
ED-13. emoji modifier sequence — A sequence of the following form:
emoji_modifier_sequence := emoji_modifier_base emoji_modifier

For more details about emoji modifiers, see Section 2.4, Diversity.
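
The production in ED-13 can be sketched in Python as a two-code-point check. The emoji modifiers are the five characters U+1F3FB through U+1F3FF; the SAMPLE_MODIFIER_BASES set and both function names are illustrative assumptions, standing in for the full Emoji_Modifier_Base property.

```python
from typing import Optional, Tuple

# The five emoji modifiers, U+1F3FB..U+1F3FF.
EMOJI_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)}

# Assumption: a small sample standing in for \p{Emoji_Modifier_Base}.
SAMPLE_MODIFIER_BASES = {
    "\U0001F44B",  # waving hand
    "\U0001F44D",  # thumbs up
    "\U0001F9D1",  # person
}


def is_emoji_modifier_sequence(seq: str) -> bool:
    """True if seq is exactly emoji_modifier_base emoji_modifier."""
    return (
        len(seq) == 2
        and seq[0] in SAMPLE_MODIFIER_BASES
        and seq[1] in EMOJI_MODIFIERS
    )


def split_modifier(seq: str) -> Tuple[str, Optional[str]]:
    """Return (base, modifier) for a modifier sequence, else (seq, None)."""
    if is_emoji_modifier_sequence(seq):
        return seq[0], seq[1]
    return seq, None


wave_medium = "\U0001F44B\U0001F3FD"
print(is_emoji_modifier_sequence(wave_medium))
print(is_emoji_modifier_sequence("\U0001F525\U0001F3FD"))  # fire is not a base
```

A modifier that follows a character without the Emoji_Modifier_Base property does not form a modifier sequence, which is why the second call reports False.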

1.4.5 Emoji Sequences

ED-14. emoji flag sequence — A sequence of two Regional Indicator characters, where the corresponding ASCII characters are valid region sequences as specified by Unicode region subtags in [CLDR], with idStatus = “regular”, “deprecated”, or “macroregion”. See also Annex B: Valid Emoji Flag Sequences.
emoji_flag_sequence := regional_indicator regional_indicator
regional_indicator := \p{Regional_Indicator}
A singleton Regional Indicator character is not a well-formed emoji flag sequence.
ED-14a. emoji tag sequence (ETS) — A sequence of the following form:
emoji_tag_sequence := tag_base tag_spec tag_end
tag_base := emoji_character | emoji_modifier_sequence | emoji_presentation_sequence
tag_spec := [\x{E0020}-\x{E007E}]+
tag_end := \x{E007F}
The tag_spec consists of one or more characters from U+E0020 TAG SPACE through U+E007E TAG TILDE. Each tag_spec defines a particular visual variant to be applied to the tag_base character(s). Though tag_spec includes the values U+E0041 TAG LATIN CAPITAL LETTER A .. U+E005A TAG LATIN CAPITAL LETTER Z, they are not used currently and are reserved for future extensions.
The tag_end consists of the character U+E007F CANCEL TAG, and must be used to terminate the sequence.
A sequence of tag characters that is not part of an emoji_tag_sequence is not a well-formed emoji tag sequence.
The meaning and validity criteria for an emoji tag sequence and expected visual variants for a tag_spec are determined by Annex C: Valid Emoji Tag Sequences.
ED-14b. (This definition has been removed.)
ED-14c. emoji keycap sequence — A sequence of the following form:
emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}
These sequences are in the emoji-sequences.txt file listed under the type_field Emoji_Keycap_Sequence.
ED-15. emoji core sequence — A sequence of the following form:
emoji_core_sequence := emoji_character | emoji_presentation_sequence | emoji_keycap_sequence | emoji_modifier_sequence | emoji_flag_sequence
ED-15a. emoji ZWJ element — An element that can be used in an emoji ZWJ sequence, as follows:
emoji_zwj_element := emoji_core_sequence | emoji_tag_sequence
ED-16. emoji ZWJ sequence — An emoji sequence with at least one joiner character.
emoji_zwj_sequence := emoji_zwj_element ( ZWJ emoji_zwj_element )+
ZWJ := \x{200d}
ED-17. emoji sequence — A core sequence, tag sequence, or ZWJ sequence, as follows:
emoji_sequence := emoji_core_sequence | emoji_zwj_sequence | emoji_tag_sequence
Note that all emoji sequences are single grapheme clusters: there is never a grapheme cluster boundary within an emoji sequence. This affects editing operations, such as cursor movement or deletion, as well as word break, line break, and so on. For more information, see [UAX29].
ED-17a. qualified emoji character — An emoji character in a string that (a) has default emoji presentation or (b) is the first character in an emoji modifier sequence or (c) is not a default emoji presentation character, but is the first character in an emoji presentation sequence.
ED-18. fully-qualified emoji — A qualified emoji character, or an emoji sequence in which each emoji character is qualified.
ED-18a. minimally-qualified emoji — An emoji sequence in which the first character is qualified but the sequence is not fully qualified.
ED-19. unqualified emoji — An emoji that is neither fully-qualified nor minimally-qualified.
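
The qualification rules in ED-17a through ED-19 can be sketched as a small classifier. This is only an illustration under stated assumptions: the three SAMPLE_ sets hard-code property values for a handful of characters in place of the real Emoji, Emoji_Presentation, and Emoji_Modifier_Base data, the function names are hypothetical, and the input is assumed to be an emoji sequence whose first code point is an emoji character.

```python
VS16 = "\uFE0F"
MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)}

# Assumption: hard-coded property values for a few characters only.
SAMPLE_EMOJI = {
    "\u2764",      # red heart (default text presentation)
    "\u2695",      # medical symbol (default text presentation)
    "\U0001F525",  # fire
    "\U0001F9D1",  # person
    "\U0001F44B",  # waving hand
} | MODIFIERS
SAMPLE_EMOJI_PRESENTATION = {"\U0001F525", "\U0001F9D1", "\U0001F44B"} | MODIFIERS
SAMPLE_MODIFIER_BASES = {"\U0001F9D1", "\U0001F44B"}


def is_qualified(seq: str, i: int) -> bool:
    """Apply ED-17a to the emoji character at index i of seq."""
    ch = seq[i]
    nxt = seq[i + 1] if i + 1 < len(seq) else ""
    if ch in SAMPLE_EMOJI_PRESENTATION:                    # case (a)
        return True
    if ch in SAMPLE_MODIFIER_BASES and nxt in MODIFIERS:   # case (b)
        return True
    return nxt == VS16                                     # case (c)


def qualification(seq: str) -> str:
    """Classify seq per ED-18, ED-18a, and ED-19."""
    positions = [i for i, ch in enumerate(seq) if ch in SAMPLE_EMOJI]
    if not positions:
        raise ValueError("no emoji character from the sample in sequence")
    flags = [is_qualified(seq, i) for i in positions]
    if all(flags):
        return "fully-qualified"
    if flags[0]:
        return "minimally-qualified"
    return "unqualified"


ZWJ = "\u200D"
print(qualification("\u2764" + VS16 + ZWJ + "\U0001F525"))
print(qualification("\U0001F9D1" + ZWJ + "\u2695"))
print(qualification("\u2764" + ZWJ + "\U0001F525"))
```

Characters such as ZWJ and VS16 are skipped because they are not emoji characters; only emoji characters are tested for qualification.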

For recommendations on the use of variation selectors in emoji sequences, see Section 2.7, Emoji Implementation Notes.
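
The flag and tag productions in ED-14 and ED-14a lend themselves to a short sketch. The Python below maps a two-letter region code to a pair of Regional Indicators (U+1F1E6 through U+1F1FF) and builds or parses a tag sequence. All function names are assumptions for illustration. The code checks well-formedness only: validity additionally depends on [CLDR] region data and on Annex C, which are not modeled here.

```python
from typing import Optional, Tuple

RI_A = 0x1F1E6        # REGIONAL INDICATOR SYMBOL LETTER A
RI_Z = 0x1F1FF        # REGIONAL INDICATOR SYMBOL LETTER Z
TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at U+E0000 + code point
CANCEL_TAG = "\U000E007F"


def flag_from_region(region: str) -> str:
    """Map a two-letter ASCII region code to two Regional Indicators."""
    if len(region) != 2 or not (region.isascii() and region.isalpha()):
        raise ValueError(f"not a two-letter ASCII region code: {region!r}")
    return "".join(chr(RI_A + ord(c) - ord("A")) for c in region.upper())


def region_from_flag(seq: str) -> Optional[str]:
    """Return the ASCII letters behind a well-formed flag sequence, else None."""
    if len(seq) != 2 or not all(RI_A <= ord(c) <= RI_Z for c in seq):
        return None  # covers the singleton Regional Indicator case
    return "".join(chr(ord(c) - RI_A + ord("A")) for c in seq)


def tag_sequence(tag_base: str, spec: str) -> str:
    """Build tag_base tag_spec tag_end from printable ASCII spec text."""
    if not spec or not all(0x20 <= ord(c) <= 0x7E for c in spec):
        raise ValueError("tag_spec needs one or more characters in U+0020..U+007E")
    return tag_base + "".join(chr(TAG_OFFSET + ord(c)) for c in spec) + CANCEL_TAG


def parse_tag_sequence(seq: str) -> Optional[Tuple[str, str]]:
    """Split a well-formed tag sequence into (tag_base, ASCII spec), else None.

    Simplification: tag_base is assumed to be a single code point and is not
    checked against the Emoji property.
    """
    if len(seq) < 3 or seq[-1] != CANCEL_TAG:
        return None
    tags = seq[1:-1]
    if not all(0xE0020 <= ord(c) <= 0xE007E for c in tags):
        return None
    return seq[0], "".join(chr(ord(c) - TAG_OFFSET) for c in tags)


canada = flag_from_region("CA")
england = tag_sequence("\U0001F3F4", "gbeng")  # black flag base plus tags
print(region_from_flag(canada), parse_tag_sequence(england))
```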

1.4.6 Emoji Sets

The following sets are defined based on the data files and properties described in Annex A: Emoji Properties and Data Files. The composition of these sets may change from one release to the next.

Each of these sets can be conceived of as a binary property; they are properties of strings. See UTS #18: Unicode Regular Expressions [UTS18] and UTR #23: The Unicode Character Property Model [UTR23] for more discussion.

ED-20. basic emoji set — The set of emoji characters and emoji presentation sequences listed in the emoji-sequences.txt file [emoji-data] under the type_field Basic_Emoji.

This is the set of emoji intended for general-purpose input.
This set excludes all those instances of an emoji component that are not intended for independent, direct input. Implementations should support independent display of emoji components in this set even if they are not made available for direct input.
- Skin tone modifiers and hair components should be displayed even in isolation, but they should not (typically) be on the keyboard palette. These are included in Basic_Emoji.
- Other components (U+20E3 COMBINING ENCLOSING KEYCAP, Regional Indicators, tag characters, ZWJ, and VS16) should never have an emoji presentation in isolation, but do occur as part of emoji sequences. These are not included in Basic_Emoji.
This set otherwise includes all instances of an emoji character with the property value Emoji_Presentation = Yes and all instances of a valid emoji presentation sequence whose base character has the property value Emoji_Presentation = No.

ED-21. emoji keycap sequence set — The specific set of emoji sequences listed in the emoji-sequences.txt file [emoji-data] under the type_field Emoji_Keycap_Sequence.

This is the set of all valid emoji keycap sequences.

Note: The following definitions use the acronym “RGI” to mean “recommended for general interchange”, referring to that subset of some larger set that is intended to be widely supported across multiple platforms.

ED-22. RGI emoji modifier sequence set — The specific set of emoji sequences listed in the emoji-sequences.txt file [emoji-data] under the type_field RGI_Emoji_Modifier_Sequence.

This is the subset of all valid emoji modifier sequences recommended for general interchange.

ED-23. RGI emoji flag sequence set — The specific set of emoji sequences listed in the emoji-sequences.txt file [emoji-data] under the type_field RGI_Emoji_Flag_Sequence.

This is the subset of all valid emoji flag sequences recommended for general interchange. See Annex B: Valid Emoji Flag Sequences.

ED-24. RGI emoji tag sequence set — The specific set of emoji sequences listed in the emoji-sequences.txt file [emoji-data] under the type_field RGI_Emoji_Tag_Sequence.

This is the subset of all valid emoji tag sequences recommended for general interchange. See Annex C: Valid Emoji Tag Sequences.

ED-25. RGI emoji ZWJ sequence set — The specific set of emoji sequences listed in the emoji-zwj-sequences.txt file [emoji-data] under the type_field RGI_Emoji_ZWJ_Sequence.

This is the subset of all valid emoji ZWJ sequences recommended for general interchange.

ED-26. (This definition has been removed.)

ED-27. RGI emoji set — The set of all emoji (characters and sequences) covered by ED-20, ED-21, ED-22, ED-23, ED-24, and ED-25.

This is the subset of all valid emoji (characters and sequences) recommended for general interchange.
This corresponds to the RGI_Emoji property.
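
Because each set above is defined by a type_field in a data file, the sets can be built by grouping parsed lines on that field. The sketch below assumes the line layout "code_point(s) ; type_field ; description # comments", with code points given either as a range (XXXX..YYYY) or as a space-separated sequence. The excerpt in SAMPLE_DATA is hypothetical and written only to exercise the parser; it is not copied from the real files, and the function name is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical excerpt in the assumed layout; not copied from the real file.
SAMPLE_DATA = """\
# code_point(s) ; type_field ; description # comments
231A..231B     ; Basic_Emoji             ; sample range   # comment
0023 FE0F 20E3 ; Emoji_Keycap_Sequence   ; sample keycap  # comment
1F1E8 1F1E6    ; RGI_Emoji_Flag_Sequence ; sample flag    # comment
"""


def parse_emoji_sets(text: str) -> Dict[str, Set[str]]:
    """Group characters and sequences by type_field."""
    sets: Dict[str, Set[str]] = defaultdict(set)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        code_points, type_field = fields[0], fields[1]
        if ".." in code_points:
            # A range contributes one single-character string per code point.
            first, last = (int(cp, 16) for cp in code_points.split(".."))
            for cp in range(first, last + 1):
                sets[type_field].add(chr(cp))
        else:
            # A space-separated list is one sequence (possibly one character).
            sets[type_field].add(
                "".join(chr(int(cp, 16)) for cp in code_points.split())
            )
    return sets


emoji_sets = parse_emoji_sets(SAMPLE_DATA)
# In the spirit of ED-27: the RGI emoji set is the union of the listed sets.
rgi_sample = set().union(*emoji_sets.values())
print(sorted(emoji_sets), len(rgi_sample))
```

Storing each entry as a string, even for single characters, matches the observation above that these sets are properties of strings.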

1.4.7 Notation

Character names in all capitals are the formal Unicode Name property values, such as U+1F473 MAN WITH TURBAN. The formal names are immutable internal identifiers, but often do not reflect the current practice for interpretation of the character.

Lowercase character names for existing characters or sequences are CLDR short names, such as U+1F473 person wearing turban.

1.4.8 Property Stability

The emoji properties are stable for each version of the data—they will not change for that version. They may, however, change between that version and a subsequent version. For example, isEmoji(♟)=false for Emoji Version 5.0, but true for Version 11.0.

Some emoji properties are not closed over certain string operations. For example:

isEmoji(toLowercase(X)) ≠ isEmoji(X) for the case of X=Ⓜ️, because:
isEmoji(Ⓜ️) = true
toLowercase(Ⓜ️) = ⓜ
isEmoji(ⓜ) = false

Casing operations may produce invalid variation sequences. While the following strings form a case pair, the emoji presentation selector is not defined for ⓜ, and thus has no effect on its rendering:

Ⓜ️ = <U+24C2 CIRCLED LATIN CAPITAL LETTER M, U+FE0F VS16>	valid variation sequence
ⓜ = <U+24DC CIRCLED LATIN SMALL LETTER M, U+FE0F VS16>	invalid variation sequence
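
The case pair above can be reproduced with Python's built-in str.lower(), which maps U+24C2 to U+24DC and leaves VS16 in place. The validity check is a sketch: SAMPLE_VS16_BASES is a one-entry assumption standing in for the bases listed in emoji-variation-sequences.txt, and the function name is hypothetical.

```python
VS16 = "\uFE0F"

# Assumption: one-entry sample; the authoritative list of bases that accept
# VS16 is emoji-variation-sequences.txt.
SAMPLE_VS16_BASES = {"\u24C2"}  # CIRCLED LATIN CAPITAL LETTER M


def is_valid_emoji_presentation_sequence(seq: str) -> bool:
    """True if seq is a sampled base followed by VS16."""
    return len(seq) == 2 and seq[0] in SAMPLE_VS16_BASES and seq[1] == VS16


upper = "\u24C2" + VS16
lower = upper.lower()  # lowercases the base; VS16 has no case mapping

for label, seq in (("upper", upper), ("lower", lower)):
    code_points = " ".join(f"U+{ord(c):04X}" for c in seq)
    print(label, code_points, is_valid_emoji_presentation_sequence(seq))
```

An implementation that applies case mappings to text containing emoji may therefore want to revalidate variation sequences afterwards.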

1.4.9 EBNF and Regex

The following EBNF can be used to quickly scan for possible emoji. Those possible emoji can then be verified where necessary by performing validity tests according to the definitions, or checking against the RGI emoji set. It is much simpler than the expressions currently in the definitions. It includes a superset of emoji as a by-product of that simplicity, but the extras can be weeded out by validity tests.

EBNF	Notes
possible_emoji := zwj_element (\x{200D} zwj_element)*	\x{200D} = zero-width joiner
flag_sequence := \p{RI} \p{RI}	\p{RI} = Regional_Indicator
zwj_element := \p{Emoji} emoji_modification? | flag_sequence
emoji_modification := \p{EMod} | \x{FE0F} \x{20E3}? | tag_modifier	\p{EMod} = Emoji_Modifier; \x{FE0F} = emoji VS; \x{20E3} = enclosing keycap
tag_modifier := [\x{E0020}-\x{E007E}]+ \x{E007F}	\x{E00xx} are tags; \x{E007F} = TERM tag

From these EBNF rules a regex can be generated, as below. While this regex may seem complex, it is far simpler than what would result from the definitions. Direct use of the definitions would result in regex expressions which are many times more complicated, and yet still require verification with validity tests.

Regex

\p{RI} \p{RI}
| \p{Emoji}
  ( \p{EMod}
  | \x{FE0F} \x{20E3}?
  | [\x{E0020}-\x{E007E}]+ \x{E007F}
  )?
  (\x{200D}
    ( \p{RI} \p{RI}
    | \p{Emoji}
      ( \p{EMod}
      | \x{FE0F} \x{20E3}?
      | [\x{E0020}-\x{E007E}]+ \x{E007F}
      )?
    )
  )*
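
As a rough illustration of scanning for possible emoji, the EBNF rules can be translated for Python's standard re module. That module has no \p{...} property support, so the sketch spells out Regional_Indicator and Emoji_Modifier as explicit ranges and uses a small hand-picked character class as a stand-in for \p{Emoji}. That stand-in, and every name in the block, is an assumption; a real scanner would use full property data or a regex engine with Unicode property support.

```python
import re
from typing import List

RI = "[\U0001F1E6-\U0001F1FF]"    # Regional_Indicator
EMOD = "[\U0001F3FB-\U0001F3FF]"  # Emoji_Modifier

# Assumption: tiny stand-in for \p{Emoji}. It includes the keycap bases
# (digits, # and *), which do carry the Emoji property.
EMOJI = (
    "[#*0-9\u2764\u2695\U0001F3F4\U0001F525\U0001F9D1\U0001F44B"
    "\U0001F3FB-\U0001F3FF]"
)

TAG_MODIFIER = "[\U000E0020-\U000E007E]+\U000E007F"
EMOJI_MODIFICATION = f"(?:{EMOD}|\uFE0F\u20E3?|{TAG_MODIFIER})"
ZWJ_ELEMENT = f"(?:{RI}{RI}|{EMOJI}{EMOJI_MODIFICATION}?)"
POSSIBLE_EMOJI = re.compile(f"{ZWJ_ELEMENT}(?:\u200D{ZWJ_ELEMENT})*")


def scan_possible_emoji(text: str) -> List[str]:
    """Return every substring matching the possible_emoji rule."""
    return [match.group(0) for match in POSSIBLE_EMOJI.finditer(text)]


sample = (
    "ok \U0001F44B\U0001F3FD and \u2764\uFE0F\u200D\U0001F525 "
    "plus \U0001F1E8\U0001F1E6!"
)
for hit in scan_possible_emoji(sample):
    print(" ".join(f"U+{ord(c):04X}" for c in hit))
```

The pattern is built from the EBNF rules rather than copied from the printed regex, so a flag sequence can also start a ZWJ chain. As the text notes, the result is a superset: a bare digit such as "7" matches as a possible emoji and has to be weeded out by a validity test.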

1.5 Conformance

Conformance to this specification is specified by the following clauses.

C1. An implementation claiming conformance to this specification shall identify the version of this specification to which conformance is claimed.

Each version of this specification has a minimum version of the Unicode Standard, which contains all the characters with Emoji=Yes. For example, an implementation that claims conformance to Emoji 5.0 must also have support for the Unicode 9.0 repertoire.

C2. An implementation claiming conformance to this specification shall identify which of the capabilities specified below are supported for which emoji sets ED-20 through ED-25. This must include at least the C2a display capability for set ED-20 basic emoji set. For example, an implementation can declare that it supports the display, editing and input capabilities for the basic emoji set, and the display and editing capabilities for the emoji modifier sequence set, and may make no claim of capabilities for any other sets.

Emoji Capabilities

C2a display	The implementation is capable of displaying each of the characters and sequences in the specified set as a single glyph with emoji presentation.
C2b editing	The implementation treats each of the characters and sequences in the specified set as an indivisible unit for editing purposes (cursor movement, deletion, line breaking, and so on).
C2c input	The implementation provides a mechanism for inputting each of the characters and sequences in the specified set as a single glyph with emoji presentation.

An implementation may claim partial conformance to C2, specifying the set of characters that it does not support. For example, an implementation could claim conformance to C2 for all emoji sets and capabilities except for the set [⏏ {🇺🇳}], that is:

U+23CF eject button
U+1F1FA U+1F1F3 United Nations

C3. An implementation claiming conformance to this specification must not support an invalid emoji_flag_sequence or invalid or ill-formed emoji_tag_sequence for display or input, except for a fallback display depiction indicating the presence of an invalid sequence.

A singleton emoji Regional Indicator may be displayed as a capital A..Z character with a special display.

An implementation may support any of the following for display, editing, or input:

a single code point outside of the basic emoji set
an emoji sequence that would be in one of the emoji sets ED-20 through ED-25 except that it is missing one or more emoji presentation selectors
an emoji ZWJ sequence that is not in ED-25

1.5.1 Collation Conformance

Implementations can claim conformance for emoji collation or short names by conforming to a particular version of CLDR.

1.5.2 Versioning

Starting with Version 11.0 of this specification, the repertoire of emoji characters is synchronized with the Unicode Standard, and has the same version numbering system.

As of version 13.0, data file comments use the labeling convention “Ex.x”. This label corresponds to the Emoji version when the emoji character or emoji sequence was first defined in associated data files. For example, the label “E5.0” is associated with Unicode Emoji, Version 5.0. There are three special values used primarily for emoji characters before the official release of Emoji 1.0 in 2015:

Label	Intended Coverage
E0.0	This label is used for special characters, including: Most emoji component characters, regardless of when they were first encoded. Other non-emoji characters in the data files.
E0.6	Emoji characters added to Unicode 6.0. This includes the emoji characters deriving from Japanese carrier sets, as well as some characters from the ARIB Japanese television standard.
E0.7	Emoji characters added to Unicode 7.0. This consists largely of emoji deriving from the Windows Wingding and Webdings sets, but also includes more characters from the ARIB Japanese television standard.

The following table shows the corresponding Emoji version and Unicode Standard version, up through Version 15.1, including the labels used in data file comments.

Emoji Versions

Emoji Version	Date	Unicode Version	Data File Comment
N/A	various	various	E0.0
N/A	2010-10-11	Unicode 6.0	E0.6
N/A	2014-06-16	Unicode 7.0	E0.7
Emoji 1.0	2015-06-09	Unicode 8.0	E1.0
Emoji 2.0	2015-11-12	Unicode 8.0	E2.0
Emoji 3.0	2016-06-03	Unicode 9.0	E3.0
Emoji 4.0	2016-11-22	Unicode 9.0	E4.0
Emoji 5.0	2017-06-20	Unicode 10.0	E5.0
Emoji 11.0	2018-05-21	Unicode 11.0	E11.0
Emoji 12.0	2019-03-05	Unicode 12.0	E12.0
Emoji 12.1	2019-10-21	Unicode 12.1	E12.1
Emoji 13.0	2020-03-10	Unicode 13.0	E13.0
Emoji 13.1	2020-09-15	Unicode 13.0	E13.1
Emoji 14.0	2021-09-14	Unicode 14.0	E14.0
Emoji 15.0	2022-09-13	Unicode 15.0	E15.0
Emoji 15.1	2023-09-12	Unicode 15.1	E15.1
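
The labeling convention can be illustrated with a short lookup. The following is a minimal sketch, assuming that the “Ex.x” label appears in the comment part of a data file line after a “#” character; the sample lines in the usage note and the function names are hypothetical, and the mapping is copied from the Emoji Versions table above.

```python
import re

# Data file label -> (Emoji version, Unicode version), from the table above.
EMOJI_VERSIONS = {
    "E0.0": (None, "various"),
    "E0.6": (None, "Unicode 6.0"),
    "E0.7": (None, "Unicode 7.0"),
    "E1.0": ("Emoji 1.0", "Unicode 8.0"),
    "E2.0": ("Emoji 2.0", "Unicode 8.0"),
    "E3.0": ("Emoji 3.0", "Unicode 9.0"),
    "E4.0": ("Emoji 4.0", "Unicode 9.0"),
    "E5.0": ("Emoji 5.0", "Unicode 10.0"),
    "E11.0": ("Emoji 11.0", "Unicode 11.0"),
    "E12.0": ("Emoji 12.0", "Unicode 12.0"),
    "E12.1": ("Emoji 12.1", "Unicode 12.1"),
    "E13.0": ("Emoji 13.0", "Unicode 13.0"),
    "E13.1": ("Emoji 13.1", "Unicode 13.0"),
    "E14.0": ("Emoji 14.0", "Unicode 14.0"),
    "E15.0": ("Emoji 15.0", "Unicode 15.0"),
    "E15.1": ("Emoji 15.1", "Unicode 15.1"),
}

LABEL = re.compile(r"\bE\d+\.\d+\b")


def label_in(line):
    """Return the first "Ex.x" label found after '#', or None."""
    _, _, comment = line.partition("#")
    match = LABEL.search(comment)
    return match.group(0) if match else None


def versions_for(line):
    """Return (Emoji version, Unicode version) for a line, or None."""
    return EMOJI_VERSIONS.get(label_in(line))
```

For example, versions_for("1F600 ; Emoji # E1.0 [1] (😀) grinning face") yields the pair for Emoji 1.0 and Unicode 8.0, while a line carrying “E13.1” maps to Unicode 13.0.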

2 Design Guidelines

Unicode characters can have many different presentations as text. An “a” for example, can look quite different depending on the font. Emoji characters can have two main kinds of presentation:

an emoji presentation, with colorful and perhaps whimsical shapes, even animated
a text presentation, such as black & white

More precisely, a text presentation is a simple foreground shape whose color is determined by other information, such as setting a color on the text, while an emoji presentation determines the color(s) of the character, and is typically multicolored. In other words, when someone changes the text color in a word processor, a character with an emoji presentation will not change color.

Any Unicode character can be presented with a text presentation, as in the Unicode charts. For the emoji presentation, both the name and the representative glyph in the Unicode chart should be taken into account when designing the appearance of the emoji, along with the images used by other vendors. The shape of the character can vary significantly. For example, here are just a few of the possible images for U+1F36D LOLLIPOP, U+1F36E CUSTARD, U+1F36F HONEY POT, and U+1F370 SHORTCAKE:

emoji examples

While the shape of the character can vary significantly, designers should maintain the same “core” shape, based on the shapes used most commonly in industry practice. For example, U+1F36F HONEY POT encodes a pictorial representation of a pot of honey, not some semantic like “sweet”. It would be unexpected to represent U+1F36F HONEY POT as a sugar cube, for example. Deviating too far from that core shape can cause interoperability problems: see accidentally-sending-friends-a-hairy-heart-emoji. Direction (whether a person or object faces to the right or left, up or down) should also be maintained where possible, because a change in direction can change the meaning: when sending “crocodile shot by police”, people expect any recipient to see the pistol pointing in the same direction as when they composed it. Similarly, the U+1F6B6 pedestrian should face to the left, not to the right. See Section 2.10, Emoji Glyph Facing Direction.

General-purpose emoji for people and body parts should also not be given overly specific images: the general recommendation is to be as neutral as possible regarding race, ethnicity, and gender. Thus for the character U+1F477 CONSTRUCTION WORKER, the recommendation is to use a neutral graphic (with an orange skin tone) instead of an overly specific image (with a light skin tone). This includes the emoji modifier base characters listed in Sample Emoji Modifier Bases. The emoji modifiers allow for variations in skin tone to be expressed.

Unicode 9.0 adds several characters intended to complete gender pairs, and there are ongoing efforts to provide more gender choices in the future. For more information, see Section 2.3, Gender.

Combining enclosing marks may be applied to emoji, just like they can be applied to other characters. When that is done, the combination should take on an emoji presentation. For example, a keycap “1” is represented as the sequence “1” plus an emoji presentation selector plus U+20E3 COMBINING ENCLOSING KEYCAP.

The U+20E3 COMBINING ENCLOSING KEYCAP is the only such symbol that is currently in RGI emoji sequences.
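
The keycap composition described above can be sketched in a few lines. This is an illustrative sketch only: the helper names are hypothetical, and the set of permitted bases (the digits, “#”, and “*”) follows the emoji keycap sequence definition elsewhere in this specification.

```python
VS16 = "\uFE0F"     # emoji presentation selector
KEYCAP = "\u20E3"   # COMBINING ENCLOSING KEYCAP


def keycap(base: str) -> str:
    """Return base + emoji presentation selector + U+20E3."""
    if len(base) != 1 or base not in "0123456789#*":
        raise ValueError(f"not a keycap base: {base!r}")
    return base + VS16 + KEYCAP


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

For instance, code_points(keycap("1")) lists the three code points of the sequence in order: the digit, the emoji presentation selector, and the enclosing keycap.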

Flag emoji characters are discussed in Annex B: Valid Emoji Flag Sequences.

2.1 Names

Every emoji has a CLDR short name, which may change over time. Every emoji character also has a formal Unicode name, like every other Unicode character; this is a permanent identifier which cannot be changed.

The formal Unicode name of a Unicode character does not determine its appearance. Formal names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of “black” and “white” in the names is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. Similarly, in other symbols such as the hands U+261A BLACK LEFT POINTING INDEX and U+261C WHITE LEFT POINTING INDEX, the words “white” and “black” also refer to outlined versus filled, and do not indicate skin color.

However, other color words in the name, such as YELLOW, typically provide a recommendation as to the emoji presentation, which should be followed to avoid interoperability problems.

In many cases the consensus for the best depiction has evolved in the time since the original formal name was standardized, and the preferred depiction is now better reflected by the CLDR short name. For example, U+1F483 DANCER should be designed in accordance with the CLDR short name woman dancing (an additional character was added for man dancing). In addition, only emoji characters have formal Unicode names; the emoji sequences just have CLDR short names.

The formal Unicode name of each character must be unique, and sometimes distinguishing words are included in the name to maintain that uniqueness when two contrasting characters are added, such as:

🐶 U+1F436 DOG FACE
🐕 U+1F415 DOG

🐮 U+1F42E COW FACE
🐄 U+1F404 COW

In cases such as these, the images must also contrast. However, in some cases additional terms like FACE were added to the name when they were not needed for uniqueness. There is no requirement that an image contrast be maintained where there are not contrasting emoji. Consider the following emoji:

🦌 U+1F98C DEER

🦓 U+1F993 ZEBRA FACE

Because there are no other contrasting DEER or ZEBRA emoji, each of these two could be depicted with a face only, face and shoulders, full body, or other choices.

2.2 Display

Emoji characters may not always be displayed on a white background. They are often best given a faint, narrow contrasting border to keep the character visually distinct from a similarly colored background. Thus a Japanese flag would have a border so that it would be visible on a white background, and a Swiss flag would have a border so that it would be visible on a red background.

Current practice is for emoji to have a square aspect ratio, deriving from their origin in Japanese. For interoperability, it is recommended that this practice be continued with current and future emoji. They will typically have about the same vertical placement and advance width as CJK ideographs. For example:

emoji_advance_width

They should use transparency for proper display for selection and with colored backgrounds:

emoji_transparency

The set of supported emoji sequences may vary by platform. For example, take the following emoji ZWJ sequence:

black_flag apple_1f3f4 skull_and_crossbones

On a particular platform, it can be shown as a single image:

PirateFlagEmoji

However, if that combination is not supported as a single unit, it may show up as a sequence like the following, and the user sees no indication that it was meant to be composed into a single image:

black_flag skull_and_crossbones

Implementations could provide an indication of the composed nature of an unsupported emoji sequence where possible. This gives users the additional information that that sequence was intended to have a composed form. It also explains why the sequence will not behave as separate elements: The arrow key will not move between the flag and the skull & crossbones, and line breaks will not occur between apparently separate emoji.

The following is an example of an approach that implementations can use. There are other approaches that could have a more intuitive appearance, but that could be difficult to implement with current text display mechanisms.

Display the ZWJ as a visible “glue” character, with zero or very narrow width.

2.3 Gender

The following human-form emoji are currently considered to have explicit gender appearance based on the name and/or practice. They intentionally contrast with other characters. This list may change in the future if new explicit-gender characters are added, or if some of these are changed to be gender-neutral. The names below are the CLDR short names, followed by the formal Unicode name in capital letters if it differs.

Emoji With Explicit Gender Appearance

Female		Male
U+1F467	girl	U+1F466	boy
U+1F469	woman	U+1F468	man
U+1F475	old woman OLDER WOMAN	U+1F474	old man OLDER MAN
U+1F46D	women holding hands TWO WOMEN HOLDING HANDS	U+1F46C	men holding hands TWO MEN HOLDING HANDS
U+1F936	Mrs. Claus MOTHER CHRISTMAS	U+1F385	Santa Claus FATHER CHRISTMAS
U+1F478	princess	U+1F934	prince
U+1F483	woman dancing DANCER	U+1F57A	man dancing
U+1F930	pregnant woman	U+1FAC3	pregnant man
U+1F931	breast-feeding
U+1F9D5	woman with headscarf PERSON WITH HEADSCARF
Explicit Gender Combination
U+1F46B	woman and man holding hands MAN AND WOMAN HOLDING HANDS

The emoji in the table Emoji Changed to Gender-Neutral in Emoji 13.0+ below have been removed from the table Emoji With Explicit Gender Appearance, and the CLDR names for most have been changed to use person (along with some other changes). The person with veil and person in tuxedo emoji also have RGI man and woman gender variants. The others do not; for person in suit levitating and person with skullcap, the visual distinctions would be unclear at emoji sizes.

Emoji Changed to Gender-Neutral in Emoji 13.0+

Gender-Neutral
E13.0	U+1F470	person with veil BRIDE WITH VEIL
	U+1F935	person in tuxedo MAN IN TUXEDO
	U+1F574	person in suit levitating MAN IN BUSINESS SUIT LEVITATING
	U+1F472	person with skullcap MAN WITH GUA PI MAO
E13.1	U+1F9D4	person: beard BEARDED PERSON

2.3.1 Gender-Neutral Emoji

It is often the case that gender is unknown or irrelevant, as in the usage “Is there a doctor on the plane?,” or a gendered appearance may not be desired. Such cases are known as “gender-neutral,” “gender-inclusive,” “unspecified-gender,” or many other terms. Except for the emoji shown in the table Emoji With Explicit Gender Appearance, human-form emoji should normally be depicted in a gender-neutral way unless gender appearance is explicitly specified using an emoji ZWJ sequence in one of the ways shown in the following table.

Gender Appearance Mechanisms

Type	Description	Examples
Sign Format	A human-form emoji can be given explicit gender using a ZWJ sequence. The sequence contains the base emoji followed by ZWJ and either FEMALE SIGN or MALE SIGN. The human-form emoji alone should be gender-neutral in form.	man runner = RUNNER + ZWJ + MALE SIGN woman runner = RUNNER + ZWJ + FEMALE SIGN runner = RUNNER
Object Format	A profession or role emoji can be formed using a ZWJ sequence. The sequence starts with MAN or WOMAN followed by ZWJ and ending with an object. The ADULT character can be used for a gender-neutral version.	man astronaut = MAN + ZWJ + ROCKET SHIP woman astronaut = WOMAN + ZWJ + ROCKET SHIP astronaut = ADULT + ZWJ + ROCKET SHIP

Although the human-form emoji used in sign format type ZWJ sequences are supposed to have gender-neutral appearance by themselves (when not used in a sign format type ZWJ sequence), many vendors previously depicted these human-form emoji as a man or woman. As a result, they had the same appearance as one of the sign format type ZWJ sequences. For example, most vendors depicted detective as man detective and person getting haircut as woman getting haircut, but some vendors depicted police officer as man police officer while others depicted it as woman police officer.

Gender-neutral versions of the profession or role emoji using object format type ZWJ sequences are promulgated by adding them to the RGI emoji ZWJ sequence set.
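
The two mechanisms in the table Gender Appearance Mechanisms can be sketched as simple string builders. This is a minimal illustration, not a complete input method: the function names are hypothetical, and the sign format helper assumes the caller passes the base emoji together with any emoji presentation selector it requires.

```python
ZWJ = "\u200D"
VS16 = "\uFE0F"            # emoji presentation selector
FEMALE_SIGN = "\u2640"
MALE_SIGN = "\u2642"
MAN = "\U0001F468"
WOMAN = "\U0001F469"
ADULT = "\U0001F9D1"


def sign_format(base: str, sign: str) -> str:
    """Base emoji + ZWJ + gender sign (with emoji presentation selector)."""
    return base + ZWJ + sign + VS16


def object_format(person: str, obj: str) -> str:
    """MAN, WOMAN, or ADULT + ZWJ + object."""
    return person + ZWJ + obj


# detective needs its own presentation selector; runner does not.
man_detective = sign_format("\U0001F575" + VS16, MALE_SIGN)
woman_runner = sign_format("\U0001F3C3", FEMALE_SIGN)
astronaut = object_format(ADULT, "\U0001F680")
```

Here man_detective is the sequence U+1F575 U+FE0F U+200D U+2642 U+FE0F, and astronaut is the gender-neutral object format sequence built from ADULT.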

2.3.2 Marking Gender in Emoji Input

Emoji input systems such as keyboards or palettes typically provide for input of some emoji whose appearance is explicitly gendered (for example, emoji that appear specifically as a woman or man). When such emoji are not included in the table Emoji With Explicit Gender Appearance, the input system should generate a sequence for them that explicitly indicates the gendered appearance, rather than relying on a particular system’s default appearance. This principle is shown with the following example:

Assume on some system that the default appearance of detective is as man detective. On that system, when entering man detective, an input system should still use the explicit sequence

U+1F575 U+FE0F U+200D U+2642 U+FE0F (man detective)

rather than just

U+1F575 U+FE0F (detective)

2.4 Diversity

People all over the world want to have emoji that reflect more human diversity, especially for skin tone. The Unicode emoji characters for people and body parts are intended to be generic and shown with a generic (nonhuman) appearance, such as a yellow/orange color similar to that used for smiley faces.

Five symbol modifier characters that provide for a range of skin tones for human emoji were released in Unicode Version 8.0 (mid-2015). These characters are based on the six tones of the Fitzpatrick scale, a recognized standard for dermatology (there are many examples of this scale online, such as FitzpatrickSkinType.pdf). The exact shades may vary between implementations.

Emoji Modifiers

Code	CLDR Short Name	Unicode Character Name
U+1F3FB	light skin tone	EMOJI MODIFIER FITZPATRICK TYPE-1-2
U+1F3FC	medium-light skin tone	EMOJI MODIFIER FITZPATRICK TYPE-3
U+1F3FD	medium skin tone	EMOJI MODIFIER FITZPATRICK TYPE-4
U+1F3FE	medium-dark skin tone	EMOJI MODIFIER FITZPATRICK TYPE-5
U+1F3FF	dark skin tone	EMOJI MODIFIER FITZPATRICK TYPE-6

These characters have been designed so that even where diverse color images for human emoji are not available, readers can see the intended meaning.

When used alone, the default representation of these modifier characters is a color swatch. Whenever one of these characters immediately follows certain characters (such as WOMAN), then a font should show the sequence as a single glyph corresponding to the image for the person(s) or body part with the specified skin tone, such as the following:

However, even if the font doesn’t show the combined character, the user can still see that a skin tone was intended:

This may fall back to a black and white stippled or hatched image such as when colorful emoji are not supported.

When a human emoji is not immediately followed by an emoji modifier character, it should use a generic, non-realistic skin tone, such as RGB #FFCC22 (one of the colors typically used for the smiley faces).

No particular hair color is required; however, dark hair is generally regarded as more neutral because black or dark brown hair is widespread among people of every skin tone. This does not apply to emoji that already have an explicit hair color such as PERSON WITH BLOND HAIR (originally added for compatibility with Japanese mobile phone emoji), which needs to have blond hair regardless of skin tone.

To have an effect on an emoji, an emoji modifier must immediately follow that base emoji character. Emoji presentation selectors are neither needed nor recommended for emoji characters when they are followed by emoji modifiers, and should not be used in newly generated emoji modifier sequences; the emoji modifier automatically implies the emoji presentation style. See ED-13. emoji modifier sequence. However, some older data may include defective emoji modifier sequences in which an emoji presentation selector does occur between the base emoji character and the emoji modifier; this is the only exception to the rule that an emoji modifier must immediately follow the character that it modifies. In this case the emoji presentation selector should be ignored. For handling text presentation selectors in sequences, see Section 4, Presentation Style.

<U+270C VICTORY HAND, FE0F, TYPE-3>

Any other intervening character causes the emoji modifier to appear as a free-standing character.
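
One way to handle defective emoji modifier sequences is to drop the intervening emoji presentation selector before further processing. The following is a minimal sketch under stated assumptions: SAMPLE_BASES is a tiny hypothetical stand-in for the Emoji_Modifier_Base property, and the function name is illustrative only.

```python
import re

MODIFIER = "[\U0001F3FB-\U0001F3FF]"   # the five emoji modifiers
# ASSUMPTION: a small sample; real code should use Emoji_Modifier_Base.
SAMPLE_BASES = "\u270C\U0001F469\U0001F44D"

# base, then U+FE0F, then modifier: the defective form found in older data.
DEFECTIVE = re.compile(f"([{SAMPLE_BASES}])\uFE0F({MODIFIER})")


def repair_modifier_sequences(text: str) -> str:
    """Ignore an emoji presentation selector between a base and a modifier."""
    return DEFECTIVE.sub(r"\1\2", text)
```

Applied to <U+270C, U+FE0F, U+1F3FC>, the function returns <U+270C, U+1F3FC>. A presentation selector that is not followed by a modifier is left in place, and any other intervening character is not touched, so the modifier there remains free-standing.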

2.4.1 Implementations

Implementations can present the emoji modifiers as separate characters in an input palette, or present the combined characters using mechanisms such as long press.

The emoji modifiers are not intended for combination with arbitrary emoji characters. Instead, they are restricted to the emoji modifier base characters: no other characters are to be combined with emoji modifiers. This set may change over time, with successive versions of this document. To find the exact list of emoji modifier bases for each version, use the Emoji_Modifier_Base character property, as described in Annex A: Emoji Properties and Data Files.

Sample Emoji Modifier Bases

👦 👧 👨 👩 👴 👵 👶 👱 👮 👲 👳 👷 👸 💂 🕵 🎅 👼 💆 💇 👰 🙍 🙎 🙅 🙆 💁 🙋 🙇 🙌 🙏 🚶 🏃 💃 💪 👈 👉 ☝ 👆 🖕 👇 ✌ 🖖 🤘 🖐 ✊ ✋ 👊 👌 👍 👎 👋 👏 👐 ✍ 💅 👂 👃 🚣 🛀 🏄 🏊 ⛹ 🏋 🚴 🚵

The following chart shows the expected display with emoji modifiers, depending on the preceding character and the level of support for the emoji modifier. The “Unsupported” rows show how the character would typically appear on a system that does not have a font with that character in it: with a missing glyph indicator. In some circumstances, display of an emoji modifier following an Emoji_Modifier_Base character should be suppressed:

If an emoji modifier base has no skin visible on a particular system, then any following emoji modifier should be suppressed.

In other circumstances, display of an emoji modifier following an Emoji_Modifier_Base character may be suppressed:

If a particular emoji modifier base uses a non-realistic skin tone that differs from the default skin tone used for other Emoji_Modifier_Base characters, then any following emoji modifier may be suppressed. For example, suppose vampire is shown with gray skin in a particular implementation while other Emoji_Modifier_Base characters are shown with neon yellow skin in the absence of emoji modifiers; any emoji modifier following vampire may be suppressed.

Expected Emoji Modifiers Display

Support Level	Emoji Modifier Base	Sequence	Display
Fully supported	Yes
	Yes
	Yes, but no skin visible
	Yes, but unusual default skin tone
	No
Fallback	Yes
Fallback	No
Unsupported	Yes
Unsupported	No

As noted above at the end of Section 2.4, Diversity, emoji presentation selectors are neither needed nor recommended for use in emoji modifier sequences. See ED-13. emoji modifier sequence. However, older data may include defective emoji modifier sequences which do include emoji presentation selectors.

2.4.2 Emoji Modifiers in Text

For input, the composition of an emoji sequence does not need to be apparent to the user: it appears on the screen as a single image. On a phone, for example, a long press on a human figure can bring up a minipalette of different skin tones, without the user having to separately find the human figure and then the modifier. The following shows some possible appearances:

Minipalettes

	or

Of course, there are many other types of diversity in human appearance besides different skin tones: Different hair styles and color, use of eyeglasses, various kinds of facial hair, different body shapes, different headwear, and so on. It is beyond the scope of Unicode to provide an encoding-based mechanism for representing every aspect of human appearance diversity that emoji users might want to indicate. The best approach for communicating very specific human images (or any type of image in which preservation of specific appearance is very important) is the use of embedded graphics, as described in Longer Term Solutions.

2.5 Emoji ZWJ Sequences

The U+200D ZERO WIDTH JOINER (ZWJ) can be used between the elements of a sequence of characters to indicate that a single glyph should be presented if available. An implementation may use this mechanism to handle such an emoji ZWJ sequence as a single glyph, with a palette or keyboard that generates the appropriate sequences for the glyphs shown. To the user of such a system, these behave like single emoji characters, even though internally they are sequences.

When an emoji ZWJ sequence is sent to a system that does not have a corresponding single glyph, the ZWJ characters are ignored and a fallback sequence of separate emoji is displayed. Thus an emoji ZWJ sequence should only be defined and supported by implementations where the fallback sequence would also make sense to a recipient.

For example, the following are possible displays:

ZWJ Sequence Display

Sequence	Display	Combined glyph?
		Yes
		No

See also the Emoji ZWJ Sequences [emoji-charts].

The use of ZWJ sequences may be difficult in some implementations, so caution should be taken before adding new sequences.

For recommendations on the use of variation selectors in ZWJ sequences, see Section 2.7, Emoji Implementation Notes below.
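
The fallback behavior described in this section can be sketched as follows. This is a simplified illustration: the set of supported sequences is hypothetical, and a real renderer would consult its font rather than a hard-coded set.

```python
ZWJ = "\u200D"

# Hypothetical: sequences for which this system has a single glyph.
PIRATE_FLAG = "\U0001F3F4\u200D\u2620\uFE0F"
SUPPORTED = frozenset({PIRATE_FLAG})


def display_units(sequence: str, supported=SUPPORTED) -> list:
    """Return the units to draw: one glyph if supported, else the parts."""
    if sequence in supported:
        return [sequence]
    # No single glyph: ignore the ZWJ characters and show separate emoji.
    return sequence.split(ZWJ)
```

With the default set, display_units(PIRATE_FLAG) returns the sequence as one unit. With an empty set it returns the black flag and the skull and crossbones as two separate units, which is the fallback a recipient would see.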

2.6 Multi-Person Groupings

There are several emoji that depict more than one person interacting. If these are to be implemented with a choice of genders or skin tones, special handling may be required on a case-by-case basis. These emoji are listed below:

Multi-Person Groupings

Hex	Char	CLDR Name
U+1F91D		handshake
U+1F46F		people with bunny ears
U+1F93C		people wrestling
U+1F46B		woman and man holding hands
U+1F46C		men holding hands
U+1F46D		women holding hands
U+1F48F		kiss
U+1F491		couple with heart
U+1F46A		family

There are some other emoji that would share the same gender and skin tone, such as folded hands. As far as gender and skin tone are concerned, these behave just like a single person and so need no special treatment. Other examples include:

For U+1F486 person getting massage, the hands of the person providing the massage should be depicted with no skin tone showing, perhaps in gloves.
For the following emoji and their skin-tone variants, the infant should be depicted with no skin tone showing, perhaps covered in a blanket, so that the emoji is treated as a single person for purposes of skin tone modification:
- U+1F931 breast-feeding
- U+1F469 U+200D U+1F37C woman feeding baby
- U+1F468 U+200D U+1F37C man feeding baby
- U+1F9D1 U+200D U+1F37C person feeding baby

2.6.1 Multi-Person Gender

The emoji for multi-person groupings have unspecified gender (unless modified) with the exception of the three characters for people holding hands. The handshake itself does not provide for gender differences.

Gender is applied to KISS, COUPLE WITH HEART, and FAMILY by using ZWJ sequences with MAN, WOMAN, ADULT, BOY, GIRL, and CHILD. The data files list the RGI versions of these, such as the following:

U+1F469 U+200D U+2764 U+FE0F U+200D U+1F48B U+200D U+1F468

kiss: woman, man

Gender is applied to people with bunny ears and people wrestling by using ZWJ sequences, as follows.

Gender with Multi-Person Groupings

Description	Internal Representation
people with bunny ears
men with bunny ears
women with bunny ears
people wrestling
men wrestling
women wrestling

2.6.2 Multi-Person Skin Tones

As with gender, skin tones can be applied to multi-person groupings in a similar manner. Emoji represented internally by sequences may have skin tone modifiers (Emoji_Modifier characters) added after each of the characters that take them (those with Emoji_Modifier_Base). This is illustrated by the table Skin Tones for Multi-Person Groupings Using Sequences below.

Multi-person sequences that mix people characters without skin tones and people characters with skin tones should not be generated. That is, for an input system, if one person character in a multi-person emoji sequence has a skin tone modifier, then all people characters in that sequence should have skin tone modifiers.
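
An input system could check this all-or-none rule before emitting a sequence. The following is a minimal sketch, assuming a small hypothetical sample of person characters in place of the full Emoji_Modifier_Base property; the function name is illustrative.

```python
MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}  # U+1F3FB..U+1F3FF
# ASSUMPTION: sample person characters only (MAN, WOMAN, ADULT).
PEOPLE = {"\U0001F468", "\U0001F469", "\U0001F9D1"}


def tones_consistent(sequence: str) -> bool:
    """True if every person character has a modifier, or none does."""
    has_tone = [
        i + 1 < len(sequence) and sequence[i + 1] in MODIFIERS
        for i, ch in enumerate(sequence)
        if ch in PEOPLE
    ]
    return all(has_tone) or not any(has_tone)
```

A sequence in which one MAN carries a modifier and the other does not is rejected, while sequences with two modifiers or with none are accepted.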

In Emoji 12.0, the Emoji_Modifier_Base property, emoji modifier sequences and RGI ZWJ sequences were updated to add 25 skin tone combinations for woman and man holding hands, and 15 combinations each for women holding hands, men holding hands, and people holding hands. These sequences appear as 70 different images.

In Emoji 12.1, the RGI ZWJ sequences for women holding hands, men holding hands, and people holding hands were further updated to add 10 more sequences each, so their sequences correspond to those for woman and man holding hands. The new sequences are for people of different skin tones, but with the darker skin tone later in the sequence instead of earlier. For example:

Emoji 12.0 sequence: 1F468 1F3FD 200D 1F91D 200D 1F468 1F3FB ; men holding hands: medium skin tone, light skin tone
Emoji 12.1 addition: 1F468 1F3FB 200D 1F91D 200D 1F468 1F3FD ; men holding hands: light skin tone, medium skin tone

The only difference between the above sequences is that the inferred positions of the medium-skin-tone man and the light-skin-tone man are swapped, left and right.

Implementations can use the same image for both sequences. For the multi-person emoji, implementations are not required to have different images for people of the same gender depending solely on position. The choice of whether to do so may depend on design considerations specific to particular vendor images.
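
The counts quoted above for the holding-hands emoji can be reproduced with a short calculation. This sketch only illustrates the arithmetic: ordered pairs of the five skin tones give 25 combinations, unordered pairs give 15, and the difference is the 10 sequences added per emoji in Emoji 12.1.

```python
from itertools import combinations_with_replacement, product

TONES = ["light", "medium-light", "medium", "medium-dark", "dark"]

# Position matters (woman and man holding hands): 5 x 5 ordered pairs.
ordered = list(product(TONES, repeat=2))
# Position ignored (same-gender pairs in Emoji 12.0): unordered pairs.
unordered = list(combinations_with_replacement(TONES, 2))

images_in_12_0 = len(ordered) + 3 * len(unordered)   # 25 + 3 * 15
added_each_in_12_1 = len(ordered) - len(unordered)   # swapped orders

print(len(ordered), len(unordered), images_in_12_0, added_each_in_12_1)
```

The printed values are 25, 15, 70, and 10, matching the figures given in the text.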

Other multi-person groups with different skin tone combinations can be represented as valid sequences, but are not yet RGI; adding mixed skin tones to families would add 4,225 emoji sequences, for example.

Skin Tones for Multi-Person Groupings Using Sequences

Description	Internal Representation
women holding hands: medium, dark skin tones
people holding hands: medium, dark skin tones
family: woman, woman, girl, girl: medium, dark, light, medium skin tones

Skin tone modifiers can be applied to each of the nine characters listed in the table Multi-Person Groupings; examples for some of these characters are illustrated in the following table. This gives all of the people in the group the same skin tone, which is similar to how the gender marker works.

However, in Emoji 15.1, such emoji modifier sequences only have RGI status for six of the nine characters: kiss, couple with heart, woman and man holding hands, men holding hands, women holding hands, and handshake.

Skin Tones for Multi-Person Groupings Using Single Characters

Description	Internal Representation
handshake: medium skin tone
people with bunny ears: medium skin tone
women with bunny ears: medium skin tone
woman and man holding hands: medium skin tone
family: medium skin tone

2.7 Emoji Implementation Notes

This section describes important implementation features of emoji, including the use of emoji and text presentation selectors, how to do segmentation, and handling of tag characters.

2.7.1 Emoji and Text Presentation Selectors

This section describes where the emoji presentation selectors can be used. The text presentation selector only occurs in text presentation sequences, which are not displayed as emoji.

Characters	Variation / Behavior
emoji character	may have an emoji or text presentation selector added if the result is a valid emoji presentation sequence or text presentation sequence
emoji character	should have an emoji presentation selector added if Emoji_Presentation=No whenever an emoji presentation is desired
emoji flag sequence	does not contain an emoji or text presentation selector
emoji flag sequence	should be displayed with an emoji presentation by default
emoji modifier sequence	does not contain an emoji or text presentation selector
emoji modifier sequence	should be displayed with an emoji presentation by default, whether or not the modifier base has Emoji_Presentation=Yes. Implementations may choose to support old data that contains defective emoji_modifier_sequences, that is, having emoji presentation selectors.
emoji ZWJ sequence	may have an emoji presentation selector. The recommended behavior is: User Input: only fully-qualified emoji ZWJ sequences should be generated by keyboards and other user input devices. Processing and Display: fully-qualified emoji ZWJ sequences should be handled appropriately in processing, such as display, editing, segmentation, and so on. Minimally-qualified or unqualified emoji ZWJ sequences may be handled in the same way as their fully-qualified forms; the choice is up to the implementation. A text presentation selector applied to any element of an emoji ZWJ sequence breaks that sequence, preventing it from displaying as a single image. The partial sequences should be displayed as separate images, each with presentation style as specified by any presentation selectors present, or by default style for those emoji that do not have any variation selectors.

2.7.2 Handling Tag Characters

The properties for tag characters U+E0020..U+E007F (TAG SPACE..CANCEL TAG) have been modified for use in indicating variants or extensions of emoji characters. For detailed information on handling TAG sequences correctly, see Annex C: Valid Emoji Tag Sequences.
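
As an informal illustration of how tag characters are laid out, the sketch below builds a well-formed tag sequence from a base emoji and a printable ASCII string: each ASCII character is shifted into the tag block by adding 0xE0000, and the sequence is closed with CANCEL TAG. The function names are hypothetical, and the sketch checks only the shape of the sequence, not the validity conditions given in Annex C.

```python
CANCEL_TAG = "\U000E007F"


def tag_sequence(base: str, spec: str) -> str:
    """Return base + one tag character per ASCII character + CANCEL TAG."""
    if not spec:
        raise ValueError("at least one tag character is required")
    for ch in spec:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    tags = "".join(chr(0xE0000 + ord(ch)) for ch in spec)
    return base + tags + CANCEL_TAG


def hex_points(text: str) -> str:
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)
```

For example, hex_points(tag_sequence("\U0001F3F4", "gbeng")) lists the black flag, five tag letters, and the terminating CANCEL TAG.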

2.8 Hair Components

Emoji Version 11.0 introduced hair components, which can be used in ZWJ sequences to indicate hair colors or styles. The sequences recommended for general interchange (RGI) are listed in the data files. The components include:

Red-haired (ginger)
Curly-haired
White-haired
Bald

There are hundreds of possible distinctions among hair colors and styles, but to limit the number of combinations—and because emoji are presented with a “cartoon” style—there is a small number of hair components. Note that the hair color blond has already been provided for by an explicit blond man/woman/person emoji. Brown/black-haired are already typical defaults for hair color in human-form emoji.

2.9Color

Nine large colored square emoji may be used in ZWJ sequences to indicate that a base emoji should be displayed with that color if possible. The color of the resulting image may not be exactly the same as the color square. The color squares used for this purpose are:

U+2B1B BLACK LARGE SQUARE
U+2B1C WHITE LARGE SQUARE
U+1F7E5 LARGE RED SQUARE … U+1F7EB LARGE BROWN SQUARE

Where the implementation does not provide a single emoji image in that color, the user should see the fallback appearance showing an indication of the desired color. Where color ZWJ sequences are supported and the base emoji already has that color, the color square should be ignored.

Emoji Glyph Color Examples

Description	Internal Representation
black cat	U+1F408	U+200D	U+2B1B
orange cat	U+1F408	U+200D	U+1F7E7

The squares require a ZWJ; they do not behave like the five skin-tone modifiers listed in Emoji Modifiers.

The white square emoji is often presented as a light gray, to set it off from white backgrounds.

In Emoji Version 15.1 there are four RGI emoji ZWJ sequences of this form.

2.10Emoji Glyph Facing Direction

Emoji with glyphs that face to the right or left may face either direction, according to vendor practice. However, that inconsistency can cause a change in meaning when exchanging text across platforms. The following ZWJ mechanism can be used to explicitly indicate direction. If the base emoji image is not available facing in that direction, the user should see the fallback appearance showing an indication of the desired direction. If direction ZWJ sequences are supported and the base emoji already faces that direction, the direction emoji should be ignored.

Emoji Glyph Direction Examples

Internal Representation

U+1F3C3	U+200D	U+2B05 U+FE0F

U+1F3C3	U+200D	U+27A1 U+FE0F

A direction RGI sequence can also exist for emoji where there is no inconsistency across vendors: in this case there will be an RGI sequence for only one direction; vendors may choose to handle the non-RGI sequence for the opposite direction (corresponding to the unmodified emoji) to suppress the arrow of the fallback appearance.

In Emoji Version 15.1 there are 108 RGI emoji ZWJ sequences of this form.

2.11Order of Emoji ZWJ Sequences

When representing emoji ZWJ sequences for an individual person, the following order should be used:

Order	Category	Section
1	Base	Section 1.4.1, Emoji Characters
2	Emoji modifier or emoji presentation selector	Section 2.4, Diversity
3	Hair component	Section 2.8, Hair Components
4	Color	Section 2.9, Color
5	Gender sign or object	Section 2.3.1, Gender-Neutral Emoji
6	Direction indicator	Section 2.10, Emoji Glyph Facing Direction
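
The ordering in the table can be illustrated with a short Python sketch. The function name `build_person_sequence` and its parameter names are hypothetical, chosen here only for illustration. The sketch assumes that the modifier (or presentation selector) attaches directly to the base and that every later component is joined with U+200D ZERO WIDTH JOINER; it does not check whether the result is an RGI sequence.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

def build_person_sequence(base, modifier="", hair="", color="",
                          gender_or_object="", direction=""):
    """Assemble an emoji ZWJ sequence in the order given by the table (hypothetical helper)."""
    # Rows 1-2: the modifier or presentation selector follows the base directly, with no ZWJ.
    head = base + modifier
    # Rows 3-6: each remaining component that is present is joined with a ZWJ, in table order.
    tail = [part for part in (hair, color, gender_or_object, direction) if part]
    return ZWJ.join([head, *tail])

# Person running + medium skin tone + female sign + rightwards arrow.
seq = build_person_sequence(
    base="\U0001F3C3",
    modifier="\U0001F3FD",
    gender_or_object="\u2640\uFE0F",
    direction="\u27A1\uFE0F",
)
print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Keeping the components as named parameters means a caller cannot accidentally place, say, the direction indicator before the gender sign.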

3Which Characters are Emoji

There are different ways to count the emoji in Unicode, especially because an emoji sequence may display as a single emoji image. The following provides an overview of the ways to count emoji; it can be (for example):

The count of code points that can be used in emoji, though this includes some code points that are only used as part of sequences and don’t have emoji appearance by themselves;
All sequences of one or more characters that can appear as a single glyph (which is probably closer to what users think of as the number of emoji), though typically only a subset of possible sequences are displayed as a single glyph on any platform, and some sequences may be platform-specific extensions.

It is recommended that any font or keyboard whose goal is to support Unicode emoji should support the characters and sequences listed in the [emoji-data] data files. The best definition of the full set is in the emoji-test.txt file.
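
As a rough illustration of counting from emoji-test.txt, the following Python sketch tallies entries by their qualification status. The sample lines follow the file's `code points ; status # comment` layout, but the comment fields are abbreviated here, and the function name `count_by_status` is hypothetical. A real tool would read the full file from [emoji-data] instead of the inline sample.

```python
from collections import Counter

# A few lines in the style of emoji-test.txt (comment fields abbreviated).
SAMPLE = """\
# group: Smileys & Emotion
1F600                 ; fully-qualified     # E1.0 grinning face
263A FE0F             ; fully-qualified     # E0.6 smiling face
263A                  ; unqualified         # E0.6 smiling face
1F3C3 200D 2640 FE0F  ; fully-qualified     # E4.0 woman running
1F3C3 200D 2640       ; minimally-qualified # E4.0 woman running
1F3FB                 ; component           # E1.0 light skin tone
"""

def count_by_status(text):
    """Count data lines per status field, ignoring comments and blank lines."""
    counts = Counter()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        _code_points, status = (field.strip() for field in data.split(";"))
        counts[status] += 1
    return counts

counts = count_by_status(SAMPLE)
print(dict(counts))
```

Counting only the fully-qualified entries gives one reasonable approximation of "the number of emoji", since the other statuses are alternate spellings or components of those same emoji.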

Emoji Counts [emoji-charts] provides more detail about the various counts as of the current version of this specification. The various column and row headers are described in Emoji Counts Key.

The “Subtotal” row in the chart indicates the count of what users typically think of as emoji. For example, the 26 Regional Indicator (RI) code points are not included there; even though they have Emoji status, they are typically only used in pairs to represent flags.
Typical keyboards may normally present even fewer emoji, since they may use mechanisms like a long press to display modifier sequences for specific emoji, and would thus not simultaneously display all of the images associated with the chart rows that count emoji with explicit skin tones.

Separate [emoji-charts] provide more information on many of these subsets and others.

4Presentation Style

Certain emoji have defined variation sequences, in which an emoji character can be followed by an invisible emoji presentation selector or text presentation selector.

This capability was added in Unicode 6.1. Some systems may also provide this distinction with higher-level markup, rather than variation sequences. For more information on these selectors, see Emoji Presentation Sequences [emoji-charts]. For details regarding the use of emoji or text presentation selectors in emoji sequences specifically, see Section 2.7, Emoji Implementation Notes.

Implementations should support both styles of presentation for the characters with emoji and text presentation sequences, if possible. Most of these characters are emoji that were unified with preexisting characters. Because people are now using emoji presentation for a broader set of characters, Unicode 9.0 added emoji and text presentation sequences for all emoji with default text presentation (see discussion below). These are the characters shown in the column labeled “Default Text Style; no VS in U8.0” in the Text vs Emoji chart [emoji-charts].

However, even for cases in which the emoji and text presentation selectors are available, it had not been clear for implementers whether the default presentation for pictographs should be emoji or text. That means that a piece of text may show up in a different style than intended when shared across platforms. While this is all perfectly legitimate for Unicode characters (presentation style is never guaranteed), a shared sense among developers of when to use emoji presentation by default is important, so that there are fewer unexpected or jarring presentations. Implementations need to know what the generally expected default presentation is, to promote interoperability across platforms and applications.

There had been no clear line for implementers between three categories of Unicode characters:

emoji-default: those expected to have an emoji presentation by default, but can also have a text presentation
text-default: those expected to have a text presentation by default, but could also have an emoji presentation
text-only: those that should only have a text presentation

These categories can be distinguished using properties listed in Annex A: Emoji Properties and Data Files. The first category consists of characters with Emoji=Yes and Emoji_Presentation=Yes. The second category consists of characters with Emoji=Yes and Emoji_Presentation=No. The third category consists of characters with Emoji=No.
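
The mapping from the two properties to the three categories can be written down directly. The following Python sketch uses a hypothetical function name, `presentation_category`, and a tiny hand-written property table for three characters; a real implementation would take the property values from emoji-data.txt.

```python
def presentation_category(emoji, emoji_presentation):
    """Map the Emoji and Emoji_Presentation property values to a category name."""
    if not emoji:
        return "text-only"
    return "emoji-default" if emoji_presentation else "text-default"

# (Emoji, Emoji_Presentation) values for three sample characters.
SAMPLE_PROPERTIES = {
    "\U0001F600": (True, True),    # U+1F600 GRINNING FACE
    "\u2764": (True, False),       # U+2764 HEAVY BLACK HEART
    "A": (False, False),           # U+0041 LATIN CAPITAL LETTER A
}

for char, (emoji, epres) in SAMPLE_PROPERTIES.items():
    print(f"U+{ord(char):04X}", presentation_category(emoji, epres))
```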

The presentation of a given emoji character depends on the environment, whether or not there is an emoji or text presentation selector, and the default presentation style (emoji versus text). In informal environments like texting and chats, it is more appropriate for most emoji characters to appear with a colorful emoji presentation, and only get a text presentation with a text presentation selector. Conversely, in formal environments such as word processing, it is generally better for emoji characters to appear with a text presentation, and only get the colorful emoji presentation with the emoji presentation selector.

Based on those factors, here is typical presentation behavior. However, these guidelines may change with changing user expectations.

Emoji versus Text Display

Example Environment	with Emoji presentation selector	with Text presentation selector	with neither (text-default)	with neither (emoji-default)
word processing
plain web pages
texting, chats
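
The behavior described in this section can be sketched as a small decision function in Python. The environment labels "formal", "neutral", and "informal" and the function name `choose_presentation` are hypothetical. The sketch also assumes that a neutral environment such as a plain web page simply follows each character's default style, which is an interpretation rather than something stated in the text above.

```python
EMOJI_VS = "\uFE0F"  # emoji presentation selector (VARIATION SELECTOR-16)
TEXT_VS = "\uFE0E"   # text presentation selector (VARIATION SELECTOR-15)

def choose_presentation(environment, selector, default_style):
    """Return "emoji" or "text" for one character.

    environment:   "formal", "neutral", or "informal" (hypothetical labels)
    selector:      EMOJI_VS, TEXT_VS, or None when no selector follows
    default_style: "emoji" for emoji-default, "text" for text-default
    """
    # An explicit presentation selector takes priority in every environment.
    if selector == EMOJI_VS:
        return "emoji"
    if selector == TEXT_VS:
        return "text"
    # Without a selector, the environment decides.
    if environment == "formal":
        return "text"
    if environment == "informal":
        return "emoji"
    # Assumption: neutral environments follow the character's default style.
    return default_style

print(choose_presentation("formal", None, "emoji"))
print(choose_presentation("informal", TEXT_VS, "emoji"))
```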

Computer languages use the Pattern_Syntax property to identify code points that have been reserved for syntactic use. Some of the code points with the Pattern_Syntax property have default emoji presentation. When emoji are used as part of computer language syntax, text presentation sequences can be used to unambiguously express that they should be displayed and interpreted as syntactic characters, rather than emoji. See Section 7.2, Emoji Profile, in Unicode Standard Annex #31, “Unicode Identifiers and Syntax” [UAX31].

4.1Emoji and Text Presentation Selectors

Every emoji character with a default text presentation allows for an emoji or text presentation selector. Thus the presentation of these characters can be controlled on a character-by-character basis. The characters that can have these selectors applied to them are listed in Emoji Variation Sequences [emoji-charts].
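
Character-by-character control amounts to appending U+FE0F or U+FE0E to a base character. The Python sketch below uses a hypothetical helper, `with_presentation`, and a two-character excerpt standing in for the full list in emoji-variation-sequences.txt; characters outside that list are returned unchanged.

```python
EMOJI_VS = "\uFE0F"  # emoji presentation selector
TEXT_VS = "\uFE0E"   # text presentation selector

# Small excerpt of bases that have variation sequences; the full list is in
# emoji-variation-sequences.txt.
ALLOWED_BASES = {"\u231A", "\u2764"}  # WATCH, HEAVY BLACK HEART

def with_presentation(char, style, allowed=ALLOWED_BASES):
    """Return char followed by the selector for style ("emoji" or "text")."""
    base = char.rstrip(EMOJI_VS + TEXT_VS)  # drop any selector already present
    if base not in allowed:
        return base  # no variation sequence is defined for this base
    return base + (EMOJI_VS if style == "emoji" else TEXT_VS)

heart_as_emoji = with_presentation("\u2764", "emoji")
print(" ".join(f"U+{ord(c):04X}" for c in heart_as_emoji))
```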

In addition, the next two sections describe two other mechanisms for globally controlling the emoji presentation: using language tags with locale extensions, or using special script codes. Though these are new mechanisms and not yet widely supported, vendors are encouraged to support the locale extension for most general usage such as in browsers; the special script codes may be appropriate for more specific usage such as OpenType font selection, or in APIs. For more information, see [CLDR].

4.2Emoji Locale Extension

The locale extension “-em” can be used to specify desired presentation for characters that may have both text-style and emoji-style presentations available. There are three values that can be used, here illustrated with “sr-Latn”:

Locale Code	Description
sr-Latn-u-em-emoji	use an emoji presentation for emoji characters where possible
sr-Latn-u-em-text	use a text presentation for emoji characters where possible
sr-Latn-u-em-default	use the default presentation (only needed to reset an inherited -em setting).

This can be used in HTML, for example, with <html lang="sr-Latn-u-em-emoji">. Note that this approach does not have the disadvantages listed below for the script-tag approach.
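
A consumer of such language tags needs to pull the “em” value out of the -u- extension. The following Python sketch does this with plain string splitting; the function name `emoji_presentation_preference` is hypothetical, and the parser is deliberately simplified (it does not validate the tag or handle private-use subtags). Production code would more likely rely on a full locale library.

```python
def emoji_presentation_preference(language_tag):
    """Return "emoji", "text", or "default" from the -u-em- locale extension."""
    subtags = language_tag.lower().split("-")
    if "u" not in subtags:
        return "default"
    start = subtags.index("u") + 1
    for i in range(start, len(subtags)):
        if len(subtags[i]) == 1:
            break  # another singleton ends the -u- extension
        if subtags[i] == "em" and i + 1 < len(subtags):
            if subtags[i + 1] in ("emoji", "text", "default"):
                return subtags[i + 1]
    return "default"

print(emoji_presentation_preference("sr-Latn-u-em-emoji"))
```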

4.3Emoji Script Codes

Two script subtags can be used to control the presentation style. These use script codes defined by ISO 15924 but given more specific semantics by CLDR, see unicode_script_subtag:

Zsye: prefer emoji style for characters that have both text and emoji styles available.
Zsym: prefer text style for characters that have both text and emoji styles available.

These script codes are not suitable for use in general language tags:

They cannot be used with language-script combinations; for example, if the language is sr-Latn (Serbian in Latin script), then Zsye cannot be used.
They may confuse processes that depend on language tags, such as spell checkers.

However, they may be useful by themselves in specific contexts such as OpenType font selection, or in APIs that take script codes.

4.4Other Approaches for Control of Emoji Presentation

Other approaches for control of emoji presentation are also in use. For example, in some CSS implementations, if any font in the lookup list is an emoji font, then emoji presentation is used whenever possible.

5Ordering and Grouping

Neither the Unicode code point order, nor the default collation provided by the Unicode Collation Algorithm (DUCET), is currently well suited for emoji, because they separate conceptually-related characters. From the user's perspective, the ordering of a selection of characters sorted by DUCET appears quite random, as illustrated by the following example:

Emoji Ordering [emoji-charts] shows an ordering for emoji characters that groups them together in a more natural fashion. This data has been incorporated into [CLDR].

This ordering presents a cleaner and more expected ordering for sorted lists of characters. The groupings include: faces, people, body-parts, emotion, clothing, animals, plants, food, places, transport, and so on. The ordering also groups more naturally for the purpose of selection in input palettes. However, for sorting, each character must occur in only one position, which is not a restriction for input palettes. See Section 6, Input.
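
The difference between code point order and a grouped order can be seen with a handful of characters. In the Python sketch below, `ILLUSTRATIVE_ORDER` is a made-up five-character ordering used only to demonstrate the technique; it is not CLDR data, and a real implementation would use the CLDR collation rules instead.

```python
# Hypothetical grouped order: two faces, a heart, an animal, an object.
ILLUSTRATIVE_ORDER = ["\U0001F600", "\u263A", "\u2764", "\U0001F408", "\u231A"]
RANK = {char: index for index, char in enumerate(ILLUSTRATIVE_ORDER)}

def emoji_sort_key(char):
    """Sort by position in the grouped order; unknown characters go last."""
    return (RANK.get(char, len(RANK)), char)

items = ["\U0001F408", "\u231A", "\U0001F600", "\u2764", "\u263A"]
by_code_point = sorted(items)                   # plain code point order
by_group = sorted(items, key=emoji_sort_key)    # grouped order

print([f"U+{ord(c):04X}" for c in by_code_point])
print([f"U+{ord(c):04X}" for c in by_group])
```

In code point order the two faces (U+263A and U+1F600) end up at opposite ends of the list, while the grouped order keeps them adjacent.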

6Input

Emoji are not typically typed on a keyboard. Instead, they are generally picked from a palette, or recognized via a dictionary. The mobile keyboards typically have a button to select a palette of emoji, such as in the left image below. Clicking on the button reveals a palette, as in the right image.

Palette Input

The palettes need to be organized in a meaningful way for users. They typically provide a small number of broad categories, such as People, Nature, and so on. These categories typically have 100-200 emoji.

Many characters can be categorized in multiple ways: an orange is both a plant and a food. Unlike a sort order, an input palette can have multiple instances of a single character. It can thus extend the sort ordering to add characters in any groupings where people might reasonably be expected to look for them.

More advanced palettes will have long-press enabled, so that people can press-and-hold on an emoji and have a set of related emoji pop up. This allows for faster navigation, with less scrolling through the palette.

Annotations for emoji characters are much more finely grained keywords. They can be used for searching characters, and are often easier than palettes for entering emoji characters. For example, when someone types “hourglass” on their mobile phone, they could see and pick from either of the two matching hourglass emoji characters. That is often much easier than scrolling through the palette and visually inspecting the screen. Input mechanisms may also map emoticons to emoji as keyboard shortcuts: typing :-) can result in the corresponding emoji.

In some input systems, a word or phrase bracketed by colons is used to explicitly pick emoji characters. Thus typing in “I saw an :ambulance:” is converted to “I saw an 🚑”. For completeness, such systems might support all of the full Unicode names, such as :first quarter moon with face: for 🌛. Spaces within the phrase may be represented by _, as in the following:

“my :alarm_clock: didn’t work”

“my ⏰ didn’t work”.

However, in general the full Unicode names are not especially suitable for that sort of use; they were designed to be unique identifiers, and tend to be overly long or confusing.
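
Colon-bracketed input can be implemented as a simple text substitution. The Python sketch below uses a hypothetical two-entry table, `SHORTCODES`, and a hypothetical function, `replace_shortcodes`; real input systems ship their own, much larger tables, and the accepted name syntax varies between them.

```python
import re

# Hypothetical shortcode table; names use "_" in place of spaces.
SHORTCODES = {
    "ambulance": "\U0001F691",   # U+1F691 AMBULANCE
    "alarm_clock": "\u23F0",     # U+23F0 ALARM CLOCK
}

def replace_shortcodes(text, table=SHORTCODES):
    """Replace :name: with the mapped emoji; unknown names are left untouched."""
    def substitute(match):
        return table.get(match.group(1).lower(), match.group(0))
    return re.sub(r":([A-Za-z0-9_]+):", substitute, text)

print(replace_shortcodes("I saw an :ambulance:"))
print(replace_shortcodes("my :alarm_clock: didn't work"))
```

Leaving unknown names untouched avoids corrupting ordinary text that happens to contain colons, such as times of day.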

For emoji that have gender and/or skin tone variants, input systems should fully specify the intended appearance, rather than relying on a particular system’s default appearance; see for example Section 2.3.2, Marking Gender in Emoji Input.

7Searching

Searching includes both searching for emoji characters in queries, and finding emoji characters in the target. These are most useful when they include the annotations as synonyms or hints. For example, when someone searches for ⛽ on yelp.com, they see matches for “gas station”. Conversely, searching for “gas pump” in a search engine could find pages containing ⛽. Similarly, searching for “gas pump” in an email program can bring up all the emails containing ⛽.

There is no requirement for uniqueness in both palette categories and annotations: an emoji should show up wherever users would expect it. A gas pump might show up under “object” and “travel”; a heart under “heart” and “emotion”; a heart-eyed cat under “animal”, “cat”, and “heart”.

Annotations are language-specific: searching on yelp.de, someone would expect a search for ⛽ to result in matches for “Tankstelle”. Thus annotations need to be in multiple languages to be useful across languages. They should also include regional annotations within a given language, like “petrol station”, which people would expect a search for ⛽ to result in on yelp.co.uk. An English annotation cannot simply be translated into different languages, because different words may have different associations in different languages. A given emoji may be associated with Mexican or Southwestern restaurants in the US, but not be associated with them in, say, Greece.

As noted in Section 2.1, Names, there is one further kind of annotation, called a CLDR short name. This is also referred to as the TTS name, for use in text-to-speech processing such as providing a short, descriptive emoji name when reading text for accessibility purposes. In this case the CLDR names provide several advantages over formal Unicode character names:

They can be shorter and less cumbersome than the formal name, whose requirement for name uniqueness often results in names that are overly long, such as BLACK RIGHT-POINTING TRIANGLE WITH DOUBLE VERTICAL BAR for ⏯.
They can apply to emoji that are represented by sequences as well as those represented by single characters.
They can be updated to better reflect current emoji depictions and usage.

TTS names are also outside the current scope of this document.

8Longer Term Solutions

The longer-term goal for implementations should be to support embedded graphics, in addition to the emoji characters. Embedded graphics allow arbitrary emoji symbols, and are not dependent on additional Unicode encoding. Some examples of this are found in Skype and LINE.

However, to be as effective and simple to use as emoji characters, a full solution requires significant infrastructure changes to allow simple, reliable input and transport of images (stickers) in texting, chat, mobile phones, email programs, virtual and mobile keyboards, and so on. (Even so, such images will never interchange in environments that only support plain text, such as email addresses.) Until that time, many implementations will need to use Unicode emoji instead.

For example, mobile keyboards need to be enhanced. Enabling embedded graphics would involve adding an additional custom mechanism for users to add in their own graphics or purchase additional sets, such as a sign to add an image to the palette above. This would prompt the user to paste or otherwise select a graphic, and add annotations for dictionary selection.

With such an enhanced mobile keyboard, the user could then select those graphics in the same way as selecting the Unicode emoji. If users started adding many custom graphics, the mobile keyboard might even be enhanced to allow ordering or organization of those graphics so that they can be quickly accessed. The extra graphics would need to be disabled if the target of the mobile keyboard (such as an email header line) would only accept text.

Other features required to make embedded graphics work well include the ability of images to scale with font size, inclusion of embedded images in more transport protocols, and switching services and applications to use protocols that do permit inclusion of embedded images (for example, MMS versus SMS for text messages). There will always, however, be places where embedded graphics cannot be used, such as email headers, SMS messages, or file names. There are also privacy aspects to implementations of embedded graphics: if the graphic itself is not packaged with the text, but instead is just a reference to an image on a server, then that server could track usage.

Annex A: Emoji Properties and Data Files

The following binary character properties are available for emoji characters.

Emoji Character Properties

Property	Abbr	Property Values
Emoji	Emoji	=Yes for characters that are emoji
Emoji_Presentation	EPres	=Yes for characters that have emoji presentation by default
Emoji_Modifier	EMod	=Yes for characters that are emoji modifiers
Emoji_Modifier_Base	EBase	=Yes for characters that can serve as a base for emoji modifiers
Emoji_Component	EComp	=Yes for characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as keycap base characters or Regional_Indicator characters. All characters in emoji sequences are either Emoji or Emoji_Component. Implementations must not, however, assume that all Emoji_Component characters are also Emoji. There are some non-emoji characters that are used in various emoji sequences, such as tag characters and ZWJ.
Extended_Pictographic	ExtPict	=Yes for characters that are used to future-proof segmentation. The Extended_Pictographic characters contain all the Emoji characters except for some Emoji_Component characters.

If Emoji=No, then Emoji_Presentation=No, Emoji_Modifier=No, and Emoji_Modifier_Base=No.

A.1Data Files

The emoji properties are specified in the emoji data files (see [emoji-data]):

Data Files

emoji-data.txt	Property value for the properties listed in the Emoji Character Properties table
emoji-variation-sequences.txt	All permissible emoji presentation sequences and text presentation sequences
emoji-zwj-sequences.txt	ZWJ sequences used to represent emoji
emoji-sequences.txt	Other sequences used to represent emoji
emoji-test.txt	Test file for emoji characters and sequences

See [emoji-charts] for a collection of charts that have been generated from the emoji data files and the related [CLDR] emoji data (annotations and ordering). They are purely illustrative; the data to use for implementation is in [emoji-data].

The data file comments and their structure are purely informative, and may change across releases without notice. For version conventions used in the data files, see Section 1.5.2, Versioning.
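
Because only the data fields are stable, a parser for emoji-data.txt should discard everything after the `#` on each line. The Python sketch below reads a few lines in the style of that file (with abbreviated comments) into a map from code point to property names, and then checks the implication stated in Annex A that the other three properties require Emoji=Yes. The function names `parse_emoji_data` and `implication_holds` are hypothetical.

```python
# A few lines in the style of emoji-data.txt (comment fields abbreviated).
SAMPLE = """\
0023          ; Emoji                # E0.0 number sign
1F3FB..1F3FF  ; Emoji                # E1.0 light skin tone..dark skin tone
1F600         ; Emoji                # E1.0 grinning face
1F600         ; Emoji_Presentation   # E1.0 grinning face
1F3FB..1F3FF  ; Emoji_Modifier       # E1.0 light skin tone..dark skin tone
0023          ; Emoji_Component      # E0.0 number sign
"""

def parse_emoji_data(text):
    """Return {code point: set of property names} from emoji-data.txt style lines."""
    properties = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # comments are informative only
        if not data:
            continue
        code_range, prop = (field.strip() for field in data.split(";"))
        first, _, last = code_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for code_point in range(start, end + 1):
            properties.setdefault(code_point, set()).add(prop)
    return properties

def implication_holds(properties):
    """Check that Emoji_Presentation, Emoji_Modifier, Emoji_Modifier_Base imply Emoji."""
    dependent = {"Emoji_Presentation", "Emoji_Modifier", "Emoji_Modifier_Base"}
    return all("Emoji" in props for props in properties.values() if props & dependent)

props = parse_emoji_data(SAMPLE)
print(sorted(props[0x1F600]), implication_holds(props))
```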

Annex B: Valid Emoji Flag Sequences

While the syntax of a well-formed emoji flag sequence is defined in ED-14, only valid sequences are displayed as flags by conformant implementations, where:

The valid region sequences are specified by two-letter Unicode region subtags as defined in [CLDR], with idStatus = “regular”, “deprecated”, or “macroregion”. For macroregions, only UN and EU are valid.
1. Deprecated regions are included in the list of valid region sequences so that deprecations in the future do not invalidate previously valid emoji flag sequences. RGI emoji flag sequences with deprecated regions are recommended for support. Non-RGI emoji flag sequences with deprecated regions should not be generated.
2. Macroregion region sequences generally do not have official flags, with the exception of UN and EU.
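
The correspondence between a two-letter region subtag and its pair of REGIONAL INDICATOR symbols is a fixed offset from U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A. The Python sketch below converts in both directions; the function names are hypothetical. It produces well-formed sequences only: deciding whether a sequence is valid still requires the CLDR region data described above.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag_sequence(region_code):
    """Map a two-letter region code such as "DE" to its emoji flag sequence."""
    code = region_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"not a two-letter region code: {region_code!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

def region_code_of(sequence):
    """Map a pair of regional indicators back to its two-letter code, or None."""
    if len(sequence) != 2:
        return None
    offsets = [ord(c) - REGIONAL_INDICATOR_A for c in sequence]
    if not all(0 <= offset < 26 for offset in offsets):
        return None
    return "".join(chr(ord("A") + offset) for offset in offsets)

print(" ".join(f"U+{ord(c):04X}" for c in flag_sequence("DE")))
```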

Some region sequences represent countries (as recognized by the United Nations, for example); others represent territories that are associated with a country. Such territories may have flags of their own, or may use the flag of the country with which they are associated. Depictions of images for flags may be subject to constraints by the administration of that region.

Caveats:

Although a pair of REGIONAL INDICATOR symbols is referred to as an emoji_flag_sequence, it really represents a specific region, not a specific flag for that region. The actual flag displayed for the pair may be different on different platforms, for example for territories which do not have an official flag. The displayed flag may change over time as regions change their flags and platforms update their software.
For some territories (especially those without separate official flags), the displayed flag may be the same as the flag for the country with which they are associated. For more about cases where characters have the same appearance, see UTR #36: Unicode Security Considerations [UTR36].

For additional information see the sub-section on Regional Indicator Symbols in Section 22.10, Enclosed and Square of [Unicode].

B.1Presentation

Emoji are generally presented with a square aspect ratio, which presents a problem for flags. The flag for Qatar is over 150% wider than tall; for Switzerland it is square; for Nepal it is over 20% taller than wide. To avoid a ransom-note effect, implementations may want to use a fixed ratio across all flags, such as 150%, with a blank band on the top and bottom. (The average width for flags is between 150% and 165%.) Presentation as a waving flag, or clipping to a circle, can help to present a uniform appearance, masking the aspect differences.

Flags should have a visible edge. One option is to use a one-pixel gray line chosen to be contrasting with the adjacent field color.

For an open-source set of flag images (png and svg), see region-flags.

Options for presenting an emoji_flag_sequence for which a system does not have a specific flag or other glyph include:

Display each REGIONAL INDICATOR symbol separately as a letter in a dotted square, as shown in the Unicode charts. This provides information about the specific region indicated, but may be mystifying to some users.
For all unsupported REGIONAL INDICATOR pairs, display the same missing flag glyph, such as the image shown below. This would indicate that the unsupported pair was intended to represent the flag of some region, without indicating which one.

B.2Ordering

The code point order of flags is by region code, which will not be intuitive for users, because that rarely matches the order of countries in the user’s language. English speakers are surprised that the flag for Germany comes before the flag for Djibouti. An alternative is to present the sorted order according to the localized country name, using [CLDR] data.

Annex C: Valid Emoji Tag Sequences

While the syntax of a well-formed emoji tag sequence is defined in ED-14a, not all possible tag sequences are valid. The only valid sequences in this version of Unicode Emoji are defined by sections in this annex, which specify valid combinations of <tag_base> characters and <tag_spec> sequences and their expected presentation. Conformant implementations only display valid sequences as emoji, and display invalid sequences with a special presentation to show that they are invalid, such as in the examples below.

There is one common constraint on valid emoji tag sequences: the entire emoji_tag_sequence, including tag_base and tag_end, must not be longer than 32 code points. This provides a practical limit needed by many rendering systems, and is consistent with the 32-code-point buffer limit specified for the Stream-Safe Text Format as defined in Unicode Standard Annex #15, “Unicode Normalization Forms” [UAX15].

If a platform supports tag sequences, but a particular emoji tag sequence is invalid or cannot be displayed, then to reduce spoofing risk that emoji tag sequence should be displayed using a missing emoji glyph if feasible. The following are examples, where the tag_base character is a black flag.

Sample images		Condition
		The implementation supports tag sequences, but this particular sequence is either not supported or simply invalid. (If the font technology permits, the missing emoji glyph can be overlaid on the tag_base character, thus occupying the same physical dimensions as if the sequence were supported.)
		The implementation does not support tag sequences at all. (The tag characters are normally invisible, and thus only the base character displays.)

In examples in this section, underlined ASCII characters represent the corresponding tag characters, while ✦ represents the tag_end.

C.1Flag Emoji Tag Sequences

A valid flag emoji tag sequence must satisfy the following constraints:

The tag_base and tag_spec are limited to the following:
tag_base U+1F3F4 BLACK FLAG
tag_spec (U+E0030 TAG DIGIT ZERO .. U+E0039 TAG DIGIT NINE,
U+E0061 TAG LATIN SMALL LETTER A .. U+E007A TAG LATIN SMALL LETTER Z)+
tag_end is U+E007F CANCEL TAG, as described in ED-14a.
Let SD be the result of mapping each character in the tag_spec to a character in [0-9a-z] by subtracting 0xE0000.
1. SD must then be a specification as per [CLDR] of either a Unicode subdivision_id or a 3-digit unicode_region_subtag, and
2. SD must have CLDR idStatus = “regular”, “deprecated”, or “macroregion”.
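
The structural part of these constraints (the base, the permitted tag characters, the terminator, the 0xE0000 offset, and the 32-code-point limit) can be sketched in Python as follows. The function names `encode_flag_tag_sequence` and `decode_sd` are hypothetical. The sketch does not consult CLDR, so it cannot tell whether SD is an actual subdivision_id with an acceptable idStatus; that check is assumed to happen separately.

```python
BLACK_FLAG = "\U0001F3F4"   # tag_base
CANCEL_TAG = "\U000E007F"   # tag_end
TAG_OFFSET = 0xE0000
MAX_CODE_POINTS = 32        # limit on the whole sequence, base and terminator included
SD_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def encode_flag_tag_sequence(sd):
    """Build tag_base + tag_spec + tag_end from an SD string such as "gbsct"."""
    if not sd or any(c not in SD_ALPHABET for c in sd):
        raise ValueError(f"SD must be non-empty and use only [0-9a-z]: {sd!r}")
    sequence = BLACK_FLAG + "".join(chr(TAG_OFFSET + ord(c)) for c in sd) + CANCEL_TAG
    if len(sequence) > MAX_CODE_POINTS:  # len() counts code points in Python 3
        raise ValueError("sequence exceeds 32 code points")
    return sequence

def decode_sd(sequence):
    """Return SD for a structurally acceptable flag tag sequence, else None."""
    if not 3 <= len(sequence) <= MAX_CODE_POINTS:
        return None
    if sequence[0] != BLACK_FLAG or sequence[-1] != CANCEL_TAG:
        return None
    sd = "".join(chr(ord(c) - TAG_OFFSET) for c in sequence[1:-1]
                 if ord(c) >= TAG_OFFSET)
    if len(sd) != len(sequence) - 2 or any(c not in SD_ALPHABET for c in sd):
        return None
    return sd  # validity against CLDR data still has to be checked separately

england = encode_flag_tag_sequence("gbeng")
print(" ".join(f"U+{ord(c):04X}" for c in england))
```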

Notes:

Deprecated SD values are included in the list of valid region sequences so that deprecations in the future do not invalidate previously valid emoji tag sequences. RGI emoji tag sequences with deprecated SD values are recommended for support. Non-RGI emoji tag sequences with deprecated SD values should not be generated.
There is no hyphen in the tag_spec, unlike ISO subdivisions like “GB-SCT”.
These flag emoji tag sequences are used to request an image for whatever is currently the flag of the specified subregion. Like the emoji flag sequences, they are not intended to provide a mechanism for versioned representations of any particular flag image.
Specific platforms and programs decide which emoji extended flag sequences they will support. There is no requirement that any be supported, and no expectation that more than a small number be commonly supported by vendors.
Note that SD cannot be a two-letter code like “US” or “us”.

C.1.1Sample Valid Emoji Tag Sequences

A completely tag-unaware implementation will display any sequence of tag characters as invisible, without any effect on adjacent characters. The following sections apply to conformant implementations that support at least one tag sequence.

An implementation may support emoji tag sequences, but not support a particular valid emoji tag sequence.

Images for unsupported valid emoji tag sequences must indicate that the sequence image is missing, by showing the base glyph with either a following “missing emoji glyph” or with an overlay “missing” glyph. The overlay glyph approach is recommended, so that the sequence would have the same width as if supported. A tag-unaware implementation (TU) will show just the base character.

Display of Valid Emoji Tag Sequences

Sequence	Sample Image: Supported	Sample Image: Unsupported	Sample Image: TU	Comments	RGI sequence?
gbeng✦				England	Yes
gbsct✦				Scotland	Yes
gbwls✦				Wales	Yes
usca✦				California	No
caon✦				Ontario	No
chzh✦				Canton Zürich	No
frnor✦				Normandy	No

C.1.2Sample Invalid Emoji Tag Sequences

Images for invalid (but well-formed) emoji tag sequences must not be interpreted as if they were regular emoji tag sequences for a different appearance. They must instead indicate that there is something wrong with the sequence. The recommended approach is to also show the base glyph with either a following “missing emoji glyph” or with an overlay “missing” glyph.

Display of Invalid Emoji Tag Sequences

Sequence	Rec. Images		TU	Comments
ushuh✦				Incorrect subregion with “us” region
uksct✦				No “uk” region so incorrect subregion
usca✦				Base invalid for flag tag emoji sequence
olvikan✦				Invalid base and tag_spec — not conformant to show as a “demon” or other non-missing image

C.1.3Sample Ill-formed Emoji Tag Sequences

Images for an ill-formed tag sequence should indicate that there is something wrong with the sequence. The recommended approach is to show the ill-formed tag sequence as a “missing emoji glyph”.

Display of Ill-formed Emoji Tag Sequences

Sequence	Rec. Images	TU	Comments
Ausca✦	A	A	No emoji base
usca✦			No base
usca			No terminator
usca			No base, no terminator

Acknowledgments

Mark Davis and Peter Edberg created the initial versions of this document and maintained the text for many versions. Mark Davis and Ned Holbrook now maintain the text.

Thanks to Shervin Afshar, Julie Allen, Deborah Anderson, Rachel Been, Nicole Bleuel, Charlotte Buff, Jeremy Burge, Mathias Bynens, Charles Carson, Chenjintao (陈锦涛), Chenshiwei, Michele Coady, Peter Constable, David Corbett, Craig Cummings, Jennifer Daniel, Monica Dinculescu, Behnam Esfahbod, Doug Ewell, Kara Fong, Agustin Fonts, Asmus Freytag, Claudia Galvan, Andrew Glass, Seb Grubb, Bryan Haggerty, Casey Henson, Paul Hunt, Olli Jones, Tayfun Karadeniz, Hiroyuki Komatsu, Mike LaJoie, Jennifer 8. Lee, Norbert Lindenberg, Ken Lunde, Gwyneth Marshall, Rick McGowan, Katsuhiko Momoi, Lisa Moore, Sarah Neufeld, Katsuhiro Ogata, Christoph Päper, Katrina Parrott, Michelle Perham, Addison Phillips, Roozbeh Pournader, Judy Safran-Aasen, Markus Scherer, Alolita Sharma, Jane Solomon, Sean Stewart, Michel Suignard, Richard Tunnicliffe, Yifán Wáng, and Ken Whistler for feedback on and contributions to this document and related data and charts, including earlier versions.

Thanks to Adobe / Paul Hunt, Apple, Emojination, EmojiOne, Emojipedia, EmojiXpress, Michael Everson, Facebook, Google, iDiversicons, Microsoft, Samsung, and Twitter for supplying images for illustration in this document, or earlier versions of this document.

Rights to Emoji Images

The content for this section, discussing rights and acknowledgments, has been moved to Emoji Images and Rights.

References

[CLDR]	CLDR - Unicode Common Locale Data Repository https://cldr.unicode.org/ For the latest version of the associated specification (LDML), see: https://www.unicode.org/reports/tr35/
[emoji-charts]	The illustrative charts of emoji. For the 15.1 versions, see: https://www.unicode.org/emoji/charts-15.1/ For the latest versions, see: https://www.unicode.org/emoji/charts/
[emoji-data]	The associated data files for emoji characters. For the 15.1 versions, see: https://www.unicode.org/Public/15.1.0/ucd/emoji/emoji-data.txt https://www.unicode.org/Public/15.1.0/ucd/emoji/emoji-variation-sequences.txt https://www.unicode.org/Public/emoji/15.1/emoji-sequences.txt https://www.unicode.org/Public/emoji/15.1/emoji-zwj-sequences.txt https://www.unicode.org/Public/emoji/15.1/emoji-test.txt For the latest versions, see: https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt https://www.unicode.org/Public/emoji/latest/emoji-sequences.txt https://www.unicode.org/Public/emoji/latest/emoji-zwj-sequences.txt https://www.unicode.org/Public/emoji/latest/emoji-test.txt
[JSources]	The UCD sources for the JCarrier symbols. For the latest version, see: https://www.unicode.org/Public/UCD/latest/ucd/EmojiSources.txt
[UAX14]	UAX #14: Unicode Line Breaking Algorithm https://www.unicode.org/reports/tr14/
[UAX15]	UAX #15: Unicode Normalization Forms https://www.unicode.org/reports/tr15/
[UAX29]	UAX #29: Unicode Text Segmentation https://www.unicode.org/reports/tr29/
[UAX31]	UAX #31: Unicode Identifiers and Syntax https://www.unicode.org/reports/tr31/
[Unicode]	The Unicode Standard For the latest version, see: https://www.unicode.org/versions/latest/
[UTR36]	UTR #36: Unicode Security Considerations https://www.unicode.org/reports/tr36/
[UTS18]	UTS #18: Unicode Regular Expressions https://www.unicode.org/reports/tr18/
[UTR23]	UTR #23: The Unicode Character Property Model https://www.unicode.org/reports/tr23/

Modifications

The following summarizes modifications from the previous revision of this document.

Revision 25

Section 1.4.5 Emoji Sequences
- Added a note that all emoji sequences are single grapheme clusters.
Section 1.5.2 Versioning
- Added version 15.1.
Section 2.3 Emoji With Explicit Gender Appearance
- Added a missing character.
Section 2.4.2 Emoji Modifiers in Text
- Removed text duplicating expanded note.
Section 2.3.1 Gender-Neutral Emoji
- Fixed a reference.
Section 2.6.2 Multi-Person Skin Tones
- Updated for version 15.1.
Section 2.9 Color
- Updated for version 15.1.
Section 2.10 Emoji Glyph Facing Direction
- Updated for version 15.1.
Section 3 Which Characters are Emoji
- Replaced versioned reference to charts.
- Removed examples, one of which will no longer be posted.
Section 4 Presentation Style
- Added a discussion of interactions with computer language syntaxes.
Section 5 Ordering and Grouping
- Replaced versioned reference to charts.
References
- Changed file versions for version 15.1.

Modifications for prior versions can be found in those prior versions.

© 2023 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode Terms of Use apply.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.

`tag_base`	U+1F3F4 BLACK FLAG
`tag_spec`	(U+E0030 TAG DIGIT ZERO .. U+E0039 TAG DIGIT NINE, U+E0061 TAG LATIN SMALL LETTER A .. U+E007A TAG LATIN SMALL LETTER Z)+
