Transform text to normalized form, optionally mapping to lowercase and applying compatibility maps.
Usage
utf8_normalize(x, ..., map_case = FALSE, map_compat = FALSE, map_quote = FALSE, remove_ignorable = FALSE)
Arguments
- x
character object.
- ...
These dots are for future extensions and must be empty.
- map_case
a logical value indicating whether to apply Unicode case mapping to the text. For most languages, this transformation changes uppercase characters to their lowercase equivalents.
- map_compat
a logical value indicating whether to apply Unicode compatibility mappings to the characters, those required for NFKC and NFKD normal forms.
- map_quote
a logical value indicating whether to replace curly single quotes and Unicode apostrophe characters with ASCII apostrophe (U+0027).
- remove_ignorable
a logical value indicating whether to remove Unicode "default ignorable" characters like zero-width spaces and soft hyphens.
Details
utf8_normalize() converts the elements of a character object to Unicode normalized composed form (NFC) while applying the character maps specified by the map_case, map_compat, map_quote, and remove_ignorable arguments.
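As a rough illustration of the four optional maps, the sketch below turns on one argument at a time. The input strings are made up for this illustration, and each comparison is expected to evaluate to TRUE, assuming the utf8 package is installed.

```r
library(utf8)

# map_case: case mapping, which lowercases ordinary Latin text
utf8_normalize("Hello World", map_case = TRUE) == "hello world"

# map_compat: the "fi" ligature (U+FB01) is replaced by the letters "f" and "i"
utf8_normalize("\ufb01le", map_compat = TRUE) == "file"

# map_quote: the curly right single quote (U+2019) becomes ASCII U+0027
utf8_normalize("don\u2019t", map_quote = TRUE) == "don't"

# remove_ignorable: the soft hyphen (U+00AD) is dropped
utf8_normalize("co\u00adoperate", remove_ignorable = TRUE) == "cooperate"
```

With all four arguments left at FALSE, only the NFC conversion is applied, so inputs such as the ligature or the curly quote pass through unchanged.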
Examples
angstrom <- c("\u00c5", "\u0041\u030a", "\u212b")
utf8_normalize(angstrom) == "\u00c5"
#> [1] TRUE TRUE TRUE
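The maps can also be combined, for instance when strings that differ only in case or quote style should compare equal. The following is a minimal sketch under that assumption; `normalize_key` is a hypothetical helper written for this example, not a function provided by the utf8 package.

```r
library(utf8)

# Hypothetical helper (not part of utf8): apply every optional map at once
normalize_key <- function(x) {
  utf8_normalize(x, map_case = TRUE, map_compat = TRUE,
                 map_quote = TRUE, remove_ignorable = TRUE)
}

# Three spellings of the same name: curly quote, lowercase, uppercase
keys <- c("O\u2019Neil", "o'neil", "O'NEIL")

# Count the distinct values left after normalization
length(unique(normalize_key(keys)))
```

All three spellings should collapse to a single key here. Whether every map is appropriate depends on the application; case mapping, for example, discards distinctions that may matter for display.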