Text Normalization

Source: R/utf8.R
utf8_normalize.Rd

Transform text to normalized form, optionally mapping to lowercase and applying compatibility maps.

Usage

utf8_normalize(
  x,
  ...,
  map_case = FALSE,
  map_compat = FALSE,
  map_quote = FALSE,
  remove_ignorable = FALSE
)

Arguments

x

character object.

...

These dots are for future extensions and must be empty.

map_case

a logical value indicating whether to apply Unicode case mapping to the text. For most languages, this transformation changes uppercase characters to their lowercase equivalents.

map_compat

a logical value indicating whether to apply Unicode compatibility mappings to the characters, those required for NFKC and NFKD normal forms.

map_quote

a logical value indicating whether to replace curly single quotes and Unicode apostrophe characters with ASCII apostrophe (U+0027).

remove_ignorable

a logical value indicating whether to remove Unicode "default ignorable" characters like zero-width spaces and soft hyphens.

Value

The result is a character object with the same attributes as x but with Encoding set to "UTF-8".
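As a small sketch of what this means in practice, the snippet below checks the names and encoding of a result. It assumes the utf8 package is installed; the vector and its name are made up for illustration. A non-ASCII element is used because base R's Encoding() reports "unknown" for pure-ASCII strings regardless of how they were produced.

```r
library(utf8)

# Hypothetical input: a named vector holding "A" followed by a
# combining ring above (a decomposed form of the letter A-ring).
x <- c(ring = "\u0041\u030a")

y <- utf8_normalize(x)

names(y)     # attributes such as names should carry over from x
Encoding(y)  # declared encoding of the normalized element
```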

Details

utf8_normalize() converts the elements of a character object to Unicode normalized composed form (NFC) while applying the character maps specified by the map_case, map_compat, map_quote, and remove_ignorable arguments.
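The four character maps can be tried one at a time. The following is a minimal sketch, assuming the utf8 package is installed; the sample strings are illustrative and not taken from the package documentation. Going by the argument descriptions above, each comparison is expected to hold.

```r
library(utf8)

# map_case: uppercase ASCII letters are mapped to lowercase.
case_ok <- utf8_normalize("HELLO", map_case = TRUE) == "hello"

# map_compat: the "fi" ligature (U+FB01) has a compatibility mapping
# to the two letters "f" and "i"; plain NFC leaves it unchanged.
compat_ok  <- utf8_normalize("\ufb01", map_compat = TRUE) == "fi"
default_ok <- utf8_normalize("\ufb01") == "\ufb01"

# map_quote: a curly right single quote (U+2019) becomes ASCII U+0027.
quote_ok <- utf8_normalize("it\u2019s", map_quote = TRUE) == "it's"

# remove_ignorable: a zero-width space (U+200B) is dropped.
ignorable_ok <- utf8_normalize("a\u200bb", remove_ignorable = TRUE) == "ab"

c(case_ok, compat_ok, default_ok, quote_ok, ignorable_ok)
```

The maps are independent switches, so several can be set to TRUE in one call when a more aggressive normalization is wanted.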

See also

as_utf8().

Examples

angstrom <- c("\u00c5", "\u0041\u030a", "\u212b")
utf8_normalize(angstrom) == "\u00c5"
#> [1] TRUE TRUE TRUE
