String (Elixir v1.18.3)


Strings in Elixir are UTF-8 encoded binaries.

Strings in Elixir are sequences of Unicode characters, typically written between double quotes, such as "hello" and "héllò".

If a string must contain a double quote, the double quote must be escaped with a backslash, for example: "this is a string with \"double quotes\"".

You can concatenate two strings with the <>/2 operator:

    iex> "hello" <> " " <> "world"
    "hello world"

The functions in this module act according to The Unicode Standard, Version 16.0.0.

Interpolation

Strings in Elixir also support interpolation. This allows you to place a value in the middle of a string by using the #{} syntax:

    iex> name = "joe"
    iex> "hello #{name}"
    "hello joe"

Any Elixir expression is valid inside the interpolation. If a string is given, the string is interpolated as is. If any other value is given, Elixir will attempt to convert it to a string using the String.Chars protocol. This makes it possible, for example, to output an integer from the interpolation:

    iex> "2 + 2 = #{2 + 2}"
    "2 + 2 = 4"

If the value you want to interpolate cannot be converted to a string because it has no human-readable textual representation, a protocol error is raised.
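As a minimal sketch of the conversion rule above (the variable names are illustrative and not part of the original documentation), the following shows interpolation succeeding for an integer, failing for a tuple, and a common workaround based on inspect/1:

```elixir
# Integers implement String.Chars, so interpolation converts them.
count = 3
message = "count: #{count}"

# Tuples do not implement String.Chars, so interpolating one raises
# Protocol.UndefinedError at runtime.
pair = {:ok, 1}

result =
  try do
    "value: #{pair}"
  rescue
    error in Protocol.UndefinedError -> {:error, error.protocol}
  end

# Kernel.inspect/1 returns a string for any term, so it can be used
# when a debugging representation is good enough.
inspected = "value: #{inspect(pair)}"

IO.inspect({message, result, inspected})
```

Whether inspect/1 output is appropriate depends on the audience: it is meant for developers, not end users.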

Escape characters

Besides allowing double quotes to be escaped with a backslash, strings also support the following escape characters:

  • \0 - Null byte
  • \a - Bell
  • \b - Backspace
  • \t - Horizontal tab
  • \n - Line feed (New lines)
  • \v - Vertical tab
  • \f - Form feed
  • \r - Carriage return
  • \e - Command Escape
  • \s - Space
  • \# - Returns the # character itself, skipping interpolation
  • \\ - Single backslash
  • \xNN - A byte represented by the hexadecimal NN
  • \uNNNN - A Unicode code point represented by NNNN
  • \u{NNNNNN} - A Unicode code point represented by NNNNNN

Note that using \xNN in Elixir strings is generally not advised, as introducing an invalid byte sequence would make the string invalid. If you have to introduce a character by its hexadecimal representation, it is best to work with Unicode code points, such as \uNNNN. In fact, understanding Unicode code points can be essential when doing low-level manipulations of strings, so let's explore them in detail next.
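To make the difference between \xNN and \uNNNN concrete, here is a small sketch (variable names are illustrative) that writes "é" both ways and checks the result with String.valid?/1:

```elixir
# "\xE9" injects the single byte 0xE9, which on its own is not valid UTF-8.
raw_byte = "\xE9"

# "\u00E9" injects code point U+00E9 ("é"), encoded as two UTF-8 bytes.
code_point = "\u00E9"

IO.inspect(String.valid?(raw_byte))    # false
IO.inspect(String.valid?(code_point))  # true
IO.inspect(byte_size(raw_byte))        # 1
IO.inspect(code_point == <<195, 169>>) # true
```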

Unicode and code points

In order to facilitate meaningful communication between computers across multiple languages, a standard is required so that the ones and zeros on one machine mean the same thing when they are transmitted to another. The Unicode Standard acts as an official registry of virtually all the characters we know: this includes characters from classical and historical texts, emoji, and formatting and control characters as well.

Unicode organizes all of the characters in its repertoire into code charts, and each character is given a unique numerical index. This numerical index is known as a Code Point.

In Elixir you can use a ? in front of a character literal to reveal its code point:

    iex> ?a
    97
    iex> ?ł
    322

Note that most Unicode code charts will refer to a code point by its hexadecimal (hex) representation, e.g. 97 translates to 0061 in hex, and we can represent any Unicode character in an Elixir string by using the \u escape character followed by its code point number:

    iex> "\u0061" === "a"
    true
    iex> 0x0061 = 97 = ?a
    97

The hex representation will also help you look up information about a code point, e.g. https://codepoints.net/U+0061 has a data sheet all about the lower case a, a.k.a. code point 97. Remember you can get the hex representation of a number by calling Integer.to_string/2:

    iex> Integer.to_string(?a, 16)
    "61"

UTF-8 encoded and encodings

Now that we understand what the Unicode standard is and what code points are, we can finally talk about encodings. Whereas the code point is what we store, an encoding deals with how we store it: encoding is an implementation. In other words, we need a mechanism to convert the code point numbers into bytes so they can be stored in memory, written to disk, and such.

Elixir uses UTF-8 to encode its strings, which means that code points are encoded as a series of 8-bit bytes. UTF-8 is a variable width character encoding that uses one to four bytes to store each code point. It is capable of encoding all valid Unicode code points. Let's see an example:

    iex> string = "héllo"
    "héllo"
    iex> String.length(string)
    5
    iex> byte_size(string)
    6

Although the string above has 5 characters, it uses 6 bytes, as two bytes are used to represent the character é.
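The "one to four bytes" rule can be observed directly. The sketch below picks one sample code point per width (the choice of samples is ours) and encodes each with the utf8 binary modifier:

```elixir
# One sample code point for each UTF-8 width: "a", "é", "€" and U+1F600.
code_points = [?a, ?é, ?€, 0x1F600]

sizes =
  for code_point <- code_points do
    # <<x::utf8>> encodes an integer code point as UTF-8 bytes.
    byte_size(<<code_point::utf8>>)
  end

IO.inspect(sizes) # [1, 2, 3, 4]
```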

Grapheme clusters

This module also works with the concept of grapheme clusters (from now on referred to as graphemes). Graphemes can consist of multiple code points that may be perceived as a single character by readers. For example, "é" can be represented either as a single "e with acute" code point, as seen above in the string "héllo", or as the letter "e" followed by a "combining acute accent" (two code points):

    iex> string = "\u0065\u0301"
    "é"
    iex> byte_size(string)
    3
    iex> String.length(string)
    1
    iex> String.codepoints(string)
    ["e", "́"]
    iex> String.graphemes(string)
    ["é"]

Although it looks visually the same as before, the example above is made of two code points that users perceive as a single character.

Graphemes can also be two characters that are interpreted as one by some languages. For example, some languages may consider "ch" as a single character. However, since this information depends on the locale, it is not taken into account by this module.

In general, the functions in this module rely on the Unicode Standard, but do not contain any of the locale-specific behavior. More information about graphemes can be found in the Unicode Standard Annex #29.

For converting a binary to a different encoding and for Unicode normalization mechanisms, see Erlang's :unicode module.
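Because the same grapheme can have more than one byte representation, a plain == comparison may fail for strings that look identical. A short sketch of how normalization addresses this (variable names are illustrative):

```elixir
composed = "\u00E9"          # "é" as a single code point
decomposed = "\u0065\u0301"  # "e" followed by a combining acute accent

# The two binaries differ byte for byte...
IO.inspect(composed == decomposed) # false

# ...but normalizing both to the same form makes them comparable.
IO.inspect(String.normalize(decomposed, :nfc) == composed) # true

# Erlang's :unicode module offers the same conversion.
IO.inspect(:unicode.characters_to_nfc_binary(decomposed) == composed) # true
```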

String and binary operations

To act according to the Unicode Standard, many functions in this module run in linear time, as they need to traverse the whole string considering the proper Unicode code points.

For example, String.length/1 will take longer as the input grows. On the other hand, Kernel.byte_size/1 always runs in constant time (i.e. regardless of the input size).

This means there are often performance costs in using the functions in this module, compared to the lower-level operations that work directly with binaries, such as Kernel.byte_size/1, Kernel.binary_slice/3, and the functions in Erlang's :binary module.

A utf8 modifier is also available inside the binary syntax <<>>. It can be used to match code points out of a binary/string:

    iex> <<eacute::utf8>> = "é"
    iex> eacute
    233

See the Patterns and Guards guide and the documentation for <<>> for more information on binary pattern matching.
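The utf8 modifier can also be combined with a rest::binary tail to walk a whole string. The module below is a hypothetical helper, not part of the String API; it only illustrates the pattern:

```elixir
defmodule CodepointWalk do
  # Hypothetical helper: collects integer code points by matching one
  # UTF-8 encoded code point at a time.
  def to_integers(<<code_point::utf8, rest::binary>>) do
    [code_point | to_integers(rest)]
  end

  def to_integers(<<>>), do: []
end

IO.inspect(CodepointWalk.to_integers("héllo")) # [104, 233, 108, 108, 111]
```

Note that this sketch has no clause for invalid UTF-8, so such input raises a FunctionClauseError instead of being skipped.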

You can also fully convert a string into a list of integer code points, known as "charlists" in Elixir, by calling String.to_charlist/1:

    iex> String.to_charlist("héllo")
    [104, 233, 108, 108, 111]

If you would rather see the underlying bytes of a string, instead of its codepoints, a common trick is to concatenate the null byte <<0>> to it:

    iex> "héllo" <> <<0>>
    <<104, 195, 169, 108, 108, 111, 0>>

Alternatively, you can view a string's binary representation by passing an option to IO.inspect/2:

    IO.inspect("héllo", binaries: :as_binaries)
    #=> <<104, 195, 169, 108, 108, 111>>

Self-synchronization

The UTF-8 encoding is self-synchronizing. This means that if malformed data (i.e., data that is not possible according to the definition of the encoding) is encountered, only one code point needs to be rejected.

This module relies on this behavior to ignore such invalid characters. For example, length/1 will return a correct result even if an invalid code point is fed into it.

In other words, this module expects invalid data to be detected elsewhere, usually when retrieving data from an external source. For example, a driver that reads strings from a database will be responsible for checking the validity of the encoding. String.chunk/2 can be used for breaking a string into valid and invalid parts.
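A sketch of what such a check at the boundary might look like. The rejection policy and the :invalid_utf8 atom are assumptions made for illustration; String.valid?/1 and String.chunk/2 are the functions this module provides:

```elixir
# 0xFF can never appear in valid UTF-8.
input = "ok" <> <<0xFF>> <> "ok"

IO.inspect(String.valid?(input))        # false
IO.inspect(String.chunk(input, :valid)) # ["ok", <<255>>, "ok"]

# One possible policy at the boundary: reject invalid input outright.
checked =
  if String.valid?(input) do
    {:ok, input}
  else
    {:error, :invalid_utf8}
  end

IO.inspect(checked) # {:error, :invalid_utf8}
```

Another policy would be to repair the input with String.replace_invalid/2 rather than reject it.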

Compile binary patterns

Many functions in this module work with patterns. For example, String.split/3 can split a string into multiple strings given a pattern. This pattern can be a string, a list of strings or a compiled pattern:

    iex> String.split("foo bar", " ")
    ["foo", "bar"]
    iex> String.split("foo bar!", [" ", "!"])
    ["foo", "bar", ""]
    iex> pattern = :binary.compile_pattern([" ", "!"])
    iex> String.split("foo bar!", pattern)
    ["foo", "bar", ""]

The compiled pattern is useful when the same match will be done over and over again. Note though that the compiled pattern cannot be stored in a module attribute as the pattern is generated at runtime and does not survive compile time.
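Since a module attribute is not an option, one way to reuse a compiled pattern is to build it once at runtime and pass it to every call. The module and function names below are hypothetical:

```elixir
defmodule LineSplitter do
  # Hypothetical helper: compiles the pattern once per call and reuses
  # it for every line, instead of passing the raw list each time.
  def split_all(lines) do
    pattern = :binary.compile_pattern([" ", "!"])
    Enum.map(lines, fn line -> String.split(line, pattern) end)
  end
end

IO.inspect(LineSplitter.split_all(["foo bar", "a b!"]))
# [["foo", "bar"], ["a", "b", ""]]
```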


Types

codepoint()

@type codepoint() :: t()

A single Unicode code point encoded in UTF-8. It may be one or more bytes.

grapheme()

@type grapheme() :: t()

Multiple code points that may be perceived as a single character by readers.

pattern()

@type pattern() :: t() | [nonempty_binary()] | (compiled_search_pattern :: :binary.cp())

Pattern used in functions like replace/4 and split/3.

It must be a string, a list of non-empty strings, or a compiled search pattern created by :binary.compile_pattern/1.

t()

@type t() :: binary()

A UTF-8 encoded binary.

The types String.t() and binary() are equivalent to analysis tools. For those reading the documentation, however, String.t() implies a UTF-8 encoded binary.

Functions

at(string, position)

@spec at(t(), integer()) :: grapheme() | nil

Returns the grapheme at the position of the given UTF-8 string. If position is greater than the string length, then it returns nil.

Linear Access

This function has to linearly traverse the string. If you want to access a string or a binary in constant time based on the number of bytes, use Kernel.binary_slice/3 or :binary.at/2 instead.

Examples

    iex> String.at("elixir", 0)
    "e"
    iex> String.at("elixir", 1)
    "l"
    iex> String.at("elixir", 10)
    nil
    iex> String.at("elixir", -1)
    "r"
    iex> String.at("elixir", -10)
    nil
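The trade-off between grapheme-based and byte-based access shows up as soon as the string contains a multi-byte character. A brief sketch, using "héllo" as the sample input:

```elixir
string = "héllo"

# Grapheme-based access: traverses the string, returns a valid string.
IO.inspect(String.at(string, 2)) # "l"

# Byte-based access: constant time, but index 2 is the second byte of "é".
IO.inspect(:binary.at(string, 2))      # 169
IO.inspect(binary_slice(string, 2, 1)) # <<169>>
```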

bag_distance(string1, string2)

(since 1.8.0)
@spec bag_distance(t(), t()) :: float()

Computes the bag distance between two strings.

Returns a float value between 0 and 1 representing the bag distance between string1 and string2.

The bag distance is meant to be an efficient approximation of the distance between two strings to quickly rule out strings that are largely different.

The algorithm is outlined in the "String Matching with Metric Trees Using an Approximate Distance" paper by Ilaria Bartolini, Paolo Ciaccia, and Marco Patella.

Examples

    iex> String.bag_distance("abc", "")
    0.0
    iex> String.bag_distance("abcd", "a")
    0.25
    iex> String.bag_distance("abcd", "ab")
    0.5
    iex> String.bag_distance("abcd", "abc")
    0.75
    iex> String.bag_distance("abcd", "abcd")
    1.0
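One way to use the bag distance as a cheap pre-filter is sketched below. The Suggest module, its closest/3 function, and the 0.5 threshold are assumptions made for illustration; the ranking step uses String.jaro_distance/2 from this module:

```elixir
defmodule Suggest do
  # Hypothetical helper: discards candidates whose bag distance score is
  # below the threshold, then ranks the remaining ones by Jaro distance.
  def closest(input, candidates, threshold \\ 0.5) do
    candidates
    |> Enum.filter(fn candidate -> String.bag_distance(input, candidate) >= threshold end)
    |> Enum.max_by(fn candidate -> String.jaro_distance(input, candidate) end, fn -> nil end)
  end
end

IO.inspect(Suggest.closest("lenght", ["length", "last", "first"])) # "length"
IO.inspect(Suggest.closest("zzz", ["abc"]))                        # nil
```

The threshold would need tuning for real data; it is chosen here only so that clearly unrelated candidates are dropped.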

byte_slice(string, start_bytes, size_bytes)

(since 1.17.0)
@spec byte_slice(t(), integer(), non_neg_integer()) :: t()

Returns a substring starting at (or after) start_bytes and of at most the given size_bytes.

This function works on bytes and then adjusts the string to eliminate truncated codepoints. This is useful when you have a string and you need to guarantee it does not exceed a certain amount of bytes.

If the offset is greater than the number of bytes in the string, then it returns "". Similar to String.slice/2, a negative start_bytes will be adjusted to the end of the string (but in bytes).

This function does not guarantee the string won't have invalid codepoints, it only guarantees to remove truncated codepoints immediately at the beginning or the end of the slice.

Examples

Consider the string "héllo". Let's see its representation:

    iex> inspect("héllo", binaries: :as_binaries)
    "<<104, 195, 169, 108, 108, 111>>"

Although the string has 5 characters, it is made of 6 bytes. Now imagine we want to get only the first two bytes. To do so, let's use binary_slice/3, which is unaware of codepoints:

    iex> binary_slice("héllo", 0, 2)
    <<104, 195>>

As you can see, this operation is unsafe and returns an invalid string. That's because we cut the string in the middle of the bytes representing "é". On the other hand, we could use String.slice/3:

    iex> String.slice("héllo", 0, 2)
    "hé"

While the above is correct, it has 3 bytes. If you have a requirement where you need at most 2 bytes, the result would also be invalid. In such scenarios, you can use this function, which will slice the given bytes, but clean up the truncated codepoints:

    iex> String.byte_slice("héllo", 0, 2)
    "h"

Truncated codepoints at the beginning are also cleaned up:

    iex> String.byte_slice("héllo", 2, 3)
    "llo"

Note that, if you want to work on raw bytes, then you must use binary_slice/3 instead.
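A typical use is enforcing a byte limit, for instance for a fixed-size storage field. The Truncate module below is a hypothetical wrapper and assumes Elixir 1.17 or later, where byte_slice/3 is available:

```elixir
defmodule Truncate do
  # Hypothetical helper: keeps at most max_bytes bytes from the start of
  # the string without leaving a truncated code point at the end.
  def to_bytes(string, max_bytes) when is_binary(string) and max_bytes >= 0 do
    String.byte_slice(string, 0, max_bytes)
  end
end

# Each of these characters takes 3 bytes, so a 4-byte limit fits only one.
short = Truncate.to_bytes("日本語", 4)

IO.inspect(short)            # "日"
IO.inspect(byte_size(short)) # 3
```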

capitalize(string, mode \\ :default)

@spec capitalize(t(), :default | :ascii | :greek | :turkic) :: t()

Converts the first character in the given string to uppercase and the remainder to lowercase according to mode.

mode may be :default, :ascii, :greek or :turkic. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii capitalizes only the letters A to Z. :greek includes the context sensitive mappings found in Greek. :turkic properly handles the letter i with the dotless variant.

Also see upcase/2 and downcase/2 for other conversions. If you want a variation of this function that does not lowercase the rest of the string, see Erlang's :string.titlecase/1.

Examples

    iex> String.capitalize("abcd")
    "Abcd"
    iex> String.capitalize("ABCD")
    "Abcd"
    iex> String.capitalize("fin")
    "Fin"
    iex> String.capitalize("olá")
    "Olá"

chunk(string, trait)

@spec chunk(t(), :valid | :printable) :: [t()]

Splits the string into chunks of characters that share a common trait.

The trait can be one of two options:

  • :valid - the string is split into chunks of valid and invalid character sequences

  • :printable - the string is split into chunks of printable and non-printable character sequences

Returns a list of binaries, each of which contains only one kind of character.

If the given string is empty, an empty list is returned.

Examples

    iex> String.chunk(<<?a, ?b, ?c, 0>>, :valid)
    ["abc\0"]
    iex> String.chunk(<<?a, ?b, ?c, 0, 0xFFFF::utf16>>, :valid)
    ["abc\0", <<0xFFFF::utf16>>]
    iex> String.chunk(<<?a, ?b, ?c, 0, 0x0FFFF::utf8>>, :printable)
    ["abc", <<0, 0x0FFFF::utf8>>]

codepoints(string)

@spec codepoints(t()) :: [codepoint()]

Returns a list of code points encoded as strings.

To retrieve code points in their natural integer representation, see to_charlist/1. For details about code points and graphemes, see the String module documentation.

Examples

    iex> String.codepoints("olá")
    ["o", "l", "á"]
    iex> String.codepoints("оптими зации")
    ["о", "п", "т", "и", "м", "и", " ", "з", "а", "ц", "и", "и"]
    iex> String.codepoints("ἅἪῼ")
    ["ἅ", "Ἢ", "ῼ"]
    iex> String.codepoints("\u00e9")
    ["é"]
    iex> String.codepoints("\u0065\u0301")
    ["e", "́"]

contains?(string, contents)

@spec contains?(t(), [t()] | pattern()) :: boolean()

Checks if string contains any of the given contents.

contents can be either a string, a list of strings, or a compiled pattern. If contents is a list, this function will search if any of the strings in contents are part of string.

Searching for a string in a list

If you want to check if string is listed in contents, where contents is a list, use Enum.member?(contents, string) instead.

Examples

    iex> String.contains?("elixir of life", "of")
    true
    iex> String.contains?("elixir of life", ["life", "death"])
    true
    iex> String.contains?("elixir of life", ["death", "mercury"])
    false

The argument can also be a compiled pattern:

    iex> pattern = :binary.compile_pattern(["life", "death"])
    iex> String.contains?("elixir of life", pattern)
    true

An empty string will always match:

    iex> String.contains?("elixir of life", "")
    true
    iex> String.contains?("elixir of life", ["", "other"])
    true

An empty list will never match:

    iex> String.contains?("elixir of life", [])
    false
    iex> String.contains?("", [])
    false

Be aware that this function can match within or across grapheme boundaries. For example, take the grapheme "é" which is made of the characters "e" and the acute accent. The following returns true:

    iex> String.contains?(String.normalize("é", :nfd), "e")
    true

However, if "é" is represented by the single character "e with acute" accent, then it will return false:

    iex> String.contains?(String.normalize("é", :nfc), "e")
    false

downcase(string, mode \\ :default)

@spec downcase(t(), :default | :ascii | :greek | :turkic) :: t()

Converts all characters in the given string to lowercase according to mode.

mode may be :default, :ascii, :greek or :turkic. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii lowercases only the letters A to Z. :greek includes the context sensitive mappings found in Greek. :turkic properly handles the letter i with the dotless variant.

Also see upcase/2 and capitalize/2 for other conversions.

Examples

    iex> String.downcase("ABCD")
    "abcd"
    iex> String.downcase("AB 123 XPTO")
    "ab 123 xpto"
    iex> String.downcase("OLÁ")
    "olá"

The :ascii mode ignores Unicode characters and provides a more performant implementation when you know the string contains only ASCII characters:

    iex> String.downcase("OLÁ", :ascii)
    "olÁ"

The :greek mode properly handles the context sensitive sigma in Greek:

    iex> String.downcase("ΣΣ")
    "σσ"
    iex> String.downcase("ΣΣ", :greek)
    "σς"

And :turkic properly handles the letter i with the dotless variant:

    iex> String.downcase("Iİ")
    "ii̇"
    iex> String.downcase("Iİ", :turkic)
    "ıi"

duplicate(subject, n)

@spec duplicate(t(), non_neg_integer()) :: t()

Returns a string subject repeated n times.

Inlined by the compiler.

Examples

    iex> String.duplicate("abc", 0)
    ""
    iex> String.duplicate("abc", 1)
    "abc"
    iex> String.duplicate("abc", 2)
    "abcabc"

ends_with?(string, suffix)

@spec ends_with?(t(), t() | [t()]) :: boolean()

Returns true if string ends with any of the suffixes given.

suffixes can be either a single suffix or a list of suffixes.

Examples

    iex> String.ends_with?("language", "age")
    true
    iex> String.ends_with?("language", ["youth", "age"])
    true
    iex> String.ends_with?("language", ["youth", "elixir"])
    false

An empty suffix will always match:

    iex> String.ends_with?("language", "")
    true
    iex> String.ends_with?("language", ["", "other"])
    true

equivalent?(string1, string2)

@spec equivalent?(t(), t()) :: boolean()

Returns true if string1 is canonically equivalent to string2.

It performs Normalization Form Canonical Decomposition (NFD) on the strings before comparing them. This function is equivalent to:

    String.normalize(string1, :nfd) == String.normalize(string2, :nfd)

If you plan to compare multiple strings, multiple times in a row, you may normalize them upfront and compare them directly to avoid multiple normalization passes.

Examples

    iex> String.equivalent?("abc", "abc")
    true
    iex> String.equivalent?("man\u0303ana", "mañana")
    true
    iex> String.equivalent?("abc", "ABC")
    false
    iex> String.equivalent?("nø", "nó")
    false
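The advice about normalizing upfront can be sketched as follows. The sample list is made up for illustration; "ma\u00F1ana" is the composed spelling and "man\u0303ana" the decomposed one:

```elixir
names = ["ma\u00F1ana", "man\u0303ana", "hoy"]

# Normalize each string once, then compare with plain ==.
normalized = Enum.map(names, fn name -> String.normalize(name, :nfd) end)

[first, second, third] = normalized

IO.inspect(first == second) # true
IO.inspect(first == third)  # false

# The same idea removes canonically equivalent duplicates.
IO.inspect(length(Enum.uniq(normalized))) # 2
```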

first(string)

@spec first(t()) :: grapheme() | nil

Returns the first grapheme from a UTF-8 string, nil if the string is empty.

Examples

    iex> String.first("elixir")
    "e"
    iex> String.first("եոգլի")
    "ե"
    iex> String.first("")
    nil

graphemes(string)

@spec graphemes(t()) :: [grapheme()]

Returns Unicode graphemes in the string as per the Extended Grapheme Cluster algorithm.

The algorithm is outlined in the Unicode Standard Annex #29, Unicode Text Segmentation.

For details about code points and graphemes, see the String module documentation.

Examples

    iex> String.graphemes("Ńaïve")
    ["Ń", "a", "ï", "v", "e"]
    iex> String.graphemes("\u00e9")
    ["é"]
    iex> String.graphemes("\u0065\u0301")
    ["é"]

jaro_distance(string1, string2)

@spec jaro_distance(t(), t()) :: float()

Computes the Jaro distance (similarity) between two strings.

Returns a float value between 0.0 (equates to no similarity) and 1.0 (is an exact match) representing the Jaro distance between string1 and string2.

The Jaro distance metric is designed and best suited for short strings such as person names. Elixir itself uses this function to provide the "did you mean?" functionality. For instance, when you are calling a function in a module and you have a typo in the function name, we attempt to suggest the most similar function name available, if any, based on the jaro_distance/2 score.

Examples

    iex> String.jaro_distance("Dwayne", "Duane")
    0.8222222222222223
    iex> String.jaro_distance("even", "odd")
    0.0
    iex> String.jaro_distance("same", "same")
    1.0

last(string)

@spec last(t()) :: grapheme() | nil

Returns the last grapheme from a UTF-8 string, nil if the string is empty.

It traverses the whole string to find its last grapheme.

Examples

    iex> String.last("")
    nil
    iex> String.last("elixir")
    "r"
    iex> String.last("եոգլի")
    "ի"

length(string)

@spec length(t()) :: non_neg_integer()

Returns the number of Unicode graphemes in a UTF-8 string.

Examples

    iex> String.length("elixir")
    6
    iex> String.length("եոգլի")
    5

match?(string, regex)

@spec match?(t(), Regex.t()) :: boolean()

Checks if string matches the given regular expression.

Examples

    iex> String.match?("foo", ~r/foo/)
    true
    iex> String.match?("bar", ~r/foo/)
    false

Elixir also provides the text-based match operator =~/2 and the function Regex.match?/2 as alternatives to test strings against regular expressions.

myers_difference(string1, string2)

(since 1.3.0)
@spec myers_difference(t(), t()) :: [{:eq | :ins | :del, t()}]

Returns a keyword list that represents an edit script.

See List.myers_difference/2 for more information.

Examples

    iex> string1 = "fox hops over the dog"
    iex> string2 = "fox jumps over the lazy cat"
    iex> String.myers_difference(string1, string2)
    [eq: "fox ", del: "ho", ins: "jum", eq: "ps over the ", del: "dog", ins: "lazy cat"]
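An edit script is usually consumed by walking its entries in order. The sketch below renders one as inline text; the InlineDiff module and its [-...-] and {+...+} markers are our own convention, not something this module defines:

```elixir
defmodule InlineDiff do
  # Hypothetical renderer: keeps equal parts as is, wraps deletions in
  # [-...-] and insertions in {+...+}.
  def render(old, new) do
    old
    |> String.myers_difference(new)
    |> Enum.map_join(fn
      {:eq, text} -> text
      {:del, text} -> "[-" <> text <> "-]"
      {:ins, text} -> "{+" <> text <> "+}"
    end)
  end
end

IO.puts(InlineDiff.render("fox hops over the dog", "fox jumps over the lazy cat"))
# fox [-ho-]{+jum+}ps over the [-dog-]{+lazy cat+}
```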

next_codepoint(arg)

@spec next_codepoint(t()) :: {codepoint(), t()} | nil

Returns the next code point in a string.

The result is a tuple with the code point and the remainder of the string, or nil in case the string reached its end.

As with other functions in the String module, next_codepoint/1 works with binaries that are invalid UTF-8. If the string starts with a sequence of bytes that is not valid in UTF-8 encoding, the first element of the returned tuple is a binary with the first byte.

Examples

    iex> String.next_codepoint("olá")
    {"o", "lá"}

    iex> invalid = "\x80\x80OK" # first two bytes are invalid in UTF-8
    iex> {_, rest} = String.next_codepoint(invalid)
    {<<128>>, <<128, 79, 75>>}
    iex> String.next_codepoint(rest)
    {<<128>>, "OK"}

Comparison with binary pattern matching

Binary pattern matching provides a similar way to decompose a string:

    iex> <<codepoint::utf8, rest::binary>> = "Elixir"
    "Elixir"
    iex> codepoint
    69
    iex> rest
    "lixir"

though not entirely equivalent because codepoint comes as an integer, and the pattern won't match invalid UTF-8.

Binary pattern matching, however, is simpler and more efficient, so pick the option that better suits your use case.
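Because next_codepoint/1 returns either a {code_point, rest} tuple or nil, it has the shape Stream.unfold/2 expects from its callback. A small sketch of lazily walking a string this way:

```elixir
# Each step yields one code point and continues with the remainder;
# nil at the end of the string stops the stream.
code_points =
  "olá"
  |> Stream.unfold(&String.next_codepoint/1)
  |> Enum.to_list()

IO.inspect(code_points) # ["o", "l", "á"]

# Invalid bytes are yielded one at a time instead of raising.
with_invalid =
  "\x80OK"
  |> Stream.unfold(&String.next_codepoint/1)
  |> Enum.to_list()

IO.inspect(with_invalid) # [<<128>>, "O", "K"]
```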

next_grapheme(string)

@spec next_grapheme(t()) :: {grapheme(), t()} | nil

Returns the next grapheme in a string.

The result is a tuple with the grapheme and the remainder of the string, or nil in case the string reached its end.

Examples

    iex> String.next_grapheme("olá")
    {"o", "lá"}
    iex> String.next_grapheme("")
    nil

next_grapheme_size(string)

@spec next_grapheme_size(t()) :: {pos_integer(), t()} | nil

Returns the size (in bytes) of the next grapheme.

The result is a tuple with the next grapheme size in bytes and the remainder of the string, or nil in case the string reached its end.

Examples

    iex> String.next_grapheme_size("olá")
    {1, "lá"}
    iex> String.next_grapheme_size("")
    nil

normalize(string, form)

@spec normalize(t(), :nfd | :nfc | :nfkd | :nfkc) :: t()

Converts all characters in string to the Unicode normalization form identified by form.

Invalid Unicode codepoints are skipped and the remainder of the string is converted. If you want the algorithm to stop and return on an invalid codepoint, use :unicode.characters_to_nfd_binary/1, :unicode.characters_to_nfc_binary/1, :unicode.characters_to_nfkd_binary/1, and :unicode.characters_to_nfkc_binary/1 instead.

Normalization forms :nfkc and :nfkd should not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets.

Forms

The supported forms are:

  • :nfd - Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.

  • :nfc - Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.

  • :nfkd - Normalization Form Compatibility Decomposition. Characters are decomposed by compatibility equivalence, and multiple combining characters are arranged in a specific order.

  • :nfkc - Normalization Form Compatibility Composition. Characters are decomposed and then recomposed by compatibility equivalence.

Examples

    iex> String.normalize("yêṩ", :nfd)
    "yêṩ"
    iex> String.normalize("leña", :nfc)
    "leña"
    iex> String.normalize("fi", :nfkd)
    "fi"
    iex> String.normalize("fi", :nfkc)
    "fi"
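The warning about :nfkc and :nfkd erasing formatting distinctions can be seen with a superscript digit. The sample string is our own choice:

```elixir
# U+00B2 (superscript two) is compatibility-equivalent to the digit "2".
squared = "x\u00B2"

# Canonical composition leaves the superscript alone.
IO.inspect(String.normalize(squared, :nfc)) # "x²"

# Compatibility composition replaces it with a plain digit.
IO.inspect(String.normalize(squared, :nfkc)) # "x2"

# The original cannot be recovered from the NFKC result.
IO.inspect(String.normalize(squared, :nfkc) == squared) # false
```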

pad_leading(string, count, padding \\ [" "])

@spec pad_leading(t(), non_neg_integer(), t() | [t()]) :: t()

Returns a new string padded with a leading filler which is made of elements from the padding.

Passing a list of strings as padding will take one element of the list for every missing entry. If the list is shorter than the number of inserts, the filling will start again from the beginning of the list. Passing a string padding is equivalent to passing the list of graphemes in it. If no padding is given, it defaults to whitespace.

When count is less than or equal to the length of string, the given string is returned.

Raises ArgumentError if the given padding contains a non-string element.

Examples

    iex> String.pad_leading("abc", 5)
    "  abc"
    iex> String.pad_leading("abc", 4, "12")
    "1abc"
    iex> String.pad_leading("abc", 6, "12")
    "121abc"
    iex> String.pad_leading("abc", 5, ["1", "23"])
    "123abc"

pad_trailing(string, count, padding \\ [" "])

@spec pad_trailing(t(), non_neg_integer(), t() | [t()]) :: t()

Returns a new string padded with a trailing filler which is made of elements from the padding.

Passing a list of strings as padding will take one element of the list for every missing entry. If the list is shorter than the number of inserts, the filling will start again from the beginning of the list. Passing a string padding is equivalent to passing the list of graphemes in it. If no padding is given, it defaults to whitespace.

When count is less than or equal to the length of string, the given string is returned.

Raises ArgumentError if the given padding contains a non-string element.

Examples

    iex> String.pad_trailing("abc", 5)
    "abc  "
    iex> String.pad_trailing("abc", 4, "12")
    "abc1"
    iex> String.pad_trailing("abc", 6, "12")
    "abc121"
    iex> String.pad_trailing("abc", 5, ["1", "23"])
    "abc123"
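Two common uses of the padding functions are zero-padding numbers and aligning text columns. A short sketch with made-up sample data and column widths:

```elixir
# Zero-pad a number on the left to a fixed width.
IO.inspect(String.pad_leading(Integer.to_string(7), 3, "0")) # "007"

# Align two columns: left-align names, right-align counts.
rows = [{"apples", 3}, {"kiwis", 12}]

lines =
  for {name, count} <- rows do
    String.pad_trailing(name, 8) <> String.pad_leading(Integer.to_string(count), 3)
  end

IO.inspect(lines) # ["apples    3", "kiwis    12"]
```

Since count is measured in graphemes, the columns line up by character count, which is not necessarily the same as display width in a terminal.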

printable?(string, character_limit \\ :infinity)

@spec printable?(t(), 0) :: true
@spec printable?(t(), pos_integer() | :infinity) :: boolean()

Checks if a string contains only printable characters up to character_limit.

Takes an optional character_limit as a second argument. If character_limit is 0, this function will return true.

Examples

    iex> String.printable?("abc")
    true
    iex> String.printable?("abc" <> <<0>>)
    false
    iex> String.printable?("abc" <> <<0>>, 2)
    true
    iex> String.printable?("abc" <> <<0>>, 0)
    true

replace(subject, pattern, replacement, options \\ [])

@spec replace(t(),pattern() |Regex.t(),t() | (t() ->t() |iodata()),keyword()) ::t()

Returns a new string created by replacing occurrences ofpattern insubject withreplacement.

Thesubject is always a string.

Thepattern may be a string, a list of strings, a regular expression, or acompiled pattern.

Thereplacement may be a string or a function that receives the matchedpattern and must return the replacement as a string or iodata.

By default it replaces all occurrences but this behavior can be controlledthrough the:global option; see the "Options" section below.

Options

  • :global - (boolean) if true, all occurrences of pattern are replaced with replacement, otherwise only the first occurrence is replaced. Defaults to true

Examples

iex> String.replace("a,b,c", ",", "-")
"a-b-c"
iex> String.replace("a,b,c", ",", "-", global: false)
"a-b,c"

The pattern may also be a list of strings and the replacement may also be a function that receives the matches:

iex> String.replace("a,b,c", ["a", "c"], fn <<char>> -> <<char + 1>> end)
"b,b,d"

When the pattern is a regular expression, one can give \N or \g{N} in the replacement string to access a specific capture in the regular expression:

iex> String.replace("a,b,c", ~r/,(.)/, ",\\1\\g{1}")
"a,bb,cc"

Note that we had to escape the backslash escape character (i.e., we used \\N instead of just \N to escape the backslash; same thing for \\g{N}). By giving \0, one can inject the whole match in the replacement string.

A compiled pattern can also be given:

iex> pattern = :binary.compile_pattern(",")
iex> String.replace("a,b,c", pattern, "[]")
"a[]b[]c"

When an empty string is provided as a pattern, the function will treat it as an implicit empty string between each grapheme and the string will be interspersed. If an empty string is provided as replacement, the subject will be returned:

iex> String.replace("ELIXIR", "", ".")
".E.L.I.X.I.R."
iex> String.replace("ELIXIR", "", "")
"ELIXIR"

Be aware that this function can replace within or across grapheme boundaries. For example, take the grapheme "é" which is made of the characters "e" and the acute accent. The following will replace only the letter "e", moving the accent to the letter "o":

iex> String.replace(String.normalize("é", :nfd), "e", "o")
"ó"

However, if "é" is represented by the single character "e with acute" accent, then it won't be replaced at all:

iex> String.replace(String.normalize("é", :nfc), "e", "o")
"é"
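As a further illustration of the function and capture forms described above, here is a minimal sketch. The module ReplaceSketch and its two helpers are hypothetical names used only for this example; they are not part of the String module.

```elixir
defmodule ReplaceSketch do
  # Hypothetical helper: replaces every run of digits with "#" characters,
  # one "#" per digit. The function receives the whole regex match.
  def mask_digits(text) do
    String.replace(text, ~r/\d+/, fn digits ->
      String.duplicate("#", String.length(digits))
    end)
  end

  # Hypothetical helper: turns "key=value" into "value=key" by referring
  # to the two captures with \\1 and \\2 in the replacement string.
  def swap_pairs(text) do
    String.replace(text, ~r/(\w+)=(\w+)/, "\\2=\\1")
  end
end

IO.inspect(ReplaceSketch.mask_digits("card 1234, pin 99"))
IO.inspect(ReplaceSketch.swap_pairs("a=1,b=2"))
```

The function form is useful when the replacement depends on the matched text; the capture form is enough when the match only needs to be rearranged.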

replace_invalid(bytes, replacement \\ "�")

(since 1.16.0)
@spec replace_invalid(binary(), t()) :: t()

Returns a new string created by replacing all invalid bytes with replacement ("�" by default).

Examples

iex> String.replace_invalid("asd" <> <<0xFF::8>>)
"asd�"
iex> String.replace_invalid("nem rán bề bề")
"nem rán bề bề"
iex> String.replace_invalid("nem rán b" <> <<225, 187>> <> " bề")
"nem rán b� bề"
iex> String.replace_invalid("nem rán b" <> <<225, 187>> <> " bề", "ERROR!")
"nem rán bERROR! bề"
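One possible use is cleaning up bytes from an external source before treating them as a string. The sketch below assumes Elixir 1.16 or later; the module SanitizeSketch and the choice of "?" as the replacement are illustrative assumptions only.

```elixir
defmodule SanitizeSketch do
  # Hypothetical helper: returns the input unchanged when it is already
  # valid UTF-8, otherwise replaces each invalid sequence with "?".
  def to_valid_string(bytes) when is_binary(bytes) do
    if String.valid?(bytes) do
      bytes
    else
      String.replace_invalid(bytes, "?")
    end
  end
end

IO.inspect(SanitizeSketch.to_valid_string("a" <> <<0xFF>> <> "b"))
```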

replace_leading(string, match, replacement)

@spec replace_leading(t(), t(), t()) :: t()

Replaces all leading occurrences of match in string with replacement.

Returns the string untouched if there are no occurrences.

If match is "", this function raises an ArgumentError exception: this happens because this function replaces all the occurrences of match at the beginning of string, and it's impossible to replace "multiple" occurrences of "".

Examples

iex> String.replace_leading("hello world", "hello ", "")
"world"
iex> String.replace_leading("hello hello world", "hello ", "")
"world"
iex> String.replace_leading("hello world", "hello ", "ola ")
"ola world"
iex> String.replace_leading("hello hello world", "hello ", "ola ")
"ola ola world"

This function can replace across grapheme boundaries. See replace/3 for more information and examples.

replace_prefix(string, match, replacement)

@spec replace_prefix(t(), t(), t()) :: t()

Replaces prefix in string by replacement if it matches match.

Returns the string untouched if there is no match. If match is an empty string (""), replacement is just prepended to string.

Examples

iex> String.replace_prefix("world", "hello ", "")
"world"
iex> String.replace_prefix("hello world", "hello ", "")
"world"
iex> String.replace_prefix("hello hello world", "hello ", "")
"hello world"
iex> String.replace_prefix("world", "hello ", "ola ")
"world"
iex> String.replace_prefix("hello world", "hello ", "ola ")
"ola world"
iex> String.replace_prefix("hello hello world", "hello ", "ola ")
"ola hello world"
iex> String.replace_prefix("world", "", "hello ")
"hello world"

This function can replace across grapheme boundaries. See replace/3 for more information and examples.
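The difference between replace_prefix/3 and replace_leading/3 shows up when the match is repeated at the start of the string. A small sketch follows; the module PrefixSketch and its function names are hypothetical.

```elixir
defmodule PrefixSketch do
  # Removes at most one leading "/" (replace_prefix/3 matches once).
  def strip_one(path), do: String.replace_prefix(path, "/", "")

  # Removes every leading "/" (replace_leading/3 matches repeatedly).
  def strip_all(path), do: String.replace_leading(path, "/", "")
end

IO.inspect(PrefixSketch.strip_one("//tmp"))
IO.inspect(PrefixSketch.strip_all("//tmp"))
```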

replace_suffix(string, match, replacement)

@spec replace_suffix(t(), t(), t()) :: t()

Replaces suffix in string by replacement if it matches match.

Returns the string untouched if there is no match. If match is an empty string (""), replacement is just appended to string.

Examples

iex> String.replace_suffix("hello", " world", "")
"hello"
iex> String.replace_suffix("hello world", " world", "")
"hello"
iex> String.replace_suffix("hello world world", " world", "")
"hello world"
iex> String.replace_suffix("hello", " world", " mundo")
"hello"
iex> String.replace_suffix("hello world", " world", " mundo")
"hello mundo"
iex> String.replace_suffix("hello world world", " world", " mundo")
"hello world mundo"
iex> String.replace_suffix("hello", "", " world")
"hello world"

This function can replace across grapheme boundaries. See replace/3 for more information and examples.

replace_trailing(string, match, replacement)

@spec replace_trailing(t(), t(), t()) :: t()

Replaces all trailing occurrences of match by replacement in string.

Returns the string untouched if there are no occurrences.

If match is "", this function raises an ArgumentError exception: this happens because this function replaces all the occurrences of match at the end of string, and it's impossible to replace "multiple" occurrences of "".

Examples

iex> String.replace_trailing("hello world", " world", "")
"hello"
iex> String.replace_trailing("hello world world", " world", "")
"hello"
iex> String.replace_trailing("hello world", " world", " mundo")
"hello mundo"
iex> String.replace_trailing("hello world world", " world", " mundo")
"hello mundo mundo"

This function can replace across grapheme boundaries. See replace/3 for more information and examples.

reverse(string)

@spec reverse(t()) :: t()

Reverses the graphemes in the given string.

Examples

iex> String.reverse("abcd")
"dcba"
iex> String.reverse("hello world")
"dlrow olleh"
iex> String.reverse("hello ∂og")
"go∂ olleh"

Keep in mind reversing the same string twice does not necessarily yield the original string:

iex> "̀e"
"̀e"
iex> String.reverse("̀e")
"è"
iex> String.reverse(String.reverse("̀e"))
"è"

In the first example the accent is before the vowel, so it is considered two graphemes. However, when you reverse it once, you have the vowel followed by the accent, which becomes one grapheme. Reversing it again will keep it as one single grapheme.
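To see why reversal works on graphemes rather than code points, the following sketch compares String.reverse/1 with a naive code point reversal. The module ReverseSketch is a hypothetical name used for illustration.

```elixir
defmodule ReverseSketch do
  # Grapheme-aware reversal, as provided by the String module.
  def by_graphemes(string), do: String.reverse(string)

  # Naive reversal over code points: a combining mark ends up
  # attached to a different base character.
  def by_codepoints(string) do
    string |> String.codepoints() |> Enum.reverse() |> Enum.join()
  end
end

# "e" followed by a combining acute accent (U+0301), then "x".
input = "e\u0301x"

IO.inspect(ReverseSketch.by_graphemes(input) == "xe\u0301")
IO.inspect(ReverseSketch.by_codepoints(input) == "x\u0301e")
```

In the code point version the accent moves from "e" to "x", which is usually not what is wanted for user-visible text.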

slice(string, range)

@spec slice(t(), Range.t()) :: t()

Returns a substring from the offset given by the start of the range to the offset given by the end of the range.

This function works on Unicode graphemes. For example, slicing the first three characters of the string "héllo" will return "hél", which internally is represented by more than three bytes. Use String.byte_slice/3 if you want to slice by a given number of bytes, while respecting the codepoint boundaries. If you want to work on raw bytes, check Kernel.binary_part/3 or Kernel.binary_slice/3 instead.

If the start of the range is not a valid offset for the given string or if the range is in reverse order, returns "".

If the start or end of the range is negative, the whole string is traversed first in order to convert the negative indices into positive ones.

Examples

iex> String.slice("elixir", 1..3)
"lix"
iex> String.slice("elixir", 1..10)
"lixir"
iex> String.slice("elixir", -4..-1)
"ixir"
iex> String.slice("elixir", -4..6)
"ixir"
iex> String.slice("elixir", -100..100)
"elixir"

For ranges where start > stop, you need to explicitly mark them as increasing:

iex> String.slice("elixir", 2..-1//1)
"ixir"
iex> String.slice("elixir", 1..-2//1)
"lixi"

You can use ../0 as a shortcut for 0..-1//1, which returns the whole string as is:

iex> String.slice("elixir", ..)
"elixir"

The step can be any positive number. For example, to get every 2 characters of the string:

iex> String.slice("elixir", 0..-1//2)
"eii"

If the first position is after the string ends or after the last position of the range, it returns an empty string:

iex> String.slice("elixir", 10..3//1)
""
iex> String.slice("a", 1..1500)
""

slice(string, start, length)

@spec slice(t(), integer(), non_neg_integer()) :: grapheme()

Returns a substring starting at the offset start, and of the given length.

This function works on Unicode graphemes. For example, slicing the first three characters of the string "héllo" will return "hél", which internally is represented by more than three bytes. Use String.byte_slice/3 if you want to slice by a given number of bytes, while respecting the codepoint boundaries. If you want to work on raw bytes, check Kernel.binary_part/3 or Kernel.binary_slice/3 instead.

If the offset is greater than the string length, then it returns "".

Examples

iex> String.slice("elixir", 1, 3)
"lix"
iex> String.slice("elixir", 1, 10)
"lixir"
iex> String.slice("elixir", 10, 3)
""

If the start position is negative, it is normalized against the string length and clamped to 0:

iex> String.slice("elixir", -4, 4)
"ixir"
iex> String.slice("elixir", -10, 3)
"eli"

If start is more than the string length, an empty string is returned:

iex> String.slice("elixir", 10, 1500)
""
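Because slice/3 counts graphemes, it can be used to shorten text for display without cutting a character in half. The following is a minimal sketch; the module TruncateSketch, the function truncate/2 and the "…" suffix are assumptions made for this example.

```elixir
defmodule TruncateSketch do
  # Hypothetical helper: returns text unchanged when it has at most
  # `max` graphemes, otherwise keeps `max - 1` graphemes and appends "…".
  def truncate(text, max) when is_integer(max) and max > 0 do
    if String.length(text) <= max do
      text
    else
      String.slice(text, 0, max - 1) <> "…"
    end
  end
end

IO.inspect(TruncateSketch.truncate("héllo wörld", 6))
```

Note that both String.length/1 and slice/3 traverse the string, so for very large inputs a byte-based approach such as String.byte_slice/3 may be preferable.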

split(binary)

@spec split(t()) :: [t()]

Divides a string into substrings at each Unicode whitespace occurrence with leading and trailing whitespace ignored.

Groups of whitespace are treated as a single occurrence. Divisions do not occur on non-breaking whitespace.

Examples

iex> String.split("foo bar")
["foo", "bar"]
iex> String.split("foo" <> <<194, 133>> <> "bar")
["foo", "bar"]
iex> String.split(" foo   bar ")
["foo", "bar"]
iex> String.split("no\u00a0break")
["no\u00a0break"]

Removes empty strings, like when using trim: true in String.split/3.

iex> String.split(" ")
[]

split(string, pattern, options \\ [])

@spec split(t(), pattern() | Regex.t(), keyword()) :: [t()]

Divides a string into parts based on a pattern.

Returns a list of these parts.

The pattern may be a string, a list of strings, a regular expression, or a compiled pattern.

The string is split into as many parts as possible by default, but can be controlled via the :parts option.

Empty strings are only removed from the result if the :trim option is set to true.

When the pattern used is a regular expression, the string is split using Regex.split/3.

If the pattern cannot be found, a list containing the original string will be returned.

Options

  • :parts (positive integer or :infinity) - the string is split into at most as many parts as this option specifies. If :infinity, the string will be split into all possible parts. Defaults to :infinity.

  • :trim (boolean) - if true, empty strings are removed from the resulting list.

This function also accepts all options accepted by Regex.split/3 if pattern is a regular expression.

Examples

Splitting with a string pattern:

iex> String.split("a,b,c", ",")
["a", "b", "c"]
iex> String.split("a,b,c", ",", parts: 2)
["a", "b,c"]
iex> String.split(" a b c ", " ", trim: true)
["a", "b", "c"]

A list of patterns:

iex> String.split("1,2 3,4", [" ", ","])
["1", "2", "3", "4"]

A regular expression:

iex> String.split("a,b,c", ~r{,})
["a", "b", "c"]
iex> String.split("a,b,c", ~r{,}, parts: 2)
["a", "b,c"]
iex> String.split(" a b c ", ~r{\s}, trim: true)
["a", "b", "c"]
iex> String.split("abc", ~r{b}, include_captures: true)
["a", "b", "c"]

A compiled pattern:

iex> pattern = :binary.compile_pattern([" ", ","])
iex> String.split("1,2 3,4", pattern)
["1", "2", "3", "4"]

Splitting on empty string returns graphemes:

iex> String.split("abc", "")
["", "a", "b", "c", ""]
iex> String.split("abc", "", trim: true)
["a", "b", "c"]
iex> String.split("abc", "", parts: 1)
["abc"]
iex> String.split("abc", "", parts: 3)
["", "a", "bc"]

Splitting on a non-existing pattern returns the original string:

iex> String.split("abc", ",")
["abc"]

Be aware that this function can split within or across grapheme boundaries. For example, take the grapheme "é" which is made of the characters "e" and the acute accent. The following will split the string into two parts:

iex> String.split(String.normalize("é", :nfd), "e")
["", "́"]

However, if "é" is represented by the single character "e with acute" accent, then it will split the string into just one part:

iex> String.split(String.normalize("é", :nfc), "e")
["é"]

When using both the :trim and the :parts option, the empty values are removed as the parts are computed (if any). No trimming happens after all parts are computed:

iex> String.split(" a  b  c  ", " ", trim: true, parts: 2)
["a", " b  c  "]
iex> String.split(" a  b  c  ", " ", trim: true, parts: 3)
["a", "b", " c  "]
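The :parts option is handy when only the first separator is meaningful. The sketch below parses a "key = value" line where the value may itself contain "="; the module SplitSketch and the function parse_pair/1 are hypothetical names.

```elixir
defmodule SplitSketch do
  # Hypothetical helper: splits on the first "=" only, so any later "="
  # stays inside the value. Returns :error when there is no "=".
  def parse_pair(line) do
    case String.split(line, "=", parts: 2) do
      [key, value] -> {:ok, {String.trim(key), String.trim(value)}}
      _ -> :error
    end
  end
end

IO.inspect(SplitSketch.parse_pair("url = http://x?a=b"))
IO.inspect(SplitSketch.parse_pair("novalue"))
```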

split_at(string, position)

@spec split_at(t(), integer()) :: {t(), t()}

Splits a string into two at the specified offset. When the offset given is negative, location is counted from the end of the string.

The offset is capped to the length of the string. Returns a tuple with two elements.

Linear Access

This function splits on graphemes and therefore has to linearly traverse the string. If you want to split a string or a binary based on the number of bytes, use Kernel.binary_part/3 instead.

Examples

iex> String.split_at("sweetelixir", 5)
{"sweet", "elixir"}
iex> String.split_at("sweetelixir", -6)
{"sweet", "elixir"}
iex> String.split_at("abc", 0)
{"", "abc"}
iex> String.split_at("abc", 1000)
{"abc", ""}
iex> String.split_at("abc", -1000)
{"", "abc"}

splitter(string, pattern, options \\ [])

@spec splitter(t(), pattern(), keyword()) :: Enumerable.t()

Returns an enumerable that splits a string on demand.

This is in contrast to split/3 which splits the entire string upfront.

This function does not support regular expressions by design. When using regular expressions, it is often more efficient to have the regular expressions traverse the string at once than in parts, like this function does.

Options

  • :trim - when true, does not emit empty patterns

Examples

iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", [" ", ","]) |> Enum.take(4)
["1", "2", "3", "4"]
iex> String.splitter("abcd", "") |> Enum.take(10)
["", "a", "b", "c", "d", ""]
iex> String.splitter("abcd", "", trim: true) |> Enum.take(10)
["a", "b", "c", "d"]

A compiled pattern can also be given:

iex> pattern = :binary.compile_pattern([" ", ","])
iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", pattern) |> Enum.take(4)
["1", "2", "3", "4"]
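Since the result is lazy, a search can stop as soon as a match is found without splitting the rest of the string. A minimal sketch, with SplitterSketch and first_matching/2 as hypothetical names:

```elixir
defmodule SplitterSketch do
  # Hypothetical helper: returns the first comma-separated field that
  # starts with `prefix`, or nil. Fields after the match are never split.
  def first_matching(line, prefix) do
    line
    |> String.splitter(",", trim: true)
    |> Enum.find(&String.starts_with?(&1, prefix))
  end
end

IO.inspect(SplitterSketch.first_matching("alpha,beta,,betamax", "be"))
```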

starts_with?(string, prefix)

@spec starts_with?(t(), t() | [t()]) :: boolean()

Returns true if string starts with any of the prefixes given.

prefix can be either a string, a list of strings, or a compiled pattern.

Examples

iex> String.starts_with?("elixir", "eli")
true
iex> String.starts_with?("elixir", ["erlang", "elixir"])
true
iex> String.starts_with?("elixir", ["erlang", "ruby"])
false

An empty string will always match:

iex> String.starts_with?("elixir", "")
true
iex> String.starts_with?("elixir", ["", "other"])
true

An empty list will never match:

iex> String.starts_with?("elixir", [])
false
iex> String.starts_with?("", [])
false

to_atom(string)

@spec to_atom(t()) :: atom()

Converts a string to an existing atom or creates a new one.

Warning: this function creates atoms dynamically and atoms are not garbage-collected. Therefore, string should not be an untrusted value, such as input received from a socket or during a web request. Consider using to_existing_atom/1 instead.

By default, the maximum number of atoms is 1_048_576. This limit can be raised or lowered using the VM option +t.

The maximum atom size is 255 Unicode code points.

Inlined by the compiler.

Examples

iex> String.to_atom("my_atom")
:my_atom

to_charlist(string)

@spec to_charlist(t()) :: charlist()

Converts a string into a charlist.

Specifically, this function takes a UTF-8 encoded binary and returns a list of its integer code points. It is similar to codepoints/1 except that the latter returns a list of code points as strings.

In case you need to work with bytes, take a look at the :binary module.

Examples

iex> String.to_charlist("foo")
~c"foo"

to_existing_atom(string)

@spec to_existing_atom(t()) :: atom()

Converts a string to an existing atom or raises if the atom does not exist.

The maximum atom size is 255 Unicode code points. Raises an ArgumentError if the atom does not exist.

Inlined by the compiler.

Atoms and modules

Since Elixir is a compiled language, the atoms defined in a module will only exist after said module is loaded, which typically happens whenever a function in the module is executed. Therefore, it is generally recommended to call String.to_existing_atom/1 only to convert atoms defined within the module making the function call to to_existing_atom/1.

To create a module name itself from a string safely, it is recommended to use Module.safe_concat/1.

Examples

iex> _ = :my_atom
iex> String.to_existing_atom("my_atom")
:my_atom
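Following the recommendation above, one way to convert untrusted input is to keep the expected atoms in the same module that performs the conversion and to rescue the ArgumentError. This is only a sketch; the module AtomSketch, the @allowed list and parse_order/1 are hypothetical.

```elixir
defmodule AtomSketch do
  # The allowed atoms are defined in this module, so they exist
  # by the time parse_order/1 runs.
  @allowed [:asc, :desc]

  # Hypothetical helper: never creates new atoms. Unknown strings raise
  # ArgumentError inside to_existing_atom/1, which is turned into :error.
  def parse_order(input) when is_binary(input) do
    atom = String.to_existing_atom(input)
    if atom in @allowed, do: {:ok, atom}, else: :error
  rescue
    ArgumentError -> :error
  end
end

IO.inspect(AtomSketch.parse_order("asc"))
IO.inspect(AtomSketch.parse_order("true"))
```

The membership check matters because an atom may exist for unrelated reasons (such as :true) without being a valid value for this particular input.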

to_float(string)

@spec to_float(t()) :: float()

Returns a float whose text representation is string.

string must be the string representation of a float including leading digits and a decimal point. To parse a string without decimal point as a float, refer to Float.parse/1. Otherwise, an ArgumentError will be raised.

Inlined by the compiler.

Examples

iex> String.to_float("2.2017764e+0")
2.2017764
iex> String.to_float("3.0")
3.0

String.to_float("3")
** (ArgumentError) argument error

String.to_float(".3")
** (ArgumentError) argument error
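When the input may lack a decimal point or may be malformed, Float.parse/1 can be used instead of raising. The following is a sketch; the module NumberSketch and parse_number/1 are hypothetical names, and rejecting trailing text is a design choice of this example rather than something Float.parse/1 does on its own.

```elixir
defmodule NumberSketch do
  # Hypothetical helper: accepts "3" as well as "3.0", and returns :error
  # when the string is not a number or has trailing characters.
  def parse_number(input) when is_binary(input) do
    case Float.parse(input) do
      {value, ""} -> {:ok, value}
      _ -> :error
    end
  end
end

IO.inspect(NumberSketch.parse_number("3"))
IO.inspect(NumberSketch.parse_number("3.5kg"))
```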

to_integer(string)

@spec to_integer(t()) :: integer()

Returns an integer whose text representation is string.

string must be the string representation of an integer. Otherwise, an ArgumentError will be raised. If you want to parse a string that may contain an ill-formatted integer, use Integer.parse/1.

Inlined by the compiler.

Examples

iex> String.to_integer("123")
123

Passing a string that does not represent an integer leads to an error:

String.to_integer("invalid data")
** (ArgumentError) argument error

to_integer(string, base)

@spec to_integer(t(), 2..36) :: integer()

Returns an integer whose text representation is string in base base.

Inlined by the compiler.

Examples

iex> String.to_integer("3FF", 16)
1023
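As an illustration of base 16 parsing, the sketch below converts a "#RRGGBB" color into a tuple of integers. The module HexColorSketch and its parse/1 function are hypothetical, and the function is deliberately strict: it only matches 7-byte strings starting with "#".

```elixir
defmodule HexColorSketch do
  # Hypothetical helper: binary pattern matching extracts the three
  # two-character pairs; String.to_integer/2 raises ArgumentError
  # if a pair is not valid hexadecimal.
  def parse(<<"#", r::binary-size(2), g::binary-size(2), b::binary-size(2)>>) do
    {String.to_integer(r, 16), String.to_integer(g, 16), String.to_integer(b, 16)}
  end
end

IO.inspect(HexColorSketch.parse("#FF8800"))
```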

trim(string)

@spec trim(t()) :: t()

Returns a string where all leading and trailing Unicode whitespaces have been removed.

Examples

iex> String.trim("\n  abc\n  ")
"abc"

trim(string, to_trim)

@spec trim(t(), t()) :: t()

Returns a string where all leading and trailing to_trim characters have been removed.

Examples

iex> String.trim("a  abc  a", "a")
"  abc  "

trim_leading(string)

@spec trim_leading(t()) :: t()

Returns a string where all leading Unicode whitespaces have been removed.

Examples

iex> String.trim_leading("\n  abc   ")
"abc   "

trim_leading(string, to_trim)

@spec trim_leading(t(), t()) :: t()

Returns a string where all leading to_trim characters have been removed.

Examples

iex> String.trim_leading("__ abc _", "_")
" abc _"
iex> String.trim_leading("1 abc", "11")
"1 abc"

trim_trailing(string)

@spec trim_trailing(t()) :: t()

Returns a string where all trailing Unicode whitespaces have been removed.

Examples

iex> String.trim_trailing("   abc\n  ")
"   abc"

trim_trailing(string, to_trim)

@spec trim_trailing(t(), t()) :: t()

Returns a string where all trailing to_trim characters have been removed.

Examples

iex> String.trim_trailing("_ abc __", "_")
"_ abc "
iex> String.trim_trailing("abc 1", "11")
"abc 1"

upcase(string, mode \\ :default)

@spec upcase(t(), :default | :ascii | :greek | :turkic) :: t()

Converts all characters in the given string to uppercase according to mode.

mode may be :default, :ascii, :greek or :turkic. The :default mode considers all non-conditional transformations outlined in the Unicode standard. :ascii uppercases only the letters a to z. :greek includes the context sensitive mappings found in Greek. :turkic properly handles the letter i with the dotless variant.

Examples

iex> String.upcase("abcd")
"ABCD"
iex> String.upcase("ab 123 xpto")
"AB 123 XPTO"
iex> String.upcase("olá")
"OLÁ"

The :ascii mode ignores Unicode characters and provides a more performant implementation when you know the string contains only ASCII characters:

iex> String.upcase("olá", :ascii)
"OLá"

And :turkic properly handles the letter i with the dotless variant:

iex> String.upcase("ıi")
"II"
iex> String.upcase("ıi", :turkic)
"Iİ"

Also see downcase/2 and capitalize/2 for other conversions.

valid?(string, algorithm \\ :default)

@spec valid?(t(), :default | :fast_ascii) :: boolean()

Checks whether string contains only valid characters.

algorithm may be :default or :fast_ascii. Both algorithms are equivalent from a validation perspective (they will always produce the same output), but :fast_ascii can yield significant performance benefits in specific scenarios.

If all of the following conditions are true, you may want to experiment with the :fast_ascii algorithm to see if it yields performance benefits in your specific scenario:

  • You are running Erlang/OTP 26 or newer on a 64 bit platform
  • You expect most of your strings to be longer than ~64 bytes
  • You expect most of your strings to contain mostly ASCII codepoints

Note that the :fast_ascii algorithm does not affect correctness; you can expect the output of String.valid?/2 to be the same regardless of algorithm. The only difference to be expected is one of performance, which can be expected to improve roughly linearly in string length compared to the :default algorithm.

Examples

iex> String.valid?("a")
true
iex> String.valid?("ø")
true
iex> String.valid?(<<0xFFFF::16>>)
false
iex> String.valid?(<<0xEF, 0xB7, 0x90>>)
true
iex> String.valid?("asd" <> <<0xFFFF::16>>)
false
iex> String.valid?("a", :fast_ascii)
true
iex> String.valid?(4)
** (FunctionClauseError) no function clause matching in String.valid?/2
