
Commit fb2eb6a: doc (1 parent: b1825ec)

1 file changed (202 additions, 3 deletions): `docs/compute-engine/97-reference-strings.md`
`@@ -3,12 +3,211 @@ title: Strings`
slug: /compute-engine/reference/strings/
---

A string is a sequence of characters such as <span style={{fontSize: "1.2rem"}}>`"Hello, 🌍!"`</span> or <span style={{fontSize: "1.2rem"}}>`"Simplify(👨‍🚀 + ⚡️) → 👨‍🎤"`</span>.

In the Compute Engine, strings are composed of encoding-independent Unicode
characters. Those characters can be accessed through several Unicode
representations: UTF-8 and UTF-16 code units, Unicode scalars, and grapheme
clusters.

Strings are not treated as collections, because the concept of a "character"
is inherently ambiguous: a single user-perceived character (a grapheme
cluster) may consist of multiple Unicode scalars, and each scalar is in turn
encoded as a different number of code units in UTF-8, UTF-16, or UTF-32. To
avoid confusion and ensure consistent behavior, strings must be explicitly
converted to a sequence of grapheme clusters or Unicode scalars (with
`GraphemeClusters` or `UnicodeScalars`) when individual elements need to be
accessed.
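The ambiguity shows up in plain JavaScript as well. The TypeScript sketch below uses only standard JavaScript APIs, not the Compute Engine API, to count the "characters" in 👩‍🎓 three ways. The grapheme count assumes a runtime that provides `Intl.Segmenter` (for example, a recent Node.js or browser).

```typescript
// Three different answers to "how many characters does this string contain?"
// U+1F469 WOMAN, U+200D ZERO WIDTH JOINER, U+1F393 GRADUATION CAP, rendered as 👩‍🎓
const student = "\u{1F469}\u200D\u{1F393}";

// JavaScript strings are stored as UTF-16 code units.
const utf16Length: number = student.length;

// Array.from iterates by code point, so surrogate pairs stay together.
const scalarCount: number = Array.from(student).length;

// Intl.Segmenter with "grapheme" granularity yields user-perceived characters.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemeCount: number = Array.from(segmenter.segment(student)).length;

console.log(utf16Length, scalarCount, graphemeCount); // 5 3 1
```

None of the three counts is "wrong"; they answer different questions, which is why the Compute Engine asks you to choose a representation explicitly.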

<nav className="hidden">
### String
</nav>

<FunctionDefinition name="String">

<Signature name="String" returns="string">any*</Signature>

Return a string created by joining the arguments, after converting each argument to its default string representation.

```json example
["String", "Hello", ", ", "🌍", "!"]
// ➔ "Hello, 🌍!"

["String", 42, " is the answer"]
// ➔ "42 is the answer"
```
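As a rough analogy only, joining resembles concatenating the JavaScript string form of each argument. The sketch below assumes the arguments are plain JavaScript primitives; `joinAsString` is a hypothetical helper, and the Compute Engine's own "default string representation" of an expression may differ from JavaScript's `String()` conversion.

```typescript
// Hypothetical helper, not a Compute Engine API: join arguments after
// converting each one with JavaScript's String() conversion.
function joinAsString(...args: Array<string | number | boolean>): string {
  // String(42) is "42", String(true) is "true"; strings pass through unchanged.
  return args.map((arg) => String(arg)).join("");
}

console.log(joinAsString("Hello", ", ", "🌍", "!")); // Hello, 🌍!
console.log(joinAsString(42, " is the answer")); // 42 is the answer
```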

</FunctionDefinition>

<nav className="hidden">
### StringFrom
</nav>

<FunctionDefinition name="StringFrom">

<Signature name="StringFrom" returns="string">any, _format_:string?</Signature>

Convert the argument to a string, using the specified _format_ to interpret it.

| _format_ | Description |
| :--- | :--- |
| `utf-8` | The argument is a list of UTF-8 code units (bytes) |
| `utf-16` | The argument is a list of UTF-16 code units |
| `unicode-scalars` | The argument is a list of Unicode scalars (same as UTF-32) |

For example, the first three lists below encode the same character, 🔟 (`U+1F51F`), in each format:

```json example
["StringFrom", [240, 159, 148, 159], "utf-8"]
// ➔ "🔟"

["StringFrom", [55357, 56607], "utf-16"]
// ➔ "🔟"

["StringFrom", [128287], "unicode-scalars"]
// ➔ "🔟"

["StringFrom", [127467, 127479], "unicode-scalars"]
// ➔ "🇫🇷"
```
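The three formats differ only in how code units are turned back into scalars. The following TypeScript sketch decodes each format with standard JavaScript string APIs; it is an illustration, not the Compute Engine's implementation. `stringFrom` and `decodeUtf8` are hypothetical names, and the decoder assumes well-formed input (it does not reject truncated or overlong sequences).

```typescript
type Format = "utf-8" | "utf-16" | "unicode-scalars";

// Decode UTF-8 bytes into Unicode scalars.
// Assumes well-formed input: no validation of truncated or overlong sequences.
function decodeUtf8(bytes: number[]): number[] {
  const scalars: number[] = [];
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    // The lead byte's high bits give the sequence length (1 to 4 bytes).
    const length = lead < 0x80 ? 1 : lead < 0xe0 ? 2 : lead < 0xf0 ? 3 : 4;
    // Keep only the payload bits of the lead byte.
    let scalar = length === 1 ? lead : lead & (0xff >> (length + 1));
    // Each continuation byte (10xxxxxx) adds 6 more payload bits.
    for (let k = 1; k < length; k++) {
      scalar = (scalar << 6) | (bytes[i + k] & 0x3f);
    }
    scalars.push(scalar);
    i += length;
  }
  return scalars;
}

// Hypothetical counterpart of StringFrom for the three formats in the table.
function stringFrom(units: number[], format: Format): string {
  switch (format) {
    case "utf-8":
      return String.fromCodePoint(...decodeUtf8(units));
    case "utf-16":
      // JavaScript strings are UTF-16, so each unit is one char code.
      return String.fromCharCode(...units);
    case "unicode-scalars":
      return String.fromCodePoint(...units);
    default:
      throw new Error(`Unsupported format: ${format}`);
  }
}

console.log(stringFrom([240, 159, 148, 159], "utf-8")); // 🔟
console.log(stringFrom([55357, 56607], "utf-16")); // 🔟
console.log(stringFrom([127467, 127479], "unicode-scalars")); // 🇫🇷
```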

</FunctionDefinition>

<nav className="hidden">
### Utf8
</nav>

<FunctionDefinition name="Utf8">
<Signature name="Utf8" returns="list<integer>">string</Signature>

Return the list of UTF-8 code units (bytes) that encode _string_.

Note: The values returned are UTF-8 bytes between 0 and 255, not Unicode scalar values. Each scalar is encoded as 1 to 4 bytes.

```json example
["Utf8", "Hello"]
// ➔ [72, 101, 108, 108, 111]

["Utf8", "👩‍🎓"]
// ➔ [240, 159, 145, 169, 226, 128, 141, 240, 159, 142, 147]
```
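To see where these bytes come from, the TypeScript sketch below implements the UTF-8 encoding scheme by hand: each scalar carries 7, 11, 16, or 21 payload bits, spread over 1 to 4 bytes. It illustrates the encoding, not the Compute Engine's source; `utf8` and `encodeScalarUtf8` are hypothetical names, and the string is assumed to contain no unpaired surrogates.

```typescript
// Hypothetical helper: encode one Unicode scalar as 1 to 4 UTF-8 bytes.
function encodeScalarUtf8(cp: number): number[] {
  if (cp < 0x80) return [cp]; // 0xxxxxxx: 7 bits
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx: 11 bits
  if (cp < 0x10000) {
    // 1110xxxx 10xxxxxx 10xxxxxx: 16 bits
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 bits
  return [0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
}

// Hypothetical counterpart of Utf8: encode a string scalar by scalar.
function utf8(s: string): number[] {
  const bytes: number[] = [];
  // Array.from splits a string into scalars, keeping surrogate pairs together.
  for (const ch of Array.from(s)) {
    bytes.push(...encodeScalarUtf8(ch.codePointAt(0)!));
  }
  return bytes;
}

console.log(utf8("Hello").join(", ")); // 72, 101, 108, 108, 111
console.log(utf8("\u{1F469}\u200D\u{1F393}").join(", ")); // 240, 159, 145, 169, 226, 128, 141, 240, 159, 142, 147
```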

</FunctionDefinition>

<nav className="hidden">
### Utf16
</nav>

<FunctionDefinition name="Utf16">
<Signature name="Utf16" returns="list<integer>">string</Signature>

Return the list of UTF-16 code units that encode _string_.

Note: The values returned are UTF-16 code units, not Unicode scalar values. A scalar above `U+FFFF` is encoded as a surrogate pair: two code units in the range `0xD800` to `0xDFFF`.

```json example
["Utf16", "Hello"]
// ➔ [72, 101, 108, 108, 111]

["Utf16", "👩‍🎓"]
// ➔ [55357, 56425, 8205, 55356, 57235]
```
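The surrogate pairs above follow from simple arithmetic: subtract `0x10000` from the scalar, then split the remaining 20 bits into a high and a low half. The TypeScript sketch below shows that computation and checks it against `charCodeAt`, which reads the UTF-16 code units JavaScript already stores. `utf16` and `encodeScalarUtf16` are hypothetical names, not Compute Engine APIs.

```typescript
// Hypothetical helper: encode one Unicode scalar as one or two UTF-16 code units.
function encodeScalarUtf16(cp: number): number[] {
  if (cp < 0x10000) return [cp]; // Basic Multilingual Plane: a single code unit
  const v = cp - 0x10000; // 20 bits remain after subtracting 0x10000
  const high = 0xd800 + (v >> 10); // high surrogate carries the top 10 bits
  const low = 0xdc00 + (v & 0x3ff); // low surrogate carries the bottom 10 bits
  return [high, low];
}

// Hypothetical counterpart of Utf16, built from the scalars of the string.
function utf16(s: string): number[] {
  const units: number[] = [];
  for (const ch of Array.from(s)) {
    units.push(...encodeScalarUtf16(ch.codePointAt(0)!));
  }
  return units;
}

const student = "\u{1F469}\u200D\u{1F393}"; // 👩‍🎓
console.log(utf16(student).join(", ")); // 55357, 56425, 8205, 55356, 57235

// JavaScript strings already are UTF-16, so charCodeAt yields the same code units.
const direct = Array.from({ length: student.length }, (_, i) => student.charCodeAt(i));
console.log(direct.join(", ")); // 55357, 56425, 8205, 55356, 57235
```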

</FunctionDefinition>

<nav className="hidden">
### UnicodeScalars
</nav>

<FunctionDefinition name="UnicodeScalars">
<Signature name="UnicodeScalars" returns="list<integer>">string</Signature>

A Unicode scalar is any Unicode code point between `U+0000` and `U+10FFFF`,
excluding the surrogate range (`U+D800` to `U+DFFF`). In other words, Unicode
scalars correspond exactly to UTF-32 code units.

This function returns the sequence of Unicode scalars (code points) that make
up the string. Some characters perceived as a single visual unit (grapheme
clusters) consist of multiple scalars. For example, the emoji
<span style={{fontSize: "1.2em"}}>👩‍🚀</span> is a single grapheme cluster made of three scalars: `U+1F469` (woman), `U+200D` (zero width joiner), and `U+1F680` (rocket).

```json example
["UnicodeScalars", "Hello"]
// ➔ [72, 101, 108, 108, 111]

["UnicodeScalars", "👩‍🎓"]
// ➔ [128105, 8205, 127891]
```
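The definition of a scalar given above translates directly into a range check. In the TypeScript sketch below, `isUnicodeScalar` and `unicodeScalars` are hypothetical helpers built on standard JavaScript APIs. Because a JavaScript string can hold an unpaired surrogate, which is a code point but not a scalar, the sketch rejects such strings rather than returning an invalid value; how the Compute Engine itself treats them is not stated here.

```typescript
// Hypothetical helper: is n a Unicode scalar value as defined above?
function isUnicodeScalar(n: number): boolean {
  const inRange = Number.isInteger(n) && n >= 0 && n <= 0x10ffff;
  const isSurrogate = n >= 0xd800 && n <= 0xdfff;
  return inRange && !isSurrogate;
}

// Hypothetical counterpart of UnicodeScalars. Array.from iterates by code point;
// a lone (unpaired) surrogate would come through as-is, so check the result.
function unicodeScalars(s: string): number[] {
  const scalars = Array.from(s, (ch) => ch.codePointAt(0)!);
  if (!scalars.every(isUnicodeScalar)) throw new Error("string contains an unpaired surrogate");
  return scalars;
}

console.log(unicodeScalars("Hello").join(", ")); // 72, 101, 108, 108, 111
console.log(unicodeScalars("\u{1F469}\u200D\u{1F393}").join(", ")); // 128105, 8205, 127891
console.log(isUnicodeScalar(0x1f469), isUnicodeScalar(0xd83d)); // true false
```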

</FunctionDefinition>

<nav className="hidden">
### GraphemeClusters
</nav>

<FunctionDefinition name="GraphemeClusters">
<Signature name="GraphemeClusters" returns="list<string>">string</Signature>

A grapheme cluster is the smallest unit of text that a reader perceives
as a single character. It consists of one or more Unicode scalars
(code points).

For example, the character é can be a single scalar (`U+00E9`) or a
sequence of two scalars (e `U+0065` followed by a combining acute accent `U+0301`),
but both form a single grapheme cluster.

The precomposed form is called NFC (Normalization Form C), and the decomposed form, which uses combining marks, is called NFD (Normalization Form D).
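JavaScript's standard `String.prototype.normalize` converts between the two forms. The TypeScript sketch below (plain JavaScript APIs, not a Compute Engine function) shows that the two spellings of é compare unequal as scalar sequences until they are normalized to the same form.

```typescript
// The two spellings of "é" described above.
const nfc = "\u00E9"; // precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE
const nfd = "e\u0301"; // decomposed: U+0065 followed by U+0301 COMBINING ACUTE ACCENT

// List the code points of a string.
const scalars = (s: string): number[] => Array.from(s, (ch) => ch.codePointAt(0)!);

console.log(scalars(nfc).join(", ")); // 233
console.log(scalars(nfd).join(", ")); // 101, 769
console.log(nfc === nfd); // false: different scalar sequences
console.log(nfc.normalize("NFD") === nfd); // true
console.log(nfd.normalize("NFC") === nfc); // true
```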

Similarly, complex emojis such as <span style={{fontSize: "1.2rem"}}>👩‍🚀</span> and <span style={{fontSize: "1.2rem"}}>🇫🇷</span>
are grapheme clusters composed of multiple scalars.
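Flags show clearly how several scalars combine into one cluster: a flag such as 🇫🇷 is written as two regional indicator symbols (`U+1F1E6` to `U+1F1FF`, standing for A to Z), one per letter of the region code. The TypeScript sketch below assumes an ASCII two-letter code; `flag` is a hypothetical helper. Whether the pair renders as a flag depends on the platform's fonts, but under UAX #29 a pair of regional indicators forms one grapheme cluster either way.

```typescript
// Regional indicator symbols U+1F1E6 to U+1F1FF stand for the letters A to Z.
const REGIONAL_INDICATOR_A = 0x1f1e6;

// Hypothetical helper: build a flag from a two-letter ASCII region code such as "FR".
function flag(region: string): string {
  const letters = region.toUpperCase();
  if (!/^[A-Z]{2}$/.test(letters)) throw new Error("expected two ASCII letters");
  // 65 is the char code of "A", so each letter maps to its offset from A.
  const scalars = Array.from(letters, (letter) => REGIONAL_INDICATOR_A + (letter.charCodeAt(0) - 65));
  return String.fromCodePoint(...scalars);
}

const fr = flag("FR");
console.log(fr); // 🇫🇷
console.log(Array.from(fr, (ch) => ch.codePointAt(0)!).join(", ")); // 127467, 127479
```

The scalars match the `StringFrom` example above that decodes `[127467, 127479]`.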

The exact definition of grapheme clusters is determined by the Unicode Standard
([UAX #29](https://unicode.org/reports/tr29/)) and may change over time as new
characters, scripts, or emoji sequences are introduced. In contrast, Unicode
scalars and their UTF-8, UTF-16, and UTF-32 encodings are fixed and stable across Unicode versions.

The table below illustrates the difference between grapheme clusters and Unicode scalars:

| String | Grapheme Clusters | Unicode Scalars (Code Points) |
| :------------- | :-------------------- | :------------------------------------ |
| <span style={{fontSize: "1.3rem"}}>`é`</span> (NFC) | <span style={{fontSize: "1.3rem"}}>`["é"]`</span> | `[233]` |
| <span style={{fontSize: "1.3rem"}}>`é`</span> (NFD) | <span style={{fontSize: "1.3rem"}}>`["é"]`</span> | `[101, 769]` |
| <span style={{fontSize: "1.3rem"}}>`👩‍🎓`</span> | <span style={{fontSize: "1.3rem"}}>`["👩‍🎓"]`</span> | `[128105, 8205, 127891]` |

Each Unicode scalar is a single code point, equivalent to one UTF-32 code unit,
while a grapheme cluster is built from one or more scalars. This function splits
a string into grapheme clusters, not scalars:

```json example
["GraphemeClusters", "Hello"]
// ➔ ["H", "e", "l", "l", "o"]

["GraphemeClusters", "👩‍🎓"]
// ➔ ["👩‍🎓"]

["UnicodeScalars", "👩‍🎓"]
// ➔ [128105, 8205, 127891]
```
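In recent JavaScript runtimes, `Intl.Segmenter` with `granularity: "grapheme"` splits text at UAX #29 grapheme cluster boundaries, so it makes a useful point of comparison. The TypeScript sketch below is not the Compute Engine's implementation; `graphemeClusters` and `unicodeScalars` are hypothetical helpers, and the boundaries follow whichever Unicode version the runtime ships.

```typescript
// Intl.Segmenter splits text at grapheme cluster boundaries (UAX #29).
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Hypothetical counterparts of GraphemeClusters and UnicodeScalars.
function graphemeClusters(s: string): string[] {
  return Array.from(segmenter.segment(s), (part) => part.segment);
}

function unicodeScalars(s: string): number[] {
  return Array.from(s, (ch) => ch.codePointAt(0)!);
}

// "é" in NFD form, the 👩‍🎓 ZWJ sequence, and the 🇫🇷 flag.
const text = "e\u0301" + "\u{1F469}\u200D\u{1F393}" + "\u{1F1EB}\u{1F1F7}";

console.log(graphemeClusters(text).length); // 3
console.log(unicodeScalars(text).length); // 7
```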

For more details on how grapheme cluster boundaries are determined,
see [Unicode® Standard Annex #29](https://unicode.org/reports/tr29/).

</FunctionDefinition>

<nav className="hidden">
### BaseForm
</nav>
`@@ -17,9 +216,9 @@ representations.`

<FunctionDefinition name="BaseForm">

<Signature name="BaseForm" returns="string">_value_:integer</Signature>

<Signature name="BaseForm" returns="string">_value_:integer, _base_:integer</Signature>

Format the integer _value_ in a specific _base_, such as hexadecimal or binary.
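The underlying conversion is the classic repeated-division algorithm. The TypeScript sketch below illustrates it for non-negative integers and compares it with JavaScript's built-in `Number.prototype.toString(radix)`. `toBase` is a hypothetical helper; the Compute Engine's actual `BaseForm` output (letter case, handling of negative values, supported bases) may differ.

```typescript
const DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper: format a non-negative integer in base 2 to 36 by repeated division.
function toBase(value: number, base: number): string {
  if (!Number.isInteger(value) || value < 0) throw new RangeError("value must be a non-negative integer");
  if (!Number.isInteger(base) || base < 2 || base > 36) throw new RangeError("base must be an integer from 2 to 36");
  if (value === 0) return "0";
  let digits = "";
  let rest = value;
  while (rest > 0) {
    digits = DIGITS[rest % base] + digits; // remainders give digits from least significant up
    rest = Math.floor(rest / base);
  }
  return digits;
}

// The built-in Number.prototype.toString(radix) gives the same digits for these inputs.
console.log(toBase(255, 16), (255).toString(16)); // ff ff
console.log(toBase(10, 2), (10).toString(2)); // 1010 1010
```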