Commit fb2eb6a

doc

1 parent b1825ec, commit fb2eb6a

1 file changed: +202 -3 lines changed

docs/compute-engine/97-reference-strings.md

Lines changed: 202 additions & 3 deletions
@@ -3,12 +3,211 @@ title: Strings
33
slug: /compute-engine/reference/strings/
44
---
55

6-
A string is a sequence of characters such as `"Hello, World!"` or `"42"`.
6+
A string is a sequence of characters such as <span style={{fontSize: "1.2rem"}}>`"Hello, 🌍!"`</span> or <span style={{fontSize: "1.2rem"}}>`"Simplify(👨‍🚀 + ⚡️) → 👨‍🎤"`.</span>
7+
78

89
In the Compute Engine, strings are composed of encoding-independent Unicode
910
characters and provide access to those characters through a variety of Unicode
1011
representations.
1112

13+
In the Compute Engine, strings are **not treated as collections**. This is
14+
because the concept of a "character" is inherently ambiguous: a single
15+
user-perceived character (a **grapheme cluster**) may consist of multiple
16+
Unicode scalars, and those scalars may in turn be represented differently
17+
in various encodings. To avoid confusion and ensure consistent behavior,
18+
strings must be explicitly converted to a sequence of **grapheme clusters** or
19+
**Unicode scalars** when individual elements need to be accessed.
20+
21+
22+
<nav className="hidden">
23+
### String
24+
</nav>
25+
26+
<FunctionDefinition name="String">
27+
28+
<Signature name="String" returns="string">any*</Signature>
29+
30+
A string created by joining its arguments. The arguments are converted to their default string representation.
31+
32+
33+
```json example
34+
["String", "Hello", ", ", "🌍", "!"]
35+
// ➔ "Hello, 🌍!"
36+
37+
["String", 42, " is the answer"]
38+
// ➔ "42 is the answer"
39+
40+
```
41+
42+
</FunctionDefinition>
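
The joining behavior described for `String` can be approximated in plain TypeScript. This is only a sketch: `joinAsString` is a hypothetical helper, not part of the Compute Engine, and it assumes that JavaScript's `String()` conversion is a reasonable stand-in for the "default string representation" of each argument.

```typescript
// Hypothetical helper, not a Compute Engine API.
// Assumption: String() approximates the default string representation.
function joinAsString(...args: unknown[]): string {
  // Convert each argument to text, then concatenate with no separator.
  return args.map((arg) => String(arg)).join("");
}

console.log(joinAsString("Hello", ", ", "🌍", "!")); // "Hello, 🌍!"
console.log(joinAsString(42, " is the answer")); // "42 is the answer"
```

Note that no separator is inserted between arguments, so any spacing has to be part of the arguments themselves.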
43+
44+
45+
<nav className="hidden">
46+
### StringFrom
47+
</nav>
48+
49+
<FunctionDefinition name="StringFrom">
50+
51+
<Signature name="StringFrom" returns="string">any, _format_:string?</Signature>
52+
53+
Convert the argument to a string, using the specified _format_.
54+
55+
| _format_ | Description |
56+
| :--- | :--- |
57+
| `utf-8` | The argument is a list of UTF-8 code units (bytes) |
58+
| `utf-16` | The argument is a list of UTF-16 code units |
59+
| `unicode-scalars` | The argument is a list of Unicode scalars (same as UTF-32) |
60+
61+
For example:
62+
63+
```json example
64+
["StringFrom", [240, 159, 148, 159], "utf-8"]
65+
// ➔ "🔟"
66+
67+
["StringFrom", [55357, 56607], "utf-16"]
68+
// ➔ "🔟"
69+
70+
["StringFrom", [128287], "unicode-scalars"]
71+
// ➔ "🔟"
72+
73+
["StringFrom", [127467, 127479], "unicode-scalars"]
74+
// ➔ "🇫🇷"
75+
76+
```
77+
78+
</FunctionDefinition>
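
To make the three formats concrete, here is a minimal TypeScript sketch of the same conversions using standard JavaScript APIs. The helper `stringFrom` and the `StringFormat` type are hypothetical names introduced for illustration; they are not the Compute Engine implementation, and the sketch makes the format argument mandatory rather than guessing a default.

```typescript
// Hypothetical names for illustration; not the Compute Engine implementation.
type StringFormat = "utf-8" | "utf-16" | "unicode-scalars";

function stringFrom(values: number[], format: StringFormat): string {
  if (format === "utf-8") {
    // Treat the values as bytes and decode them as UTF-8.
    return new TextDecoder("utf-8").decode(Uint8Array.from(values));
  }
  if (format === "utf-16") {
    // Treat the values as 16-bit code units (surrogate pairs included).
    return String.fromCharCode(...values);
  }
  // Treat the values as Unicode scalars (code points).
  return String.fromCodePoint(...values);
}

// The same character, U+1F51F, expressed in each representation.
console.log(stringFrom([240, 159, 148, 159], "utf-8")); // "🔟"
console.log(stringFrom([55357, 56607], "utf-16")); // "🔟"
console.log(stringFrom([128287], "unicode-scalars")); // "🔟"
```

The three calls produce the same string because they are three encodings of the single scalar `U+1F51F`.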
79+
80+
81+
<nav className="hidden">
82+
### Utf8
83+
</nav>
84+
85+
<FunctionDefinition name="Utf8">
86+
<Signature name="Utf8" returns="list<integer>">string</Signature>
87+
88+
Return a list of UTF-8 code units (bytes) for the given _string_.
89+
90+
**Note:** The values returned are UTF-8 bytes, not Unicode scalar values.
91+
92+
```json example
93+
["Utf8", "Hello"]
94+
// ➔ [72, 101, 108, 108, 111]
95+
96+
["Utf8", "👩‍🎓"]
97+
// ➔ [240, 159, 145, 169, 226, 128, 141, 240, 159, 142, 147]
98+
```
99+
100+
</FunctionDefinition>
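
The distinction between UTF-8 bytes and scalars is easiest to see outside the ASCII range. The sketch below uses the standard `TextEncoder` API; `utf8Bytes` is a hypothetical helper name, not a Compute Engine function.

```typescript
// Hypothetical helper: list the UTF-8 bytes of a string.
function utf8Bytes(text: string): number[] {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  return Array.from(new TextEncoder().encode(text));
}

console.log(utf8Bytes("Hello")); // [72, 101, 108, 108, 111]
// One scalar (U+00E9) becomes two bytes.
console.log(utf8Bytes("\u00e9")); // [195, 169]
// One scalar (U+20AC, the euro sign) becomes three bytes.
console.log(utf8Bytes("\u20ac")); // [226, 130, 172]
```

For ASCII text the byte values coincide with the scalar values, which is why the `"Hello"` results look identical across all three representations.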
101+
102+
103+
<nav className="hidden">
104+
### Utf16
105+
</nav>
106+
107+
<FunctionDefinition name="Utf16">
108+
<Signature name="Utf16" returns="list<integer>">string</Signature>
109+
110+
Return a list of UTF-16 code units for the given _string_.
111+
112+
**Note:** The values returned are UTF-16 code units, not Unicode scalar values.
113+
114+
```json example
115+
["Utf16", "Hello"]
116+
// ➔ [72, 101, 108, 108, 111]
117+
118+
["Utf16", "👩‍🎓"]
119+
// ➔ [55357, 56425, 8205, 55356, 57235]
120+
```
121+
122+
</FunctionDefinition>
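
Scalars above `U+FFFF` do not fit in one 16-bit unit, so UTF-16 splits them into a high and a low surrogate. The following sketch shows that arithmetic alongside a direct read of a JavaScript string's code units. Both `utf16Units` and `toSurrogatePair` are hypothetical helpers written for this illustration.

```typescript
// Hypothetical helper: list the UTF-16 code units of a string.
function utf16Units(text: string): number[] {
  const units: number[] = [];
  // JavaScript strings are indexed by UTF-16 code unit.
  for (let i = 0; i < text.length; i++) units.push(text.charCodeAt(i));
  return units;
}

// Hypothetical helper: split a scalar above U+FFFF into surrogates.
function toSurrogatePair(scalar: number): [number, number] {
  const offset = scalar - 0x10000; // 20 bits remain
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

console.log(utf16Units("Hello")); // [72, 101, 108, 108, 111]
console.log(toSurrogatePair(0x1f469)); // [55357, 56425]
console.log(toSurrogatePair(0x1f393)); // [55356, 57235]
```

The zero-width joiner `U+200D` (8205) is below `U+FFFF`, so it is stored as a single code unit.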
123+
124+
125+
<nav className="hidden">
126+
### UnicodeScalars
127+
</nav>
128+
129+
<FunctionDefinition name="UnicodeScalars">
130+
<Signature name="UnicodeScalars" returns="list<integer>">string</Signature>
131+
132+
A **Unicode scalar** is any valid Unicode code point, represented as a number
133+
between `U+0000` and `U+10FFFF`, excluding the surrogate range
134+
(`U+D800` to `U+DFFF`). In other words, Unicode scalars correspond exactly to UTF-32 code units.
135+
136+
137+
This function returns the sequence of Unicode scalars (code points) that make
138+
up the string. Note that some characters perceived as a single visual unit
139+
(grapheme clusters) may consist of multiple scalars. For example, the emoji
140+
<span style={{fontSize: "1.2em"}}>👩‍🚀</span> is a single grapheme but is composed of several scalars.
141+
142+
```json example
143+
["UnicodeScalars", "Hello"]
144+
// ➔ [72, 101, 108, 108, 111]
145+
146+
["UnicodeScalars", "👩‍🎓"]
147+
// ➔ [128105, 8205, 127891]
148+
```
149+
150+
</FunctionDefinition>
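
As a rough illustration of the scalar view, the sketch below iterates a JavaScript string by code point rather than by UTF-16 code unit. `unicodeScalars` is a hypothetical helper, and it assumes well-formed input: an unpaired surrogate would be passed through as-is even though it is not a valid scalar.

```typescript
// Hypothetical helper: list the code points of a string.
// Assumption: the input contains no unpaired surrogates.
function unicodeScalars(text: string): number[] {
  // Array.from uses the string iterator, which steps by code point.
  return Array.from(text, (ch) => ch.codePointAt(0) as number);
}

const student = "\u{1F469}\u200D\u{1F393}"; // woman + zero-width joiner + graduation cap

console.log(unicodeScalars("Hello")); // [72, 101, 108, 108, 111]
console.log(unicodeScalars(student)); // [128105, 8205, 127891]
console.log(student.length); // 5, because length counts UTF-16 code units
```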
151+
152+
153+
154+
<nav className="hidden">
155+
### GraphemeClusters
156+
</nav>
157+
158+
<FunctionDefinition name="GraphemeClusters">
159+
<Signature name="GraphemeClusters" returns="list<string>">string</Signature>
160+
161+
A **grapheme cluster** is the smallest unit of text that a reader perceives
162+
as a single character. It may consist of one or more **Unicode scalars**
163+
(code points).
164+
165+
For example, the character **é** can be a single scalar (`U+00E9`) or a
166+
sequence of scalars (**e** `U+0065` + **combining acute** `U+0301`),
167+
but both form a single grapheme cluster.
168+
169+
These two representations correspond to Unicode normalization forms: **NFC** (Normalization Form C) is the precomposed form of a character, while **NFD** (Normalization Form D) is the decomposed form, which uses combining marks.
170+
171+
Similarly, complex emojis (<span style={{fontSize: "1.2rem"}}>👩‍🚀</span>, <span style={{fontSize: "1.2rem"}}>🇫🇷</span>)
172+
are grapheme clusters composed of multiple scalars.
173+
174+
The exact definition of grapheme clusters is determined by the Unicode Standard
175+
([UAX #29](https://unicode.org/reports/tr29/)) and may evolve over time as new
176+
characters, scripts, or emoji sequences are introduced. In contrast, Unicode
177+
scalars and their UTF-8, UTF-16, or UTF-32 encodings are fixed and stable across Unicode versions.
178+
179+
180+
The table below illustrates the difference between grapheme clusters and Unicode scalars:
181+
182+
| String | Grapheme Clusters | Unicode Scalars (Code Points) |
183+
|:-------------|:--------------------|:------------------------------------|
184+
| <span style={{fontSize: "1.3rem"}}>`é`</span> (NFC) | <span style={{fontSize: "1.3rem"}}>`["é"]`</span> | `[233]` |
185+
| <span style={{fontSize: "1.3rem"}}>`é`</span> (NFD) | <span style={{fontSize: "1.3rem"}}>`["é"]`</span> | `[101, 769]` |
186+
| <span style={{fontSize: "1.3rem"}}>`👩‍🎓`</span> | <span style={{fontSize: "1.3rem"}}>`["👩‍🎓"]`</span> | `[128105, 8205, 127891]` |
187+
188+
In contrast, a Unicode scalar is a single code point in the Unicode standard,
189+
corresponding to a UTF-32 value. Grapheme clusters are built from one or more scalars.
190+
191+
This function splits a string into grapheme clusters, not scalars.
192+
193+
```json example
194+
["GraphemeClusters", "Hello"]
195+
// ➔ ["H", "e", "l", "l", "o"]
196+
197+
["GraphemeClusters", "👩‍🎓"]
198+
// ➔ ["👩‍🎓"]
199+
200+
["UnicodeScalars", "👩‍🎓"]
201+
// ➔ [128105, 8205, 127891]
202+
```
203+
204+
For more details on how grapheme cluster boundaries are determined,
205+
see [Unicode® Standard Annex #29](https://unicode.org/reports/tr29/).
206+
207+
</FunctionDefinition>
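
For comparison, JavaScript runtimes that support `Intl.Segmenter` expose UAX #29 grapheme segmentation directly. The sketch below is an illustration only: `graphemeClusters` is a hypothetical helper, and nothing here implies that the Compute Engine is implemented this way. Because segmentation rules depend on the Unicode version shipped with the runtime, results for newer emoji sequences may vary between environments.

```typescript
// Hypothetical helper: split a string into grapheme clusters.
// Assumption: the runtime provides Intl.Segmenter.
function graphemeClusters(text: string): string[] {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

const nfc = "\u00e9"; // precomposed é
const nfd = "e\u0301"; // e + combining acute accent

console.log(graphemeClusters("Hello")); // ["H", "e", "l", "l", "o"]
console.log(graphemeClusters(nfc).length); // 1
console.log(graphemeClusters(nfd).length); // 1
// The two forms differ as scalar sequences but normalize to the same string.
console.log(nfc === nfd); // false
console.log(nfd.normalize("NFC") === nfc); // true
```

Normalizing before comparing is a separate concern from segmentation: both forms are one grapheme cluster, but they only compare equal after normalization to a common form.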
208+
209+
210+
12211
<nav className="hidden">
13212
### BaseForm
14213
</nav>
@@ -17,9 +216,9 @@ representations.
17216

18217
<FunctionDefinition name="BaseForm">
19218

20-
<Signature name="BaseForm" returns="string">_value:integer_</Signature>
219+
<Signature name="BaseForm" returns="string">_value_:integer</Signature>
21220

22-
<Signature name="BaseForm" returns="string">_value_:integer, _base_</Signature>
221+
<Signature name="BaseForm" returns="string">_value_:integer, _base_:integer</Signature>
23222

24223
Format an _integer_ in a specific _base_, such as hexadecimal or binary.
25224
