docs/compute-engine/97-reference-strings.md
202 additions & 3 deletions
@@ -3,12 +3,211 @@ title: Strings
 slug: /compute-engine/reference/strings/
 ---

-A string is a sequence of characters such as `"Hello, World!"` or `"42"`.
+A string is a sequence of characters such as <span style={{fontSize: "1.2rem"}}>`"Hello, 🌍!"`</span> or <span style={{fontSize: "1.2rem"}}>`"Simplify(👨🚀 + ⚡️) → 👨🎤"`.</span>
+

 In the Compute Engine, strings are composed of encoding-independent Unicode
 characters and provide access to those characters through a variety of Unicode
 representations.

+In the Compute Engine, strings are **not treated as collections**. This is
+because the concept of a "character" is inherently ambiguous: a single
+user-perceived character (a **grapheme cluster**) may consist of multiple
+Unicode scalars, and those scalars may in turn be represented differently
+in various encodings. To avoid confusion and ensure consistent behavior,
+strings must be explicitly converted to a sequence of **grapheme clusters** or
+**Unicode scalars** when individual elements need to be accessed.
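
The ambiguity described above can be seen by counting the same text at three levels. This is a minimal sketch using only standard JavaScript APIs (`String.prototype.length`, `Array.from`, `Intl.Segmenter`), not the Compute Engine's own string functions; the helper names `utf16Units`, `scalars`, and `graphemes` are hypothetical and chosen for illustration.

```typescript
// Hypothetical helpers built on standard JavaScript APIs only.

// UTF-16 code units: what `length` reports for a JavaScript string.
function utf16Units(s: string): number {
  return s.length;
}

// Unicode scalars: iterating a well-formed string yields one entry
// per code point, so surrogate pairs count once.
function scalars(s: string): string[] {
  return Array.from(s);
}

// Grapheme clusters: user-perceived characters, via Intl.Segmenter.
function graphemes(s: string): string[] {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(s), (seg) => seg.segment);
}

// "e" + combining acute accent (U+0301), followed by U+1F30D (globe emoji).
const sample: string = "e\u0301\u{1F30D}";

console.log(utf16Units(sample));       // 4
console.log(scalars(sample).length);   // 3
console.log(graphemes(sample).length); // 2
```

The three counts differ for the same text, which is why an explicit choice of representation is needed before asking for "the n-th character".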
+A **grapheme cluster** is the smallest unit of text that a reader perceives
+as a single character. It may consist of one or more **Unicode scalars**
+(code points).
+
+For example, the character **é** can be a single scalar (`U+00E9`) or a
+sequence of scalars (**e** `U+0065` + **combining acute** `U+0301`),
+but both form a single grapheme cluster.
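
The two spellings of é can be inspected directly. The sketch below lists the scalars of each form in `U+XXXX` notation using standard JavaScript only; `scalarNotation` is a hypothetical helper, not a Compute Engine function.

```typescript
// Hypothetical helper: the scalars of a string in "U+XXXX" notation.
function scalarNotation(s: string): string[] {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0")
  );
}

const precomposed: string = "\u00E9"; // é as a single scalar
const decomposed: string = "e\u0301"; // e followed by combining acute accent

console.log(scalarNotation(precomposed)); // one entry: U+00E9
console.log(scalarNotation(decomposed));  // two entries: U+0065, U+0301

// Both render as one grapheme cluster, yet a plain comparison sees
// two different scalar sequences.
console.log(precomposed === decomposed);  // false
```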
+
+Here, **NFC** (Normalization Form C) refers to the precomposed form of characters, while **NFD** (Normalization Form D) refers to the decomposed form where combining marks are used.
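
As a rough illustration of the two forms, the standard `String.prototype.normalize` method converts between them. This sketch assumes plain JavaScript strings and does not show how the Compute Engine itself exposes normalization; `sameText` is a hypothetical helper.

```typescript
const composedE: string = "\u00E9";   // precomposed é
const combiningE: string = "e\u0301"; // e + combining acute accent

const nfc = combiningE.normalize("NFC"); // compose where possible
const nfd = composedE.normalize("NFD");  // decompose into base + marks

console.log(nfc === composedE);      // true
console.log(nfd === combiningE);     // true
console.log(nfc.length, nfd.length); // 1 2

// Hypothetical helper: compare two strings after bringing both
// to the same normalization form.
function sameText(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(sameText(composedE, combiningE)); // true
```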