Besides encoding/decoding, there are a few more functions for testing [string encoding](#string-types-table).
+
---
# The theory of `String` 😉
A JavaScript [String](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String) is a Unicode string, which means that it is a [list of Unicode characters](https://en.wikipedia.org/wiki/List_of_Unicode_characters), not a list of bytes!
And it does not map one-to-one to an array of bytes without some encoding either.
-
This is because a Unicode character requires 3 bytes to be able to encode any of the growing list of 137 000 symbols.
+
This is because a Unicode character requires 3 bytes to be able to encode any of the growing list of about 144 000 symbols.
Thus `String` is not the best data type for working with binary data.
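
To make the point concrete, here is a minimal sketch (not taken from the library this README describes) that uses the standard `TextEncoder` API to show that the same short string is measured in code units as a `String` but takes a different number of bytes once an encoding is chosen. The variable names are illustrative only.

```js
// A two-character string: 'h' (U+0068) and '€' (U+20AC).
const text = 'h\u20ac';

// The string itself is measured in UTF-16 code units, not bytes.
const codeUnits = Array.from(text, (ch) => ch.charCodeAt(0)); // [104, 8364]

// Encoded as UTF-8, the same two characters take four bytes.
const utf8Bytes = Array.from(new TextEncoder().encode(text)); // [104, 226, 130, 172]

console.log(text.length, codeUnits, utf8Bytes);
```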
This is the main reason why the Node.js devs have come up with the [Buffer](https://nodejs.org/api/buffer.html) type.
-
Later on, the [TypedArray](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray) standard came to the rescue, and the Node.js devs adopted the new type as the parent type for the existing `Buffer` type (starting with Node.js v4).
+
Later on, the [TypedArray](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray) standard came to the rescue, and the Node.js devs adopted the new type, namely `Uint8Array`, as the parent type for the existing `Buffer` type, starting with Node.js v4.
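
A small sketch of what this relationship means in practice, assuming a Node.js runtime where `Buffer` is available as a global: a `Buffer` passes an `instanceof Uint8Array` check and can be handed to code written against a plain typed array. The helper `sumBytes` is hypothetical and exists only for illustration.

```js
// Assumes Node.js, where Buffer is a global.
const buf = Buffer.from('hi', 'utf8');

// Buffer is a subclass of Uint8Array.
const isUint8Array = buf instanceof Uint8Array; // true

// Hypothetical helper written against Uint8Array; it accepts a Buffer unchanged.
function sumBytes(bytes) {
  let sum = 0;
  for (const b of bytes) sum += b;
  return sum;
}

const total = sumBytes(buf); // 104 ('h') + 105 ('i') = 209
console.log(isUint8Array, total);
```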
Meanwhile, many libraries have been written to encode, encrypt, hash or otherwise transform data, all using the plain `String` type that has been available to the community since the beginning of JS.
Even some browser built-in functions that came before the `TypedArray` standard rely on the `String` type to do their encoding (e.g. [btoa](https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/btoa) == "binary to ASCII").
+
Today, if you want to manipulate some bytes in JavaScript, you most likely need a `Uint8Array` instead of a `String` for best performance and compatibility with other environments and tools.
+
## String kinds (or encodings)
Judging by content, there are a few kinds of JS `String`s used in almost all applications.
@@ -145,24 +175,27 @@ ord(mbStr[2]); // 9876
Most encoding algorithms would not accept a multibyte `String`.
-
### ASCII
-
-
A subset of binary strings is [**ASCII**](https://www.asciitable.com/) only strings,
-
which represent the class of strings with character codes in the range [0..127].
-
Each ASCII character can be represented with only 7 bits.
+
If you try to run `btoa('€')`, you'll get an error like:
```js
-
const asciiStr = 'Any text using the 26 English letters, digits and punctuation!';
-
isASCII(asciiStr); // true
-
-
isASCII(binStr); // false
-
isASCII(utf8Str); // false
+
Uncaught DOMException:
+
Failed to execute 'btoa' on 'Window':
+
The string to be encoded contains characters outside of the Latin1 range.
```
+
This happens because `€` is a multibyte character.
+
+
The solution is to encode the multibyte string into a single-byte string somehow.
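
One possible way to do this, sketched here with the standard `TextEncoder` API rather than with this library's own helpers: encode the text to UTF-8 bytes, then map every byte to a character with the same code, so that each character fits the 0..255 range `btoa` accepts. The function name `toSingleByteString` is hypothetical.

```js
// Hypothetical helper: UTF-8 encode the text, then map each byte to one
// character whose code is in the range 0..255.
function toSingleByteString(text) {
  const bytes = new TextEncoder().encode(text);
  let out = '';
  for (const byte of bytes) out += String.fromCharCode(byte);
  return out;
}

const singleByte = toSingleByteString('\u20ac'); // '€' becomes 3 characters
const base64 = btoa(singleByte); // '4oKs'
console.log(singleByte.length, base64);
```

Decoding would reverse the steps: `atob`, then characters back to bytes, then a UTF-8 decode of those bytes.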
+
### UTF8 encoded
-
[UTF8](https://en.wikipedia.org/wiki/UTF-8) is the most used byte encoding of Unicode/multibyte strings in computers today. It is the default encoding of web pages that travel over the wire (`content-type: text/html; charset=UTF-8`) and the default in many programming languages.
-
The important feature of UTF8 is that it is fully compatible with ASCII strings, which means any ASCII string is also a valid UTF8 encoded string. Unless you need symbols outside the ASCII table, this encoding is very compact, and uses more than a byte per character only where needed.
+
[UTF8](https://en.wikipedia.org/wiki/UTF-8) is the most widely used byte encoding of Unicode/multibyte strings in computers today.
+
It is the default encoding of web pages that travel over the wire (`content-type: text/html; charset=UTF-8`)
+
and the default in many programming languages.
+
The important feature of UTF8 is that it is fully compatible with ASCII strings,
+
which means any ASCII string is also a valid UTF8 encoded string.
+
Unless you need symbols outside the ASCII table, this encoding is very compact,
+
and uses more than a byte per character only where needed.
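
Both properties can be checked with the standard `TextEncoder` API, which always produces UTF-8. This is a minimal sketch with illustrative variable names, not code from this library: ASCII text encodes to exactly its character codes, and only characters outside ASCII grow beyond one byte.

```js
const encoder = new TextEncoder();

// ASCII text encodes to exactly its character codes: one byte per character.
const asciiBytes = Array.from(encoder.encode('abc')); // [97, 98, 99]

// UTF-8 byte counts for 'a', 'é', '€' and '😉' (written as escapes).
const byteCounts = ['a', '\u00e9', '\u20ac', '\u{1F609}'].map(
  (ch) => encoder.encode(ch).length
); // [1, 2, 3, 4]

console.log(asciiBytes, byteCounts);
```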