Convert different types of JavaScript String to/from Uint8Array
duzun/string-encode.js
- Convert different types of JavaScript `String` to/from `Uint8Array`.
- Check for `String` encoding.
The main target of this library is the browser, where there is no `Buffer` type.
Node.js is welcome too, except for `toString('base64')`, which depends on `btoa`. See Node.js equivalents.
```sh
npm i -S string-encode
```

Or add it directly to the browser:

```html
<script src="https://unpkg.com/string-encode"></script>
<script>
  const { str2buffer, buffer2str /* ... */ } = stringEncode;
  // ...
</script>
```
The most important functions of this library are `str2buffer(str, asUtf8)` and `buffer2str(buf, asUtf8)`, for converting any `String`, including multibyte, to and from `Uint8Array`.
```js
import { str2buffer, buffer2str } from 'string-encode';

// When you know your string doesn't contain multibyte characters:
let buffer = str2buffer(binaryString, false);
// ... do something with buffer ...
let processedString = buffer2str(buffer, false);

// When you know your string might contain multibyte characters:
let mbBuffer = str2buffer(mbString, true);
// ...
let processedMbString = buffer2str(mbBuffer, true);

// Let it guess whether to utf8 encode/decode or not - not recommended:
let anyBuffer = str2buffer(anyStr);
// ...
let processedAnyString = buffer2str(anyBuffer);
```
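To make the difference between the two modes concrete, here is a minimal sketch that uses only built-in APIs. The helper names `binaryToBytes` and `utf8ToBytes` are hypothetical and not part of string-encode; they illustrate the idea behind the `asUtf8` flag and are not the library's actual implementation.

```javascript
// Hypothetical helpers for illustration only; not exported by string-encode.

// "Binary" mode: every character code must be in 0..255 and becomes one byte.
function binaryToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 255) throw new RangeError('Not a binary string at index ' + i);
    bytes[i] = code;
  }
  return bytes;
}

// "UTF-8" mode: a character may expand to several bytes.
function utf8ToBytes(str) {
  return new TextEncoder().encode(str);
}

const latin = binaryToBytes('\u00a9 ok'); // bytes 169, 32, 111, 107
const multi = utf8ToBytes('\u20ac');      // bytes 226, 130, 172 (the euro sign)
```

The strict range check in the binary path is a design choice of this sketch: it fails loudly instead of silently truncating character codes above 255.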
A simple `sha1` function using `crypto` for the browser, which works with `String` and is compatible with the PHP counterpart:

```js
import { str2buffer, toString } from 'string-encode';

const crypto = window.crypto || window.msCrypto || window.webkitCrypto;
const subtle = crypto.subtle || crypto.webkitSubtle;

async function sha1(str, enc = 'hex') {
  let buf = str2buffer(str, true);
  buf = await subtle.digest('SHA-1', buf);
  buf = new Uint8Array(buf);
  return toString.call(buf, enc);
}
```
How to use this `sha1` function:

```js
await sha1('something'); // "1af17e73721dbe0c40011b82ed4bb1a7dbe3ce29"
await sha1('something', false); // "\u001añ~sr\u001d¾\f@\u0001\u001b\u0082íK±§ÛãÎ)"
await sha1('что-то'); // "991fe0590dfec23402d71c0e817bc7a7ab217e2b"
await sha1('что-то', 'base64'); // "mR/gWQ3+wjQC1xwOgXvHp6shfis="
```
Base64 encode/decode a multibyte string:

```js
import { utf8Encode, utf8Decode } from 'string-encode';

btoa(utf8Encode('⚔ или 😄')); // "4pqUINC40LvQuCDwn5iE"
utf8Decode(atob('4pqUINC40LvQuCDwn5iE')); // "⚔ или 😄"
```
| string-encode in Browser | Buffer in Node.js |
|---|---|
| str2buffer(str, false) | Buffer.from(str, 'binary') |
| str2buffer(str, true) | Buffer.from(str, 'utf8') |
| hex2buffer(str) | Buffer.from(str, 'hex') |
| str2buffer(atob(str), false) | Buffer.from(str, 'base64') |
| - | - |
| buffer2str(buf, false) | buf.toString('binary') |
| buffer2str(buf, true) | buf.toString('utf8') |
| buffer2hex(buf) | buf.toString('hex') |
| btoa(buffer2str(buf, false)) | buf.toString('base64') |
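The Node.js column of the table can be exercised directly, since `Buffer` is a global in Node.js. The following is a small sketch of a UTF-8 round trip through base64 and hex; the variable names are made up for this example.

```javascript
// Node.js only: Buffer is a global there, so no import is needed.
const text = '⚔ или 😄';

// String -> bytes, interpreting the string as UTF-8.
const utf8Buf = Buffer.from(text, 'utf8');

// Bytes -> textual encodings.
const base64 = utf8Buf.toString('base64');
const hex = utf8Buf.toString('hex');

// Textual encodings -> bytes -> original string.
const fromBase64 = Buffer.from(base64, 'base64').toString('utf8');
const fromHex = Buffer.from(hex, 'hex').toString('utf8');

// A Buffer is also a Uint8Array, so it can be passed to typed-array APIs.
const isTypedArray = utf8Buf instanceof Uint8Array;
```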
If you want your `Uint8Array` to be one step closer to the Node.js `Buffer`, just add the `.toString()` method to it.

```js
import { toString } from 'string-encode';

let buf = Uint8Array.from([65, 108, 111, 104, 97, 44]);
buf.toString = toString; // the magic method

console.log(buf + ' world!');
buf.toString('hex'); // "416c6f68612c"
buf.toString('base64'); // "QWxvaGEs"
```
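For intuition about what such a `toString` has to do, here is a rough sketch of two of the conversions written with built-ins only. `bytesToHex` and `bytesToBinaryString` are hypothetical names chosen for this example; how the library's `toString` is implemented internally is not shown in this README.

```javascript
// Hypothetical stand-ins; not the library's toString implementation.

// Each byte becomes two lowercase hex digits.
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Each byte becomes one character with the same code (a "binary string").
// Note: spreading a very large array into fromCharCode can exceed the
// engine's argument limit; chunk the input for big buffers.
function bytesToBinaryString(bytes) {
  return String.fromCharCode(...bytes);
}

const aloha = Uint8Array.from([65, 108, 111, 104, 97, 44]);
const asHex = bytesToHex(aloha);           // "416c6f68612c"
const asText = bytesToBinaryString(aloha); // "Aloha,"
```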
Besides encoding/decoding, there are a few more functions for testing string encoding.
A JavaScript `String` is a Unicode string, which means that it is a list of Unicode characters, not a list of bytes! It does not map one-to-one to an array of bytes without some encoding either. This is because a Unicode character needs up to 21 bits (about 3 bytes) to encode any of the growing list of about 144 000 symbols. Thus `String` is not the best data type for working with binary data.
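A quick way to see that characters and bytes are different things is to count them with built-in APIs. This sketch is independent of string-encode, and the number of bytes it reports depends on the chosen encoding (UTF-8 here).

```javascript
const smile = '😄';

// String.length counts UTF-16 code units, not characters or bytes.
const units = smile.length;

// Iterating a string walks it by Unicode code points.
const codePoints = Array.from(smile).length;

// The UTF-8 encoding of the same character takes several bytes.
const utf8Length = new TextEncoder().encode(smile).length;

console.log({ units, codePoints, utf8Length });
// { units: 2, codePoints: 1, utf8Length: 4 }
```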
This is the main reason why the Node.js devs came up with the `Buffer` type. Later the `TypedArray` standard came to the rescue, and the Node.js devs adopted the new type, namely `Uint8Array`, as the parent type for the existing `Buffer` type, starting with Node.js v4.
Meanwhile, many libraries have been written to encode, encrypt, hash or otherwise transform data, all using the plain `String` type that has been available to the community since the beginning of JS.
Even some browser built-in functions that came before the `TypedArray` standard rely on the `String` type to do their encoding (e.g. `btoa` == "binary to ASCII").
Today, if you want to manipulate some bytes in JavaScript, you most likely need a `Uint8Array` instead of a `String` for best performance and compatibility with other environments and tools.
Judging by content, there are a few kinds of JS `String`s used in almost all applications.
Any `String` that does not contain multibyte characters can be considered a binary string. In other words, each character's code is in the range [0..255]. These strings can be mapped one-to-one to arrays of bytes, which `Uint8Array`s basically are.
```js
const binStr = 'when © × ® = ?';

isBinary(binStr); // true
hasMultibyte(binStr); // false
btoa(binStr); // "d2hlbiCpINcgriA9ID8="
str2buffer(binStr); // Uint8Array([119, 104, 101, 110, 32, 169, 32, 215, 32, 174, 32, 61, 32, 63])
```
Most old-fashioned encoding functions accept only this type of string (e.g. `btoa`).
In JS the most common string is a multibyte string, one that contains Unicode characters which require more than a byte of memory.
```js
const mbStr = '$ ⚔ ₽ 😄 € ™';

isBinary(mbStr); // false
hasMultibyte(mbStr); // '⚔'
ord(mbStr[2]); // 9876
```
Most encoding algorithms would not accept a multibyte `String`.
If you try to run `btoa('€')`, you'll get an error like:
Uncaught DOMException: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.
That is because `€` is a multibyte character.
The solution is to encode the multibyte string into a single-byte string somehow.
UTF8 is the most widely used byte encoding of Unicode/multibyte strings in computers today. It is the default encoding of web pages that travel over the wire (`content-type: text/html; charset=UTF-8`) and the default in many programming languages. The important feature of UTF8 is that it is fully compatible with ASCII strings, which means any ASCII string is also a valid UTF8 encoded string. Unless you need symbols outside the ASCII table, this encoding is very compact, and it uses more than a byte per character only where needed.
In fact, UTF8 should be the default choice of encoding you use in a program.
```js
const mbStr = '$ ⚔ ₽ 😄 € ™';
const utf8Str = utf8Encode(mbStr);

isBinary(utf8Str); // true
isUTF8(utf8Str); // true
isUTF8(asciiStr); // true
btoa(utf8Str); // 'JCDimpQg4oK9IPCfmIQg4oKsIOKEog=='
str2buffer(utf8Str); // Uint8Array([36, 32, 226, 154, 148, 32, 226, 130, 189, 32, 240, 159, 152, 132, 32, 226, 130, 172, 32, 226, 132, 162])
```
Even though `utf8Str` is still of type `String`, it is no longer a multibyte string, and thus can be manipulated as an array of bytes.
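The same idea can be sketched without the library, using `TextEncoder` and `TextDecoder`. The helpers `toUtf8BinaryString` and `fromUtf8BinaryString` below are hypothetical names for this example and are not string-encode's `utf8Encode`/`utf8Decode`; the sketch assumes an environment where `btoa` and `atob` are global (browsers and recent Node.js versions).

```javascript
// Hypothetical helpers built on standard APIs; not the library's functions.

// Multibyte string -> binary string holding its UTF-8 bytes.
// Note: spreading a very large array can hit the engine's argument limit.
function toUtf8BinaryString(str) {
  return String.fromCharCode(...new TextEncoder().encode(str));
}

// Binary string holding UTF-8 bytes -> original multibyte string.
function fromUtf8BinaryString(binStr) {
  const bytes = Uint8Array.from(binStr, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// btoa rejects characters outside the Latin1 range...
let rejected = false;
try {
  btoa('€');
} catch (err) {
  rejected = true;
}

// ...but accepts the UTF-8 encoded form of the same text.
const encoded = btoa(toUtf8BinaryString('€'));        // "4oKs"
const restored = fromUtf8BinaryString(atob(encoded)); // "€"
```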
A subset of binary strings is ASCII-only strings, which represent the class of strings with character codes in the range [0..127]. Each ASCII character can be represented with only 7 bits.

```js
const asciiStr = 'Any text using the 26 English letters, digits and punctuation!';

isASCII(asciiStr); // true
isASCII(binStr); // false
isASCII(utf8Str); // false
```
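The relation between the three kinds of strings (ASCII is a subset of binary, and everything else is multibyte) can be sketched with two regular expressions over character codes. `looksASCII`, `looksBinary` and `classify` are hypothetical names for this illustration and are not the library's `isASCII`, `isBinary` or `guessEncoding`.

```javascript
// Hypothetical classifiers for illustration; not exported by string-encode.

// Every UTF-16 code unit is in 0..127.
const looksASCII = (str) => /^[\x00-\x7f]*$/.test(str);

// Every UTF-16 code unit is in 0..255.
const looksBinary = (str) => /^[\x00-\xff]*$/.test(str);

function classify(str) {
  if (looksASCII(str)) return 'ascii';   // also binary, by definition
  if (looksBinary(str)) return 'binary'; // at least one code in 128..255
  return 'multibyte';                    // at least one code above 255
}

const kinds = ['plain text', 'when © × ® = ?', '$ ⚔ ₽'].map(classify);
// ['ascii', 'binary', 'multibyte']
```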
All table headings are functions exported by this library.
| String | guessEncoding | hasMultibyte | isBinary | isASCII | isUTF8 | utf8bytes |
|---|---|---|---|---|---|---|
| "" | hex | false | true | true | true | 0 |
| "English alphabet is 26" | ascii | false | true | true | true | 0 |
| "$ ⚔ ₽ 😄 € ™" | mb | "⚔" | false | false | false | false |
| utf8Encode("$ ⚔ ₽ 😄 € ™") | utf8 | false | true | false | true | 16 |
| "when © × ® = ?" | binary | false | true | false | false | false |
| "Xש" | utf8 | false | true | false | true | 2 |
| utf8Decode("Xש") | mb | "Xש" | false | false | false | false |
| "© binary? ×" | ~utf8 | false | true | false | false | false | 2 |
I did not add the `isHEX` column because it is a trivial format: you can't confuse it with the others.
Note 1:
Sometimes you can't tell whether the string has been `utf8Encode`d or it is just a Unicode string that by coincidence is also a valid utf8 string.
In the table above, "Xש" could be the original string or could be the encoded string.
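The ambiguity from Note 1 can be reproduced with built-ins alone. In this sketch the three-character string "Xש" is turned into bytes one character code at a time, and a strict UTF-8 decoder accepts those bytes without complaint, so nothing in the data itself says which reading was intended.

```javascript
// "Xש" has character codes 0x58, 0xD7, 0xA9, so it is a binary string.
const ambiguous = 'X\u00d7\u00a9';

// Map each character code to one byte.
const bytes = Uint8Array.from(ambiguous, (ch) => ch.charCodeAt(0));

// The same bytes are well-formed UTF-8: "X" followed by U+05E9 (ש).
const decoded = new TextDecoder('utf-8', { fatal: true }).decode(bytes);

const sameText = decoded === ambiguous; // false: two different readings
```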
Note 2:
When slicing utf8 encoded strings, you might cut a multibyte character in half. What you get as a result could be considered a valid utf8 string, with async (cut-off) utf8 characters at the edges.
In the table above, "© binary? ×" is such a slice. The "©" symbol could be the last byte of a utf8 encoded character, and "×" could be the first of the two bytes of another character.
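For comparison, a strict decoder does not tolerate such a cut. The sketch below uses the standard `TextDecoder` with `fatal: true` to show that a slice ending in the middle of a character is rejected, while the default lenient mode substitutes the replacement character U+FFFD. `tryStrictDecode` is a hypothetical helper for this example.

```javascript
const encoder = new TextEncoder();
const whole = encoder.encode('a⚔b'); // 5 bytes: "⚔" takes 3 of them
const cut = whole.slice(0, 3);       // ends in the middle of "⚔"

// Hypothetical helper: returns the decoded text, or null if the bytes
// are not well-formed UTF-8.
function tryStrictDecode(bytes) {
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    return null;
  }
}

const strictWhole = tryStrictDecode(whole); // "a⚔b"
const strictCut = tryStrictDecode(cut);     // null

// Lenient decoding keeps "a" and marks the broken tail with U+FFFD.
const lenient = new TextDecoder().decode(cut);
```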
To be continued...