1

I am wondering how is binary encoding for a string is given in Python.

For example,

>>> b'\x25'b'%'

or

>>>b'\xe2\x82\xac'.decode()'€'

but

>>> b'\xy9'File "<stdin>", line 1SyntaxError: (value error) invalid \x escape at position 0

Please, could you explain what\xe2 stand for and how this binary encoding works.

askedJul 27, 2017 at 13:07
A. Vaicenavicius's user avatar
1
  • 1
    That's hexadecimal which uses 0-9 and a-f. It complains becausey is not valid.CommentedJul 27, 2017 at 13:17

2 Answers2

3

\x is used to introduce a hexadecimal value, and must be followed byexactly two hexadecimal digits. For example,\xe2 represents the byte (in decimal) 226 (= 14 * 16 + 2).

In the first case, the two stringsb'\x25' andb'%' are identical; Python displays values using ASCII equivalents where possible.

answeredJul 27, 2017 at 13:11
chepner's user avatar
Sign up to request clarification or add additional context in comments.

4 Comments

Why does it have to be followed by exactly two hexadecimal digits? Why not just one?
Because then\x25 would be ambiguous: is it the single character% or is it a two-character sequence consisting of the ASCII control character START TRANSMISSION followed by the character5?
Right, thanks. I guess it's also becauseeach hexadecimal digit is 4 bits, so two hex digits is 8 bits or one byte
Yes. In abytes value,\x is used to specify a single byte by its hexadecimal value. In astr value, you can use\x (primarily for historical reasons) to specify a Unicode code point using a single value between 0 and 255,\u to specify one using a 16-bit value, or\U using a 32-bit value.
1

I assume that you use a Python 3 version. In Python 3 the default encoding isUTF-8, sob'\xe2\x82\xac'.decode() is in factb'\xe2\x82\xac'.decode('UTF-8).

It gives the character'€' which is U+20AC in unicode and the UTF8 encoding of U+20AC is indeed `b'\xe2\x82\xac' in 3 bytes.

So all ascii characters (code below 128) are encoded into one single byte with same value as the unicode code. Non ascii characters corresponding to one single 16 bit unicode value are utf8 encoded into 2 or 3 bytes (this is known as theBasic Multilingual Plane).

answeredJul 27, 2017 at 13:31
Serge Ballesta's user avatar

Comments

Your Answer

Sign up orlog in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to ourterms of service and acknowledge you have read ourprivacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.