This PEP proposes a literal syntax for the bytes objects introduced in PEP 358. The purpose is to provide a convenient way to spell ASCII strings and arbitrary binary data.
Existing spellings of an ASCII string in Python 3000 include:
bytes('Hello world', 'ascii')
'Hello world'.encode('ascii')
The proposed syntax is:
b'Hello world'
Existing spellings of an 8-bit binary sequence in Python 3000 include:
bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
'7f454c4601010100'.decode('hex')
The proposed syntax is:
b'\x7f\x45\x4c\x46\x01\x01\x01\x00'
b'\x7fELF\x01\x01\x01\0'
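As a rough check that these spellings denote the same data, the sketch below compares them under a present-day Python 3 interpreter (an assumption; the PEP itself targets Python 3000 as it stood at the time). `str` has no `decode('hex')` there, so `bytes.fromhex` is used as a stand-in for that spelling.

```python
# Sketch assuming a present-day Python 3 interpreter.
from_ints = bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
from_text = bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
# Stand-in for '7f454c4601010100'.decode('hex'), which str lacks in Python 3.
from_hex = bytes.fromhex('7f454c4601010100')

# The two proposed literal spellings.
literal_hex = b'\x7f\x45\x4c\x46\x01\x01\x01\x00'
literal_mixed = b'\x7fELF\x01\x01\x01\0'

# All five spellings should denote the same eight bytes.
all_equal = (from_ints == from_text == from_hex
             == literal_hex == literal_mixed)

# The ASCII case from earlier in the section.
ascii_equal = (bytes('Hello world', 'ascii')
               == 'Hello world'.encode('ascii')
               == b'Hello world')
```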
In both cases, the advantages of the new syntax are brevity, some small efficiency gain, and the detection of encoding errors at compile time rather than at runtime. The brevity benefit is especially felt when using the string-like methods of bytes objects:
lines = bdata.split(bytes('\n', 'ascii'))  # existing syntax
lines = bdata.split(b'\n')  # proposed syntax
And when converting code from Python 2.x to Python 3000:
sok.send('EXIT\r\n')  # Python 2.x
sok.send('EXIT\r\n'.encode('ascii'))  # Python 3000 existing
sok.send(b'EXIT\r\n')  # proposed
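The following sketch illustrates two of the claimed advantages, brevity and compile-time error detection, assuming a present-day Python 3 interpreter. The sample data and variable names are hypothetical, and `compile()` is used only to make the compile-time failure observable from running code.

```python
# Sketch assuming a present-day Python 3 interpreter; names are illustrative.
bdata = b'first\nsecond\nthird'

# Both spellings of the separator give the same result.
lines_long = bdata.split(bytes('\n', 'ascii'))
lines_short = bdata.split(b'\n')
same_split = (lines_long == lines_short)

# The encode() spelling and the literal produce the same message.
same_message = ('EXIT\r\n'.encode('ascii') == b'EXIT\r\n')

# Runtime detection: the bad character is noticed only when encode() runs.
try:
    'caf\u00e9'.encode('ascii')
    runtime_error = None
except UnicodeEncodeError as exc:
    runtime_error = type(exc).__name__

# Compile-time detection: the literal is rejected before any code runs.
try:
    compile("b'caf\u00e9'", '<sketch>', 'eval')
    compile_error = None
except SyntaxError as exc:
    compile_error = type(exc).__name__
```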
The proposed syntax is an extension of the existing string syntax [1].
The new syntax for strings, including the new bytes literal, is:
stringliteral: [stringprefix] (shortstring | longstring)
stringprefix: "b" | "r" | "br" | "B" | "R" | "BR" | "Br" | "bR"
shortstring: "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring: "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem: shortstringchar | escapeseq
longstringitem: longstringchar | escapeseq
shortstringchar: <any source character except "\" or newline or the quote>
longstringchar: <any source character except "\">
escapeseq: "\" NL
  | "\\" | "\'" | '\"'
  | "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v"
  | "\ooo" | "\xhh"
  | "\uxxxx" | "\Uxxxxxxxx" | "\N{name}"
The following additional restrictions apply only to bytes literals (stringliteral tokens with b or B in the stringprefix):
- Each shortstringchar or longstringchar must be a character between 1 and 127 inclusive, regardless of any encoding declaration [2] in the source file.
- The escape sequences \uxxxx, \Uxxxxxxxx, and \N{name} are unrecognized in Python 2.x and forbidden in Python 3000.
Adjacent bytes literals are subject to the same concatenation rules as adjacent string literals [3]. A bytes literal adjacent to a string literal is an error.
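The concatenation rules can be checked with a small sketch, assuming a present-day Python 3 interpreter. The helper `compiles` is hypothetical and exists only to turn a compile-time `SyntaxError` into a boolean.

```python
# Sketch assuming a present-day Python 3 interpreter.
def compiles(source):
    """Return True if `source` compiles as an expression, else False."""
    try:
        compile(source, '<sketch>', 'eval')
        return True
    except SyntaxError:
        return False

# Adjacent bytes literals are concatenated, as adjacent string literals are.
joined = eval("b'ab' b'cd'")

# The prefix may be upper or lower case.
joined_mixed_case = eval("B'ab' b'cd'")

# A bytes literal next to a string literal is rejected at compile time.
bytes_then_str_ok = compiles("b'ab' 'cd'")
str_then_bytes_ok = compiles("'ab' b'cd'")
```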
Each evaluation of a bytes literal produces a new bytes object. The bytes in the new object are the bytes represented by the shortstringitem or longstringitem parts of the literal, in the same order.
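To make the mapping from literal items to bytes concrete, the sketch below lists the byte values of one literal with and without the "r" prefix. It assumes a present-day Python 3 interpreter, where iterating over a bytes object yields integers.

```python
# Sketch assuming a present-day Python 3 interpreter.

# Without "r", each escape sequence contributes a single byte.
plain = list(b'a\x00\n')

# With the "br" prefix, backslashes are kept literally, so every
# source character contributes its own byte.
raw = list(br'a\x00\n')
```

Here `plain` holds three values (97, 0, 10), while `raw` holds seven, one for each character between the quotes.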
The proposed syntax provides a cleaner migration path from Python 2.x to Python 3000 for most code involving 8-bit strings. Preserving the old 8-bit meaning of a string literal is usually as simple as adding a b prefix. The one exception is Python 2.x strings containing bytes >127, which must be rewritten using escape sequences.
Transcoding a source file from one encoding to another, and fixing up the encoding declaration, should preserve the meaning of the program. Python 2.x non-Unicode strings violate this principle; Python 3000 bytes literals shouldn't.
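As an illustration of the one exception, consider a hypothetical Python 2.x literal `'café'`. Its byte content depended on the encoding of the source file, so the migrated bytes literal has to spell the intended bytes with escapes. The sketch assumes a present-day Python 3 interpreter and uses `str.encode` only to show which file encoding each escaped form corresponds to.

```python
# Sketch assuming a present-day Python 3 interpreter.
text = 'caf\u00e9'  # the hypothetical word, written with an escape

# If the old source file was UTF-8, the 2.x string held these bytes.
utf8_form = b'caf\xc3\xa9'

# If the old source file was Latin-1, it held these bytes instead.
latin1_form = b'caf\xe9'

# Each escaped literal pins down one meaning, whatever the file encoding.
matches_utf8 = (utf8_form == text.encode('utf-8'))
matches_latin1 = (latin1_form == text.encode('latin-1'))
```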
A string literal with a b in the prefix is always a syntax error in Python 2.5, so this syntax can be introduced in Python 2.6, along with the bytes type.
A bytes literal produces a new object each time it is evaluated, like list displays and unlike string literals. This is necessary because bytes literals, like lists and unlike strings, are mutable [4].
Thomas Wouters has checked an implementation into the Py3K branch, r53872.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-3112.rst
Last modified: 2025-02-01 08:59:27 GMT