PEP 3112 – Bytes literals in Python 3000

Author: Jason Orendorff <jason.orendorff at gmail.com>
Status:

Abstract

This PEP proposes a literal syntax for the bytes objects introduced in PEP 358. The purpose is to provide a convenient way to spell ASCII strings and arbitrary binary data.

Motivation

Existing spellings of an ASCII string in Python 3000 include:

bytes('Hello world', 'ascii')
'Hello world'.encode('ascii')

The proposed syntax is:

b'Hello world'

Existing spellings of an 8-bit binary sequence in Python 3000 include:

bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
'7f454c4601010100'.decode('hex')

The proposed syntax is:

b'\x7f\x45\x4c\x46\x01\x01\x01\x00'
b'\x7fELF\x01\x01\x01\0'

In both cases, the advantages of the new syntax are brevity, some small efficiency gain, and the detection of encoding errors at compile time rather than at runtime. The brevity benefit is especially felt when using the string-like methods of bytes objects:

lines = bdata.split(bytes('\n', 'ascii'))  # existing syntax
lines = bdata.split(b'\n')                 # proposed syntax

And when converting code from Python 2.x to Python 3000:

sok.send('EXIT\r\n')                  # Python 2.x
sok.send('EXIT\r\n'.encode('ascii'))  # Python 3000 existing
sok.send(b'EXIT\r\n')                 # proposed
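The equivalences claimed in this section can be checked with a short sketch. It assumes a released Python 3 interpreter rather than the Py3K branch this PEP targets; the variable names and the sample value of bdata are illustrative only. The 'hex' codec spelling is not available on str in released Python 3, so bytes.fromhex is used as the nearest stand-in.

```python
# Sketch, assuming a released Python 3 interpreter: several spellings
# of the same bytes value, compared against the literal form.

# An ASCII string spelled three ways.
hello_literal = b'Hello world'
hello_ctor = bytes('Hello world', 'ascii')
hello_encoded = 'Hello world'.encode('ascii')

# An 8-bit binary sequence (the ELF magic number and header bytes).
elf_literal = b'\x7fELF\x01\x01\x01\0'
elf_from_ints = bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
elf_from_latin1 = bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
# Assumption: bytes.fromhex replaces the PEP's '...'.decode('hex') spelling.
elf_from_hex = bytes.fromhex('7f454c4601010100')

# Splitting on a bytes separator; bdata is a hypothetical sample value.
bdata = b'first\nsecond\nthird'
lines = bdata.split(b'\n')
```

All the spellings in each group compare equal; the literal form is simply the shortest of them.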

Grammar Changes

The proposed syntax is an extension of the existing string syntax [1].

The new syntax for strings, including the new bytes literal, is:

stringliteral: [stringprefix] (shortstring | longstring)
stringprefix: "b" | "r" | "br" | "B" | "R" | "BR" | "Br" | "bR"
shortstring: "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring: "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem: shortstringchar | escapeseq
longstringitem: longstringchar | escapeseq
shortstringchar: <any source character except "\" or newline or the quote>
longstringchar: <any source character except "\">
escapeseq: "\" NL
           | "\\" | "\'" | '\"'
           | "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v"
           | "\ooo" | "\xhh"
           | "\uxxxx" | "\Uxxxxxxxx" | "\N{name}"
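As a rough illustration of the prefix and escape productions in this grammar, the sketch below exercises a few of them. It assumes a released Python 3 interpreter, and the variable names are illustrative only.

```python
# Sketch, assuming a released Python 3 interpreter.

# The "b" and "B" prefixes are interchangeable.
lower_prefix = b'\n'
upper_prefix = B'\n'

# Adding "r" suppresses escape processing, so the backslash is kept.
raw_newline = br'\n'       # two bytes: backslash, then 'n'
cooked_newline = b'\n'     # one byte: 0x0a

# The \ooo (octal) and \xhh (hex) escapes each denote a single byte.
octal_and_hex = b'\101\x41'  # both escapes spell 0x41, ASCII 'A'

# The longstring form may span physical lines.
long_form = b'''line one
line two'''
```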

The following additional restrictions apply only to bytes literals (stringliteral tokens with b or B in the stringprefix):

Each shortstringchar or longstringchar must be a character between 1 and 127 inclusive, regardless of any encoding declaration [2] in the source file.
The Unicode-specific escape sequences \uxxxx, \Uxxxxxxxx, and \N{name} are unrecognized in Python 2.x and forbidden in Python 3000.
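The first restriction can be observed with the built-in compile function. The sketch below assumes a released Python 3 interpreter; the helper name compiles is hypothetical. It covers only the ASCII-range rule and does not check how a given interpreter treats the Unicode-specific escapes.

```python
# Sketch, assuming a released Python 3 interpreter. The helper name
# "compiles" is hypothetical and exists only for this illustration.

def compiles(source: str) -> bool:
    """Return True if source compiles as an expression, False on SyntaxError."""
    try:
        compile(source, '<bytes-literal-check>', 'eval')
    except SyntaxError:
        return False
    return True

# Plain ASCII characters are accepted in a bytes literal.
ascii_ok = compiles("b'abc'")

# A literal non-ASCII character (here U+00E9) is rejected at compile time.
non_ascii_rejected = not compiles("b'caf\u00e9'")

# The same byte value written as an \xhh escape is accepted.
escaped_ok = compiles(r"b'caf\xe9'")
```

Writing bytes above 127 as escape sequences keeps the source text itself within ASCII, which is what the restriction requires.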

Adjacent bytes literals are subject to the same concatenation rules as adjacent string literals [3]. A bytes literal adjacent to a string literal is an error.
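Both concatenation rules can be sketched as follows, assuming a released Python 3 interpreter. The helper name is_syntax_error is hypothetical.

```python
# Sketch, assuming a released Python 3 interpreter.

# Adjacent bytes literals are joined into one bytes value.
joined = b'EXIT' b'\r\n'

def is_syntax_error(source: str) -> bool:
    """Return True if compiling source as an expression raises SyntaxError."""
    try:
        compile(source, '<concat-check>', 'eval')
    except SyntaxError:
        return True
    return False

# A bytes literal next to a string literal is rejected by the compiler.
mixed_rejected = is_syntax_error("b'EXIT' '\\r\\n'")
```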

Semantics

Each evaluation of a bytes literal produces a new bytes object. The bytes in the new object are the bytes represented by the shortstringitem or longstringitem parts of the literal, in the same order.
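The ordering rule can be checked by inspecting the integer values of a literal. This is a minimal sketch assuming a released Python 3 interpreter, where iterating a bytes object yields integers. It checks content only and makes no claim about object identity between evaluations.

```python
# Sketch, assuming a released Python 3 interpreter.

magic = b'\x7fELF'

# Each item of the literal contributes one byte, in source order.
values = list(magic)

# Escaped and unescaped items can be mixed freely in one literal.
header = b'\x7fELF\x01\x01\x01\0'
header_values = list(header)
```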

Rationale

The proposed syntax provides a cleaner migration path from Python 2.x to Python 3000 for most code involving 8-bit strings. Preserving the old 8-bit meaning of a string literal is usually as simple as adding a b prefix. The one exception is Python 2.x strings containing bytes >127, which must be rewritten using escape sequences. Transcoding a source file from one encoding to another, and fixing up the encoding declaration, should preserve the meaning of the program. Python 2.x non-Unicode strings violate this principle; Python 3000 bytes literals shouldn’t.

A string literal with a b in the prefix is always a syntax error in Python 2.5, so this syntax can be introduced in Python 2.6, along with the bytes type.

A bytes literal produces a new object each time it is evaluated, like list displays and unlike string literals. This is necessary because bytes objects, like lists and unlike strings, are mutable [4].
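The reasoning can be illustrated with list displays, which behave the same way. In the sketch below, bytearray is used as a stand-in for the mutable bytes type this PEP describes. That is an assumption relative to the PEP's text: in released Python 3 the mutable type is spelled bytearray and bytes itself is immutable. The function names are hypothetical.

```python
# Sketch, assuming a released Python 3 interpreter. bytearray stands in
# for the mutable bytes type described in this PEP (an assumption).

def fresh_list():
    # A list display builds a new list object on every evaluation.
    return [1, 2, 3]

def fresh_buffer():
    # A new mutable buffer is built on every call.
    return bytearray(b'abc')

# Mutating one result must not affect a later evaluation.
first, second = fresh_list(), fresh_list()
first.append(4)

buf_a, buf_b = fresh_buffer(), fresh_buffer()
buf_a[0] = ord('x')
```

If every evaluation returned the same mutable object, the append and the item assignment above would change what later evaluations see.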

Reference Implementation

Thomas Wouters has checked an implementation into the Py3K branch, r53872.

References

[1] http://docs.python.org/reference/lexical_analysis.html#string-literals

[2] http://docs.python.org/reference/lexical_analysis.html#encoding-declarations

[3] http://docs.python.org/reference/lexical_analysis.html#string-literal-concatenation

[4] https://mail.python.org/pipermail/python-3000/2007-February/005779.html

Copyright

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-3112.rst

Last modified: 2025-02-01 08:59:27 GMT
