PEP 3137 – Immutable Bytes and Mutable Buffer

Author: Guido van Rossum <guido at python.org>
Status:

Introduction

After releasing Python 3.0a1 with a mutable bytes type, pressure mounted to add a way to represent immutable bytes. Gregory P. Smith proposed a patch that would allow making a bytes object temporarily immutable by requesting that the data be locked using the new buffer API from PEP 3118. This did not seem the right approach to me.

Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to make the bytes type immutable (by crudely removing all mutating APIs) and fix the fall-out in the test suite. This showed that there aren’t all that many places that depend on the mutability of bytes, with the exception of code that builds up a return value from small pieces.

Thinking through the consequences, and noticing that using the array module as an ersatz mutable bytes type is far from ideal, and recalling a proposal put forward earlier by Talin, I floated the suggestion to have both a mutable and an immutable bytes type. (This had been brought up before, but until seeing the evidence of Jeffrey’s patch I wasn’t open to the suggestion.)

Moreover, a possible implementation strategy became clear: use the old PyString implementation, stripped down to remove locale support and implicit conversions to/from Unicode, for the immutable bytes type, and keep the new PyBytes implementation as the mutable bytes type.

The ensuing discussion made it clear that the idea is welcome but needs to be specified more precisely. Hence this PEP.

Advantages

One advantage of having an immutable bytes type is that code objects can use these. It also makes it possible to efficiently create hash tables using bytes for keys; this may be useful when parsing protocols like HTTP or SMTP which are based on bytes representing text.
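As a rough illustration of the hash-table point, the sketch below keys a dict by bytes while parsing header lines. The header block is made up for the example, and the behaviour shown is that of a current Python 3 interpreter rather than anything this PEP specifies in detail.

```python
# Hypothetical raw header block, as it might arrive from a socket.
raw = b"Host: example.org\r\nContent-Type: text/plain\r\n"

headers = {}
for line in raw.splitlines():
    name, sep, value = line.partition(b":")
    if sep:
        # Immutable bytes objects are hashable, so they can serve as keys.
        headers[name.strip().lower()] = value.strip()

content_type = headers[b"content-type"]

# A bytearray is mutable and unhashable, so it cannot be used as a key.
try:
    {bytearray(b"host"): b"example.org"}
    bytearray_key_ok = True
except TypeError:
    bytearray_key_ok = False
```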

Porting code that manipulates binary data (or encoded text) in Python 2.x will be easier using the new design than using the original 3.0 design with mutable bytes; simply replace str with bytes and change '...' literals into b'...' literals.

Naming

I propose the following type names at the Python level:

bytes is an immutable array of bytes (PyString)
bytearray is a mutable array of bytes (PyBytes)
memoryview is a bytes view on another object (PyMemory)

The old type named buffer is so similar to the new type memoryview, introduced by PEP 3118, that it is redundant. The rest of this PEP doesn’t discuss the functionality of memoryview; it is just mentioned here to justify getting rid of the old buffer type. (An earlier version of this PEP proposed buffer as the new name for PyBytes; in the end this name was deemed too confusing given the many other uses of the word buffer.)

While eventually it makes sense to change the C API names, this PEP maintains the old C API names, which should be familiar to all.

Summary

Here’s a simple ASCII-art table summarizing the type names in various Python versions:

+--------------+-------------+------------+--------------------------+
| C name       | 2.x    repr | 3.0a1 repr | 3.0a2               repr |
+--------------+-------------+------------+--------------------------+
| PyUnicode    | unicode u'' | str     '' | str                   '' |
| PyString     | str      '' | str8   s'' | bytes                b'' |
| PyBytes      | N/A         | bytes  b'' | bytearray bytearray(b'') |
| PyBuffer     | buffer      | buffer     | N/A                      |
| PyMemoryView | N/A         | memoryview | memoryview         <...> |
+--------------+-------------+------------+--------------------------+

Literal Notations

The b'...' notation introduced in Python 3.0a1 returns an immutable bytes object, whatever variation is used. To create a mutable array of bytes, use bytearray(b'...') or bytearray([...]). The latter form takes a list of integers in range(256).
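A small sketch of the literal forms, assuming a current Python 3 interpreter; the variable names are only illustrative.

```python
# Every spelling of the b'...' literal yields the same immutable type.
variants = [b'abc', b"abc", b'''abc''', br'abc']
all_bytes = all(type(v) is bytes for v in variants)

# Mutable arrays are built explicitly, from a literal or from integers.
from_literal = bytearray(b'abc')
from_ints = bytearray([97, 98, 99])
```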

Functionality

PEP 3118 Buffer API

Both bytes and bytearray implement the PEP 3118 buffer API. The bytes type only implements read-only requests; the bytearray type allows writable and data-locked requests as well. The element data type is always 'B' (i.e. unsigned byte).
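The read-only versus writable distinction can be observed through memoryview, as in this sketch for a current Python 3 interpreter. It does not attempt to show the data-locked requests mentioned above.

```python
ro = memoryview(b'abc')               # view on an immutable bytes object
target = bytearray(b'abc')
rw = memoryview(target)               # view on a mutable bytearray

formats = (ro.format, rw.format)      # element type reported by each view

# Writing through the view changes the underlying bytearray.
rw[0] = 65

# The view on bytes refuses writes.
try:
    ro[0] = 65
    bytes_writable = True
except TypeError:
    bytes_writable = False
```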

Constructors

There are five forms of constructors, applicable to both bytes and bytearray:

bytes(<bytes>), bytes(<bytearray>), bytearray(<bytes>), bytearray(<bytearray>): simple copying constructors, with the note that bytes(<bytes>) might return its (immutable) argument, but bytearray(<bytearray>) always makes a copy.
bytes(<str>, <encoding>[, <errors>]), bytearray(<str>, <encoding>[, <errors>]): encode a text string. Note that the str.encode() method returns an immutable bytes object. The <encoding> argument is mandatory; <errors> is optional. <encoding> and <errors>, if given, must be str instances.
bytes(<memoryview>), bytearray(<memoryview>): construct a bytes or bytearray object from anything that implements the PEP 3118 buffer API.
bytes(<iterable of ints>), bytearray(<iterable of ints>): construct a bytes or bytearray object from a stream of integers in range(256).
bytes(<int>), bytearray(<int>): construct a zero-initialized bytes or bytearray object of a given length.
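The constructor forms listed above can be sketched as follows, assuming a current Python 3 interpreter. Whether bytes(<bytes>) returns its argument is left unchecked here, since the text only says it might.

```python
# Copying: bytearray(<bytearray>) always produces a distinct object.
src = bytearray(b'abc')
dup = bytearray(src)
is_distinct = dup is not src

# Encoding a text string; the encoding argument is required.
encoded = bytes('caf\u00e9', 'utf-8')
try:
    bytes('abc')
    encoding_optional = True
except TypeError:
    encoding_optional = False

# From an object that implements the buffer API.
from_view = bytes(memoryview(b'abc'))

# From an iterable of integers in range(256).
from_ints = bytes(range(65, 68))

# From a length: zero-initialized.
zeros = bytearray(4)
```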

Comparisons

The bytes and bytearray types are comparable with each other and orderable, so that e.g. b'abc' == bytearray(b'abc') < b'abd'.

Comparing either type to a str object for equality returns False regardless of the contents of either operand. Ordering comparisons with str raise TypeError. This is all conformant to the standard rules for comparison and ordering between objects of incompatible types.

(Note: in Python 3.0a1, comparing a bytes instance with a str instance would raise TypeError, on the premise that this would catch the occasional mistake quicker, especially in code ported from Python 2.x. However, a long discussion on the python-3000 list pointed out so many problems with this that it is clearly a bad idea, to be rolled back in 3.0a2 regardless of the fate of the rest of this PEP.)
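The comparison rules can be checked with a short sketch like the one below. It assumes a current Python 3 interpreter started without the -b option, which would add warnings to bytes/str comparisons.

```python
# Mixed bytes/bytearray comparison and ordering, as in the example above.
chained = b'abc' == bytearray(b'abc') < b'abd'

# Equality against str is simply False, whatever the contents.
equal_to_str = (b'abc' == 'abc')

# Ordering against str is an error.
try:
    b'abc' < 'abd'
    ordering_allowed = True
except TypeError:
    ordering_allowed = False
```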

Slicing

Slicing a bytes object returns a bytes object. Slicing a bytearray object returns a bytearray object.

Slice assignment to a bytearray object accepts anything that implements the PEP 3118 buffer API, or an iterable of integers in range(256).
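A brief sketch of slicing and slice assignment, assuming a current Python 3 interpreter:

```python
bytes_slice = b'abcdef'[1:3]
array_slice = bytearray(b'abcdef')[1:3]

ba = bytearray(b'abcdef')
ba[0:2] = b'XY'                # from a bytes object
ba[2:4] = [48, 49]             # from an iterable of integers
ba[4:] = memoryview(b'!')      # from another buffer-API object
```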

Indexing

Indexing bytes and bytearray returns small ints (like the bytes type in 3.0a1, and like lists or array.array('B')).

Assignment to an item of a bytearray object accepts an int in range(256). (To assign from a bytes sequence, use a slice assignment.)
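The indexing rules might look like this in practice, again assuming a current Python 3 interpreter:

```python
first = b'abc'[0]              # an int, not a length-1 bytes object

ba = bytearray(b'abc')
ba[0] = 65                     # item assignment takes an int in range(256)

try:
    ba[1] = 256                # out of range
    accepts_256 = True
except ValueError:
    accepts_256 = False

try:
    ba[1] = b'B'               # a bytes object is not accepted for an item
    accepts_bytes = True
except TypeError:
    accepts_bytes = False

ba[1:2] = b'B'                 # use a slice assignment instead
```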

Str() and Repr()

The str() and repr() functions return the same thing for these objects. The repr() of a bytes object returns a b'...' style literal. The repr() of a bytearray returns a string of the form "bytearray(b'...')".
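For example, on a current Python 3 interpreter started without the -b option:

```python
bytes_text = str(b'abc')
array_text = str(bytearray(b'abc'))

# str() and repr() agree for both types.
same_for_bytes = bytes_text == repr(b'abc')
same_for_array = array_text == repr(bytearray(b'abc'))
```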

Operators

The following operators are implemented by the bytes and bytearray types, except where mentioned:

b1 + b2: concatenation. With mixed bytes/bytearray operands, the return type is that of the first argument (this seems arbitrary until you consider how += works).
b1 += b2: mutates b1 if it is a bytearray object.
b * n, n * b: repetition; n must be an integer.
b *= n: mutates b if it is a bytearray object.
b1 in b2, b1 not in b2: substring test; b1 can be any object implementing the PEP 3118 buffer API.
i in b, i not in b: single-byte membership test; i must be an integer (if it is a length-1 bytes array, it is considered to be a substring test, with the same outcome).
len(b): the number of bytes.
hash(b): the hash value; only implemented by the bytes type.

Note that the % operator is not implemented. It does not appear worth the complexity.
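The operator list above can be exercised with a sketch like the following, assuming a current Python 3 interpreter. The alias variables exist only to show whether += mutated the object in place.

```python
# The result type of mixed concatenation follows the first operand.
left_bytes = b'ab' + bytearray(b'cd')
left_array = bytearray(b'ab') + b'cd'

# += mutates a bytearray in place, so an alias sees the change.
buf = bytearray(b'ab')
alias = buf
buf += b'cd'
mutated_in_place = alias is buf

# With immutable bytes, += rebinds the name and the alias is untouched.
imm = b'ab'
imm_alias = imm
imm += b'cd'

repeated = 2 * b'ab'

substring_found = b'bc' in b'abcd'
byte_found = 98 in b'abc'           # 98 is ord('b')
length = len(b'abc')

# Only the immutable type is hashable.
try:
    hash(bytearray(b'abc'))
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False
```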

Methods

The following methods are implemented by bytes as well as bytearray, with similar semantics. They accept anything that implements the PEP 3118 buffer API for bytes arguments, and return the same type as the object whose method is called ("self"):

.capitalize(), .center(), .count(), .decode(), .endswith(), .expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(), .islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(), .lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(), .rjust(), .rpartition(), .rsplit(), .rstrip(), .split(), .splitlines(), .startswith(), .strip(), .swapcase(), .title(), .translate(), .upper(), .zfill()

This is exactly the set of methods present on the str type in Python 2.x, with the exclusion of .encode(). The signatures and semantics are the same too. However, whenever character classes like letter, whitespace, lower case are used, the ASCII definitions of these classes are used. (The Python 2.x str type uses the definitions from the current locale, settable through the locale module.) The .encode() method is left out because of the more strict definitions of encoding and decoding in Python 3000: encoding always takes a Unicode string and returns a bytes sequence, and decoding always takes a bytes sequence and returns a Unicode string.
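A few of these methods in use, as a sketch for a current Python 3 interpreter. It illustrates the "same type as self" rule, the ASCII-only character classes, and the absence of .encode().

```python
# Results have the type of the object whose method was called.
upper_array = bytearray(b'hello').upper()
parts_bytes = b'a,b'.split(b',')
parts_array = bytearray(b'a,b').split(b',')

# Bytes arguments may be any buffer-API object, here a bytearray.
position = b'abc'.find(bytearray(b'c'))
replaced = b'abc'.replace(bytearray(b'b'), b'x')

# Character classes use the ASCII definitions: 0xE9 is not a letter.
ascii_letter = b'e'.isalpha()
latin1_letter = b'\xe9'.isalpha()

# Decoding goes from bytes to str; there is no .encode() on bytes.
decoded = b'abc'.decode('ascii')
has_encode = hasattr(b'', 'encode')
```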

In addition, both types implement the class method .fromhex(), which constructs an object from a string containing hexadecimal values (with or without spaces between the bytes).
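A short sketch of .fromhex() on both types, assuming a current Python 3 interpreter:

```python
packed = bytes.fromhex('deadbeef')        # no spaces between the bytes
spaced = bytes.fromhex('de ad be ef')     # spaces between the bytes
mutable = bytearray.fromhex('00ff')       # returns the type it is called on
```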

The bytearray type implements these additional methods from the MutableSequence ABC (see PEP 3119):

.extend(), .insert(), .append(), .reverse(), .pop(), .remove().
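The mutating methods behave much like their list counterparts, with integers as items. A sketch for a current Python 3 interpreter:

```python
ba = bytearray(b'abc')
ba.append(100)          # append one byte value: b'abcd'
ba.extend(b'ef')        # extend from a bytes object: b'abcdef'
ba.insert(0, 122)       # insert ord('z') at the front: b'zabcdef'
last = ba.pop()         # remove and return the last byte value
ba.remove(122)          # remove the first occurrence of ord('z')
ba.reverse()            # reverse in place
```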

Bytes and the Str Type

Like the bytes type in Python 3.0a1, and unlike the relationship between str and unicode in Python 2.x, attempts to mix bytes (or bytearray) objects and str objects without specifying an encoding will raise a TypeError exception. (However, comparing bytes/bytearray and str objects for equality will simply return False; see the section on Comparisons above.)

Conversions between bytes or bytearray objects and str objects must always be explicit, using an encoding. There are two equivalent APIs: str(b, <encoding>[, <errors>]) is equivalent to b.decode(<encoding>[, <errors>]), and bytes(s, <encoding>[, <errors>]) is equivalent to s.encode(<encoding>[, <errors>]).

There is one exception: we can convert from bytes (or bytearray) to str without specifying an encoding by writing str(b). This produces the same result as repr(b). This exception is necessary because of the general promise that any object can be printed, and printing is just a special case of conversion to str. There is however no promise that printing a bytes object interprets the individual bytes as characters (unlike in Python 2.x).
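The explicit-conversion rule and the two equivalent spellings can be sketched as follows, assuming a current Python 3 interpreter. The sample data is arbitrary.

```python
data = b'caf\xc3\xa9'
text = 'caf\u00e9'

# Decoding: two equivalent spellings.
decoded_a = str(data, 'utf-8')
decoded_b = data.decode('utf-8')

# Encoding: two equivalent spellings.
encoded_a = bytes(text, 'utf-8')
encoded_b = text.encode('utf-8')

# Mixing the types without an encoding is an error.
try:
    data + text
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```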

The str type currently implements the PEP 3118 buffer API. While this is perhaps occasionally convenient, it is also potentially confusing, because the bytes accessed via the buffer API represent a platform-dependent encoding: depending on the platform byte order and a compile-time configuration option, the encoding could be UTF-16-BE, UTF-16-LE, UTF-32-BE, or UTF-32-LE. Worse, a different implementation of the str type might completely change the bytes representation, e.g. to UTF-8, or even make it impossible to access the data as a contiguous array of bytes at all. Therefore, the PEP 3118 buffer API will be removed from the str type.
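On a current Python 3 interpreter, str no longer exposes its storage through the buffer API, which can be observed with a sketch like this:

```python
# Requesting a buffer view of a str is rejected.
try:
    memoryview('abc')
    str_has_buffer = True
except TypeError:
    str_has_buffer = False

# Encoding explicitly yields bytes, which do support the buffer API.
view = memoryview('abc'.encode('utf-8'))
view_length = len(view)
```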

Last modified: 2025-02-01 08:59:27 GMT
