Python Enhancement Proposals

PEP 467 – Minor API improvements for binary sequences

Author:
Alyssa Coghlan <ncoghlan at gmail.com>, Ethan Furman <ethan at stoneleaf.us>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Created:
30-Mar-2014
Python-Version:
3.15
Post-History:
30-Mar-2014, 15-Aug-2014, 16-Aug-2014, 07-Jun-2016, 01-Sep-2016, 13-Apr-2021, 03-Nov-2021, 27-Dec-2023

Abstract

This PEP proposes small adjustments to the APIs of the bytes and bytearray types to make it easier to operate entirely in the binary domain:

  • Add fromsize alternative constructor
  • Add fromint alternative constructor
  • Add getbyte byte retrieval method
  • Add iterbytes alternative iterator

The last two (getbyte and iterbytes) will also be added to memoryview.

Rationale

During the initial development of the Python 3 language specification, the core bytes type for arbitrary binary data started as the mutable type that is now referred to as bytearray. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series, for example with PEP 461.

Motivation

With Python 3 and the split between str and bytes, one small but important area of programming became slightly more difficult, and much more painful – wire format protocols.

This area of programming is characterized by a mixture of binary data and ASCII compatible segments of text (aka ASCII-encoded text). The addition of the new constructors, methods, and iterators will aid both in writing new wire format code, and in updating existing code.

Common use-cases include dbf and pdf file formats, email formats, and FTP and HTTP communications, among many others.

Proposals

Addition of explicit “count and byte initialised sequence” constructors

To replace the discouraged behavior of creating zero-filled bytes-like objects from the basic constructors (i.e. bytes(1) -> b'\x00'), this PEP proposes the addition of an explicit fromsize alternative constructor as a class method on both bytes and bytearray whose first argument is the count, and whose second argument is the fill byte to use (defaults to \x00):

    >>> bytes.fromsize(3)
    b'\x00\x00\x00'
    >>> bytearray.fromsize(3)
    bytearray(b'\x00\x00\x00')
    >>> bytes.fromsize(5, b'\x0a')
    b'\x0a\x0a\x0a\x0a\x0a'
    >>> bytearray.fromsize(5, fill=b'\x0a')
    bytearray(b'\x0a\x0a\x0a\x0a\x0a')

fromsize will behave just as the current constructors behave when passed a single integer, while allowing for non-zero fill values when needed.
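Since fromsize is only proposed, the behaviour described above can be approximated on current Python. The following is a minimal sketch, not the PEP's implementation: the free function fromsize and its validation rules (a single-byte fill, a non-negative count) are assumptions made for illustration.

```python
def fromsize(cls, size, fill=b"\x00"):
    """Hypothetical stand-in for the proposed cls.fromsize(size, fill)."""
    if size < 0:
        # Assumption: mirror bytes(-1), which raises ValueError today.
        raise ValueError("negative count")
    if len(fill) != 1:
        # Assumption: the fill value must be exactly one byte long.
        raise ValueError("fill must be a single byte")
    # Sequence repetition builds the filled buffer; cls() converts it
    # to the requested type (bytes or bytearray).
    return cls(fill * size)


zeros = fromsize(bytes, 3)
newlines = fromsize(bytearray, 5, fill=b"\x0a")
```

With the default fill, the sketch returns the same value as the existing bytes(3) call, which is the equivalence the paragraph above describes.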

Addition of explicit “single byte” constructors

As binary counterparts to the text chr function, this PEP proposes the addition of an explicit fromint alternative constructor as a class method on both bytes and bytearray:

    >>> bytes.fromint(65)
    b'A'
    >>> bytearray.fromint(65)
    bytearray(b'A')

These methods will only accept integers in the range 0 to 255 (inclusive):

    >>> bytes.fromint(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: integer must be in range(0, 256)
    >>> bytes.fromint(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ord builtin will be updated to explicitly note that bytes.fromint is the primary inverse operation for binary data, while chr is the inverse operation for text data, and that bytearray.fromint also exists.

Behaviorally, bytes.fromint(x) will be equivalent to the current bytes([x]) (and similarly for bytearray). The new spelling is expected to be easier to discover and easier to read (especially when used in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher order functions like map.
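To illustrate the stated equivalence with bytes([x]) and the point about higher order functions, here is a small sketch that runs on current Python. The helper fromint is a hypothetical stand-in for the proposed class method, not part of any released API.

```python
from functools import partial


def fromint(cls, value):
    """Hypothetical stand-in for the proposed cls.fromint(value)."""
    # cls([value]) already raises ValueError outside range(0, 256)
    # and TypeError for non-integers such as floats.
    return cls([value])


# A one-argument callable is what map() needs; the proposed
# bytes.fromint could be passed directly in the same position.
bytes_fromint = partial(fromint, bytes)
chunks = list(map(bytes_fromint, b"ABC"))
```

Without a named callable, the same mapping has to be spelled with a lambda or a comprehension around bytes([x]).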

These new methods intentionally do NOT offer the same level of general integer support as the existing int.to_bytes conversion method, which allows arbitrarily large integers to be converted to arbitrarily long bytes objects. The restriction to only accept non-negative integers that fit in a single byte (0 to 255) means that no byte order information is needed, and there is no need to handle negative numbers. The documentation of the new methods will refer readers to int.to_bytes for use cases where handling of arbitrary integers is needed.
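The contrast with int.to_bytes can be seen with calls that exist today. This sketch uses bytes([x]) in place of the proposed single-byte constructor; the variable names are illustrative only.

```python
# Single byte: no length or byte order decision is involved.
single = bytes([65])

# Larger integers need an explicit length and byte order.
big = (1024).to_bytes(2, byteorder="big")
little = (1024).to_bytes(2, byteorder="little")

# Negative integers additionally need signed=True.
negative = (-2).to_bytes(1, byteorder="big", signed=True)
```

The same integer produces different byte strings depending on the byte order chosen, which is exactly the decision the single-byte constructors avoid.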

Addition of “getbyte” method to retrieve a single byte

This PEP proposes that bytes, bytearray, and memoryview gain the method getbyte which will always return bytes:

    >>> b'abc'.getbyte(0)
    b'a'

If an index is asked for that doesn't exist, IndexError is raised:

    >>> b'abc'.getbyte(9)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: index out of range
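The described behaviour can be approximated on current Python as follows. This is a sketch under assumptions: getbyte here is a hypothetical free function, the input is taken to be bytes, bytearray, or a memoryview of unsigned bytes, and negative indices are assumed to follow normal sequence indexing (the text above does not say).

```python
def getbyte(data, index):
    """Hypothetical stand-in for the proposed data.getbyte(index)."""
    # Plain indexing validates the position and raises IndexError
    # when it is out of range; it returns an int for these types.
    value = data[index]
    # Wrap the int back into a length 1 bytes object.
    return bytes([value])


first = getbyte(b"abc", 0)
```

The common slicing spelling data[i:i+1] also yields a bytes object, but it silently returns an empty result for an out-of-range index rather than raising IndexError, so it does not match the behaviour described above.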

Addition of optimised iterator methods that produce bytes objects

This PEP proposes that bytes, bytearray, and memoryview gain an optimised iterbytes method that produces length 1 bytes objects rather than integers:

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer
        ...

For example:

    >>> tuple(b"ABC".iterbytes())
    (b'A', b'B', b'C')
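Until such a method exists, the same sequence of values can be produced with a generator. This is a plain-Python sketch, not the optimised implementation the PEP has in mind; iterbytes is a hypothetical free function and the input is assumed to be bytes, bytearray, or a memoryview of unsigned bytes.

```python
def iterbytes(data):
    """Hypothetical stand-in for the proposed data.iterbytes()."""
    # Iterating these types yields ints; wrap each one into a
    # length 1 bytes object.
    for value in data:
        yield bytes([value])


as_bytes = tuple(iterbytes(b"ABC"))
as_ints = tuple(b"ABC")
```

Comparing as_bytes with as_ints shows the difference between the proposed iterator and default iteration, which yields integers.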

Design discussion

Why not rely on sequence repetition to create zero-initialised sequences?

Zero-initialised sequences can be created via sequence repetition:

    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')

However, this was also the case when the bytearray type was originally designed, and the decision was made to add explicit support for it in the type constructor. The immutable bytes type then inherited that feature when it was introduced in PEP 3137.

This PEP isn't revisiting that original design decision, just changing the spelling as users sometimes find the current behavior of the binary sequence constructors surprising. In particular, there's a reasonable case to be made that bytes(x) (where x is an integer) should behave like the bytes.fromint(x) proposal in this PEP. Providing both behaviors as separate class methods avoids that ambiguity.
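The ambiguity is easy to reproduce with the constructors as they exist today. In this short illustration, bytes([x]) stands in for the proposed bytes.fromint(x).

```python
x = 3

# Count interpretation (current constructor behaviour):
# three zero bytes.
zero_filled = bytes(x)

# Value interpretation (what bytes.fromint(x) is proposed to do):
# one byte whose value is 3.
single_byte = bytes([x])
```

A reader expecting the value interpretation from bytes(3) gets a buffer of the wrong length and content, with no error raised.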

Current Workarounds

After nearly a decade, there seems to be no consensus on the best workarounds for byte iteration, as demonstrated by "Get single-byte bytes objects from bytes objects".
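For context, here are a few spellings that work on current Python and produce the same result. They are examples of the kind of workaround in circulation, not a summary of the linked discussion.

```python
import struct

data = b"ABC"

# Slice one position at a time; slices of bytes are bytes.
by_slicing = [data[i:i + 1] for i in range(len(data))]

# Iterate the ints and wrap each one in a single-item list.
by_wrapping = [bytes([value]) for value in data]

# Unpack with the struct "c" format, which yields length 1 bytes.
by_struct = list(struct.unpack(f"{len(data)}c", data))
```

Each spelling is correct, but none is obviously the intended one, which is the discoverability gap the proposed methods aim to close.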

Omitting the originally proposed builtin function

When submitted to the Steering Council, this PEP proposed the introduction of a bchr builtin (with the same behaviour as bytes.fromint), recreating the ord/chr/unichr trio from Python 2 under a different naming scheme (ord/bchr/chr).

The SC indicated they didn't think this functionality was needed often enough to justify offering two ways of doing the same thing, especially when one of those ways was a new builtin function. That part of the proposal was therefore dropped as being redundant with the bytes.fromint alternate constructor.

Developers that use this method frequently will instead have the option to define their own bchr = bytes.fromint aliases.
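Such an alias might look like the sketch below. The fallback branch is an assumption added so the snippet also runs on interpreters that do not provide bytes.fromint; it relies on the stated equivalence with bytes([x]).

```python
# Use the proposed constructor when the interpreter provides it;
# otherwise fall back to the equivalent bytes([value]) spelling.
bchr = getattr(bytes, "fromint", None) or (lambda value: bytes([value]))

# ord() and bchr() then act as inverses for single bytes.
round_trip = bchr(ord(b"A"))
```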

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0467.rst

Last modified: 2025-05-06 21:00:16 GMT

