This PEP proposes adding % formatting operations similar to Python 2’s str type to bytes and bytearray [1][2].
While interpolation is usually thought of as a string operation, there are cases where interpolation on bytes or bytearrays makes sense, and the work needed to make up for this missing functionality detracts from the overall readability of the code.
With Python 3 and the split between str and bytes, one small but important area of programming became slightly more difficult, and much more painful: wire format protocols [3].
This area of programming is characterized by a mixture of binary data and ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a restricted %-interpolation for bytes and bytearray will aid both in writing new wire format code, and in porting Python 2 wire format code.
Common use-cases include dbf and pdf file formats, email formats, and FTP and HTTP communications, among many others.
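To make the mixed binary/ASCII case concrete, here is a minimal sketch (Python 3.5+) that assembles a simplified HTTP/1.1 request with bytes %-formatting. The helper name build_request and the exact header set are illustrative assumptions, not part of this proposal.

```python
def build_request(host: bytes, path: bytes, body: bytes) -> bytes:
    """Return a minimal HTTP/1.1 POST request as bytes (illustrative only)."""
    # ASCII protocol text and numeric fields are interpolated with %b and %d.
    head = (
        b"POST %b HTTP/1.1\r\n"
        b"Host: %b\r\n"
        b"Content-Length: %d\r\n"
        b"\r\n"
    ) % (path, host, len(body))
    # The body may hold arbitrary, non-ASCII bytes, so it is appended as-is.
    return head + body


request = build_request(b"example.com", b"/upload", b"\x00\x01\xff")
```

The ASCII header and the binary payload stay in a single bytes object throughout; no str round-trip or per-segment .encode() call is needed.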
bytes and bytearray formatting

All the numeric formatting codes (d, i, o, u, x, X, e, E, f, F, g, G, and any that are subsequently added to Python 3) will be supported, and will work as they do for str, including the padding, justification and other related modifiers (currently #, 0, -, space, and + (plus any added to Python 3)). The only non-numeric codes allowed are c, b, a, and s (which is a synonym for b).
For the numeric codes, the only difference between str and bytes (or bytearray) interpolation is that the results from these codes will be ASCII-encoded text, not unicode. In other words, for any numeric formatting code %x:
b"%x" % val
is equivalent to:
("%x" % val).encode("ascii")
Examples:
>>> b'%4x' % 10
b'   a'
>>> b'%#4x' % 10
b' 0xa'
>>> b'%04X' % 10
b'000A'
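A quick way to check the equivalence stated above is to compare both spellings over several codes and values. This is a sketch for Python 3.5+; the lists of codes and sample values are arbitrary choices, not an exhaustive test.

```python
# Compare bytes %-formatting against str formatting followed by an ASCII encode.
INT_CODES = ["%d", "%i", "%o", "%u", "%x", "%X", "%#o", "%#4x", "%04X", "%-6d", "% d", "%+d"]
FLOAT_CODES = ["%e", "%E", "%f", "%F", "%g", "%G", "%+.2f", "%10.3e"]


def matches_str_formatting(code: str, value) -> bool:
    """True if the bytes and the encoded str results are identical."""
    return code.encode("ascii") % value == (code % value).encode("ascii")


int_ok = all(matches_str_formatting(c, v)
             for c in INT_CODES for v in (0, 10, -255, 2**70))
float_ok = all(matches_str_formatting(c, v)
               for c in FLOAT_CODES for v in (0.0, 3.14, -1e-20, float("inf")))
print(int_ok, float_ok)  # True True
```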
%c will insert a single byte, either from an int in range(256), or from a bytes argument of length 1, not from a str.
Examples:
>>> b'%c' % 48
b'0'
>>> b'%c' % b'a'
b'a'
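The accepted and rejected argument types for %c can be checked with a small sketch (Python 3.5+). c_rejects is a hypothetical helper; the exact exception raised for an out-of-range integer is treated here as an implementation detail, so several exception types are caught.

```python
def c_rejects(value) -> bool:
    """Return True if b"%c" refuses to format value."""
    try:
        b"%c" % (value,)
    except (TypeError, OverflowError, ValueError):
        return True
    return False


# Accepted: an int in range(256), or a length-1 bytes/bytearray.
accepted = [b"%c" % (v,) for v in (48, 255, b"a", bytearray(b"z"))]

# Rejected: str, ints outside range(256), and longer byte strings.
print(c_rejects("a"), c_rejects(256), c_rejects(-1), c_rejects(b"ab"))  # True True True True
```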
%b will insert a series of bytes. These bytes are collected in one of two ways:
- if the input type supports Py_buffer [4], use it to collect the necessary bytes
- otherwise, use the input's __bytes__ method [5]; if there isn’t one, raise a TypeError

In particular, %b will accept neither numbers nor str. str is rejected as the string-to-bytes conversion requires an encoding, and we are refusing to guess; numbers are rejected because:
%s is included as a synonym for %b for the sole purpose of making 2/3 code bases easier to maintain. Python 3 only code should use %b.
Examples:
>>> b'%b' % b'abc'
b'abc'
>>> b'%b' % 'some string'.encode('utf8')
b'some string'
>>> b'%b' % 3.14
Traceback (most recent call last):
...
TypeError: b'%b' does not accept 'float'
>>> b'%b' % 'hello world!'
Traceback (most recent call last):
...
TypeError: b'%b' does not accept 'str'
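The two collection paths described for %b can be exercised directly. In this sketch (Python 3.5+), the Packet class and its byte layout are hypothetical; memoryview, bytearray and array.array are standard objects that support the buffer protocol. Error messages are not checked, since the wording shown above is illustrative.

```python
import array


class Packet:
    """Hypothetical message type that opts in to %b through __bytes__."""

    def __init__(self, kind: int, payload: bytes) -> None:
        self.kind = kind
        self.payload = payload

    def __bytes__(self) -> bytes:
        # One type byte followed by the raw payload.
        return bytes([self.kind]) + self.payload


# Objects that support the buffer protocol are copied in directly.
from_buffers = b"[%b|%b|%b]" % (
    memoryview(b"mv"),
    bytearray(b"ba"),
    array.array("B", [65, 66]),
)

# Any other object must provide __bytes__.
from_dunder = b"<%b>" % Packet(1, b"hi")


def refused(value) -> bool:
    """Return True if %b raises TypeError for value (e.g. str, int, float)."""
    try:
        b"%b" % (value,)
    except TypeError:
        return True
    return False


print(from_buffers, from_dunder)  # b'[mv|ba|AB]' b'<\x01hi>'
print(refused("text"), refused(3), refused(3.14))  # True True True
```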
%a will give the equivalent of repr(some_obj).encode('ascii', 'backslashreplace') on the interpolated value. Use cases include developing a new protocol and writing landmarks into the stream; debugging data going into an existing protocol to see if the problem is the protocol itself or bad data; a fall-back for a serialization format; or any situation where defining __bytes__ would not be appropriate but a readable/informative representation is needed [6].
%r is included as a synonym for %a for the sole purpose of making 2/3 code bases easier to maintain. Python 3 only code should use %a [7].
Examples:
>>> b'%a' % 3.14
b'3.14'
>>> b'%a' % b'abc'
b"b'abc'"
>>> b'%a' % 'def'
b"'def'"
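The repr-plus-backslashreplace equivalence for %a, and the fact that %r gives the same result, can be confirmed over a few sample objects. A sketch for Python 3.5+; the sample values are arbitrary, and each argument is wrapped in a one-element tuple so that no sample is mistaken for an argument tuple.

```python
SAMPLES = [3.14, b"abc", "def", "café", [1, "\u20ac"], None]

for obj in SAMPLES:
    expected = repr(obj).encode("ascii", "backslashreplace")
    # %a and its 2/3 synonym %r both produce the ASCII-safe repr.
    assert b"%a" % (obj,) == expected
    assert b"%r" % (obj,) == expected

print(b"%a" % "café")  # b"'caf\\xe9'"
```

Non-ASCII characters come out as backslash escapes, so the result is always safe to write into an ASCII-oriented stream.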
As noted above, %s and %r are being included solely to help ease migration from, and/or have a single code base with, Python 2. This is important as there are modules both in the wild and behind closed doors that currently use the Python 2 str type as a bytes container, and hence are using %s as a bytes interpolator.
However, %b and %a should be used in new, Python 3 only code, so %s and %r will immediately be deprecated, but not removed from the 3.x series [7].
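A sketch of the single-code-base case: a line of protocol code written with %s on a b"" literal. On Python 2 the literal is a str; on Python 3.5+ it is bytes and %s behaves like %b, so a str argument is still refused. make_header is a hypothetical helper, and only the Python 3 behaviour is exercised here.

```python
def make_header(name, value):
    """Hypothetical helper returning b"name: value" followed by CRLF.

    Written with %s so the same line also runs on Python 2.
    """
    return b"%s: %s\r\n" % (name, value)


header = make_header(b"Content-Type", b"text/plain")
# Python 3 only code would spell the same thing with %b.
assert header == b"%b: %b\r\n" % (b"Content-Type", b"text/plain")

try:
    make_header(b"Content-Type", "text/plain")  # str, not bytes
except TypeError:
    str_rejected = True
else:
    str_rejected = False
```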
It has been proposed to automatically use .encode('ascii', 'strict') for str arguments to %b.
It has been proposed to have %b return the ASCII-encoded repr when the value is a str (b'%b' % 'abc' --> b"'abc'").
Originally this PEP also proposed adding format-style formatting, but it was decided that format and its related machinery were all strictly text (aka str) based, and it was dropped.
Various new special methods were proposed, such as __ascii__, __format_bytes__, etc.; such methods are not needed at this time, but can be revisited later if real-world use shows deficiencies with this solution.
A competing PEP, PEP 460 (Add binary interpolation and formatting), also exists.
The objections raised against this PEP were mainly variations on two themes:
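- the bytes and bytearray types are for pure binary data, with no assumptions about encodings
- offering %-interpolation that assumes an ASCII encoding will be an attractive nuisance and lead us back to the problems of the Python 2 str/unicode text model

As was seen during the discussion, bytes and bytearray are also used for mixed binary data and ASCII-compatible segments: file formats such as dbf and pdf, network protocols such as ftp and email, etc.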
bytes and bytearray already have several methods which assume an ASCII-compatible encoding: upper(), isalpha(), and expandtabs(), to name just a few. %-interpolation, with its very restricted mini-language, will not be any more of a nuisance than the already existing methods.
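The ASCII assumption built into these existing methods is easy to observe. A short sketch (Python 3); the sample values are arbitrary.

```python
# bytes methods that already treat their contents as ASCII-compatible text.
checks = {
    "upper": b"hello".upper() == b"HELLO",
    "isalpha": b"abc".isalpha() and not b"abc1".isalpha(),
    "expandtabs": b"a\tb".expandtabs(4) == b"a   b",
    # Bytes outside the ASCII range are left alone by these methods.
    "non_ascii_untouched": b"caf\xc3\xa9".upper() == b"CAF\xc3\xa9",
    "non_ascii_not_alpha": not b"\xe9".isalpha(),
}
print(all(checks.values()))  # True
```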
Some have objected to allowing the full range of numeric formatting codes with the claim that decimal alone would be sufficient. However, at least two formats (dbf and pdf) make use of non-decimal numbers.
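As an illustration of non-decimal numbers in one of these formats: PDF writes hexadecimal strings between angle brackets, and allows octal escapes (a backslash followed by up to three octal digits) inside literal strings. The sketch below, with hypothetical helper names pdf_hex_string and pdf_octal_escape, produces both with %X and %o; it covers only these two small pieces of PDF syntax.

```python
def pdf_hex_string(data: bytes) -> bytes:
    """Encode data as a PDF hexadecimal string, e.g. b"Hi" -> b"<4869>"."""
    # Iterating over bytes yields ints, each formatted as two uppercase hex digits.
    return b"<%b>" % b"".join(b"%02X" % byte for byte in data)


def pdf_octal_escape(byte: int) -> bytes:
    """Escape a single byte value as a backslash plus three octal digits."""
    return b"\\%03o" % byte


print(pdf_hex_string(b"Hi"))  # b'<4869>'
print(pdf_octal_escape(200))  # b'\\310'
```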
[4] Examples of Py_buffer types: memoryview, array.array, bytearray, bytes.
[7] %r was not allowed, but was added for consistency during the 3.5 alpha stage.

This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0461.rst
Last modified: 2025-02-01 08:55:40 GMT