Python Enhancement Proposals

PEP 688 – Making the buffer protocol accessible in Python

Author: Jelle Zijlstra <jelle.zijlstra at gmail.com>
Discussions-To: Discourse thread
Status: Final
Type: Standards Track
Topic: Typing
Created: 23-Apr-2022
Python-Version: 3.12
Post-History: 23-Apr-2022, 25-Apr-2022, 06-Oct-2022, 26-Oct-2022
Resolution: 07-Mar-2023

Important

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Emulating buffer types.

See PEP 1 for how to propose changes.

Abstract

This PEP proposes a Python-level API for the buffer protocol, which is currently accessible only to C code. This allows type checkers to evaluate whether objects implement the protocol.

Motivation

The CPython C API provides a versatile mechanism for accessing the underlying memory of an object: the buffer protocol introduced in PEP 3118. Functions that accept binary data are usually written to handle any object implementing the buffer protocol. For example, at the time of writing, there are around 130 functions in CPython using the Argument Clinic Py_buffer type, which accepts the buffer protocol.

Currently, there is no way for Python code to inspect whether an object supports the buffer protocol. Moreover, the static type system does not provide a type annotation to represent the protocol. This is a common problem when writing type annotations for code that accepts generic buffers.

Similarly, it is impossible for a class written in Python to support the buffer protocol. A buffer class in Python would give users the ability to easily wrap a C buffer object, or to test the behavior of an API that consumes the buffer protocol. Granted, this is not a particularly common need. However, there has been a CPython feature request for supporting buffer classes written in Python that has been open since 2012.

Rationale

Current options

There are two known workarounds for annotating buffer types in the type system, but neither is adequate.

First, the current workaround for buffer types in typeshed is a type alias that lists well-known buffer types in the standard library, such as bytes, bytearray, memoryview, and array.array. This approach works for the standard library, but it does not extend to third-party buffer types.

Second, the documentation for typing.ByteString currently states:

This type represents the types bytes, bytearray, and memoryview of byte sequences.

As a shorthand for this type, bytes can be used to annotate arguments of any of the types mentioned above.

Although this sentence has been in the documentation since 2015, the use of bytes to include these other types is not specified in any of the typing PEPs. Furthermore, this mechanism has a number of problems. It does not include all possible buffer types, and it makes the bytes type ambiguous in type annotations. After all, there are many operations that are valid on bytes objects, but not on memoryview objects, and it is perfectly possible for a function to accept bytes but not memoryview objects. A mypy user reports that this shortcut has caused significant problems for the psycopg project.
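The ambiguity can be illustrated with a short sketch. The function name shout is hypothetical and not part of this PEP; it is annotated as taking bytes and relies on a method that bytes provides but memoryview does not:

```python
def shout(data: bytes) -> bytes:
    """Hypothetical helper that relies on a bytes-only method."""
    return data.upper()


print(shout(b"abc"))  # valid: bytes has an .upper() method

# Under the bytes shorthand, a type checker would accept this call,
# yet memoryview has no .upper() method, so it fails at runtime.
try:
    shout(memoryview(b"abc"))
except AttributeError as exc:
    print("memoryview rejected:", exc)
```

A checker that treats bytes as meaning only bytes would flag the second call before it runs.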

Kinds of buffers

The C buffer protocol supports many options, affecting strides, contiguity, and support for writing to the buffer. Some of these options would be useful in the type system. For example, typeshed currently provides separate type aliases for writable and read-only buffers.

However, in the C buffer protocol, most of these options cannot be queried directly on the type object. The only way to figure out whether an object supports a particular flag is to actually ask for the buffer. For some types, such as memoryview, the supported flags depend on the instance. As a result, it would be difficult to represent support for these flags in the type system.
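As a small illustration of the per-instance behavior, the sketch below builds two memoryview objects of the same type that differ in writability. The variable names are illustrative only:

```python
ro = memoryview(b"abc")             # backed by immutable bytes
rw = memoryview(bytearray(b"abc"))  # backed by a mutable bytearray

# Same type, different capabilities: writability is a per-instance property.
assert type(ro) is type(rw) is memoryview
print(ro.readonly, rw.readonly)

rw[0] = ord("A")      # allowed: the underlying bytearray is writable
try:
    ro[0] = ord("A")  # rejected: the underlying bytes object is read-only
except TypeError as exc:
    print("read-only view:", exc)
```

Because both objects are plain memoryview instances, no annotation based on the type alone can tell them apart.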

Specification

Python-level buffer protocol

We propose to add two Python-level special methods, __buffer__ and __release_buffer__. Python classes that implement these methods are usable as buffers from C code. Conversely, classes implemented in C that support the buffer protocol acquire synthesized methods accessible from Python code.

The __buffer__ method is called to create a buffer from a Python object, for example by the memoryview() constructor. It corresponds to the bf_getbuffer C slot. Its Python signature is def __buffer__(self, flags: int, /) -> memoryview: ..., and the method must return a memoryview object. If the bf_getbuffer slot is invoked on a Python class with a __buffer__ method, the interpreter extracts the underlying Py_buffer from the memoryview returned by the method and returns it to the C caller. Similarly, if Python code calls the __buffer__ method on an instance of a C class that implements bf_getbuffer, the returned buffer is wrapped in a memoryview for consumption by Python code.

The __release_buffer__ method should be called when a caller no longer needs the buffer returned by __buffer__. It corresponds to the bf_releasebuffer C slot. This is an optional part of the buffer protocol. Its Python signature is def __release_buffer__(self, buffer: memoryview, /) -> None: ..., and the buffer to be released is wrapped in a memoryview. When this method is invoked through CPython’s buffer API (for example, through calling memoryview.release on a memoryview returned by __buffer__), the passed memoryview is the same object as was returned by __buffer__. It is also possible to call __release_buffer__ on a C class that implements bf_releasebuffer.

If __release_buffer__ exists on an object, Python code that calls __buffer__ directly on the object must call __release_buffer__ on the same object when it is done with the buffer. Otherwise, resources used by the object may not be reclaimed. Similarly, it is a programming error to call __release_buffer__ without a previous call to __buffer__, or to call it multiple times for a single call to __buffer__. For objects that implement the C buffer protocol, calls to __release_buffer__ where the argument is not a memoryview wrapping the same object will raise an exception. After a valid call to __release_buffer__, the memoryview is invalidated (as if its release() method had been called), and any subsequent calls to __release_buffer__ with the same memoryview will raise an exception. The interpreter will ensure that misuse of the Python API will not break invariants at the C level; for example, it will not cause memory safety violations.
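The pairing rule can be sketched as follows, using bytearray as an example of a C class with synthesized methods. The helper name read_via_dunder is hypothetical, and the demonstration is guarded because the methods are only available from Python 3.12:

```python
import inspect
import sys


def read_via_dunder(data: bytearray) -> bytes:
    """Hypothetical helper: acquire, read, and release a buffer explicitly."""
    view = data.__buffer__(inspect.BufferFlags.SIMPLE)
    try:
        return view.tobytes()
    finally:
        # Every direct __buffer__ call is paired with exactly one release.
        data.__release_buffer__(view)


if sys.version_info >= (3, 12):  # the dunder methods exist from Python 3.12
    data = bytearray(b"capybara")
    print(read_via_dunder(data))

    view = data.__buffer__(inspect.BufferFlags.SIMPLE)
    data.__release_buffer__(view)
    try:
        data.__release_buffer__(view)  # second release of the same view
    except Exception as exc:  # the text above only promises "an exception"
        print("double release rejected:", exc)
```

In most code, wrapping the object in memoryview() inside a with statement is simpler, because the release then happens automatically.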

inspect.BufferFlags

To help implementations of __buffer__, we add inspect.BufferFlags, a subclass of enum.IntFlag. This enum contains all flags defined in the C buffer protocol. For example, inspect.BufferFlags.SIMPLE has the same value as the PyBUF_SIMPLE constant.
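A __buffer__ implementation might use the enum to inspect the request it receives. The sketch below is one possible approach; the helper name wants_writable is hypothetical, and the demonstration is guarded because inspect.BufferFlags requires Python 3.12:

```python
import inspect
import sys


def wants_writable(flags: int) -> bool:
    """Hypothetical helper: does the consumer request a writable buffer?"""
    return bool(flags & inspect.BufferFlags.WRITABLE)


if sys.version_info >= (3, 12):  # inspect.BufferFlags was added in Python 3.12
    print(int(inspect.BufferFlags.SIMPLE))              # numeric value of PyBUF_SIMPLE
    print(wants_writable(inspect.BufferFlags.FULL))     # FULL includes WRITABLE
    print(wants_writable(inspect.BufferFlags.FULL_RO))  # FULL_RO does not
```

Because the enum is an IntFlag, the helper also accepts the plain integers that C callers pass through the bf_getbuffer slot.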

collections.abc.Buffer

We add a new abstract base class, collections.abc.Buffer, which requires the __buffer__ method. This class is intended primarily for use in type annotations:

    from collections.abc import Buffer

    def need_buffer(b: Buffer) -> memoryview:
        return memoryview(b)

    need_buffer(b"xy")  # ok
    need_buffer("xy")  # rejected by static type checkers

It can also be used in isinstance and issubclass checks:

    >>> from collections.abc import Buffer
    >>> isinstance(b"xy", Buffer)
    True
    >>> issubclass(bytes, Buffer)
    True
    >>> issubclass(memoryview, Buffer)
    True
    >>> isinstance("xy", Buffer)
    False
    >>> issubclass(str, Buffer)
    False

In the typeshed stub files, the class should be defined as a Protocol, following the precedent of other simple ABCs in collections.abc such as collections.abc.Iterable or collections.abc.Sized.

Example

The following is an example of a Python class that implements the buffer protocol:

    import contextlib
    import inspect

    class MyBuffer:
        def __init__(self, data: bytes):
            self.data = bytearray(data)
            self.view = None

        def __buffer__(self, flags: int) -> memoryview:
            if flags != inspect.BufferFlags.FULL_RO:
                raise TypeError("Only BufferFlags.FULL_RO supported")
            if self.view is not None:
                raise RuntimeError("Buffer already held")
            self.view = memoryview(self.data)
            return self.view

        def __release_buffer__(self, view: memoryview) -> None:
            assert self.view is view  # guaranteed to be true
            self.view.release()
            self.view = None

        def extend(self, b: bytes) -> None:
            if self.view is not None:
                raise RuntimeError("Cannot extend held buffer")
            self.data.extend(b)

    buffer = MyBuffer(b"capybara")
    with memoryview(buffer) as view:
        view[0] = ord("C")

        with contextlib.suppress(RuntimeError):
            buffer.extend(b"!")  # raises RuntimeError

    buffer.extend(b"!")  # ok, buffer is no longer held

    with memoryview(buffer) as view:
        assert view.tobytes() == b"Capybara!"

Equivalent for older Python versions

New typing features are usually backported to older Python versions in the typing_extensions package. Because the buffer protocol is currently accessible only in C, this PEP cannot be fully implemented in a pure-Python package like typing_extensions. As a temporary workaround, an abstract base class typing_extensions.Buffer will be provided for Python versions that do not have collections.abc.Buffer available.

After this PEP is implemented, inheriting from collections.abc.Buffer will not be necessary to indicate that an object supports the buffer protocol. However, in older Python versions, it will be necessary to explicitly inherit from typing_extensions.Buffer to indicate to type checkers that a class supports the buffer protocol, since objects supporting the buffer protocol will not have a __buffer__ method. It is expected that this will happen primarily in stub files, because buffer classes are necessarily implemented in C code, which cannot have types defined inline. For runtime uses, the ABC.register API can be used to register buffer classes with typing_extensions.Buffer.
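The runtime registration mechanism can be sketched without any third-party dependency. The class BufferStandIn below is a hypothetical stand-in for typing_extensions.Buffer, used only so the example runs on its own; real code would register with typing_extensions.Buffer instead:

```python
import abc
import array


class BufferStandIn(abc.ABC):
    """Hypothetical stand-in for typing_extensions.Buffer (no methods required)."""


# array.array is implemented in C and supports the buffer protocol, but on
# older Python versions it exposes no __buffer__ method to detect that.
BufferStandIn.register(array.array)

print(isinstance(array.array("b", [1, 2]), BufferStandIn))  # registered class
print(issubclass(str, BufferStandIn))                       # never registered
```

Registration affects only runtime isinstance and issubclass checks; static type checkers still rely on explicit inheritance in stubs.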

No special meaning for bytes

The special case stating that bytes may be used as a shorthand for other ByteString types will be removed from the typing documentation. With collections.abc.Buffer available as an alternative, there will be no good reason to allow bytes as a shorthand. Type checkers currently implementing this behavior should deprecate and eventually remove it.

Backwards Compatibility

__buffer__ and __release_buffer__ attributes

As the runtime changes in this PEP only add new functionality, there are few backwards compatibility concerns.

However, code that uses a __buffer__ or __release_buffer__ attribute for other purposes may be affected. While all dunders are technically reserved for the language, it is still good practice to ensure that a new dunder does not interfere with too much existing code, especially widely used packages. A survey of publicly accessible code found:

  • PyPy supports a __buffer__ method with compatible semantics to those proposed in this PEP. A PyPy core developer expressed his support for this PEP.
  • pyzmq implements a PyPy-compatible __buffer__ method.
  • mpi4py defines a SupportsBuffer protocol that would be equivalent to this PEP’s collections.abc.Buffer.
  • NumPy used to have an undocumented behavior where it would access a __buffer__ attribute (not method) to get an object’s buffer. This was removed in 2019 for NumPy 1.17. The behavior would have last worked in NumPy 1.16, which only supported Python 3.7 and older. Python 3.7 will have reached its end of life by the time this PEP is expected to be implemented.

Thus, this PEP’s use of the __buffer__ method will improve interoperability with PyPy and not interfere with the current versions of any major Python packages.

No publicly accessible code uses the name __release_buffer__.

Removal of the bytes special case

Separately, the recommendation to remove the special behavior for bytes in type checkers does have a backwards compatibility impact on their users. An experiment with mypy shows that several major open source projects that use it for type checking will see new errors if the bytes promotion is removed. Many of these errors can be fixed by improving the stubs in typeshed, as has already been done for the builtins, binascii, pickle, and re modules. A review of all usage of bytes types in typeshed is in progress. Overall, the change improves type safety and makes the type system more consistent, so we believe the migration cost is worth it.

How to Teach This

We will add notes pointing to collections.abc.Buffer in appropriate places in the documentation, such as typing.python.org and the mypy cheat sheet. Type checkers may provide additional pointers in their error messages. For example, when they encounter a buffer object being passed to a function that is annotated to only accept bytes, the error message could include a note suggesting the use of collections.abc.Buffer instead.

Reference Implementation

An implementation of this PEP is available in the author’s fork.

Rejected Ideas

types.Buffer

An earlier version of this PEP proposed adding a new types.Buffer type with an __instancecheck__ implemented in C so that isinstance() checks can be used to check whether a type implements the buffer protocol. This avoids the complexity of exposing the full buffer protocol to Python code, while still allowing the type system to check for the buffer protocol.

However, that approach does not compose well with the rest of the type system, because types.Buffer would be a nominal type, not a structural one. For example, there would be no way to represent “an object that supports both the buffer protocol and __len__”. With the current proposal, __buffer__ is like any other special method, so a Protocol can be defined combining it with another method.
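Such a combined protocol might look like the sketch below. The names SizedBuffer and describe are hypothetical and not part of this PEP; the runtime check against a built-in type is guarded because built-in types only expose __buffer__ from Python 3.12:

```python
import sys
from typing import Protocol, runtime_checkable


@runtime_checkable
class SizedBuffer(Protocol):
    """Hypothetical protocol: supports the buffer protocol and __len__."""

    def __buffer__(self, flags: int, /) -> memoryview: ...

    def __len__(self) -> int: ...


def describe(obj: SizedBuffer) -> str:
    """Hypothetical consumer that uses both capabilities."""
    with memoryview(obj) as view:
        return f"{len(obj)} items, {view.nbytes} bytes"


print(describe(b"xy"))

# str has __len__ but no __buffer__, so it does not match the protocol.
print(isinstance("xy", SizedBuffer))

if sys.version_info >= (3, 12):  # built-in types expose __buffer__ from 3.12
    print(isinstance(bytearray(b"xy"), SizedBuffer))
```

A nominal types.Buffer could not express this intersection, whereas the structural form composes like any other Protocol.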

More generally, no other part of Python works like the proposed types.Buffer. The current proposal is more consistent with the rest of the language, where C-level slots usually have corresponding Python-level special methods.

Keep bytearray compatible with bytes

It has been suggested to remove the special case where memoryview is always compatible with bytes, but keep it for bytearray, because the two types have very similar interfaces. However, several standard library functions (e.g., re.compile, socket.getaddrinfo, and most functions accepting path-like arguments) accept bytes but not bytearray. In most codebases, bytearray is also not a very common type. We prefer to have users spell out accepted types explicitly (or use Protocol from PEP 544 if only a specific set of methods is required). This aspect of the proposal was specifically discussed on the typing-sig mailing list, without any strong disagreement from the typing community.

Distinguish between mutable and immutable buffers

The most frequently used distinction within buffer types is whether or not the buffer is mutable. Some functions accept only mutable buffers (e.g., bytearray, some memoryview objects), others accept all buffers.

An earlier version of this PEP proposed using the presence of the bf_releasebuffer slot to determine whether a buffer type is mutable. This rule holds for most standard library buffer types, but the relationship between mutability and the presence of this slot is not absolute. For example, numpy arrays are mutable but do not have this slot.

The current buffer protocol does not provide any way to reliably determine whether a buffer type represents a mutable or immutable buffer. Therefore, this PEP does not add type system support for this distinction. The question can be revisited in the future if the buffer protocol is enhanced to provide static introspection support. A sketch for such a mechanism exists.

Acknowledgments

Many people have provided useful feedback on drafts of this PEP. Petr Viktorin has been particularly helpful in improving my understanding of the subtleties of the buffer protocol.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0688.rst

Last modified: 2025-03-05 16:28:34 GMT

