
Python Enhancement Proposals

PEP 353 – Using ssize_t as the index type

Author:
Martin von Löwis <martin at v.loewis.de>
Status:
Final
Type:
Standards Track
Created:
18-Dec-2005
Python-Version:
2.5
Post-History:



Abstract

In Python 2.4, indices of sequences are restricted to the C type int. On 64-bit machines, sequences therefore cannot use the full address space, and are restricted to 2**31 elements. This PEP proposes to change this, introducing a platform-specific index type Py_ssize_t. An implementation of the proposed change is in http://svn.python.org/projects/python/branches/ssize_t.

Rationale

64-bit machines are becoming more popular, and the size of main memory increases beyond 4GiB. On such machines, Python currently is limited, in that sequences (strings, unicode objects, tuples, lists, array.arrays, …) cannot contain more than 2Gi elements.

Today, very few machines have memory to represent larger lists: as each pointer is 8B (on a 64-bit machine), one needs 16GiB just to hold the pointers of such a list; with data in the list, the memory consumption grows even more. However, there are three container types for which users request improvements today:

  • strings (currently restricted to 2GiB)
  • mmap objects (likewise; plus the system typically won’t keep the whole object in memory concurrently)
  • Numarray objects (from Numerical Python)
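The 16GiB figure for the pointers alone follows directly, assuming a list at the 2**31-element limit and 8-byte pointers:

```latex
2^{31} \times 8\,\mathrm{B} = 2^{31} \times 2^{3}\,\mathrm{B} = 2^{34}\,\mathrm{B} = 16 \cdot 2^{30}\,\mathrm{B} = 16\,\mathrm{GiB}
```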

As the proposed change will cause incompatibilities on 64-bit machines, it should be carried out while such machines are not in wide use (IOW, as early as possible).

Specification

A new type Py_ssize_t is introduced, which has the same size as the compiler’s size_t type, but is signed. It will be a typedef for ssize_t where available.
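As a rough illustration of what such a type amounts to, here is a minimal sketch outside of Python.h. The names my_ssize_t, MY_SSIZE_T_MAX, MY_SSIZE_T_MIN and length_as_size are hypothetical, and ptrdiff_t is used as the underlying type on the assumption that it is as wide as size_t (true on common flat-address-space platforms); CPython's own definition may be spelled differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_ssize_t: a signed type as wide as size_t.
   ptrdiff_t is an assumption here; the PEP uses ssize_t where available. */
typedef ptrdiff_t my_ssize_t;
#define MY_SSIZE_T_MAX PTRDIFF_MAX
#define MY_SSIZE_T_MIN PTRDIFF_MIN

/* C11 compile-time check that the width assumption holds on this platform. */
_Static_assert(sizeof(my_ssize_t) == sizeof(size_t),
               "my_ssize_t must be as wide as size_t");

/* A length kept in a signed field is converted back to size_t (for example,
   before passing it to malloc) only after checking it is non-negative. */
static size_t length_as_size(my_ssize_t length)
{
    assert(length >= 0);
    return (size_t)length;
}
```

Being signed lets the same type carry negative indices (counting from the end) and error results such as -1, while still spanning the full address space.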

The internal representation of the length fields of all container types is changed from int to ssize_t, for all types included in the standard distribution. In particular, PyObject_VAR_HEAD is changed to use Py_ssize_t, affecting all extension modules that use that macro.

All occurrences of index and length parameters and results are changed to use Py_ssize_t, including the sequence slots in type objects, and the buffer interface.

New conversion functions PyInt_FromSsize_t and PyInt_AsSsize_t are introduced. PyInt_FromSsize_t will transparently return a long int object if the value exceeds LONG_MAX; PyInt_AsSsize_t will transparently process long int objects.
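To make the LONG_MAX rule concrete, the following is a hypothetical model of the choice PyInt_FromSsize_t faces; it does not call the Python C API. The enum int_kind and the function kind_for_ssize are invented for illustration, ptrdiff_t stands in for Py_ssize_t, and the LONG_MIN check is an assumption about how large negative values are treated, which the text above does not spell out. A Python 2 int object holds a C long, so only platforms where long is narrower than size_t (Win64, for example) ever take the long int path.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef ptrdiff_t my_ssize_t;   /* stand-in for Py_ssize_t (assumption) */
#define MY_SSIZE_T_MAX PTRDIFF_MAX

/* Hypothetical result: which kind of Python 2 object would be created. */
enum int_kind { KIND_INT, KIND_LONG };

/* A plain int object stores a C long, so any value outside the range of
   long has to become a long int object instead. */
static enum int_kind kind_for_ssize(my_ssize_t value)
{
    if (value < LONG_MIN || value > LONG_MAX)
        return KIND_LONG;
    return KIND_INT;
}
```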

New function pointer typedefs ssizeargfunc, ssizessizeargfunc, ssizeobjargproc, ssizessizeobjargproc, and lenfunc are introduced. The buffer interface function types are now called readbufferproc, writebufferproc, segcountproc, and charbufferproc.

A new conversion code ‘n’ is introduced for PyArg_ParseTuple, Py_BuildValue, PyObject_CallFunction and PyObject_CallMethod. This code operates on Py_ssize_t.

The conversion codes ‘s#’ and ‘t#’ will output Py_ssize_t if the macro PY_SSIZE_T_CLEAN is defined before Python.h is included, and continue to output int if that macro isn’t defined.

At places where a conversion from size_t/Py_ssize_t to int is necessary, the strategy for conversion is chosen on a case-by-case basis (see next section).

To prevent loading extension modules that assume a 32-bit size type into an interpreter that has a 64-bit size type, Py_InitModule4 is renamed to Py_InitModule4_64.
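The rename can be done entirely in the preprocessor. The sketch below uses the hypothetical names my_init_module and init_symbol_name, and assumes the rename is keyed on size_t being wider than int (CPython's own condition may be phrased differently). Every call compiled against such a header references the _64 symbol, so a module built against an older, unrenamed header asks for a name the new interpreter does not export, and loading fails up front instead of crashing later.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical header fragment: when size_t is wider than int, every use of
   my_init_module is silently redirected to a differently named symbol,
   mirroring the Py_InitModule4 -> Py_InitModule4_64 rename. */
#if SIZE_MAX > UINT_MAX
#define my_init_module my_init_module_64
#endif

/* Two-level stringification so the macro is expanded before quoting. */
#define MY_STR(x) #x
#define MY_XSTR(x) MY_STR(x)

/* With the macro active, this definition is compiled as my_init_module_64. */
int my_init_module(const char *name)
{
    return name != 0;
}

/* Reports which symbol name this translation unit actually uses. */
const char *init_symbol_name(void)
{
    return MY_XSTR(my_init_module);
}
```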

Conversion guidelines

Module authors have the choice whether they support this PEP in their code or not; if they support it, they have the choice of different levels of compatibility.

If a module is not converted to support this PEP, it will continue to work unmodified on a 32-bit system. On a 64-bit system, compile-time errors and warnings might be issued, and the module might crash the interpreter if the warnings are ignored.

Conversion of a module can either attempt to continue using int indices, or use Py_ssize_t indices throughout.

If the module should continue to use int indices, care must be taken when calling functions that return Py_ssize_t or size_t, in particular, functions that return the length of an object (this includes the strlen function and the sizeof operator). A good compiler will warn when a Py_ssize_t/size_t value is truncated into an int. In these cases, three strategies are available:

  • statically determine that the size can never exceed an int (e.g. when taking the sizeof a struct, or the strlen of a file pathname). In this case, write:
    some_int = Py_SAFE_DOWNCAST(some_value, Py_ssize_t, int);

    This will add an assertion in debug mode that the value really fits into an int, and just add a cast otherwise.

  • statically determine that the value shouldn’t overflow an int unless there is a bug in the C code somewhere. Test whether the value is at most INT_MAX, and raise an internal error (SystemError) if it isn’t.
  • otherwise, check whether the value fits in an int, and raise a ValueError if it doesn’t.
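The three strategies can be sketched in plain C. MY_SAFE_DOWNCAST is a hypothetical re-creation of the behaviour described for Py_SAFE_DOWNCAST (assertion in debug builds, plain cast otherwise), not CPython's macro itself; enum conv_status and ssize_to_int are likewise hypothetical placeholders for setting a Python exception, and ptrdiff_t stands in for Py_ssize_t.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef ptrdiff_t my_ssize_t;   /* stand-in for Py_ssize_t (assumption) */

/* Strategy 1: the value provably fits. In debug builds, assert that the
   round trip WIDE -> NARROW -> WIDE is lossless; otherwise just cast. */
#ifdef NDEBUG
#define MY_SAFE_DOWNCAST(VALUE, WIDE, NARROW) ((NARROW)(VALUE))
#else
#define MY_SAFE_DOWNCAST(VALUE, WIDE, NARROW) \
    (assert((WIDE)(NARROW)(VALUE) == (VALUE)), (NARROW)(VALUE))
#endif

/* Hypothetical status codes standing in for raising a Python exception. */
enum conv_status {
    CONV_OK,
    CONV_INTERNAL_ERROR,  /* strategy 2: overflow means a bug in the C code */
    CONV_VALUE_ERROR      /* strategy 3: overflow is a legitimate user error */
};

/* Strategies 2 and 3 share the range check and differ only in which error
   is reported when the value does not fit in an int. */
static enum conv_status ssize_to_int(my_ssize_t value, int *out,
                                     enum conv_status on_overflow)
{
    if (value < INT_MIN || value > INT_MAX)
        return on_overflow;
    *out = (int)value;
    return CONV_OK;
}
```

In a real extension the error branches would set the corresponding exception and return NULL or -1; here they only report which of the two outcomes strategy 2 or 3 would produce.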

The same care must be taken for tp_as_sequence slots; in addition, the signatures of these slots change, and the slots must be explicitly recast (e.g. from intargfunc to ssizeargfunc). Compatibility with previous Python versions can be achieved with the test:

#if PY_VERSION_HEX < 0x02050000 && !defined(PY_SSIZE_T_MIN)
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
#endif

and then using Py_ssize_t in the rest of the code. For the tp_as_sequence slots, additional typedefs might be necessary; alternatively, by replacing:

PyObject *foo_item(struct MyType *obj, int index)
{
    ...
}

with:

PyObject *foo_item(PyObject *_obj, Py_ssize_t index)
{
    struct MyType *obj = (struct MyType *)_obj;
    ...
}

it becomes possible to drop the cast entirely; the type of foo_item should then match the sq_item slot in all Python versions.
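The slot-signature point can be demonstrated without Python.h. In this sketch, Obj, my_ssizeargfunc and struct my_sequence_methods are hypothetical stand-ins for PyObject, ssizeargfunc and the sequence-methods table, and ptrdiff_t stands in for Py_ssize_t. Because foo_item takes the generic object pointer and an index of the slot's own type, it can be stored in sq_item with no function-pointer cast.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t my_ssize_t;            /* stand-in for Py_ssize_t */
typedef struct Obj { int tag; } Obj;     /* stand-in for PyObject */

/* Stand-in for ssizeargfunc: (object, index) -> object. */
typedef Obj *(*my_ssizeargfunc)(Obj *, my_ssize_t);

struct MyType {
    Obj base;            /* first member, so Obj * and MyType * convert */
    my_ssize_t length;
    Obj *items[4];
};

/* Generic signature; the downcast happens inside the function body. */
static Obj *foo_item(Obj *_obj, my_ssize_t index)
{
    struct MyType *obj = (struct MyType *)_obj;
    assert(obj != NULL);
    if (index < 0 || index >= obj->length)
        return NULL;     /* out of range; real code would set IndexError */
    return obj->items[index];
}

/* Stand-in for the sequence-methods table holding the sq_item slot. */
struct my_sequence_methods {
    my_ssizeargfunc sq_item;
};

/* The types match exactly, so no cast is needed here. */
static const struct my_sequence_methods foo_as_sequence = { foo_item };
```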

If the module should be extended to use Py_ssize_t indices, all usages of the type int should be reviewed, to see whether they should be changed to Py_ssize_t. The compiler will help in finding the spots, but a manual review is still necessary.

Particular care must be taken for PyArg_ParseTuple calls: they all need to be checked for s# and t# converters, and PY_SSIZE_T_CLEAN must be defined before including Python.h if the calls have been updated accordingly.

Fredrik Lundh has written a scanner which checks the code of a C module for usage of APIs whose signature has changed.

Discussion

Why not size_t

An initial attempt to implement this feature tried to use size_t. It quickly turned out that this cannot work: Python uses negative indices in many places (to indicate counting from the end). Even in places where size_t would be usable, too many reformulations of code were necessary, e.g. in loops like:

for (index = length - 1; index >= 0; index--)

This loop will never terminate if index is changed from int to size_t.
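The difference is easy to see side by side. This sketch uses ptrdiff_t as a stand-in for Py_ssize_t and hypothetical function names; the second function shows one common way the loop has to be reformulated if the index must be unsigned.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t my_ssize_t;   /* stand-in for Py_ssize_t (assumption) */

/* Signed index: the loop from the text ends once index reaches -1. */
static void reverse_copy_signed(const int *src, int *dst, my_ssize_t length)
{
    my_ssize_t index, j = 0;
    assert(length >= 0);
    for (index = length - 1; index >= 0; index--)
        dst[j++] = src[index];
}

/* With a size_t index, "index >= 0" is always true and decrementing 0 wraps
   around to SIZE_MAX, so the same loop never ends. An unsigned version has
   to be restructured, e.g. by testing before decrementing: */
static void reverse_copy_unsigned(const int *src, int *dst, size_t length)
{
    size_t index = length, j = 0;
    while (index-- > 0)
        dst[j++] = src[index];
}
```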

Why not Py_intptr_t

Conceptually, Py_intptr_t and Py_ssize_t are different things: Py_intptr_t needs to be the same size as void*, and Py_ssize_t the same size as size_t. These could differ, e.g. on machines where pointers have segment and offset. On current flat-address-space machines, there is no difference, so for all practical purposes, Py_intptr_t would have worked as well.
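The practical claim can be checked on a given platform by comparing the widths directly. This sketch assumes the optional C99 type intptr_t is available, uses ptrdiff_t as a stand-in for Py_ssize_t, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Requirement on Py_intptr_t: as wide as a data pointer. */
static int intptr_is_pointer_sized(void)
{
    return sizeof(intptr_t) == sizeof(void *);
}

/* Requirement on Py_ssize_t (modelled by ptrdiff_t): as wide as size_t. */
static int ssize_is_size_t_sized(void)
{
    return sizeof(ptrdiff_t) == sizeof(size_t);
}

/* On a flat address space, both requirements name the same width. */
static int same_width_in_practice(void)
{
    return sizeof(intptr_t) == sizeof(ptrdiff_t);
}
```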

Doesn’t this break much code?

With the changes proposed, code breakage is fairly minimal. On a 32-bit system, no code will break, as Py_ssize_t is just a typedef for int.

On a 64-bit system, the compiler will warn in many places. If these warnings are ignored, the code will continue to work as long as the container sizes don’t exceed 2**31, i.e. it will work nearly as well as it does currently. There are two exceptions to this statement: if the extension module implements the sequence protocol, it must be updated, or the calling conventions will be wrong. The other exception is places where Py_ssize_t is output through a pointer (rather than a return value); this applies most notably to codecs and slice objects.

If the code is converted, the same code can continue to work on earlier Python releases.

Doesn’t this consume too much memory?

One might think that using Py_ssize_t in all tuples, strings, lists, etc. is a waste of space. This is not true, though: on a 32-bit machine, there is no change. On a 64-bit machine, the size of many containers doesn’t change, e.g.

  • in lists and tuples, a pointer immediately follows the ob_size member. This means that the compiler currently inserts 4 padding bytes; with the change, these padding bytes become part of the size.
  • in strings, the ob_shash field follows ob_size. This field is of type long, which is a 64-bit type on most 64-bit systems (except Win64), so the compiler inserts padding before it as well.
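The padding argument can be checked with sizeof. The structs below are hypothetical, simplified object headers rather than CPython's actual definitions, with ptrdiff_t standing in for Py_ssize_t; each pair differs only in the type of ob_size.

```c
#include <stddef.h>

/* List/tuple-like: ob_size is immediately followed by a pointer. */
struct list_int   { ptrdiff_t ob_refcnt; void *ob_type; int       ob_size; void **ob_item; };
struct list_ssize { ptrdiff_t ob_refcnt; void *ob_type; ptrdiff_t ob_size; void **ob_item; };

/* String-like: ob_size is followed by a long hash field. */
struct str_int    { ptrdiff_t ob_refcnt; void *ob_type; int       ob_size; long ob_shash; };
struct str_ssize  { ptrdiff_t ob_refcnt; void *ob_type; ptrdiff_t ob_size; long ob_shash; };
```

On LP64 systems both pairs come out the same size, because the 4 bytes gained by ob_size were previously padding. On Win64, where long is 32 bits, the string-like pair does grow, which is the exception noted above.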

Open Issues

  • Marc-Andre Lemburg commented that complete backwards compatibility with existing source code should be preserved. In particular, functions that have Py_ssize_t* output arguments should continue to run correctly even if the callers pass int*.

    It is not clear what strategy could be used to implement that requirement.

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0353.rst

Last modified: 2025-02-01 08:59:27 GMT

