PEP 237 – Unifying Long Integers and Integers

Author: Moshe Zadka, Guido van Rossum
Status:

Abstract

Python currently distinguishes between two kinds of integers (ints): regular or short ints, limited by the size of a C long (typically 32 or 64 bits), and long ints, which are limited only by available memory. When operations on short ints yield results that don’t fit in a C long, they raise an error. There are some other distinctions too. This PEP proposes to do away with most of the differences in semantics, unifying the two types from the perspective of the Python user.

Rationale

Many programs find a need to deal with larger numbers after the fact, and changing the algorithms later is bothersome. Using long ints throughout from the start avoids that, but it can hinder performance in the normal case, since all arithmetic is then performed using long ints whether or not they are needed.

Having the machine word size exposed to the language hinders portability. For example, Python source files and .pyc’s are not portable between 32-bit and 64-bit machines because of this.

There is also the general desire to hide unnecessary details from the Python user when they are irrelevant for most applications. An example is memory allocation, which is explicit in C but automatic in Python, giving us the convenience of unlimited sizes on strings, lists, etc. It makes sense to extend this convenience to numbers.

It will give new Python programmers (whether they are new to programming in general or not) one less thing to learn before they can start using the language.

Implementation

Initially, two alternative implementations were proposed (one by each author):

1. The PyInt type’s slot for a C long will be turned into a:
```
union {
    long i;
    struct {
        unsigned long length;
        digit digits[1];
    } bignum;
};
```
Only the n-1 lower bits of the long have any meaning; the top bit is always set. This distinguishes the union. All PyInt functions will check this bit before deciding which types of operations to use.
2. The existing short and long int types remain, but operations return a long int instead of raising OverflowError when a result cannot be represented as a short int. A new type, integer, may be introduced that is an abstract base type of which both the int and long implementation types are subclassed. This is useful so that programs can check integer-ness with a single test:
```
if isinstance(i, integer): ...
```

After some consideration, the second implementation plan was selected, since it is far easier to implement, is backwards compatible at the C API level, and in addition can be implemented partially as a transitional measure.
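
The observable effect of the selected plan can be sketched on a present-day Python 3 interpreter, where all ints are already unbounded. The 32-bit limit, the names MAXINT, MININT, fits_short, and add, and the returned type label are assumptions made for illustration; they are not part of this PEP or of CPython.

```python
# Hypothetical model of a 32-bit C long; real platforms may use 64 bits.
C_LONG_BITS = 32
MAXINT = 2 ** (C_LONG_BITS - 1) - 1
MININT = -MAXINT - 1


def fits_short(value):
    """Return True if value is representable as a (simulated) short int."""
    return MININT <= value <= MAXINT


def add(x, y):
    """Add two ints and report which implementation type would hold the sum.

    Under the old rules an out-of-range sum raised OverflowError; under the
    selected plan the sum is returned as a long int instead.
    """
    result = x + y
    kind = "int" if fits_short(result) else "long"
    return kind, result


print(add(1, 2))       # the sum still fits in a short int
print(add(MAXINT, 1))  # the sum is promoted to a long int
```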

Incompatibilities

The following operations have (usually subtly) different semantics for short and for long integers, and one or the other will have to be changed somehow. This is intended to be an exhaustive list. If you know of any other operations that differ in outcome depending on whether a short or a long int with the same value is passed, please write the second author.

Currently, all arithmetic operators on short ints except << raise OverflowError if the result cannot be represented as a short int. This will be changed to return a long int instead. The following operators can currently raise OverflowError: x+y, x-y, x*y, x**y, divmod(x, y), x/y, x%y, and -x. (The last four can only overflow when the value -sys.maxint-1 is involved.)
Currently, x<<n can lose bits for short ints. This will be changed to return a long int containing all the shifted-out bits, if returning a short int would lose bits (where changing sign is considered a special case of losing bits).
Currently, hex and oct literals for short ints may specify negative values; for example 0xffffffff == -1 on a 32-bit machine. This will be changed to equal 0xffffffffL (2**32-1).
Currently, the %u, %x, %X and %o string formatting operators and the hex() and oct() built-in functions behave differently for negative numbers: negative short ints are formatted as unsigned C long, while negative long ints are formatted with a minus sign. This will be changed to use the long int semantics in all cases (but without the trailing L that currently distinguishes the output of hex() and oct() for long ints). Note that this means that %u becomes an alias for %d. It will eventually be removed.
Currently, repr() of a long int returns a string ending in L while repr() of a short int doesn’t. The L will be dropped; but not before Python 3.0.
Currently, an operation with long operands will never return a short int. This may change, since it allows some optimization. (No changes have been made in this area yet, and none are planned.)
The expression type(x).__name__ depends on whether x is a short or a long int. Since implementation alternative 2 is chosen, this difference will remain. (In Python 3.0, we may be able to deploy a trick to hide the difference, because it is annoying to reveal the difference to user code, and more so as the difference between the two types is less visible.)
Long and short ints are handled differently by the marshal module, and by the pickle and cPickle modules. This difference will remain (at least until Python 3.0).
Short ints with small values (typically between -1 and 99 inclusive) are interned – whenever a result has such a value, an existing short int with the same value is returned. This is not done for long ints with the same values. This difference will remain. (Since there is no guarantee of this interning, it is debatable whether this is a semantic difference – but code may exist that uses is for comparisons of short ints and happens to work because of this interning. Such code may fail if used with long ints.)
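
The end state of several of these changes can be observed on a present-day Python 3 interpreter, where the unification is complete. The checks below are a sketch that assumes Python 3; they do not run unchanged on the Python 2 versions the list describes, and the 0o prefix produced by oct() is a later Python 3 convention rather than something this PEP specifies.

```python
# Assumes a Python 3 interpreter, where short and long ints are one type.

# Left shift keeps all shifted-out bits instead of truncating to a C long.
assert 1 << 40 == 1099511627776

# Hex literals are always non-negative, regardless of machine word size.
assert 0xffffffff == 2**32 - 1

# hex(), oct() and the %x / %o formats use a minus sign for negative values.
assert hex(-1) == "-0x1"
assert oct(-8) == "-0o10"
assert "%x" % -255 == "-ff"
assert "%o" % -8 == "-10"

# repr() of a large value has no trailing L, and there is one type name.
big = 2**64
assert repr(big) == "18446744073709551616"
assert type(big).__name__ == type(1).__name__ == "int"
```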

Literals

A trailing L at the end of an integer literal will stop having any meaning, and will eventually become illegal. The compiler will choose the appropriate type solely based on the value. (Until Python 3.0, it will force the literal to be a long; but literals without a trailing L may also be long, if they are not representable as short ints.)

Built-in Functions

The function int() will return a short or a long int depending on the argument value. In Python 3.0, the function long() will call the function int(); before then, it will continue to force the result to be a long int, but otherwise work the same way as int(). The built-in name long will remain in the language to represent the long implementation type (unless it is completely eradicated in Python 3.0), but using the int() function is still recommended, since it will automatically return a long when needed.
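
As a small illustration of int() choosing a representation from the value alone, the sketch below assumes a Python 3 interpreter, where only one integer type is left. The transitional behavior of long() cannot be shown there.

```python
# Assumes Python 3, where int() always returns the single unified int type.
small = int("42")
large = int("123456789012345678901234567890")

# Both results have the same type even though only one fits in a C long.
assert type(small) is type(large) is int
assert large == 123456789012345678901234567890

# Arithmetic on the large value continues without overflow.
assert large + 1 > large
```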

C API

The C API remains unchanged; C code will still need to be aware of the difference between short and long ints. (The Python 3.0 C API will probably be completely incompatible.)

The PyArg_Parse*() APIs already accept long ints, as long as they are within the range representable by C ints or longs, so that functions taking C int or long arguments won’t have to worry about dealing with Python longs.
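
This range check can still be observed from Python code in CPython. As an illustration only, the sketch below uses the standard array module, whose "i" typecode stores C ints. The choice of module and the value 2**100 are assumptions for the example, not something this PEP mentions.

```python
from array import array

# A value that fits in a C int is accepted.
values = array("i", [5])

# A value that cannot fit in any C int or long is rejected.
try:
    values.append(2**100)
except OverflowError:
    outcome = "OverflowError"
else:
    outcome = "accepted"

print(outcome)
```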

Transition

There are three major phases to the transition:

1. Short int operations that currently raise OverflowError return a long int value instead. This is the only change in this phase. Literals will still distinguish between short and long ints. The other semantic differences listed above (including the behavior of <<) will remain. Because this phase only changes situations that currently raise OverflowError, it is assumed that this won’t break existing code. (Code that depends on this exception would have to be too convoluted to be concerned about it.) For those concerned about extreme backwards compatibility, a command line option (or a call to the warnings module) will allow a warning or an error to be issued at this point, but this is off by default.
2. The remaining semantic differences are addressed. In all cases the long int semantics will prevail. Since this will introduce backwards incompatibilities which will break some old code, this phase may require a future statement and/or warnings, and a prolonged transition phase. The trailing L will continue to be used for longs as input and by repr().
   A. Warnings are enabled about operations that will change their numeric outcome in stage 2B, in particular hex() and oct(), %u, %x, %X and %o, hex and oct literals in the (inclusive) range [sys.maxint+1, sys.maxint*2+1], and left shifts losing bits.
   B. The new semantics for these operations are implemented. Operations that give different results than before will not issue a warning.
3. The trailing L is dropped from repr(), and made illegal on input. (If possible, the long type completely disappears.) The trailing L is also dropped from hex() and oct().
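
To make the literal range from the warning stage concrete, the sketch below checks whether a hex or oct literal's value falls in [sys.maxint+1, sys.maxint*2+1] and computes the negative value it had as a short int. It assumes a 32-bit C long with two's-complement wraparound; MAXINT, changes_meaning, and old_value are hypothetical names, since sys.maxint does not exist on Python 3.

```python
# Hypothetical stand-in for sys.maxint on a machine with a 32-bit C long.
MAXINT = 2**31 - 1


def changes_meaning(value):
    """True if a hex/oct literal with this value is in the warned range."""
    return MAXINT + 1 <= value <= MAXINT * 2 + 1


def old_value(value):
    """Value the literal had as a short int before the change (wraparound)."""
    if changes_meaning(value):
        return value - 2 * (MAXINT + 1)
    return value


assert not changes_meaning(0x7fffffff)  # fits in a short int, unchanged
assert changes_meaning(0x80000000)      # lowest affected literal
assert changes_meaning(0xffffffff)      # highest affected literal
assert old_value(0xffffffff) == -1      # the 0xffffffff example from above
assert old_value(0x80000000) == -MAXINT - 1
```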

Phase 1 will be implemented in Python 2.2.

Phase 2 will be implemented gradually, with 2A in Python 2.3 and 2B in Python 2.4.

Phase 3 will be implemented in Python 3.0 (at least two years after Python 2.4 is released).

OverflowWarning

Here are the rules that guide warnings generated in situations that currently raise OverflowError. This applies to transition phase 1. Historical note: although phase 1 was completed in Python 2.2, and phase 2A in Python 2.3, nobody noticed that OverflowWarning was still generated in Python 2.3. It was finally disabled in Python 2.4. The Python builtin OverflowWarning, and the corresponding C API PyExc_OverflowWarning, are no longer generated or used in Python 2.4, but will remain for the (unlikely) case of user code until Python 2.5.

A new warning category is introduced, OverflowWarning. This is a built-in name.
If an int result overflows, an OverflowWarning warning is issued, with a message argument indicating the operation, e.g. “integer addition”. This may or may not cause a warning message to be displayed on sys.stderr, or may cause an exception to be raised, all under control of the -W command line option and the warnings module.
The OverflowWarning warning is ignored by default.
The OverflowWarning warning can be controlled like all warnings, via the -W command line option or via the warnings.filterwarnings() call. For example:
```
python -Wdefault::OverflowWarning
```
causes the OverflowWarning to be displayed the first time it occurs at a particular source line, and:
```
python -Werror::OverflowWarning
```
causes the OverflowWarning to be turned into an exception whenever it happens. The following code enables the warning from inside the program:
```
import warnings
warnings.filterwarnings("default", "", OverflowWarning)
```
See the python man page for the -W option and the warnings module documentation for filterwarnings().
If the OverflowWarning warning is turned into an error, OverflowError is substituted. This is needed for backwards compatibility.
Unless the warning is turned into an exception, the result of the operation (e.g., x+y) is recomputed after converting the arguments to long ints.
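
The rules above can be sketched with the warnings module on a present-day Python 3 interpreter. Because the OverflowWarning built-in no longer exists, the sketch defines a hypothetical SketchOverflowWarning category; the 32-bit MAXINT and the add_short helper are likewise assumptions for illustration. Python 3 ints never overflow, so the sum is computed once rather than recomputed after a conversion to long.

```python
import warnings


class SketchOverflowWarning(Warning):
    """Hypothetical stand-in for the removed OverflowWarning built-in."""


MAXINT = 2**31 - 1  # assumes a 32-bit C long


def add_short(x, y):
    """Add two simulated short ints following the phase 1 rules."""
    result = x + y
    if -MAXINT - 1 <= result <= MAXINT:
        return result  # no overflow, nothing to report
    try:
        # Ignored, displayed, or raised depending on the active filters.
        warnings.warn("integer addition", SketchOverflowWarning, stacklevel=2)
    except SketchOverflowWarning:
        # A warning that was turned into an error becomes OverflowError.
        raise OverflowError("integer addition") from None
    return result  # otherwise the long int result is returned


# Ignored warning: the long result comes back silently.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SketchOverflowWarning)
    assert add_short(MAXINT, 1) == 2**31

# Warning turned into an error: OverflowError is substituted.
with warnings.catch_warnings():
    warnings.simplefilter("error", SketchOverflowWarning)
    try:
        add_short(MAXINT, 1)
    except OverflowError:
        turned_into_error = True
    else:
        turned_into_error = False
    assert turned_into_error
```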

Example

If you pass a long int to a C function or built-in operation that takes an integer, it will be treated the same as a short int as long as the value fits (by virtue of how PyArg_ParseTuple() is implemented). If the long value doesn’t fit, it will still raise an OverflowError. For example:

```
def fact(n):
    if n <= 1:
        return 1
    return n*fact(n-1)

A = "ABCDEFGHIJKLMNOPQ"
n = input("Gimme an int: ")
print A[fact(n) % 17]
```

For n >= 13, this currently raises OverflowError (unless the user enters a trailing L as part of their input), even though the calculated index would always be in range(17). With the new approach this code will do the right thing: the index will be calculated as a long int, but its value will be in range.
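
For readers on a present-day interpreter, the same example can be adapted to Python 3 as a sketch. The letter_for helper is a hypothetical replacement for the interactive input() call, and the 32-bit bound in the first check is an assumption about the short int size.

```python
# Python 3 adaptation of the example; the original targets Python 2.
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)


A = "ABCDEFGHIJKLMNOPQ"


def letter_for(n):
    """Index A with fact(n) % 17; the index is always in range(17)."""
    return A[fact(n) % 17]


assert fact(13) > 2**31 - 1   # too large for a 32-bit short int
assert letter_for(13) == "D"  # 13! % 17 == 3
assert letter_for(20) == "A"  # 20! is a multiple of 17
```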

Resolved Issues

These issues, previously open, have been resolved.

hex() and oct() applied to longs will continue to produce a trailing L until Python 3000. The original text above wasn’t clear about this, but since it didn’t happen in Python 2.4 it was thought better to leave it alone. BDFL pronouncement here:
https://mail.python.org/pipermail/python-dev/2006-June/065918.html
What to do about sys.maxint? Leave it in, since it is still relevant whenever the distinction between short and long ints is still relevant (e.g. when inspecting the type of a value).
Should we remove %u completely? Remove it.
Should we warn about << not truncating integers? Yes.
Should the overflow warning be on a portable maximum size? No.

Implementation

The implementation work for the Python 2.x line is completed; phase 1 was released with Python 2.2, phase 2A with Python 2.3, and phase 2B will be released with Python 2.4 (and is already in CVS).

Copyright

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-0237.rst

Last modified: 2025-02-01 08:59:27 GMT