Posted onJun 14, 2020

Making Python Integers Iterable

#python #cpython #internals

Iterables in Python are objects and containers that could be stepped through one item at a time, usually using afor ... in loop. Not all objects can be iterated, for example - we cannot iterate an integer, it is a singular value. The best we can do here is iterate on a range of integers using therange type which helps us iterate through all integers in the range[0, n).

Since integers, individualistically, are not iterable, when we try to do afor x in 7, it raises an exception statingTypeError: 'int' object is not iterable. So what if, we change the Python's source code and make integers iterable, say every time we do afor x in 7, instead of raising an exception it actually iterates through the values[0, 7). In this essay, we would be going through exactly that, and the entire agenda being:

What is a Python iterable?
What is an iterator protocol?
Changing Python's source code and make integers iterable, and
Why it might be a bad idea to do so?

Python Iterables

Any object that could be iterated is an Iterable in Python. The list has to be the most popular iterable out there and it finds its usage in almost every single Python application - directly or indirectly. Even before the first user command is executed, the Python interpreter, while booting up, has already created406 lists, for its internal usage.

In the example below, we see how a lista is iterated through using afor ... in loop and each element can be accessed via variablex.

>>>a=[2,3,5,7,11,13]>>>forxina:print(x,end=" ")23571113

Similar tolist,range is a python type that allows us to iterate on integer values starting with the valuestart and going tillend while stepping overstep values at each time.range is most commonly used for implementing a C-likefor loop in Python. In the example below, thefor loop iterates over arange that starts from0, goes till7 with a step of1 - producing the sequence[0, 7).

# The range(0, 7, 1) will iterate through values 0 to 6 and every time# it will increment the current value by 1 i.e. the step.>>>forxinrange(0,7,1):print(x,end=" ")0123456

Apart fromlist andrange otheriterables are -tuple,set,frozenset,str,bytes,bytearray,memoryview, anddict. Python also allows us to create custom iterables by making objects and types follow theIterator Protocol.

Iterators and Iterator Protocol

Python, keeping things simple, defines iterable as any object that follows theIterator Protocol; which means the object or a container implements the following functions

__iter__ should return an iterator object having implemented the__next__ method
__next__ should return the next item of the iteration and if items are exhausted then raise aStopIteration exception.

So, in a gist,__iter__ is something that makes any python object iterable; hence to make integers iterable we need to have__iter__ function set for integers.

Iterable in CPython

The most famous and widely used implementation of Python isCPython where the core is implemented in pure C. Since we need to make changes to one of the core datatypes of Python, we will be modifying CPython, add__iter__ function to an Integer type, and rebuild the binary. But before jumping into the implementation, it is important to understand a few fundamentals.

The`PyTypeObject`

Every object in Python is associated with a type and eachtype is an instance of a struct namedPyTypeObject. A new instance of this structure is effectively a new type in python. This structure holds a few meta information and a bunch of C function pointers - each implementing a small segment of the type's functionality. Most of these "slots" in the structure are optional which could be filled by putting appropriate function pointers and driving the corresponding functionality.

The`tp_iter` slot

Among all the slots available, the slot that interests us is thetp_iter slot which can hold a pointer to a function that returns an iterator object. This slot corresponds to the__iter__ function which effectively makes the object iterable. A nonNULL value of this slot indicates iterability. Thetp_iter holds the function with the following signature

PyObject*tp_iter(PyObject*);

Integers in Python do not have a fixed size; rather the size of integer depends on the value it holds.How Python implements super long integers is a story on its own but the core implementation can be found atlongobject.c. The instance ofPyTypeObject that defines integer/long type isPyLong_Type and has itstp_iter slot set to0 i.e.NULL which asserts the fact that Integers in python are not iterable.

PyTypeObjectPyLong_Type={..."int",/* tp_name */offsetof(PyLongObject,ob_digit),/* tp_basicsize */sizeof(digit),/* tp_itemsize */...0,/* tp_iter */...};

ThisNULL value fortp_iter makesint object not iterable and hence if this slot was occupied by an appropriate function pointer with the aforementioned signature, this could well make any integer iterable.

Implementing`long_iter`

Now we implement thetp_iter function on integer type, naming itlong_iter, that returns an iterator object, as required by the convention. The core functionality we are looking to implement here is - when an integern is iterated, it should iterate through the sequence[0, n) with step1. This behavior is very close to the pre-definedrange type, that iterates over a range of integer values, more specifically arange that starts at0, goes tilln with a step of1.

We define a utility function inrangeobject.c that, given a python integer, returns an instance oflongrangeiterobject as per our specifications. This utility function will instantiate thelongrangeiterobject with start as0, ending at the long value given in the argument, and step as1. The utility function is as illustrated below.

/* *  PyLongRangeIter_ZeroToN creates and returns a range iterator on long *  iterating on values in the range [0, n). * *  The function creates and returns a range iterator from 0 till the *  provided long value. */PyObject*PyLongRangeIter_ZeroToN(PyObject*long_obj){// creating a new instance of longrangeiterobjectlongrangeiterobject*it;it=PyObject_New(longrangeiterobject,&PyLongRangeIter_Type);// if unable to allocate memoty to it, return NULL.if(it==NULL)returnNULL;// we set the start to 0it->start=_PyLong_Zero;// we set the step to 1it->step=_PyLong_One;// we set the index to 0, since we want to always start from the first// element of the iterationit->index=_PyLong_Zero;// we set the total length of iteration to be equal to the provided valueit->len=long_obj;// we increment the reference count for each of the values referencedPy_INCREF(it->start);Py_INCREF(it->step);Py_INCREF(it->len);Py_INCREF(it->index);// downcast the iterator instance to PyObject and returnreturn(PyObject*)it;}

The utility functionPyLongRangeIter_ZeroToN is defined inrangeobject.c and will be declared inrangeobject.h so that it can be used across the CPython. Declaration of function inrangeobject.h using standard Python macros goes like this

PyAPI_FUNC(PyObject*)PyLongRangeIter_ZeroToN(PyObject*);

The function occupying thetp_iter slot will receive theself object as the input argument and is expected to return the iterator instance. Hence, thelong_iter function will receive the python integer object (self) that is being iterated as an input argument and it should return the iterator instance. Here we would use the utility functionPyLongRangeIter_ZeroToN, we just defined, which is returning us an instance of range iterator. The entirelong_iter function could be defined as

/* *  long_iter creates an instance of range iterator using PyLongRangeIter_ZeroToN *  and returns the iterator instance. * *  The argument to the `tp_iter` is the `self` object and since we are trying to *  iterate an integer here, the input argument to `long_iter` will be the *  PyObject of type PyLong_Type, holding the integer value. */staticPyObject*long_iter(PyObject*long_obj){returnPyLongRangeIter_ZeroToN(long_obj);}

Now that we havelong_iter defined, we can place the function on thetp_iter slot ofPyLong_Type that enables the required iterability on integers.

PyTypeObjectPyLong_Type={..."int",/* tp_name */offsetof(PyLongObject,ob_digit),/* tp_basicsize */sizeof(digit),/* tp_itemsize */...long_iter,/* tp_iter */...};

Consolidated flow

Once we have everything in place, the entire flow goes like this -

Every time an integer is iterated, using any iteration method - for examplefor ... in, it would check thetp_iter of thePyLongType and since now it holds the function pointerlong_iter, the function will be invoked. This invocation will return an iterator object of typelongrangeiterobject with a fixed start, index, and step values - which in pythonic terms is effectively arange(0, n, 1). Hence thefor x in 7 is inherently evaluated asfor x in range(0, 7, 1) allowing us to iterate integers.

These changes are also hosted on a remote branchcpython@02-long-iter and Pull Request holding thediff can be foundhere.

Integer iteration in action

Once we build a new python binary with the aforementioned changes, we can see iterable integers in actions. Now when we dofor x in 7, instead of raising an exception, it actually iterates through values[0, 7).

>>>foriin7:print(i,end=" ");0123456# Since integers are now iterable, we can create a list of [0, 7) using `list`# Internally `list` tries to iterate on the given object i.e. `7`# now that the iteration is defined as [0, 7) we get the list from# from iteration, instead of an exception>>>list(7)[0,1,2,3,4,5,6]

Why it is not a good idea

Although it seems fun, and somewhat useful, to have iterable integers, it is really not a great idea. The core reason for this is that it makes unpacking unpredictable. Unpacking is when you unpack an iterable and assign it to multiple variables. For example:a, b = 3, 4 will assign 3 to a and 4 to b. So assigninga, b = 7 should be an error because there is just one value on the right side and multiple on the left.

Unpacking treats right-hand size as iterable and tries to iterate on it; and now since Integers are iterable the right-hand side, post iteration yields 7 values which the left-hand side has mere 2 variables; Hence it raises an exceptionValueError: too many values to unpack (expected 2).

Things would work just fine if we doa, b = 2 as now the right-hand side, post iteration, has two values, and the left-hand side has two variables. Thus two very similar statements result in two very different outcomes, making unpacking unpredictable.

>>>a,b=7Traceback(mostrecentcalllast):File"<stdin>",line1,in<module>ValueError:toomanyvaluestounpack(expected2)>>>a,b=2>>>a,b0,1

Conclusion

In this essay, we modified the Python's source code and made integers iterable. Even though it is not a good idea to do so, but it is fun to play around with the code and make changes in our favorite programming language. It helps us get a detailed idea about core python implementation and may pave the way for us to become a Python core developer. This is one of many articles in Python Internals series -How python implements super long integers? andPython Caches Integers.

Movatterモバイル変換

DEV Community

Making Python Integers Iterable

Python Iterables

Iterators and Iterator Protocol

Iterable in CPython

The`PyTypeObject`

The`tp_iter` slot

Implementing`long_iter`

Consolidated flow

Integer iteration in action

Why it is not a good idea

Conclusion

References

Other articles that you might like

Top comments(0)

More fromArpit Bhayani

Movatterモバイル変換

Python Iterables

Iterators and Iterator Protocol

Iterable in CPython

ThePyTypeObject

Thetp_iter slot

Implementinglong_iter

Consolidated flow

Integer iteration in action

Why it is not a good idea

Conclusion

References

Other articles that you might like

The`PyTypeObject`

The`tp_iter` slot

Implementing`long_iter`