
Iterables in Python are objects and containers that can be stepped through one item at a time, usually with a for ... in loop. Not all objects can be iterated: for example, we cannot iterate over an integer, because it is a single value. The best we can do is iterate over a range of integers using the range type, which lets us iterate through all integers in the range [0, n).
Since integers, individually, are not iterable, when we try to run for x in 7, Python raises an exception stating TypeError: 'int' object is not iterable. So what if we change Python's source code and make integers iterable, so that every time we run for x in 7, instead of raising an exception, it iterates through the values [0, 7)? In this essay, we will do exactly that. The agenda is:
- What is a Python iterable?
- What is the iterator protocol?
- Changing Python's source code to make integers iterable, and
- Why might it be a bad idea to do so?
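Before going further, the starting point can be reproduced on a stock interpreter. The snippet below is a minimal sketch, assuming an unmodified CPython 3 build; the variable names error_message and values are only illustrative.

```python
# On an unmodified interpreter, iterating over a plain integer fails.
try:
    for x in 7:
        print(x)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# The usual workaround is to iterate over a range instead.
values = [x for x in range(7)]
print(values)
```

On the patched interpreter built later in this essay, the first loop would be expected to succeed instead of raising.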
Python Iterables
Any object that can be iterated is an iterable in Python. The list has to be the most popular iterable out there, and it is used in almost every Python application, directly or indirectly. Even before the first user command is executed, the Python interpreter, while booting up, has already created 406 lists for its internal usage.
In the example below, we see how a list a is iterated using a for ... in loop, with each element accessed via the variable x.
    >>> a = [2, 3, 5, 7, 11, 13]
    >>> for x in a: print(x, end=" ")
    2 3 5 7 11 13
Similar to list, range is a Python type that allows us to iterate over integer values starting at the value start and going up to, but not including, end, advancing by step each time. range is most commonly used to implement a C-like for loop in Python. In the example below, the for loop iterates over a range that starts at 0 and goes up to 7 with a step of 1, producing the sequence [0, 7).
    # The range(0, 7, 1) will iterate through values 0 to 6 and every time
    # it will increment the current value by 1 i.e. the step.
    >>> for x in range(0, 7, 1): print(x, end=" ")
    0 1 2 3 4 5 6
Apart from list and range, other iterables are tuple, set, frozenset, str, bytes, bytearray, memoryview, and dict. Python also allows us to create custom iterables by making objects and types follow the Iterator Protocol.
Iterators and Iterator Protocol
Python, keeping things simple, defines an iterable as any object that follows the Iterator Protocol, which means the object or container implements the following functions:
- __iter__ should return an iterator object that implements the __next__ method
- __next__ should return the next item of the iteration and, once the items are exhausted, raise a StopIteration exception.
So, in short, __iter__ is what makes any Python object iterable; hence, to make integers iterable, we need an __iter__ function defined for integers.
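The protocol can be sketched in pure Python with a small custom iterable. The classes CountUpTo and CountUpToIterator below are hypothetical names chosen for illustration; they are not part of CPython, but they yield the same [0, n) sequence that this essay wants integers to produce.

```python
class CountUpToIterator:
    """Iterator that yields 0, 1, ..., n - 1 and then raises StopIteration."""

    def __init__(self, n):
        self.n = n
        self.current = 0

    def __iter__(self):
        # Iterators conventionally return themselves from __iter__.
        return self

    def __next__(self):
        if self.current >= self.n:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


class CountUpTo:
    """Iterable whose __iter__ hands out a fresh iterator each time."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return CountUpToIterator(self.n)


print(list(CountUpTo(7)))
```

Keeping the iterable and its iterator as separate classes means the same CountUpTo object can be looped over more than once, each loop getting its own position.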
Iterable in CPython
The most famous and widely used implementation of Python is CPython, where the core is implemented in pure C. Since we need to change one of Python's core datatypes, we will modify CPython, add an __iter__ function to the integer type, and rebuild the binary. But before jumping into the implementation, it is important to understand a few fundamentals.
The PyTypeObject
Every object in Python is associated with a type, and each type is an instance of a struct named PyTypeObject. A new instance of this structure is effectively a new type in Python. This structure holds some meta information and a bunch of C function pointers, each implementing a small segment of the type's functionality. Most of these "slots" in the structure are optional and can be filled with appropriate function pointers to drive the corresponding functionality.
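The relationship between objects and types is visible from Python itself. The sketch below is only an illustration from the Python side, not a view into the C struct; the class name Point and its dims attribute are hypothetical.

```python
# Every object has a type, and types are themselves objects.
print(type(7))      # the type of an integer object
print(type(int))    # the type of the int type itself

# Creating a class at runtime creates a new type object.
Point = type("Point", (), {"dims": 2})
print(isinstance(Point, type), Point.dims)
```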
The tp_iter slot
Among all the slots available, the one that interests us is the tp_iter slot, which can hold a pointer to a function that returns an iterator object. This slot corresponds to the __iter__ function, which effectively makes the object iterable. A non-NULL value in this slot indicates iterability. The tp_iter slot holds a function with the following signature:
    PyObject * tp_iter(PyObject *);
Integers in Python do not have a fixed size; rather, the size of an integer depends on the value it holds. How Python implements super long integers is a story on its own, but the core implementation can be found at longobject.c. The instance of PyTypeObject that defines the integer/long type is PyLong_Type, and its tp_iter slot is set to 0, i.e. NULL, which confirms that integers in Python are not iterable.
    PyTypeObject PyLong_Type = {
        ...
        "int",                                      /* tp_name */
        offsetof(PyLongObject, ob_digit),           /* tp_basicsize */
        sizeof(digit),                              /* tp_itemsize */
        ...
        0,                                          /* tp_iter */
        ...
    };
This NULL value for tp_iter makes int objects non-iterable; hence, if this slot were occupied by an appropriate function pointer with the aforementioned signature, any integer would become iterable.
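From the Python side, a filled tp_iter slot shows up as an __iter__ attribute on the type. The check below is a rough sketch, assuming a stock CPython 3 build; the dictionary name has_iter is only illustrative.

```python
# For built-in types, an __iter__ attribute mirrors a filled tp_iter slot.
has_iter = {}
for t in (int, float, list, range, str, dict):
    has_iter[t.__name__] = hasattr(t, "__iter__")

print(has_iter)
```

On the patched interpreter described in this essay, the entry for int would be expected to flip to True.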
Implementing long_iter
Now we implement the tp_iter function on the integer type, naming it long_iter, which returns an iterator object as required by the convention. The core functionality we want is this: when an integer n is iterated, it should iterate through the sequence [0, n) with step 1. This behavior is very close to the predefined range type, which iterates over a range of integer values; more specifically, a range that starts at 0 and goes up to n with a step of 1.
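The target behavior can be checked against range on a stock interpreter. This is a small sketch, assuming CPython; the names n, first, rest, and big are illustrative, and the printed iterator type name is an implementation detail that may vary between builds.

```python
n = 7

# range(0, n, 1) and range(n) describe the same sequence [0, n).
print(range(0, n, 1) == range(n))

# Asking a range for an iterator gives an object that walks the sequence.
it = iter(range(0, n, 1))
first = next(it)
rest = list(it)
print(first, rest)

# range also works for values far beyond the machine word size.
big = 2 ** 70
print(list(range(big, big + 3)) == [big, big + 1, big + 2])
print(type(iter(range(big, big + 3))).__name__)
```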
We define a utility function in rangeobject.c that, given a Python integer, returns an instance of longrangeiterobject as per our specifications. This utility function will instantiate the longrangeiterobject with start as 0, ending at the long value given in the argument, and step as 1. The utility function is illustrated below.
    /*
     *  PyLongRangeIter_ZeroToN creates and returns a range iterator on long
     *  iterating on values in the range [0, n).
     *
     *  The function creates and returns a range iterator from 0 till the
     *  provided long value.
     */
    PyObject *
    PyLongRangeIter_ZeroToN(PyObject *long_obj)
    {
        // creating a new instance of longrangeiterobject
        longrangeiterobject *it;
        it = PyObject_New(longrangeiterobject, &PyLongRangeIter_Type);

        // if unable to allocate memory to it, return NULL.
        if (it == NULL)
            return NULL;

        // we set the start to 0
        it->start = _PyLong_Zero;

        // we set the step to 1
        it->step = _PyLong_One;

        // we set the index to 0, since we want to always start from the first
        // element of the iteration
        it->index = _PyLong_Zero;

        // we set the total length of iteration to be equal to the provided value
        it->len = long_obj;

        // we increment the reference count for each of the values referenced
        Py_INCREF(it->start);
        Py_INCREF(it->step);
        Py_INCREF(it->len);
        Py_INCREF(it->index);

        // downcast the iterator instance to PyObject and return
        return (PyObject *)it;
    }
The utility function PyLongRangeIter_ZeroToN is defined in rangeobject.c and will be declared in rangeobject.h so that it can be used across CPython. The declaration of the function in rangeobject.h, using standard Python macros, goes like this:
    PyAPI_FUNC(PyObject *) PyLongRangeIter_ZeroToN(PyObject *);
The function occupying the tp_iter slot receives the self object as its input argument and is expected to return the iterator instance. Hence, the long_iter function will receive the Python integer object (self) being iterated and should return the iterator instance. Here we use the utility function PyLongRangeIter_ZeroToN, which we just defined and which returns an instance of the range iterator. The entire long_iter function can be defined as:
    /*
     *  long_iter creates an instance of range iterator using PyLongRangeIter_ZeroToN
     *  and returns the iterator instance.
     *
     *  The argument to the `tp_iter` is the `self` object and since we are trying to
     *  iterate an integer here, the input argument to `long_iter` will be the
     *  PyObject of type PyLong_Type, holding the integer value.
     */
    static PyObject * long_iter(PyObject *long_obj)
    {
        return PyLongRangeIter_ZeroToN(long_obj);
    }
Now that we have long_iter defined, we can place the function in the tp_iter slot of PyLong_Type, which enables the required iterability on integers.
    PyTypeObject PyLong_Type = {
        ...
        "int",                                      /* tp_name */
        offsetof(PyLongObject, ob_digit),           /* tp_basicsize */
        sizeof(digit),                              /* tp_itemsize */
        ...
        long_iter,                                  /* tp_iter */
        ...
    };
Consolidated flow
Once we have everything in place, the entire flow goes like this:
Every time an integer is iterated, using any iteration method such as for ... in, the interpreter checks the tp_iter slot of PyLong_Type, and since it now holds the function pointer long_iter, that function is invoked. This invocation returns an iterator object of type longrangeiterobject with fixed start, index, and step values, which in Pythonic terms is effectively a range(0, n, 1). Hence for x in 7 is inherently evaluated as for x in range(0, 7, 1), allowing us to iterate over integers.
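The same flow can be mimicked in pure Python: a for loop asks the object for an iterator once and then pulls items until StopIteration is raised. The function manual_for below is a hypothetical helper written for illustration; it is a sketch of the loop's behavior, not the interpreter's actual code.

```python
def manual_for(iterable):
    """Collect items the way a for loop would: iter() once, then next() until StopIteration."""
    collected = []
    iterator = iter(iterable)  # on the C side, this is where tp_iter is consulted
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        collected.append(item)
    return collected


# With the patch applied, iterating over 7 would be expected to behave like this range.
print(manual_for(range(0, 7, 1)))
```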
These changes are also hosted on a remote branch cpython@02-long-iter, and the Pull Request holding the diff can be found here.
Integer iteration in action
Once we build a new Python binary with the aforementioned changes, we can see iterable integers in action. Now when we run for x in 7, instead of raising an exception, it iterates through the values [0, 7).
    >>> for i in 7: print(i, end=" ");
    0 1 2 3 4 5 6

    # Since integers are now iterable, we can create a list of [0, 7) using `list`
    # Internally `list` tries to iterate on the given object i.e. `7`
    # now that the iteration is defined as [0, 7) we get the list
    # from iteration, instead of an exception
    >>> list(7)
    [0, 1, 2, 3, 4, 5, 6]
Why it is not a good idea
Although it seems fun, and somewhat useful, to have iterable integers, it is really not a great idea. The core reason is that it makes unpacking unpredictable. Unpacking is when you take an iterable apart and assign its items to multiple variables. For example, a, b = 3, 4 will assign 3 to a and 4 to b. So assigning a, b = 7 should be an error, because there is just one value on the right side and multiple variables on the left.
Unpacking treats the right-hand side as an iterable and tries to iterate over it. Since integers are now iterable, the right-hand side yields 7 values after iteration while the left-hand side has only 2 variables; hence it raises the exception ValueError: too many values to unpack (expected 2).
Things would work just fine if we do a, b = 2, as now the right-hand side, after iteration, has two values and the left-hand side has two variables. Thus two very similar statements result in two very different outcomes, making unpacking unpredictable.
    >>> a, b = 7
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: too many values to unpack (expected 2)

    >>> a, b = 2
    >>> a, b
    (0, 1)
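The same surprise can be reproduced without rebuilding CPython by using an int subclass that defines __iter__. The class IterableInt below is a hypothetical stand-in for the patched integer type, assumed here only to approximate its behavior.

```python
class IterableInt(int):
    """Hypothetical int subclass that iterates over [0, n), mimicking the patch."""

    def __iter__(self):
        return iter(range(self))


# Two values come out of the iteration, so unpacking into two names succeeds.
first, second = IterableInt(2)
print(first, second)

# Seven values come out, so the same-looking statement fails.
try:
    a, b = IterableInt(7)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Whether the assignment succeeds depends on the numeric value rather than on the shape of the statement, which is exactly the unpredictability described above.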
Conclusion
In this essay, we modified Python's source code and made integers iterable. Even though it is not a good idea to do so, it is fun to play around with the code and make changes to our favorite programming language. It helps us get a detailed understanding of the core Python implementation and may pave the way for us to become Python core developers. This is one of many articles in the Python Internals series, alongside How python implements super long integers? and Python Caches Integers.
If you liked what you read, consider subscribing to my weekly newsletter at arpitbhayani.me/newsletter where, once a week, I write an essay about programming language internals, a deep dive on some super-clever algorithm, or just a few tips on building highly scalable distributed systems.
You can always find me browsing through twitter @arpit_bhayani.