This PEP has been deferred. Copyable iterators are a nice idea, but afterfour years, no implementation or widespread interest has emerged.
This PEP suggests that some iterator types should support shallowcopies of their instances by exposing a__copy__ method which meetssome specific requirements, and indicates how code using an iteratormight exploit such a__copy__ method when present.
Support for__copy__ was included in Py2.4’sitertools.tee().
Adding__copy__ methods to existing iterators will change thebehavior undertee(). Currently, the copied iterators remaintied to the original iterator. If the original advances, thenso do all of the copies. Good practice is to overwrite theoriginal so that anomalies don’t result:a,b=tee(a).Code that doesn’t follow that practice may observe a semanticchange if a__copy__ method is added to an iterator.
In Python up to 2.3, most built-in iterator types don’t let the usercopy their instances. User-coded iterators that do let their clientscall copy.copy on their instances may, or may not, happen to return,as a result of the copy, a separate iterator object that may beiterated upon independently from the original.
Currently, “support” for copy.copy in a user-coded iterator type isalmost invariably “accidental” – i.e., the standard machinery of thecopy method in Python’s standard library’s copy module does build andreturn a copy. However, the copy will be independently iterable withrespect to the original only if calling.next() on an instance of thatclass happens to change instance state solely by rebinding someattributes to new values, and not by mutating some attributes’existing values.
For example, an iterator whose “index” state is held as an integerattribute will probably give usable copies, since (integers beingimmutable).next() presumably just rebinds that attribute. On theother hand, another iterator whose “index” state is held as a listattribute will probably mutate the same list object when.next()executes, and therefore copies of such an iterator will not beiterable separately and independently from the original.
Given this existing situation,copy.copy(it) on some iterator objectisn’t very useful, nor, therefore, is it at all widely used. However,there are many cases in which being able to get a “snapshot” of aniterator, as a “bookmark”, so as to be able to keep iterating alongthe sequence but later iterate again on the same sequence from thebookmark onwards, is useful. To support such “bookmarking”, moduleitertools, in 2.4, has grown a ‘tee’ function, to be used as:
it,bookmark=itertools.tee(it)
The previous value of ‘it’ must not be used again, which is why thistypical usage idiom rebinds the name. After this call, ‘it’ and‘bookmark’ are independently-iterable iterators on the same underlyingsequence as the original value of ‘it’: this satisfies applicationneeds for “iterator copying”.
However, when itertools.tee can make no hypotheses about the nature ofthe iterator it is passed as an argument, it must save in memory allitems through which one of the two ‘teed’ iterators, but not yet both,have stepped. This can be quite costly in terms of memory, if the twoiterators get very far from each other in their stepping; indeed, insome cases it may be preferable to make a list from the iterator so asto be able to step repeatedly through the subsequence, or, if that istoo costy in terms of memory, save items to disk, again in order to beable to iterate through them repeatedly.
This PEP proposes another idea that will, in some important cases,allowitertools.tee to do its job with minimal cost in terms ofmemory; user code may also occasionally be able to exploit the idea inorder to decide whether to copy an iterator, make a list from it, oruse an auxiliary disk file.
The key consideration is that some important iterators, such as thosewhich built-in function iter builds over sequences, would beintrinsically easy to copy: just get another reference to the samesequence, and a copy of the integer index. However, in Python 2.3,those iterators don’t expose the state, and don’t supportcopy.copy.
The purpose of this PEP, therefore, is to have those iterator typesexpose a suitable__copy__ method. Similarly, user-coded iteratortypes that can provide copies of their instances, suitable forseparate and independent iteration, with limited costs in time andspace, should also expose a suitable__copy__ method. Whilecopy.copy also supports other ways to let a type control the wayits instances are copied, it is suggested, for simplicity, thatiterator types that support copying always do so by exposing a__copy__ method, and not in the other wayscopy.copy supports.
Having iterators expose a suitable__copy__ when feasible will affordeasy optimization of itertools.tee and similar user code, as in:
deftee(it):it=iter(it)try:copier=it.__copy__exceptAttributeError:# non-copyable iterator, do all the needed hard work# [snipped!]else:returnit,copier()
Note that this function does NOT call “copy.copy(it)”, which (evenafter this PEP is implemented) might well still “just happen tosucceed”. for some iterator type that is implemented as a user-codedclass. without really supplying an adequate “independently iterable”copy object as its result.
Any iterator type X may expose a method__copy__ that is callablewithout arguments on any instance x of X. The method should beexposed if and only if the iterator type can provide copyability withreasonably little computational and memory effort. Furthermore, thenew object y returned by method__copy__ should be a new instanceof X that is iterable independently and separately from x, steppingalong the same “underlying sequence” of items.
For example, suppose a class Iter essentially duplicated thefunctionality of the iter builtin for iterating on a sequence:
classIter(object):def__init__(self,sequence):self.sequence=sequenceself.index=0def__iter__(self):returnselfdefnext(self):try:result=self.sequence[self.index]exceptIndexError:raiseStopIterationself.index+=1returnresult
To make this Iter class compliant with this PEP, the followingaddition to the body of class Iter would suffice:
def__copy__(self):result=self.__class__(self.sequence)result.index=self.indexreturnresult
Note that__copy__, in this case, does not even try to copy thesequence; if the sequence is altered while either or both of theoriginal and copied iterators are still stepping on it, the iterationbehavior is quite likely to go awry anyway – it is not__copy__’sresponsibility to change this normal Python behavior for iteratorswhich iterate on mutable sequences (that might, perhaps, be thespecification for a__deepcopy__ method of iterators, which, however,this PEP does not deal with).
Consider also a “random iterator”, which provides a nonterminatingsequence of results from some method of a random instance, calledwith given arguments:
classRandomIterator(object):def__init__(self,bound_method,*args):self.call=bound_methodself.args=argsdef__iter__(self):returnselfdefnext(self):returnself.call(*self.args)def__copy__(self):importcopy,newim_self=copy.copy(self.call.im_self)method=new.instancemethod(self.call.im_func,im_self)returnself.__class__(method,*self.args)
This iterator type is slightly more general than its name implies, asit supports calls to any bound method (or other callable, but if thecallable is not a bound method, then method__copy__ will fail). Butthe use case is for the purpose of generating random streams, as in:
importrandomdefshow5(it):fori,resultinenumerate(it):print'%6.3f'%result,ifi==4:breakprintnormit=RandomIterator(random.Random().gauss,0,1)show5(normit)copit=normit.__copy__()show5(normit)show5(copit)
which will display some output such as:
-0.5361.936-1.182-1.690-1.1840.666-0.7011.2140.3481.3730.666-0.7011.2140.3481.373
the key point being that the second and third lines are equal, becausethe normit and copit iterators will step along the same “underlyingsequence”. (As an aside, note that to get a copy ofself.call.im_selfwe must usecopy.copy, NOT try getting at a__copy__ method directly,because for example instances ofrandom.Random support copying via__getstate__ and__setstate__, NOT via__copy__; indeed, usingcopy.copy is the normal way to get a shallow copy of any object –copyable iterators are different because of the already-mentioneduncertainty about the result ofcopy.copy supporting these “copyableiterator” specs).
Besides adding to the Python docs a recommendation that user-codediterator types support a__copy__ method (if and only if it can beimplemented with small costs in memory and runtime, and produce anindependently-iterable copy of an iterator object), this PEP’simplementation will specifically include the addition of copyabilityto the iterators over sequences that built-in iter returns, and alsoto the iterators over a dictionary returned by the methods__iter__,iterkeys, itervalues, and iteritems of built-in type dict.
Iterators produced by generator functions will not be copyable.However, iterators produced by the new “generator expressions” ofPython 2.4 (PEP 289) should be copyable if their underlyingiterator[s] are; the strict limitations on what is possible in agenerator expression, compared to the much vaster generality of agenerator, should make that feasible. Similarly, the iteratorsproduced by the built-in functionenumerate, and certain functionssuppiled by module itertools, should be copyable if the underlyingiterators are.
The implementation of this PEP will also include the optimization ofthe new itertools.tee function mentioned in the Motivation section.
The main use case for (shallow) copying of an iterator is the same asfor the functionitertools.tee (new in 2.4). User code will notdirectly attempt to copy an iterator, because it would have to dealseparately with uncopyable cases; callingitertools.tee willinternally perform the copy when appropriate, and implicitly fallbackto a maximally efficient non-copying strategy for iterators that arenot copyable. (Occasionally, user code may want more direct control,specifically in order to deal with non-copyable iterators by otherstrategies, such as making a list or saving the sequence to disk).
A tee’d iterator may serve as a “reference point”, allowing processingof a sequence to continue or resume from a known point, while theother independent iterator can be freely advanced to “explore” afurther part of the sequence as needed. A simple example: a generatorfunction which, given an iterator of numbers (assumed to be positive),returns a corresponding iterator, each of whose items is the fractionof the total corresponding to each corresponding item of the inputiterator. The caller may pass the total as a value, if known inadvance; otherwise, the iterator returned by calling this generatorfunction will first compute the total.
deffractions(numbers,total=None):iftotalisNone:numbers,aux=itertools.tee(numbers)total=sum(aux)total=float(total)foriteminnumbers:yielditem/total
The ability to tee the numbers iterator allows this generator toprecompute the total, if needed, without necessarily requiringO(N) auxiliary memory if the numbers iterator is copyable.
As another example of “iterator bookmarking”, consider a stream ofnumbers with an occasional string as a “postfix operator” now andthen. By far most frequent such operator is a ‘+’, whereupon we mustsum all previous numbers (since the last previous operator if any, orelse since the start) and yield the result. Sometimes we find a ‘*’instead, which is the same except that the previous numbers mustinstead be multiplied, not summed.
deffilter_weird_stream(stream):it=iter(stream)whileTrue:it,bookmark=itertools.tee(it)total=0foriteminit:ifitem=='+':yieldtotalbreakelifitem=='*':product=1foriteminbookmark:ifitem=='*':yieldproductbreakelse:product*=itemelse:total+=item
Similar use cases of itertools.tee can support such tasks as“undo” on a stream of commands represented by an iterator,“backtracking” on the parse of a stream of tokens, and so on.(Of course, in each case, one should also consider simplerpossibilities such as saving relevant portions of the sequenceinto lists while stepping on the sequence with just one iterator,depending on the details of one’s task).
Here is an example, in pure Python, of how the ‘enumerate’built-in could be extended to support__copy__ if its underlyingiterator also supported__copy__:
classenumerate(object):def__init__(self,it):self.it=iter(it)self.i=-1def__iter__(self):returnselfdefnext(self):self.i+=1returnself.i,self.it.next()def__copy__(self):result=self.__class__.__new__()result.it=self.it.__copy__()result.i=self.ireturnresult
Here is an example of the kind of “fragility” produced by “accidentalcopyability” of an iterator – the reason why one must NOT usecopy.copy expecting, if it succeeds, to receive as a result aniterator which is iterable-on independently from the original. Hereis an iterator class that iterates (in preorder) on “trees” which, forsimplicity, are just nested lists – any item that’s a list is treatedas a subtree, any other item as a leaf.
classListreeIter(object):def__init__(self,tree):self.tree=[tree]self.indx=[-1]def__iter__(self):returnselfdefnext(self):ifnotself.indx:raiseStopIterationself.indx[-1]+=1try:result=self.tree[-1][self.indx[-1]]exceptIndexError:self.tree.pop()self.indx.pop()returnself.next()iftype(result)isnotlist:returnresultself.tree.append(result)self.indx.append(-1)returnself.next()
Now, for example, the following code:
importcopyx=[[1,2,3],[4,5,[6,7,8],9],10,11,[12]]print'showing all items:',it=ListreeIter(x)foriinit:printi,ifi==6:cop=copy.copy(it)printprint'showing items >6 again:'foriincop:printi,print
does NOT work as intended – the “cop” iterator gets consumed, andexhausted, step by step as the original “it” iterator is, becausethe accidental (rather than deliberate) copying performed bycopy.copy shares, rather than duplicating the “index” list, whichis the mutable attributeit.indx (a list of numerical indices).Thus, this “client code” of the iterator, which attempts to iteratetwice over a portion of the sequence via acopy.copy on theiterator, is NOT correct.
Some correct solutions include usingitertools.tee, i.e., changingthe first for loop into:
foriinit:printi,ifi==6:it,cop=itertools.tee(it)breakforiinit:printi,
(note that we MUST break the loop in two, otherwise we’d stillbe looping on the ORIGINAL value of it, which must NOT be usedfurther after the call to tee!!!); or making a list, i.e.
foriinit:printi,ifi==6:cop=lit=list(it)breakforiinlit:printi,
(again, the loop must be broken in two, since iterator ‘it’gets exhausted by the calllist(it)).
Finally, all of these solutions would work if Listiter supplieda suitable__copy__ method, as this PEP recommends:
def__copy__(self):result=self.__class__.new()result.tree=copy.copy(self.tree)result.indx=copy.copy(self.indx)returnresult
There is no need to get any “deeper” in the copy, but the twomutable “index state” attributes must indeed be copied in orderto achieve a “proper” (independently iterable) iterator-copy.
The recommended solution is to have class Listiter supply this__copy__ method AND have client code useitertools.tee (withthe split-in-two-parts loop as shown above). This will makeclient code maximally tolerant of different iterator types itmight be using AND achieve good performance for tee’ing of thisspecific iterator type at the same time.
[1] Discussion on python-dev starting at post:https://mail.python.org/pipermail/python-dev/2003-October/038969.html
[2] Online documentation for the copy module of the standard library:https://docs.python.org/release/2.6/library/copy.html
This document has been placed in the public domain.
Source:https://github.com/python/peps/blob/main/peps/pep-0323.rst
Last modified:2025-02-01 08:59:27 GMT