PEP 618 – Add Optional Length-Checking To zip

Author: Brandt Bucher <brandt at python.org>
Sponsor: Antoine Pitrou <antoine at python.org>
BDFL-Delegate: Guido van Rossum <guido at python.org>
Status:

Abstract

This PEP proposes adding an optional strict boolean keyword parameter to the built-in zip. When enabled, a ValueError is raised if one of the arguments is exhausted before the others.

It is clear from the author’s personal experience and a survey of the standard library that much (if not most) zip usage involves iterables that must be of equal length. Sometimes this invariant is proven true from the context of the surrounding code, but often the data being zipped is passed from the caller, sourced separately, or generated in some fashion. In any of these cases, the default behavior of zip means that faulty refactoring or logic errors could easily result in silently losing data. These bugs are not only difficult to diagnose, but difficult to even detect at all.

It is easy to come up with simple cases where this could be a problem. For example, the following code may work fine when items is a sequence, but silently start producing shortened, mismatched results if items is refactored by the caller to be a consumable iterator:

    def apply_calculations(items):
        transformed = transform(items)
        for i, t in zip(items, transformed):
            yield calculate(i, t)
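The failure mode is easy to reproduce. The sketch below assumes a hypothetical eager transform (one that consumes its whole input) and a trivial calculate. Neither is defined in this PEP; they only stand in for whatever the real code does.

```python
def transform(items):
    # Hypothetical eager transform: consumes its input completely.
    return [x * 2 for x in items]


def calculate(i, t):
    # Hypothetical calculation combining an item with its transformed value.
    return i + t


def apply_calculations(items):
    transformed = transform(items)
    for i, t in zip(items, transformed):
        yield calculate(i, t)


# A list can be iterated twice, so every item is paired up.
from_list = list(apply_calculations([1, 2, 3]))

# An iterator is exhausted by transform(), so zip() has nothing left to pair.
from_iter = list(apply_calculations(iter([1, 2, 3])))

print(from_list)  # [3, 6, 9]
print(from_iter)  # []
```

No exception is raised in the second call: the results simply disappear.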

There are several other ways in which zip is commonly used. Idiomatic tricks are especially susceptible, because they are often employed by users who lack a complete understanding of how the code works. One example is unpacking into zip to lazily “unzip” or “transpose” nested iterables:

    >>> x = [[1, 2, 3], ["one" "two" "three"]]
    >>> xt = list(zip(*x))

Another is “chunking” data into equal-sized groups:

    >>> n = 3
    >>> x = range(n ** 2),
    >>> xn = list(zip(*[iter(x)] * n))

In the first case, non-rectangular data is usually a logic error. In the second case, data with a length that is not a multiple of n is often an error as well. However, both of these idioms will silently omit the tail-end items of malformed input.
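Running the two idioms shows how quietly the data is lost. The first two cases reuse the deliberately malformed inputs from the examples above. The third (a range of length 10) is an extra illustration added here, not taken from the PEP.

```python
# Transposing: the missing commas make the second row a single string.
x = [[1, 2, 3], ["one" "two" "three"]]
xt = list(zip(*x))
print(xt)  # [(1, 'onetwothree')]

# Chunking: the trailing comma makes x a 1-tuple containing a range.
n = 3
x = range(n ** 2),
xn = list(zip(*[iter(x)] * n))
print(xn)  # []

# Even without a typo, a length that is not a multiple of n loses its tail.
tail_lost = list(zip(*[iter(range(10))] * n))
print(tail_lost)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```

In every case the call succeeds, and the items 2, 3 and 9 (or the whole range) vanish without a warning.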

Perhaps most convincingly, the use of zip in the standard-library ast module created a bug in literal_eval which silently dropped parts of malformed nodes:

    >>> from ast import Constant, Dict, literal_eval
    >>> nasty_dict = Dict(keys=[Constant(None)], values=[])
    >>> literal_eval(nasty_dict)  # Like eval("{None: }")
    {}

In fact, the author has counted dozens of other call sites in Python’s standard library and tooling where it would be appropriate to enable this new feature immediately.

Rationale

Some critics assert that constant boolean switches are a “code-smell”, or go against Python’s design philosophy. However, Python currently contains several examples of boolean keyword parameters on built-in functions which are typically called with compile-time constants:

    compile(..., dont_inherit=True)
    open(..., closefd=False)
    print(..., flush=True)
    sorted(..., reverse=True)

Many more exist in the standard library.

The idea and name for this new parameter were originally proposed by Ram Rachum. The thread received over 100 replies, with the alternative “equal” receiving a similar amount of support.

The author does not have a strong preference between the two choices, though “equal equals” is a bit awkward in prose. It may also (wrongly) imply some notion of “equality” between the zipped items:

    >>> z = zip([2.0, 4.0, 6.0], [2, 4, 8], equal=True)

Specification

When the built-in zip is called with the keyword-only argument strict=True, the resulting iterator will raise a ValueError if the arguments are exhausted at differing lengths. This error will occur at the point when iteration would normally stop today.

Backward Compatibility

This change is fully backward-compatible. zip currently takes no keyword arguments, and the “non-strict” default behavior when strict is omitted remains unchanged.

Reference Implementation

The author has drafted a C implementation.

An approximate Python translation is:

    def zip(*iterables, strict=False):
        if not iterables:
            return
        iterators = tuple(iter(iterable) for iterable in iterables)
        try:
            while True:
                items = []
                for iterator in iterators:
                    items.append(next(iterator))
                yield tuple(items)
        except StopIteration:
            if not strict:
                return
        if items:
            i = len(items)
            plural = " " if i == 1 else "s 1-"
            msg = f"zip() argument {i+1} is shorter than argument{plural}{i}"
            raise ValueError(msg)
        sentinel = object()
        for i, iterator in enumerate(iterators[1:], 1):
            if next(iterator, sentinel) is not sentinel:
                plural = " " if i == 1 else "s 1-"
                msg = f"zip() argument {i+1} is longer than argument{plural}{i}"
                raise ValueError(msg)

Rejected Ideas

Add `itertools.zip_strict`

This is the alternative with the most support on the Python-Ideas mailing list, so it deserves to be discussed in some detail here. It does not have any disqualifying flaws, and could work well enough as a substitute if this PEP is rejected.

With that in mind, this section aims to outline why adding an optional parameter to zip is a smaller change that ultimately does a better job of solving the problems motivating this PEP.

Precedent

It seems that a great deal of the motivation driving this alternative is that zip_longest already exists in itertools. However, zip_longest is in many ways a much more complicated, specialized utility: it takes on the responsibility of filling in missing values, a job neither of the other variants needs to concern itself with.

If both zip and zip_longest lived alongside each other in itertools or as builtins, then adding zip_strict in the same location would indeed be a much stronger argument. However, the new “strict” variant is conceptually much closer to zip in interface and behavior than zip_longest, while still not meeting the high bar of being its own builtin. Given this situation, it seems most natural for zip to grow this new option in-place.

Usability

If zip is capable of preventing this class of bug, it becomes much simpler for users to enable the check at call sites with this property. Compare this with importing a drop-in replacement for a built-in utility, which feels somewhat heavy just to check a tricky condition that should “always” be true.

Some have also argued that a new function buried in the standard library is somehow more “discoverable” than a keyword parameter on the built-in itself. The author does not agree with this assessment.

Maintenance Cost

While implementation should only be a secondary concern when making usability improvements, it is important to recognize that adding a new utility is significantly more complicated than modifying an existing one. The CPython implementation accompanying this PEP is simple and has no measurable performance impact on default zip behavior, while adding an entirely new utility to itertools would require either:

Duplicating much of the existing zip logic, as zip_longest already does.
Significantly refactoring either zip, zip_longest, or both to share a common or inherited implementation (which may impact performance).

Add Several “Modes” To Switch Between

This option only makes more sense than a binary flag if we anticipate having three or more modes. The “obvious” three choices for these enumerated or constant modes would be “shortest” (the current zip behavior), “strict” (the proposed behavior), and “longest” (the itertools.zip_longest behavior).

However, it doesn’t seem like adding behaviors other than the current default and the proposed “strict” mode is worth the additional complexity. The clearest candidate, “longest”, would require a new fillvalue parameter (which is meaningless for both other modes). This mode is also already handled perfectly by itertools.zip_longest, and adding it would create two ways of doing the same thing. It’s not clear which would be the “obvious” choice: the mode parameter on the built-in zip, or the long-lived namesake utility in itertools.
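For reference, the existing “shortest” and “longest” behaviors can be compared directly. This minimal sketch uses only the built-in zip and itertools.zip_longest. The sample data and the "?" fill value are arbitrary choices made for illustration.

```python
from itertools import zip_longest

letters = "ab"
numbers = [1, 2, 3]

# "shortest": stop as soon as any argument is exhausted.
shortest = list(zip(letters, numbers))

# "longest": keep going and pad the exhausted arguments with fillvalue.
longest = list(zip_longest(letters, numbers, fillvalue="?"))

print(shortest)  # [('a', 1), ('b', 2)]
print(longest)   # [('a', 1), ('b', 2), ('?', 3)]
```

Only the “longest” behavior has any use for fillvalue, which is why folding it into a mode parameter on zip would add an argument that the other two modes ignore.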

Add A Method Or Alternate Constructor To The `zip` Type

Consider the following two options, which have both been proposed:

    >>> zm = zip(*iters).strict()
    >>> zd = zip.strict(*iters)

It’s not obvious which one will succeed, or how the other will fail. If zip.strict is implemented as a method, zm will succeed, but zd will fail in one of several confusing ways:

Yield results that aren’t wrapped in a tuple (if iters contains just one item, a zip iterator).
Raise a TypeError for an incorrect argument type (if iters contains just one item, not a zip iterator).
Raise a TypeError for an incorrect number of arguments (otherwise).

If zip.strict is implemented as a classmethod or staticmethod, zd will succeed, and zm will silently yield nothing (which is the problem we are trying to avoid in the first place).

This proposal is further complicated by the fact that CPython’s actual zip type is currently an undocumented implementation detail. This means that choosing one of the above behaviors will effectively “lock in” the current implementation (or at least require it to be emulated) going forward.

Change The Default Behavior Of `zip`

There is nothing “wrong” with the default behavior of zip, since there are many cases where it is indeed the correct way to handle unequally-sized inputs. It’s extremely useful, for example, when dealing with infinite iterators.
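A typical instance of the infinite-iterator case is numbering items with itertools.count. The sketch below is an illustration added here, with made-up sample data. It relies on the default behavior to stop when the finite argument runs out.

```python
from itertools import count

labels = ["a", "b", "c"]

# count(1) never ends, so zip() must stop at the shorter argument.
numbered = list(zip(count(1), labels))

print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Making length-checking the default would turn this common pattern into an error.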

itertools.zip_longest already exists to service those cases where the “extra” tail-end data is still needed.

Accept A Callback To Handle Remaining Items

While able to do basically anything a user could need, this solution makes handling the more common cases (like rejecting mismatched lengths) unnecessarily complicated and non-obvious.

Raise An `AssertionError`

There are no built-in functions or types that raise an AssertionError as part of their API. Further, the official documentation simply reads (in its entirety):

    Raised when an assert statement fails.

Since this feature has nothing to do with Python’s assert statement, raising an AssertionError here would be inappropriate. Users desiring a check that is disabled in optimized mode (like an assert statement) can use strict=__debug__ instead.

Add A Similar Feature to `map`

This PEP does not propose any changes to map, since the use of map with multiple iterable arguments is quite rare. However, this PEP’s ruling shall serve as precedent for such a future discussion (should it occur).

If rejected, the feature is realistically not worth pursuing. If accepted, such a change to map should not require its own PEP (though, like all enhancements, its usefulness should be carefully considered). For consistency, it should follow the same API and semantics debated here for zip.

Do Nothing

This option is perhaps the least attractive.

Silently truncated data is a particularly nasty class of bug, and hand-writing a robust solution that gets this right isn’t trivial. The real-world motivating examples from Python’s own standard library are evidence that it’s very easy to fall into the sort of trap that this feature aims to avoid.
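To give a sense of what a hand-written check involves, here is one possible workaround built on itertools.zip_longest and a private sentinel. The zip_equal name is hypothetical and this is not the PEP's implementation. It is only a sketch of the kind of helper users would otherwise have to write and maintain themselves.

```python
from itertools import zip_longest

# A unique object that cannot appear in user data.
_MISSING = object()


def zip_equal(*iterables):
    """Hypothetical helper: like zip(), but reject unequal lengths."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        # A sentinel in the tuple means some argument ran out early.
        if any(item is _MISSING for item in combo):
            raise ValueError("iterables have different lengths")
        yield combo


print(list(zip_equal([1, 2], "ab")))  # [(1, 'a'), (2, 'b')]

try:
    list(zip_equal([1, 2, 3], "ab"))
except ValueError:
    caught = True
else:
    caught = False
```

Even this short version has trade-offs: it scans every tuple for the sentinel, and its error message cannot say which argument was too short or too long. A simpler len() comparison is not an option either, since it fails for iterators and generators, which have no length.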

Last modified: 2025-02-01 08:59:27 GMT