Python Enhancement Proposals

Python »
PEP Index »
PEP 207

PEP 207 – Rich Comparisons

Author:: Guido van Rossum <guido at python.org>, David Ascher <DavidA at ActiveState.com>
Status:

Abstract

This PEP proposes several new features for comparisons:

Allow separately overloading of <, >, <=, >=, ==, !=, both inclasses and in C extensions.
Allow any of those overloaded operators to return something elsebesides a Boolean result.

The main motivation comes from NumPy, whose users agree that A<Bshould return an array of elementwise comparison outcomes; theycurrently have to spell this as less(A,B) because A<B can onlyreturn a Boolean result or raise an exception.

An additional motivation is that frequently, types don’t have anatural ordering, but still need to be compared for equality.Currently such a typemust implement comparison and thus definean arbitrary ordering, just so that equality can be tested.

Also, for some object types an equality test can be implementedmuch more efficiently than an ordering test; for example, listsand dictionaries that differ in length are unequal, but theordering requires inspecting some (potentially all) items.

Previous Work

Rich Comparisons have been proposed before; in particular by DavidAscher, after experience with Numerical Python:

http://starship.python.net/crew/da/proposals/richcmp.html

It is also included below as an Appendix. Most of the material inthis PEP is derived from David’s proposal.

Concerns

Backwards compatibility, both at the Python level (classes using__cmp__ need not be changed) and at the C level (extensionsdefiningtp_comparea need not be changed, code usingPyObject_Compare() must work even if the compared objects usethe new rich comparison scheme).
When A<B returns a matrix of elementwise comparisons, an easymistake to make is to use this expression in a Boolean context.Without special precautions, it would always be true. This useshould raise an exception instead.
If a class overrides x==y but nothing else, should x!=y becomputed as not(x==y), or fail? What about the similarrelationship between < and >=, or between > and <=?
Similarly, should we allow x<y to be calculated from y>x? Andx<=y from not(x>y)? And x==y from y==x, or x!=y from y!=x?
When comparison operators return elementwise comparisons, whatto do about shortcut operators like A<B<C,A<BandC<D,A<BorC<D?
What to do aboutmin() andmax(), the ‘in’ and ‘not in’operators,list.sort(), dictionary key comparison, and otheruses of comparisons by built-in operations?

Proposed Resolutions

Full backwards compatibility can be achieved as follows. Whenan object definestp_compare() but nottp_richcompare(), and arich comparison is requested, the outcome oftp_compare() isused in the obvious way. E.g. if “<” is requested, an exception iftp_compare() raises an exception, the outcome is 1 iftp_compare() is negative, and 0 if it is zero or positive. Etc.
Full forward compatibility can be achieved as follows. When aclassic comparison is requested on an object that implementstp_richcompare(), up to three comparisons are used: first == istried, and if it returns true, 0 is returned; next, < is triedand if it returns true, -1 is returned; next, > is tried and ifit returns true, +1 is returned. If any operator tried returnsa non-Boolean value (see below), the exception raised byconversion to Boolean is passed through. If none of theoperators tried returns true, the classic comparison fallbacksare tried next.
(I thought long and hard about the order in which the threecomparisons should be tried. At one point I had a convincingargument for doing it in this order, based on the behavior ofcomparisons for cyclical data structures. But since that codehas changed again, I’m not so sure that it makes a differenceany more.)
Any type that returns a collection of Booleans instead of asingle boolean should definenb_nonzero() to raise an exception.Such a type is considered a non-Boolean.
The == and != operators are not assumed to be each other’scomplement (e.g. IEEE 754 floating point numbers do not satisfythis). It is up to the type to implement this if desired.Similar for < and >=, or > and <=; there are lots of exampleswhere these assumptions aren’t true (e.g. tabnanny).
The reflexivity rulesare assumed by Python. Thus, theinterpreter may swap y>x with x<y, y>=x with x<=y, and may swapthe arguments of x==y and x!=y. (Note: Python currently assumesthat x==x is always true and x!=x is never true; this should notbe assumed.)
In the current proposal, when A<B returns an array ofelementwise comparisons, this outcome is considered non-Boolean,and its interpretation as Boolean by the shortcut operatorsraises an exception. David Ascher’s proposal tries to dealwith this; I don’t think this is worth the additional complexityin the code generator. Instead of A<B<C, you can write(A<B)&(B<C).
Themin() andlist.sort() operations will only use the< operator; max() will only use the > operator. The ‘in’ and‘not in’ operators and dictionary lookup will only use the ==operator.

Implementation Proposal

This closely follows David Ascher’s proposal.

C API

New functions:
```
PyObject*PyObject_RichCompare(PyObject*,PyObject*,int)
```
This performs the requested rich comparison, returning a Pythonobject or raising an exception. The 3rd argument must be one ofPy_LT, Py_LE, Py_EQ, Py_NE, Py_GT or Py_GE.
```
intPyObject_RichCompareBool(PyObject*,PyObject*,int)
```
This performs the requested rich comparison, returning aBoolean: -1 for exception, 0 for false, 1 for true. The 3rdargument must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT orPy_GE. Note that whenPyObject_RichCompare() returns anon-Boolean object,PyObject_RichCompareBool() will raise anexception.

New typedef:

typedefPyObject*(*richcmpfunc)(PyObject*,PyObject*,int);

New slot in type object, replacing spare tp_xxx7:
```
richcmpfunctp_richcompare;
```
This should be a function with the same signature asPyObject_RichCompare(), and performing the same comparison.At least one of the arguments is of the type whosetp_richcompare slot is being used, but the other may have adifferent type. If the function cannot compare the particularcombination of objects, it should return a new reference toPy_NotImplemented.
PyObject_Compare() is changed to try rich comparisons if theyare defined (but only if classic comparisons aren’t defined).

Changes to the interpreter

WheneverPyObject_Compare() is called with the intent of gettingthe outcome of a particular comparison (e.g. inlist.sort(), andof course for the comparison operators in ceval.c), the code ischanged to callPyObject_RichCompare() orPyObject_RichCompareBool() instead; if the C code needs to knowthe outcome of the comparison,PyObject_IsTrue() is called onthe result (which may raise an exception).
Most built-in types that currently define a comparison will bemodified to define a rich comparison instead. (This isoptional; I’ve converted lists, tuples, complex numbers, andarrays so far, and am not sure whether I will convert others.)

Classes

Classes can define new special methods__lt__,__le__,__eq__,__ne__,__gt__,__ge__ to override the corresponding operators.(I.e., <, <=, ==, !=, >, >=. You gotta love the Fortranheritage.) If a class defines__cmp__ as well, it is only usedwhen__lt__ etc. have been tried and returnNotImplemented.

Copyright

This document has been placed in the public domain.

Appendix

Here is most of David Ascher’s original proposal (version 0.2.1,dated Wed Jul 22 16:49:28 1998; I’ve left the Contents, Historyand Patches sections out). It addresses almost all concernsabove.

Abstract

A new mechanism allowing comparisons of Python objects to returnvalues other than -1, 0, or 1 (or raise exceptions) isproposed. This mechanism is entirely backwards compatible, and canbe controlled at the level of the CPyObject type or of the Pythonclass definition. There are three cooperating parts to theproposed mechanism:

the use of the last slot in the type object structure to store apointer to a rich comparison function
the addition of special methods for classes
the addition of an optional argument to the builtincmp()function.

Motivation

The current comparison protocol for Python objects assumes thatany two Python objects can be compared (as of Python 1.5, objectcomparisons can raise exceptions), and that the return value forany comparison should be -1, 0 or 1. -1 indicates that the firstargument to the comparison function is less than the right one, +1indicating the contrapositive, and 0 indicating that the twoobjects are equal. While this mechanism allows the establishmentof an order relationship (e.g. for use by thesort() method of listobjects), it has proven to be limited in the context of NumericPython (NumPy).

Specifically, NumPy allows the creation of multidimensionalarrays, which support most of the numeric operators. Thus:

x=array((1,2,3,4))y=array((2,2,4,4))

are two NumPy arrays. While they can be added elementwise,:

z=x+y# z == array((3,4,7,8))

they cannot be compared in the current framework - the releasedversion of NumPy compares the pointers, (thus yielding junkinformation) which was the only solution before the recentaddition of the ability (in 1.5) to raise exceptions in comparisonfunctions.

Even with the ability to raise exceptions, the current protocolmakes array comparisons useless. To deal with this fact, NumPyincludes several functions which perform the comparisons:less(),less_equal(),greater(),greater_equal(),equal(),not_equal(). These functions return arrays with the same shape astheir arguments (modulo broadcasting), filled with 0’s and 1’sdepending on whether the comparison is true or not for eachelement pair. Thus, for example, using the arrays x and y definedabove:

less(x,y)

would be an array containing the numbers (1,0,0,0).

The current proposal is to modify the Python object interface toallow the NumPy package to make it so that x < y returns the samething as less(x,y). The exact return value is up to the NumPypackage – what this proposal really asks for is changing thePython core so that extension objects have the ability to returnsomething other than -1, 0, 1, should their authors choose to doso.

Current State of Affairs

The current protocol is, at the C level, that each object typedefines atp_compare slot, which is a pointer to a function whichtakes twoPyObject* references and returns -1, 0, or 1. Thisfunction is called by thePyObject_Compare() function defined inthe C API.PyObject_Compare() is also called by the builtinfunctioncmp() which takes two arguments.

Proposed Mechanism

Changes to the C structure for type objects
The last available slot in thePyTypeObject, reserved up to nowfor future expansion, is used to optionally store a pointer to anew comparison function, of type richcmpfunc defined by:
```
typedefPyObject*(*richcmpfunc)Py_PROTO((PyObject*,PyObject*,int));
```
This function takes three arguments. The first two are the objectsto be compared, and the third is an integer corresponding to anopcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL,then rich comparison for that object type is not supported (exceptfor class instances whose class provide the special methodsdescribed below).
The above opcodes need to be added to the published Python/C API(probably under the names Py_LT, Py_LE, etc.)
Additions of special methods for classes
Classes wishing to support the rich comparison mechanisms must addone or more of the following new special methods:
```
def__lt__(self,other):...def__le__(self,other):...def__gt__(self,other):...def__ge__(self,other):...def__eq__(self,other):...def__ne__(self,other):...
```
Each of these is called when the class instance is the on theleft-hand-side of the corresponding operators (<, <=, >, >=, ==,and != or <>). The argument other is set to the object on theright side of the operator. The return value of these methods isup to the class implementor (after all, that’s the entire point ofthe proposal).
If the object on the left side of the operator does not define anappropriate rich comparison operator (either at the C level orwith one of the special methods, then the comparison is reversed,and the right hand operator is called with the opposite operator,and the two objects are swapped. This assumes that a < b and b > aare equivalent, as are a <= b and b >= a, and that == and != arecommutative (e.g. a == b if and only if b == a).
For example, if obj1 is an object which supports the richcomparison protocol and x and y are objects which do not supportthe rich comparison protocol, then obj1 < x will call the__lt__method of obj1 with x as the second argument. x < obj1 will callobj1’s__gt__ method with x as a second argument, and x < y willjust use the existing (non-rich) comparison mechanism.
The above mechanism is such that classes can get away with notimplementing either__lt__ and__le__ or__gt__ and__ge__. Further smarts could have been added to the comparisonmechanism, but this limited set of allowed “swaps” was chosenbecause it doesn’t require the infrastructure to do any processing(negation) of return values. The choice of six special methods wasmade over a single (e.g.__richcmp__) method to allow thedispatching on the opcode to be performed at the level of the Cimplementation rather than the user-defined method.
Addition of an optional argument to the builtincmp()
The builtincmp() is still used for simple comparisons. For richcomparisons, it is called with a third argument, one of “<”, “<=”,“>”, “>=”, “==”, “!=”, “<>” (the last two have the samemeaning). When called with one of these strings as the thirdargument,cmp() can return any Python object. Otherwise, it canonly return -1, 0 or 1 as before.

Chained Comparisons

Problem

It would be nice to allow objects for which the comparison returnssomething other than -1, 0, or 1 to be used in chainedcomparisons, such as:

x<y<z

Currently, this is interpreted by Python as:

temp1=x<yiftemp1:returny<zelse:returntemp1

Note that this requires testing the truth value of the result ofcomparisons, with potential “shortcutting” of the right-sidecomparison testings. In other words, the truth-value of the resultof the result of the comparison determines the result of a chainedoperation. This is problematic in the case of arrays, since if x,y and z are three arrays, then the user expects:

x<y<z

to be an array of 0’s and 1’s where 1’s are in the locationscorresponding to the elements of y which are between thecorresponding elements in x and z. In other words, the right-handside must be evaluated regardless of the result of x < y, which isincompatible with the mechanism currently in use by the parser.

Solution

Guido mentioned that one possible way out would be to change thecode generated by chained comparisons to allow arrays to bechained-compared intelligently. What follows is a mixture of hisidea and my suggestions. The code generated for x < y < z would beequivalent to:

temp1=x<yiftemp1:temp2=y<zreturnboolean_combine(temp1,temp2)else:returntemp1

where boolean_combine is a new function which does something likethe following:

defboolean_combine(a,b):ifhasattr(a,'__boolean_and__')or \hasattr(b,'__boolean_and__'):try:returna.__boolean_and__(b)except:returnb.__boolean_and__(a)else:# standard behaviorifa:returnbelse:return0

where the__boolean_and__ special method is implemented forC-level types by another value of the third argument to therichcmp function. This method would perform a boolean comparisonof the arrays (currently implemented in the umath module as thelogical_and ufunc).

Thus, objects returned by rich comparisons should always testtrue, but should define another special method which createsboolean combinations of them and their argument.

This solution has the advantage of allowing chained comparisons towork for arrays, but the disadvantage that it requires comparisonarrays to always return true (in an ideal world, I’d have themalways raise an exception on truth testing, since the meaning oftesting “if a>b:” is massively ambiguous.

The inlining already present which deals with integer comparisonswould still apply, resulting in no performance cost for the mostcommon cases.

Source:https://github.com/python/peps/blob/main/peps/pep-0207.rst

Last modified:2025-02-01 08:55:40 GMT

Movatterモバイル変換

PEP 207 – Rich Comparisons