This PEP proposes several new features for comparisons:
The main motivation comes from NumPy, whose users agree that A<B should return an array of elementwise comparison outcomes; they currently have to spell this as less(A,B) because A<B can only return a Boolean result or raise an exception.
An additional motivation is that frequently, types don’t have a natural ordering, but still need to be compared for equality. Currently such a type must implement comparison and thus define an arbitrary ordering, just so that equality can be tested.
Also, for some object types an equality test can be implemented much more efficiently than an ordering test; for example, lists and dictionaries that differ in length are unequal, but the ordering requires inspecting some (potentially all) items.
Rich Comparisons have been proposed before; in particular by David Ascher, after experience with Numerical Python:
It is also included below as an Appendix. Most of the material in this PEP is derived from David’s proposal.
__cmp__ need not be changed) and at the C level (extensions defining tp_compare need not be changed, code using PyObject_Compare() must work even if the compared objects use the new rich comparison scheme).
A<B and C<D, A<B or C<D?
min() and max(), the ‘in’ and ‘not in’ operators, list.sort(), dictionary key comparison, and other uses of comparisons by built-in operations?
tp_compare() but not tp_richcompare(), and a rich comparison is requested, the outcome of tp_compare() is used in the obvious way. E.g. if “<” is requested, an exception is raised if tp_compare() raises an exception, the outcome is 1 if tp_compare() is negative, and 0 if it is zero or positive. Etc.
Full forward compatibility can be achieved as follows. When a classic comparison is requested on an object that implements tp_richcompare(), up to three comparisons are used: first == is tried, and if it returns true, 0 is returned; next, < is tried and if it returns true, -1 is returned; next, > is tried and if it returns true, +1 is returned. If any operator tried returns a non-Boolean value (see below), the exception raised by conversion to Boolean is passed through. If none of the operators tried returns true, the classic comparison fallbacks are tried next.
(I thought long and hard about the order in which the three comparisons should be tried. At one point I had a convincing argument for doing it in this order, based on the behavior of comparisons for cyclical data structures. But since that code has changed again, I’m not so sure that it makes a difference any more.)
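The two compatibility rules above can be sketched in Python roughly as follows. The function names three_way_compare and rich_from_classic are illustrative assumptions, not part of the proposal or of the C API, and the final classic-comparison fallback is replaced here by a plain TypeError.

```python
def three_way_compare(a, b):
    """Derive a classic -1/0/+1 result from rich comparisons.

    Tries ==, then <, then >, in the order described above. bool() is
    applied to each result, so a non-Boolean result passes its own
    exception through to the caller.
    """
    if bool(a == b):
        return 0
    if bool(a < b):
        return -1
    if bool(a > b):
        return 1
    # The PEP tries the classic comparison fallbacks at this point; this
    # sketch has no such fallback and simply reports the failure.
    raise TypeError("no ordering could be established")


def rich_from_classic(cmp_result, op):
    """Map a classic -1/0/+1 outcome onto one rich comparison operator."""
    table = {
        "<": cmp_result < 0,
        "<=": cmp_result <= 0,
        "==": cmp_result == 0,
        "!=": cmp_result != 0,
        ">": cmp_result > 0,
        ">=": cmp_result >= 0,
    }
    return table[op]


print(three_way_compare(1, 2))
print(rich_from_classic(-1, "<"))
```

A value such as float("nan"), for which ==, < and > are all false, reaches the final branch and shows why some fallback is needed after the three attempts.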
nb_nonzero() to raise an exception. Such a type is considered a non-Boolean.
min() and list.sort() operations will only use the < operator; max() will only use the > operator. The ‘in’ and ‘not in’ operators and dictionary lookup will only use the == operator.
This closely follows David Ascher’s proposal.
PyObject *PyObject_RichCompare(PyObject *, PyObject *, int)
This performs the requested rich comparison, returning a Python object or raising an exception. The 3rd argument must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT or Py_GE.
int PyObject_RichCompareBool(PyObject *, PyObject *, int)
This performs the requested rich comparison, returning a Boolean: -1 for exception, 0 for false, 1 for true. The 3rd argument must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT or Py_GE. Note that when PyObject_RichCompare() returns a non-Boolean object, PyObject_RichCompareBool() will raise an exception.
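The difference between the two entry points can be illustrated from Python. This is only a rough analogue under stated assumptions: rich_compare_bool, NonBoolean and Elementwise are hypothetical names, the example uses the modern __bool__ hook where this PEP speaks of nb_nonzero(), and a Python exception stands in for the C-level -1 return.

```python
import operator


class NonBoolean:
    """Comparison result that refuses to be used as a truth value."""

    def __init__(self, values):
        self.values = values

    def __bool__(self):
        raise ValueError("truth value of an elementwise result is ambiguous")


class Elementwise:
    """Toy container whose < returns a NonBoolean instead of True/False."""

    def __init__(self, items):
        self.items = list(items)

    def __lt__(self, other):
        return NonBoolean([a < b for a, b in zip(self.items, other.items)])


def rich_compare_bool(a, b, op):
    """Return 1 or 0; the truth test raises for a non-Boolean result."""
    return 1 if op(a, b) else 0


# The plain rich comparison hands back whatever object __lt__ produced.
result = operator.lt(Elementwise([1, 2]), Elementwise([2, 2]))
print(result.values)

# The Boolean variant works for ordinary values...
print(rich_compare_bool(1, 2, operator.lt))

# ...but fails as soon as the result cannot be converted to a Boolean.
try:
    rich_compare_bool(Elementwise([1, 2]), Elementwise([2, 2]), operator.lt)
except ValueError as exc:
    print("raised:", exc)
```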
typedef PyObject *(*richcmpfunc) (PyObject *, PyObject *, int);
richcmpfunc tp_richcompare;
This should be a function with the same signature as PyObject_RichCompare(), and performing the same comparison. At least one of the arguments is of the type whose tp_richcompare slot is being used, but the other may have a different type. If the function cannot compare the particular combination of objects, it should return a new reference to Py_NotImplemented.
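At the Python level, the same convention appears as a comparison method returning NotImplemented. The sketch below uses a hypothetical Meters class (not from the proposal) and assumes modern Python 3 semantics, where an unhandled == falls back to an identity test and an unhandled < raises TypeError, rather than the classic comparison fallbacks this PEP describes.

```python
class Meters:
    """Hypothetical value type that only compares with other Meters."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            # Signal "cannot compare this combination" instead of guessing.
            return NotImplemented
        return self.value == other.value

    def __lt__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented
        return self.value < other.value


print(Meters(1) == Meters(1))          # handled by Meters.__eq__
print(Meters(1).__eq__("1"))           # the raw method result: NotImplemented
print(Meters(1) == "1")                # interpreter falls back; no exception

try:
    Meters(1) < 3                      # neither operand can handle this pair
except TypeError as exc:
    print("raised:", exc)
```

Returning NotImplemented, rather than raising or returning False directly, gives the other operand a chance to handle the comparison.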
PyObject_Compare() is changed to try rich comparisons if they are defined (but only if classic comparisons aren’t defined).
PyObject_Compare() is called with the intent of getting the outcome of a particular comparison (e.g. in list.sort(), and of course for the comparison operators in ceval.c), the code is changed to call PyObject_RichCompare() or PyObject_RichCompareBool() instead; if the C code needs to know the outcome of the comparison, PyObject_IsTrue() is called on the result (which may raise an exception).
__lt__, __le__, __eq__, __ne__, __gt__, __ge__ to override the corresponding operators. (I.e., <, <=, ==, !=, >, >=. You gotta love the Fortran heritage.) If a class defines __cmp__ as well, it is only used when __lt__ etc. have been tried and return NotImplemented.
This document has been placed in the public domain.
Here is most of David Ascher’s original proposal (version 0.2.1, dated Wed Jul 22 16:49:28 1998; I’ve left the Contents, History and Patches sections out). It addresses almost all concerns above.
A new mechanism allowing comparisons of Python objects to return values other than -1, 0, or 1 (or raise exceptions) is proposed. This mechanism is entirely backwards compatible, and can be controlled at the level of the C PyObject type or of the Python class definition. There are three cooperating parts to the proposed mechanism:
cmp() function.
The current comparison protocol for Python objects assumes that any two Python objects can be compared (as of Python 1.5, object comparisons can raise exceptions), and that the return value for any comparison should be -1, 0 or 1. -1 indicates that the first argument to the comparison function is less than the second, +1 indicates the opposite, and 0 indicates that the two objects are equal. While this mechanism allows the establishment of an order relationship (e.g. for use by the sort() method of list objects), it has proven to be limited in the context of Numeric Python (NumPy).
Specifically, NumPy allows the creation of multidimensional arrays, which support most of the numeric operators. Thus:
x = array((1,2,3,4))
y = array((2,2,4,4))
are two NumPy arrays. While they can be added elementwise:
z = x + y  # z == array((3,4,7,8))
they cannot be compared in the current framework. The released version of NumPy compares the pointers (thus yielding junk information), which was the only solution before the recent addition of the ability (in 1.5) to raise exceptions in comparison functions.
Even with the ability to raise exceptions, the current protocol makes array comparisons useless. To deal with this fact, NumPy includes several functions which perform the comparisons: less(), less_equal(), greater(), greater_equal(), equal(), not_equal(). These functions return arrays with the same shape as their arguments (modulo broadcasting), filled with 0’s and 1’s depending on whether the comparison is true or not for each element pair. Thus, for example, using the arrays x and y defined above:
less(x,y)
would be an array containing the numbers (1,0,1,0).
The current proposal is to modify the Python object interface to allow the NumPy package to make it so that x < y returns the same thing as less(x,y). The exact return value is up to the NumPy package – what this proposal really asks for is changing the Python core so that extension objects have the ability to return something other than -1, 0, 1, should their authors choose to do so.
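As a minimal sketch of what the proposal enables, the toy class below lets x < y return an elementwise result. Array is a hypothetical stand-in written for this illustration; it is not NumPy's implementation and ignores shapes and broadcasting entirely.

```python
class Array:
    """Toy one-dimensional array; a stand-in for illustration only."""

    def __init__(self, items):
        self.items = tuple(items)

    def __add__(self, other):
        # Elementwise addition, pairing items position by position.
        return Array(a + b for a, b in zip(self.items, other.items))

    def __lt__(self, other):
        # Elementwise comparison: 1 where a < b holds, 0 elsewhere.
        return Array(int(a < b) for a, b in zip(self.items, other.items))


x = Array((1, 2, 3, 4))
y = Array((2, 2, 4, 4))

print((x + y).items)   # elementwise sum
print((x < y).items)   # elementwise comparison, one 0 or 1 per position
```

The comparison returns another Array rather than -1, 0 or 1, which is exactly the freedom the proposal asks the core to grant.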
The current protocol is, at the C level, that each object type defines a tp_compare slot, which is a pointer to a function which takes two PyObject* references and returns -1, 0, or 1. This function is called by the PyObject_Compare() function defined in the C API. PyObject_Compare() is also called by the builtin function cmp() which takes two arguments.
The last available slot in the PyTypeObject, reserved up to now for future expansion, is used to optionally store a pointer to a new comparison function, of type richcmpfunc defined by:
typedef PyObject *(*richcmpfunc) Py_PROTO((PyObject *, PyObject *, int));
This function takes three arguments. The first two are the objects to be compared, and the third is an integer corresponding to an opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL, then rich comparison for that object type is not supported (except for class instances whose class provides the special methods described below).
The above opcodes need to be added to the published Python/C API (probably under the names Py_LT, Py_LE, etc.).
Classes wishing to support the rich comparison mechanisms must add one or more of the following new special methods:
def __lt__(self, other):
    ...
def __le__(self, other):
    ...
def __gt__(self, other):
    ...
def __ge__(self, other):
    ...
def __eq__(self, other):
    ...
def __ne__(self, other):
    ...
Each of these is called when the class instance is on the left-hand side of the corresponding operator (<, <=, >, >=, ==, and != or <>). The argument other is set to the object on the right side of the operator. The return value of these methods is up to the class implementor (after all, that’s the entire point of the proposal).
If the object on the left side of the operator does not define an appropriate rich comparison operator (either at the C level or with one of the special methods), then the comparison is reversed: the two objects are swapped, and the right-hand operand is called with the opposite operator. This assumes that a < b and b > a are equivalent, as are a <= b and b >= a, and that == and != are commutative (i.e. a == b if and only if b == a).
For example, if obj1 is an object which supports the rich comparison protocol and x and y are objects which do not support the rich comparison protocol, then obj1 < x will call the __lt__ method of obj1 with x as the second argument. x < obj1 will call obj1’s __gt__ method with x as a second argument, and x < y will just use the existing (non-rich) comparison mechanism.
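The swapping rule can be observed directly in modern Python 3, which still reflects an unhandled < to the right operand's __gt__. Rich and Plain are hypothetical classes for illustration. One difference from the text: in Python 3 every object inherits default comparison methods that return NotImplemented, so Plain lacks rich comparison support only in the sense that it defines none of its own.

```python
class Rich:
    """Records which special method the interpreter ends up calling."""

    def __init__(self):
        self.calls = []

    def __lt__(self, other):
        self.calls.append("__lt__")
        return "rich-lt"

    def __gt__(self, other):
        self.calls.append("__gt__")
        return "rich-gt"


class Plain:
    """Defines no ordering methods of its own."""


obj1 = Rich()
x = Plain()

first = obj1 < x    # left operand handles it: obj1.__lt__(x)
second = x < obj1   # Plain cannot, so the operands are swapped: obj1.__gt__(x)

print(first, second, obj1.calls)
```

Note that the swapped call returns whatever __gt__ produces; no negation or other processing of the return value is involved.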
The above mechanism is such that classes can get away with not implementing either __lt__ and __le__ or __gt__ and __ge__. Further smarts could have been added to the comparison mechanism, but this limited set of allowed “swaps” was chosen because it doesn’t require the infrastructure to do any processing (negation) of return values. The choice of six special methods was made over a single (e.g. __richcmp__) method to allow the dispatching on the opcode to be performed at the level of the C implementation rather than the user-defined method.
cmp()
The builtin cmp() is still used for simple comparisons. For rich comparisons, it is called with a third argument, one of “<”, “<=”, “>”, “>=”, “==”, “!=”, “<>” (the last two have the same meaning). When called with one of these strings as the third argument, cmp() can return any Python object. Otherwise, it can only return -1, 0 or 1 as before.
It would be nice to allow objects for which the comparison returns something other than -1, 0, or 1 to be used in chained comparisons, such as:
x < y < z
Currently, this is interpreted by Python as:
temp1 = x < y
if temp1:
    return y < z
else:
    return temp1
Note that this requires testing the truth value of the result of comparisons, with potential “shortcutting” of the right-side comparison testings. In other words, the truth value of the result of the first comparison determines the result of a chained operation. This is problematic in the case of arrays, since if x, y and z are three arrays, then the user expects:
x < y < z
to be an array of 0’s and 1’s where 1’s are in the locations corresponding to the elements of y which are between the corresponding elements in x and z. In other words, the right-hand side must be evaluated regardless of the result of x < y, which is incompatible with the mechanism currently in use by the parser.
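Both halves of the problem can be reproduced in a short sketch, assuming modern Python 3 chaining semantics, which match the expansion shown earlier. Traced and Elementwise are hypothetical helper classes: the first logs which comparisons actually run, the second returns a list of per-element results.

```python
class Traced:
    """Scalar wrapper that logs every < it takes part in."""

    def __init__(self, value, log):
        self.value = value
        self.log = log

    def __lt__(self, other):
        self.log.append((self.value, other.value))
        return self.value < other.value


class Elementwise:
    """Toy array whose < returns a list with one result per element."""

    def __init__(self, items):
        self.items = list(items)

    def __lt__(self, other):
        return [a < b for a, b in zip(self.items, other.items)]


def chained(x, y, z):
    return x < y < z


# Shortcutting: when x < y is false, y < z is never evaluated.
log = []
chained(Traced(5, log), Traced(2, log), Traced(3, log))
print(log)

# Lost information: a non-empty list is truthy, so the chain returns only
# the result of y < z and silently drops the result of x < y.
lost = chained(Elementwise([5, 1]), Elementwise([2, 2]), Elementwise([3, 3]))
print(lost)
```

In the second call the first position reports a true result even though 5 < 2 is false, which is the behavior the elementwise reading of x < y < z is meant to avoid.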
Guido mentioned that one possible way out would be to change the code generated by chained comparisons to allow arrays to be chained-compared intelligently. What follows is a mixture of his idea and my suggestions. The code generated for x < y < z would be equivalent to:
temp1 = x < y
if temp1:
    temp2 = y < z
    return boolean_combine(temp1, temp2)
else:
    return temp1
where boolean_combine is a new function which does something like the following:
def boolean_combine(a, b):
    if hasattr(a, '__boolean_and__') or \
       hasattr(b, '__boolean_and__'):
        try:
            return a.__boolean_and__(b)
        except Exception:
            # a lacks the hook or cannot handle b, so let b combine
            return b.__boolean_and__(a)
    else:  # standard behavior
        if a:
            return b
        else:
            return 0
where the __boolean_and__ special method is implemented for C-level types by another value of the third argument to the richcmp function. This method would perform a boolean comparison of the arrays (currently implemented in the umath module as the logical_and ufunc).
Thus, objects returned by rich comparisons should always test true, but should define another special method which creates boolean combinations of them and their argument.
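To see how the pieces would fit together, here is a self-contained sketch of the proposed expansion. Everything in it is an assumption made for illustration: __boolean_and__ is the hypothetical hook from this proposal and is not a method Python recognizes, boolean_combine is a simplified variant of the function above, and Mask, Array and chained_less are made-up names.

```python
def boolean_combine(a, b):
    """Combine two comparison results via the hypothetical hook."""
    for left, right in ((a, b), (b, a)):
        method = getattr(left, "__boolean_and__", None)
        if method is not None:
            return method(right)
    # Standard behavior for ordinary truth values.
    return b if a else 0


class Mask:
    """Result of an elementwise comparison: a tuple of 0/1 flags."""

    def __init__(self, bits):
        self.bits = tuple(bits)

    def __bool__(self):
        return True  # always tests true, as the proposal requires

    def __boolean_and__(self, other):
        return Mask(int(p and q) for p, q in zip(self.bits, other.bits))


class Array:
    """Toy array whose < returns a Mask."""

    def __init__(self, items):
        self.items = tuple(items)

    def __lt__(self, other):
        return Mask(int(p < q) for p, q in zip(self.items, other.items))


def chained_less(x, y, z):
    """Hand-written form of the code proposed for x < y < z."""
    temp1 = x < y
    if temp1:
        temp2 = y < z
        return boolean_combine(temp1, temp2)
    return temp1


print(chained_less(Array((5, 1)), Array((2, 2)), Array((3, 3))).bits)
print(chained_less(1, 2, 3))
```

Ordinary operands take the standard branch of boolean_combine, so scalar chains keep their usual result.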
This solution has the advantage of allowing chained comparisons to work for arrays, but the disadvantage that it requires comparison arrays to always return true (in an ideal world, I’d have them always raise an exception on truth testing, since the meaning of testing “if a>b:” is massively ambiguous).
The inlining already present which deals with integer comparisons would still apply, resulting in no performance cost for the most common cases.
Source: https://github.com/python/peps/blob/main/peps/pep-0207.rst
Last modified: 2025-02-01 08:55:40 GMT