Movatterモバイル変換
[0]ホーム
Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7)
Jeremy Hyltonjeremy@cnri.reston.va.us
Thu, 13 Apr 2000 15:08:12 -0400 (EDT)
>>>>> "TP" == Tim Peters <tim_one@email.msn.com> writes: TP> [Jeremy Hylton]> >> So the real problem is defining some reasonable semantics for >> comparison of recursive objects. TP> I think this is exactly a graph isomorphism problem, since TP> Python always compares "by value" (so isomorphism is the natural TP> generalization).I'm not familiar with any algorithms for the graph isomorphismproblem, but I took a stab at a simple comparison algorithm. The ideais to detect comparisons that would cross back-edges in the objectgraphs. Instead of starting a new comparison, assume they are thesame. If, in fact, the objects are not the same, they must differ insome other way; some other part of the comparison will fail. TP> This isn't hard (!= tedious, alas) to define or to implement TP> naively, but a straightforward implementation would be very TP> expensive at runtime compared to the status quo. That's why TP> "real languages" <wink> would rather suffer an infinite loop. TP> It's expensive because there's no cheap way to know whether you TP> have a loop in an object.My first attempt at implementing this is expensive. I maintain adictionary that contains all the object pairs that are currently being compared. Specifically, the dictionary is used to implement a set ofobject id pairs. Every call to PyObject_Compare will add a newpair to the dictionary when it is called and remove it when it returns (except for a few trivial cases).A naive patch is included below. It does seem to involve a bigperformance hit -- more than 10% slower on pystone. It also uses alot of extra space.Note that the patch has all its initialization code inline inPyObject_Compare; moving that elsewhere will help a little. It alsouse a bunch of function calls where macros would be more efficient. TP> An anal compromise would be to run comparisons full speed TP> without trying to detect loops, but if the recursion got "too TP> deep" break out and start over with an expensive alternative TP> that does check for loops. The later requires machinery similar TP> to copy.deepcopy's.It looks like the anal compromise might be necessary. I'llre-implement the patch more carefully and see what the realeffect on performance is.JeremyIndex: object.c===================================================================RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,vretrieving revision 2.67diff -r2.67 object.c239c239< "__repr__ returned non-string (type %.200s)",---> "__repr__ returned non-string (type %s)",276c276< "__str__ returned non-string (type %.200s)",---> "__str__ returned non-string (type %s)",300a301,328> static PyObject *cmp_state_key = NULL;>> static PyObject*> cmp_state_make_pair(v, w)> PyObject *v, *w;> {> PyObject *pair = PyTuple_New(2);> if (pair == NULL)> return NULL;> if ((long)v <= (long)w) {> PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v));> PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w));> } else {> PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w));> PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v));> }> return pair;> }>> void> cmp_state_clear_pair(dict, key)> PyObject *dict, *key;> {> PyDict_DelItem(dict, key);> Py_DECREF(key);> }>>305a334,336> PyObject *tstate_dict, *cmp_dict, *pair;> int result;>311a343,376> tstate_dict = PyThreadState_GetDict();> if (tstate_dict == NULL) {> PyErr_BadInternalCall();> return -1;> }> /* fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v,> v->ob_type->tp_name, (long)w, w->ob_type->tp_name);> */> /* XXX should initialize elsewhere */> if (cmp_state_key == NULL) {> cmp_state_key = PyString_InternFromString("compare_state");> cmp_dict = PyDict_New();> if (cmp_dict == NULL)> return -1;> PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict);> } else {> cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key);> if (cmp_dict == NULL)> return NULL;> PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict);> }>> pair = cmp_state_make_pair(v, w);> if (pair == NULL) {> PyErr_BadInternalCall();> return -1;> }> if (PyDict_GetItem(cmp_dict, pair)) {> /* already comparing these objects. assume they're> equal until shown otherwise> */> Py_DECREF(pair);> return 0;> }316a382,384> if (PyDict_SetItem(cmp_dict, pair, pair) == -1) {> return -1;> }317a386> cmp_state_clear_pair(cmp_dict, pair);329a399,401> if (PyDict_SetItem(cmp_dict, pair, pair) == -1) {> return -1;> }344a417> cmp_state_clear_pair(cmp_dict, pair);350,364c423,425< else if (PyUnicode_Check(v) || PyUnicode_Check(w)) {< int result = PyUnicode_Compare(v, w);< if (result == -1 && PyErr_Occurred() && < PyErr_ExceptionMatches(PyExc_TypeError))< /* TypeErrors are ignored: if Unicode coercion< fails due to one of the arguments not< having the right type, we continue as< defined by the coercion protocol (see< above). Luckily, decoding errors are< reported as ValueErrors and are not masked< by this technique. */< PyErr_Clear();< else< return result;< }---> cmp_state_clear_pair(cmp_dict, pair);> if (PyUnicode_Check(v) || PyUnicode_Check(w))> return PyUnicode_Compare(v, w);372c433,434< if (vtp->tp_compare == NULL)---> if (vtp->tp_compare == NULL) {> cmp_state_clear_pair(cmp_dict, pair);374c436,439< return (*vtp->tp_compare)(v, w);---> }> result = (*vtp->tp_compare)(v, w);> cmp_state_clear_pair(cmp_dict, pair);> return result;
[8]ページ先頭