
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2019-02-16 20:15 byrhettinger, last changed2022-04-11 14:59 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| perf-screenshot.png | nascheme,2019-02-19 18:35 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 11907 | merged | scoder,2019-02-17 14:31 | |
| Messages (11) | |||
|---|---|---|---|
| msg335714 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2019-02-16 20:15 | |
Benchmark show what writes to class variables are anomalously slow. class A(object): pass A.x = 1 # This write is 3 to 5 times slower than other writes.FWIW, the same operation for old-style classes in Python 2.7 was several times faster.We should investigate to understand why the writes are so slow. There might be a good reason or there might be an opportunity for optimization.-------------------------------------------------$ python3.8Tools/scripts/var_access_benchmark.pyVariable and attribute read access: 4.3 nsread_local 4.6 nsread_nonlocal 14.5 nsread_global 19.0 nsread_builtin 18.4 nsread_classvar_from_class 16.2 nsread_classvar_from_instance 24.7 nsread_instancevar 19.7 nsread_instancevar_slots 19.5 nsread_namedtuple 26.4 nsread_boundmethodVariable and attribute write access: 4.4 nswrite_local 5.1 nswrite_nonlocal 18.2 nswrite_global 103.9 nswrite_classvar <== Outlier 35.4 nswrite_instancevar 25.6 nswrite_instancevar_slots | |||
| msg335733 -(view) | Author: Pablo Galindo Salgado (pablogsal)*![]() | Date: 2019-02-16 23:34 | |
It seems 50% of the overhead (50ns) is due to two reasons:- 30-40% is due to the call to update_slot(type, name) after the item is set in the class dictionary.- 70-60% is due to all the extra work from _PyObject_GenericSetAttrWithDict until it ends calling _PyObjectDict_SetItem with the slot. If in the code of typeobject.c:type_setattro you change:res = _PyObject_GenericSetAttrWithDict((PyObject *)type, name, value, NULL);by PyObject* dictptr = _PyObject_GetDictPtr(type);res = _PyObjectDict_SetItem(type, dictptr, name, value);and delete the update_slot(type, name) call afterwards, the times are reduced to 50ns. | |||
| msg335747 -(view) | Author: Yasser Alshalaan (Yasser Alshalaan)* | Date: 2019-02-17 02:53 | |
can you include your python 2.7 runs? for me it looks similar | |||
| msg335763 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2019-02-17 08:50 | |
> can you include your python 2.7 runs? for me it looks similarIt will give similar results unless you switch to old-style classes (edit out the inheritance from object). class A: pass A.x = 1---------------------------------------$ python2.7 var_access_benchmark.pyVariable and attribute read access: 6.7 nsread_local 10.9 nsread_global 18.9 nsread_builtin 24.4 nsread_classvar_from_class 30.2 nsread_classvar_from_instance 22.7 nsread_instancevar 25.5 nsread_instancevar_slots 99.3 nsread_namedtuple 40.9 nsread_boundmethodVariable and attribute write access: 8.2 nswrite_local 18.7 nswrite_global 32.9 nswrite_classvar <== Not formerly an outlier 32.5 nswrite_instancevar 31.2 nswrite_instancevar_slots | |||
| msg335776 -(view) | Author: Stefan Behnel (scoder)*![]() | Date: 2019-02-17 14:36 | |
It turns out that "update_slot()" is always called, even when we are not updating a slot name (which is always a special dunder-name). The linear search for names in "update_slots()" is a huge waste of time here, and short-circuiting out of it when the name does not start with "_" cuts the overall update time by 50%. I pushed a PR.Another improvement would be a sub-linear algorithm for searching the slot name, but that's a bigger change. | |||
| msg335794 -(view) | Author: Pablo Galindo Salgado (pablogsal)*![]() | Date: 2019-02-18 00:59 | |
This are the timings that I am measuring withPR 11907:Variable and attribute read access: 5.7 ns read_local 5.9 ns read_nonlocal 16.2 ns read_global 24.5 ns read_builtin 20.9 ns read_classvar_from_class 20.0 ns read_classvar_from_instance 29.7 ns read_instancevar 24.7 ns read_instancevar_slots 22.9 ns read_namedtuple 36.8 ns read_boundmethodVariable and attribute write access: 6.9 ns write_local 6.9 ns write_nonlocal 26.7 ns write_global 65.4 ns write_classvar <----- Down from 120.6 ns 49.5 ns write_instancevar 34.5 ns write_instancevar_slots | |||
| msg335795 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2019-02-18 01:32 | |
Wow, I didn't expect to get an immediate win this of this magnitude :-) | |||
| msg335992 -(view) | Author: Neil Schemenauer (nascheme)*![]() | Date: 2019-02-19 18:25 | |
Some profiling using 'perf'. This is for cpython63fa1cfece4912110ce3a0ff11fb3ade3ff5e756. children self [...]+ 97.27% 0.00% run_mod (inlined)+ 88.53% 6.33% PyObject_SetAttr+ 79.34% 6.80% type_setattro+ 43.92% 43.92% update_slot+ 26.87% 5.84% _PyObject_GenericSetAttrWithDict+ 14.62% 6.52% insertdict+ 8.80% 8.80% lookdict_unicode_nodummy+ 6.57% 0.00% _Py_DECREF (inlined)+ 5.19% 5.19% PyUnicode_InternInPlace+ 3.57% 3.57% _PyObjectDict_SetItem+ 3.38% 3.38% _PyType_Lookup+ 1.71% 0.00% _Py_INCREF (inlined)+ 1.42% 0.00% _Py_XDECREF (inlined) [...]After applyingPR 11907: children self [...]+ 94.76% 0.00% run_mod (inlined)+ 75.22% 6.77% PyObject_SetAttr+ 64.65% 13.08% type_setattro+ 47.51% 11.96% _PyObject_GenericSetAttrWithDict+ 22.99% 13.95% insertdict+ 10.10% 10.10% lookdict_unicode_nodummy+ 9.47% 9.47% PyUnicode_InternInPlace+ 7.10% 7.10% _PyObjectDict_SetItem+ 7.02% 7.02% _PyType_Lookup+ 6.52% 0.00% _Py_DECREF (inlined)+ 3.17% 0.00% _Py_INCREF (inlined)+ 3.16% 0.00% _Py_XDECREF (inlined)+ 2.59% 0.00% PyDict_SetItem (inlined)+ 1.50% 0.00% is_dunder_name (inlined) [...]The PyUnicode_InternInPlace() can mostly be eliminated by testing PyUnicode_CHECK_INTERNED() first (we already have called PyUnicode_Check() on it). That only gives a 7% speedup on my machine though. The is_dunder_name() is a much bigger optimization. | |||
| msg335995 -(view) | Author: Neil Schemenauer (nascheme)*![]() | Date: 2019-02-19 18:35 | |
BTW, 'perf report [...]' has a really neat annotated assembly view. Scroll to the function you are interested in and press 'a'. Press 't' to toggle the time units (left side numbers). I'm attaching a screenshot of the disassembly of the lookdict function. The time units are sample accounts. Each count is a point where the profiler woke up on that specific instruction. | |||
| msg336031 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2019-02-20 00:45 | |
Thanks Neil. The tooling does indeed look nice.I added your PyUnicode_InternInPlace() suggestion to the PR.At this point, the PR looks ready-to-go unless any of you think we've missed some low-hanging fruit. | |||
| msg336129 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2019-02-20 17:29 | |
New changesetd8b9e1fc2e45d2bc3f4a9737c375f2adb8a8c7de by Raymond Hettinger (Stefan Behnel) in branch 'master':bpo-36012: Avoid linear slot search for non-dunder methods (GH-11907)https://github.com/python/cpython/commit/d8b9e1fc2e45d2bc3f4a9737c375f2adb8a8c7de | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:11 | admin | set | github: 80193 |
| 2019-02-20 17:29:55 | rhettinger | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2019-02-20 17:29:26 | rhettinger | set | messages: +msg336129 |
| 2019-02-20 00:45:18 | rhettinger | set | messages: +msg336031 |
| 2019-02-19 18:35:45 | nascheme | set | files: +perf-screenshot.png messages: +msg335995 |
| 2019-02-19 18:25:57 | nascheme | set | messages: +msg335992 |
| 2019-02-19 11:27:22 | vstinner | set | nosy: -vstinner |
| 2019-02-18 01:32:29 | rhettinger | set | messages: +msg335795 |
| 2019-02-18 00:59:21 | pablogsal | set | messages: +msg335794 |
| 2019-02-17 14:36:15 | scoder | set | nosy: +scoder messages: +msg335776 |
| 2019-02-17 14:31:39 | scoder | set | keywords: +patch stage: patch review pull_requests: +pull_request11932 |
| 2019-02-17 08:50:24 | rhettinger | set | messages: +msg335763 |
| 2019-02-17 02:53:16 | Yasser Alshalaan | set | nosy: +Yasser Alshalaan messages: +msg335747 |
| 2019-02-16 23:34:44 | pablogsal | set | messages: +msg335733 |
| 2019-02-16 20:15:18 | rhettinger | create | |