Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-115754: Get singletons via function calls#116572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
vstinner wants to merge2 commits intopython:mainfromvstinner:getnone

Conversation

vstinner
Copy link
Member

@vstinnervstinner commentedMar 10, 2024
edited by bedevere-appbot
Loading

@vstinner
Copy link
MemberAuthor

PR created to measure the impact on performance of getting the 5 singletons via function calls.

@vstinner
Copy link
MemberAuthor

vstinner commentedMar 11, 2024
edited
Loading

Patch adding a benchmark function:

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.cindex b6536045e6..03dcbd0cb9 100644--- a/Modules/_testcapimodule.c+++ b/Modules/_testcapimodule.c@@ -3284,6 +3284,39 @@ test_weakref_capi(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))     _Py_COMP_DIAG_POP }+static PyObject *+bench_none(PyObject *Py_UNUSED(module), PyObject *args)+{+    Py_ssize_t loops;+    if (!PyArg_ParseTuple(args, "n", &loops)) {+        return NULL;+    }++    PyTime_t t1;+    (void)PyTime_PerfCounter(&t1);+    for (Py_ssize_t i=0; i < loops; i++) {+        PyObject *obj;+        int res = 0;++        obj = Py_None;+        res += Py_IsNone(obj) + Py_IsFalse(obj) + Py_IsTrue(obj);++        obj = Py_False;+        res += Py_IsNone(obj) + Py_IsFalse(obj) + Py_IsTrue(obj);++        obj = Py_True;+        res += Py_IsNone(obj) + Py_IsFalse(obj) + Py_IsTrue(obj);++        if (res != 3) {+            abort();+        }+    }+    PyTime_t t2;+    (void)PyTime_PerfCounter(&t2);++    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));+}+  static PyMethodDef TestMethods[] = {     {"set_errno",               set_errno,                       METH_VARARGS},@@ -3423,6 +3456,7 @@ static PyMethodDef TestMethods[] = {     {"function_set_kw_defaults", function_set_kw_defaults, METH_VARARGS, NULL},     {"check_pyimport_addmodule", check_pyimport_addmodule, METH_VARARGS},     {"test_weakref_capi", test_weakref_capi, METH_NOARGS},+    {"bench_none", bench_none, METH_VARARGS},     {NULL, NULL} /* sentinel */ };

Benchmark script:

importpyperfimport_testcapirunner=pyperf.Runner()runner.bench_time_func('bench',_testcapi.bench_none)

Results of themicrobenchmark with CPU isolation on Linux. Python built with -O3 without PGO nor LTO. PGO/LTO doesn't matter here, since _testcapi is built as a shared library: Py_None becomes a regular function call, there is nothing to optimize.

$ python3 -m pyperf  compare_to ref.json func.json Mean +- std dev: [ref] 25.8 ns +- 0.0 ns -> [func] 56.3 ns +- 0.1 ns: 2.18x slower

ConvertingPy_None constant to a function call makes accessing this variable around 2x slower. Well, that's not surprising, and that's the maximum overhead on a microbenchmark.

@vstinner
Copy link
MemberAuthor

vstinner commentedMar 11, 2024
edited
Loading

I also ran pyperformance 1.11.0, this time with PGO+LTO and CPU isolation on Linux. My shell script run to run pyperformance:

set -e -xcd maingit clean -fdxrm -rf /opt/bench_python./configure --enable-optimizations --with-lto --prefix /opt/bench_pythonmakemake installcd ../opt/bench_python/bin/python3 -m pip install pyperformance/opt/bench_python/bin/python3 -m pyperformance run -o bench.json

Results.

Benchmarks with tag 'apps':===========================+----------------+----------+------------------------+| Benchmark      | ref      | func                   |+================+==========+========================+| 2to3           | 429 ms   | 434 ms: 1.01x slower   |+----------------+----------+------------------------+| chameleon      | 11.0 ms  | 11.0 ms: 1.01x slower  |+----------------+----------+------------------------+| docutils       | 4.24 sec | 4.23 sec: 1.00x faster |+----------------+----------+------------------------+| tornado_http   | 194 ms   | 195 ms: 1.01x slower   |+----------------+----------+------------------------+| Geometric mean | (ref)    | 1.01x slower           |+----------------+----------+------------------------+Benchmark hidden because not significant (1): html5libBenchmarks with tag 'asyncio':==============================+-------------------------------+----------+------------------------+| Benchmark                     | ref      | func                   |+===============================+==========+========================+| async_tree_io_tg              | 1.61 sec | 1.62 sec: 1.01x slower |+-------------------------------+----------+------------------------+| async_tree_eager_cpu_io_mixed | 669 ms   | 674 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| async_tree_io                 | 1.60 sec | 1.62 sec: 1.01x slower |+-------------------------------+----------+------------------------+| async_tree_eager_io_tg        | 1.67 sec | 1.69 sec: 1.01x slower |+-------------------------------+----------+------------------------+| async_tree_eager_tg           | 124 ms   | 125 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| async_tree_none_tg            | 655 ms   | 665 ms: 1.02x slower   |+-------------------------------+----------+------------------------+| async_tree_memoization_tg     | 797 ms   | 811 ms: 1.02x slower   |+-------------------------------+----------+------------------------+| Geometric mean                | (ref)    | 1.01x slower           |+-------------------------------+----------+------------------------+Benchmark hidden because not significant (9): async_tree_eager_cpu_io_mixed_tg, async_tree_eager, async_tree_eager_memoization_tg, async_tree_eager_memoization, async_tree_cpu_io_mixed, async_tree_cpu_io_mixed_tg, async_tree_eager_io, async_tree_memoization, async_tree_noneBenchmarks with tag 'math':===========================+----------------+--------+----------------------+| Benchmark      | ref    | func                 |+================+========+======================+| nbody          | 132 ms | 128 ms: 1.03x faster |+----------------+--------+----------------------+| float          | 125 ms | 126 ms: 1.01x slower |+----------------+--------+----------------------+| Geometric mean | (ref)  | 1.01x faster         |+----------------+--------+----------------------+Benchmark hidden because not significant (1): pidigitsBenchmarks with tag 'regex':============================+----------------+---------+-----------------------+| Benchmark      | ref     | func                  |+================+=========+=======================+| regex_v8       | 37.5 ms | 37.1 ms: 1.01x faster |+----------------+---------+-----------------------+| Geometric mean | (ref)   | 1.00x faster          |+----------------+---------+-----------------------+Benchmark hidden because not significant (3): regex_effbot, regex_compile, regex_dnaBenchmarks with tag 'serialize':================================+-------------------+---------+-----------------------+| Benchmark         | ref     | func                  |+===================+=========+=======================+| unpickle_list     | 7.63 us | 7.53 us: 1.01x faster |+-------------------+---------+-----------------------+| xml_etree_process | 97.8 ms | 98.6 ms: 1.01x slower |+-------------------+---------+-----------------------+| json_loads        | 40.8 us | 41.3 us: 1.01x slower |+-------------------+---------+-----------------------+| xml_etree_parse   | 238 ms  | 241 ms: 1.01x slower  |+-------------------+---------+-----------------------+| json_dumps        | 15.5 ms | 16.5 ms: 1.06x slower |+-------------------+---------+-----------------------+| pickle            | 17.6 us | 19.7 us: 1.12x slower |+-------------------+---------+-----------------------+| pickle_dict       | 43.1 us | 50.2 us: 1.17x slower |+-------------------+---------+-----------------------+| pickle_list       | 6.64 us | 7.87 us: 1.19x slower |+-------------------+---------+-----------------------+| Geometric mean    | (ref)   | 1.04x slower          |+-------------------+---------+-----------------------+Benchmark hidden because not significant (6): unpickle, tomli_loads, unpickle_pure_python, xml_etree_generate, pickle_pure_python, xml_etree_iterparseBenchmarks with tag 'startup':==============================+------------------------+---------+-----------------------+| Benchmark              | ref     | func                  |+========================+=========+=======================+| python_startup         | 15.4 ms | 15.5 ms: 1.00x slower |+------------------------+---------+-----------------------+| python_startup_no_site | 13.6 ms | 13.6 ms: 1.01x slower |+------------------------+---------+-----------------------+| Geometric mean         | (ref)   | 1.00x slower          |+------------------------+---------+-----------------------+Benchmarks with tag 'template':===============================+-----------------+---------+-----------------------+| Benchmark       | ref     | func                  |+=================+=========+=======================+| mako            | 16.1 ms | 16.2 ms: 1.01x slower |+-----------------+---------+-----------------------+| genshi_xml      | 83.8 ms | 84.6 ms: 1.01x slower |+-----------------+---------+-----------------------+| django_template | 56.5 ms | 57.4 ms: 1.01x slower |+-----------------+---------+-----------------------+| genshi_text     | 35.6 ms | 36.7 ms: 1.03x slower |+-----------------+---------+-----------------------+| Geometric mean  | (ref)   | 1.02x slower          |+-----------------+---------+-----------------------+All benchmarks:===============+-------------------------------+----------+------------------------+| Benchmark                     | ref      | func                   |+===============================+==========+========================+| coverage                      | 499 ms   | 152 ms: 3.29x faster   |+-------------------------------+----------+------------------------+| dask                          | 885 ms   | 628 ms: 1.41x faster   |+-------------------------------+----------+------------------------+| logging_silent                | 166 ns   | 155 ns: 1.07x faster   |+-------------------------------+----------+------------------------+| nbody                         | 132 ms   | 128 ms: 1.03x faster   |+-------------------------------+----------+------------------------+| telco                         | 12.6 ms  | 12.3 ms: 1.03x faster  |+-------------------------------+----------+------------------------+| scimark_fft                   | 522 ms   | 511 ms: 1.02x faster   |+-------------------------------+----------+------------------------+| unpickle_list                 | 7.63 us  | 7.53 us: 1.01x faster  |+-------------------------------+----------+------------------------+| regex_v8                      | 37.5 ms  | 37.1 ms: 1.01x faster  |+-------------------------------+----------+------------------------+| pprint_pformat                | 2.45 sec | 2.42 sec: 1.01x faster |+-------------------------------+----------+------------------------+| logging_simple                | 9.51 us  | 9.43 us: 1.01x faster  |+-------------------------------+----------+------------------------+| pprint_safe_repr              | 1.20 sec | 1.19 sec: 1.01x faster |+-------------------------------+----------+------------------------+| generators                    | 45.4 ms  | 45.1 ms: 1.01x faster  |+-------------------------------+----------+------------------------+| coroutines                    | 35.4 ms  | 35.2 ms: 1.01x faster  |+-------------------------------+----------+------------------------+| docutils                      | 4.24 sec | 4.23 sec: 1.00x faster |+-------------------------------+----------+------------------------+| sympy_integrate               | 32.3 ms  | 32.4 ms: 1.00x slower  |+-------------------------------+----------+------------------------+| python_startup                | 15.4 ms  | 15.5 ms: 1.00x slower  |+-------------------------------+----------+------------------------+| sympy_sum                     | 254 ms   | 255 ms: 1.00x slower   |+-------------------------------+----------+------------------------+| dulwich_log                   | 129 ms   | 129 ms: 1.00x slower   |+-------------------------------+----------+------------------------+| sqlglot_optimize              | 86.5 ms  | 86.9 ms: 1.00x slower  |+-------------------------------+----------+------------------------+| sqlglot_normalize             | 172 ms   | 173 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| richards_super                | 83.0 ms  | 83.4 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| python_startup_no_site        | 13.6 ms  | 13.6 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| chameleon                     | 11.0 ms  | 11.0 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| asyncio_tcp_ssl               | 2.36 sec | 2.37 sec: 1.01x slower |+-------------------------------+----------+------------------------+| asyncio_tcp                   | 689 ms   | 694 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| async_tree_io_tg              | 1.61 sec | 1.62 sec: 1.01x slower |+-------------------------------+----------+------------------------+| mako                          | 16.1 ms  | 16.2 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| scimark_monte_carlo           | 99.3 ms  | 100.0 ms: 1.01x slower |+-------------------------------+----------+------------------------+| async_tree_eager_cpu_io_mixed | 669 ms   | 674 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| float                         | 125 ms   | 126 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| xml_etree_process             | 97.8 ms  | 98.6 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| deepcopy                      | 553 us   | 557 us: 1.01x slower   |+-------------------------------+----------+------------------------+| async_generators              | 686 ms   | 692 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| tornado_http                  | 194 ms   | 195 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| mdp                           | 3.95 sec | 3.99 sec: 1.01x slower |+-------------------------------+----------+------------------------+| genshi_xml                    | 83.8 ms  | 84.6 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| go                            | 222 ms   | 224 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| async_tree_io                 | 1.60 sec | 1.62 sec: 1.01x slower |+-------------------------------+----------+------------------------+| async_tree_eager_io_tg        | 1.67 sec | 1.69 sec: 1.01x slower |+-------------------------------+----------+------------------------+| async_tree_eager_tg           | 124 ms   | 125 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| nqueens                       | 121 ms   | 122 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| hexiom                        | 8.78 ms  | 8.87 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| sympy_expand                  | 751 ms   | 759 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| 2to3                          | 429 ms   | 434 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| json_loads                    | 40.8 us  | 41.3 us: 1.01x slower  |+-------------------------------+----------+------------------------+| richards                      | 74.0 ms  | 74.9 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| bench_thread_pool             | 1.42 ms  | 1.43 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| xml_etree_parse               | 238 ms   | 241 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| raytrace                      | 402 ms   | 408 ms: 1.01x slower   |+-------------------------------+----------+------------------------+| django_template               | 56.5 ms  | 57.4 ms: 1.01x slower  |+-------------------------------+----------+------------------------+| deepcopy_reduce               | 4.89 us  | 4.96 us: 1.01x slower  |+-------------------------------+----------+------------------------+| async_tree_none_tg            | 655 ms   | 665 ms: 1.02x slower   |+-------------------------------+----------+------------------------+| crypto_pyaes                  | 103 ms   | 104 ms: 1.02x slower   |+-------------------------------+----------+------------------------+| deepcopy_memo                 | 57.9 us  | 58.8 us: 1.02x slower  |+-------------------------------+----------+------------------------+| sqlglot_transpile             | 2.50 ms  | 2.54 ms: 1.02x slower  |+-------------------------------+----------+------------------------+| sqlglot_parse                 | 1.99 ms  | 2.02 ms: 1.02x slower  |+-------------------------------+----------+------------------------+| async_tree_memoization_tg     | 797 ms   | 811 ms: 1.02x slower   |+-------------------------------+----------+------------------------+| spectral_norm                 | 155 ms   | 158 ms: 1.02x slower   |+-------------------------------+----------+------------------------+| pathlib                       | 26.5 ms  | 27.2 ms: 1.02x slower  |+-------------------------------+----------+------------------------+| scimark_sor                   | 199 ms   | 204 ms: 1.03x slower   |+-------------------------------+----------+------------------------+| genshi_text                   | 35.6 ms  | 36.7 ms: 1.03x slower  |+-------------------------------+----------+------------------------+| sqlite_synth                  | 3.73 us  | 3.90 us: 1.05x slower  |+-------------------------------+----------+------------------------+| typing_runtime_protocols      | 172 us   | 180 us: 1.05x slower   |+-------------------------------+----------+------------------------+| json_dumps                    | 15.5 ms  | 16.5 ms: 1.06x slower  |+-------------------------------+----------+------------------------+| pickle                        | 17.6 us  | 19.7 us: 1.12x slower  |+-------------------------------+----------+------------------------+| pickle_dict                   | 43.1 us  | 50.2 us: 1.17x slower  |+-------------------------------+----------+------------------------+| pickle_list                   | 6.64 us  | 7.87 us: 1.19x slower  |+-------------------------------+----------+------------------------+| Geometric mean                | (ref)    | 1.01x faster           |+-------------------------------+----------+------------------------+Benchmark hidden because not significant (35): unpickle, regex_effbot, gc_traversal, tomli_loads, meteor_contest, scimark_lu, logging_format, comprehensions, async_tree_eager_cpu_io_mixed_tg, regex_compile, async_tree_eager, unpack_sequence, bench_mp_pool, unpickle_pure_python, async_tree_eager_memoization_tg, pyflate, xml_etree_generate, deltablue, asyncio_websockets, fannkuch, pidigits, regex_dna, scimark_sparse_mat_mult, create_gc_cycles, html5lib, async_tree_eager_memoization, pickle_pure_python, xml_etree_iterparse, async_tree_cpu_io_mixed, sympy_str, chaos, async_tree_cpu_io_mixed_tg, async_tree_eager_io, async_tree_memoization, async_tree_none

@vstinnervstinner marked this pull request as ready for reviewMarch 11, 2024 08:40
@vstinnervstinner changed the titleWIP: gh-115754: Get singletons via function callsgh-115754: Get singletons via function callsMar 11, 2024
In the limited C API version 3.13, getting Py_None, Py_False,Py_True, Py_Ellipsis and Py_NotImplemented singletons is nowimplemented as function calls at the stable ABI level to hideimplementation details. Getting these constants still return borrowedreferences.
@vstinnervstinner requested a review froma team as acode ownerMarch 11, 2024 12:11
@vstinner
Copy link
MemberAuthor

@encukou@zooba: Would you mind to review my PR?

@zooba
Copy link
Member

Is that benchmark using the macroPy_IsNone or the exportedPy_IsNone? Do we see a difference between the two? Not that I'm at all concerned by the benchmark - the operation is trivially optimised out of a tight loop. But having the check occur on our side means we can more easily optimise for comparison if we ever change how "get" works.

Did you try with the singlePy_GetSingleton(int) approach as well? I'd expect that to be slower again, but far more forwards/backwards compatible than individual functions. And behind the macro it still looks the same to users.

@vstinner
Copy link
MemberAuthor

Is that benchmark using the macro Py_IsNone or the exported Py_IsNone?

My microbenchmark used the macro.

@vstinner
Copy link
MemberAuthor

Did you try with the single Py_GetSingleton(int) approach as well?

I didn't try. I dislikePy_GetSingleton(int) since we need a slower switch/case, the function can fail (for unknown constant and invalid values), and we need to implement new constants (identifiers for these constants).

@zooba
Copy link
Member

zooba commentedMar 11, 2024
edited
Loading

I think it's best to have limited API functions that do the checks inside of Python, so that callers who have an unknown object would pass it in and we would return a bool saying if it is/isn't that singleton (or in thePy_GetSingleton approach, they could get the singleton id for an object). That allows us to do faster checks, completely avoids the borrowed/stolen reference count issues, and avoids any potential not-really-singleton issues that may arise in the future or in other implementations. (This is similar to_PyUnicode_EqualToASCIIString which is an API that I like.)

Getting another instance of the singleton and running a normal comparison is like converting yourchar * toPyUnicodeObject and doing the comparison. Technically it's correct and more flexible, but it's also easy and faster for us to do the check on our side.


Edit So what I'm saying is, hopefully the code that is calling the new getters is also calling the function comparisons, not the macros. I assume the benchmarks would look the same right now since the comparison is still cheap, though.

@vstinner
Copy link
MemberAuthor

Alternative: PRgh-116883 "Add Py_GetConstantRef() function".

erlend-aasland and zooba reacted with thumbs up emoji

@vstinner
Copy link
MemberAuthor

Alternative: PR#116883 "Add Py_GetConstantRef() function".

Apparently, this approach is preferred. I close this PR.

@vstinnervstinner deleted the getnone branchMarch 18, 2024 21:26
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@encukouencukouAwaiting requested review from encukouencukou is a code owner

@ericsnowcurrentlyericsnowcurrentlyAwaiting requested review from ericsnowcurrently

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

2 participants
@vstinner@zooba

[8]ページ先頭

©2009-2025 Movatter.jp