Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-126868: Add freelist for compact int objects#126865

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
markshannon merged 25 commits intopython:mainfromeendebakpt:int_freelist
Dec 13, 2024

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpteendebakpt commentedNov 15, 2024
edited
Loading

We can add freelists for the int object to improve performance. Using the new methods from#121934 the amount of code needed for adding a freelist is quite small. We only implement the freelist for compact ints (e.g. a single digit). For multi-digit int objects adding freelists is more complex (we need a size-based freelist) and the gains are smaller (for very large int objects the allocation is not a significant part of the computation time)

Notes:

  • The freelist size was chosen to be 100 (equal to the freelist size of float), but perhaps this can be tuned better
  • Thelong_dealloc and_PyLong_ExactDealloc are almost identical, we could keep justlong_dealloc at the cost of a tiny bit of performance.

Some references to discussions on freelists

The freelist improves performance ofint operations in microbenchmarks:

bench_long: Mean +- std dev: [main_long] 106 ns +- 5 ns -> [pr_long1c] 99.8 ns +- 4.4 ns: 1.07x fasterbench_alloc: Mean +- std dev: [main_long] 210 us +- 6 us -> [pr_long1c] 177 us +- 10 us: 1.19x fasterBenchmark hidden because not significant (1): bench_collatzGeometric mean: 1.08x faster
Benchmark script
# Quick benchmark for cpython long objectsimport pyperfdef collatz(a):    while a > 1:        if a % 2 == 0:            a = a // 2        else:            a = 3 * a + 1def bench_collatz(loops):    range_it = range(loops)    t0 = pyperf.perf_counter()    for ii in range_it:        collatz(ii)    return pyperf.perf_counter() - t0def bench_long(loops):    range_it = range(loops)    t0 = pyperf.perf_counter()    x = 10    for ii in range_it:        x = x * x        y = x // 2        x = y + ii + x        if x > 10**10:            x = x % 1000    return pyperf.perf_counter() - t0def bench_alloc(loops):    range_it = range(loops)    t0 = pyperf.perf_counter()    for ii in range_it:        for kk in range(20_000):            del kk    return pyperf.perf_counter() - t0# %timeit bench_long(1000)if __name__ == "__main__":    runner = pyperf.Runner()    runner.bench_time_func("bench_collatz", bench_collatz)    runner.bench_time_func("bench_long", bench_long)    runner.bench_time_func("bench_alloc", bench_alloc)

On the pyperformance test suite (actually, a subset of the suite, not all benchmarks run on my system) shows the percentage of successfull freelist allocations increases significantly

Main:

Allocations from freelist 2,004,971,371 39.8%Frees to freelist 2,005,350,418 Allocations 3,034,877,938 60.2%Allocations to 512 bytes 3,008,791,812 59.7%Allocations to 4 kbytes 18,648,072 0.4%Allocations over 4 kbytes 7,438,054 0.1%Frees 3,142,033,922

PR

Allocations from freelist 3,058,347,887 58.6%Frees to freelist 3,058,576,117 Allocations 2,159,771,546 41.4%Allocations to 512 bytes 2,133,373,693 40.9%Allocations to 4 kbytes 18,802,328 0.4%Allocations over 4 kbytes 7,595,525 0.1%Frees 2,267,538,686

@eendebakpteendebakpt changed the titleDraft: Add freelist of compact int objectsDraft: gh-126868: Add freelist for compact int objectsNov 15, 2024
@eendebakpteendebakpt marked this pull request as draftNovember 15, 2024 12:50
@mdboom
Copy link
Contributor

I'm running this PR over pyperformance on our benchmarking hardware. It will take ~3 hours.

@mdboom
Copy link
Contributor

I'm running this PR over pyperformance on our benchmarking hardware. It will take ~3 hours.

Actually, scratch that -- I'll wait until the tests are passing here. That's required for PGO builds.

@eendebakpt
Copy link
ContributorAuthor

Tests are passing now, but I disabled returning objects to the freelist at a couple of places. ChangingPyObject_Free to_PyLong_ExactDealloc on the lines of code marked with "needs to be converted to freelist" introduces refleaks. It seems related to#125323.@fidgetSpinner Do you perhaps have an idea what is going on?

@markshannon
Copy link
Member

What happens exactly when you change these lines:

PyStackRef_CLOSE_SPECIALIZED(left, (destructor)PyObject_Free);// needs to be converted to freelist

to

PyStackRef_CLOSE_SPECIALIZED(left, (destructor)_PyLong_ExactDealloc);

?

@eendebakpt
Copy link
ContributorAuthor

@markshannon When I change the lines several tests related to refleaks in the CI are failing. For exampletest_no_memleak, which can be reproduced from the command line with:

python -Xshowrefcount -c  "[x+x*x for x in range(1000)]"

The output should be[0 refs, 0 blocks], but instead I get[982 refs, 0 blocks] (exact numbers depending on which lines I change and the exact code executed).

Copy link
Member

@Fidget-SpinnerFidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The "leaks" are due to the freelist itself. They are a sign the freelist is working :).
Can you please convert all to _PyLong_ExactDealloc and apply the suggestion below, and tell me how many allocations this removes?

@Fidget-Spinner
Copy link
Member

@eendebakpt sorry I pushed to your branch as I'm really eager to get benchmark results on this :).

eendebakpt reacted with thumbs up emoji

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commentedNov 20, 2024
edited
Loading

On my machine using the benchmark script provided above (release build, no PGO, no LTO):

bench_collatz: Mean +- std dev: [without_freelist] 9.41 us +- 0.08 us -> [with_freelist] 9.22 us +- 0.08 us: 1.02x fasterbench_long: Mean +- std dev: [without_freelist] 187 ns +- 1 ns -> [with_freelist] 164 ns +- 2 ns: 1.14x fasterbench_alloc: Mean +- std dev: [without_freelist] 411 us +- 4 us -> [with_freelist] 333 us +- 2 us: 1.24x fasterGeometric mean: 1.13x faster

}

if (PyLong_CheckExact(self)) {
if (_PyLong_IsCompact((PyLongObject*)self)) {
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@Fidget-Spinner The_PyLong_IsCompact check has already been done in this method, can we move this into theif (pylong && _PyLong_IsCompact(pylong)) part?

Also, can we remove thepylong && part? I thinkpylong can never be NULL. (PyLong_CheckExact assumes the pointer is not NULL I think)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ok that sounds good

Copy link
Member

@picnixzpicnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Out of curiosity, are compact integers allowed to be signed or not? (you mentioned that we only focus on single digit numbers but the C API considers compact objects as being an implementation detail IIRC). If not, how hard would it be to make them support free lists? (I don't have much knowledge in free lists; for instance it's a mystery to me for how the free list grows)

#include"pycore_pyerrors.h"// _PyErr_GetRaisedException()
#include"pycore_pystate.h"// _PyInterpreterState_GET()
#include"pycore_range.h"// _PyRangeIterObject
#include"pycore_long.h"// void _PyLong_ExactDealloc(PyLongObject *op);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we align the // (on mobile it seems 1 char off)?

0,/* tp_alloc */
long_new,/* tp_new */
PyObject_Free,/* tp_free */
(freefunc)PyObject_Free,/* tp_free */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we align the .tp_free comment? (maybe tabs and spaces are mixed, hence that's why I see them unaligned on monile). Also, is the cast necessary to avoid UBSan failures? If not, you can just use.tp_free = ... to emphasize the semantic

Comment on lines 3641 to 3643
* we accidentally decref small Ints out of existence. Instead,
* since small Ints are immortal, re-set the reference count.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
*weaccidentallydecrefsmallIntsoutofexistence.Instead,
*sincesmallIntsareimmortal,re-setthereferencecount.
*/
*weaccidentallydecrefsmallIntsoutofexistence.Instead,
*sincesmallIntsareimmortal,re-setthereferencecount.
*/

#include"pycore_bitutils.h"// _Py_popcount32()
#include"pycore_initconfig.h"// _PyStatus_OK()
#include"pycore_call.h"// _PyObject_MakeTpCall
#include"pycore_freelist.h"// _Py_FREELIST_FREE(), _Py_FREELIST_POP()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Comment alignment.


/* other API */

PyAPI_FUNC(void)_PyLong_ExactDealloc(PyObject*self);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Do you need it to be exported like that or could you live with a simpleextern? If not, can you explain which file needs this export?

Copy link
Member

@Fidget-SpinnerFidget-SpinnerNov 21, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's needed by the JIT on Windows, as it's used in bytecodes.c. This is a pretty common pattern in the internal C API unfortunately

picnixz reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

For semantic purposes, would it be better to have a macro named differently? (it would be a simple alias but it could help semantics and reviewers)

{
assert(PyLong_CheckExact(op));
_Py_DECREF_SPECIALIZED((PyObject*)op, (destructor)PyObject_Free);
_Py_DECREF_SPECIALIZED((PyObject*)op, (destructor)_PyLong_ExactDealloc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
_Py_DECREF_SPECIALIZED((PyObject*)op, (destructor)_PyLong_ExactDealloc);
_Py_DECREF_SPECIALIZED((PyObject*)op, (destructor)_PyLong_ExactDealloc);

I'm also not sure whether you need the cast. If you need the cast, I'd suggest changing the signature of the drstructor itself.

@bedevere-app
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@eendebakpt
Copy link
ContributorAuthor

The generated files need to be regenerated, and I'd like to rerun the benchmarks to confirm the additional small int checks don't impact performance too much.

I assume you mean the small int checks in_PyLong_ExactDealloc? In current main these checks are not present (PyObject_Free is called directly), so we could argue they are not needed in_PyLong_ExactDealloc either. On the other hand, if I understand things well to be correct they should be added.

@eendebakpt
Copy link
ContributorAuthor

I have made the requested changes; please review again

@bedevere-app
Copy link

Thanks for making the requested changes!

@markshannon: please review the changes made to this pull request.

@markshannon
Copy link
Member

0.6% speedup

Copy link
Member

@markshannonmarkshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for doing this. The code looks good now.

The speedup may not be as large as initially claimed due to addition checks for small integers, but it is a real speedup and should be further improved by#127620

@markshannon
Copy link
Member

It seems likely the thread sanitizer failure is not caused by this PR, but this PR does expose it.
Hopefully it will be fixed soon, but we'll need to wait a bit before merging this PR.

@colesbury
Copy link
Contributor

TSan failure is addressed in#127880.

@colesbury
Copy link
Contributor

The TSan failure is fixed in main now, if you want to merge main back into the PR.

eendebakpt reacted with thumbs up emoji

@markshannonmarkshannon merged commit5fc6bb2 intopython:mainDec 13, 2024
57 checks passed
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure⚠️⚠️⚠️

Hi! The buildbotiOS ARM64 Simulator 3.x has failed when building commit5fc6bb2.

What do you need to do:

  1. Don't panic.
  2. Checkthe buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1380/builds/2109) and take a look at the build logs.
  4. Check if the failure is related to this commit (5fc6bb2) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1380/builds/2109

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@picnixzpicnixzpicnixz left review comments

@Fidget-SpinnerFidget-SpinnerFidget-Spinner left review comments

@markshannonmarkshannonmarkshannon approved these changes

@ericsnowcurrentlyericsnowcurrentlyAwaiting requested review from ericsnowcurrentlyericsnowcurrently is a code owner

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

7 participants

@eendebakpt@mdboom@markshannon@Fidget-Spinner@colesbury@bedevere-bot@picnixz

[8]ページ先頭

©2009-2025 Movatter.jp