Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Use freelist for range object, iterator objects and other often used objects #126703

Open
Labels
interpreter-core(Objects, Python, Grammar, and Parser dirs)performancePerformance or resource usagetype-featureA feature request or enhancement
@eendebakpt

Description

@eendebakpt

Feature or enhancement

Proposal:

We can add freelists for therange object and various iter objects to improve performance. Using the new methods from#121934 the amount of code needed for adding a freelist is quite small. A prototype implementation is here:

main...eendebakpt:cpython:iter_freelists
main...eendebakpt:cpython:range_freelist

@markshannon Infaster-cpython/ideas#132 (comment) you noted that freelists should be used for all types. Is adding freelists in a similar style to the already existing freelists a good approach? We could also try to make a more generic construction (for example insidePyObject_New/PyObject_Free), but that would have a much larger impact and I cannot oversee all consequences.

The freelists from the prototype improve performance ofrange in microbenchmarks:

range(1): Mean +- std dev: [rp_main] 50.0 ns +- 0.4 ns -> [rp_pr] 40.6 ns +- 0.5 ns: 1.23x fasteriter(range(1)): Mean +- std dev: [rp_main] 91.7 ns +- 2.4 ns -> [rp_pr] 68.5 ns +- 0.7 ns: 1.34x fasterlist(range(1)): Mean +- std dev: [rp_main] 165 ns +- 1 ns -> [rp_pr] 144 ns +- 1 ns: 1.15x fasterlist(iter(range(1))): Mean +- std dev: [rp_main] 240 ns +- 17 ns -> [rp_pr] 229 ns +- 13 ns: 1.05x fasterrange(2, 1): Mean +- std dev: [rp_main] 61.6 ns +- 0.6 ns -> [rp_pr] 46.7 ns +- 0.4 ns: 1.32x fasteriter(range(2, 1)): Mean +- std dev: [rp_main] 100 ns +- 4 ns -> [rp_pr] 79.6 ns +- 4.0 ns: 1.26x fasterlist(iter(range(2, 1))): Mean +- std dev: [rp_main] 227 ns +- 3 ns -> [rp_pr] 209 ns +- 4 ns: 1.08x fasterfor loop length 1: Mean +- std dev: [rp_main] 146 ns +- 4 ns -> [rp_pr] 126 ns +- 6 ns: 1.16x fasterrange(10): Mean +- std dev: [rp_main] 51.7 ns +- 2.4 ns -> [rp_pr] 41.0 ns +- 0.9 ns: 1.26x fasteriter(range(10)): Mean +- std dev: [rp_main] 92.7 ns +- 3.5 ns -> [rp_pr] 68.3 ns +- 2.2 ns: 1.36x fasterlist(range(10)): Mean +- std dev: [rp_main] 205 ns +- 4 ns -> [rp_pr] 184 ns +- 9 ns: 1.11x fasterlist(iter(range(10))): Mean +- std dev: [rp_main] 279 ns +- 7 ns -> [rp_pr] 261 ns +- 5 ns: 1.07x fasterrange(2, 10): Mean +- std dev: [rp_main] 57.7 ns +- 1.5 ns -> [rp_pr] 48.5 ns +- 2.2 ns: 1.19x fasteriter(range(2, 10)): Mean +- std dev: [rp_main] 101 ns +- 3 ns -> [rp_pr] 78.1 ns +- 1.1 ns: 1.29x fasterlist(iter(range(2, 10))): Mean +- std dev: [rp_main] 284 ns +- 17 ns -> [rp_pr] 262 ns +- 10 ns: 1.08x fasterfor loop length 10: Mean +- std dev: [rp_main] 329 ns +- 7 ns -> [rp_pr] 295 ns +- 5 ns: 1.12x fasterrange(100): Mean +- std dev: [rp_main] 52.3 ns +- 2.2 ns -> [rp_pr] 41.1 ns +- 1.3 ns: 1.27x fasteriter(range(100)): Mean +- std dev: [rp_main] 91.7 ns +- 3.5 ns -> [rp_pr] 69.6 ns +- 2.1 ns: 1.32x fasterlist(range(100)): Mean +- std dev: [rp_main] 653 ns +- 27 ns -> [rp_pr] 603 ns +- 29 ns: 1.08x fasterlist(iter(range(100))): Mean +- std dev: [rp_main] 701 ns +- 17 ns -> [rp_pr] 671 ns +- 17 ns: 1.04x fasterrange(2, 100): Mean +- std dev: [rp_main] 58.5 ns +- 2.8 ns -> [rp_pr] 48.1 ns +- 2.3 ns: 1.22x fasteriter(range(2, 100)): Mean +- std dev: [rp_main] 99.6 ns +- 1.5 ns -> [rp_pr] 78.4 ns +- 0.8 ns: 1.27x fasterlist(iter(range(2, 100))): Mean +- std dev: [rp_main] 710 ns +- 28 ns -> [rp_pr] 674 ns +- 16 ns: 1.05x fasterfor loop length 100: Mean +- std dev: [rp_main] 2.06 us +- 0.07 us -> [rp_pr] 1.98 us +- 0.11 us: 1.04x fasterrange(400): Mean +- std dev: [rp_main] 71.3 ns +- 2.2 ns -> [rp_pr] 61.7 ns +- 5.1 ns: 1.16x fasteriter(range(400)): Mean +- std dev: [rp_main] 112 ns +- 5 ns -> [rp_pr] 89.5 ns +- 2.8 ns: 1.25x fasterlist(iter(range(400))): Mean +- std dev: [rp_main] 3.88 us +- 0.18 us -> [rp_pr] 3.77 us +- 0.08 us: 1.03x fasterrange(2, 400): Mean +- std dev: [rp_main] 79.3 ns +- 1.7 ns -> [rp_pr] 66.0 ns +- 3.7 ns: 1.20x fasteriter(range(2, 400)): Mean +- std dev: [rp_main] 119 ns +- 2 ns -> [rp_pr] 99.5 ns +- 3.1 ns: 1.19x fasterfor loop length 400: Mean +- std dev: [rp_main] 12.1 us +- 0.7 us -> [rp_pr] 10.9 us +- 0.4 us: 1.11x fasterBenchmark hidden because not significant (2): list(range(400)), list(iter(range(2, 400)))Geometric mean: 1.16x faster
Benchmark script
import pyperfrunner = pyperf.Runner()loop = """def g(n):    x=0    for ii in range(n):        x += 1"""        for s in [1, 10, 100, 400]:time = runner.timeit(name=f'range({s})', stmt=f"range({s})")time = runner.timeit(name=f'iter(range({s}))', stmt=f"iter(range({s}))")time = runner.timeit(name=f'list(range({s}))', stmt=f"list(range({s}))")time = runner.timeit(name=f'list(iter(range({s})))', stmt=f"list(iter(range({s})))")time = runner.timeit(name=f'range(2, {s})', stmt=f"range(2, {s})")time = runner.timeit(name=f'iter(range(2, {s}))', stmt=f"iter(range(2, {s}))")time = runner.timeit(name=f'list(iter(range(2, {s})))', stmt=f"list(iter(range(2, {s})))")time = runner.timeit(name=f'for loop length {s}', stmt=f"g({s})", setup=loop)``</details>On the pyperformance test suite (actually, a subset of the suite, not all benchmarks run on my system) shows the percentage of successfull freelist allocations increases about 1%.Main:

Allocations from freelist 2,004,971,371 39.8%
Frees to freelist 2,005,350,418
Allocations 3,034,877,938 60.2%
Allocations to 512 bytes 3,008,791,812 59.7%
Allocations to 4 kbytes 18,648,072 0.4%
Allocations over 4 kbytes 7,438,054 0.1%
Frees 3,142,033,922

PR

Allocations from freelist 2,064,126,652 40.8%
Frees to freelist 2,064,499,538
Allocations 2,989,239,063 59.2%
Allocations to 512 bytes 2,963,207,194 58.6%
Allocations to 4 kbytes 18,596,157 0.4%
Allocations over 4 kbytes 7,435,712 0.1%
Frees 3,096,349,170

(Some quick testing shows that most of the allocations not via freelists are due to python int objects, so that is another candidate to use freelists for)### Has this already been discussed elsewhere?No response given### Links to previous discussion of this feature:Some references to discussions on freelists:* https://github.com/python/cpython/pull/121934* https://github.com/faster-cpython/ideas/discussions/132* https://github.com/python/cpython/issues/91912* https://github.com/python/cpython/pull/101453<!-- gh-linked-prs -->### Linked PRs* gh-128368* gh-128592* gh-128594* gh-128619* gh-128692* gh-129638* gh-129921* gh-132319* gh-135233<!-- /gh-linked-prs -->

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)performancePerformance or resource usagetype-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions


      [8]ページ先頭

      ©2009-2025 Movatter.jp