Support PGO for clang-cl #130090

Closed
Labels: OS-windows; build (The build process and cross-build); type-feature (A feature request or enhancement)
@chris-eibl

Description

Feature or enhancement

Proposal:

Support PGO (profile-guided optimization) for clang-cl on Windows, using an approach similar to the one the Linux makefiles already use for clang.
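
For orientation, here is a minimal sketch of the instrumentation-based PGO cycle that clang supports: instrument, run a training workload, merge the raw profiles, then rebuild using the merged profile. The helper name `pgo_steps`, the file names, and the `/O2 /c` options are assumptions made for illustration; this is not the build logic from the linked PR, and linking (including the profile runtime) is left out.

```python
def pgo_steps(sources, raw_profiles, merged_profile="code.profdata"):
    """Build the command lines for one PGO cycle (hypothetical helper).

    Nothing is executed here; the commands are only assembled so the
    three phases can be inspected.
    """
    # Phase 1: compile with instrumentation. Running the resulting binary
    # writes raw profiles (default.profraw unless LLVM_PROFILE_FILE is set).
    instrument = ["clang-cl", "/c", "/O2", "-fprofile-instr-generate", *sources]
    # Phase 2: after the training run, merge the raw profiles into one file.
    merge = ["llvm-profdata", "merge", f"-output={merged_profile}", *raw_profiles]
    # Phase 3: recompile, letting the optimizer consume the merged profile.
    optimize = ["clang-cl", "/c", "/O2",
                f"-fprofile-instr-use={merged_profile}", *sources]
    return {"instrument": instrument, "merge": merge, "optimize": optimize}


if __name__ == "__main__":
    for phase, cmd in pgo_steps(["example.c"], ["default.profraw"]).items():
        print(f"{phase}: {' '.join(cmd)}")
```

The training run between phases 1 and 2 is whatever workload should shape the optimization; it is not modeled here.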

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

Discussion started in PR #129907 while it was still a draft.

Linked PRs

64-bit pyperformance results on my Windows 10 PC (dusty i5-4570 CPU), run with --fast --affinity 0 for commit 9db1a29 with

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| Geometric mean | (ref) | 1.27x faster | 1.28x faster | 1.47x faster |

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 |
|---|---|---|
| Geometric mean | (ref) | 1.27x faster |
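
A rough sketch of how such a run and comparison could be scripted, assuming the `pyperformance` and `pyperf` command-line tools are installed. The helper names and the file names in the usage section are hypothetical; only `--fast` and `--affinity 0` come from the description above, and the exact comparison options used for the tables are not stated in the issue.

```python
def run_command(python_exe, result_json):
    """Command line for one pyperformance run against a given interpreter."""
    return ["pyperformance", "run", "--fast", "--affinity", "0",
            "--python", python_exe, "--output", result_json]


def compare_command(reference_json, *changed_json):
    """Command line that tabulates results relative to the first file."""
    return ["python", "-m", "pyperf", "compare_to",
            reference_json, *changed_json, "--table"]


if __name__ == "__main__":
    # Hypothetical interpreter path and result files, for illustration only.
    print(" ".join(run_command("python.exe", "clang.pgo.json")))
    print(" ".join(compare_command("msvc.release.json", "clang.pgo.json")))
```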

clang 18.1.8 is faster than 19.1.1, and 20.1.0.rc2 with tailcalling is the fastest:

| Benchmark | msvc.pgo.9db1a297d9 | clang.pgo.18.1.8.9db1a297d9 | clang.pgo.9db1a297d9 | clang.pgo.tc.20.1.0.rc2.9db1a297d9 |
|---|---|---|---|---|
| Geometric mean | (ref) | 1.19x faster | 1.15x faster | 1.25x faster |
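
The two summaries use different reference builds: the first is relative to msvc.release, this one to msvc.pgo. As a plausibility check, a speedup can be re-based by dividing it by the new reference's speedup. The sketch below does this with the rounded figures quoted above; the function name is an assumption, and the result is only approximate because the inputs are already rounded.

```python
def rebase_speedup(speedup_vs_old_ref, new_ref_speedup_vs_old_ref):
    """Express a speedup relative to a different reference build.

    Both arguments are speedups measured against the same old reference.
    """
    return speedup_vs_old_ref / new_ref_speedup_vs_old_ref


if __name__ == "__main__":
    # clang.pgo (1.47x) and msvc.pgo (1.28x), both relative to msvc.release.
    clang_pgo_vs_msvc_pgo = rebase_speedup(1.47, 1.28)
    print(f"clang.pgo vs msvc.pgo: {clang_pgo_vs_msvc_pgo:.2f}x")
```

The result is close to the 1.15x reported for clang.pgo.9db1a297d9 in the table relative to msvc.pgo.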
Details

Benchmarks with tag 'apps':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| 2to3 | 586 ms | 491 ms: 1.19x faster | 462 ms: 1.27x faster | 426 ms: 1.38x faster |
| docutils | 4.27 sec | 3.75 sec: 1.14x faster | 3.50 sec: 1.22x faster | 3.31 sec: 1.29x faster |
| html5lib | 104 ms | 81.6 ms: 1.28x faster | 77.9 ms: 1.34x faster | 74.5 ms: 1.40x faster |
| Geometric mean | (ref) | 1.20x faster | 1.28x faster | 1.35x faster |
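
The "Geometric mean" row can be approximately recomputed from the rows above it: take each benchmark's speedup (reference time divided by the build's time) and form the geometric mean. A small sketch using the displayed 'apps' timings, converted to milliseconds; the variable and function names are assumptions, and because the displayed timings are rounded, the recomputed values can differ from the table in the last digit.

```python
from statistics import geometric_mean

# Displayed timings for the 'apps' tag, in milliseconds.
APPS_MS = {
    "msvc.release":  {"2to3": 586, "docutils": 4270, "html5lib": 104},
    "clang.release": {"2to3": 491, "docutils": 3750, "html5lib": 81.6},
    "msvc.pgo":      {"2to3": 462, "docutils": 3500, "html5lib": 77.9},
    "clang.pgo":     {"2to3": 426, "docutils": 3310, "html5lib": 74.5},
}


def mean_speedup(timings, build, reference="msvc.release"):
    """Geometric mean of per-benchmark speedups of `build` over `reference`."""
    ref = timings[reference]
    return geometric_mean(ref[name] / timings[build][name] for name in ref)


if __name__ == "__main__":
    for build in ("clang.release", "msvc.pgo", "clang.pgo"):
        print(f"{build}: {mean_speedup(APPS_MS, build):.2f}x")
```

A geometric mean is used rather than an arithmetic one so that a 2x speedup and a 2x slowdown cancel out.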

Benchmarks with tag 'asyncio':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| async_tree_none | 511 ms | 383 ms: 1.33x faster | 394 ms: 1.30x faster | 357 ms: 1.43x faster |
| async_tree_cpu_io_mixed | 933 ms | 805 ms: 1.16x faster | 749 ms: 1.25x faster | 697 ms: 1.34x faster |
| async_tree_cpu_io_mixed_tg | 891 ms | 776 ms: 1.15x faster | 716 ms: 1.24x faster | 665 ms: 1.34x faster |
| async_tree_eager | 209 ms | 153 ms: 1.37x faster | 160 ms: 1.31x faster | 133 ms: 1.57x faster |
| async_tree_eager_cpu_io_mixed | 656 ms | 630 ms: 1.04x faster | 567 ms: 1.16x faster | 535 ms: 1.23x faster |
| async_tree_eager_cpu_io_mixed_tg | 830 ms | 741 ms: 1.12x faster | 681 ms: 1.22x faster | 646 ms: 1.28x faster |
| async_tree_eager_io | 1.12 sec | 870 ms: 1.29x faster | 874 ms: 1.28x faster | 817 ms: 1.37x faster |
| async_tree_eager_io_tg | 1.12 sec | 890 ms: 1.26x faster | 898 ms: 1.25x faster | 840 ms: 1.33x faster |
| async_tree_eager_memoization | 393 ms | 304 ms: 1.29x faster | 304 ms: 1.29x faster | 281 ms: 1.40x faster |
| async_tree_eager_memoization_tg | 546 ms | 420 ms: 1.30x faster | 427 ms: 1.28x faster | 397 ms: 1.37x faster |
| async_tree_eager_tg | 408 ms | 312 ms: 1.31x faster | 321 ms: 1.27x faster | 297 ms: 1.38x faster |
| async_tree_io | 1.14 sec | 868 ms: 1.31x faster | 889 ms: 1.28x faster | 824 ms: 1.38x faster |
| async_tree_io_tg | 1.14 sec | 871 ms: 1.31x faster | 877 ms: 1.30x faster | 807 ms: 1.41x faster |
| async_tree_memoization | 649 ms | 493 ms: 1.32x faster | 509 ms: 1.28x faster | 458 ms: 1.42x faster |
| async_tree_memoization_tg | 605 ms | 453 ms: 1.34x faster | 462 ms: 1.31x faster | 425 ms: 1.42x faster |
| async_tree_none_tg | 497 ms | 371 ms: 1.34x faster | 382 ms: 1.30x faster | 352 ms: 1.41x faster |
| Geometric mean | (ref) | 1.26x faster | 1.27x faster | 1.38x faster |

Benchmarks with tag 'math':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| float | 145 ms | 108 ms: 1.35x faster | 116 ms: 1.25x faster | 96.8 ms: 1.50x faster |
| nbody | 203 ms | 155 ms: 1.31x faster | 171 ms: 1.19x faster | 128 ms: 1.58x faster |
| pidigits | 245 ms | 250 ms: 1.02x slower | 250 ms: 1.02x slower | 240 ms: 1.02x faster |
| Geometric mean | (ref) | 1.20x faster | 1.13x faster | 1.34x faster |

Benchmarks with tag 'regex':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| regex_compile | 237 ms | 180 ms: 1.31x faster | 180 ms: 1.31x faster | 157 ms: 1.51x faster |
| regex_dna | 226 ms | 256 ms: 1.14x slower | 210 ms: 1.07x faster | 211 ms: 1.07x faster |
| regex_effbot | 4.05 ms | not significant | 3.66 ms: 1.11x faster | 3.39 ms: 1.20x faster |
| regex_v8 | 38.7 ms | 35.7 ms: 1.08x faster | 33.7 ms: 1.15x faster | 29.8 ms: 1.30x faster |
| Geometric mean | (ref) | 1.06x faster | 1.16x faster | 1.26x faster |

Benchmarks with tag 'serialize':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| json_dumps | 19.6 ms | 16.9 ms: 1.16x faster | 15.0 ms: 1.31x faster | 12.9 ms: 1.52x faster |
| json_loads | 48.1 us | 46.7 us: 1.03x faster | 36.8 us: 1.31x faster | 32.7 us: 1.47x faster |
| pickle | 21.5 us | 17.9 us: 1.20x faster | 19.1 us: 1.13x faster | 15.0 us: 1.44x faster |
| pickle_dict | 46.0 us | 34.3 us: 1.34x faster | 43.2 us: 1.07x faster | 27.6 us: 1.67x faster |
| pickle_list | 8.16 us | 6.19 us: 1.32x faster | 6.89 us: 1.18x faster | 5.05 us: 1.62x faster |
| pickle_pure_python | 672 us | 455 us: 1.48x faster | 463 us: 1.45x faster | 378 us: 1.78x faster |
| tomli_loads | 3.84 sec | 2.79 sec: 1.38x faster | 2.88 sec: 1.33x faster | 2.38 sec: 1.61x faster |
| unpickle | 26.2 us | 24.0 us: 1.09x faster | 19.8 us: 1.32x faster | 17.9 us: 1.46x faster |
| unpickle_list | 7.29 us | 6.03 us: 1.21x faster | 6.87 us: 1.06x faster | 5.38 us: 1.36x faster |
| unpickle_pure_python | 505 us | 321 us: 1.57x faster | 336 us: 1.50x faster | 257 us: 1.96x faster |
| xml_etree_parse | 232 ms | 228 ms: 1.02x faster | 200 ms: 1.16x faster | 210 ms: 1.10x faster |
| xml_etree_iterparse | 185 ms | 160 ms: 1.16x faster | 154 ms: 1.21x faster | 145 ms: 1.27x faster |
| xml_etree_generate | 181 ms | 148 ms: 1.22x faster | 135 ms: 1.35x faster | 119 ms: 1.53x faster |
| xml_etree_process | 128 ms | 100 ms: 1.28x faster | 94.4 ms: 1.36x faster | 82.0 ms: 1.56x faster |
| Geometric mean | (ref) | 1.24x faster | 1.26x faster | 1.51x faster |

Benchmarks with tag 'startup':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| python_startup | 45.4 ms | not significant | 43.1 ms: 1.05x faster | 43.7 ms: 1.04x faster |
| python_startup_no_site | 37.1 ms | not significant | 35.4 ms: 1.05x faster | 35.9 ms: 1.03x faster |
| Geometric mean | (ref) | 1.00x faster | 1.05x faster | 1.04x faster |

Benchmarks with tag 'template':

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| django_template | 75.6 ms | 55.6 ms: 1.36x faster | 52.1 ms: 1.45x faster | 42.1 ms: 1.79x faster |
| genshi_text | 44.5 ms | 31.4 ms: 1.42x faster | 32.5 ms: 1.37x faster | 26.3 ms: 1.69x faster |
| genshi_xml | 102 ms | 74.0 ms: 1.37x faster | 74.6 ms: 1.36x faster | 63.1 ms: 1.61x faster |
| mako | 23.3 ms | 17.7 ms: 1.31x faster | 16.7 ms: 1.39x faster | 14.4 ms: 1.61x faster |
| Geometric mean | (ref) | 1.36x faster | 1.39x faster | 1.67x faster |

All benchmarks:

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| 2to3 | 586 ms | 491 ms: 1.19x faster | 462 ms: 1.27x faster | 426 ms: 1.38x faster |
| async_generators | 696 ms | 565 ms: 1.23x faster | 577 ms: 1.21x faster | 514 ms: 1.35x faster |
| async_tree_none | 511 ms | 383 ms: 1.33x faster | 394 ms: 1.30x faster | 357 ms: 1.43x faster |
| async_tree_cpu_io_mixed | 933 ms | 805 ms: 1.16x faster | 749 ms: 1.25x faster | 697 ms: 1.34x faster |
| async_tree_cpu_io_mixed_tg | 891 ms | 776 ms: 1.15x faster | 716 ms: 1.24x faster | 665 ms: 1.34x faster |
| async_tree_eager | 209 ms | 153 ms: 1.37x faster | 160 ms: 1.31x faster | 133 ms: 1.57x faster |
| async_tree_eager_cpu_io_mixed | 656 ms | 630 ms: 1.04x faster | 567 ms: 1.16x faster | 535 ms: 1.23x faster |
| async_tree_eager_cpu_io_mixed_tg | 830 ms | 741 ms: 1.12x faster | 681 ms: 1.22x faster | 646 ms: 1.28x faster |
| async_tree_eager_io | 1.12 sec | 870 ms: 1.29x faster | 874 ms: 1.28x faster | 817 ms: 1.37x faster |
| async_tree_eager_io_tg | 1.12 sec | 890 ms: 1.26x faster | 898 ms: 1.25x faster | 840 ms: 1.33x faster |
| async_tree_eager_memoization | 393 ms | 304 ms: 1.29x faster | 304 ms: 1.29x faster | 281 ms: 1.40x faster |
| async_tree_eager_memoization_tg | 546 ms | 420 ms: 1.30x faster | 427 ms: 1.28x faster | 397 ms: 1.37x faster |
| async_tree_eager_tg | 408 ms | 312 ms: 1.31x faster | 321 ms: 1.27x faster | 297 ms: 1.38x faster |
| async_tree_io | 1.14 sec | 868 ms: 1.31x faster | 889 ms: 1.28x faster | 824 ms: 1.38x faster |
| async_tree_io_tg | 1.14 sec | 871 ms: 1.31x faster | 877 ms: 1.30x faster | 807 ms: 1.41x faster |
| async_tree_memoization | 649 ms | 493 ms: 1.32x faster | 509 ms: 1.28x faster | 458 ms: 1.42x faster |
| async_tree_memoization_tg | 605 ms | 453 ms: 1.34x faster | 462 ms: 1.31x faster | 425 ms: 1.42x faster |
| async_tree_none_tg | 497 ms | 371 ms: 1.34x faster | 382 ms: 1.30x faster | 352 ms: 1.41x faster |
| asyncio_tcp | 1.64 sec | 1.55 sec: 1.06x faster | 1.48 sec: 1.11x faster | not significant |
| asyncio_websockets | 732 ms | 578 ms: 1.27x faster | 758 ms: 1.04x slower | not significant |
| chaos | 132 ms | 88.6 ms: 1.48x faster | 90.8 ms: 1.45x faster | 74.3 ms: 1.77x faster |
| comprehensions | 34.7 us | 24.5 us: 1.42x faster | 25.2 us: 1.38x faster | 19.2 us: 1.80x faster |
| bench_mp_pool | 213 ms | 196 ms: 1.09x faster | 177 ms: 1.20x faster | 190 ms: 1.12x faster |
| bench_thread_pool | 1.95 ms | 1.74 ms: 1.12x faster | 1.68 ms: 1.16x faster | 1.63 ms: 1.19x faster |
| coroutines | 45.3 ms | 33.9 ms: 1.34x faster | 36.1 ms: 1.25x faster | 26.9 ms: 1.68x faster |
| coverage | 130 ms | 119 ms: 1.09x faster | 120 ms: 1.09x faster | 103 ms: 1.26x faster |
| crypto_pyaes | 147 ms | 109 ms: 1.35x faster | 109 ms: 1.35x faster | 86.3 ms: 1.70x faster |
| deepcopy | 516 us | 391 us: 1.32x faster | 388 us: 1.33x faster | 309 us: 1.67x faster |
| deepcopy_reduce | 5.30 us | 4.19 us: 1.26x faster | 3.95 us: 1.34x faster | 3.23 us: 1.64x faster |
| deepcopy_memo | 67.1 us | 41.6 us: 1.61x faster | 46.8 us: 1.44x faster | 34.8 us: 1.93x faster |
| deltablue | 7.72 ms | 4.52 ms: 1.71x faster | 4.92 ms: 1.57x faster | 3.80 ms: 2.03x faster |
| django_template | 75.6 ms | 55.6 ms: 1.36x faster | 52.1 ms: 1.45x faster | 42.1 ms: 1.79x faster |
| docutils | 4.27 sec | 3.75 sec: 1.14x faster | 3.50 sec: 1.22x faster | 3.31 sec: 1.29x faster |
| dulwich_log | 156 ms | 141 ms: 1.11x faster | 129 ms: 1.20x faster | 131 ms: 1.19x faster |
| fannkuch | 770 ms | 592 ms: 1.30x faster | 637 ms: 1.21x faster | 516 ms: 1.49x faster |
| float | 145 ms | 108 ms: 1.35x faster | 116 ms: 1.25x faster | 96.8 ms: 1.50x faster |
| create_gc_cycles | 1.62 ms | 1.71 ms: 1.05x slower | not significant | 1.71 ms: 1.05x slower |
| gc_traversal | 5.03 ms | not significant | 4.02 ms: 1.25x faster | 5.71 ms: 1.13x slower |
| generators | 65.1 ms | 40.4 ms: 1.61x faster | 44.4 ms: 1.47x faster | 36.0 ms: 1.81x faster |
| genshi_text | 44.5 ms | 31.4 ms: 1.42x faster | 32.5 ms: 1.37x faster | 26.3 ms: 1.69x faster |
| genshi_xml | 102 ms | 74.0 ms: 1.37x faster | 74.6 ms: 1.36x faster | 63.1 ms: 1.61x faster |
| go | 255 ms | 147 ms: 1.73x faster | 170 ms: 1.50x faster | 132 ms: 1.94x faster |
| hexiom | 13.4 ms | 8.49 ms: 1.58x faster | 9.22 ms: 1.46x faster | 7.11 ms: 1.89x faster |
| html5lib | 104 ms | 81.6 ms: 1.28x faster | 77.9 ms: 1.34x faster | 74.5 ms: 1.40x faster |
| json_dumps | 19.6 ms | 16.9 ms: 1.16x faster | 15.0 ms: 1.31x faster | 12.9 ms: 1.52x faster |
| json_loads | 48.1 us | 46.7 us: 1.03x faster | 36.8 us: 1.31x faster | 32.7 us: 1.47x faster |
| logging_format | 21.2 us | 16.4 us: 1.29x faster | 14.7 us: 1.44x faster | 13.6 us: 1.56x faster |
| logging_silent | 213 ns | 143 ns: 1.49x faster | 152 ns: 1.40x faster | 109 ns: 1.95x faster |
| logging_simple | 19.4 us | 14.6 us: 1.33x faster | 13.5 us: 1.44x faster | 12.2 us: 1.60x faster |
| mako | 23.3 ms | 17.7 ms: 1.31x faster | 16.7 ms: 1.39x faster | 14.4 ms: 1.61x faster |
| mdp | 3.99 sec | 4.12 sec: 1.03x slower | 3.76 sec: 1.06x faster | 3.37 sec: 1.18x faster |
| meteor_contest | 175 ms | 133 ms: 1.32x faster | 139 ms: 1.26x faster | 124 ms: 1.41x faster |
| nbody | 203 ms | 155 ms: 1.31x faster | 171 ms: 1.19x faster | 128 ms: 1.58x faster |
| nqueens | 179 ms | 129 ms: 1.38x faster | 131 ms: 1.37x faster | 103 ms: 1.73x faster |
| pathlib | 278 ms | 266 ms: 1.04x faster | 256 ms: 1.09x faster | 262 ms: 1.06x faster |
| pickle | 21.5 us | 17.9 us: 1.20x faster | 19.1 us: 1.13x faster | 15.0 us: 1.44x faster |
| pickle_dict | 46.0 us | 34.3 us: 1.34x faster | 43.2 us: 1.07x faster | 27.6 us: 1.67x faster |
| pickle_list | 8.16 us | 6.19 us: 1.32x faster | 6.89 us: 1.18x faster | 5.05 us: 1.62x faster |
| pickle_pure_python | 672 us | 455 us: 1.48x faster | 463 us: 1.45x faster | 378 us: 1.78x faster |
| pidigits | 245 ms | 250 ms: 1.02x slower | 250 ms: 1.02x slower | 240 ms: 1.02x faster |
| pprint_safe_repr | 1.46 sec | 1.09 sec: 1.34x faster | 1.09 sec: 1.34x faster | 934 ms: 1.57x faster |
| pprint_pformat | 3.00 sec | 2.22 sec: 1.35x faster | 2.23 sec: 1.35x faster | 1.91 sec: 1.57x faster |
| pyflate | 875 ms | 626 ms: 1.40x faster | 668 ms: 1.31x faster | 537 ms: 1.63x faster |
| python_startup | 45.4 ms | not significant | 43.1 ms: 1.05x faster | 43.7 ms: 1.04x faster |
| python_startup_no_site | 37.1 ms | not significant | 35.4 ms: 1.05x faster | 35.9 ms: 1.03x faster |
| raytrace | 587 ms | 385 ms: 1.52x faster | 414 ms: 1.42x faster | 321 ms: 1.83x faster |
| regex_compile | 237 ms | 180 ms: 1.31x faster | 180 ms: 1.31x faster | 157 ms: 1.51x faster |
| regex_dna | 226 ms | 256 ms: 1.14x slower | 210 ms: 1.07x faster | 211 ms: 1.07x faster |
| regex_effbot | 4.05 ms | not significant | 3.66 ms: 1.11x faster | 3.39 ms: 1.20x faster |
| regex_v8 | 38.7 ms | 35.7 ms: 1.08x faster | 33.7 ms: 1.15x faster | 29.8 ms: 1.30x faster |
| richards | 102 ms | 65.3 ms: 1.56x faster | 64.7 ms: 1.58x faster | 49.7 ms: 2.05x faster |
| richards_super | 116 ms | 74.3 ms: 1.57x faster | 74.7 ms: 1.56x faster | 56.2 ms: 2.07x faster |
| scimark_fft | 664 ms | 485 ms: 1.37x faster | 493 ms: 1.35x faster | 358 ms: 1.85x faster |
| scimark_lu | 227 ms | 159 ms: 1.43x faster | 164 ms: 1.39x faster | 132 ms: 1.72x faster |
| scimark_monte_carlo | 138 ms | 91.6 ms: 1.51x faster | 101 ms: 1.37x faster | 74.6 ms: 1.85x faster |
| scimark_sor | 256 ms | 176 ms: 1.46x faster | 195 ms: 1.31x faster | 151 ms: 1.69x faster |
| scimark_sparse_mat_mult | 8.76 ms | 6.31 ms: 1.39x faster | 6.06 ms: 1.45x faster | 5.01 ms: 1.75x faster |
| spectral_norm | 179 ms | 136 ms: 1.32x faster | 151 ms: 1.19x faster | 110 ms: 1.63x faster |
| sqlglot_normalize | 204 ms | 156 ms: 1.31x faster | 151 ms: 1.35x faster | 131 ms: 1.55x faster |
| sqlglot_optimize | 97.0 ms | 77.7 ms: 1.25x faster | 74.5 ms: 1.30x faster | 66.2 ms: 1.47x faster |
| sqlglot_parse | 2.52 ms | 1.72 ms: 1.46x faster | 1.81 ms: 1.39x faster | 1.51 ms: 1.66x faster |
| sqlglot_transpile | 3.02 ms | 2.15 ms: 1.41x faster | 2.21 ms: 1.37x faster | 1.85 ms: 1.63x faster |
| sqlite_synth | 4.08 us | 3.81 us: 1.07x faster | 3.75 us: 1.09x faster | 3.44 us: 1.18x faster |
| sympy_expand | 818 ms | 681 ms: 1.20x faster | 640 ms: 1.28x faster | 578 ms: 1.42x faster |
| sympy_integrate | 33.5 ms | 28.2 ms: 1.19x faster | 27.1 ms: 1.24x faster | 24.2 ms: 1.38x faster |
| sympy_sum | 258 ms | 222 ms: 1.16x faster | 213 ms: 1.21x faster | 199 ms: 1.29x faster |
| sympy_str | 484 ms | 405 ms: 1.20x faster | 383 ms: 1.26x faster | 344 ms: 1.41x faster |
| telco | 13.1 ms | 11.1 ms: 1.17x faster | 10.7 ms: 1.22x faster | 9.37 ms: 1.40x faster |
| tomli_loads | 3.84 sec | 2.79 sec: 1.38x faster | 2.88 sec: 1.33x faster | 2.38 sec: 1.61x faster |
| typing_runtime_protocols | 296 us | 239 us: 1.24x faster | 223 us: 1.32x faster | 193 us: 1.53x faster |
| unpack_sequence | 152 ns | 58.2 ns: 2.61x faster | 84.8 ns: 1.79x faster | 59.3 ns: 2.56x faster |
| unpickle | 26.2 us | 24.0 us: 1.09x faster | 19.8 us: 1.32x faster | 17.9 us: 1.46x faster |
| unpickle_list | 7.29 us | 6.03 us: 1.21x faster | 6.87 us: 1.06x faster | 5.38 us: 1.36x faster |
| unpickle_pure_python | 505 us | 321 us: 1.57x faster | 336 us: 1.50x faster | 257 us: 1.96x faster |
| xml_etree_parse | 232 ms | 228 ms: 1.02x faster | 200 ms: 1.16x faster | 210 ms: 1.10x faster |
| xml_etree_iterparse | 185 ms | 160 ms: 1.16x faster | 154 ms: 1.21x faster | 145 ms: 1.27x faster |
| xml_etree_generate | 181 ms | 148 ms: 1.22x faster | 135 ms: 1.35x faster | 119 ms: 1.53x faster |
| xml_etree_process | 128 ms | 100 ms: 1.28x faster | 94.4 ms: 1.36x faster | 82.0 ms: 1.56x faster |
| Geometric mean | (ref) | 1.27x faster | 1.28x faster | 1.47x faster |

Benchmark hidden because not significant (1): asyncio_tcp_ssl

More benchmarks (including clang-cl 18.1.8, 20.1.0.rc2, computed gotos and tailcall) can be found in https://gist.github.com/chris-eibl/114a42f22563956fdb5cd0335b28c7ae.

Raw data is here: https://gist.github.com/chris-eibl/c73b02762a7c467e9a410a0aa19c7701.
