This started off as a build time analysis (#130090 (comment)), but since I now have the infrastructure, I tried `-flto=thin`, too:
- faster in building: 520.6 vs 651.2 seconds
- is neutral on the pyperformance benchmarks
- would bring us in sync with Linux, because there `CONFIGURE_CFLAGS_NODIST` and `CONFIGURE_LDFLAGS_NOLTO` both use `-flto=thin` when I configure for clang in WSL Ubuntu-24.04.

See also the discussion of why not to use full `-flto` in "Revert to default fullLTO on Clang" (#130048).

| Benchmark | clang.pgo.20.1.0-rc2 | clang.pgo.thin.20.1.0-rc2 |
|---|---|---|
| Geometric mean | (ref) | 1.00x faster |
Detailed pybenchmark results
| Benchmark | clang.pgo.20.1.0-rc2 | clang.pgo.thin.20.1.0-rc2 |
|---|---|---|
| float | 95.0 ms | 89.7 ms: 1.06x faster |
| json_loads | 29.8 us | 28.6 us: 1.04x faster |
| mdp | 2.86 sec | 2.77 sec: 1.03x faster |
| html5lib | 68.3 ms | 66.2 ms: 1.03x faster |
| async_tree_none_tg | 330 ms | 320 ms: 1.03x faster |
| pyflate | 518 ms | 505 ms: 1.03x faster |
| sqlite_synth | 3.21 us | 3.13 us: 1.03x faster |
| pidigits | 228 ms | 223 ms: 1.02x faster |
| bench_mp_pool | 168 ms | 165 ms: 1.02x faster |
| async_tree_eager_io | 742 ms | 727 ms: 1.02x faster |
| generators | 34.5 ms | 33.8 ms: 1.02x faster |
| comprehensions | 18.3 us | 17.9 us: 1.02x faster |
| async_tree_cpu_io_mixed | 641 ms | 629 ms: 1.02x faster |
| scimark_sparse_mat_mult | 4.51 ms | 4.43 ms: 1.02x faster |
| async_tree_memoization | 425 ms | 417 ms: 1.02x faster |
| sympy_expand | 538 ms | 529 ms: 1.02x faster |
| unpack_sequence | 57.0 ns | 56.0 ns: 1.02x faster |
| regex_dna | 209 ms | 205 ms: 1.02x faster |
| async_generators | 465 ms | 458 ms: 1.02x faster |
| scimark_sor | 140 ms | 137 ms: 1.02x faster |
| sympy_str | 319 ms | 314 ms: 1.02x faster |
| async_tree_io_tg | 751 ms | 740 ms: 1.01x faster |
| regex_effbot | 3.14 ms | 3.10 ms: 1.01x faster |
| async_tree_eager_tg | 272 ms | 268 ms: 1.01x faster |
| pickle_dict | 27.3 us | 27.0 us: 1.01x faster |
| async_tree_eager_memoization_tg | 363 ms | 359 ms: 1.01x faster |
| sympy_integrate | 22.5 ms | 22.2 ms: 1.01x faster |
| sympy_sum | 181 ms | 179 ms: 1.01x faster |
| 2to3 | 390 ms | 386 ms: 1.01x faster |
| hexiom | 6.68 ms | 6.61 ms: 1.01x faster |
| docutils | 3.03 sec | 3.00 sec: 1.01x faster |
| sqlglot_normalize | 121 ms | 120 ms: 1.01x faster |
| async_tree_memoization_tg | 392 ms | 389 ms: 1.01x faster |
| async_tree_cpu_io_mixed_tg | 614 ms | 609 ms: 1.01x faster |
| tomli_loads | 2.20 sec | 2.18 sec: 1.01x faster |
| spectral_norm | 102 ms | 101 ms: 1.01x faster |
| python_startup_no_site | 34.4 ms | 34.2 ms: 1.01x faster |
| genshi_text | 24.6 ms | 24.5 ms: 1.01x faster |
| dulwich_log | 119 ms | 118 ms: 1.00x faster |
| go | 128 ms | 128 ms: 1.00x faster |
| deltablue | 3.62 ms | 3.63 ms: 1.00x slower |
| unpickle_pure_python | 247 us | 248 us: 1.00x slower |
| xml_etree_generate | 107 ms | 107 ms: 1.01x slower |
| django_template | 39.2 ms | 39.4 ms: 1.01x slower |
| coroutines | 24.8 ms | 25.0 ms: 1.01x slower |
| mako | 13.3 ms | 13.5 ms: 1.01x slower |
| unpickle | 15.9 us | 16.1 us: 1.01x slower |
| nbody | 119 ms | 121 ms: 1.01x slower |
| fannkuch | 465 ms | 472 ms: 1.01x slower |
| crypto_pyaes | 81.3 ms | 82.6 ms: 1.02x slower |
| json_dumps | 11.5 ms | 11.7 ms: 1.02x slower |
| deepcopy | 285 us | 291 us: 1.02x slower |
| pprint_safe_repr | 858 ms | 876 ms: 1.02x slower |
| xml_etree_iterparse | 136 ms | 139 ms: 1.02x slower |
| gc_traversal | 5.03 ms | 5.14 ms: 1.02x slower |
| meteor_contest | 115 ms | 117 ms: 1.02x slower |
| deepcopy_memo | 33.8 us | 34.7 us: 1.03x slower |
| richards_super | 51.1 ms | 52.6 ms: 1.03x slower |
| scimark_fft | 327 ms | 337 ms: 1.03x slower |
| richards | 44.9 ms | 46.3 ms: 1.03x slower |
| pickle_list | 4.83 us | 4.99 us: 1.03x slower |
| deepcopy_reduce | 2.93 us | 3.03 us: 1.03x slower |
| pprint_pformat | 1.74 sec | 1.80 sec: 1.03x slower |
| logging_simple | 10.9 us | 11.4 us: 1.05x slower |
| logging_format | 12.1 us | 12.6 us: 1.05x slower |
| xml_etree_parse | 197 ms | 208 ms: 1.05x slower |
| Geometric mean | (ref) | 1.00x faster |
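For readers wondering how a summary row of "1.00x" can come out of individual results ranging from 1.06x faster to 1.05x slower, here is a minimal sketch of a geometric mean over per-benchmark time ratios. It uses only a handful of rows copied from the table, so its result is illustrative and is not the full-suite figure. The helper name `geometric_mean_ratio` is hypothetical, and pyperf's own computation may differ in detail.

```python
import math

# (reference, thin) timings for a few rows of the table; each pair shares a unit.
SAMPLE = {
    "float": (95.0, 89.7),
    "json_loads": (29.8, 28.6),
    "go": (128.0, 128.0),
    "logging_simple": (10.9, 11.4),
    "xml_etree_parse": (197.0, 208.0),
}


def geometric_mean_ratio(timings):
    """Geometric mean of thin/reference time ratios.

    A value below 1.0 means the thin build is faster on average,
    a value above 1.0 means it is slower.
    """
    logs = [math.log(thin / ref) for ref, thin in timings.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    print(f"{geometric_mean_ratio(SAMPLE):.3f}")
```

Because the ratios are averaged in log space, a 5% speedup and a 5% slowdown roughly cancel, which is why a mixed table like this one can still average out to neutral.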
| Phase (seconds) | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| pginstr | 297.2 | 219.3 |
| pgo | 70.0 | 69.0 |
| kill | 1.2 | 0.5 |
| pgupd | 282.8 | 231.7 |
| total time | 651.2 | 520.6 |
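As a quick sanity check on the phase timings, the following sketch sums the four phases per configuration and derives the relative saving. The phase values are copied from the table above; the function name `summarize` is hypothetical and only serves this illustration.

```python
# Per-phase build times in seconds: (full LTO, ThinLTO), copied from the table.
PHASES = {
    "pginstr": (297.2, 219.3),
    "pgo": (70.0, 69.0),
    "kill": (1.2, 0.5),
    "pgupd": (282.8, 231.7),
}


def summarize(phases):
    """Return (full_total, thin_total, fractional_reduction)."""
    full = sum(full_time for full_time, _ in phases.values())
    thin = sum(thin_time for _, thin_time in phases.values())
    return full, thin, (full - thin) / full


if __name__ == "__main__":
    full, thin, saved = summarize(PHASES)
    print(f"full={full:.1f}s thin={thin:.1f}s saved={saved:.1%}")
```

The summed thin phases come to 520.5 rather than the reported 520.6, presumably because each phase was rounded to one decimal. Either way the reduction is about 20% of the full-LTO build time.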
Details pginstrument
| Project | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| _freeze_module | 38.5 | 40.0 |
| python314 | 141.5 | 81.3 |
| pyexpat | 52.7 | 3.9 |
| _elementtree | 51.8 | 5.3 |
| sqlite3 | 46.0 | 42.4 |
| liblzma | 18.2 | 16.5 |
| _decimal | 12.4 | 7.7 |
| _testcapi | 8.3 | 7.1 |
| _bz2 | 7.0 | 4.9 |
| _ctypes | 6.9 | 7.5 |
| _testlimitedcapi | 4.9 | 4.3 |
| _wmi | 4.5 | 3.0 |
| _overlapped | 4.5 | 3.2 |
| _asyncio | 4.0 | 5.2 |
| _lzma | 3.8 | 1.8 |
| _ssl | 3.7 | 5.5 |
| _ctypes_test | 3.7 | 3.4 |
| _multiprocessing | 3.5 | 2.7 |
| _sqlite3 | 3.4 | 2.8 |
| venvwlauncher | 3.3 | 2.7 |
| _zoneinfo | 3.1 | 3.4 |
| unicodedata | 2.7 | 3.0 |
| pyshellext | 2.7 | 2.6 |
| pyw | 2.7 | 2.7 |
| py | 2.6 | 2.5 |
| _socket | 2.4 | 3.7 |
| _testinternalcapi | 2.4 | 2.2 |
| _tkinter | 2.2 | 4.1 |
| _testclinic | 2.0 | 1.9 |
| _hashlib | 1.8 | 3.1 |
| select | 1.8 | 2.2 |
| venvlauncher | 1.8 | 1.7 |
| winsound | 1.7 | 3.3 |
| _uuid | 1.6 | 3.2 |
| _queue | 1.6 | 2.3 |
| _testembed | 1.5 | 1.5 |
| _testbuffer | 1.4 | 1.3 |
| pythonw | 1.1 | 1.1 |
| _testconsole | 1.1 | 1.1 |
| _testmultiphase | 1.0 | 1.0 |
| _testsinglephase | 1.0 | 1.0 |
| python | 1.0 | 0.9 |
| _testclinic_limited | 0.9 | 0.9 |
| _testimportmultiple | 0.9 | 0.9 |
| python3 | 0.5 | 0.5 |
| total | 465.8 | 303.3 |
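To see where the pginstrument difference comes from, one can rank the projects by the time they save. The sketch below does this for the seven largest rows of the table above; the remaining rows are omitted for brevity, and the helper name `largest_savings` is hypothetical.

```python
# (full LTO, ThinLTO) times for the largest projects, copied from the
# pginstrument table; smaller projects are left out of this sketch.
PGINSTRUMENT = {
    "_freeze_module": (38.5, 40.0),
    "python314": (141.5, 81.3),
    "pyexpat": (52.7, 3.9),
    "_elementtree": (51.8, 5.3),
    "sqlite3": (46.0, 42.4),
    "liblzma": (18.2, 16.5),
    "_decimal": (12.4, 7.7),
}


def largest_savings(times, n=3):
    """Return the n projects with the biggest (full - thin) difference."""
    saved = {name: full - thin for name, (full, thin) in times.items()}
    return sorted(saved.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, delta in largest_savings(PGINSTRUMENT):
        print(f"{name}: {delta:.1f}")
```

On these rows the three largest reductions are python314, pyexpat and _elementtree. Together they account for about 155.5 of the 162.5 difference between the two totals (465.8 vs 303.3), so most other projects build in roughly the same time either way.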
Details pgupdate
| Project | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| _freeze_module | 38.0 | 39.5 |
| python314 | 141.9 | 95.4 |
| sqlite3 | 44.4 | 42.9 |
| liblzma | 17.3 | 16.5 |
| _decimal | 11.2 | 8.7 |
| _testcapi | 8.6 | 7.3 |
| _ctypes | 8.0 | 7.2 |
| _bz2 | 7.8 | 5.5 |
| _ssl | 5.2 | 5.6 |
| _testlimitedcapi | 5.0 | 4.2 |
| pyexpat | 4.6 | 3.6 |
| _asyncio | 4.5 | 4.6 |
| _socket | 4.3 | 3.5 |
| _tkinter | 4.0 | 4.2 |
| _ctypes_test | 3.7 | 3.4 |
| _overlapped | 3.5 | 3.7 |
| _elementtree | 3.5 | 4.5 |
| _wmi | 3.5 | 3.1 |
| _zoneinfo | 3.2 | 3.2 |
| _lzma | 3.2 | 1.9 |
| unicodedata | 3.2 | 3.0 |
| _sqlite3 | 3.1 | 2.7 |
| _hashlib | 3.1 | 3.3 |
| venvwlauncher | 3.1 | 3.0 |
| _multiprocessing | 2.8 | 2.6 |
| pyshellext | 2.7 | 2.6 |
| pyw | 2.6 | 2.6 |
| _uuid | 2.6 | 2.8 |
| py | 2.6 | 2.7 |
| _testinternalcapi | 2.4 | 2.2 |
| _testclinic | 2.0 | 1.9 |
| _queue | 1.9 | 2.2 |
| winsound | 1.8 | 3.0 |
| venvlauncher | 1.7 | 1.5 |
| select | 1.6 | 2.0 |
| _testembed | 1.5 | 1.4 |
| _testbuffer | 1.4 | 1.3 |
| _testconsole | 1.1 | 1.0 |
| pythonw | 1.1 | 1.1 |
| _testmultiphase | 1.0 | 1.1 |
| _testsinglephase | 1.0 | 1.0 |
| python | 1.0 | 0.9 |
| _testclinic_limited | 0.9 | 0.9 |
| _testimportmultiple | 0.9 | 0.9 |
| python3 | 0.5 | 0.5 |
| total | 372.9 | 316.8 |
Linked PRs