gh-144319: madvise(MADV_HUGEPAGE) #144353

Draft
maurycy wants to merge 1 commit into python:main from maurycy:pymalloc-madv-hugepage

maurycy (Contributor) commented Jan 31, 2026 (edited)

The hint enables Transparent Huge Pages (THP) on systems where THP is set to `madvise`, which seems to be the default on Ubuntu and Fedora, at least according to this article.

More on THP:

Importantly, it seems to carry no `SIGBUS` risk. mimalloc seems to already do this with `MIMALLOC_LARGE_OS_PAGES=1`.
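For readers unfamiliar with the hint, here is a minimal Python-level sketch of the same idea using the standard `mmap` module. It is not the C change in this PR; the 2 MiB size and the helper name `map_with_hugepage_hint` are assumptions made for illustration only.

```python
import mmap

# Assumption for illustration: 2 MiB, a common huge page size on x86-64.
REGION_SIZE = 2 * 1024 * 1024


def map_with_hugepage_hint(size=REGION_SIZE):
    """Map anonymous memory and ask the kernel to back it with huge pages.

    Returns (buffer, hinted). `hinted` is False when the platform has no
    MADV_HUGEPAGE or the kernel rejects the advice.
    """
    if hasattr(mmap, "MAP_PRIVATE") and hasattr(mmap, "MAP_ANONYMOUS"):
        # Private anonymous mapping, the kind of memory THP targets.
        buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    else:
        # Platforms without these flags (e.g. Windows): plain anonymous map.
        buf = mmap.mmap(-1, size)

    hinted = False
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    if advice is not None and hasattr(buf, "madvise"):
        try:
            buf.madvise(advice)  # only a hint; the kernel may ignore it
            hinted = True
        except OSError:
            pass  # e.g. kernel built without THP support
    return buf, hinted


if __name__ == "__main__":
    buf, hinted = map_with_hugepage_hint()
    buf[0] = 1  # touch the mapping so it is actually faulted in
    print("hint applied:", hinted)
    buf.close()
```

Because the call is advisory, a failed or ignored hint leaves the mapping fully usable with regular pages.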

Reusing the benchmark from #144319:

bench_obmalloc.py

```python
import sys, gc


def bench_small_object_churn():
    objs = []
    for _ in range(200_000):
        objs.append(bytearray(64))
    for _ in range(200_000):
        objs.append(bytearray(64))
        objs.pop(0)


def bench_bulk_small_alloc():
    objs = [bytearray(48) for _ in range(1_000_000)]
    for o in objs:
        o[0] = 1


def bench_dict_churn():
    for _ in range(500_000):
        d = {"a": 1, "b": 2, "c": 3, "d": 4}
        del d


def bench_mixed_sizes():
    sizes = [8, 16, 24, 32, 48, 64, 96, 128, 192, 256, 384, 512]
    objs = [bytearray(sizes[i % 12]) for i in range(500_000)]


def bench_fragmentation():
    objs = [bytearray(128) for _ in range(500_000)]
    for i in range(0, len(objs), 2):
        objs[i] = None
    for i in range(0, len(objs), 2):
        objs[i] = bytearray(128)


def bench_list_of_tuples():
    objs = [(i, i + 1, i + 2) for i in range(1_000_000)]


def bench_class_instances():
    class Pt:
        __slots__ = ('x', 'y', 'z')

        def __init__(s, x, y, z):
            s.x = x
            s.y = y
            s.z = z

    objs = [Pt(i, i + 1, i + 2) for i in range(500_000)]


def bench_arena_pressure():
    layers = [[bytearray(256) for _ in range(200_000)] for _ in range(10)]


def bench_random_walk():
    import random
    random.seed(42)
    objs = [bytearray(64) for _ in range(1_000_000)]
    idx = list(range(len(objs)))
    random.shuffle(idx)
    for i in idx:
        objs[i][0] = i & 0xff


BENCHMARKS = dict(
    small_object_churn=bench_small_object_churn,
    bulk_small_alloc=bench_bulk_small_alloc,
    dict_churn=bench_dict_churn,
    mixed_sizes=bench_mixed_sizes,
    fragmentation=bench_fragmentation,
    list_of_tuples=bench_list_of_tuples,
    class_instances=bench_class_instances,
    arena_pressure=bench_arena_pressure,
    random_walk=bench_random_walk,
)

if __name__ == "__main__":
    gc.collect()
    gc.disable()
    BENCHMARKS[sys.argv[1]]()
    gc.enable()
```

on

```
[126] 2026-01-31T02:32:04.127734128+0100 maurycy@eiger /home/maurycy  % sudo cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
```

where the baseline is the `main` branch.
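The sysfs file shown above marks the active THP mode with square brackets. A small sketch of reading it programmatically follows; the helper names `parse_thp_mode` and `read_thp_mode` are hypothetical and not part of this PR.

```python
import re

THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text):
    """Return the bracketed mode from e.g. 'always [madvise] never'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path=THP_ENABLED_PATH):
    """Return the active THP mode, or None if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_thp_mode(f.read())
    except OSError:
        return None  # non-Linux system or kernel without THP


if __name__ == "__main__":
    print("THP mode:", read_thp_mode())
```

With the output quoted above, `parse_thp_mode` yields `madvise`, the mode in which only regions that request huge pages are eligible for them.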

Wall-clock time

| Benchmark | Baseline | With MADV_HUGEPAGE | Change |
| --- | --- | --- | --- |
| fragmentation | 0.107s | 0.101s | -5.4% |
| bulk_small_alloc | 0.126s | 0.121s | -4.1% |
| class_instances | 0.078s | 0.076s | -2.9% |
| list_of_tuples | 0.102s | 0.101s | -1.2% |
| mixed_sizes | 0.085s | 0.084s | -1.1% |
| random_walk | 0.517s | 0.515s | -0.4% |
| arena_pressure | 0.325s | 0.326s | +0.3% |

dTLB load misses

| Benchmark | Baseline | With MADV_HUGEPAGE | Change |
| --- | --- | --- | --- |
| fragmentation | 123,390 | 99,413 | -19.4% |
| arena_pressure | 280,228 | 237,222 | -15.3% |
| bulk_small_alloc | 93,894 | 85,661 | -8.8% |
| list_of_tuples | 88,019 | 81,778 | -7.1% |
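The Change column is the relative difference against the baseline. A short sketch that recomputes it from the dTLB counts listed above (the function name `pct_change` is an illustrative choice):

```python
def pct_change(baseline, modified):
    """Relative change of `modified` versus `baseline`, in percent."""
    return (modified - baseline) / baseline * 100.0


# dTLB load-miss counts copied from the table above: (baseline, with hint).
DTLB_MISSES = {
    "fragmentation": (123_390, 99_413),
    "arena_pressure": (280_228, 237_222),
    "bulk_small_alloc": (93_894, 85_661),
    "list_of_tuples": (88_019, 81_778),
}

if __name__ == "__main__":
    for name, (base, hinted) in DTLB_MISSES.items():
        print(f"{name:18s} {pct_change(base, hinted):+.1f}%")
```

Applying the same formula to the wall-clock table can differ from the listed percentages in the last digit, presumably because the times shown there are rounded to milliseconds.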

The improvement is smaller than with `MAP_HUGETLB` because `MADV_HUGEPAGE` is just a hint, so khugepaged may not have kicked in yet.
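One way to check whether the kernel has actually backed any memory with huge pages is to look at the `AnonHugePages` field that Linux exposes per process. The sketch below assumes `/proc/self/smaps_rollup` is available (Linux 4.14 or later); the helper names are hypothetical.

```python
def parse_anon_huge_kb(text):
    """Return the AnonHugePages value in kB from smaps-style text."""
    for line in text.splitlines():
        if line.startswith("AnonHugePages:"):
            return int(line.split()[1])
    return None


def current_anon_huge_kb(path="/proc/self/smaps_rollup"):
    """AnonHugePages for this process in kB, or None if unavailable."""
    try:
        with open(path) as f:
            return parse_anon_huge_kb(f.read())
    except OSError:
        return None  # not Linux, or the file is missing


if __name__ == "__main__":
    print("AnonHugePages (kB):", current_anon_huge_kb())
```

A value of 0 right after allocation, rising later, would be consistent with khugepaged collapsing pages in the background rather than at fault time.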

I noted no regression with `THP=always`.

The only thing I'm wondering about is whether and how it should be guarded. Enabling it by default seems risky, but it's not exactly `--with-pymalloc-hugepages`. That's why I'm opening this as a draft.

pyperformance `--rigorous` suite (I'd say it's jitter: `asyncio_tcp` is I/O bound, `scimark` is numpy, the benchmarks are short-lived, etc.)

uv run --with pyperf python -m pyperf compare_to /tmp/baseline_affinity.json /tmp/modified_affinity.json --table --table-format md
| Benchmark | baseline_affinity | modified_affinity |
| --- | --- | --- |
| many_optionals | 693 us | 688 us: 1.01x faster |
| subparsers | 7.71 ms | 7.65 ms: 1.01x faster |
| async_generators | 290 ms | 288 ms: 1.00x faster |
| async_tree_cpu_io_mixed_tg | 411 ms | 414 ms: 1.01x slower |
| async_tree_eager_cpu_io_mixed | 344 ms | 344 ms: 1.00x faster |
| async_tree_eager_cpu_io_mixed_tg | 380 ms | 385 ms: 1.01x slower |
| async_tree_eager_memoization | 172 ms | 170 ms: 1.01x faster |
| async_tree_eager_tg | 162 ms | 165 ms: 1.02x slower |
| async_tree_io | 447 ms | 454 ms: 1.02x slower |
| async_tree_memoization | 243 ms | 234 ms: 1.04x faster |
| async_tree_memoization_tg | 252 ms | 239 ms: 1.06x faster |
| async_tree_none_tg | 200 ms | 195 ms: 1.03x faster |
| asyncio_tcp | 301 ms | 269 ms: 1.12x faster |
| asyncio_tcp_ssl | 1.28 sec | 1.27 sec: 1.00x faster |
| asyncio_websockets | 359 ms | 357 ms: 1.01x faster |
| chameleon | 12.1 ms | 12.1 ms: 1.01x faster |
| chaos | 44.4 ms | 44.1 ms: 1.01x faster |
| comprehensions | 12.6 us | 12.7 us: 1.01x slower |
| bench_thread_pool | 795 us | 800 us: 1.01x slower |
| crypto_pyaes | 56.8 ms | 57.3 ms: 1.01x slower |
| dask | 700 ms | 698 ms: 1.00x faster |
| deepcopy | 186 us | 186 us: 1.00x slower |
| deepcopy_reduce | 2.06 us | 2.08 us: 1.01x slower |
| deepcopy_memo | 18.6 us | 19.1 us: 1.03x slower |
| deltablue | 2.50 ms | 2.47 ms: 1.01x faster |
| django_template | 29.7 ms | 29.5 ms: 1.01x faster |
| docutils | 2.21 sec | 2.19 sec: 1.01x faster |
| dulwich_log | 44.2 ms | 45.0 ms: 1.02x slower |
| fannkuch | 285 ms | 280 ms: 1.02x faster |
| gc_traversal | 4.08 ms | 4.25 ms: 1.04x slower |
| generators | 22.9 ms | 22.6 ms: 1.01x faster |
| genshi_text | 17.1 ms | 17.3 ms: 1.01x slower |
| genshi_xml | 39.5 ms | 39.2 ms: 1.01x faster |
| go | 90.0 ms | 89.8 ms: 1.00x faster |
| hexiom | 4.39 ms | 4.47 ms: 1.02x slower |
| html5lib | 48.9 ms | 48.3 ms: 1.01x faster |
| json_dumps | 7.57 ms | 7.50 ms: 1.01x faster |
| json_loads | 18.4 us | 18.5 us: 1.01x slower |
| logging_simple | 4.54 us | 4.43 us: 1.02x faster |
| mako | 8.47 ms | 8.49 ms: 1.00x slower |
| mdp | 941 ms | 965 ms: 1.03x slower |
| meteor_contest | 95.9 ms | 94.8 ms: 1.01x faster |
| nbody | 67.5 ms | 67.9 ms: 1.01x slower |
| nqueens | 73.6 ms | 72.4 ms: 1.02x faster |
| pathlib | 10.0 ms | 10.1 ms: 1.01x slower |
| pickle | 13.8 us | 13.8 us: 1.01x faster |
| pickle_dict | 24.9 us | 24.6 us: 1.01x faster |
| pickle_list | 4.08 us | 4.11 us: 1.01x slower |
| pickle_pure_python | 250 us | 247 us: 1.01x faster |
| pidigits | 185 ms | 184 ms: 1.00x faster |
| pprint_safe_repr | 573 ms | 568 ms: 1.01x faster |
| pprint_pformat | 1.18 sec | 1.16 sec: 1.02x faster |
| pyflate | 327 ms | 324 ms: 1.01x faster |
| python_startup | 11.0 ms | 11.0 ms: 1.00x faster |
| python_startup_no_site | 6.48 ms | 6.48 ms: 1.00x faster |
| raytrace | 211 ms | 214 ms: 1.02x slower |
| regex_compile | 98.2 ms | 98.3 ms: 1.00x slower |
| regex_dna | 164 ms | 156 ms: 1.05x faster |
| regex_effbot | 2.32 ms | 2.18 ms: 1.06x faster |
| regex_v8 | 18.2 ms | 17.5 ms: 1.04x faster |
| richards | 33.6 ms | 34.3 ms: 1.02x slower |
| richards_super | 38.5 ms | 38.3 ms: 1.00x faster |
| scimark_fft | 204 ms | 203 ms: 1.01x faster |
| scimark_lu | 68.9 ms | 66.6 ms: 1.04x faster |
| scimark_monte_carlo | 43.2 ms | 44.0 ms: 1.02x slower |
| scimark_sor | 75.9 ms | 74.7 ms: 1.02x faster |
| scimark_sparse_mat_mult | 3.24 ms | 3.05 ms: 1.06x faster |
| spectral_norm | 64.5 ms | 64.7 ms: 1.00x slower |
| sphinx | 808 ms | 798 ms: 1.01x faster |
| sqlglot_v2_normalize | 82.4 ms | 83.6 ms: 1.01x slower |
| sqlglot_v2_optimize | 41.7 ms | 41.8 ms: 1.00x slower |
| sqlglot_v2_parse | 973 us | 990 us: 1.02x slower |
| sqlglot_v2_transpile | 1.26 ms | 1.25 ms: 1.00x faster |
| sympy_integrate | 16.4 ms | 16.4 ms: 1.00x slower |
| sympy_sum | 112 ms | 112 ms: 1.00x faster |
| sympy_str | 214 ms | 215 ms: 1.01x slower |
| telco | 118 ms | 120 ms: 1.02x slower |
| tomli_loads | 1.49 sec | 1.50 sec: 1.01x slower |
| tornado_http | 79.7 ms | 79.4 ms: 1.00x faster |
| typing_runtime_protocols | 124 us | 121 us: 1.02x faster |
| unpack_sequence | 32.7 ns | 31.9 ns: 1.03x faster |
| unpickle | 11.0 us | 10.7 us: 1.02x faster |
| unpickle_list | 3.95 us | 3.99 us: 1.01x slower |
| unpickle_pure_python | 163 us | 163 us: 1.00x slower |
| xdsl_constant_fold | 36.0 ms | 36.2 ms: 1.01x slower |
| xml_etree_parse | 109 ms | 108 ms: 1.01x faster |
| xml_etree_iterparse | 68.3 ms | 67.3 ms: 1.01x faster |
| xml_etree_generate | 68.3 ms | 67.3 ms: 1.01x faster |
| xml_etree_process | 47.4 ms | 47.7 ms: 1.01x slower |
| Geometric mean | (ref) | 1.00x faster |
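The last row summarizes the per-benchmark ratios with a geometric mean, which treats a 1.05x speedup and a 1.05x slowdown as cancelling out. Below is a rough sketch of that kind of summary on three rows picked from the table; it is an illustration of the arithmetic, not a reimplementation of what `pyperf` does internally, and the subset is arbitrary.

```python
from statistics import geometric_mean


def speedup(baseline, modified):
    """Ratio above 1.0 means the modified build was faster."""
    return baseline / modified


# Mean timings in ms copied from the table above: (baseline, modified).
# Arbitrary three-row subset, chosen only to show the computation.
SAMPLE_ROWS = {
    "asyncio_tcp": (301, 269),
    "regex_dna": (164, 156),
    "gc_traversal": (4.08, 4.25),
}

if __name__ == "__main__":
    ratios = [speedup(b, m) for b, m in SAMPLE_ROWS.values()]
    for name, ratio in zip(SAMPLE_ROWS, ratios):
        print(f"{name:14s} {ratio:.2f}x")
    print(f"geometric mean {geometric_mean(ratios):.2f}x")
```

Over the full suite the individual swings largely offset each other, which fits the reading of the results as jitter.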
