gh-112532: Improve mimalloc page visiting #114133

Merged
DinoV merged 1 commit into python:main from colesbury:112532-visitor on Jan 22, 2024

colesbury (Contributor) commented on Jan 16, 2024 (edited)

This adds support for visiting abandoned pages in mimalloc and improves the performance of the page visiting code. Abandoned pages contain memory blocks from threads that have exited; other threads may reclaim them later. The free-threaded GC still needs to visit those pages because they contain live objects.

This also reduces the overhead of visiting mimalloc pages:

  • Special cases for full pages, empty pages, and pages containing only a single block.
  • Fix free_map to use one bit instead of one byte per block.
  • Use fast integer division by a constant algorithm when computing block offset from block size and index.
  • Faster looping over bitmap.
  • Fix bug where blocks in the delayed-free list were erroneously visited as if they were live.
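To make the free_map and bitmap-loop bullets concrete, here is a minimal sketch. It is not the code from this PR. It assumes a bitmap of 64-bit words in which a set bit means "block is free", and the names `free_map_set`, `ctz64`, `visit_live_blocks`, `index_log`, and `log_index` are hypothetical. Storing one bit per block instead of one byte makes the map eight times smaller, and clearing the lowest set bit on each step lets the loop skip free blocks without testing every position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/* Hypothetical helper: mark block `idx` as free (one bit per block). */
static void free_map_set(uint64_t *map, size_t idx) {
    map[idx / BITS_PER_WORD] |= (uint64_t)1 << (idx % BITS_PER_WORD);
}

/* Count trailing zero bits; x must be nonzero. A portable fallback:
   real code would more likely use a compiler intrinsic. */
static unsigned ctz64(uint64_t x) {
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Example visitor state: records the indices it is called with. */
struct index_log {
    size_t idx[16];
    size_t count;
};

static void log_index(size_t idx, void *arg) {
    struct index_log *log = arg;
    assert(log->count < 16);
    log->idx[log->count++] = idx;
}

/* Call `visit` for every block whose bit is clear (in use).
   Returns the number of blocks visited. */
static size_t visit_live_blocks(const uint64_t *free_map, size_t nblocks,
                                void (*visit)(size_t idx, void *arg),
                                void *arg) {
    size_t visited = 0;
    size_t nwords = (nblocks + BITS_PER_WORD - 1) / BITS_PER_WORD;
    for (size_t w = 0; w < nwords; w++) {
        uint64_t live = ~free_map[w];          /* set bit = live block */
        size_t base = w * BITS_PER_WORD;
        size_t remaining = nblocks - base;
        if (remaining < BITS_PER_WORD) {
            /* Ignore bits past the last block of the page. */
            live &= ((uint64_t)1 << remaining) - 1;
        }
        while (live != 0) {
            visit(base + ctz64(live), arg);
            visited++;
            live &= live - 1;                  /* clear lowest set bit */
        }
    }
    return visited;
}
```

A caller would fill the map from the page's free lists with `free_map_set` and then pass a callback such as `log_index` to `visit_live_blocks`. A word that is entirely free costs a single comparison.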

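The "fast integer division by a constant" item can be illustrated with the standard multiply-and-shift technique. The sketch below is a generic version of that technique, not necessarily what this PR implements. It assumes 32-bit operands, and the names `fast_divisor`, `fast_divisor_init`, and `fast_divide` are hypothetical. The idea is to precompute a magic multiplier once per block size so that each later division, for example recovering a block index from a byte offset within a page, needs only a multiply, an add, and a shift.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed constants for dividing 32-bit values by a fixed divisor. */
typedef struct {
    uint64_t magic;   /* floor(2^(32+shift) / d) + 1 - 2^32 */
    unsigned shift;   /* ceil(log2(d)) */
} fast_divisor;

/* Compute the constants once per divisor (e.g. once per block size). */
static fast_divisor fast_divisor_init(uint32_t d) {
    assert(d > 0);
    fast_divisor fd;
    fd.shift = 0;
    while (((uint64_t)1 << fd.shift) < d) {
        fd.shift++;
    }
    /* (2^shift - d) < d <= 2^32, so the product fits in 64 bits. */
    fd.magic = ((((uint64_t)1 << 32) * (((uint64_t)1 << fd.shift) - d)) / d) + 1;
    return fd;
}

/* Return n / d using the precomputed constants, without a divide. */
static uint32_t fast_divide(uint32_t n, fast_divisor fd) {
    uint64_t hi = ((uint64_t)n * fd.magic) >> 32;
    return (uint32_t)((hi + n) >> fd.shift);
}
```

With `fast_divisor fd = fast_divisor_init(block_size);` computed once per page, `fast_divide(offset, fd)` gives the same result as `offset / block_size` for any 32-bit `offset`.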
DinoV (Contributor) left a comment

Just curious about the unused function(s), otherwise LGTM!

}

// Visit all blocks in a abandoned segments
bool _mi_abandoned_pool_visit_blocks(mi_abandoned_pool_t* pool, uint8_t page_tag, bool visit_blocks, mi_block_visit_fun* visitor, void* arg) {

This (and therefore the previous 2 functions) doesn't seem to be used anywhere?

colesbury (Contributor, Author) replied:

These will be used in the upcoming GC PR. Here is an example usage:

https://github.com/colesbury/cpython/blob/8314c7c1d9d9670d4a83b9dc12f23611493c8eaa/Python/gc_free_threading.c#L226-L227

I put them in this PR because:

  1. Keeping the mimalloc changes separate makes them a bit easier to track and upstream
  2. The GC PR will be big and doing this first makes the upcoming PR a bit smaller
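For readers unfamiliar with the visitor interface, the following sketch shows the general shape of a block-visiting callback. It is an illustration under stated assumptions, not the GC code linked above. The `mi_block_visit_fun` typedef mirrors mimalloc's public header as I understand it, with opaque stand-in types so the block compiles by itself. `visit_stats`, `count_block`, and `visit_fake_page` are hypothetical, and the convention that the allocator makes one area-level call with `block == NULL` before the per-block calls is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-ins for mimalloc's types (assumption: normally these
   come from mimalloc.h). */
typedef struct mi_heap_s mi_heap_t;
typedef struct mi_heap_area_s mi_heap_area_t;

/* Callback shape taken by the block-visiting functions. */
typedef bool (mi_block_visit_fun)(const mi_heap_t *heap,
                                  const mi_heap_area_t *area,
                                  void *block, size_t block_size, void *arg);

/* Hypothetical per-visit state passed through `arg`. */
struct visit_stats {
    size_t blocks;
    size_t bytes;
};

/* Hypothetical visitor: counts live blocks and their total size.
   Returning true asks the caller to keep visiting. */
static bool count_block(const mi_heap_t *heap, const mi_heap_area_t *area,
                        void *block, size_t block_size, void *arg) {
    (void)heap;
    (void)area;
    assert(arg != NULL);
    if (block == NULL) {
        return true;   /* area-level call: nothing to count */
    }
    struct visit_stats *stats = arg;
    stats->blocks++;
    stats->bytes += block_size;
    return true;
}

/* Hypothetical driver standing in for the allocator: one area-level
   call, then one call per block of a fake page. */
static bool visit_fake_page(char *page, size_t block_size, size_t nblocks,
                            mi_block_visit_fun *visitor, void *arg) {
    if (!visitor(NULL, NULL, NULL, block_size, arg)) {
        return false;
    }
    for (size_t i = 0; i < nblocks; i++) {
        if (!visitor(NULL, NULL, page + i * block_size, block_size, arg)) {
            return false;
        }
    }
    return true;
}
```

In this PR, a callback of this shape would be passed as the `visitor` argument of `_mi_abandoned_pool_visit_blocks`, with the state pointer as `arg`.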

colesbury (Contributor, Author) commented:

@DinoV, would you please merge this when you are ready?

DinoV merged commit 412920a into python:main on Jan 22, 2024
colesbury deleted the 112532-visitor branch on January 22, 2024 21:14
aisk pushed a commit to aisk/cpython that referenced this pull request on Feb 11, 2024
daanx added 4 commits to microsoft/mimalloc that referenced this pull request on Jun 3, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request on Sep 2, 2024
clrpackages pushed a commit to clearlinux-pkgs/mimalloc that referenced this pull request on Jan 16, 2025
The commit message (truncated on the page, beginning "…3.0.1") lists upstream mimalloc changes by author. The entries by daanx that reference this pull request are:

  • add initial support for visiting abandoned segments per subprocess, upstream for python/cpython#114133
  • add support to visit _all_ abandoned segment blocks per sub-process, upstream for python/cpython#114133
  • optimize heap walks, by Sam Gross, upstream of python/cpython#114133