Tracemalloc C API scales poorly in multithreaded use #143057

Closed
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs); performance (Performance or resource usage); type-bug (An unexpected behavior, bug, or error)
@ngoldbaum

Description

NumPy has some wrappers for data allocation that call into the tracemalloc C API. For example, here's the wrapper around malloc:

https://github.com/numpy/numpy/blob/f6440be7b8eec4a6481832f15f6730d984d78ef0/numpy/_core/src/multiarray/alloc.c#L255-L271

Recently a Stack Overflow question led me to report a NumPy issue about poor multithreaded scaling. I think the bulk of the scaling bottleneck is due to the global mutex in the tracemalloc implementation, as you can see in the flame graph and profile in the linked NumPy issue.

From the NumPy issue:

On my M3 MacBook Pro, I get the following stdout when running the script:

Inner loops 10, multithreading  time: 6.68 sec, result sum: 717434683.1879175
Inner loops 10, multiprocessing time: 4.86 sec, result sum: 717434683.1879175

@Yhg1s told me on Discord that he has a patch that adds a fast path to tracemalloc based on an atomic flag, and that it seems to help a lot.
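
The patch itself is not shown in this issue, so the following is only a minimal sketch of the general pattern a flag-based fast path usually follows, written in Python for readability. Everything in it is hypothetical: the `Tracker` class, `track_slow`, `track_fast`, and the `lock_acquisitions` counter are illustrative names, not CPython or NumPy code, and the plain attribute read of `enabled` stands in for what would be an atomic load in C.

```python
import threading


class Tracker:
    """Hypothetical stand-in for a tracing subsystem guarded by one global lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.enabled = False        # plays the role of the atomic "tracing is on" flag
        self.traces = {}            # maps a pointer-like key to an allocation size
        self.lock_acquisitions = 0  # counts how often the global lock was taken

    def track_slow(self, ptr, size):
        # Always takes the global lock, even when tracing is off, so every
        # allocating thread contends on it.
        with self._lock:
            self.lock_acquisitions += 1
            if not self.enabled:
                return False
            self.traces[ptr] = size
            return True

    def track_fast(self, ptr, size):
        # Fast path: check the flag without the lock and return early when
        # tracing is off, so the common case never touches the lock.
        if not self.enabled:
            return False
        with self._lock:
            self.lock_acquisitions += 1
            # Re-check under the lock in case tracing was stopped in between.
            if not self.enabled:
                return False
            self.traces[ptr] = size
            return True
```

With tracing disabled, `track_slow` still acquires the lock on every call while `track_fast` returns before reaching it, which is the case that matters for allocation-heavy multithreaded code that never enables tracing.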
