Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-113055: Use pointer for interp->obmalloc state.#113412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
nascheme merged 5 commits intopython:mainfromnascheme:obmalloc_state_ptr
Jan 27, 2024

Conversation

@nascheme
Copy link
Member

@naschemenascheme commentedDec 22, 2023
edited by bedevere-appbot
Loading

For interpreters that share state with the main interpreter, this points to the same static memory structure. For interpreters with their own obmalloc state, it is heap allocated. Add free_obmalloc_arenas() which will free the obmalloc arenas and radix tree structures for interpreters with their own obmalloc state.

Note that the free is only done if obmalloc used blocks is zero. If some blocks are used, it's not safe to free the memory since some extension might still be using it.

For interpreters that share state with the main interpreter, this pointsto the same static memory structure.  For interpreters with their ownobmalloc state, it is heap allocated.  Add free_obmalloc_arenas() whichwill free the obmalloc arenas and radix tree structures for interpreterswith their own obmalloc state.
@nascheme
Copy link
MemberAuthor

This should improve the situtation forgh-113055. There will still be memory leaks if interpreters don't free their memory (or load extensions who don't free their memory). However, when the Py_RTFLAGS_USE_MAIN_OBMALLOC flag is set (the default, I think), the behaviour will be much like it was in 3.11. I.e. interpreters use the main interpreter obmalloc state.

The obmalloc state for the main interpreter is back to being a static structure in the BSS, as it was in 3.11. My guess is theget_state() function in obmalloc is a tiny bit faster, because there is no test branch in it. I didn't profile that yet though. I thinkget_state() will be fast enough that it doesn't matter.

The obmalloc_state_main global is okay and should be ignored.
@nascheme
Copy link
MemberAuthor

It looks like the performance different forget_state() is tiny. Before this change (fromperf record andperf report):

perf_obmalloc_before

And after:

perf_obmalloc_ptr

The counts in the left hand column are then number of samples that hit that instruction. My quick-and-dirty benchmark was:

./Programs/_testembed test_repeated_init_exec "$(cat bench_embed.py)"

withbench_embed.py containing:

import unittestfor i in range(4):    unittest.main('test.test_dict', verbosity=1, exit=False)    unittest.main('test.test_decimal', verbosity=1, exit=False)    unittest.main('test.test_generators', verbosity=1, exit=False)    unittest.main('test.test_set', verbosity=1, exit=False)

I didn't build with LTO and PGO, that would change things a bit I would guess.

@nascheme
Copy link
MemberAuthor

With PGO and LTO turned on, the code becomes quite jumbled up and hard to follow._PyObject_Free() disappears from the profile because it gets inlined in a bunch of places. Looking atPyObject_GC_Del(), there are no samples for the instruction corresponding toreturn interp->obmalloc inget_state().

image

@neonene
Copy link
Contributor

neonene commentedDec 23, 2023
edited
Loading

Non-debug builds have memory leaks on Windows even using33e0adb. At least,_PyObject_Arena.free() seems to never be called in theinsert_to_freepool function.

The same goes for Python 3.12.1 with/without this patch applied.

EDIT: Observed withtest_repeated_simple_init

@neonene
Copy link
Contributor

neonene commentedDec 23, 2023
edited
Loading

_PyObject_Arena.free() seems to never be called in the insert_to_freepool function

Bisected to:ea2c001, which this PR does not fully affect on my non-debug build.

@nascheme
Copy link
MemberAuthor

I think this PR should get merged. It should be safe and it reduces memory used by sub-interpreters that share the main interpreter obmalloc state. It also eliminates a memory leak in the case of sub-interpreters that don't share the state. Seefree_obmalloc_arenas(). This call is conditional, only if_PyInterpreterState_GetAllocatedBlocks() returns zero. A more aggressive but less safe approach would be to always callfree_obmalloc_arenas(). We can decide on that after merging this change.

neonene reacted with thumbs up emoji

@neonene
Copy link
Contributor

@Yhg1s Any thoughts?

@gvanrossum
Copy link
Member

@ericsnowcurrently Have you reviewed this? Do you feel it's ready to be merged?

@ericsnowcurrently
Copy link
Member

I'm planning on taking a look in the next day or two.

Copy link
Member

@ericsnowcurrentlyericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for doing this! I think this is basically right. I didn't see any correctness issues.

Clearly I did leave a bunch of comments, but it's mostly little things like alternate spellings, whether or not to statically initializeobmalloc_state_main, and whether or not we really have to makePyInterpreterState.obmalloc a pointer.

@bedevere-app
Copy link

When you're done making the requested changes, leave the comment:I have made the requested changes; please review again.

@ericsnowcurrently
Copy link
Member

Also, weren't you considering freeing the blocks that don't have any objects in them any more, to reduce how much leaks?

@ericsnowcurrently
Copy link
Member

Also, what do you think our options are for backporting this to 3.12?

@nascheme
Copy link
MemberAuthor

I would prefer to use a pointer. I think it's cleaner/faster and I think the memory savings, while not huge, are non-trivial. On 64-bit platforms,struct _obmalloc_state is 256 kB. Most of that is the arena_map_root. If you allocate it in the BSS then typically only one entry of that array is actually used and so only one page of memory is actually resident. So typically 4 kB vs 256 kB. Also, every sub-interpreter that uses the main obmalloc state saves 256 kB of resident memory.

@nascheme
Copy link
MemberAuthor

Also, what do you think our options are for backporting this to 3.12?

Possible but a little scary. It's not a lot of code but it changes behavior quite a bit. Up to the release manager, I guess. The number of people affected by the memory usage/leaks is likely to be really small. At least, that's my guess. So maybe they need to skip 3.12 and go from 3.11 to 3.13 it this is problem for them.@neonene might have another perspective.

neonene reacted with thumbs up emoji

@ericsnowcurrently
Copy link
Member

I would prefer to use a pointer.

Sounds good.

FWIW, subinterpreters that use the main obmalloc state supported for backward compatibility, are already uncommon, and I expect they will be more so as we move forward. So I wouldn't worry about factoring them into the decision very much.

@nascheme
Copy link
MemberAuthor

Also, weren't you considering freeing the blocks that don't have any objects in them any more, to reduce how much leaks?

That's an idea. A bit of code to do that but probably not horrible. Another idea would be to add an-X flag or something similar that would enable freeing of all obmalloc arenas, even if there is a non-zero in-use count. Embedding programs could set this if they want to eliminate the leaks and are prepared to deal with extensions that crash because of it.

If it is safe to enablegh-113601 then maybe it is safe to just free the arenas as well. I think the same logic applies although it seems more likely freeing the arenas would be more likely to trigger a use-after-free bug.

naschemeand others added2 commitsJanuary 26, 2024 10:39
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
- Remove 'runtime' argument.- Remove 'initialized' and 'heap_allocated' members.- Use _Py_IsMainInterpreter()- Remove spurious whitespace change.- Return -1 on error and 0 on success, as normal convention.
@nascheme
Copy link
MemberAuthor

Thanks for the detailed review feedback. I think I've addressed all of your suggestions.

Copy link
Member

@ericsnowcurrentlyericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM

@ericsnowcurrently
Copy link
Member

Thanks for doing this!

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
@ericsnowcurrently
Copy link
Member

@Yhg1s, what are your thoughts on backporting this to 3.12?

aisk pushed a commit to aisk/cpython that referenced this pull requestFeb 11, 2024
)For interpreters that share state with the main interpreter, this pointsto the same static memory structure.  For interpreters with their ownobmalloc state, it is heap allocated.  Add free_obmalloc_arenas() whichwill free the obmalloc arenas and radix tree structures for interpreterswith their own obmalloc state.Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
@trygveaa
Copy link

This fixes a segfault happening with 3.12, so it would be great if this could be backported to 3.12. I posted more details about the crash in#116510.

@bacon-cheeseburger
Copy link

Just wanted to offer that this issue appears to impact Kodi, which has a considerable user base in the millions. The number of users affected by the problem could be quite substantial.

@ericsnowcurrently
Copy link
Member

ping@Yhg1s

Glyphack pushed a commit to Glyphack/cpython that referenced this pull requestSep 2, 2024
)For interpreters that share state with the main interpreter, this pointsto the same static memory structure.  For interpreters with their ownobmalloc state, it is heap allocated.  Add free_obmalloc_arenas() whichwill free the obmalloc arenas and radix tree structures for interpreterswith their own obmalloc state.Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@ericsnowcurrentlyericsnowcurrentlyericsnowcurrently approved these changes

@pablogsalpablogsalAwaiting requested review from pablogsal

@vstinnervstinnerAwaiting requested review from vstinner

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

6 participants

@nascheme@neonene@gvanrossum@ericsnowcurrently@trygveaa@bacon-cheeseburger

[8]ページ先頭

©2009-2025 Movatter.jp