Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-115103: Implement delayed memory reclamation (QSBR)#115180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
colesbury merged 9 commits intopython:mainfromcolesbury:gh-115103-qsbr
Feb 16, 2024

Conversation

colesbury
Copy link
Contributor

@colesburycolesbury commentedFeb 8, 2024
edited
Loading

This adds a safe memory reclamation scheme for the free-threaded build based on FreeBSD's "GUS" and quiescent state based reclamation (QSBR). The API provides a mechanism for callers to detect when it is safe to free memory that may be concurrently accessed by readers.

Related PRs that build on this:


📚 Documentation preview 📚:https://cpython-previews--115180.org.readthedocs.build/

This adds a safe memory reclamation scheme based on FreeBSD's "GUS" andquiescent state based reclamation (QSBR). The API provides a mechanismfor callers to detect when it is safe to free memory that may beconcurrently accessed by readers.
@colesbury
Copy link
ContributorAuthor

Notes for reviewers:

  • This PR does not contain any uses of the API; I wanted to keep this PR a manageable size. I'll put up a PR that makes use of these APIs soon.
  • The per-thread states (struct _qsbr_thread_state) are stored in a contiguous array to make scanning quicker, but this makes the thread initialization code a bit more complicated.
  • I think this is one of the cases where memory orderings are particularly tricky and hard to reason about intuitively. I found it helpful to verify the orderings and fences withCDSChecker. This is the model I used:https://github.com/colesbury/c11-model-checker/blob/cpython-models/test/qsbr.c.
  • _Py_qsbr_poll() will have scaling issues with large numbers of threads. That's something we'll want to deal with eventually, but possibly not in the 3.13 timeframe.

if (QSBR_LT(rd_seq, min_seq)) {
// It's okay if the compare-exchange failed: another thread updated it
(void)_Py_atomic_compare_exchange_uint64(&shared->rd_seq, &rd_seq, min_seq);
rd_seq = min_seq;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If the compare/exchange fails is it worth returning the greater ofrd_seq andmin_seq?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It doesn't matter for correctness and I'm not sure it's worth any extra complexity for performance.

DinoV reacted with thumbs up emoji
Python/qsbr.c Outdated
// Starting size of the array of qsbr thread states
#define MIN_ARRAY_SIZE 8

// The shared write sequence is always odd and incremented by two. Detached
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we document why the shared write sequence is always odd? I'm assuming that's just to avoid issues with it wrapping around and colliding withQSBR_OFFLINE?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yeah, that was the original motivation, but as I wrote in another comment, with 64-bit counters we don't really have to worry about wrap around.

I'm still tempted to keep this numbering scheme and theQBSR_LT wrap-around safe comparisons, but let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it's fine to keep it this way, it's just worth a comment on why it's that way :)

@@ -169,6 +169,10 @@ extern PyTypeObject _PyExc_MemoryError;
{ .threshold = 10, }, \
}, \
}, \
.qsbr = { \
.wr_seq = 1, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can these be inited toQSBR_INITIAL (it seems like includingpycore_qsbr.h shouldn't be an issue?)

colesbury reacted with thumbs up emoji
// If there are no free entries, we pause all threads, grow the array,
// and update the pointers in PyThreadState to entries in the new array.
if (qsbr == NULL) {
_PyEval_StopTheWorld(interp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It seems like this isn't safe to do with the mutex locked and not-detached? If another thread is trying to reserve at the same time it won't be detached, but it also won't be responding to the request for the eval breaker.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You are right -- I mixed up the locking discipline. The outer lock should not use_Py_LOCK_DONT_DETACH. I think the locking discipline with stop-the-world should be:

  1. A lock can be acquired before starting a stop-the-world pause or after starting it, but not both (consistent lock ordering)
  2. If acquired before the stop-the-world, the acquisition should use_Py_LOCK_DETACH (the default)
  3. If acquired within the stop-the-world, the acquisition should use_Py_LOCK_DONT_DETACH

HEAD_LOCK(runtime) is an example of case 3. This is case 2.

The reasoning for case (3) is a bit subtle: locks can be directly handed off to a detached thread, so with a normalPyMutex_Lock(&m) a thread can both hold the mutexm and have paused for another thread's stop-the-world.

Python/qsbr.c Outdated

// Initialize (or reintialize) the freelist of QSBR thread states
static void
initialize_freelist(struct _qsbr_shared *shared)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybeinitialize_new_array or something like that? It's just that it's not only initing the free lists, it's also initializing all of the thread state pointers into the array (and thenew part covers the re-initialization)

colesbury reacted with thumbs up emoji
uint64_t
_Py_qsbr_advance(struct _qsbr_shared *shared)
{
return _Py_atomic_add_uint64(&shared->wr_seq, QSBR_INCR) + QSBR_INCR;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The FreeBSD version has logic to avoid making sure thatwr_seq doesn't get too far ahead fromrd_seq and if so blocks until it catches up to avoid extreme wrap around issues... Is there a reason we don't need that? :). Or should we at least add a TODO and/or an assertion?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We are using 64-bit sequence counters to avoid worrying about wrap around. FreeBSD uses 32-bit counters. (2^62 ns = ~146 years; 2^30 ns = ~1 sec)

We could consider using 32-bit counters and handling wrap around in the future. It wouldn't matter much for x86-64 and aarch64, but would probably be more efficient on some other platforms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Okay, that makes sense, FreeBSD's is buried in a typedef so I hadn't noticed it was 32-bit.

return true;
}

rd_seq = qsbr_poll_scan(qsbr->shared);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is probably just related to future scaling, but I wonder if we could pass in the goal and bail on the full-scan if any thread hasn't reached it yet. It might prevent us from updating the shared for someone else who would benefit from it though...

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's not clear to me if that case will happen often enough to be worth handling. Unfortunately, we can't really measure or test for that until we disable the GIL. With the GIL, only one thread is ever attached and active.

DinoV reacted with thumbs up emoji
Copy link
Contributor

@DinoVDinoV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM!

@colesbury
Copy link
ContributorAuthor

@sethmlarson, does this need to be included in the SBOM? This includes code derived from a FreeBSD component, but there isn't really any shared code. In other words, it's derivative enough that I think the copyright notice should be preserved, but different enough that if there were any security vulnerabilities in the FreeBSD component, they would not be relevant here.

@sethmlarson
Copy link
Contributor

@colesbury Thanks for the ping! Great question, the SBOM for CPython actually doesn't concern itself with copyright or licensing right now, it's primary use-case is for vulnerability management and tracking. Given this I don't think we need to update the SBOM for this contribution.

colesbury reacted with thumbs up emoji

@colesburycolesbury merged commit5903190 intopython:mainFeb 16, 2024
@colesburycolesbury deleted the gh-115103-qsbr branchFebruary 16, 2024 20:25
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull requestMar 4, 2024
…115180)This adds a safe memory reclamation scheme based on FreeBSD's "GUS" andquiescent state based reclamation (QSBR). The API provides a mechanismfor callers to detect when it is safe to free memory that may beconcurrently accessed by readers.
diegorusso pushed a commit to diegorusso/cpython that referenced this pull requestApr 17, 2024
…115180)This adds a safe memory reclamation scheme based on FreeBSD's "GUS" andquiescent state based reclamation (QSBR). The API provides a mechanismfor callers to detect when it is safe to free memory that may beconcurrently accessed by readers.
LukasWoodtli pushed a commit to LukasWoodtli/cpython that referenced this pull requestJan 22, 2025
…115180)This adds a safe memory reclamation scheme based on FreeBSD's "GUS" andquiescent state based reclamation (QSBR). The API provides a mechanismfor callers to detect when it is safe to free memory that may beconcurrently accessed by readers.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@DinoVDinoVDinoV approved these changes

@ericsnowcurrentlyericsnowcurrentlyAwaiting requested review from ericsnowcurrentlyericsnowcurrently is a code owner

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

@gvanrossumgvanrossumAwaiting requested review from gvanrossum

@sethmlarsonsethmlarsonAwaiting requested review from sethmlarson

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@colesbury@sethmlarson@DinoV

[8]ページ先頭

©2009-2025 Movatter.jp