gh-112087: Make list_repr and list_length to be thread-safe#114582

Merged

corona10 merged 3 commits into python:main from corona10:gh-112087-repr-length

Jan 26, 2024

Conversation

Member

corona10 commented Jan 26, 2024 (edited by bedevere-app bot)

Issue: Make list objects thread-safe in --disable-gil builds #112087

pythongh-112087: Make list_repr and list_length to be thread-safe

9b8c0c7

corona10 added skip news topic-free-threading labels

Jan 26, 2024

corona10 requested review from DinoV, colesbury and serhiy-storchaka

January 26, 2024 08:15

bedevere-app bot added the awaiting core review label

Jan 26, 2024

bedevere-app bot mentioned this pull request

Jan 26, 2024

Make list objects thread-safe in --disable-gil builds #112087

Closed

4 tasks

Member, Author

corona10 commented Jan 26, 2024 (edited)

I only worked on two methods because list_length is the first method that requires _Py_atomic_load_ssize_relaxed, which leads to a fragmented implementation. Until now, just using the Py_BEGIN_CRITICAL_SECTION API or the @critical_section annotation was enough, but from now on it is not.

So at this point I would like to hear other core devs' opinions about this.

I expect that @serhiy-storchaka can propose a better solution :)
Also, @ambv may have other ideas about this.
The reference implementation is based on colesbury/nogil-3.12@df4c51f82b

Once we decide how to handle the fragmented implementations with the atomic API, we can start to implement the rest in a unified and satisfactory way.
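To make the two styles mentioned above concrete, here is a minimal sketch, not CPython code. It assumes GCC or Clang (for the `__atomic_load_n` builtin) and POSIX threads; `toy_list`, `toy_list_init`, `toy_length_locked` and `toy_length_atomic` are hypothetical names, with a mutex standing in for the per-object critical section and `size` standing in for ob_size.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a list object; not CPython's PyListObject. */
typedef struct {
    pthread_mutex_t lock;  /* plays the role of the per-object critical section */
    ptrdiff_t size;        /* plays the role of ob_size */
} toy_list;

static void toy_list_init(toy_list *l, ptrdiff_t size)
{
    assert(size >= 0);
    pthread_mutex_init(&l->lock, NULL);
    l->size = size;  /* plain store: the object is not shared with other threads yet */
}

/* Style 1: read the size while holding the lock (critical-section style). */
static ptrdiff_t toy_length_locked(toy_list *l)
{
    pthread_mutex_lock(&l->lock);
    ptrdiff_t n = l->size;  /* plain read is fine while the lock is held */
    pthread_mutex_unlock(&l->lock);
    return n;
}

/* Style 2: read the size without the lock, using a relaxed atomic load. */
static ptrdiff_t toy_length_atomic(toy_list *l)
{
    return __atomic_load_n(&l->size, __ATOMIC_RELAXED);
}
```

Both functions return the same value in a single-threaded program; the difference is only in what they cost and what they guarantee when another thread resizes the list concurrently.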

corona10 requested a review from ambv

January 26, 2024 08:16

Member

encukou commented Jan 26, 2024

This looks maintainable enough to me.

I can't quite check if it's “thread safe” -- AFAIK we don't really define what that is, yet.
One thing that's not clear to me is whether Py_SIZE itself should be atomic, or if all calls to it should be replaced by an atomic load (or placed in critical sections).

corona10 commented

Jan 26, 2024

Objects/listobject.c Outdated

		#ifdef Py_GIL_DISABLED
		return _Py_atomic_load_ssize_relaxed(&(_PyVarObject_CAST(a)->ob_size));
		#else
		return Py_SIZE(a);

Member, Author

corona10 commented Jan 26, 2024 (edited)

@encukou said that, if possible, making Py_SIZE thread-safe for the free-threaded build would be better than implementing list_length with a macro :)

Member

encukou commented Jan 26, 2024

It could. I don't know enough about the grand plan to know for sure :)

Contributor

colesbury commented Jan 26, 2024

I think pushing the atomic load down to Py_SIZE() is reasonable and will probably simplify call sites.

That was something I tried and then abandoned in nogil-3.9 because at the time the macro was also used as an l-value, like Py_SIZE(ob) = 1. Fortunately, that's no longer a concern.
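The l-value problem can be illustrated with a small sketch. It assumes GCC or Clang (for the `__atomic_*` builtins); `toy_var_object`, `TOY_SIZE_LVALUE`, `toy_size` and `toy_set_size` are hypothetical names and not the real CPython definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variable-size object header; not CPython's PyVarObject. */
typedef struct {
    ptrdiff_t ob_size;
} toy_var_object;

/* Old style: the macro expands to the field itself, so callers can
 * assign through it, e.g. TOY_SIZE_LVALUE(ob) = 1. An atomic load
 * cannot be hidden behind an expression that must stay assignable. */
#define TOY_SIZE_LVALUE(ob) ((ob)->ob_size)

/* New style: a function returns a value, so `toy_size(ob) = 1` does not
 * compile, and the body is free to use an atomic load. */
static inline ptrdiff_t toy_size(const toy_var_object *ob)
{
    assert(ob != NULL);
    return __atomic_load_n(&ob->ob_size, __ATOMIC_RELAXED);
}

/* Writes then go through an explicit setter instead of an assignment. */
static inline void toy_set_size(toy_var_object *ob, ptrdiff_t size)
{
    assert(ob != NULL);
    __atomic_store_n(&ob->ob_size, size, __ATOMIC_RELAXED);
}
```

Once no caller assigns through the accessor, switching its body between a plain read and an atomic load becomes a local change.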

Contributor

colesbury commented Jan 26, 2024

Alternatively, we could just do the atomic load in PyList_GET_SIZE() and consistently use that within this file. I think I'd prefer that.

There are some loops that use Py_SIZE() on immutable objects that might slow down if we make Py_SIZE use an atomic load. For example, in codeobject.c:

cpython/Objects/codeobject.c

Lines 433 to 436 in 30b7b4f

	while (entry_point < Py_SIZE(co) &&
	       _PyCode_CODE(co)[entry_point].op.code != RESUME) {
	    entry_point++;
	}

The compiler will currently lift the Py_SIZE() load outside the loop. But it won't do that optimization with an atomic load.

It's hard to know if these sorts of things will make a difference overall, but it's often easier to avoid the potential performance issues in the first place than trying to find, benchmark, and fix the issues later.
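The loop concern can be sketched as follows. This is an illustration under assumptions, not a benchmark: it assumes GCC or Clang (for `__atomic_load_n`), and `toy_seq`, `find_plain`, `find_atomic`, `find_hoisted` and `toy_selfcheck` are hypothetical names. All three functions compute the same result; they differ only in how often the size is loaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical immutable sequence: size never changes after creation. */
typedef struct {
    ptrdiff_t size;
    const int *items;
} toy_seq;

/* Plain load in the condition: the compiler may read seq->size once
 * and keep it in a register, since nothing in the loop writes to it. */
static ptrdiff_t find_plain(const toy_seq *seq, int target)
{
    ptrdiff_t i = 0;
    while (i < seq->size && seq->items[i] != target) {
        i++;
    }
    return i;
}

/* Atomic load in the condition: compilers typically perform a fresh
 * load on every iteration instead of hoisting it out of the loop. */
static ptrdiff_t find_atomic(const toy_seq *seq, int target)
{
    ptrdiff_t i = 0;
    while (i < __atomic_load_n(&seq->size, __ATOMIC_RELAXED) &&
           seq->items[i] != target) {
        i++;
    }
    return i;
}

/* Manual hoist: one atomic load up front, then a plain local. */
static ptrdiff_t find_hoisted(const toy_seq *seq, int target)
{
    const ptrdiff_t n = __atomic_load_n(&seq->size, __ATOMIC_RELAXED);
    ptrdiff_t i = 0;
    while (i < n && seq->items[i] != target) {
        i++;
    }
    return i;
}

/* Small usage example: all three variants agree on the index found. */
static void toy_selfcheck(void)
{
    static const int data[] = {7, 8, 9};
    const toy_seq seq = {3, data};
    assert(find_plain(&seq, 8) == find_atomic(&seq, 8));
    assert(find_plain(&seq, 8) == find_hoisted(&seq, 8));
}
```

The manual hoist is only valid when the size cannot change during the loop, which is the case for the immutable objects mentioned above.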

Contributor

colesbury commented Jan 26, 2024

Sorry, my opinion keeps changing as I read through listobject.c. Changing all the Py_SIZE() calls to PyList_GET_SIZE() seems like it would be a lot of noise. To be honest, I think there are a lot of reasonable approaches, and whatever you decide to do is fine with me. Here is my current thinking:

- Most of the Py_SIZE() calls are reads from functions that will be within critical sections in the future. Those are fine as is.
- If we read ob_size outside of a critical section, it should use an atomic load. Either directly or indirectly, such as by calling list_length().

For list_repr, my inclination would be to push the zero-size check down into the critical section, but using an atomic load also seems fine.
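The two options for the zero-size check can be sketched like this. It is a toy model rather than the real list_repr: it assumes GCC or Clang and POSIX threads, and `toy_list`, `format_items`, `toy_repr_locked_check` and `toy_repr_atomic_check` are hypothetical names, with a mutex standing in for the critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed-capacity list of ints; not CPython's list. */
typedef struct {
    pthread_mutex_t lock;  /* stand-in for the critical section */
    ptrdiff_t size;        /* stand-in for ob_size */
    int items[8];
} toy_list;

/* Caller must hold l->lock. Writes e.g. "[1, 2, 3]" into buf,
 * truncating if the buffer is too small. Handles size == 0 as "[]". */
static void format_items(const toy_list *l, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    size_t used = (size_t)snprintf(buf, cap, "[");
    for (ptrdiff_t i = 0; i < l->size && used < cap; i++) {
        int n = snprintf(buf + used, cap - used, "%s%d",
                         i ? ", " : "", l->items[i]);
        used += (size_t)n;
    }
    if (used < cap) {
        snprintf(buf + used, cap - used, "]");
    }
}

/* Option A: do the emptiness check inside the lock, with a plain read. */
static void toy_repr_locked_check(toy_list *l, char *buf, size_t cap)
{
    pthread_mutex_lock(&l->lock);
    if (l->size == 0) {
        snprintf(buf, cap, "[]");
    }
    else {
        format_items(l, buf, cap);
    }
    pthread_mutex_unlock(&l->lock);
}

/* Option B: check emptiness before taking the lock, with an atomic load.
 * The size may change before the lock is taken, so format_items must
 * still cope with an empty list. */
static void toy_repr_atomic_check(toy_list *l, char *buf, size_t cap)
{
    if (__atomic_load_n(&l->size, __ATOMIC_RELAXED) == 0) {
        snprintf(buf, cap, "[]");
        return;
    }
    pthread_mutex_lock(&l->lock);
    format_items(l, buf, cap);
    pthread_mutex_unlock(&l->lock);
}
```

Option A keeps every read of the size under the lock; option B avoids taking the lock for empty lists at the price of one atomic read outside it.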

Contributor

erlend-aasland commented Jan 26, 2024

@colesbury wrote:

Fortunately, that's no longer a concern.

Thanks to @vstinner for fixing this! 👏

Member, Author

corona10 commented Jan 26, 2024 (edited)

Hmm... well, from the standpoint of keeping the implementation details of the list object itself as unchanged as possible, updating PyList_GET_SIZE() to be an atomic operation is worth doing. It's a good compromise, I guess.

If we need to make Py_SIZE() an atomic operation, we can handle it later.

Member

serhiy-storchaka commented Jan 26, 2024

I think that either all reads and writes of ob_size should be atomic, or they all should be in critical sections.

Member, Author

corona10 commented Jan 26, 2024 (edited)

@colesbury
What about making Py_SIZE() atomic instead of handling this in each runtime implementation?
A side effect is that there will be a minor impact on types such as tuple, but not a huge one IIUC.

If we are okay with this, I will submit a new PR.

corona10 mentioned this pull request

Jan 26, 2024

Update Py_SIZE() as the atomic operation for free-threaded build. #114603

Closed

Address code review

7009cec

Member, Author

corona10 commented Jan 26, 2024

@colesbury

Updated! The implementation is now clearer and easier to maintain.
Now I understand our consensus about how to handle atomic operations in the runtime.
I will submit PRs with similar approaches.

nit

4d0fe6a

Contributor

colesbury commented Jan 26, 2024

@serhiy-storchaka wrote:

I think that either all reads and writes of ob_size should be atomic, or they all should be in critical sections.

I think we should follow a slightly different pattern in list:

- Writes to ob_size should be atomic AND in critical sections
- Reads from ob_size should be either atomic OR in critical sections

Exceptions: initialization, deallocation, and certain special functions can use plain accesses without critical sections (e.g., PyList_New(), list_dealloc, list_traverse).

The motivation behind this is that we will have some accesses outside of critical sections (see "Optimistic dict and list Access Summary" in PEP 703), so we can't use critical sections everywhere, but we will use them for most operations.

At the same time, I think we should prefer plain, non-atomic reads from within critical sections. We could conservatively use atomic reads and still be correct, but my experience is that adding atomic operations where they are not necessary tends to hide data races from thread sanitizer.

Note that writes to ob_size have to be atomic even if they are within critical sections because there might be concurrent reads.
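The access pattern described above can be sketched in a toy model. This is not CPython code: it assumes GCC or Clang (for the `__atomic_*` builtins) and POSIX threads, and `toy_list`, `toy_init`, `toy_append`, `toy_length` and `toy_sum` are hypothetical names, with a mutex standing in for the critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_CAPACITY 16

/* Hypothetical fixed-capacity list; not CPython's PyListObject. */
typedef struct {
    pthread_mutex_t lock;  /* stand-in for the critical section */
    ptrdiff_t size;        /* stand-in for ob_size */
    int items[TOY_CAPACITY];
} toy_list;

/* Exception case: during initialization the object is not shared yet,
 * so plain accesses without the lock are fine. */
static void toy_init(toy_list *l)
{
    assert(l != NULL);
    pthread_mutex_init(&l->lock, NULL);
    l->size = 0;
}

/* Write: inside the critical section AND atomic, because toy_length()
 * may read the size concurrently without holding the lock. */
static int toy_append(toy_list *l, int value)
{
    int ok = 0;
    pthread_mutex_lock(&l->lock);
    ptrdiff_t n = l->size;  /* plain read: we hold the lock */
    if (n < TOY_CAPACITY) {
        l->items[n] = value;
        /* Relaxed is enough here only because lock-free readers in this
         * sketch look at the size alone and never at the items. */
        __atomic_store_n(&l->size, n + 1, __ATOMIC_RELAXED);
        ok = 1;
    }
    pthread_mutex_unlock(&l->lock);
    return ok;
}

/* Read outside a critical section: must be atomic. */
static ptrdiff_t toy_length(toy_list *l)
{
    return __atomic_load_n(&l->size, __ATOMIC_RELAXED);
}

/* Read inside a critical section: plain, non-atomic reads are preferred. */
static long toy_sum(toy_list *l)
{
    long total = 0;
    pthread_mutex_lock(&l->lock);
    for (ptrdiff_t i = 0; i < l->size; i++) {
        total += l->items[i];
    }
    pthread_mutex_unlock(&l->lock);
    return total;
}
```

Keeping the reads in `toy_sum` plain means a race detector can still flag a missing lock there, which matches the thread sanitizer argument above.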

Member

serhiy-storchaka commented Jan 26, 2024

You at least need to make writing the list size atomic.

Contributor

colesbury commented Jan 26, 2024

@serhiy-storchaka wrote:

You at least need to make writing the list size atomic.

I agree, but I don't think we should do that in this PR. I think it's easier to write and review the PRs organized around functions than around fields (i.e., "fix list_repr" instead of "fix ob_size"). It keeps the changes smaller and close together.

The current changes look sufficient to me if we later address all the functions that write to ob_size.