NotificationsYou must be signed in to change notification settings
Fork33.7k
Star70.4k

Make concurrent iteration over pairwise, combinations, permutations, cwr, product, etc. from itertools safe under free-threading #123471

Open

Make concurrent iteration over pairwise, combinations, permutations, cwr, product, etc. from itertools safe under free-threading#123471

Assignees

Labels

extension-modulesC modules in the Modules dirsprinttopic-free-threadingtype-bugAn unexpected behavior, bug, or error

Description

eendebakpt

opened

on Aug 29, 2024

Bug report

Bug description:

Several methods from the C implementation of the itertools module are not yet safe to use under the free-threading build. In this issue we list several issues to be addressed. The issues below are discussed foritertools.product, but the issues are similar for the other classes.

When iterating overproduct the result tuple is re-used when the reference count is 1. We can use the new_PyObject_IsUniquelyReferenced method to perform the check whether we can re-use the tuple. (this issue was also reported inenum_next and pairwise_next can result in tuple elements with zero reference count in free-threading build #121464)
On the first invocation ofproduct a new result is constructed.

cpython/Modules/itertoolsmodule.c

Lines 2038 to 2044 in58ce131

	if (result==NULL) {
	/* On the first pass, return an initial tuple filled with the
	first element from each pool. */
	result=PyTuple_New(npools);
	if (result==NULL)
	gotoempty;
	lz->result=result;

This is not thread-safe, as multiple threads could haveresult == NULL evaluate to true. We could move the construction of theproductobject.result to the constructor ofproduct. This does mean thatproduct will use more memory before the first invocation ofnext. This seems to be acceptable, as constructing aproduct without iterating over it seems rare in practice.
The tuple also needs to be filled with data. Forproduct it seems safe to do this in the constructor, as the data is coming
fromproductobject->pools which is a tuple of tuples. But forpairwise the data is coming from an iterable

cpython/Modules/itertoolsmodule.c

Lines 337 to 343 in58ce131

	if (old==NULL) {
	old= (*Py_TYPE(it)->tp_iternext)(it);
	Py_XSETREF(po->old,old);
	if (old==NULL) {
	Py_CLEAR(po->it);
	returnNULL;
	}

which could be a generator. Reading data from the iterator before the first invocation ofpairwise_next seems like a behavior change we do not want to make.

An alternative is to use some kind of locking insideproduct_next, but the locking should not add any overhead in the common path otherwise the single-thread performance will suffer.

In case iterables are exhausted some cleaning up is done. For example inpairwise_next at

cpython/Modules/itertoolsmodule.c

Lines 352 to 356 in58ce131

	if (new==NULL) {
	Py_CLEAR(po->it);
	Py_CLEAR(po->old);
	Py_DECREF(old);
	returnNULL;

This cleaning up is not safe in concurrent iteration. Instead we can defer the cleaning up untill the object itself is decallocated (this approach was used forreversed, seehttps://github.com/python/cpython/pull/120971/files#r1653313765)

Actually constructing the new result requires some care as well. Even if we are fine with having funny results under concurrent iteration (see the discussionSequence iterator thread-safety #120496), the concurrent iteration should not corrupt the interpreter. For example this code is not safe:

cpython/Modules/itertoolsmodule.c

Lines 2077 to 2088 in58ce131

	indices[i]++;
	if (indices[i]==PyTuple_GET_SIZE(pool)) {
	/* Roll-over and advance to next pool */
	indices[i]=0;
	elem=PyTuple_GET_ITEM(pool,0);
	Py_INCREF(elem);
	oldelem=PyTuple_GET_ITEM(result,i);
	PyTuple_SET_ITEM(result,i,elem);
	Py_DECREF(oldelem);
	}else {
	/* No rollover. Just increment and stop here. */
	elem=PyTuple_GET_ITEM(pool,indices[i]);

If two threads both incrementindices[i] the check on line 2078 is never true end we end up indexingpool withPyTuple_GET_ITEM outside the bounds on line 2088. Here we could change the check intoindices[i] >= PyTuple_GET_SIZE(pool). That is equivalent for the single-threaded case, but does not lead to out-of-bounds indexing in the multi-threaded case (although it does lead to funny results!)

@rhettinger @colesbury Any input on the points above would be welcome.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

Metadata

Status

Todo

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Make concurrent iteration over pairwise, combinations, permutations, cwr, product, etc. from itertools safe under free-threading #123471

Description

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions