Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-128002: fixasyncio.all_tasks against concurrent deallocations of tasks#128541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
kumaraditya303 merged 4 commits intopython:mainfromkumaraditya303:async-finalize
Jan 9, 2025

Conversation

kumaraditya303
Copy link
Contributor

@kumaraditya303kumaraditya303 commentedJan 6, 2025
edited by bedevere-appbot
Loading

Copy link
Contributor

@colesburycolesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This doesn't look right to me. The object can still be part way through a dealloc call when a stop the world call happens.

Can you provide more details about what bug or crash you are trying to fix?

@kumaraditya303
Copy link
ContributorAuthor

kumaraditya303 commentedJan 6, 2025
edited
Loading

This doesn't look right to me. The object can still be part way through a dealloc call when a stop the world call happens.

Please see#128002 (comment)

The bug is that the linked list holds borrowed references to tasks so it is possible that task is concurrently deallocated while it gets added to tasks list.

@kumaraditya303
Copy link
ContributorAuthor

Crash backtrace:

Objects/object.c:578: PyObject_CallFinalizerFromDealloc: Assertion failed: PyObject_CallFinalizerFromDealloc called on object with a non-zero refcount
Enable tracemalloc to get the memory block allocation traceback

object address : 0x7fb3781e1820
object refcount : 1
object type : 0x7fb3629e7d10
object type name: _asyncio.Task
object repr : <Task finished name='None' coro=<TestFreeThreading.test_all_tasks_race..main..coro() done, defined at /home/realkumaraditya/cpython/Lib/test/test_asyncio/test_free_threading.py:24> result=None>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: initialized

Thread 0x00007fb34edfd6c0 (most recent call first):
File "/home/realkumaraditya/cpython/Lib/test/test_asyncio/test_free_threading.py", line 29 in main
File "/home/realkumaraditya/cpython/Lib/asyncio/events.py", line 98 in _run
File "/home/realkumaraditya/cpython/Lib/asyncio/base_events.py", line 2033 in _run_once
File "/home/realkumaraditya/cpython/Lib/asyncio/base_events.py", line 678 in run_forever
File "/home/realkumaraditya/cpython/Lib/asyncio/base_events.py", line 707 in run_until_complete
File "/home/realkumaraditya/cpython/Lib/asyncio/runners.py", line 127 in run
File "/home/realkumaraditya/cpython/Lib/test/test_asyncio/test_free_threading.py", line 49 in runner
File "/home/realkumaraditya/cpython/Lib/threading.py", line 996 in run
File "/home/realkumaraditya/cpython/Lib/threading.py", line 1054 in _bootstrap_inner
File "/home/realkumaraditya/cpython/Lib/threading.py", line 1016 in _bootstrap

Current thread 0x00007fb34fdff6c0 (most recent call first):
File "/home/realkumaraditya/cpython/Lib/asyncio/runners.py", line 208 in _cancel_all_tasks
File "/home/realkumaraditya/cpython/Lib/asyncio/runners.py", line 71 in close
File "/home/realkumaraditya/cpython/Lib/asyncio/runners.py", line 63 inexit
File "/home/realkumaraditya/cpython/Lib/test/test_asyncio/test_free_threading.py", line 46 in runner
File "/home/realkumaraditya/cpython/Lib/threading.py", line 996 in run
File "/home/realkumaraditya/cpython/Lib/threading.py", line 1054 in _bootstrap_inner
File "/home/realkumaraditya/cpython/Lib/threading.py", line 1016 in _bootstrap

Thread 0x00007fb3542ff6c0 (most recent call first):
File "/home/realkumaraditya/cpython/Lib/asyncio/runners.py", line 208 in _cancel_all_tasks
File "/home/realkumaraditya/cpython/Lib/asyncio/runners.py", line 71 in close
File "/home/realkumaraditya/cpython/Lib/asyncio/runners.py", line 63 inexit
File "/home/realkumaraditya/cpython/Lib/test/test_asyncio/test_free_threading.py", line 46 in runner
File "/home/realkumaraditya/cpython/Lib/threading.py", line 996 in run
File "/home/realkumaraditya/cpython/Lib/threading.py", line 1054 in _bootstrap_inner
File "/home/realkumaraditya/cpython/Lib/threading.py", line 1016 in _bootstrap

Thread 0x00007fb3523fe6c0 (most recent call first):

Thread 0x00007fb3a66ebf40 (most recent call first):
File "/home/realkumaraditya/cpython/Lib/threading.py", line 1105 in join
File "/home/realkumaraditya/cpython/Lib/test/support/threading_helper.py", line 147 in start_threads
File "/home/realkumaraditya/cpython/Lib/contextlib.py", line 148 inexit
File "/home/realkumaraditya/cpython/Lib/test/test_asyncio/test_free_threading.py", line 57 in test_all_tasks_race
File "/home/realkumaraditya/cpython/Lib/unittest/case.py", line 606 in _callTestMethod
File "/home/realkumaraditya/cpython/Lib/unittest/case.py", line 660 in run
File "/home/realkumaraditya/cpython/Lib/unittest/case.py", line 716 incall
File "/home/realkumaraditya/cpython/Lib/unittest/suite.py", line 122 in run
File "/home/realkumaraditya/cpython/Lib/unittest/suite.py", line 84 incall
File "/home/realkumaraditya/cpython/Lib/unittest/suite.py", line 122 in run
File "/home/realkumaraditya/cpython/Lib/unittest/suite.py", line 84 incall
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/testresult.py", line 148 in run
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/single.py", line 58 in _run_suite
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/single.py", line 38 in run_unittest
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/single.py", line 136 in test_func
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/single.py", line 92 in regrtest_runner
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/single.py", line 139 in _load_run_test
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/single.py", line 184 in _runtest_env_changed_exc
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/single.py", line 284 in _runtest
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/single.py", line 313 in run_single_test
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/worker.py", line 83 in worker_process
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/worker.py", line 118 in main
File "/home/realkumaraditya/cpython/Lib/test/libregrtest/worker.py", line 122 in
File "/home/realkumaraditya/cpython/Lib/runpy.py", line 88 in _run_code
File "/home/realkumaraditya/cpython/Lib/runpy.py", line 198 in _run_module_as_main
Kill <WorkerThread#1 running test=test_asyncio.test_free_threading pid=35002 time=4.4 sec> process group
Kill <WorkerThread#2 running test=test_asyncio.test_free_threading pid=35001 time=4.6 sec> process group
Kill <WorkerThread#3 running test=test_asyncio.test_free_threading pid=35045 time=1.8 sec> process group

== Tests result: FAILURE ==

1 test failed:
test_asyncio.test_free_threading

50 tests OK.

Total duration: 5 min 30 sec
Total tests: run=800
Total test files: run=51 failed=1
Result: FAILURE

@kumaraditya303kumaraditya303 marked this pull request as draftJanuary 6, 2025 15:40
@colesbury
Copy link
Contributor

colesbury commentedJan 6, 2025
edited
Loading

Do you have a command to reproduce the crash?

You wrote that:

[A task object] can be concurrently deallocated while list is being read in another thread as deallocation doesn't hold the state lock

There are two things that confuse me:

  • Theunregister_task() function holds the state lock, which prevents concurrent traversal of the list for that important function.
  • The state lock isn't held for the entire deallocation sequence, but it's also not atomic with respect to a stop the world pause. That makes me think that it's likely to have similar problems. For example, theunregister_task function may block on acquiring the state lock allowing a stop-the-world pause to happen concurrently (while the task is blocked on the lock). There may also be other points where a stop the world call can happen, such as when calling other objects' destructors.

Additionally, the way this linked list works worries me:

  • It's essentially a list of weak references, just not implemented usingPyWeakRef object for performance reasons
  • Weakrefs are cleared later on during dealloc -- not intp_finalize -- when we know that the object will definitely be destroyed.
  • Weakrefs havespecial handling for dealing with similar concurrent deallocations.
  • Weakrefs also have special hooks in the GC, but I'm not sure that's relevant here.

Anyways, I think it may be worth trying to match the weakref implementation since we know that it works.

@colesbury
Copy link
Contributor

From looking at the crash backtrace, I think the important thing here would be to use_Py_TryIncref inall_tasks like we do in_PyWeakref_GET_REF. This will avoid incref'ing objects that have zero refcount (and are either about to be deallocated or partway through deallocation).

Something like:

llist_for_each_safe(node,&state->asyncio_tasks_head) {TaskObj*task=llist_data(node,TaskObj,task_node);if (_Py_TryIncref(task)) {if (_PyList_AppendTakeRef(tasks, (PyObject*)task)<0) {                ...            }        }    }

@kumaraditya303
Copy link
ContributorAuthor

Do you have a command to reproduce the crash?

Runningtest_asyncio.test_free_threading in a loop on main reproduces the crash.
env TSAN_OPTIONS=suppressions={$PWD}/Tools/tsan/suppressions_free_threading.txt ./python -m test test_asyncio.test_free_threading -j 4 -F

@kumaraditya303
Copy link
ContributorAuthor

Anyways, I think it may be worth trying to match the weakref implementation since we know that it works.

So I think the following changes need to be made:

  • Moveunregister_task to be intp_dealloc so that it is called when task is definitely getting deallocated.
  • Inall_tasks use_Py_TryIncref so that if task is concurrently deallocated, it doesn't gets added to the tasks list.
  • The state lock is held while iteration of linked list as it was done before this PR
  • No need for stop the world pause in all_tasks

Did I get it correctly? Aside, thanks for your explanation! I didn't know that it was possible that stop the world could happen while a deallocator is running concurrently.

@colesbury
Copy link
Contributor

Yes, but on second thought let's keep theunregister_task in the finalizer for now. I'm not entirely sure where it should be. Normal dealloc calls clear weakrefs relatively late (after finalizers are invoked), but the cyclic GC clears them earlier (before finalizers). I can't think of any concrete problems immediately, but I'm a bit worried about the ordering.

@kumaraditya303kumaraditya303 changed the titlegh-128002: use a stop the world event inasyncio.all_tasksgh-128002: fixasyncio.all_tasks against concurrent deallocations of tasksJan 7, 2025
@kumaraditya303kumaraditya303 marked this pull request as ready for reviewJanuary 8, 2025 11:11
@kumaraditya303kumaraditya303 merged commit7dc41ad intopython:mainJan 9, 2025
45 checks passed
@kumaraditya303kumaraditya303 deleted the async-finalize branchJanuary 9, 2025 15:56
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@ZeroIntensityZeroIntensityZeroIntensity left review comments

@colesburycolesburycolesbury approved these changes

@ericsnowcurrentlyericsnowcurrentlyAwaiting requested review from ericsnowcurrently

@1st11st1Awaiting requested review from 1st11st1 is a code owner

@asvetlovasvetlovAwaiting requested review from asvetlovasvetlov is a code owner

@willingcwillingcAwaiting requested review from willingcwillingc is a code owner

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

test_asyncio.test_free_threading.TestCFreeThreading.test_all_tasks_race crashes intermittently in free-threaded builds
3 participants
@kumaraditya303@colesbury@ZeroIntensity

[8]ページ先頭

©2009-2025 Movatter.jp