asynciomodule.c performance degradation in free-threading under heavy load #144337

Open
@jgrave3


Bug report

Bug description:

We run asyncio-based Python servers that create event loops, and we are trying to move them to the free-threaded build. At roughly 800 QPS we noticed performance regressions relative to the GIL build. Tracing showed that most of the regression is in the `cancel_all_tasks` call, at the point where it calls the C implementation to get all tasks for the loop in question. My LLM suggested that the bottleneck is the `_PyEval_StopTheWorld` call this requires, and proposed an alternative implementation that uses per-thread task buckets to avoid stopping the world.
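To make the code path concrete: `asyncio.run()` cancels any tasks still pending when the main coroutine returns, and it has to enumerate the loop's tasks to do so. The small sketch below only demonstrates that teardown behaviour; the `background` and `main` names are illustrative and not part of our servers.

```python
import asyncio

cancelled = []  # records which coroutines saw a cancellation at teardown


async def background():
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # Reached when asyncio.run() cancels leftover tasks during shutdown.
        cancelled.append("background")
        raise


async def main():
    # Spawn a task and return without awaiting it.
    asyncio.create_task(background())
    await asyncio.sleep(0)  # give the background task a chance to start
    # all_tasks() here sees the main task plus the background task.
    return len(asyncio.all_tasks())


n_tasks = asyncio.run(main())
print(n_tasks, cancelled)
```

Every `asyncio.run()` call goes through this teardown, so a server that churns through many short-lived loops pays the task-enumeration cost once per loop.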

It created the following repro script to show the performance issue. With its suggested changes, the script creates 10x as many loops and the churn rate increases 83x under free-threading. (The changes would be `#ifdef`-guarded, because they would otherwise cause a 10x regression in loop churn rate under the standard build.)
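For readers unfamiliar with the per-thread bucket idea, here is a rough pure-Python model of it. This is only a sketch under my own assumptions: the `TaskRegistry` class and its methods are hypothetical, and it is neither CPython's C implementation nor the patch the LLM proposed. Each thread registers tasks in its own bucket, and a scan takes one bucket lock at a time instead of pausing every thread.

```python
import threading


class TaskRegistry:
    """Hypothetical model: one task bucket per thread, each with its own lock."""

    def __init__(self):
        self._buckets = []                      # list of (lock, set of (loop, task))
        self._buckets_lock = threading.Lock()   # guards the list itself, not the tasks
        self._local = threading.local()

    def _my_bucket(self):
        bucket = getattr(self._local, "bucket", None)
        if bucket is None:
            bucket = (threading.Lock(), set())
            self._local.bucket = bucket
            with self._buckets_lock:
                self._buckets.append(bucket)    # sketch only: buckets are never removed
        return bucket

    def register(self, loop, task):
        lock, tasks = self._my_bucket()
        with lock:
            tasks.add((loop, task))

    def unregister(self, loop, task):
        # Assumes the task is unregistered on the thread that registered it.
        lock, tasks = self._my_bucket()
        with lock:
            tasks.discard((loop, task))

    def all_tasks(self, loop):
        # Snapshot the bucket list, then lock one bucket at a time.
        with self._buckets_lock:
            buckets = list(self._buckets)
        found = set()
        for lock, tasks in buckets:
            with lock:
                found.update(t for (owner, t) in tasks if owner is loop)
        return found


registry = TaskRegistry()
loops = [object() for _ in range(4)]  # stand-ins for event loops


def worker(loop, idx):
    for n in range(3):
        registry.register(loop, f"task-{idx}-{n}")
    registry.unregister(loop, f"task-{idx}-0")


threads = [threading.Thread(target=worker, args=(loop, i)) for i, loop in enumerate(loops)]
for t in threads:
    t.start()
for t in threads:
    t.join()

counts = [len(registry.all_tasks(loop)) for loop in loops]
print(counts)
```

The trade-off in this model is that a scan only blocks a thread while that thread's own bucket is being read, at the cost of extra bookkeeping on every register and unregister.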

I still need to validate the changes it proposed but, without doing something, we would have to avoid asyncio + free-threading because of this.

    import asyncio
    import threading
    import time
    import argparse
    import sys

    # Configuration
    NUM_CHURN_THREADS = 20  # Number of threads creating/destroying loops
    RUN_DURATION = 10  # Seconds to run
    TARGET_LOOPS_PER_SEC = 800  # Total target loops per second across all threads

    stop_event = threading.Event()


    def churn_worker(worker_id):
        """Continuously creates and destroys event loops."""
        count = 0
        while not stop_event.is_set():
            # asyncio.run creates a new event loop, runs the coroutine, and closes it.
            asyncio.run(asyncio.sleep(0))
            count += 1
            # Optional: slight delay to throttle if needed, but we want max contention for this demo
            # time.sleep(0.001)
        return count


    async def monitor_coro():
        """Calls all_tasks continuously."""
        count = 0
        start_time = time.time()
        while not stop_event.is_set():
            # This triggers the scan of all tasks in all threads
            tasks = asyncio.all_tasks()
            count += 1
            # Yield to allow other things to happen on this loop, though we mostly care about the scan cost
            await asyncio.sleep(0)
        duration = time.time() - start_time
        return count, duration


    def monitor_worker(results_list):
        """Runs a persistent loop that calls all_tasks."""
        try:
            count, duration = asyncio.run(monitor_coro())
            results_list.append((count, duration))
        except Exception as e:
            print(f"Monitor failed: {e}")


    def main():
        parser = argparse.ArgumentParser(description="Asyncio Performance Demo")
        parser.add_argument("--threads", type=int, default=NUM_CHURN_THREADS, help="Number of churn threads")
        parser.add_argument("--duration", type=int, default=RUN_DURATION, help="Duration in seconds")
        args = parser.parse_args()

        print(f"Starting generic asyncio benchmark on {sys.version}...")
        print(f"Configuration: {args.threads} churn threads, {args.duration}s duration.")

        threads = []

        # Start churn threads
        # We use a ThreadPoolExecutor or just raw threads. Raw threads are fine.
        # To measure iterations, we can use a mutable list or class
        churn_counts = [0] * args.threads

        def wrapped_churn(idx):
            churn_counts[idx] = churn_worker(idx)

        for i in range(args.threads):
            t = threading.Thread(target=wrapped_churn, args=(i,))
            t.start()
            threads.append(t)

        # Start monitor thread
        monitor_results = []
        monitor_thread = threading.Thread(target=monitor_worker, args=(monitor_results,))
        monitor_thread.start()
        threads.append(monitor_thread)

        # Run for specified duration
        try:
            time.sleep(args.duration)
        except KeyboardInterrupt:
            pass
        finally:
            stop_event.set()

        # Join all
        for t in threads:
            t.join()

        # Aggregate results
        total_loops = sum(churn_counts)
        loops_per_sec = total_loops / args.duration

        monitor_calls, mon_duration = 0, 1
        if monitor_results:
            monitor_calls, mon_duration = monitor_results[0]
        all_tasks_per_sec = monitor_calls / mon_duration if mon_duration > 0 else 0

        print("-" * 40)
        print(f"Results:")
        print(f"  Total Event Loops Created: {total_loops}")
        print(f"  Loop Churn Rate: {loops_per_sec:.2f} loops/sec")
        print(f"  all_tasks() Calls: {monitor_calls}")
        print(f"  all_tasks() Rate: {all_tasks_per_sec:.2f} calls/sec")
        print("-" * 40)
        print("Interpretation:")
        print("  Higher 'Loop Churn Rate' indicates less blocking during thread/task destruction.")
        print("  Higher 'all_tasks() Rate' indicates faster scanning of tasks.")
        print("  On unoptimized Python (with StopTheWorld), these numbers should be significantly lower")
        print("  concurrently due to the global pause.")


    if __name__ == "__main__":
        main()
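When comparing numbers between the two builds, it helps to confirm which interpreter actually ran the script. A small helper along these lines could be printed next to the results; `describe_build` is a hypothetical name, and the sketch assumes CPython 3.13 or later for `sys._is_gil_enabled()` (older versions fall back to reporting the GIL as enabled).

```python
import sys
import sysconfig


def describe_build():
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on CPython 3.13+; assume the GIL is on if it is missing.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_check() if gil_check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


info = describe_build()
print(info)
```

A free-threaded build can still run with the GIL re-enabled, for example when an extension module that does not support free-threading is imported, so both values are worth recording.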

CPython versions tested on:

3.14

Operating systems tested on:

Linux


