Description
Bug report
Bug description:
We run asyncio Python servers under free-threading, and they create event loops. At, say, 800 QPS we noticed performance regressions relative to the GIL build. Tracing shows that most of the regression is in the `cancel_all_tasks` call, at the point where it calls the C implementation to get all tasks for the loop in question. My LLM suggested that the bottleneck is the `_PyEval_StopTheWorld` call this requires, and proposed an alternative implementation that uses per-thread task buckets to avoid stopping the world.
It created the following repro script to show the performance issue. With its suggested changes, the script creates 10X as many loops and the churn rate increases 83X under free-threading (the changes would be ifdef-guarded to avoid a 10X regression in loop churn rate under the standard build).
I still need to validate the proposed changes, but unless something is done we would have to avoid combining asyncio with free-threading because of this.
```python
import asyncio
import threading
import time
import argparse
import sys

# Configuration
NUM_CHURN_THREADS = 20  # Number of threads creating/destroying loops
RUN_DURATION = 10  # Seconds to run
TARGET_LOOPS_PER_SEC = 800  # Total target loops per second across all threads

stop_event = threading.Event()


def churn_worker(worker_id):
    """Continuously creates and destroys event loops."""
    count = 0
    while not stop_event.is_set():
        # asyncio.run creates a new event loop, runs the coroutine, and closes it.
        asyncio.run(asyncio.sleep(0))
        count += 1
        # Optional: slight delay to throttle if needed, but we want max contention for this demo
        # time.sleep(0.001)
    return count


async def monitor_coro():
    """Calls all_tasks continuously."""
    count = 0
    start_time = time.time()
    while not stop_event.is_set():
        # This triggers the scan of all tasks in all threads
        tasks = asyncio.all_tasks()
        count += 1
        # Yield to allow other things to happen on this loop, though we mostly care about the scan cost
        await asyncio.sleep(0)
    duration = time.time() - start_time
    return count, duration


def monitor_worker(results_list):
    """Runs a persistent loop that calls all_tasks."""
    try:
        count, duration = asyncio.run(monitor_coro())
        results_list.append((count, duration))
    except Exception as e:
        print(f"Monitor failed: {e}")


def main():
    parser = argparse.ArgumentParser(description="Asyncio Performance Demo")
    parser.add_argument("--threads", type=int, default=NUM_CHURN_THREADS, help="Number of churn threads")
    parser.add_argument("--duration", type=int, default=RUN_DURATION, help="Duration in seconds")
    args = parser.parse_args()

    print(f"Starting generic asyncio benchmark on {sys.version}...")
    print(f"Configuration: {args.threads} churn threads, {args.duration}s duration.")

    threads = []

    # Start churn threads
    # We use a ThreadPoolExecutor or just raw threads. Raw threads are fine.
    # To measure iterations, we can use a mutable list or class
    churn_counts = [0] * args.threads

    def wrapped_churn(idx):
        churn_counts[idx] = churn_worker(idx)

    for i in range(args.threads):
        t = threading.Thread(target=wrapped_churn, args=(i,))
        t.start()
        threads.append(t)

    # Start monitor thread
    monitor_results = []
    monitor_thread = threading.Thread(target=monitor_worker, args=(monitor_results,))
    monitor_thread.start()
    threads.append(monitor_thread)

    # Run for specified duration
    try:
        time.sleep(args.duration)
    except KeyboardInterrupt:
        pass
    finally:
        stop_event.set()

    # Join all
    for t in threads:
        t.join()

    # Aggregate results
    total_loops = sum(churn_counts)
    loops_per_sec = total_loops / args.duration

    monitor_calls, mon_duration = 0, 1
    if monitor_results:
        monitor_calls, mon_duration = monitor_results[0]
    all_tasks_per_sec = monitor_calls / mon_duration if mon_duration > 0 else 0

    print("-" * 40)
    print(f"Results:")
    print(f"  Total Event Loops Created: {total_loops}")
    print(f"  Loop Churn Rate: {loops_per_sec:.2f} loops/sec")
    print(f"  all_tasks() Calls: {monitor_calls}")
    print(f"  all_tasks() Rate: {all_tasks_per_sec:.2f} calls/sec")
    print("-" * 40)
    print("Interpretation:")
    print("  Higher 'Loop Churn Rate' indicates less blocking during thread/task destruction.")
    print("  Higher 'all_tasks() Rate' indicates faster scanning of tasks.")
    print("  On unoptimized Python (with StopTheWorld), these numbers should be significantly lower")
    print("  concurrently due to the global pause.")


if __name__ == "__main__":
    main()
```
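
To make the "per-thread task buckets" idea from the description concrete, here is a rough pure-Python sketch. It is not CPython's C implementation and not the patch the LLM proposed (which is not shown in this report); `PerThreadTaskRegistry` and all of its methods are hypothetical names used only for illustration. The assumption is that each thread records its tasks in its own bucket behind its own lock, so a reader collecting the tasks for one loop only takes short per-bucket locks instead of pausing every thread.

```python
import threading


class _Bucket:
    """Tasks registered by a single thread, guarded by that bucket's own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.tasks = set()  # set of (loop, task) pairs


class PerThreadTaskRegistry:
    """Hypothetical illustration of per-thread task buckets (not CPython's code)."""

    def __init__(self):
        self._local = threading.local()        # caches each thread's bucket
        self._buckets = []                     # every bucket ever created
        self._buckets_lock = threading.Lock()  # guards only the bucket list

    def _my_bucket(self):
        bucket = getattr(self._local, "bucket", None)
        if bucket is None:
            # First use on this thread: create the bucket and publish it once.
            bucket = _Bucket()
            self._local.bucket = bucket
            with self._buckets_lock:
                self._buckets.append(bucket)
        return bucket

    def register(self, loop, task):
        bucket = self._my_bucket()
        with bucket.lock:  # contended only while a reader scans this bucket
            bucket.tasks.add((loop, task))

    def unregister(self, loop, task):
        bucket = self._my_bucket()
        with bucket.lock:
            bucket.tasks.discard((loop, task))

    def tasks_for(self, loop):
        """Collect tasks for one loop by visiting buckets one at a time."""
        with self._buckets_lock:
            buckets = list(self._buckets)  # snapshot; cleanup of dead threads omitted
        found = []
        for bucket in buckets:
            with bucket.lock:
                found.extend(task for (owner, task) in bucket.tasks if owner is loop)
        return found


if __name__ == "__main__":
    registry = PerThreadTaskRegistry()
    loop_token = object()  # stands in for an event loop
    registry.register(loop_token, "task-1")
    print(registry.tasks_for(loop_token))
```

The trade-off in this sketch is that the reader sees each bucket at a slightly different moment rather than one consistent global snapshot. Whether that is acceptable for `all_tasks()` is part of what would need validating.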
CPython versions tested on:
3.14
Operating systems tested on:
Linux
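
When comparing the GIL and free-threaded builds, it helps to record which build actually produced a given set of numbers. The following is a minimal sketch, assuming CPython 3.13 or later for `sys._is_gil_enabled()`; the helper name `describe_build` is hypothetical and is not part of the repro script above.

```python
import sys
import sysconfig


def describe_build():
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; 0 or None otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older versions always run with the GIL.
    checker = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = checker() if checker is not None else True
    return {"free_threaded_build": free_threaded, "gil_enabled": gil_enabled}


if __name__ == "__main__":
    print(describe_build())
```

Printing this next to the benchmark output makes it clear whether a run happened with the GIL re-enabled on a free-threaded build, which would otherwise skew the comparison.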