
Commit 8c21dc9


gh-138122: Add blocking mode for accurate stack traces in Tachyon

Non-blocking sampling reads process memory while the target continues running, which can produce torn stacks when generators or coroutines rapidly switch between yield points. Blocking mode uses atomic process suspension (task_suspend on macOS, NtSuspendProcess on Windows, PTRACE_SEIZE on Linux) to stop the target during each sample, ensuring consistent snapshots.

Use blocking mode with longer intervals (1ms+) to avoid impacting the target too much. The default non-blocking mode remains best for most cases since it has zero overhead.

Also fix a frame cache bug: the cache was including the last_profiled_frame itself when extending with cached data, but this frame was executing in the previous sample and its line number may have changed. For example, if function A was sampled at line 6, then execution continued to line 10 and called B→C, the next sample would incorrectly report A at line 6 (from cache) instead of line 10. The fix uses start_idx + 1 to only trust frames ABOVE last_profiled_frame: these caller frames are frozen at their call sites and cannot change until their callees return.

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
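
The frame cache fix described in the commit message can be illustrated with a small Python model. This is only a sketch: the actual logic is in C (`Modules/_remote_debugging/frame_cache.c`), and the function `extend_with_cache`, the `(funcname, lineno)` tuples, the leaf-first ordering, and the `fixed` switch are all assumptions made for illustration.

```python
def extend_with_cache(fresh_frames, cached_stack, last_profiled, *, fixed=True):
    """Join freshly read frames with cached caller frames (toy model).

    Stacks are lists of (funcname, lineno) tuples ordered leaf first.
    `fresh_frames` is everything re-read from the target, down to and
    including `last_profiled`, the leaf function of the previous sample.
    """
    # Locate the previously profiled frame in the cached stack.
    start_idx = next(
        i for i, (name, _) in enumerate(cached_stack) if name == last_profiled
    )
    if fixed:
        # Trust only the callers of last_profiled: they are parked at
        # their call sites, so their cached line numbers are still valid.
        return fresh_frames + cached_stack[start_idx + 1:]
    # Buggy variant: reuse the stale cached entry for last_profiled itself.
    return fresh_frames[:-1] + cached_stack[start_idx:]


# Previous sample: A was the leaf, at line 6.
cached = [("A", 6), ("main", 1)]
# Since then A advanced to line 10 and called B, which called C.
fresh = [("C", 3), ("B", 2), ("A", 10)]

print(extend_with_cache(fresh, cached, "A", fixed=False))
print(extend_with_cache(fresh, cached, "A", fixed=True))
```

The buggy variant reports `A` at line 6, the fixed one at line 10, matching the example in the commit message.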

1 parent 049c252, commit 8c21dc9

File tree

14 files changed: +868 -78 lines changed

Doc/library
- profiling.sampling.rst
Lib
- profiling/sampling
  - cli.py
  - sample.py
- test
  - test_external_inspection.py
  - test_profiling/test_sampling_profiler
    - test_blocking.py
Misc/NEWS.d/next/Library
- 2025-12-20-02-33-05.gh-issue-138122.m3EF9E.rst
- 2025-12-20-02-33-05.gh-issue-142654.m3EF9E.rst
Modules/_remote_debugging
- _remote_debugging.h
- clinic
  - module.c.h
- frame_cache.c
- frames.c
- module.c
Python
- remote_debug.h
Tools/inspection
- benchmark_external_inspection.py


`Doc/library/profiling.sampling.rst`

Lines changed: 53 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -312,6 +312,8 @@ The default configuration works well for most use cases:`
`312`	`312`	`- Disabled`
`313`	`313`	* - Default for ``--subprocesses``
`314`	`314`	`- Disabled`
	`315`	+ * - Default for ``--blocking``
	`316`	`+ - Disabled (non-blocking sampling)`
`315`	`317`
`316`	`318`
`317`	`319`	`Sampling interval and duration`
`@@ -362,6 +364,50 @@ This option is particularly useful when investigating concurrency issues or`
`362`	`364`	`when work is distributed across a thread pool.`
`363`	`365`
`364`	`366`
	`367`	`+.. _blocking-mode:`
	`368`	`+`
	`369`	`+Blocking mode`
	`370`	`+-------------`
	`371`	`+`
	`372`	`+By default, Tachyon reads the target process's memory without stopping it.`
	`373`	`+This non-blocking approach is ideal for most profiling scenarios because it`
	`374`	`+imposes virtually zero overhead on the target application: the profiled`
	`375`	`+program runs at full speed and is unaware it is being observed.`
	`376`	`+`
	`377`	`+However, non-blocking sampling can occasionally produce incomplete or`
	`378`	`+inconsistent stack traces in applications with many generators or coroutines`
	`379`	`+that rapidly switch between yield points, or in programs with very fast-changing`
	`380`	`+call stacks where functions enter and exit between the start and end of a single`
	`381`	`+stack read, resulting in reconstructed stacks that mix frames from different`
	`382`	`+execution states or that never actually existed.`
	`383`	`+`
	`384`	+For these cases, the :option:`--blocking` option stops the target process during
	`385`	`+each sample::`
	`386`	`+`
	`387`	`+ python -m profiling.sampling run --blocking script.py`
	`388`	`+ python -m profiling.sampling attach --blocking 12345`
	`389`	`+`
	`390`	`+When blocking mode is enabled, the profiler suspends the target process,`
	`391`	`+reads its stack, then resumes it. This guarantees that each captured stack`
	`392`	`+represents a real, consistent snapshot of what the process was doing at that`
	`393`	`+instant. The trade-off is that the target process runs slower because it is`
	`394`	`+repeatedly paused.`
	`395`	`+`
	`396`	`+.. warning::`
	`397`	`+`
	`398`	+ Do not use very high sample rates (low ``--interval`` values) with blocking
	`399`	`+ mode. Suspending and resuming a process takes time, and if the sampling`
	`400`	`+ interval is too short, the target will spend more time stopped than running.`
	`401`	`+ For blocking mode, intervals of 1000 microseconds (1 millisecond) or higher`
	`402`	`+ are recommended. The default 100 microsecond interval may cause noticeable`
	`403`	`+ slowdown in the target application.`
	`404`	`+`
	`405`	`+Use blocking mode only when you observe inconsistent stacks in your profiles,`
	`406`	`+particularly with generator-heavy or coroutine-heavy code. For most`
	`407`	`+applications, the default non-blocking mode provides accurate results with`
	`408`	`+zero impact on the target process.`
	`409`	`+`
	`410`	`+`
`365`	`411`	`Special frames`
`366`	`412`	`--------------`
`367`	`413`
`@@ -1296,6 +1342,13 @@ Sampling options`
`1296`	`1342`	`Also profile subprocesses. Each subprocess gets its own profiler`
`1297`	`1343`	instance and output file. Incompatible with ``--live``.
`1298`	`1344`
	`1345`	`+.. option:: --blocking`
	`1346`	`+`
	`1347`	`+ Stop the target process during each sample. This ensures consistent`
	`1348`	`+ stack traces at the cost of slowing down the target. Use with longer`
	`1349`	+ intervals (1000 µs or higher) to minimize impact. See :ref:`blocking-mode`
	`1350`	`+ for details.`
	`1351`	`+`
`1299`	`1352`
`1300`	`1353`	`Mode options`
`1301`	`1354`	`------------`
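
The interval guidance in the blocking-mode warning above comes down to simple arithmetic. The sketch below assumes a deliberately crude model: every sample keeps the target stopped for a fixed time, and one sample is taken per interval. The helper `stopped_fraction` and the 50 µs pause cost are hypothetical; the real cost of suspending, reading, and resuming depends on the platform and on the target.

```python
def stopped_fraction(pause_cost_usec, interval_usec):
    """Estimate the share of wall time the target spends suspended.

    Assumes each sample stops the target for `pause_cost_usec` and that
    samples are taken every `interval_usec`. The result is capped at 1.0
    because the target cannot be stopped for more than all of the time.
    """
    return min(1.0, pause_cost_usec / interval_usec)


ASSUMED_PAUSE_COST_USEC = 50  # hypothetical figure, not a measurement

for interval_usec in (100, 1000, 10000):
    share = stopped_fraction(ASSUMED_PAUSE_COST_USEC, interval_usec)
    print(f"interval={interval_usec} us -> stopped {share:.1%} of the time")
```

Under this assumed cost, the default 100 µs interval would leave the target stopped about half the time, while 1000 µs brings that down to about 5%.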

`Lib/profiling/sampling/cli.py`

Lines changed: 11 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -342,6 +342,13 @@ def _add_sampling_options(parser):`
`342`	`342`	`action="store_true",`
`343`	`343`	`help="Also profile subprocesses. Each subprocess gets its own profiler and output file.",`
`344`	`344`	`)`
	`345`	`+sampling_group.add_argument(`
	`346`	`+"--blocking",`
	`347`	`+action="store_true",`
	`348`	`+help="Stop all threads in target process before sampling to get consistent snapshots. "`
	`349`	`+"Uses thread_suspend on macOS and ptrace on Linux. Adds overhead but ensures memory "`
	`350`	`+"reads are from a frozen state.",`
	`351`	`+ )`
`345`	`352`
`346`	`353`
`347`	`354`	`def _add_mode_options(parser):`
`@@ -778,6 +785,7 @@ def _handle_attach(args):`
`778`	`785`	`native=args.native,`
`779`	`786`	`gc=args.gc,`
`780`	`787`	`opcodes=args.opcodes,`
	`788`	`+blocking=args.blocking,`
`781`	`789`	`)`
`782`	`790`	`_handle_output(collector, args, args.pid, mode)`
`783`	`791`
`@@ -848,6 +856,7 @@ def _handle_run(args):`
`848`	`856`	`native=args.native,`
`849`	`857`	`gc=args.gc,`
`850`	`858`	`opcodes=args.opcodes,`
	`859`	`+blocking=args.blocking,`
`851`	`860`	`)`
`852`	`861`	`_handle_output(collector, args, process.pid, mode)`
`853`	`862`	`finally:`
`@@ -893,6 +902,7 @@ def _handle_live_attach(args, pid):`
`893`	`902`	`native=args.native,`
`894`	`903`	`gc=args.gc,`
`895`	`904`	`opcodes=args.opcodes,`
	`905`	`+blocking=args.blocking,`
`896`	`906`	`)`
`897`	`907`
`898`	`908`
`@@ -940,6 +950,7 @@ def _handle_live_run(args):`
`940`	`950`	`native=args.native,`
`941`	`951`	`gc=args.gc,`
`942`	`952`	`opcodes=args.opcodes,`
	`953`	`+blocking=args.blocking,`
`943`	`954`	`)`
`944`	`955`	`finally:`
`945`	`956`	`# Clean up the subprocess`
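
The pattern in the hunks above (a `store_true` flag that defaults to off and is forwarded as a keyword argument) can be tried in isolation. This is a self-contained sketch using only `argparse`; `build_parser`, `profiler_kwargs`, and the shortened help text are stand-ins for illustration, not the real CLI code.

```python
import argparse


def build_parser():
    """Build a tiny parser with a --blocking flag like the one in the diff."""
    parser = argparse.ArgumentParser(prog="demo")
    sampling_group = parser.add_argument_group("Sampling options")
    sampling_group.add_argument(
        "--blocking",
        action="store_true",  # absent -> False, present -> True
        help="Stop the target process during each sample.",
    )
    return parser


def profiler_kwargs(args):
    """Collect the keyword arguments a handler would forward (stand-in)."""
    return {"blocking": args.blocking}


parser = build_parser()
print(profiler_kwargs(parser.parse_args([])))              # {'blocking': False}
print(profiler_kwargs(parser.parse_args(["--blocking"])))  # {'blocking': True}
```

Because the flag uses `store_true`, omitting it keeps the default non-blocking behaviour, which is why each handler can pass `args.blocking` through unconditionally.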

`Lib/profiling/sampling/sample.py`

Lines changed: 36 additions & 12 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,4 +1,5 @@`
`1`	`1`	`import _remote_debugging`
	`2`	`+import contextlib`
`2`	`3`	`import os`
`3`	`4`	`import pstats`
`4`	`5`	`import statistics`
`@@ -9,6 +10,21 @@`
`9`	`10`	`from _colorize import ANSIColors`
`10`	`11`
`11`	`12`	`from .pstats_collector import PstatsCollector`
	`13`	`+`
	`14`	`+`
	`15`	`+@contextlib.contextmanager`
	`16`	`+def _pause_threads(unwinder, blocking):`
	`17`	`+    """Context manager to pause/resume threads around sampling if blocking is True."""`
	`18`	`+    if blocking:`
	`19`	`+        unwinder.pause_threads()`
	`20`	`+        try:`
	`21`	`+            yield`
	`22`	`+        finally:`
	`23`	`+            unwinder.resume_threads()`
	`24`	`+    else:`
	`25`	`+        yield`
	`26`	`+`
	`27`	`+`
`12`	`28`	`from .stack_collector import CollapsedStackCollector, FlamegraphCollector`
`13`	`29`	`from .heatmap_collector import HeatmapCollector`
`14`	`30`	`from .gecko_collector import GeckoCollector`
`@@ -28,12 +44,13 @@`
`28`	`44`
`29`	`45`
`30`	`46`	`class SampleProfiler:`
`31`		`-def __init__(self, pid, sample_interval_usec, all_threads, *, mode=PROFILING_MODE_WALL, native=False, gc=True, opcodes=False, skip_non_matching_threads=True, collect_stats=False):`
	`47`	`+def __init__(self, pid, sample_interval_usec, all_threads, *, mode=PROFILING_MODE_WALL, native=False, gc=True, opcodes=False, skip_non_matching_threads=True, collect_stats=False, blocking=False):`
`32`	`48`	`self.pid = pid`
`33`	`49`	`self.sample_interval_usec = sample_interval_usec`
`34`	`50`	`self.all_threads = all_threads`
`35`	`51`	`self.mode = mode  # Store mode for later use`
`36`	`52`	`self.collect_stats = collect_stats`
	`53`	`+self.blocking = blocking`
`37`	`54`	`try:`
`38`	`55`	`self.unwinder = self._new_unwinder(native, gc, opcodes, skip_non_matching_threads)`
`39`	`56`	`except RuntimeError as err:`
`@@ -63,12 +80,11 @@ def sample(self, collector, duration_sec=10, *, async_aware=False):`
`63`	`80`	`running_time = 0`
`64`	`81`	`num_samples = 0`
`65`	`82`	`errors = 0`
	`83`	`+interrupted = False`
`66`	`84`	`start_time = next_time = time.perf_counter()`
`67`	`85`	`last_sample_time = start_time`
`68`	`86`	`realtime_update_interval = 1.0  # Update every second`
`69`	`87`	`last_realtime_update = start_time`
`70`		`-interrupted = False`
`71`		`-`
`72`	`88`	`try:`
`73`	`89`	`while running_time < duration_sec:`
`74`	`90`	`# Check if live collector wants to stop`
`@@ -78,20 +94,22 @@ def sample(self, collector, duration_sec=10, *, async_aware=False):`
`78`	`94`	`current_time = time.perf_counter()`
`79`	`95`	`if next_time < current_time:`
`80`	`96`	`try:`
`81`		`-if async_aware == "all":`
`82`		`-stack_frames = self.unwinder.get_all_awaited_by()`
`83`		`-elif async_aware == "running":`
`84`		`-stack_frames = self.unwinder.get_async_stack_trace()`
`85`		`-else:`
`86`		`-stack_frames = self.unwinder.get_stack_trace()`
`87`		`-collector.collect(stack_frames)`
`88`		`-except ProcessLookupError:`
	`97`	`+with _pause_threads(self.unwinder, self.blocking):`
	`98`	`+if async_aware == "all":`
	`99`	`+stack_frames = self.unwinder.get_all_awaited_by()`
	`100`	`+elif async_aware == "running":`
	`101`	`+stack_frames = self.unwinder.get_async_stack_trace()`
	`102`	`+else:`
	`103`	`+stack_frames = self.unwinder.get_stack_trace()`
	`104`	`+collector.collect(stack_frames)`
	`105`	`+except ProcessLookupError as e:`
`89`	`106`	`duration_sec = current_time - start_time`
`90`	`107`	`break`
`91`		`-except (RuntimeError, UnicodeDecodeError, MemoryError, OSError):`
	`108`	`+except (RuntimeError, UnicodeDecodeError, MemoryError, OSError) as e:`
`92`	`109`	`collector.collect_failed_sample()`
`93`	`110`	`errors += 1`
`94`	`111`	`except Exception as e:`
	`112`	`+print(e)`
`95`	`113`	`if not _is_process_running(self.pid):`
`96`	`114`	`break`
`97`	`115`	`raise e from None`
`@@ -303,6 +321,7 @@ def sample(`
`303`	`321`	`native=False,`
`304`	`322`	`gc=True,`
`305`	`323`	`opcodes=False,`
	`324`	`+blocking=False,`
`306`	`325`	`):`
`307`	`326`	`"""Sample a process using the provided collector.`
`308`	`327`
`@@ -318,6 +337,7 @@ def sample(`
`318`	`337`	`native: Whether to include native frames`
`319`	`338`	`gc: Whether to include GC frames`
`320`	`339`	`opcodes: Whether to include opcode information`
	`340`	`+ blocking: Whether to stop all threads before sampling for consistent snapshots`
`321`	`341`
`322`	`342`	`Returns:`
`323`	`343`	`The collector with collected samples`
`@@ -343,6 +363,7 @@ def sample(`
`343`	`363`	`opcodes=opcodes,`
`344`	`364`	`skip_non_matching_threads=skip_non_matching_threads,`
`345`	`365`	`collect_stats=realtime_stats,`
	`366`	`+blocking=blocking,`
`346`	`367`	`)`
`347`	`368`	`profiler.realtime_stats=realtime_stats`
`348`	`369`
`@@ -364,6 +385,7 @@ def sample_live(`
`364`	`385`	`native=False,`
`365`	`386`	`gc=True,`
`366`	`387`	`opcodes=False,`
	`388`	`+blocking=False,`
`367`	`389`	`):`
`368`	`390`	`"""Sample a process in live/interactive mode with curses TUI.`
`369`	`391`
`@@ -379,6 +401,7 @@ def sample_live(`
`379`	`401`	`native: Whether to include native frames`
`380`	`402`	`gc: Whether to include GC frames`
`381`	`403`	`opcodes: Whether to include opcode information`
	`404`	`+ blocking: Whether to stop all threads before sampling for consistent snapshots`
`382`	`405`
`383`	`406`	`Returns:`
`384`	`407`	`The collector with collected samples`
`@@ -404,6 +427,7 @@ def sample_live(`
`404`	`427`	`opcodes=opcodes,`
`405`	`428`	`skip_non_matching_threads=skip_non_matching_threads,`
`406`	`429`	`collect_stats=realtime_stats,`
	`430`	`+blocking=blocking,`
`407`	`431`	`)`
`408`	`432`	`profiler.realtime_stats=realtime_stats`
`409`	`433`

`Lib/test/test_external_inspection.py`

Lines changed: 39 additions & 43 deletions

Original file line number	Diff line number	Diff line change
`@@ -2931,24 +2931,24 @@ def top():`
`2931`	`2931`	`"Test only runs on Linux with process_vm_readv support",`
`2932`	`2932`	`)`
`2933`	`2933`	`def test_partial_stack_reuse(self):`
`2934`		`-"""Test that unchanged bottom frames are reused when top changes (A→B→C to A→B→D)."""`
	`2934`	`+"""Test that unchanged parent frames are reused from cache when top frame moves."""`
`2935`	`2935`	`script_body = """\`
`2936`		`- def func_c():`
`2937`		`- sock.sendall(b"at_c")`
	`2936`	`+ def level4():`
	`2937`	`+ sock.sendall(b"sync1")`
`2938`	`2938`	`sock.recv(16)`
`2939`		`-`
`2940`		`- def func_d():`
`2941`		`- sock.sendall(b"at_d")`
	`2939`	`+ sock.sendall(b"sync2")`
`2942`	`2940`	`sock.recv(16)`
`2943`	`2941`
`2944`		`- def func_b():`
`2945`		`- func_c()`
`2946`		`- func_d()`
	`2942`	`+ def level3():`
	`2943`	`+ level4()`
`2947`	`2944`
`2948`		`- def func_a():`
`2949`		`- func_b()`
	`2945`	`+ def level2():`
	`2946`	`+ level3()`
	`2947`	`+`
	`2948`	`+ def level1():`
	`2949`	`+ level2()`
`2950`	`2950`
`2951`		`-func_a()`
	`2951`	`+level1()`
`2952`	`2952`	`"""`
`2953`	`2953`
`2954`	`2954`	`with self._target_process(script_body) as (`
`@@ -2958,55 +2958,51 @@ def func_a():`
`2958`	`2958`	`):`
`2959`	`2959`	`unwinder = make_unwinder(cache_frames=True)`
`2960`	`2960`
`2961`		`-# Sample at C: stack is A→B→C`
`2962`		`-frames_c = self._sample_frames(`
	`2961`	`+# Sample 1: level4 at first sendall`
	`2962`	`+frames1 = self._sample_frames(`
`2963`	`2963`	`client_socket,`
`2964`	`2964`	`unwinder,`
`2965`		`-b"at_c",`
	`2965`	`+b"sync1",`
`2966`	`2966`	`b"ack",`
`2967`		`- {"func_a", "func_b", "func_c"},`
	`2967`	`+ {"level1", "level2", "level3", "level4"},`
`2968`	`2968`	`)`
`2969`		`-# Sample at D: stack is A→B→D (C returned, D called)`
`2970`		`-frames_d = self._sample_frames(`
	`2969`	`+# Sample 2: level4 at second sendall (same stack, different line)`
	`2970`	`+frames2 = self._sample_frames(`
`2971`	`2971`	`client_socket,`
`2972`	`2972`	`unwinder,`
`2973`		`-b"at_d",`
	`2973`	`+b"sync2",`
`2974`	`2974`	`b"done",`
`2975`		`- {"func_a", "func_b", "func_d"},`
	`2975`	`+ {"level1", "level2", "level3", "level4"},`
`2976`	`2976`	`)`
`2977`	`2977`
`2978`		`-self.assertIsNotNone(frames_c)`
`2979`		`-self.assertIsNotNone(frames_d)`
	`2978`	`+self.assertIsNotNone(frames1)`
	`2979`	`+self.assertIsNotNone(frames2)`
`2980`	`2980`
`2981`		`-# Find func_a and func_b frames in both samples`
`2982`	`2981`	`def find_frame(frames, funcname):`
`2983`	`2982`	`for f in frames:`
`2984`	`2983`	`if f.funcname == funcname:`
`2985`	`2984`	`return f`
`2986`	`2985`	`return None`
`2987`	`2986`
`2988`		`-frame_a_in_c = find_frame(frames_c, "func_a")`
`2989`		`-frame_b_in_c = find_frame(frames_c, "func_b")`
`2990`		`-frame_a_in_d = find_frame(frames_d, "func_a")`
`2991`		`-frame_b_in_d = find_frame(frames_d, "func_b")`
`2992`		`-`
`2993`		`-self.assertIsNotNone(frame_a_in_c)`
`2994`		`-self.assertIsNotNone(frame_b_in_c)`
`2995`		`-self.assertIsNotNone(frame_a_in_d)`
`2996`		`-self.assertIsNotNone(frame_b_in_d)`
`2997`		`-`
`2998`		`-# The bottom frames (A, B) should be the SAME objects (cache reuse)`
`2999`		`-self.assertIs(`
`3000`		`-frame_a_in_c,`
`3001`		`-frame_a_in_d,`
`3002`		`-"func_a frame should be reused from cache",`
`3003`		`- )`
`3004`		`-self.assertIs(`
`3005`		`-frame_b_in_c,`
`3006`		`-frame_b_in_d,`
`3007`		`-"func_b frame should be reused from cache",`
	`2987`	`+# level4 should have different line numbers (it moved)`
	`2988`	`+l4_1 = find_frame(frames1, "level4")`
	`2989`	`+l4_2 = find_frame(frames2, "level4")`
	`2990`	`+self.assertIsNotNone(l4_1)`
	`2991`	`+self.assertIsNotNone(l4_2)`
	`2992`	`+self.assertNotEqual(`
	`2993`	`+l4_1.location.lineno,`
	`2994`	`+l4_2.location.lineno,`
	`2995`	`+"level4 should be at different lines",`
`3008`	`2996`	`)`
`3009`	`2997`
	`2998`	`+# Parent frames (level1, level2, level3) should be reused from cache`
	`2999`	`+for name in ["level1", "level2", "level3"]:`
	`3000`	`+f1 = find_frame(frames1, name)`
	`3001`	`+f2 = find_frame(frames2, name)`
	`3002`	`+self.assertIsNotNone(f1, f"{name} missing from sample 1")`
	`3003`	`+self.assertIsNotNone(f2, f"{name} missing from sample 2")`
	`3004`	`+self.assertIs(f1, f2, f"{name} should be reused from cache")`
	`3005`	`+`
`3010`	`3006`	`@skip_if_not_supported`
`3011`	`3007`	`@unittest.skipIf(`
`3012`	`3008`	`sys.platform == "linux" and not PROCESS_VM_READV_SUPPORTED,`
