Commit 8c21dc9

gh-138122: Add blocking mode for accurate stack traces in Tachyon

Non-blocking sampling reads process memory while the target continues
running, which can produce torn stacks when generators or coroutines
rapidly switch between yield points. Blocking mode uses atomic process
suspension (task_suspend on macOS, NtSuspendProcess on Windows,
PTRACE_SEIZE on Linux) to stop the target during each sample, ensuring
consistent snapshots.

Use blocking mode with longer intervals (1ms+) to avoid impacting the
target too much. The default non-blocking mode remains best for most
cases since it has zero overhead.

Also fix a frame cache bug: the cache was including the last_profiled_frame
itself when extending with cached data, but this frame was executing in
the previous sample and its line number may have changed. For example,
if function A was sampled at line 6, then execution continued to line 10
and called B→C, the next sample would incorrectly report A at line 6
(from cache) instead of line 10. The fix uses start_idx + 1 to only trust
frames ABOVE last_profiled_frame: these caller frames are frozen at their
call sites and cannot change until their callees return.

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

1 parent 049c252, commit 8c21dc9
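
The frame cache fix described in the commit message can be illustrated with a small model. This is only a sketch under stated assumptions: the real cache lives in the C extension and is not shown in this excerpt, so `Frame`, `sample_with_cache`, and the `fixed` switch are hypothetical names, and stacks are modelled as plain lists ordered innermost frame first. It reproduces the A/B/C scenario from the message: function A was cached at line 6, then moved to line 10 and called B, which called C.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    """Hypothetical stand-in for a sampled frame (not the real frame type)."""
    funcname: str
    lineno: int


def sample_with_cache(live_stack, cached_stack, last_profiled, *, fixed=True):
    """Rebuild a stack (innermost frame first) from a partial live read plus cache.

    live_stack models what the target's memory holds right now; cached_stack is
    the stack recorded by the previous sample; last_profiled names the frame
    that was innermost in that previous sample.
    """
    cached_names = [f.funcname for f in cached_stack]
    start_idx = cached_names.index(last_profiled)

    fresh = []
    for frame in live_stack:
        if frame.funcname == last_profiled:
            if fixed:
                # Re-read this frame: it was executing during the previous
                # sample, so its line number may have moved since then.
                fresh.append(frame)
            break
        fresh.append(frame)
    else:
        # last_profiled is no longer on the stack, so nothing can be reused.
        return fresh

    # Fixed behaviour trusts only the callers of last_profiled (start_idx + 1);
    # the pre-fix behaviour also reused the stale copy of last_profiled itself.
    reuse_from = start_idx + 1 if fixed else start_idx
    return fresh + cached_stack[reuse_from:]


previous = [Frame("A", 6), Frame("main", 20)]
current = [Frame("C", 3), Frame("B", 2), Frame("A", 10), Frame("main", 20)]

buggy = sample_with_cache(current, previous, "A", fixed=False)
fixed = sample_with_cache(current, previous, "A", fixed=True)
print("pre-fix reports A at line", buggy[2].lineno)
print("post-fix reports A at line", fixed[2].lineno)
```

In this model the caller frame `main` is reused from the cache in both variants, which matches the reasoning in the message: a caller is parked at its call site until its callee returns, so only the previously innermost frame needs to be read again.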

14 files changed: +868 -78 lines changed


‎Doc/library/profiling.sampling.rst‎

Lines changed: 53 additions & 0 deletions
@@ -312,6 +312,8 @@ The default configuration works well for most use cases:
      - Disabled
    * - Default for ``--subprocesses``
      - Disabled
+   * - Default for ``--blocking``
+     - Disabled (non-blocking sampling)


 Sampling interval and duration
@@ -362,6 +364,50 @@ This option is particularly useful when investigating concurrency issues or
 when work is distributed across a thread pool.


+.. _blocking-mode:
+
+Blocking mode
+-------------
+
+By default, Tachyon reads the target process's memory without stopping it.
+This non-blocking approach is ideal for most profiling scenarios because it
+imposes virtually zero overhead on the target application: the profiled
+program runs at full speed and is unaware it is being observed.
+
+However, non-blocking sampling can occasionally produce incomplete or
+inconsistent stack traces in applications with many generators or coroutines
+that rapidly switch between yield points, or in programs with very fast-changing
+call stacks where functions enter and exit between the start and end of a single
+stack read, resulting in reconstructed stacks that mix frames from different
+execution states or that never actually existed.
+
+For these cases, the :option:`--blocking` option stops the target process during
+each sample::
+
+   python -m profiling.sampling run --blocking script.py
+   python -m profiling.sampling attach --blocking 12345
+
+When blocking mode is enabled, the profiler suspends the target process,
+reads its stack, then resumes it. This guarantees that each captured stack
+represents a real, consistent snapshot of what the process was doing at that
+instant. The trade-off is that the target process runs slower because it is
+repeatedly paused.
+
+.. warning::
+
+   Do not use very high sample rates (low ``--interval`` values) with blocking
+   mode. Suspending and resuming a process takes time, and if the sampling
+   interval is too short, the target will spend more time stopped than running.
+   For blocking mode, intervals of 1000 microseconds (1 millisecond) or higher
+   are recommended. The default 100 microsecond interval may cause noticeable
+   slowdown in the target application.
+
+Use blocking mode only when you observe inconsistent stacks in your profiles,
+particularly with generator-heavy or coroutine-heavy code. For most
+applications, the default non-blocking mode provides accurate results with
+zero impact on the target process.
+
+
 Special frames
 --------------

@@ -1296,6 +1342,13 @@ Sampling options
    Also profile subprocesses. Each subprocess gets its own profiler
    instance and output file. Incompatible with ``--live``.

+.. option:: --blocking
+
+   Stop the target process during each sample. This ensures consistent
+   stack traces at the cost of slowing down the target. Use with longer
+   intervals (1000 µs or higher) to minimize impact. See :ref:`blocking-mode`
+   for details.
+

 Mode options
 ------------
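
The interval warning in the documentation above comes down to simple arithmetic, sketched here. The per-sample pause cost is not stated anywhere in this commit, so `ASSUMED_PAUSE_COST_USEC = 60` is purely an illustrative placeholder and `stopped_fraction` is a hypothetical helper; the model also assumes exactly one suspend/read/resume cycle per sampling interval.

```python
def stopped_fraction(pause_cost_usec, interval_usec):
    """Fraction of wall-clock time the target spends suspended (capped at 1.0)."""
    if interval_usec <= 0:
        raise ValueError("interval must be positive")
    return min(1.0, pause_cost_usec / interval_usec)


# Assumption for illustration only: one suspend + stack read + resume cycle
# takes 60 microseconds. Measure on your own system before relying on this.
ASSUMED_PAUSE_COST_USEC = 60

for interval_usec in (100, 1000, 10000):
    share = stopped_fraction(ASSUMED_PAUSE_COST_USEC, interval_usec)
    print(f"interval={interval_usec:>5} us -> target stopped {share:.1%} of the time")
```

Under that assumed cost, the default 100 microsecond interval would leave the target stopped for more than half of the time, while the recommended 1000 microsecond interval would bring this down to a few percent. The real numbers depend on the platform and on the depth of the stacks being read.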

‎Lib/profiling/sampling/cli.py‎

Lines changed: 11 additions & 0 deletions
@@ -342,6 +342,13 @@ def _add_sampling_options(parser):
         action="store_true",
         help="Also profile subprocesses. Each subprocess gets its own profiler and output file.",
     )
+    sampling_group.add_argument(
+        "--blocking",
+        action="store_true",
+        help="Stop all threads in target process before sampling to get consistent snapshots. "
+        "Uses thread_suspend on macOS and ptrace on Linux. Adds overhead but ensures memory "
+        "reads are from a frozen state.",
+    )


 def _add_mode_options(parser):
@@ -778,6 +785,7 @@ def _handle_attach(args):
         native=args.native,
         gc=args.gc,
         opcodes=args.opcodes,
+        blocking=args.blocking,
     )
     _handle_output(collector, args, args.pid, mode)

@@ -848,6 +856,7 @@ def _handle_run(args):
             native=args.native,
             gc=args.gc,
             opcodes=args.opcodes,
+            blocking=args.blocking,
         )
         _handle_output(collector, args, process.pid, mode)
     finally:
@@ -893,6 +902,7 @@ def _handle_live_attach(args, pid):
         native=args.native,
         gc=args.gc,
         opcodes=args.opcodes,
+        blocking=args.blocking,
     )


@@ -940,6 +950,7 @@ def _handle_live_run(args):
             native=args.native,
             gc=args.gc,
             opcodes=args.opcodes,
+            blocking=args.blocking,
         )
     finally:
         # Clean up the subprocess
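
The CLI change follows a common `argparse` pattern: a `store_true` flag that defaults to `False` and is forwarded as a keyword argument. A minimal standalone sketch of that pattern is below; `build_parser` and `sampler_kwargs` are hypothetical helpers written for this illustration and are not part of `cli.py`, and the help text is shortened.

```python
import argparse


def build_parser():
    """Build a tiny parser that mirrors only the new --blocking flag."""
    parser = argparse.ArgumentParser(prog="blocking-flag-sketch")
    sampling_group = parser.add_argument_group("Sampling options")
    sampling_group.add_argument(
        "--blocking",
        action="store_true",  # absent -> False, present -> True
        help="Stop the target before each sample to get consistent snapshots.",
    )
    return parser


def sampler_kwargs(args):
    """Collect the keyword arguments that would be forwarded to the sampler."""
    return {"blocking": args.blocking}


parser = build_parser()
print(sampler_kwargs(parser.parse_args([])))
print(sampler_kwargs(parser.parse_args(["--blocking"])))
```

Because the flag defaults to `False`, existing invocations keep the non-blocking behaviour unless `--blocking` is passed explicitly.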

‎Lib/profiling/sampling/sample.py‎

Lines changed: 36 additions & 12 deletions
@@ -1,4 +1,5 @@
 import _remote_debugging
+import contextlib
 import os
 import pstats
 import statistics
@@ -9,6 +10,21 @@
 from _colorize import ANSIColors

 from .pstats_collector import PstatsCollector
+
+
+@contextlib.contextmanager
+def _pause_threads(unwinder, blocking):
+    """Context manager to pause/resume threads around sampling if blocking is True."""
+    if blocking:
+        unwinder.pause_threads()
+        try:
+            yield
+        finally:
+            unwinder.resume_threads()
+    else:
+        yield
+
+
 from .stack_collector import CollapsedStackCollector, FlamegraphCollector
 from .heatmap_collector import HeatmapCollector
 from .gecko_collector import GeckoCollector
@@ -28,12 +44,13 @@


 class SampleProfiler:
-    def __init__(self, pid, sample_interval_usec, all_threads, *, mode=PROFILING_MODE_WALL, native=False, gc=True, opcodes=False, skip_non_matching_threads=True, collect_stats=False):
+    def __init__(self, pid, sample_interval_usec, all_threads, *, mode=PROFILING_MODE_WALL, native=False, gc=True, opcodes=False, skip_non_matching_threads=True, collect_stats=False, blocking=False):
         self.pid = pid
         self.sample_interval_usec = sample_interval_usec
         self.all_threads = all_threads
         self.mode = mode  # Store mode for later use
         self.collect_stats = collect_stats
+        self.blocking = blocking
         try:
             self.unwinder = self._new_unwinder(native, gc, opcodes, skip_non_matching_threads)
         except RuntimeError as err:
@@ -63,12 +80,11 @@ def sample(self, collector, duration_sec=10, *, async_aware=False):
         running_time = 0
         num_samples = 0
         errors = 0
+        interrupted = False
         start_time = next_time = time.perf_counter()
         last_sample_time = start_time
         realtime_update_interval = 1.0  # Update every second
         last_realtime_update = start_time
-        interrupted = False
-
         try:
             while running_time < duration_sec:
                 # Check if live collector wants to stop
@@ -78,20 +94,22 @@ def sample(self, collector, duration_sec=10, *, async_aware=False):
                 current_time = time.perf_counter()
                 if next_time < current_time:
                     try:
-                        if async_aware == "all":
-                            stack_frames = self.unwinder.get_all_awaited_by()
-                        elif async_aware == "running":
-                            stack_frames = self.unwinder.get_async_stack_trace()
-                        else:
-                            stack_frames = self.unwinder.get_stack_trace()
-                        collector.collect(stack_frames)
-                    except ProcessLookupError:
+                        with _pause_threads(self.unwinder, self.blocking):
+                            if async_aware == "all":
+                                stack_frames = self.unwinder.get_all_awaited_by()
+                            elif async_aware == "running":
+                                stack_frames = self.unwinder.get_async_stack_trace()
+                            else:
+                                stack_frames = self.unwinder.get_stack_trace()
+                            collector.collect(stack_frames)
+                    except ProcessLookupError as e:
                         duration_sec = current_time - start_time
                         break
-                    except (RuntimeError, UnicodeDecodeError, MemoryError, OSError):
+                    except (RuntimeError, UnicodeDecodeError, MemoryError, OSError) as e:
                         collector.collect_failed_sample()
                         errors += 1
                     except Exception as e:
+                        print(e)
                         if not _is_process_running(self.pid):
                             break
                         raise e from None
@@ -303,6 +321,7 @@ def sample(
     native=False,
     gc=True,
     opcodes=False,
+    blocking=False,
 ):
     """Sample a process using the provided collector.

@@ -318,6 +337,7 @@ def sample(
         native: Whether to include native frames
         gc: Whether to include GC frames
         opcodes: Whether to include opcode information
+        blocking: Whether to stop all threads before sampling for consistent snapshots

     Returns:
         The collector with collected samples
@@ -343,6 +363,7 @@ def sample(
         opcodes=opcodes,
         skip_non_matching_threads=skip_non_matching_threads,
         collect_stats=realtime_stats,
+        blocking=blocking,
     )
     profiler.realtime_stats = realtime_stats

@@ -364,6 +385,7 @@ def sample_live(
     native=False,
     gc=True,
     opcodes=False,
+    blocking=False,
 ):
     """Sample a process in live/interactive mode with curses TUI.

@@ -379,6 +401,7 @@ def sample_live(
         native: Whether to include native frames
         gc: Whether to include GC frames
         opcodes: Whether to include opcode information
+        blocking: Whether to stop all threads before sampling for consistent snapshots

     Returns:
         The collector with collected samples
@@ -404,6 +427,7 @@ def sample_live(
         opcodes=opcodes,
         skip_non_matching_threads=skip_non_matching_threads,
         collect_stats=realtime_stats,
+        blocking=blocking,
     )
     profiler.realtime_stats = realtime_stats
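
The `_pause_threads` helper added in this file relies on `try`/`finally` inside a generator-based context manager so that the target is resumed even when the stack read fails. The sketch below copies the helper from the diff and exercises it against a stand-in object; `RecordingUnwinder` and `sample_once` are hypothetical test doubles written for this illustration, not part of the profiler, and the simulated failure is an assumption chosen to show the resume path.

```python
import contextlib


@contextlib.contextmanager
def _pause_threads(unwinder, blocking):
    """Pause/resume threads around sampling if blocking is True (as in the diff)."""
    if blocking:
        unwinder.pause_threads()
        try:
            yield
        finally:
            # Runs even if the body of the with-block raises.
            unwinder.resume_threads()
    else:
        yield


class RecordingUnwinder:
    """Hypothetical stand-in that records calls instead of touching a process."""

    def __init__(self):
        self.events = []

    def pause_threads(self):
        self.events.append("pause")

    def resume_threads(self):
        self.events.append("resume")

    def get_stack_trace(self):
        self.events.append("read")
        raise RuntimeError("simulated failed memory read")


def sample_once(unwinder, blocking):
    """Take one sample, treating RuntimeError as a failed sample."""
    try:
        with _pause_threads(unwinder, blocking):
            unwinder.get_stack_trace()
    except RuntimeError:
        unwinder.events.append("failed")
    return unwinder.events


print(sample_once(RecordingUnwinder(), blocking=True))
print(sample_once(RecordingUnwinder(), blocking=False))
```

With `blocking=True` the recorded order is pause, read, resume, failed: the resume happens before the exception reaches the sampling loop's handlers, so a failed read should not leave the target suspended.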

‎Lib/test/test_external_inspection.py‎

Lines changed: 39 additions & 43 deletions
@@ -2931,24 +2931,24 @@ def top():
         "Test only runs on Linux with process_vm_readv support",
     )
     def test_partial_stack_reuse(self):
-        """Test that unchanged bottom frames are reused when top changes (A→B→C to A→B→D)."""
+        """Test that unchanged parent frames are reused from cache when top frame moves."""
         script_body = """\
-            def func_c():
-                sock.sendall(b"at_c")
+            def level4():
+                sock.sendall(b"sync1")
                 sock.recv(16)
-
-            def func_d():
-                sock.sendall(b"at_d")
+                sock.sendall(b"sync2")
                 sock.recv(16)

-            def func_b():
-                func_c()
-                func_d()
+            def level3():
+                level4()

-            def func_a():
-                func_b()
+            def level2():
+                level3()
+
+            def level1():
+                level2()

-            func_a()
+            level1()
             """

         with self._target_process(script_body) as (
@@ -2958,55 +2958,51 @@ def func_a():
         ):
             unwinder = make_unwinder(cache_frames=True)

-            # Sample at C: stack is A→B→C
-            frames_c = self._sample_frames(
+            # Sample 1: level4 at first sendall
+            frames1 = self._sample_frames(
                 client_socket,
                 unwinder,
-                b"at_c",
+                b"sync1",
                 b"ack",
-                {"func_a", "func_b", "func_c"},
+                {"level1", "level2", "level3", "level4"},
             )
-            # Sample at D: stack is A→B→D (C returned, D called)
-            frames_d = self._sample_frames(
+            # Sample 2: level4 at second sendall (same stack, different line)
+            frames2 = self._sample_frames(
                 client_socket,
                 unwinder,
-                b"at_d",
+                b"sync2",
                 b"done",
-                {"func_a", "func_b", "func_d"},
+                {"level1", "level2", "level3", "level4"},
             )

-            self.assertIsNotNone(frames_c)
-            self.assertIsNotNone(frames_d)
+            self.assertIsNotNone(frames1)
+            self.assertIsNotNone(frames2)

-            # Find func_a and func_b frames in both samples
             def find_frame(frames, funcname):
                 for f in frames:
                     if f.funcname == funcname:
                         return f
                 return None

-            frame_a_in_c = find_frame(frames_c, "func_a")
-            frame_b_in_c = find_frame(frames_c, "func_b")
-            frame_a_in_d = find_frame(frames_d, "func_a")
-            frame_b_in_d = find_frame(frames_d, "func_b")
-
-            self.assertIsNotNone(frame_a_in_c)
-            self.assertIsNotNone(frame_b_in_c)
-            self.assertIsNotNone(frame_a_in_d)
-            self.assertIsNotNone(frame_b_in_d)
-
-            # The bottom frames (A, B) should be the SAME objects (cache reuse)
-            self.assertIs(
-                frame_a_in_c,
-                frame_a_in_d,
-                "func_a frame should be reused from cache",
-            )
-            self.assertIs(
-                frame_b_in_c,
-                frame_b_in_d,
-                "func_b frame should be reused from cache",
+            # level4 should have different line numbers (it moved)
+            l4_1 = find_frame(frames1, "level4")
+            l4_2 = find_frame(frames2, "level4")
+            self.assertIsNotNone(l4_1)
+            self.assertIsNotNone(l4_2)
+            self.assertNotEqual(
+                l4_1.location.lineno,
+                l4_2.location.lineno,
+                "level4 should be at different lines",
             )

+            # Parent frames (level1, level2, level3) should be reused from cache
+            for name in ["level1", "level2", "level3"]:
+                f1 = find_frame(frames1, name)
+                f2 = find_frame(frames2, name)
+                self.assertIsNotNone(f1, f"{name} missing from sample 1")
+                self.assertIsNotNone(f2, f"{name} missing from sample 2")
+                self.assertIs(f1, f2, f"{name} should be reused from cache")
+
     @skip_if_not_supported
     @unittest.skipIf(
         sys.platform == "linux" and not PROCESS_VM_READV_SUPPORTED,
