What does this PR do?
This PR allows runtimes to register a callback that extracts the runtime stack. A runtime can either emit the stack frame by frame or dump the whole stack trace as a single string. The key contract is that this callback runs from a signal handler inside a fork of the crashing process, so it must be async-signal-safe.
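Because the callback runs in a signal handler, it should not allocate on the heap or take locks. Here is a minimal Rust sketch of the frame-by-frame mode under that constraint. It assumes the runtime writes into fixed-size storage reserved before the crash. Every name in it (`FrameSink`, `emit_frame`, `RuntimeStackCallback`, `example_callback`) is hypothetical and is not the API added by this PR. The PR's actual FFI types, such as `ddog_RuntimeStackFrame`, may look quite different.

```rust
/// Maximum bytes kept per string field; longer values are truncated (assumed limit).
const MAX_STR: usize = 256;
/// Maximum number of frames recorded; extra frames are counted and dropped (assumed limit).
const MAX_FRAMES: usize = 64;

/// A fixed-size, heap-free copy of one runtime frame (hypothetical type).
#[derive(Clone, Copy)]
pub struct RuntimeFrame {
    function: [u8; MAX_STR],
    function_len: usize,
    file: [u8; MAX_STR],
    file_len: usize,
    line: u32,
}

impl RuntimeFrame {
    const EMPTY: RuntimeFrame = RuntimeFrame {
        function: [0; MAX_STR],
        function_len: 0,
        file: [0; MAX_STR],
        file_len: 0,
        line: 0,
    };

    pub fn function(&self) -> &[u8] { &self.function[..self.function_len] }
    pub fn file(&self) -> &[u8] { &self.file[..self.file_len] }
    pub fn line(&self) -> u32 { self.line }
}

/// Copies `src` into `dst`, truncating if needed, and returns the copied length.
/// Plain memory copies only: no allocation, no locks.
fn copy_truncated(dst: &mut [u8; MAX_STR], src: &[u8]) -> usize {
    let n = src.len().min(MAX_STR);
    dst[..n].copy_from_slice(&src[..n]);
    n
}

/// Preallocated storage that the runtime callback writes into (hypothetical type).
pub struct FrameSink {
    frames: [RuntimeFrame; MAX_FRAMES],
    len: usize,
    dropped: usize,
}

impl FrameSink {
    pub const fn new() -> Self {
        FrameSink { frames: [RuntimeFrame::EMPTY; MAX_FRAMES], len: 0, dropped: 0 }
    }

    /// Records one frame; returns false (and counts the drop) if the sink is full.
    pub fn emit_frame(&mut self, function: &[u8], file: &[u8], line: u32) -> bool {
        if self.len == MAX_FRAMES {
            self.dropped += 1;
            return false;
        }
        let f = &mut self.frames[self.len];
        f.function_len = copy_truncated(&mut f.function, function);
        f.file_len = copy_truncated(&mut f.file, file);
        f.line = line;
        self.len += 1;
        true
    }

    pub fn frames(&self) -> &[RuntimeFrame] { &self.frames[..self.len] }
    pub fn dropped(&self) -> usize { self.dropped }
}

/// Shape of a runtime-provided callback in this sketch (hypothetical).
pub type RuntimeStackCallback = fn(&mut FrameSink);

/// Example callback: a real runtime would walk its own interpreter frames here.
pub fn example_callback(sink: &mut FrameSink) {
    sink.emit_frame(b"func1", b"app.py", 679);
    sink.emit_frame(b"<module>", b"app.py", 734);
}
```

In practice the sink would be reserved up front (for example as a static) rather than built on the signal stack: 64 frames with two 256-byte strings each come to roughly 34 KB.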
Runtime stacks are currently added as a new `runtime_stacks` field inside the `Experimental` field. If a runtime emits frames one by one using `ddog_RuntimeStackFrame`, the `runtime_stacks` field is propagated as structured frames. If it dumps the whole traceback as a string, the `Receiver` side will need additional parsing, tailored to each runtime's traceback syntax.
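When a runtime dumps the whole traceback as a string, the receiver has to turn it back into frames. The sketch below covers only the Python case. It assumes the string follows the `Current thread ... (most recent call first):` header plus `File "<path>", line <n> in <function>` layout visible in the example output further down, which resembles CPython's `faulthandler` dump. `parse_python_traceback` and `ParsedFrame` are hypothetical names; each other runtime would need its own parser.

```rust
/// One frame recovered from a traceback string (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ParsedFrame {
    pub function: String,
    pub file: String,
    pub line: u32,
}

/// Parses one `File "<path>", line <n> in <function>` line; returns None otherwise.
fn parse_frame_line(line: &str) -> Option<ParsedFrame> {
    // Leading indentation varies, so trim it before matching the prefix.
    let rest = line.trim_start().strip_prefix("File \"")?;
    // Split from the right so a quote inside the path does not confuse the match.
    let (file, rest) = rest.rsplit_once("\", line ")?;
    let (line_no, function) = rest.split_once(" in ")?;
    Some(ParsedFrame {
        function: function.trim_end().to_string(),
        file: file.to_string(),
        line: line_no.trim().parse().ok()?,
    })
}

/// Parses a faulthandler-style dump into frames, most recent call first.
/// Non-frame lines (such as the "Current thread ..." header) are skipped.
pub fn parse_python_traceback(text: &str) -> Vec<ParsedFrame> {
    text.lines().filter_map(parse_frame_line).collect()
}
```

Skipping lines that do not match, instead of rejecting the whole string, means a truncated or partially garbled dump still yields every frame that was written completely.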
Motivation
Current crash tracking captures only native stack traces, which are insufficient for applications using interpreted languages. When a Python/Ruby/PHP application crashes, developers need visibility into both:
- The native call stack (C/C++ level)
- The runtime call stack (Python/Ruby/PHP script level)
Without runtime stack traces, debugging crashes in interpreted languages is hard, because the native stack shows only interpreter internals and native extension modules, not the execution path through the application code.
How to test the change?
Unit tests.
A very rough (dummy) integration in `dd-trace-py` that consumes this API is available in this experimental PR: DataDog/dd-trace-py#14765
Triggering a crash with the tracer and agent attached produces the following `Experimental` fields:
- Frame by Frame
{ "ucontext": "ucontext_t { uc_flags: 7, uc_link: 0x0, uc_stack: stack_t { ss_sp: 0x743deadc0000, ss_flags: 0, ss_size: 65536 }, uc_mcontext: mcontext_t { gregs: [0, 127809331044922, 0, 127809325475264, 140726899625856, 140726899626064, 1, 0, 0, -1, 0, 4294967295, 127809325530248, 0, 140726899625160, 140726899625368, 127809330854360, 66179, 12103423998558259, 4, 14, 0, 0], fpregs: 0x743deadcf540, __private: [0, 0, 0, 0, 0, 0, 0, 0] }, uc_sigmask: sigset_t { __val: [0, 11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] }, __private: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160, 31, 0, 0, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99, 0, 116, 0, 111, 0, 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 116, 0, 114, 0, 0, 0, 0, 0, 99, 0, 111, 0, 0, 0, 0, 0, 105, 0, 101, 0, 111, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 117, 0, 0, 0, 99, 0, 0, 0, 104, 0, 0, 0, 32, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 111, 0, 0, 0, 114, 0, 0, 0, 32, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 128, 199, 48, 49, 93, 136, 85, 59, 0, 0, 0, 0, 0, 0, 0, 0, 100, 171, 99, 130, 7, 91, 229, 191, 0, 0, 0, 0, 0, 0, 0, 0, 27, 99, 108, 213, 49, 161, 233, 63, 0, 0, 0, 0, 0, 0, 0, 0, 233, 69, 72, 155, 91, 73, 242, 191, 0, 0, 0, 0, 0, 0, 0, 0, 52, 121, 227, 150, 79, 248, 140, 67, 143, 13, 128, 21, 35, 58, 40, 13, 211, 19, 203, 193, 101, 194, 7, 72, 227, 201, 177, 122, 54, 74, 46, 67, 23, 38, 50, 64, 173, 107, 173, 144, 4, 8, 252, 
226, 102, 193, 248, 241, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 83, 88, 80, 70, 140, 10, 0, 0, 255, 2, 0, 0, 0, 0, 0, 0, 136, 10, 0, 0, 0, 0, 0, 0] }", "runtime_stack": { "format": "Datadog Runtime Callback 1.0", "frames": [ { "function": "string_at", "file": "/home/bits/.pyenv/versions/3.11.13/lib/python3.11/ctypes/__init__.py", "line": 519 }, { "function": "func16", "file": "tests/internal/crashtracker/test_crashtracker.py", "line": 724 }, ........ { "function": "func2", "file": "tests/internal/crashtracker/test_crashtracker.py", "line": 682 }, { "function": "func1", "file": "tests/internal/crashtracker/test_crashtracker.py", "line": 679 }, { "function": "<module>", "file": "tests/internal/crashtracker/test_crashtracker.py", "line": 734 } ], "runtime_type": "python" }}
- Whole traceback string
{ "ucontext": "ucontext_t { uc_flags: 7, uc_link: 0x0, uc_stack: stack_t { ss_sp: 0x7c4c9c841000, ss_flags: 0, ss_size: 65536 }, uc_mcontext: mcontext_t { gregs: [0, 136668534166074, 0, 136668528596416, 140724573444400, 140724573444608, 1, 0, 0, -1, 0, 4294967295, 136668528651400, 0, 140724573443704, 140724573443912, 136668533975512, 66179, 12103423998558259, 4, 14, 0, 0], fpregs: 0x7c4c9c850540, __private: [0, 0, 0, 0, 0, 0, 0, 0] }, uc_sigmask: sigset_t { __val: [0, 11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] }, __private: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 160, 31, 0, 0, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99, 0, 116, 0, 111, 0, 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 116, 0, 114, 0, 0, 0, 0, 0, 99, 0, 111, 0, 0, 0, 0, 0, 105, 0, 101, 0, 111, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 117, 0, 0, 0, 99, 0, 0, 0, 104, 0, 0, 0, 32, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 111, 0, 0, 0, 114, 0, 0, 0, 32, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 128, 199, 48, 49, 93, 136, 85, 59, 0, 0, 0, 0, 0, 0, 0, 0, 100, 171, 99, 130, 7, 91, 229, 191, 0, 0, 0, 0, 0, 0, 0, 0, 27, 99, 108, 213, 49, 161, 233, 63, 0, 0, 0, 0, 0, 0, 0, 0, 233, 69, 72, 155, 91, 73, 242, 191, 0, 0, 0, 0, 0, 0, 0, 0, 255, 52, 127, 42, 141, 178, 13, 152, 156, 26, 175, 206, 42, 181, 80, 159, 182, 24, 207, 207, 28, 16, 244, 61, 0, 111, 67, 149, 212, 73, 16, 112, 165, 200, 182, 175, 115, 89, 172, 247, 213, 71, 
73, 142, 243, 74, 61, 65, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 83, 88, 80, 70, 140, 10, 0, 0, 255, 2, 0, 0, 0, 0, 0, 0, 136, 10, 0, 0, 0, 0, 0, 0] }", "runtime_stack": { "format": "Datadog Runtime Callback 1.0", "stacktrace_string": "Current thread 0x00007c4c9f46db80 (most recent call first):\n File \"/home/bits/.pyenv/versions/3.11.13/lib/python3.11/ctypes/__init__.py\", line 519 in string_at\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 724 in func16\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 721 in func15\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 718 in func14\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 715 in func13\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 712 in func12\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 709 in func11\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 706 in func10\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 703 in func9\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 700 in func8\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 697 in func7\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 694 in func6\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 691 in func5\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 688 in func4\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 685 in func3\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 682 in func2\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 679 in func1\n File \"tests/internal/crashtracker/test_crashtracker.py\", line 734 in <module>", "runtime_type": "python" }}
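The two payloads above share `format` and `runtime_type` and differ only in whether they carry `frames` or `stacktrace_string`. One way a consumer could model that is a Rust enum, so that a single report cannot hold both shapes. This is a hypothetical sketch (`RuntimeStack`, `StackPayload`, `to_json`) that hand-rolls JSON with the standard library only and emits keys in the order seen in the examples. It is not the PR's actual serialization code.

```rust
use std::fmt::Write;

/// One structured frame, mirroring the keys in the example output.
pub struct Frame {
    pub function: String,
    pub file: String,
    pub line: u32,
}

/// The two mutually exclusive shapes a runtime stack can take.
pub enum StackPayload {
    Frames(Vec<Frame>),
    StacktraceString(String),
}

/// Hypothetical model of the `runtime_stack` object.
pub struct RuntimeStack {
    pub format: String,
    pub runtime_type: String,
    pub payload: StackPayload,
}

/// Appends `s` as a JSON string literal, escaping quotes, backslashes and control characters.
fn push_json_str(out: &mut String, s: &str) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out.push('"');
}

impl RuntimeStack {
    /// Serializes to compact JSON: `format`, then `frames` or `stacktrace_string`, then `runtime_type`.
    pub fn to_json(&self) -> String {
        let mut out = String::from("{\"format\":");
        push_json_str(&mut out, &self.format);
        match &self.payload {
            StackPayload::Frames(frames) => {
                out.push_str(",\"frames\":[");
                for (i, f) in frames.iter().enumerate() {
                    if i > 0 {
                        out.push(',');
                    }
                    out.push_str("{\"function\":");
                    push_json_str(&mut out, &f.function);
                    out.push_str(",\"file\":");
                    push_json_str(&mut out, &f.file);
                    write!(out, ",\"line\":{}}}", f.line).unwrap();
                }
                out.push(']');
            }
            StackPayload::StacktraceString(s) => {
                out.push_str(",\"stacktrace_string\":");
                push_json_str(&mut out, s);
            }
        }
        out.push_str(",\"runtime_type\":");
        push_json_str(&mut out, &self.runtime_type);
        out.push('}');
        out
    }
}
```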