Remote debugging attachment protocol
This section describes the low-level protocol that enables external tools to inject and execute a Python script within a running CPython process.
This mechanism forms the basis of the sys.remote_exec() function, which instructs a remote Python process to execute a .py file. However, this section does not document the usage of that function. Instead, it provides a detailed explanation of the underlying protocol, which takes as input the pid of a target Python process and the path to a Python source file to be executed. This information supports independent reimplementation of the protocol, regardless of programming language.
Warning
The execution of the injected script depends on the interpreter reaching a safe evaluation point. As a result, execution may be delayed depending on the runtime state of the target process.
Once injected, the script is executed by the interpreter within the target process the next time a safe evaluation point is reached. This approach enables remote execution capabilities without modifying the behavior or structure of the running Python application.
Subsequent sections provide a step-by-step description of the protocol, including techniques for locating interpreter structures in memory, safely accessing internal fields, and triggering code execution. Platform-specific variations are noted where applicable, and example implementations are included to clarify each operation.
Locating the PyRuntime structure
CPython places the PyRuntime structure in a dedicated binary section to help external tools find it at runtime. The name and format of this section vary by platform. For example, .PyRuntime is used on ELF systems, and __DATA,__PyRuntime is used on macOS. Tools can find the offset of this structure by examining the binary on disk.
The PyRuntime structure contains CPython’s global interpreter state and provides access to other internal data, including the list of interpreters, thread states, and debugger support fields.
To work with a remote Python process, a debugger must first find the memory address of the PyRuntime structure in the target process. This address can’t be hardcoded or calculated from a symbol name, because it depends on where the operating system loaded the binary.
The method for finding PyRuntime depends on the platform, but the steps are the same in general:
1. Find the base address where the Python binary or shared library was loaded in the target process.
2. Use the on-disk binary to locate the offset of the .PyRuntime section.
3. Add the section offset to the base address to compute the address in memory.
The sections below explain how to do this on each supported platform and include example code.
Linux (ELF)
To find the PyRuntime structure on Linux:
1. Read the process’s memory map (for example, /proc/<pid>/maps) to find the address where the Python executable or libpython was loaded.
2. Parse the ELF section headers in the binary to get the offset of the .PyRuntime section.
3. Add that offset to the base address from step 1 to get the memory address of PyRuntime.
The following is an example implementation:

    def find_py_runtime_linux(pid: int) -> int:
        # Step 1: Try to find the Python executable in memory
        binary_path, base_address = find_mapped_binary(
            pid, name_contains="python"
        )

        # Step 2: Fallback to shared library if executable is not found
        if binary_path is None:
            binary_path, base_address = find_mapped_binary(
                pid, name_contains="libpython"
            )

        # Step 3: Parse ELF headers to get .PyRuntime section offset
        section_offset = parse_elf_section_offset(
            binary_path, ".PyRuntime"
        )

        # Step 4: Compute PyRuntime address in memory
        return base_address + section_offset

On Linux systems, there are two main approaches to read memory from another process. The first is through the /proc filesystem, specifically by reading from /proc/<pid>/mem, which provides direct access to the process’s memory. This requires appropriate permissions: either being the same user as the target process or having root access. The second approach is the process_vm_readv() system call, which provides a more efficient way to copy memory between processes. While ptrace’s PTRACE_PEEKTEXT operation can also be used to read memory, it is significantly slower because it reads only one word at a time and requires multiple context switches between the tracer and tracee processes.
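As a rough illustration of the first step, the sketch below parses the text of a memory map and returns the lowest start address of the first mapped file whose name contains a given substring. The function name find_mapped_binary_in_maps and its signature are assumptions made for this example; they are not part of CPython or of the helper used in the example above.

```python
import os
from typing import Optional, Tuple


def find_mapped_binary_in_maps(
    maps_text: str, name_contains: str
) -> Tuple[Optional[str], Optional[int]]:
    """Return (path, lowest start address) of the first matching mapped file."""
    match_path = None
    base = None
    for line in maps_text.splitlines():
        # Line format: start-end perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing path
        path = fields[5].strip()
        if match_path is None:
            if name_contains not in os.path.basename(path):
                continue
            match_path = path
        elif path != match_path:
            continue
        # A file is usually mapped in several pieces; keep the lowest start.
        start = int(fields[0].split("-")[0], 16)
        base = start if base is None else min(base, start)
    return match_path, base
```

On a live Linux system the input text would typically come from reading /proc/<pid>/maps, subject to the permission rules described above.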
For parsing ELF sections, the process involves reading and interpreting the ELF file format structures from the binary file on disk. The ELF header contains a pointer to the section header table. Each section header contains metadata about a section including its name (stored in a separate string table), offset, and size. To find a specific section like .PyRuntime, you need to walk through these headers and match the section name. The section header then provides the offset where that section exists in the file, which can be used to calculate its runtime address when the binary is loaded into memory.
You can read more about the ELF file format in the ELF specification.
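The section-header walk described above can be sketched as follows. This is a minimal illustration that assumes a 64-bit little-endian ELF file already read into memory; the names ElfSection and find_elf_section are hypothetical and chosen only for this example.

```python
import struct
from typing import NamedTuple, Optional


class ElfSection(NamedTuple):
    name: str
    addr: int    # sh_addr: virtual address of the section when loaded
    offset: int  # sh_offset: position of the section in the file
    size: int    # sh_size


SHDR_FORMAT = "<IIQQQQIIQQ"  # Elf64_Shdr, little-endian, 64 bytes


def find_elf_section(data: bytes, wanted: str) -> Optional[ElfSection]:
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    # e_shoff is at byte 40; e_shentsize, e_shnum, e_shstrndx at bytes 58-63.
    (shoff,) = struct.unpack_from("<Q", data, 40)
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 58)

    def read_header(index: int):
        return struct.unpack_from(SHDR_FORMAT, data, shoff + index * shentsize)

    # Section names are stored in the string table section at e_shstrndx.
    strtab_offset = read_header(shstrndx)[4]

    for index in range(shnum):
        header = read_header(index)
        name_start = strtab_offset + header[0]
        name_end = data.index(b"\x00", name_start)
        name = data[name_start:name_end].decode("ascii", "replace")
        if name == wanted:
            return ElfSection(name, header[3], header[4], header[5])
    return None
```

The sketch returns both the file offset and the virtual address of the section. Which of the two a tool should combine with the base address can depend on how the binary was linked, so that choice should be verified against a real target rather than taken from this example.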
macOS (Mach-O)
To find the PyRuntime structure on macOS:
1. Call task_for_pid() to get the mach_port_t task port for the target process. This handle is needed to read memory using APIs like mach_vm_read_overwrite and mach_vm_region.
2. Scan the memory regions to find the one containing the Python executable or libpython.
3. Load the binary file from disk and parse the Mach-O headers to find the section named __PyRuntime in the __DATA segment. On macOS, symbol names are automatically prefixed with an underscore, so the PyRuntime symbol appears as _PyRuntime in the symbol table, but the section name is not affected.
The following is an example implementation:

    def find_py_runtime_macos(pid: int) -> int:
        # Step 1: Get access to the process's memory
        handle = get_memory_access_handle(pid)

        # Step 2: Try to find the Python executable in memory
        binary_path, base_address = find_mapped_binary(
            handle, name_contains="python"
        )

        # Step 3: Fallback to libpython if the executable is not found
        if binary_path is None:
            binary_path, base_address = find_mapped_binary(
                handle, name_contains="libpython"
            )

        # Step 4: Parse Mach-O headers to get __DATA,__PyRuntime section offset
        section_offset = parse_macho_section_offset(
            binary_path, "__DATA", "__PyRuntime"
        )

        # Step 5: Compute the PyRuntime address in memory
        return base_address + section_offset

On macOS, accessing another process’s memory requires Mach-specific APIs and the Mach-O file format. The first step is obtaining a task_port handle via task_for_pid(), which provides access to the target process’s memory space. This handle enables memory operations through APIs like mach_vm_read_overwrite().
The process memory can be examined using mach_vm_region() to scan through the virtual memory space, while proc_regionfilename() helps identify which binary files are loaded at each memory region. When the Python binary or library is found, its Mach-O headers need to be parsed to locate the PyRuntime structure.
The Mach-O format organizes code and data into segments and sections. The PyRuntime structure lives in a section named __PyRuntime within the __DATA segment. The actual runtime address calculation involves finding the __TEXT segment, which serves as the binary’s base address, then locating the __DATA segment containing our target section. The final address is computed by combining the base address with the appropriate section offsets from the Mach-O headers.
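The calculation described above can be sketched as follows. This is an illustration only: it assumes a thin (single-architecture) 64-bit little-endian Mach-O file already read into memory, and the function name macho_section_offset_from_text is hypothetical. A universal binary would need its matching architecture slice selected first, which this sketch does not attempt.

```python
import struct
from typing import Optional

MH_MAGIC_64 = 0xFEEDFACF
LC_SEGMENT_64 = 0x19


def _cstr(raw: bytes) -> str:
    # Mach-O names are fixed 16-byte fields padded with NUL bytes.
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


def macho_section_offset_from_text(
    data: bytes, segment: str, section: str
) -> Optional[int]:
    """Return the section address minus the __TEXT vmaddr, or None."""
    magic, _, _, _, ncmds = struct.unpack_from("<IiiII", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("sketch only handles thin 64-bit Mach-O files")
    text_vmaddr = None
    section_addr = None
    pos = 32  # size of mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, pos)
        if cmd == LC_SEGMENT_64:
            segname, vmaddr, _, _, _, _, _, nsects, _ = struct.unpack_from(
                "<16sQQQQiiII", data, pos + 8
            )
            if _cstr(segname) == "__TEXT":
                text_vmaddr = vmaddr
            if _cstr(segname) == segment:
                # section_64 records (80 bytes each) follow the 72-byte
                # segment_command_64.
                for i in range(nsects):
                    sectname, _, addr, _ = struct.unpack_from(
                        "<16s16sQQ", data, pos + 72 + i * 80
                    )
                    if _cstr(sectname) == section:
                        section_addr = addr
        pos += cmdsize
    if text_vmaddr is None or section_addr is None:
        return None
    return section_addr - text_vmaddr
```

Under these assumptions, adding the returned value to the address at which the image is loaded in the target process would give the address of the section in memory.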
Note that accessing another process’s memory on macOS typically requires elevated privileges: either root access or special security entitlements granted to the debugging process.
Windows (PE)
To find the PyRuntime structure on Windows:
1. Use the ToolHelp API to enumerate all modules loaded in the target process. This is done using functions such as CreateToolhelp32Snapshot, Module32First, and Module32Next.
2. Identify the module corresponding to python.exe or pythonXY.dll, where X and Y are the major and minor version numbers of the Python version, and record its base address.
3. Locate the PyRuntim section. Due to the PE format’s 8-character limit on section names (defined as IMAGE_SIZEOF_SHORT_NAME), the original name PyRuntime is truncated. This section contains the PyRuntime structure.
4. Retrieve the section’s relative virtual address (RVA) and add it to the base address of the module.
The following is an example implementation:

    def find_py_runtime_windows(pid: int) -> int:
        # Step 1: Try to find the Python executable in memory
        binary_path, base_address = find_loaded_module(
            pid, name_contains="python"
        )

        # Step 2: Fallback to shared pythonXY.dll if the executable is not
        # found
        if binary_path is None:
            binary_path, base_address = find_loaded_module(
                pid, name_contains="python3"
            )

        # Step 3: Parse PE section headers to get the RVA of the PyRuntime
        # section. The section name appears as "PyRuntim" due to the
        # 8-character limit defined by the PE format (IMAGE_SIZEOF_SHORT_NAME).
        section_rva = parse_pe_section_offset(binary_path, "PyRuntim")

        # Step 4: Compute PyRuntime address in memory
        return base_address + section_rva

On Windows, accessing another process’s memory relies on Windows API functions. CreateToolhelp32Snapshot() and Module32First()/Module32Next() enumerate the loaded modules, and OpenProcess() provides a handle to the target process’s memory space, enabling memory operations through ReadProcessMemory().
The process memory can be examined by enumerating loaded modules to find the Python binary or DLL. When found, its PE headers need to be parsed to locate the PyRuntime structure.
The PE format organizes code and data into sections. The PyRuntime structure lives in a section named “PyRuntim” (truncated from “PyRuntime” due to PE’s 8-character name limit). The actual runtime address calculation involves finding the module’s base address from the module entry, then locating our target section in the PE headers. The final address is computed by combining the base address with the section’s virtual address from the PE section headers.
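The section lookup described above can be sketched as follows, assuming the PE file has already been read into memory. The function name pe_section_rva is hypothetical; the sketch truncates the requested name to eight bytes to mirror the section-name limit mentioned above.

```python
import struct
from typing import Optional


def pe_section_rva(data: bytes, wanted: str) -> Optional[int]:
    """Return the VirtualAddress (RVA) of the named section, or None."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS header")
    # e_lfanew at offset 0x3C points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    _, num_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    # Section headers (40 bytes each) follow the optional header.
    table = pe_offset + 4 + 20 + opt_size
    target = wanted.encode("ascii")[:8]  # IMAGE_SIZEOF_SHORT_NAME is 8
    for i in range(num_sections):
        name, _, virtual_address = struct.unpack_from(
            "<8sII", data, table + i * 40
        )
        if name.rstrip(b"\x00") == target:
            return virtual_address
    return None
```

Adding the returned RVA to the module base address reported by the module enumeration gives the address of the section in the target process.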
Note that accessing another process’s memory on Windows typically requires appropriate privileges: either administrative access or the SeDebugPrivilege privilege granted to the debugging process.
Reading _Py_DebugOffsets
Once the address of the PyRuntime structure has been determined, the next step is to read the _Py_DebugOffsets structure located at the beginning of the PyRuntime block.
This structure provides version-specific field offsets that are needed to safely read interpreter and thread state memory. These offsets vary between CPython versions and must be checked before use to ensure they are compatible.
To read and check the debug offsets, follow these steps:
1. Read memory from the target process starting at the PyRuntime address, covering the same number of bytes as the _Py_DebugOffsets structure. This structure is located at the very start of the PyRuntime memory block. Its layout is defined in CPython’s internal headers and stays the same within a given minor version, but may change in major versions.
2. Check that the structure contains valid data:
   - The cookie field must match the expected debug marker.
   - The version field must match the version of the Python interpreter used by the debugger. If either the debugger or the target process is using a pre-release version (for example, an alpha, beta, or release candidate), the versions must match exactly.
   - The free_threaded field must have the same value in both the debugger and the target process.
3. If the structure is valid, the offsets it contains can be used to locate fields in memory. If any check fails, the debugger should stop the operation to avoid reading memory in the wrong format.
The following is an example implementation that reads and checks _Py_DebugOffsets:

    def read_debug_offsets(pid: int, py_runtime_addr: int) -> DebugOffsets:
        # Step 1: Read memory from the target process at the PyRuntime address
        data = read_process_memory(
            pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE
        )

        # Step 2: Deserialize the raw bytes into a _Py_DebugOffsets structure
        debug_offsets = parse_debug_offsets(data)

        # Step 3: Validate the contents of the structure
        if debug_offsets.cookie != EXPECTED_COOKIE:
            raise RuntimeError("Invalid or missing debug cookie")
        if debug_offsets.version != LOCAL_PYTHON_VERSION:
            raise RuntimeError(
                "Mismatch between caller and target Python versions"
            )
        if debug_offsets.free_threaded != LOCAL_FREE_THREADED:
            raise RuntimeError("Mismatch in free-threaded configuration")

        return debug_offsets

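The version rule stated in the list above is stricter for pre-releases than a plain equality test suggests, so it may help to see it spelled out. The sketch below assumes that the version field uses the same bit layout as sys.hexversion (major, minor, micro, release level, serial); that layout is an assumption of this example, and versions_compatible is a hypothetical helper name.

```python
import sys

RELEASE_LEVEL_FINAL = 0xF  # release-level nibble of a final release


def versions_compatible(local_hex: int, remote_hex: int) -> bool:
    """Apply the version rule: same major.minor, exact match for pre-releases."""

    def major_minor(value: int) -> int:
        return value >> 16

    def is_prerelease(value: int) -> bool:
        return ((value >> 4) & 0xF) != RELEASE_LEVEL_FINAL

    if major_minor(local_hex) != major_minor(remote_hex):
        return False
    if is_prerelease(local_hex) or is_prerelease(remote_hex):
        # Alpha, beta, and release-candidate builds must match exactly.
        return local_hex == remote_hex
    return True
```

For example, under this reading two final releases of the same minor version are treated as compatible, while a beta and a final release of that version are not.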
Warning
Process suspension recommended
To avoid race conditions and ensure memory consistency, it is strongly recommended that the target process be suspended before performing any operations that read or write internal interpreter state. The Python runtime may concurrently mutate interpreter data structures (for example, by creating or destroying threads) during normal execution. This can result in invalid memory reads or writes.
A debugger may suspend execution by attaching to the process with ptrace or by sending a SIGSTOP signal. Execution should only be resumed after debugger-side memory operations are complete.
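One way to make sure the process is always resumed is to wrap the signal-based approach in a context manager. The sketch below applies to POSIX systems only; the name suspended and the injectable kill parameter are assumptions made for this example so that the logic can be exercised without a real target.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def suspended(pid: int, kill=os.kill):
    """Stop the target with SIGSTOP and always resume it with SIGCONT."""
    kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        # Resume even if the debugger-side memory operations raised.
        kill(pid, signal.SIGCONT)
```

Note that sending the signal only queues it, so a careful tool may also need to confirm that the target has actually stopped before it starts reading or writing memory.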
Note
Some tools, such as profilers or sampling-based debuggers, may operate on a running process without suspension. In such cases, tools must be explicitly designed to handle partially updated or inconsistent memory. For most debugger implementations, suspending the process remains the safest and most robust approach.
Locating the interpreter and thread state
Before code can be injected and executed in a remote Python process, the debugger must choose a thread in which to schedule execution. This is necessary because the control fields used to perform remote code injection are located in the _PyRemoteDebuggerSupport structure, which is embedded in a PyThreadState object. These fields are modified by the debugger to request execution of injected scripts.
The PyThreadState structure represents a thread running inside a Python interpreter. It maintains the thread’s evaluation context and contains the fields required for debugger coordination. Locating a valid PyThreadState is therefore a key prerequisite for triggering execution remotely.
A thread is typically selected based on its role or ID. In most cases, the main thread is used, but some tools may target a specific thread by its native thread ID. Once the target thread is chosen, the debugger must locate both the interpreter and the associated thread state structures in memory.
The relevant internal structures are defined as follows:
- PyInterpreterState represents an isolated Python interpreter instance. Each interpreter maintains its own set of imported modules, built-in state, and thread state list. Although most Python applications use a single interpreter, CPython supports multiple interpreters in the same process.
- PyThreadState represents a thread running within an interpreter. It contains execution state and the control fields used by the debugger.
To locate a thread:
1. Use the offset runtime_state.interpreters_head to obtain the address of the first interpreter in the PyRuntime structure. This is the entry point to the linked list of active interpreters.
2. Use the offset interpreter_state.threads_main to access the main thread state associated with the selected interpreter. This is typically the most reliable thread to target.
3. Optionally, use the offset interpreter_state.threads_head to iterate through the linked list of all thread states. Each PyThreadState structure contains a native_thread_id field, which may be compared to a target thread ID to find a specific thread.
4. Once a valid PyThreadState has been found, its address can be used in later steps of the protocol, such as writing debugger control fields and scheduling execution.
The following is an example implementation that locates the main thread state:

    def find_main_thread_state(
        pid: int, py_runtime_addr: int, debug_offsets: DebugOffsets,
    ) -> int:
        # Step 1: Read interpreters_head from PyRuntime
        interp_head_ptr = (
            py_runtime_addr + debug_offsets.runtime_state.interpreters_head
        )
        interp_addr = read_pointer(pid, interp_head_ptr)
        if interp_addr == 0:
            raise RuntimeError("No interpreter found in the target process")

        # Step 2: Read the threads_main pointer from the interpreter
        threads_main_ptr = (
            interp_addr + debug_offsets.interpreter_state.threads_main
        )
        thread_state_addr = read_pointer(pid, threads_main_ptr)
        if thread_state_addr == 0:
            raise RuntimeError("Main thread state is not available")

        return thread_state_addr

The following example demonstrates how to locate a thread by its native thread ID:

    def find_thread_by_id(
        pid: int,
        interp_addr: int,
        debug_offsets: DebugOffsets,
        target_tid: int,
    ) -> int:
        # Start at threads_head and walk the linked list
        thread_ptr = read_pointer(
            pid,
            interp_addr + debug_offsets.interpreter_state.threads_head
        )

        while thread_ptr:
            native_tid_ptr = (
                thread_ptr + debug_offsets.thread_state.native_thread_id
            )
            native_tid = read_int(pid, native_tid_ptr)
            if native_tid == target_tid:
                return thread_ptr
            thread_ptr = read_pointer(
                pid,
                thread_ptr + debug_offsets.thread_state.next
            )

        raise RuntimeError("Thread with the given ID was not found")

Once a valid thread state has been located, the debugger can proceed with modifying its control fields and scheduling execution, as described in the next section.
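The examples in this section call read_pointer and read_int without defining them. The sketch below shows one way such helpers might be layered over a raw byte reader, together with a generic linked-list walk. It assumes a 64-bit little-endian target, and it takes a reader callable instead of a pid so that it can be tried against fake memory; both choices, and the name walk_linked_list, are assumptions of this example.

```python
import struct
from typing import Callable, Iterator

# Hypothetical reader signature: (address, size) -> bytes
MemoryReader = Callable[[int, int], bytes]


def read_pointer(read: MemoryReader, address: int) -> int:
    """Read one pointer, assuming a 64-bit little-endian target."""
    return struct.unpack("<Q", read(address, 8))[0]


def read_int(read: MemoryReader, address: int, size: int = 4) -> int:
    """Read a signed little-endian integer of the given byte width."""
    return int.from_bytes(read(address, size), "little", signed=True)


def walk_linked_list(
    read: MemoryReader, head: int, next_offset: int
) -> Iterator[int]:
    """Yield node addresses, following the pointer at next_offset until NULL."""
    seen = set()
    node = head
    while node and node not in seen:
        seen.add(node)  # guard against cycles caused by inconsistent reads
        yield node
        node = read_pointer(read, node + next_offset)
```

On Linux, the reader could for instance wrap reads from /proc/<pid>/mem. The width of individual fields may differ between platforms, so the size passed to read_int should be checked against the target rather than assumed.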
Writing control information
Once a valid PyThreadState structure has been identified, the debugger may modify control fields within it to schedule the execution of a specified Python script. These control fields are checked periodically by the interpreter, and when set correctly, they trigger the execution of remote code at a safe point in the evaluation loop.
Each PyThreadState contains a _PyRemoteDebuggerSupport structure used for communication between the debugger and the interpreter. The locations of its fields are defined by the _Py_DebugOffsets structure and include the following:
- debugger_script_path: A fixed-size buffer that holds the full path to a Python source file (.py). This file must be accessible and readable by the target process when execution is triggered.
- debugger_pending_call: An integer flag. Setting this to 1 tells the interpreter that a script is ready to be executed.
- eval_breaker: A field checked by the interpreter during execution. Setting bit 5 (_PY_EVAL_PLEASE_STOP_BIT, value 1U << 5) in this field causes the interpreter to pause and check for debugger activity.
To complete the injection, the debugger must perform the following steps:
1. Write the full script path into the debugger_script_path buffer.
2. Set debugger_pending_call to 1.
3. Read the current value of eval_breaker, set bit 5 (_PY_EVAL_PLEASE_STOP_BIT), and write the updated value back. This signals the interpreter to check for debugger activity.
The following is an example implementation:

    def inject_script(
        pid: int,
        thread_state_addr: int,
        debug_offsets: DebugOffsets,
        script_path: str
    ) -> None:
        # Compute the base offset of _PyRemoteDebuggerSupport
        support_base = (
            thread_state_addr +
            debug_offsets.debugger_support.remote_debugger_support
        )

        # Step 1: Write the script path into debugger_script_path
        script_path_ptr = (
            support_base +
            debug_offsets.debugger_support.debugger_script_path
        )
        write_string(pid, script_path_ptr, script_path)

        # Step 2: Set debugger_pending_call to 1
        pending_ptr = (
            support_base +
            debug_offsets.debugger_support.debugger_pending_call
        )
        write_int(pid, pending_ptr, 1)

        # Step 3: Set _PY_EVAL_PLEASE_STOP_BIT (bit 5, value 1 << 5) in
        # eval_breaker
        eval_breaker_ptr = (
            thread_state_addr +
            debug_offsets.debugger_support.eval_breaker
        )
        breaker = read_int(pid, eval_breaker_ptr)
        breaker |= (1 << 5)
        write_int(pid, eval_breaker_ptr, breaker)

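Because debugger_script_path is a fixed-size buffer, a writer should check that the path fits before touching the target. The sketch below prepares the bytes to write and computes the updated eval_breaker value. The helper names, the buffer_size parameter, and the choice of UTF-8 with a trailing NUL byte are assumptions of this example; the actual buffer size and encoding must be taken from the target's CPython version.

```python
PLEASE_STOP_BIT = 1 << 5  # bit 5, as described for _PY_EVAL_PLEASE_STOP_BIT


def encode_script_path(script_path: str, buffer_size: int) -> bytes:
    """Return NUL-terminated bytes for the buffer, or raise if too long."""
    # Assumption: the target expects a UTF-8, NUL-terminated C string.
    encoded = script_path.encode("utf-8") + b"\x00"
    if len(encoded) > buffer_size:
        raise ValueError(
            f"script path needs {len(encoded)} bytes, "
            f"but the buffer holds {buffer_size}"
        )
    return encoded


def with_stop_bit(eval_breaker: int) -> int:
    """Return eval_breaker with bit 5 set, leaving other bits unchanged."""
    return eval_breaker | PLEASE_STOP_BIT
```

Using a read-modify-write for eval_breaker, rather than writing a constant, preserves any other bits the interpreter has already set in that field.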
Once these fields are set, the debugger may resume the process (if it was suspended). The interpreter will process the request at the next safe evaluation point, load the script from disk, and execute it.
It is the responsibility of the debugger to ensure that the script file remains present and accessible to the target process during execution.
Note
Script execution is asynchronous. The script file cannot be deleted immediately after injection. The debugger should wait until the injected script has produced an observable effect before removing the file. This effect depends on what the script is designed to do. For example, a debugger might wait until the remote process connects back to a socket before removing the script. Once such an effect is observed, it is safe to assume the file is no longer needed.
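The socket example mentioned in the note could look roughly like this on the debugger side. The helper name wait_for_connect_back is hypothetical, and the sketch assumes the injected script has been written to connect to a listening socket owned by the debugger.

```python
import socket


def wait_for_connect_back(listener: socket.socket, timeout: float) -> bool:
    """Return True once a client connects, False if the timeout expires."""
    listener.settimeout(timeout)
    try:
        conn, _ = listener.accept()
    except socket.timeout:
        return False
    conn.close()
    return True
```

A debugger following this pattern would create the listener before injection, embed its port in the script, and remove the script file only after the function returns True.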
Summary
To inject and execute a Python script in a remote process:
1. Locate the PyRuntime structure in the target process’s memory.
2. Read and validate the _Py_DebugOffsets structure at the beginning of PyRuntime.
3. Use the offsets to locate a valid PyThreadState.
4. Write the path to a Python script into debugger_script_path.
5. Set the debugger_pending_call flag to 1.
6. Set _PY_EVAL_PLEASE_STOP_BIT in the eval_breaker field.
7. Resume the process (if suspended). The script will execute at the next safe evaluation point.