Remote debugging attachment protocol

This section describes the low-level protocol that enables external tools to inject and execute a Python script within a running CPython process.

This mechanism forms the basis of the sys.remote_exec() function, which instructs a remote Python process to execute a .py file. However, this section does not document the usage of that function. Instead, it provides a detailed explanation of the underlying protocol, which takes as input the pid of a target Python process and the path to a Python source file to be executed. This information supports independent reimplementation of the protocol, regardless of programming language.

Warning

The execution of the injected script depends on the interpreter reaching a safe evaluation point. As a result, execution may be delayed depending on the runtime state of the target process.

Once injected, the script is executed by the interpreter within the target process the next time a safe evaluation point is reached. This approach enables remote execution capabilities without modifying the behavior or structure of the running Python application.

Subsequent sections provide a step-by-step description of the protocol, including techniques for locating interpreter structures in memory, safely accessing internal fields, and triggering code execution. Platform-specific variations are noted where applicable, and example implementations are included to clarify each operation.

Locating the PyRuntime structure

CPython places the PyRuntime structure in a dedicated binary section to help external tools find it at runtime. The name and format of this section vary by platform. For example, .PyRuntime is used on ELF systems, and __DATA,__PyRuntime is used on macOS. Tools can find the offset of this structure by examining the binary on disk.

The PyRuntime structure contains CPython’s global interpreter state and provides access to other internal data, including the list of interpreters, thread states, and debugger support fields.

To work with a remote Python process, a debugger must first find the memory address of the PyRuntime structure in the target process. This address can’t be hardcoded or calculated from a symbol name, because it depends on where the operating system loaded the binary.

The method for finding PyRuntime depends on the platform, but the steps are the same in general:

  1. Find the base address where the Python binary or shared library was loaded in the target process.

  2. Use the on-disk binary to locate the offset of the .PyRuntime section.

  3. Add the section offset to the base address to compute the address in memory.

The sections below explain how to do this on each supported platform and include example code.

Linux (ELF)

To find the PyRuntime structure on Linux:

  1. Read the process’s memory map (for example, /proc/<pid>/maps) to find the address where the Python executable or libpython was loaded.

  2. Parse the ELF section headers in the binary to get the offset of the .PyRuntime section.

  3. Add that offset to the base address from step 1 to get the memory address of PyRuntime.

The following is an example implementation:

    def find_py_runtime_linux(pid: int) -> int:
        # Step 1: Try to find the Python executable in memory
        binary_path, base_address = find_mapped_binary(
            pid, name_contains="python"
        )

        # Step 2: Fallback to shared library if executable is not found
        if binary_path is None:
            binary_path, base_address = find_mapped_binary(
                pid, name_contains="libpython"
            )

        # Step 3: Parse ELF headers to get .PyRuntime section offset
        section_offset = parse_elf_section_offset(
            binary_path, ".PyRuntime"
        )

        # Step 4: Compute PyRuntime address in memory
        return base_address + section_offset
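The find_mapped_binary helper used above is not defined in this section. The following is a minimal sketch of how its core could work, assuming the caller has already read the text of /proc/<pid>/maps. The function name find_mapped_binary_in_maps and its signature are hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple


def find_mapped_binary_in_maps(
    maps_text: str, name_contains: str
) -> Tuple[Optional[str], Optional[int]]:
    """Return (path, lowest start address) of the first mapped file whose
    basename contains name_contains, or (None, None) if nothing matches."""
    best_path = None
    best_start = None
    for line in maps_text.splitlines():
        # Each line looks like: start-end perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5]
        basename = path.rsplit("/", 1)[-1]
        if name_contains not in basename:
            continue
        if best_path is not None and path != best_path:
            # Stick with the first matching file. Extension modules can
            # also contain "python" in their names, so a real tool may
            # need a stricter filter than a substring test.
            continue
        start = int(fields[0].split("-")[0], 16)
        if best_start is None or start < best_start:
            best_path, best_start = path, start
    return best_path, best_start
```

A caller would typically read the file with open(f"/proc/{pid}/maps").read() and pass the resulting text to this function.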

On Linux systems, there are two main approaches to read memory from another process. The first is through the /proc filesystem, specifically by reading from /proc/[pid]/mem, which provides direct access to the process’s memory. This requires appropriate permissions: either being the same user as the target process or having root access. The second approach is using the process_vm_readv() system call, which provides a more efficient way to copy memory between processes. While ptrace’s PTRACE_PEEKTEXT operation can also be used to read memory, it is significantly slower, as it only reads one word at a time and requires multiple context switches between the tracer and tracee processes.
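The first approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a complete implementation: the function name read_process_memory mirrors the helper used in this section's examples, and the mem_path parameter is a hypothetical addition that lets the function be exercised against an ordinary file.

```python
import os
from typing import Optional


def read_process_memory(
    pid: int, address: int, size: int, mem_path: Optional[str] = None
) -> bytes:
    """Read size bytes at address from /proc/<pid>/mem (or from mem_path)."""
    path = mem_path if mem_path is not None else f"/proc/{pid}/mem"
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread() reads at an absolute offset, which for /proc/<pid>/mem
        # is the virtual address inside the target process.
        data = os.pread(fd, size, address)
    finally:
        os.close(fd)
    if len(data) != size:
        raise OSError(
            f"short read at {address:#x}: got {len(data)} of {size} bytes"
        )
    return data
```

Opening the file fails with a permission error when the caller is not allowed to inspect the target, so a tool may want to catch that case and report it clearly.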

For parsing ELF sections, the process involves reading and interpreting the ELF file format structures from the binary file on disk. The ELF header contains a pointer to the section header table. Each section header contains metadata about a section, including its name (stored in a separate string table), offset, and size. To find a specific section like .PyRuntime, you need to walk through these headers and match the section name. The section header then provides the offset where that section exists in the file, which can be used to calculate its runtime address when the binary is loaded into memory.
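The walk over the section headers can be sketched with the struct module. This sketch assumes a 64-bit little-endian ELF file and rejects anything else. The names find_elf_section and ElfSection are hypothetical. The function returns both the file offset (sh_offset) and the linked virtual address (sh_addr), since the two can differ and which one applies depends on how the base address was obtained.

```python
import struct
from typing import NamedTuple

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")    # Elf64_Shdr, 64 bytes


class ElfSection(NamedTuple):
    addr: int    # sh_addr: virtual address assigned at link time
    offset: int  # sh_offset: position of the section in the file
    size: int    # sh_size: size of the section in bytes


def find_elf_section(path: str, wanted: str) -> ElfSection:
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")

    header = ELF_HEADER.unpack_from(data, 0)
    e_shoff, e_shentsize = header[6], header[11]
    e_shnum, e_shstrndx = header[12], header[13]

    def section_header(index: int) -> tuple:
        return SECTION_HEADER.unpack_from(data, e_shoff + index * e_shentsize)

    # Section names live in a string table that is itself a section.
    strtab_offset = section_header(e_shstrndx)[4]
    for index in range(e_shnum):
        sh = section_header(index)
        name_start = strtab_offset + sh[0]        # sh_name is an index
        name_end = data.index(b"\0", name_start)  # names are NUL-terminated
        name = data[name_start:name_end].decode("ascii", "replace")
        if name == wanted:
            return ElfSection(addr=sh[3], offset=sh[4], size=sh[5])
    raise LookupError(f"section {wanted!r} not found")
```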

You can read more about the ELF file format in the ELF specification.

macOS (Mach-O)

To find the PyRuntime structure on macOS:

  1. Call task_for_pid() to get the mach_port_t task port for the target process. This handle is needed to read memory using APIs like mach_vm_read_overwrite and mach_vm_region.

  2. Scan the memory regions to find the one containing the Python executable or libpython.

  3. Load the binary file from disk and parse the Mach-O headers to find the section named __PyRuntime in the __DATA segment. On macOS, symbol names are automatically prefixed with an underscore, so the PyRuntime symbol appears as _PyRuntime in the symbol table, but the section name is not affected.

The following is an example implementation:

    def find_py_runtime_macos(pid: int) -> int:
        # Step 1: Get access to the process's memory
        handle = get_memory_access_handle(pid)

        # Step 2: Try to find the Python executable in memory
        binary_path, base_address = find_mapped_binary(
            handle, name_contains="python"
        )

        # Step 3: Fallback to libpython if the executable is not found
        if binary_path is None:
            binary_path, base_address = find_mapped_binary(
                handle, name_contains="libpython"
            )

        # Step 4: Parse Mach-O headers to get __DATA,__PyRuntime section offset
        section_offset = parse_macho_section_offset(
            binary_path, "__DATA", "__PyRuntime"
        )

        # Step 5: Compute the PyRuntime address in memory
        return base_address + section_offset

On macOS, accessing another process’s memory requires using Mach-specific APIs and the Mach-O file format. The first step is obtaining a task_port handle via task_for_pid(), which provides access to the target process’s memory space. This handle enables memory operations through APIs like mach_vm_read_overwrite().

The process memory can be examined using mach_vm_region() to scan through the virtual memory space, while proc_regionfilename() helps identify which binary files are loaded at each memory region. When the Python binary or library is found, its Mach-O headers need to be parsed to locate the PyRuntime structure.

The Mach-O format organizes code and data into segments and sections. The PyRuntime structure lives in a section named __PyRuntime within the __DATA segment. The actual runtime address calculation involves finding the __TEXT segment, which serves as the binary’s base address, then locating the __DATA segment containing our target section. The final address is computed by combining the base address with the appropriate section offsets from the Mach-O headers.
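The calculation described above can be sketched as follows. The sketch assumes a thin (non-universal) 64-bit little-endian Mach-O image already read into memory, and it interprets "combining the base address with the section offsets" as adding the difference between the section's address and the __TEXT segment's vmaddr to the load address. The function name macho_section_offset_from_text is hypothetical.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_SEGMENT_64 = 0x19
MACH_HEADER_64 = struct.Struct("<IiiIIIII")            # 32 bytes
LOAD_COMMAND = struct.Struct("<II")                    # cmd, cmdsize
SEGMENT_COMMAND_64 = struct.Struct("<II16sQQQQiiII")   # 72 bytes
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")        # 80 bytes


def _name(raw: bytes) -> str:
    """Decode a fixed-width, NUL-padded Mach-O name field."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def macho_section_offset_from_text(
    data: bytes, segment: str, section: str
) -> int:
    """Return the section's address relative to the __TEXT segment."""
    header = MACH_HEADER_64.unpack_from(data, 0)
    magic, ncmds = header[0], header[4]
    if magic != MH_MAGIC_64:
        raise ValueError("sketch only handles thin 64-bit Mach-O files")

    text_vmaddr = None
    section_addr = None
    pos = MACH_HEADER_64.size
    for _cmd_index in range(ncmds):
        cmd, cmdsize = LOAD_COMMAND.unpack_from(data, pos)
        if cmd == LC_SEGMENT_64:
            seg = SEGMENT_COMMAND_64.unpack_from(data, pos)
            segname, vmaddr, nsects = _name(seg[2]), seg[3], seg[9]
            if segname == "__TEXT":
                text_vmaddr = vmaddr
            if segname == segment:
                # Section headers follow the segment command directly.
                sect_pos = pos + SEGMENT_COMMAND_64.size
                for _sect_index in range(nsects):
                    sect = SECTION_64.unpack_from(data, sect_pos)
                    if _name(sect[0]) == section:
                        section_addr = sect[2]
                    sect_pos += SECTION_64.size
        pos += cmdsize

    if text_vmaddr is None or section_addr is None:
        raise LookupError(f"{segment},{section} or __TEXT not found")
    return section_addr - text_vmaddr
```

Adding the returned value to the address at which the image's __TEXT segment is mapped in the target process gives the candidate address of PyRuntime.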

Note that accessing another process’s memory on macOS typically requires elevated privileges: either root access or special security entitlements granted to the debugging process.

Windows (PE)

To find the PyRuntime structure on Windows:

  1. Use the ToolHelp API to enumerate all modules loaded in the target process. This is done using functions such as CreateToolhelp32Snapshot, Module32First, and Module32Next.

  2. Identify the module corresponding to python.exe or pythonXY.dll, where X and Y are the major and minor version numbers of the Python version, and record its base address.

  3. Locate the PyRuntim section. Due to the PE format’s 8-character limit on section names (defined as IMAGE_SIZEOF_SHORT_NAME), the original name PyRuntime is truncated. This section contains the PyRuntime structure.

  4. Retrieve the section’s relative virtual address (RVA) and add it to the base address of the module.

The following is an example implementation:

    def find_py_runtime_windows(pid: int) -> int:
        # Step 1: Try to find the Python executable in memory
        binary_path, base_address = find_loaded_module(
            pid, name_contains="python"
        )

        # Step 2: Fallback to shared pythonXY.dll if the executable is not
        # found
        if binary_path is None:
            binary_path, base_address = find_loaded_module(
                pid, name_contains="python3"
            )

        # Step 3: Parse PE section headers to get the RVA of the PyRuntime
        # section. The section name appears as "PyRuntim" due to the
        # 8-character limit defined by the PE format (IMAGE_SIZEOF_SHORT_NAME).
        section_rva = parse_pe_section_offset(binary_path, "PyRuntim")

        # Step 4: Compute PyRuntime address in memory
        return base_address + section_rva

On Windows, accessing another process’s memory requires using Windows API functions such as CreateToolhelp32Snapshot() and Module32First()/Module32Next() to enumerate loaded modules. The OpenProcess() function provides a handle to access the target process’s memory space, enabling memory operations through ReadProcessMemory().

The process memory can be examined by enumerating loaded modules to find the Python binary or DLL. When found, its PE headers need to be parsed to locate the PyRuntime structure.

The PE format organizes code and data into sections. The PyRuntime structure lives in a section named “PyRuntim” (truncated from “PyRuntime” due to PE’s 8-character name limit). The actual runtime address calculation involves finding the module’s base address from the module entry, then locating our target section in the PE headers. The final address is computed by combining the base address with the section’s virtual address from the PE section headers.
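The lookup in the PE section table can be sketched with the struct module. The sketch works on the raw bytes of the file read from disk and returns the section's VirtualAddress field, which is an RVA. The function name pe_section_rva is hypothetical; it truncates the requested name to 8 bytes so that either “PyRuntime” or “PyRuntim” can be passed.

```python
import struct

COFF_HEADER = struct.Struct("<HHIIIHH")         # IMAGE_FILE_HEADER, 20 bytes
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # IMAGE_SECTION_HEADER, 40 bytes


def pe_section_rva(data: bytes, wanted: str) -> int:
    """Return the RVA of the named section in a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    coff = COFF_HEADER.unpack_from(data, pe_offset + 4)
    number_of_sections, size_of_optional_header = coff[1], coff[5]

    # The section table starts right after the optional header.
    table = pe_offset + 4 + COFF_HEADER.size + size_of_optional_header
    target = wanted.encode("ascii")[:8]  # names are limited to 8 bytes
    for index in range(number_of_sections):
        fields = SECTION_HEADER.unpack_from(
            data, table + index * SECTION_HEADER.size
        )
        if fields[0].rstrip(b"\0") == target:
            return fields[2]  # VirtualAddress, relative to the image base
    raise LookupError(f"section {wanted!r} not found")
```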

Note that accessing another process’s memory on Windows typically requires appropriate privileges: either administrative access or the SeDebugPrivilege privilege granted to the debugging process.

Reading _Py_DebugOffsets

Once the address of the PyRuntime structure has been determined, the next step is to read the _Py_DebugOffsets structure located at the beginning of the PyRuntime block.

This structure provides version-specific field offsets that are needed to safely read interpreter and thread state memory. These offsets vary between CPython versions and must be checked before use to ensure they are compatible.

To read and check the debug offsets, follow these steps:

  1. Read memory from the target process starting at the PyRuntime address, covering the same number of bytes as the _Py_DebugOffsets structure. This structure is located at the very start of the PyRuntime memory block. Its layout is defined in CPython’s internal headers and stays the same within a given minor version, but may change in major versions.

  2. Check that the structure contains valid data:

    • The cookie field must match the expected debug marker.

    • The version field must match the version of the Python interpreter used by the debugger.

    • If either the debugger or the target process is using a pre-release version (for example, an alpha, beta, or release candidate), the versions must match exactly.

    • The free_threaded field must have the same value in both the debugger and the target process.

  3. If the structure is valid, the offsets it contains can be used to locate fields in memory. If any check fails, the debugger should stop the operation to avoid reading memory in the wrong format.

The following is an example implementation that reads and checks _Py_DebugOffsets:

    def read_debug_offsets(pid: int, py_runtime_addr: int) -> DebugOffsets:
        # Step 1: Read memory from the target process at the PyRuntime address
        data = read_process_memory(
            pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE
        )

        # Step 2: Deserialize the raw bytes into a _Py_DebugOffsets structure
        debug_offsets = parse_debug_offsets(data)

        # Step 3: Validate the contents of the structure
        if debug_offsets.cookie != EXPECTED_COOKIE:
            raise RuntimeError("Invalid or missing debug cookie")
        if debug_offsets.version != LOCAL_PYTHON_VERSION:
            raise RuntimeError(
                "Mismatch between caller and target Python versions"
            )
        if debug_offsets.free_threaded != LOCAL_FREE_THREADED:
            raise RuntimeError("Mismatch in free-threaded configuration")

        return debug_offsets
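The version rules listed above are more nuanced than a plain equality test. The following sketch shows one possible reading of them, under several assumptions that should be verified against CPython's internal headers: that the structure begins with an 8-byte cookie followed by two 64-bit little-endian integers (version and free_threaded), that the cookie value is b"xdebugpy", that version uses the same encoding as sys.hexversion, and that "match" means the same major.minor series for final releases. All function names are hypothetical.

```python
import struct

EXPECTED_COOKIE = b"xdebugpy"              # assumption, check CPython headers
OFFSETS_PREFIX = struct.Struct("<8sQQ")    # cookie, version, free_threaded


def parse_offsets_prefix(data: bytes) -> tuple:
    """Return (cookie, version, free_threaded) from the start of the block."""
    return OFFSETS_PREFIX.unpack_from(data, 0)


def is_prerelease(hexversion: int) -> bool:
    # In the sys.hexversion encoding, bits 4-7 hold the release level:
    # 0xA alpha, 0xB beta, 0xC release candidate, 0xF final.
    return (hexversion >> 4) & 0xF != 0xF


def versions_compatible(local_hex: int, remote_hex: int) -> bool:
    if is_prerelease(local_hex) or is_prerelease(remote_hex):
        # Pre-release builds must match exactly.
        return local_hex == remote_hex
    # Final releases: compare major and minor only (the top 16 bits).
    return (local_hex >> 16) == (remote_hex >> 16)
```

A debugger written in Python could pass sys.hexversion as local_hex and the version field read from the target as remote_hex.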

Warning

Process suspension recommended

To avoid race conditions and ensure memory consistency, it is strongly recommended that the target process be suspended before performing any operations that read or write internal interpreter state. The Python runtime may concurrently mutate interpreter data structures (for example, by creating or destroying threads) during normal execution. This can result in invalid memory reads or writes.

A debugger may suspend execution by attaching to the process with ptrace or by sending a SIGSTOP signal. Execution should only be resumed after debugger-side memory operations are complete.

Note

Some tools, such as profilers or sampling-based debuggers, may operate on a running process without suspension. In such cases, tools must be explicitly designed to handle partially updated or inconsistent memory. For most debugger implementations, suspending the process remains the safest and most robust approach.
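On POSIX systems, the SIGSTOP approach mentioned in the warning above can be wrapped in a context manager so that the process is always resumed, even if a memory operation raises. This is a minimal sketch: the name suspended and the send_signal parameter are hypothetical, and the parameter exists only so the behavior can be exercised without a real target process.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def suspended(pid: int, send_signal=os.kill):
    """Stop the target process for the duration of the with block."""
    send_signal(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        # Always resume the target, even if the body raised.
        send_signal(pid, signal.SIGCONT)
```

Signal delivery is asynchronous, so a careful tool may also want to confirm that the target has actually stopped before reading or writing its memory.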

Locating the interpreter and thread state

Before code can be injected and executed in a remote Python process, the debugger must choose a thread in which to schedule execution. This is necessary because the control fields used to perform remote code injection are located in the _PyRemoteDebuggerSupport structure, which is embedded in a PyThreadState object. These fields are modified by the debugger to request execution of injected scripts.

The PyThreadState structure represents a thread running inside a Python interpreter. It maintains the thread’s evaluation context and contains the fields required for debugger coordination. Locating a valid PyThreadState is therefore a key prerequisite for triggering execution remotely.

A thread is typically selected based on its role or ID. In most cases, the main thread is used, but some tools may target a specific thread by its native thread ID. Once the target thread is chosen, the debugger must locate both the interpreter and the associated thread state structures in memory.

The relevant internal structures are defined as follows:

  • PyInterpreterState represents an isolated Python interpreter instance. Each interpreter maintains its own set of imported modules, built-in state, and thread state list. Although most Python applications use a single interpreter, CPython supports multiple interpreters in the same process.

  • PyThreadState represents a thread running within an interpreter. It contains execution state and the control fields used by the debugger.

To locate a thread:

  1. Use the offset runtime_state.interpreters_head to obtain the address of the first interpreter in the PyRuntime structure. This is the entry point to the linked list of active interpreters.

  2. Use the offset interpreter_state.threads_main to access the main thread state associated with the selected interpreter. This is typically the most reliable thread to target.

  3. Optionally, use the offset interpreter_state.threads_head to iterate through the linked list of all thread states. Each PyThreadState structure contains a native_thread_id field, which may be compared to a target thread ID to find a specific thread.

  4. Once a valid PyThreadState has been found, its address can be used in later steps of the protocol, such as writing debugger control fields and scheduling execution.

The following is an example implementation that locates the main thread state:

    def find_main_thread_state(
        pid: int, py_runtime_addr: int, debug_offsets: DebugOffsets,
    ) -> int:
        # Step 1: Read interpreters_head from PyRuntime
        interp_head_ptr = (
            py_runtime_addr + debug_offsets.runtime_state.interpreters_head
        )
        interp_addr = read_pointer(pid, interp_head_ptr)
        if interp_addr == 0:
            raise RuntimeError("No interpreter found in the target process")

        # Step 2: Read the threads_main pointer from the interpreter
        threads_main_ptr = (
            interp_addr + debug_offsets.interpreter_state.threads_main
        )
        thread_state_addr = read_pointer(pid, threads_main_ptr)
        if thread_state_addr == 0:
            raise RuntimeError("Main thread state is not available")

        return thread_state_addr

The following example demonstrates how to locate a thread by its native thread ID:

    def find_thread_by_id(
        pid: int,
        interp_addr: int,
        debug_offsets: DebugOffsets,
        target_tid: int,
    ) -> int:
        # Start at threads_head and walk the linked list
        thread_ptr = read_pointer(
            pid,
            interp_addr + debug_offsets.interpreter_state.threads_head
        )

        while thread_ptr:
            native_tid_ptr = (
                thread_ptr + debug_offsets.thread_state.native_thread_id
            )
            native_tid = read_int(pid, native_tid_ptr)
            if native_tid == target_tid:
                return thread_ptr
            thread_ptr = read_pointer(
                pid,
                thread_ptr + debug_offsets.thread_state.next
            )

        raise RuntimeError("Thread with the given ID was not found")
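The read_pointer and read_int helpers used in these examples are not defined in this section. One way to build them is on top of a byte-level memory reader, as in the sketch below. The extra reader parameter and the default sizes are assumptions: the sketch presumes a 64-bit target on the same machine as the debugger, so the native byte order applies, and the width of fields such as native_thread_id should be confirmed for the target platform.

```python
import sys
from typing import Callable

# A reader takes (pid, address, size) and returns exactly size bytes.
Reader = Callable[[int, int, int], bytes]


def read_int(
    pid: int,
    address: int,
    reader: Reader,
    size: int = 8,
    signed: bool = False,
) -> int:
    """Read a native-endian integer of the given width from the target."""
    raw = reader(pid, address, size)
    if len(raw) != size:
        raise OSError(f"short read at {address:#x}")
    return int.from_bytes(raw, sys.byteorder, signed=signed)


def read_pointer(
    pid: int, address: int, reader: Reader, pointer_size: int = 8
) -> int:
    """Read a pointer-sized unsigned value from the target."""
    return read_int(pid, address, reader, size=pointer_size)
```

Binding the reader once with functools.partial yields two-argument helpers that match the call shape used in the examples above.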

Once a valid thread state has been located, the debugger can proceed with modifying its control fields and scheduling execution, as described in the next section.

Writing control information

Once a valid PyThreadState structure has been identified, the debugger may modify control fields within it to schedule the execution of a specified Python script. These control fields are checked periodically by the interpreter, and when set correctly, they trigger the execution of remote code at a safe point in the evaluation loop.

Each PyThreadState contains a _PyRemoteDebuggerSupport structure used for communication between the debugger and the interpreter. The locations of its fields are defined by the _Py_DebugOffsets structure and include the following:

  • debugger_script_path: A fixed-size buffer that holds the full path to a Python source file (.py). This file must be accessible and readable by the target process when execution is triggered.

  • debugger_pending_call: An integer flag. Setting this to 1 tells the interpreter that a script is ready to be executed.

  • eval_breaker: A field checked by the interpreter during execution. Setting bit 5 (_PY_EVAL_PLEASE_STOP_BIT, value 1U << 5) in this field causes the interpreter to pause and check for debugger activity.

To complete the injection, the debugger must perform the following steps:

  1. Write the full script path into the debugger_script_path buffer.

  2. Set debugger_pending_call to 1.

  3. Read the current value of eval_breaker, set bit 5 (_PY_EVAL_PLEASE_STOP_BIT), and write the updated value back. This signals the interpreter to check for debugger activity.

The following is an example implementation:

    def inject_script(
        pid: int,
        thread_state_addr: int,
        debug_offsets: DebugOffsets,
        script_path: str
    ) -> None:
        # Compute the base offset of _PyRemoteDebuggerSupport
        support_base = (
            thread_state_addr +
            debug_offsets.debugger_support.remote_debugger_support
        )

        # Step 1: Write the script path into debugger_script_path
        script_path_ptr = (
            support_base +
            debug_offsets.debugger_support.debugger_script_path
        )
        write_string(pid, script_path_ptr, script_path)

        # Step 2: Set debugger_pending_call to 1
        pending_ptr = (
            support_base +
            debug_offsets.debugger_support.debugger_pending_call
        )
        write_int(pid, pending_ptr, 1)

        # Step 3: Set _PY_EVAL_PLEASE_STOP_BIT (bit 5, value 1 << 5) in
        # eval_breaker
        eval_breaker_ptr = (
            thread_state_addr +
            debug_offsets.debugger_support.eval_breaker
        )
        breaker = read_int(pid, eval_breaker_ptr)
        breaker |= (1 << 5)
        write_int(pid, eval_breaker_ptr, breaker)
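Because debugger_script_path is a fixed-size buffer, a write_string helper should refuse paths that do not fit, including the terminating NUL byte. The sketch below shows that check together with the bit manipulation for eval_breaker. The names encode_script_path and with_stop_bit are hypothetical, buffer_size must come from the target's layout rather than being guessed, and the use of the filesystem encoding is an assumption.

```python
import os

PY_EVAL_PLEASE_STOP_BIT = 1 << 5  # bit 5, as described above


def encode_script_path(script_path: str, buffer_size: int) -> bytes:
    """Return the NUL-terminated bytes to write, or raise if too long."""
    raw = os.fsencode(script_path) + b"\0"
    if len(raw) > buffer_size:
        raise ValueError(
            f"script path needs {len(raw)} bytes, "
            f"but the buffer holds only {buffer_size}"
        )
    return raw


def with_stop_bit(eval_breaker: int) -> int:
    """Return eval_breaker with the stop bit set, leaving other bits intact."""
    return eval_breaker | PY_EVAL_PLEASE_STOP_BIT
```

Rejecting an oversized path before writing avoids overwriting whatever follows the buffer in the target's thread state.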

Once these fields are set, the debugger may resume the process (if it was suspended). The interpreter will process the request at the next safe evaluation point, load the script from disk, and execute it.

It is the responsibility of the debugger to ensure that the script file remains present and accessible to the target process during execution.

Note

Script execution is asynchronous. The script file cannot be deleted immediately after injection. The debugger should wait until the injected script has produced an observable effect before removing the file. This effect depends on what the script is designed to do. For example, a debugger might wait until the remote process connects back to a socket before removing the script. Once such an effect is observed, it is safe to assume the file is no longer needed.
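The socket example from the note above can be sketched as follows, assuming the injected script has been written to connect back to a listening socket owned by the debugger. The function name wait_for_callback_then_remove and the timeout value are hypothetical. If no connection arrives in time, the file is left in place, because the script may still be waiting to run.

```python
import os
import socket


def wait_for_callback_then_remove(
    listener: socket.socket, script_path: str, timeout: float = 10.0
) -> bool:
    """Remove script_path once the injected script connects back.

    Returns True if a connection arrived and the file was removed,
    False if the timeout expired first.
    """
    listener.settimeout(timeout)
    try:
        conn, _addr = listener.accept()
    except socket.timeout:
        return False
    conn.close()
    # The connection shows the script has started executing, so the
    # interpreter has already loaded the file.
    os.remove(script_path)
    return True
```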

Summary

To inject and execute a Python script in a remote process:

  1. Locate the PyRuntime structure in the target process’s memory.

  2. Read and validate the _Py_DebugOffsets structure at the beginning of PyRuntime.

  3. Use the offsets to locate a valid PyThreadState.

  4. Write the path to a Python script into debugger_script_path.

  5. Set the debugger_pending_call flag to 1.

  6. Set _PY_EVAL_PLEASE_STOP_BIT in the eval_breaker field.

  7. Resume the process (if suspended). The script will execute at the next safe evaluation point.