Important
This PEP is a historical document. The up-to-date, canonical documentation can now be found at sys.monitoring.
See PEP 1 for how to propose changes.
Using a profiler or debugger in CPython can have a severe impact on performance. Slowdowns by an order of magnitude are common.
This PEP proposes an API for monitoring Python programs running on CPython that will enable monitoring at low cost.
Although this PEP does not specify an implementation, it is expected that it will be implemented using the quickening step of PEP 659.
A sys.monitoring namespace will be added, which will contain the relevant functions and constants.
Developers should not have to pay an unreasonable cost to use debuggers, profilers and other similar tools.
C++ and Java developers expect to be able to run a program at full speed (or very close to it) under a debugger. Python developers should expect that too.
The quickening mechanism provided by PEP 659 provides a way to dynamically modify executing Python bytecode. These modifications have little cost beyond the parts of the code that are modified and a relatively low cost to those parts that are modified. We can leverage this to provide an efficient mechanism for monitoring that was not possible in 3.10 or earlier.
By using quickening, we expect that code run under a debugger on 3.12 should outperform code run without a debugger on 3.11. Profiling will still slow down execution, but by much less than in 3.11.
Monitoring of Python programs is done by registering callback functions for events and by activating a set of events.
Activating events and registering callback functions are independent of each other.
Both registering callbacks and activating events are done on a per-tool basis. It is possible to have multiple tools that respond to different sets of events.
Note that, unlike sys.settrace(), events and callbacks are per interpreter, not per thread.
As a code object executes, various events occur that might be of interest to tools. By activating events and by registering callback functions, tools can respond to these events in any way that suits them. Events can be set globally, or for individual code objects.
For 3.12, CPython will support the following events:
STOP_ITERATION: StopIteration is raised; see the STOP_ITERATION event.
More events may be added in the future.
All events will be attributes of the events namespace in sys.monitoring. All events will be represented by a power-of-two integer, so that they can be combined with the | operator.
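The bit-flag representation can be illustrated with a minimal sketch. The helper names `is_single_event` and `contains` are hypothetical, and the fallback values used when `sys.monitoring` is missing are illustrative stand-ins, not the real constants.

```python
import sys

# sys.monitoring only exists on CPython 3.12+. On older interpreters the
# stand-in values below are illustrative assumptions, not the real constants.
_mon = getattr(sys, "monitoring", None)
if _mon is not None:
    PY_START = _mon.events.PY_START
    PY_RETURN = _mon.events.PY_RETURN
    LINE = _mon.events.LINE
else:
    PY_START, PY_RETURN, LINE = 1 << 0, 1 << 1, 1 << 2


def is_single_event(event: int) -> bool:
    """True if exactly one bit is set, i.e. the value is a power of two."""
    return event > 0 and (event & (event - 1)) == 0


def contains(event_set: int, event: int) -> bool:
    """True if every bit of event is present in event_set."""
    return (event_set & event) == event


# Combine two events into one event set with the | operator.
event_set = PY_START | PY_RETURN
```

Because each event occupies its own bit, membership tests reduce to a bitwise AND, and an event set is an ordinary `int` that can be passed wherever the API expects one.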
Events are divided into three groups:
Local events are associated with normal execution of the program and happen at clearly defined locations. All local events can be disabled. The local events are:
Ancillary events can be monitored like other events, but are controlled by another event:
The C_RETURN and C_RAISE events are controlled by the CALL event. C_RETURN and C_RAISE events will only be seen if the corresponding CALL event is being monitored.
Other events are not necessarily tied to a specific location in the program and cannot be individually disabled.
The other events that can be monitored are:
PEP 380 specifies that a StopIteration exception is raised when returning a value from a generator or coroutine. However, this is a very inefficient way to return a value, so some Python implementations, notably CPython 3.12+, do not raise an exception unless it would be visible to other code.
To allow tools to monitor for real exceptions without slowing down generators and coroutines, the STOP_ITERATION event is provided. STOP_ITERATION can be locally disabled, unlike RAISE.
The VM can support up to 6 tools at once. Before registering or activating events, a tool should choose an identifier. Identifiers are integers in the range 0 to 5.
    sys.monitoring.use_tool_id(id, name: str) -> None
    sys.monitoring.free_tool_id(id) -> None
    sys.monitoring.get_tool(id) -> str | None

sys.monitoring.use_tool_id raises a ValueError if id is in use. sys.monitoring.get_tool returns the name of the tool if id is in use, otherwise it returns None.
All IDs are treated the same by the VM with regard to events, but the following IDs are pre-defined to make co-operation of tools easier:

    sys.monitoring.DEBUGGER_ID = 0
    sys.monitoring.COVERAGE_ID = 1
    sys.monitoring.PROFILER_ID = 2
    sys.monitoring.OPTIMIZER_ID = 5
There is no obligation to set an ID, nor is there anything preventing a tool from using an ID even if it is already in use. However, tools are encouraged to use a unique ID and respect other tools.
For example, if a debugger were attached and DEBUGGER_ID were in use, it should report an error, rather than carrying on regardless.
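One way a tool might follow this advice is sketched below. The helper `claim_tool_id` and the choice of `RuntimeError` are assumptions made for illustration; only `get_tool`, `use_tool_id` and `free_tool_id` come from the API described above.

```python
import sys


def claim_tool_id(tool_id: int, name: str) -> None:
    """Claim tool_id for this tool, or report an error if it is taken.

    Hypothetical helper: a cooperative tool checks ownership first and
    refuses to carry on when another tool already holds the ID.
    """
    mon = getattr(sys, "monitoring", None)  # absent before CPython 3.12
    if mon is None:
        raise RuntimeError("sys.monitoring is not available")
    owner = mon.get_tool(tool_id)
    if owner is not None:
        raise RuntimeError(f"tool id {tool_id} is already used by {owner!r}")
    mon.use_tool_id(tool_id, name)
```

A debugger would typically call `claim_tool_id(sys.monitoring.DEBUGGER_ID, "my-debugger")` on attach and call `sys.monitoring.free_tool_id` with the same ID on detach.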
The OPTIMIZER_ID is provided for tools like Cinder or PyTorch that want to optimize Python code, but need to decide what to optimize in a way that depends on some wider context.
Events can be controlled globally by modifying the set of events being monitored:
sys.monitoring.get_events(tool_id: int) -> int
    Returns the int representing all the active events.

sys.monitoring.set_events(tool_id: int, event_set: int)
    Activates all events which are set in event_set. Raises a ValueError if tool_id is not in use.

No events are active by default.
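A minimal sketch of the global controls follows. The function name `toggle_global_events`, the tool name and the strategy of picking the first unused ID are assumptions for illustration; the function returns None where `sys.monitoring` is unavailable.

```python
import sys


def toggle_global_events():
    """Activate two events globally, read them back, then deactivate them."""
    mon = getattr(sys, "monitoring", None)  # absent before CPython 3.12
    if mon is None:
        return None
    free_ids = [i for i in range(6) if mon.get_tool(i) is None]
    if not free_ids:
        return None
    tool_id = free_ids[0]
    wanted = mon.events.PY_START | mon.events.PY_RETURN

    mon.use_tool_id(tool_id, "global-events-demo")
    try:
        mon.set_events(tool_id, wanted)
        active = mon.get_events(tool_id)  # the event set as a single int
        mon.set_events(tool_id, 0)        # 0 deactivates every event
        cleared = mon.get_events(tool_id)
    finally:
        mon.free_tool_id(tool_id)
    return active == wanted, cleared == 0


result = toggle_global_events()
```

No callback is registered here, so activating the events has no observable effect beyond the value reported by `get_events`.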
Events can also be controlled on a per code object basis:
sys.monitoring.get_local_events(tool_id: int, code: CodeType) -> int
    Returns all the local events for code.

sys.monitoring.set_local_events(tool_id: int, code: CodeType, event_set: int)
    Activates all the local events for code which are set in event_set. Raises a ValueError if tool_id is not in use.

Local events add to global events, but do not mask them. In other words, all global events will trigger for a code object, regardless of the local events.
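Per code object control can be sketched as follows, assuming a tool that only cares about one function. The names `starts_seen_locally`, `watched` and `unwatched` are hypothetical, and the sketch returns None where `sys.monitoring` is unavailable.

```python
import sys


def starts_seen_locally(target, other):
    """Record PY_START events while monitoring only target's code object."""
    mon = getattr(sys, "monitoring", None)  # absent before CPython 3.12
    if mon is None:
        return None
    free_ids = [i for i in range(6) if mon.get_tool(i) is None]
    if not free_ids:
        return None
    tool_id = free_ids[0]
    seen = []

    def on_start(code, instruction_offset):
        seen.append(code.co_name)

    mon.use_tool_id(tool_id, "local-events-demo")
    try:
        mon.register_callback(tool_id, mon.events.PY_START, on_start)
        # Only target's code object is instrumented; no global events are set.
        mon.set_local_events(tool_id, target.__code__, mon.events.PY_START)
        target()
        other()
    finally:
        mon.set_local_events(tool_id, target.__code__, 0)
        mon.register_callback(tool_id, mon.events.PY_START, None)
        mon.free_tool_id(tool_id)
    return seen


def watched():
    return 1


def unwatched():
    return 2


seen_names = starts_seen_locally(watched, unwatched)
```

Since no global events are active for this tool, the call to `unwatched` produces no event, and only `watched` is recorded.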
To register a callable for events call:
    sys.monitoring.register_callback(tool_id: int, event: int, func: Callable | None) -> Callable | None

If another callback was registered for the given tool_id and event, it is unregistered and returned. Otherwise register_callback returns None.

Functions can be unregistered by calling sys.monitoring.register_callback(tool_id, event, None).
Callback functions can be registered and unregistered at any time.
Registering or unregistering a callback function will generate a sys.audit event.
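The replace-and-return behaviour of register_callback can be sketched as below. The function and callback names are hypothetical, and no events are activated, so the callbacks are never actually invoked.

```python
import sys


def swap_callbacks():
    """Register, replace and unregister a callback, checking return values."""
    mon = getattr(sys, "monitoring", None)  # absent before CPython 3.12
    if mon is None:
        return None
    free_ids = [i for i in range(6) if mon.get_tool(i) is None]
    if not free_ids:
        return None
    tool_id = free_ids[0]
    event = mon.events.PY_START

    def first(code, instruction_offset):
        pass

    def second(code, instruction_offset):
        pass

    mon.use_tool_id(tool_id, "callback-demo")
    try:
        mon.register_callback(tool_id, event, first)
        # Registering a new callback returns the one it replaces.
        replaced = mon.register_callback(tool_id, event, second)
        # Passing None unregisters and returns the current callback.
        removed = mon.register_callback(tool_id, event, None)
    finally:
        mon.free_tool_id(tool_id)
    return replaced is first, removed is second


swap_result = swap_callbacks()
```

Returning the previous callback lets a tool temporarily install its own handler and restore the earlier one afterwards.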
When an active event occurs, the registered callback function is called. Different events will provide the callback function with different arguments, as follows:

PY_START and PY_RESUME:
    func(code: CodeType, instruction_offset: int) -> DISABLE | Any

PY_RETURN and PY_YIELD:
    func(code: CodeType, instruction_offset: int, retval: object) -> DISABLE | Any

CALL, C_RAISE and C_RETURN:
    func(code: CodeType, instruction_offset: int, callable: object, arg0: object | MISSING) -> DISABLE | Any
    If there are no arguments, arg0 is set to MISSING.

RAISE and EXCEPTION_HANDLED:
    func(code: CodeType, instruction_offset: int, exception: BaseException) -> DISABLE | Any

LINE:
    func(code: CodeType, line_number: int) -> DISABLE | Any

BRANCH:
    func(code: CodeType, instruction_offset: int, destination_offset: int) -> DISABLE | Any
    Note that the destination_offset is where the code will next execute. For an untaken branch this will be the offset of the instruction following the branch.

INSTRUCTION:
    func(code: CodeType, instruction_offset: int) -> DISABLE | Any
If a callback function returns DISABLE, then that function will no longer be called for that (code, instruction_offset) until sys.monitoring.restart_events() is called. This feature is provided for coverage and other tools that are only interested in seeing an event once.
Note that sys.monitoring.restart_events() is not specific to one tool, so tools must be prepared to receive events that they have chosen to DISABLE.
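The effect of returning DISABLE can be sketched with a coverage-style callback. The names `lines_reported` and `loop` are hypothetical; line numbers are recorded relative to the `def` line so the result does not depend on where the code lives in a file.

```python
import sys


def lines_reported(func):
    """Run func with LINE events enabled, disabling each location once seen."""
    mon = getattr(sys, "monitoring", None)  # absent before CPython 3.12
    if mon is None:
        return None
    free_ids = [i for i in range(6) if mon.get_tool(i) is None]
    if not free_ids:
        return None
    tool_id = free_ids[0]
    code = func.__code__
    hits = []

    def on_line(code_obj, line_number):
        hits.append(line_number - code_obj.co_firstlineno)
        return mon.DISABLE  # do not report this location again

    mon.use_tool_id(tool_id, "coverage-sketch")
    try:
        mon.register_callback(tool_id, mon.events.LINE, on_line)
        mon.set_local_events(tool_id, code, mon.events.LINE)
        func()
    finally:
        mon.set_local_events(tool_id, code, 0)
        mon.register_callback(tool_id, mon.events.LINE, None)
        mon.free_tool_id(tool_id)
    return hits


def loop():
    total = 0
    for i in range(3):
        total += i
    return total


relative_lines = lines_reported(loop)
```

The loop body runs three times, yet its line is reported only once, because the callback disabled that location after the first event.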
Events are suspended in callback functions and their callees for the tool that registered that callback.
That means that other tools will see events in the callback functions for other tools. This could be useful for debugging a profiling tool, but would produce misleading profiles, as the debugger tool would show up in the profile.
If an instruction triggers several events, they occur in the following order:
Each event is delivered to tools in ascending order of ID.
Most events are independent; setting or disabling one event has no effect on the others. However, the CALL, C_RAISE and C_RETURN events form a group. If any of those events are set or disabled, then all events in the group are. Disabling a CALL event will not disable the matching C_RAISE or C_RETURN, but will disable all subsequent events.

sys.monitoring namespace

    def use_tool_id(id, name: str) -> None
    def free_tool_id(id) -> None
    def get_events(tool_id: int) -> int
    def set_events(tool_id: int, event_set: int) -> None
    def get_local_events(tool_id: int, code: CodeType) -> int
    def set_local_events(tool_id: int, code: CodeType, event_set: int) -> None
    def register_callback(tool_id: int, event: int, func: Callable) -> Optional[Callable]
    def restart_events() -> None
    DISABLE: object
    MISSING: object

Some features of the standard library are not accessible to normal code, but are accessible to debuggers. For example, setting local variables, or the line number.
These features will be available to callback functions.
This PEP is mostly backwards compatible.
There are some compatibility issues with PEP 523, as the behavior of PEP 523 plugins is outside of the VM's control. It is up to PEP 523 plugins to ensure that they respect the semantics of this PEP. Simple plugins that do not change the state of the VM, and defer execution to _PyEval_EvalFrameDefault() should continue to work.
sys.settrace() and sys.setprofile() will act as if they were tools 6 and 7 respectively, so can be used alongside this PEP.
This means that sys.settrace() and sys.setprofile() may not work correctly with all PEP 523 plugins, although simple PEP 523 plugins, as described above, should be fine.
If no events are active, this PEP should have a small positive impact on performance. Experiments show between 1 and 2% speedup from not supporting sys.settrace() directly.
The performance of sys.settrace() will be about the same. The performance of sys.setprofile() should be better. However, tools relying on sys.settrace() and sys.setprofile() can be made a lot faster by using the API provided by this PEP.
If a small set of events are active, e.g. for a debugger, then the overhead of callbacks will be orders of magnitude less than for sys.settrace() and much cheaper than using PEP 523.
Coverage tools can be implemented at very low cost, by returning DISABLE in all callbacks.
For heavily instrumented code, e.g. using LINE, performance should be better than sys.settrace, but not by much, as performance will be dominated by the time spent in callbacks.
For optimizing virtual machines, such as future versions of CPython (and PyPy should they choose to support this API), changes to the set of active events in the midst of a long running program could be quite expensive, possibly taking hundreds of milliseconds as it triggers de-optimizations. Once such de-optimization has occurred, performance should recover as the VM can re-optimize the instrumented code.
In general these operations can be considered to be fast:
    def get_events(tool_id: int) -> int
    def get_local_events(tool_id: int, code: CodeType) -> int
    def register_callback(tool_id: int, event: int, func: Callable) -> Optional[Callable]
    def get_tool(tool_id) -> str | None

These operations are slower, but not especially so:

    def set_local_events(tool_id: int, code: CodeType, event_set: int) -> None

And these operations should be regarded as slow:

    def use_tool_id(id, name: str) -> None
    def free_tool_id(id) -> None
    def set_events(tool_id: int, event_set: int) -> None
    def restart_events() -> None

How slow the slow operations are depends on when they happen. If done early in the program, before modules are loaded, they should be fairly inexpensive.
When not in use, this PEP will have a negligible effect on memory consumption.
How memory is used is very much an implementation detail. However, we expect that for 3.12 the additional memory consumption per code object will be roughly as follows:
| Tools | Events: Others | Events: LINE | Events: INSTRUCTION |
|---|---|---|---|
| One | None | ≈40% | ≈80% |
| Two or more | ≈40% | ≈120% | ≈200% |
Allowing modification of running code has some security implications, but no more than the ability to generate and call new code.
All the new functions listed above will trigger audit hooks.
This outlines the proposed implementation for CPython 3.12. The actual implementation for later versions of CPython and other Python implementations may differ considerably.
The proposed implementation of this PEP will be built on top of the quickening step of CPython 3.11, as described in PEP 659. Instrumentation works in much the same way as quickening: bytecodes are replaced with instrumented ones as needed.
For example, if the CALL event is turned on, then all call instructions will be replaced with an INSTRUMENTED_CALL instruction.
Note that this will interfere with specialization, which will result in some performance degradation in addition to the overhead of calling the registered callable.
When the set of active events changes, the VM will immediately update all code objects present on the call stack of any thread. It will also set in place traps to ensure that all code objects are correctly instrumented when called. Consequently changing the set of active events should be done as infrequently as possible, as it could be quite an expensive operation.
Other events, such as RAISE can be turned on or off cheaply, as they do not rely on code instrumentation, but runtime checks when the underlying event occurs.
The exact set of events that require instrumentation is an implementation detail, but for the current design, the following events will require instrumentation:
Each instrumented bytecode will require an additional 8 bits of information to note which tool the instrumentation applies to. LINE and INSTRUCTION events require additional information, as they need to store the original instruction, or even the instrumented instruction if they overlap other instrumentation.
It is the philosophy of this PEP that it should be possible for third-party monitoring tools to achieve high performance, not that it should be easy for them to do so.
Converting events into data that is meaningful to the users is the responsibility of the tool.
All events have a cost, and tools should attempt to use the set of events that trigger the least often and still provide the necessary information.
Breakpoints can be inserted by setting per code object events, either LINE or INSTRUCTION, and returning DISABLE for any events not matching a breakpoint.
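A LINE-based breakpoint might be sketched as follows. The names `run_with_breakpoint` and `accumulate` are hypothetical, and recording the stop in a list stands in for handing control to the user, which a real debugger would do at that point.

```python
import sys


def run_with_breakpoint(func, relative_line):
    """Run func, recording each time the breakpoint line is reached.

    relative_line counts from the 'def' line of func (which is line 0).
    """
    mon = getattr(sys, "monitoring", None)  # absent before CPython 3.12
    if mon is None:
        return None
    free_ids = [i for i in range(6) if mon.get_tool(i) is None]
    if not free_ids:
        return None
    tool_id = free_ids[0]
    code = func.__code__
    target_line = code.co_firstlineno + relative_line
    stops = []

    def on_line(code_obj, line_number):
        if line_number != target_line:
            return mon.DISABLE  # not a breakpoint: stop reporting this location
        stops.append(line_number)  # a real debugger would pause here
        return None

    mon.use_tool_id(tool_id, "breakpoint-sketch")
    try:
        mon.register_callback(tool_id, mon.events.LINE, on_line)
        mon.set_local_events(tool_id, code, mon.events.LINE)
        func()
    finally:
        mon.set_local_events(tool_id, code, 0)
        mon.register_callback(tool_id, mon.events.LINE, None)
        mon.free_tool_id(tool_id)
    return stops


def accumulate():
    total = 0
    for i in range(3):
        total += i
    return total


# Break on 'total += i', three lines below the 'def' line.
stops = run_with_breakpoint(accumulate, 3)
```

Lines that do not match the breakpoint are disabled after their first event, so only the breakpoint line keeps paying the callback cost on later iterations.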
Debuggers usually offer the ability to step execution by a single instruction or line.
Like breakpoints, stepping can be implemented by setting per code object events. As soon as normal execution is to be resumed, the local events can be unset.
Debuggers can use the PY_START and PY_RESUME events to be informed when a code object is first encountered, so that any necessary breakpoints can be inserted.
Coverage tools need to track which parts of the control graph have been executed. To do this, they need to register for the PY_ events, plus JUMP and BRANCH.
This information can then be converted back into a line based report after execution has completed.
Simple profilers need to gather information about calls. To do this profilers should register for the following events:
Line based profilers can use the LINE and JUMP events. Implementers of profilers should be aware that instrumenting LINE events will have a large impact on performance.
Note
Instrumenting profilers have significant overhead and will distort the results of profiling. Unless you need exact call counts, consider using a statistical profiler.
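An exact call counter, the simplest kind of instrumenting profiler, might be sketched as below. It assumes PY_START alone is enough for counting entries into Python functions; the names `count_calls`, `fib` and `workload` are hypothetical.

```python
import sys


def count_calls(func):
    """Count how often each Python function starts while func runs."""
    mon = getattr(sys, "monitoring", None)  # absent before CPython 3.12
    if mon is None:
        return None
    free_ids = [i for i in range(6) if mon.get_tool(i) is None]
    if not free_ids:
        return None
    tool_id = free_ids[0]
    counts = {}

    def on_start(code, instruction_offset):
        counts[code.co_name] = counts.get(code.co_name, 0) + 1

    mon.use_tool_id(tool_id, "call-counter-sketch")
    try:
        mon.register_callback(tool_id, mon.events.PY_START, on_start)
        mon.set_events(tool_id, mon.events.PY_START)  # global: all code objects
        func()
    finally:
        mon.set_events(tool_id, 0)
        mon.register_callback(tool_id, mon.events.PY_START, None)
        mon.free_tool_id(tool_id)
    return counts


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def workload():
    return fib(5)


call_counts = count_calls(workload)
```

Every function entry runs the callback, which is exactly the overhead the note above warns about; the counts are exact, but timings taken this way would be distorted.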
A draft version of this PEP proposed making the user responsible for inserting the monitoring instructions, rather than having the VM do it. However, that puts too much of a burden on the tools, and would make attaching a debugger nearly impossible.
An earlier version of this PEP proposed storing events as enums:

    class Event(enum.IntFlag):
        PY_START = ...

However, that would prevent monitoring of code before the enum module was loaded and could cause unnecessary overhead.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0669.rst
Last modified: 2025-02-01 07:28:42 GMT