Important
This PEP is a historical document. The up-to-date, canonical documentation can now be found at Audit events table.
See PEP 1 for how to propose changes.
This PEP describes additions to the Python API and specific behaviors for the CPython implementation that make actions taken by the Python runtime visible to auditing tools. Visibility into these actions provides opportunities for test frameworks, logging frameworks, and security tools to monitor and optionally limit actions taken by the runtime.
This PEP proposes adding two APIs to provide insights into a running Python application: one for arbitrary events, and another specific to the module import system. The APIs are intended to be available in all Python implementations, though the specific messages and values used are unspecified here to allow implementations the freedom to determine how best to provide information to their users. Some examples likely to be used in CPython are provided for explanatory purposes.
See PEP 551 for discussion and recommendations on enhancing the security of a Python runtime making use of these auditing APIs.
Python provides access to a wide range of low-level functionality on many common operating systems. While this is incredibly useful for “write-once, run-anywhere” scripting, it also makes monitoring of software written in Python difficult. Because Python uses native system APIs directly, existing monitoring tools either suffer from limited context or auditing bypass.
Limited context occurs when system monitoring can report that an action occurred, but cannot explain the sequence of events leading to it. For example, network monitoring at the OS level may be able to report “listening started on port 5678”, but may not be able to provide the process ID, command line, parent process, or the local state in the program at the point that triggered the action. Firewall controls to prevent such an action are similarly limited, typically to process names or some global state such as the current user, and in any case rarely provide a useful log file correlated with other application messages.
Auditing bypass can occur when the typical system tool used for an action would ordinarily report its use, but accessing the same APIs via Python does not trigger this. For example, invoking “curl” to make HTTP requests may be specifically monitored in an audited system, but Python’s “urlretrieve” function is not.
Within a long-running Python application, particularly one that processes user-provided information such as a web app, there is a risk of unexpected behavior. This may be due to bugs in the code, or deliberately induced by a malicious user. In both cases, normal application logging may be bypassed resulting in no indication that anything out of the ordinary has occurred.
Additionally, and somewhat unique to Python, it is very easy to affect the code that is run in an application by manipulating either the import system’s search path or placing files earlier on the path than intended. This is often seen when developers create a script with the same name as the module they intend to use - for example, a random.py file that attempts to import the standard library random module.
This is not sandboxing, as this proposal does not attempt to prevent malicious behavior (though it enables some new options to do so). See the Why Not A Sandbox section below for further discussion.
The aim of these changes is to enable both application developers and system administrators to integrate Python into their existing monitoring systems without dictating how those systems look or behave.
We propose two API changes to enable this: an Audit Hook and Verified Open Hook. Both are available from Python and native code, allowing applications and frameworks written in pure Python code to take advantage of the extra messages, while also allowing embedders or system administrators to deploy builds of Python where auditing is always enabled.
Only CPython is bound to provide the native APIs as described here. Other implementations should provide the pure Python APIs, and may provide native versions as appropriate for their underlying runtimes. Auditing events are likewise considered implementation specific, but are bound by normal feature compatibility guarantees.
In order to observe actions taken by the runtime (on behalf of the caller), an API is required to raise messages from within certain operations. These operations are typically deep within the Python runtime or standard library, such as dynamic code compilation, module imports, DNS resolution, or use of certain modules such as ctypes.
The following new C APIs allow embedders and CPython implementors to send and receive audit hook messages:

    # Add an auditing hook
    typedef int (*hook_func)(const char *event, PyObject *args, void *userData);
    int PySys_AddAuditHook(hook_func hook, void *userData);

    # Raise an event with all auditing hooks
    int PySys_Audit(const char *event, PyObject *args);

The new Python APIs for receiving and raising audit hooks are:

    # Add an auditing hook
    sys.addaudithook(hook: Callable[[str, tuple]])

    # Raise an event with all auditing hooks
    sys.audit(str, *args)

Hooks are added by calling PySys_AddAuditHook() from C at any time, including before Py_Initialize(), or by calling sys.addaudithook() from Python code. Hooks cannot be removed or replaced. For CPython, hooks added from C are global, while hooks added from Python are only for the current interpreter. Global hooks are executed before interpreter hooks.
When events of interest are occurring, code can either call PySys_Audit() from C (while the GIL is held) or sys.audit(). The string argument is the name of the event, and the tuple contains arguments. A given event name should have a fixed schema for arguments, which should be considered a public API (for each x.y version release), and thus should only change between feature releases with updated documentation. To minimize overhead and simplify handling in native code hook implementations, named arguments are not supported.
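As a minimal sketch of the Python-level API, the following registers a hook and raises one event. The event name `myapp.job_started` and its two arguments are hypothetical and chosen only for illustration; the hook filters on that name because it is called for every event the runtime raises.

```python
import sys

seen = []  # (event, args) pairs observed by the hook


def record_hook(event, args):
    # Hooks receive every audited event; keep only the hypothetical
    # application event raised below.
    if event == "myapp.job_started":
        seen.append((event, args))


# Once added, a hook cannot be removed or replaced.
sys.addaudithook(record_hook)

# Positional arguments arrive in the hook as a single tuple;
# keyword arguments are not supported.
sys.audit("myapp.job_started", "nightly-report", 3)
```

Because the argument schema for an event name is treated as public API, a hook can unpack `args` positionally without further checks, provided the raising code keeps the schema stable.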
For maximum compatibility, events using the same name as an event in the reference interpreter CPython should make every attempt to use compatible arguments. Including the name or an abbreviation of the implementation in implementation-specific event names will also help prevent collisions. For example, a pypy.jit_invoked event is clearly distinguished from an ipy.jit_invoked event. Events raised from Python modules should include their module or package name in the event name.
While event names may be arbitrary UTF-8 strings, for consistency across implementations it is recommended to use valid Python dotted names and avoid encoding specific details in the name. For example, an import event with the module name spam as an argument is preferable to a “spam module imported” event with no arguments. Avoid using embedded null characters or you may upset those who implement hooks using C.
When an event is audited, each hook is called in the order it was added (as much as is possible), passing the event name and arguments. If any hook returns with an exception set, later hooks are ignored and in general the Python runtime should terminate - exceptions from hooks are not intended to be handled or treated as expected occurrences. This allows hook implementations to decide how to respond to any particular event. The typical responses will be to log the event, abort the operation with an exception, or to immediately terminate the process with an operating system exit call.
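The "abort the operation with an exception" response can be sketched as follows. The exception class `BlockedByAudit`, the event name `myapp.delete_all`, and the function `delete_all` are all hypothetical. The exception is caught here only so the example can show the result; as noted above, exceptions from hooks are not intended to be handled in real code.

```python
import sys


class BlockedByAudit(RuntimeError):
    """Hypothetical exception type raised by the hook below."""


def deny_hook(event, args):
    # Object only to the hypothetical event; let everything else proceed.
    if event == "myapp.delete_all":
        raise BlockedByAudit(f"{event} denied for args {args!r}")


sys.addaudithook(deny_hook)


def delete_all(target):
    # The audit call comes first, so a hook can stop the operation
    # before any work is done.
    sys.audit("myapp.delete_all", target)
    return f"deleted {target}"


try:
    result = delete_all("/tmp/example")
except BlockedByAudit as exc:  # caught for demonstration only
    result = f"aborted: {exc}"
```

Placing the `sys.audit()` call before the side effect is what makes the abort meaningful: the hook sees the intent, not the completed action.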
When an event is audited but no hooks have been set, the audit() function should impose minimal overhead. Ideally, each argument is a reference to existing data rather than a value calculated just for the auditing call.
As hooks may be Python objects, they need to be freed during interpreter or runtime finalization. These should not be triggered at any other time, and should raise an event hook to ensure that any unexpected calls are observed.
Below in Suggested Audit Hook Locations, we recommend some important operations that should raise audit events. In general, events should be raised at the lowest possible level. Given the choice between raising an event from Python code or native code, raising from native code should be preferred.
Python implementations should document which operations will raise audit events, along with the event schema. It is intentional that sys.addaudithook(print) is a trivial way to display all messages.
Most operating systems have a mechanism to distinguish between files that can be executed and those that cannot. For example, this may be an execute bit in the permissions field, a verified hash of the file contents to detect potential code tampering, or file system path restrictions. These are an important security mechanism for ensuring that only code that has been approved for a given environment is executed.
Most kernels offer ways to restrict or audit binaries loaded and executed by the kernel. File types owned by Python appear as regular data and these features do not apply. This open hook allows Python embedders to integrate with operating system support when launching scripts or importing Python code.
The new public C API for the verified open hook is:

    # Set the handler
    typedef PyObject *(*hook_func)(PyObject *path, void *userData)
    int PyFile_SetOpenCodeHook(hook_func handler, void *userData)

    # Open a file using the handler
    PyObject *PyFile_OpenCode(const char *path)

The new public Python API for the verified open hook is:

    # Open a file using the handler
    io.open_code(path : str) -> io.IOBase

The io.open_code() function is a drop-in replacement for open(abspath(str(pathlike)), 'rb'). Its default behaviour is to open a file for raw, binary access. To change the behaviour a new handler should be set. Handler functions only accept str arguments. The C API PyFile_OpenCode function assumes UTF-8 encoding. Paths must be absolute, and it is the responsibility of the caller to ensure the full path is correctly resolved.
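A small sketch of the default behaviour, assuming no open-code handler has been installed by an embedder: the caller resolves the path to an absolute str and reads raw bytes from the returned stream. The file name `hello.py` is only an illustration.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello.py")

    # Write the file in binary mode so its bytes are identical on all platforms.
    with open(script, "wb") as f:
        f.write(b"x = 1\n")

    # The caller is responsible for passing a fully resolved, absolute str path.
    with io.open_code(os.path.abspath(script)) as stream:
        source_bytes = stream.read()
```

Code that intends to execute what it reads would pass `source_bytes` on to `compile()`; code that only inspects data files should keep using plain `open()`.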
A custom handler may be set by calling PyFile_SetOpenCodeHook() from C at any time, including before Py_Initialize(). However, if a hook has already been set then the call will fail. When open_code() is called with a hook set, the hook will be passed the path and its return value will be returned directly. The returned object should be an open file-like object that supports reading raw bytes. This is explicitly intended to allow a BytesIO instance if the open handler has already read the entire file into memory.
Note that these hooks can import and call the _io.open() function on CPython without triggering themselves. They can also use _io.BytesIO to return a compatible result using an in-memory buffer.
If the hook determines that the file should not be loaded, it should raise an exception of its choice, as well as performing any other logging.
All import and execution functionality involving code from a file will be changed to use open_code() unconditionally. It is important to note that calls to compile(), exec() and eval() do not go through this function - an audit hook that includes the code from these calls is the best opportunity to validate code that is read from the file. Given the current decoupling between import and execution in Python, most imported code will go through both open_code() and the log hook for compile, and so care should be taken to avoid repeating verification steps.
File accesses that are not intentionally planning to execute code are not expected to use this function. This includes loading pickles, XML or YAML files, where code execution is generally considered malicious rather than intentional. These operations should provide their own auditing events, preferably distinguishing between normal functionality (for example, Unpickler.load) and code execution (Unpickler.find_class).
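As an illustration of a data-loading operation providing its own event, the sketch below records global lookups made while unpickling. It assumes the CPython event name `pickle.find_class` with a `(module_name, global_name)` schema; other implementations may differ.

```python
import collections
import pickle
import sys

lookups = []  # (module_name, global_name) pairs seen while unpickling


def pickle_hook(event, args):
    if event == "pickle.find_class":
        module_name, global_name = args
        lookups.append((module_name, global_name))


sys.addaudithook(pickle_hook)

# A class is pickled by reference, so loading it requires a global lookup.
payload = pickle.dumps(collections.OrderedDict)
restored = pickle.loads(payload)
```

A hook that wanted to restrict unpickling could raise an exception here for any module and name pair outside an approved set.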
A few examples: if the file type normally requires an execute bit (on POSIX) or would warn when marked as having been downloaded from the internet (on Windows), it should probably use open_code() rather than plain open(). Opening ZIP files using the ZipFile class should use open(), while opening them via zipimport should use open_code() to signal the correct intent. Code that uses the wrong function for a particular context may bypass the hook, which in CPython and the standard library should be considered a bug. Using a combination of open_code hooks and auditing hooks is necessary to trace all executed sources in the presence of arbitrary code.
There is no Python API provided for changing the open hook. To modify import behavior from Python code, use the existing functionality provided by importlib.
While all the functions added here are considered public and stable API, the behavior of the functions is implementation specific. Most descriptions here refer to the CPython implementation, and while other implementations should provide the functions, there is no requirement that they behave the same.
For example, sys.addaudithook() and sys.audit() should exist but may do nothing. This allows code to make calls to sys.audit() without having to test for existence, but it should not assume that its call will have any effect. (Including existence tests in security-critical code allows another vector to bypass auditing, so it is preferable that the function always exist.)
io.open_code(path) should at a minimum always return _io.open(path, 'rb'). Code using the function should make no further assumptions about what may occur, and implementations other than CPython are not required to let developers override the behavior of this function with a hook.
The locations and parameters in calls to sys.audit() or PySys_Audit() are to be determined by individual Python implementations. This is to allow maximum freedom for implementations to expose the operations that are most relevant to their platform, and to avoid or ignore potentially expensive or noisy events.
Table 1 acts as both suggestions of operations that should triggeraudit events on all implementations, and examples of event schemas.
Table 2 provides further examples that are not required, but arelikely to be available in CPython.
Refer to the documentation associated with your version of Python tosee which operations provide audit events.
Table 1: Suggested audit hooks

| API Function | Event Name | Arguments | Rationale |
|---|---|---|---|
| PySys_AddAuditHook | sys.addaudithook | | Detect when new audit hooks are being added. |
| PyFile_SetOpenCodeHook | cpython.PyFile_SetOpenCodeHook | | Detects any attempt to set the open_code hook. |
| compile, exec, eval, PyAst_CompileString, PyAST_obj2mod | compile | (code, filename_or_none) | Detect dynamic code compilation, where code could be a string or AST. Note that this will be called for regular imports of source code, including those that were opened with open_code. |
| exec, eval, run_mod | exec | (code_object,) | Detect dynamic execution of code objects. This only occurs for explicit calls, and is not raised for normal function invocation. |
| import | import | (module, filename, sys.path, sys.meta_path, sys.path_hooks) | Detect when modules are imported. This is raised before the module name is resolved to a file. All arguments other than the module name may be None if they are not used or available. |
| open | io.open | (path, mode, flags) | Detect when a file is about to be opened. path and mode are the usual parameters to open if available, while flags is provided instead of mode in some cases. |
| PyEval_SetProfile | sys.setprofile | | Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that threading.setprofile eventually calls this function, so the event will be audited for each thread. |
| PyEval_SetTrace | sys.settrace | | Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that threading.settrace eventually calls this function, so the event will be audited for each thread. |
| _PyObject_GenericSetAttr, check_set_special_type_attr, object_set_class, func_set_code, func_set_[kw]defaults | object.__setattr__ | (object, attr, value) | Detect monkeypatching of types and objects. This event is raised for the __class__ attribute and any attribute on type objects. |
| _PyObject_GenericSetAttr | object.__delattr__ | (object, attr) | Detect deletion of object attributes. This event is raised for any attribute on type objects. |
| Unpickler.find_class | pickle.find_class | (module_name, global_name) | Detect imports and global name lookup when unpickling. |
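
A hook typically dispatches on the event name and unpacks the arguments according to the documented schema. The sketch below does this for the import event, assuming the CPython schema listed for it in the suggested-hooks table. The module colorsys is used only because it is small; it is removed from sys.modules first so that the import statement actually has to resolve it.

```python
import sys

imports = []  # (module, filename) pairs reported by the import event


def import_hook(event, args):
    if event == "import":
        # Schema: (module, filename, sys.path, sys.meta_path, sys.path_hooks).
        # Everything after the module name may be None.
        module, filename, *search_state = args
        imports.append((module, filename))


sys.addaudithook(import_hook)

# Force a fresh resolution; an already-cached module raises no event.
sys.modules.pop("colorsys", None)
import colorsys
```

Because the event is raised before the name is resolved to a file, the hook sees the requested module name even when the import later fails.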

Table 2: Potential CPython audit hooks

| API Function | Event Name | Arguments | Rationale |
|---|---|---|---|
| _PySys_ClearAuditHooks | sys._clearaudithooks | | Notifies hooks they are being cleaned up, mainly in case the event is triggered unexpectedly. This event cannot be aborted. |
| code_new | code.__new__ | (bytecode, filename, name) | Detect dynamic creation of code objects. This only occurs for direct instantiation, and is not raised for normal compilation. |
| func_new_impl | function.__new__ | (code,) | Detect dynamic creation of function objects. This only occurs for direct instantiation, and is not raised for normal compilation. |
| _ctypes.dlopen, _ctypes.LoadLibrary | ctypes.dlopen | (module_or_path,) | Detect when native modules are used. |
| _ctypes._FuncPtr | ctypes.dlsym | (lib_object, name) | Collect information about specific symbols retrieved from native modules. |
| _ctypes._CData | ctypes.cdata | (ptr_as_int,) | Detect when code is accessing arbitrary memory using ctypes. |
| new_mmap_object | mmap.__new__ | (fileno, map_size, access, offset) | Detects creation of mmap objects. On POSIX, access may have been calculated from the prot and flags arguments. |
| sys._getframe | sys._getframe | (frame_object,) | Detect when code is accessing frames directly. |
| sys._current_frames | sys._current_frames | | Detect when code is accessing frames directly. |
| socket.bind, socket.connect, socket.connect_ex, socket.getaddrinfo, socket.getnameinfo, socket.sendmsg, socket.sendto | socket.address | (socket, address,) | Detect access to network resources. The address is unmodified from the original call. |
| member_get, func_get_code, func_get_[kw]defaults | object.__getattr__ | (object, attr) | Detect access to restricted attributes. This event is raised for any built-in members that are marked as restricted, and members that may allow bypassing imports. |
| urllib.urlopen | urllib.Request | (url, data, headers, method) | Detects URL requests. |
The important performance impact is the case where events are being raised but there are no hooks attached. This is the unavoidable case - once a developer has added audit hooks they have explicitly chosen to trade performance for functionality. Performance impact with hooks added is not of interest here, since this is opt-in functionality.
Analysis using the Python Performance Benchmark Suite [1] shows no significant impact, with the vast majority of benchmarks showing between 1.05x faster and 1.05x slower.
In our opinion, the performance impact of the set of auditing points described in this PEP is negligible.
The proposal is to add a new module for audit hooks, hypothetically audit. This would separate the API and implementation from the sys module, and allow naming the C functions PyAudit_AddHook and PyAudit_Audit rather than the current variations.
Any such module would need to be a built-in module that is guaranteed to always be present. The nature of these hooks is that they must be callable without condition, as any conditional imports or calls provide opportunities to intercept and suppress or modify events.
Given it is one of the most core modules, the sys module is somewhat protected against module shadowing attacks. Replacing sys with a sufficiently functional module that the application can still run is a much more complicated task than replacing a module with only one function of interest. An attacker that has the ability to shadow the sys module is already capable of running arbitrary code from files, whereas an audit module could be replaced with a single line in a .pth file anywhere on the search path:

    import sys; sys.modules['audit'] = type('audit', (object,), {'audit': lambda *a: None, 'addhook': lambda *a: None})

Multiple layers of protection already exist for monkey patching attacks against either sys or audit, but assignments or insertions to sys.modules are not audited.
This idea is rejected because it makes it trivial to suppress all calls to audit.
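To make the shadowing concern concrete, the sketch below installs the same kind of stub under the hypothetical module name audit and shows that later callers silently lose their events. No real audit module is assumed to exist; the event name `myapp.secret_read` is likewise hypothetical.

```python
import sys

# Install a do-nothing stand-in, as the one-line .pth attack would.
sys.modules['audit'] = type('audit', (object,), {
    'audit': lambda *a: None,
    'addhook': lambda *a: None,
})

# The import system consults sys.modules first, so this yields the stub.
import audit

calls = []
audit.addhook(lambda event, args: calls.append(event))  # silently discarded
audit.audit("myapp.secret_read", "/etc/passwd")         # silently discarded
```

Neither call fails, and `calls` stays empty, which is why a dedicated module would make suppression trivial compared with functions that live on sys.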
The proposal is to add a value in sys.flags to indicate when Python is running in a “secure” or “audited” mode. This would allow applications to detect when some features are enabled or when hooks have been added and modify their behaviour appropriately.
Currently, we are not aware of any legitimate reasons for a program to behave differently in the presence of audit hooks.
Both application-level APIs sys.audit and io.open_code are always present and functional, regardless of whether the regular python entry point or some alternative entry point is used. Callers cannot determine whether any hooks have been added (except by performing side-channel analysis), nor do they need to. The calls should be fast enough that callers do not need to avoid them, and the program is responsible for ensuring that any added hooks are fast enough to not affect application performance.
The argument that this is “security by obscurity” is valid, but irrelevant. Security by obscurity is only an issue when there are no other protective mechanisms; obscurity as the first step in avoiding attack is strongly recommended (see this article for discussion).
This idea is rejected because there are no appropriate reasons for an application to change its behaviour based on whether these APIs are in use.
Sandboxing CPython has been attempted many times in the past, and each past attempt has failed. Fundamentally, the problem is that certain functionality has to be restricted when executing the sandboxed code, but otherwise needs to be available for normal operation of Python. For example, completely removing the ability to compile strings into bytecode also breaks the ability to import modules from source code, and if it is not completely removed then there are too many ways to get access to that functionality indirectly. There is not yet any feasible way to generically determine whether a given operation is “safe” or not. Further information and references are available at [2].
This proposal does not attempt to restrict functionality, but simply exposes the fact that the functionality is being used. Particularly for intrusion scenarios, detection is significantly more important than early prevention (as early prevention will generally drive attackers to use an alternate, less-detectable, approach). The availability of audit hooks alone does not change the attack surface of Python in any way, but they enable defenders to integrate Python into their environment in ways that are currently not possible.
Since audit hooks have the ability to safely prevent an operation occurring, this feature does enable the ability to provide some level of sandboxing. In most cases, however, the intention is to enable logging rather than creating a sandbox.
This API was originally presented as part of PEP 551 Security Transparency in the Python Runtime.
For simpler review purposes, and due to the broader applicability of these APIs beyond security, the API design is now presented separately.
PEP 551 is an informational PEP discussing how to integrate Python into a secure or audited environment.
Copyright (c) 2019 by Microsoft Corporation. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at https://spdx.org/licenses/OPUBL-1.0.html).
Source: https://github.com/python/peps/blob/main/peps/pep-0578.rst
Last modified: 2024-06-03 14:51:21 GMT