If Python Virtual Machine (PVM) bytecode is not “well-formed”, it is possible to crash or exploit the PVM by causing various errors such as under/overflowing the value stack or reading/writing into arbitrary areas of the PVM program space. Most of these kinds of errors can be eliminated by verifying that PVM bytecode does not violate a set of simple constraints before execution.
This PEP proposes a set of constraints on the format and structure of Python Virtual Machine (PVM) bytecode and provides an implementation in Python of this verification process.
Guido believes that a verification tool has some value. If someone wants to add it to Tools/scripts, no PEP is required.
Such a tool may have value for validating the output from “bytecodehacks” or from direct edits of PYC files. As a security measure, its value is somewhat limited because perfectly valid bytecode can still do horrible things. That situation could change if the concept of restricted execution were to be successfully resurrected.
The Python Virtual Machine executes Python programs that have been compiled from the Python language into a bytecode representation. The PVM assumes that any bytecode being executed is “well-formed” with regard to a number of implicit constraints. Some of these constraints are checked at run-time, but most of them are not, because of the overhead they would create.
When running in debug mode the PVM does do several run-time checks to ensure that any particular bytecode cannot violate these constraints that, to a degree, prevent bytecode from crashing or exploiting the interpreter. These checks add a measurable overhead to the interpreter, and are typically turned off in common use.
Bytecode that is not well-formed and executed by a PVM not running in debug mode may create a variety of fatal and non-fatal errors. Typically, ill-formed code will cause the PVM to seg-fault and cause the OS to immediately and abruptly terminate the interpreter.
Conceivably, ill-formed bytecode could exploit the interpreter and allow Python bytecode to execute arbitrary C-level machine instructions or to modify private, internal data structures in the interpreter. If used cleverly this could subvert any form of security policy an application may want to apply to its objects.
Practically, it would be difficult for a malicious user to “inject” invalid bytecode into a PVM for the purposes of exploitation, but not impossible. Buffer overflow and memory overwrite attacks are commonly understood, particularly when the exploit payload is transmitted unencrypted over a network or when a file or network security permission weakness is used as a foothold for further attacks.
Ideally, no bytecode should ever be allowed to read or write underlying C-level data structures to subvert the operation of the PVM, whether the bytecode was maliciously crafted or not. A simple pre-execution verification step could ensure that bytecode cannot over/underflow the value stack or access other sensitive areas of PVM program space at run-time.
This PEP proposes several validation steps that should be taken on Python bytecode before it is executed by the PVM so that it complies with static and structural constraints on its instructions and their operands. These steps are simple and catch a large class of invalid bytecode that can cause crashes. There is also some possibility that some run-time checks can be eliminated up front by a verification pass.
There is, of course, no way to verify that bytecode is “completely safe”, for every definition of complete and safe. Even with bytecode verification, Python programs can and most likely in the future will seg-fault for a variety of reasons and continue to cause many different classes of run-time errors, fatal or not. The verification step proposed here simply plugs an easy hole that can cause a large class of fatal and subtle errors at the bytecode level.
Currently, the Java Virtual Machine (JVM) verifies Java bytecode in a way very similar to what is proposed here. Sections 4.8 and 4.9 of the JVM Specification version 2 [1] were therefore used as a basis for some of the constraints explained below. Any Python bytecode verification implementation must enforce at least these constraints, but need not be limited to them.
- The bytecode string must not be empty (len(co_code) > 0).
- The bytecode string must not exceed a maximum size (len(co_code) < sizeof(unsigned char) - 1).
- The operand of a LOAD_* instruction must be a valid index into its corresponding data structure.
- The operand of a STORE_* instruction must be a valid index into its corresponding data structure.
- [...] co_stacksize.
- [...] co_code.
This PEP is the working document for a Python bytecode verification implementation written in Python. This implementation is not used implicitly by the PVM before executing any bytecode, but is to be used explicitly by users concerned about possibly invalid bytecode with the following snippet:
    import verify
    verify.verify(object)
The verify module provides a verify function which accepts the same kind of arguments as dis.dis: classes, methods, functions, or code objects. It verifies that the object’s bytecode is well-formed according to the specifications of this PEP.
If the code is well-formed, the call to verify returns silently without error. If an error is encountered, it raises a VerificationError whose argument indicates the cause of the failure. It is up to the programmer whether to handle the error in some way or to execute the invalid code regardless.
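To make the calling convention concrete, here is a minimal sketch of what such a function could look like. It is not the reference implementation: it checks only two constraints (non-empty bytecode and in-range constant operands), and it assumes the two-byte "wordcode" layout used by CPython 3.6 and later. The helper iter_instructions is a hypothetical name introduced for illustration, and this verify accepts only functions and code objects.

```python
import dis
import types


class VerificationError(Exception):
    """Raised when a code object violates one of the checked constraints."""


def iter_instructions(co_code):
    """Yield (offset, opcode, arg), assuming CPython 3.6+ two-byte wordcode."""
    extended = 0
    for offset in range(0, len(co_code) - 1, 2):
        opcode = co_code[offset]
        arg = co_code[offset + 1] | extended
        # EXTENDED_ARG supplies the high bits of the next instruction's operand.
        extended = arg << 8 if opcode == dis.EXTENDED_ARG else 0
        yield offset, opcode, arg


def verify(obj):
    """Check two static constraints; return None silently on success."""
    code = obj if isinstance(obj, types.CodeType) else obj.__code__
    if len(code.co_code) == 0:
        raise VerificationError("bytecode string is empty")
    for offset, opcode, arg in iter_instructions(code.co_code):
        # dis.hasconst lists the opcodes whose operand indexes co_consts.
        if opcode in dis.hasconst and arg >= len(code.co_consts):
            raise VerificationError(
                f"{dis.opname[opcode]} at offset {offset}: "
                f"constant index {arg} is out of range"
            )


def greet(name):
    return name + "!"


# Deliberately ill-formed: same bytecode, but the constants table is gone.
broken = greet.__code__.replace(co_consts=())

verify(greet)  # well-formed, returns silently
try:
    verify(broken)
except VerificationError as exc:
    print("rejected:", exc)
```

The sketch decodes the raw bytes itself rather than calling dis.get_instructions, because a verifier has to tolerate operands that point outside the tables it is checking.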
Phillip Eby has proposed a pseudo-code algorithm for bytecode stack depth verification used by the reference implementation.
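That algorithm is not reproduced in this text. As a rough illustration of the general idea only, the sketch below runs a generic worklist analysis over a hypothetical toy instruction set (PUSH, POP, ADD, JUMP, JUMP_IF_FALSE, RETURN), not real PVM bytecode. The names STACK_EFFECT, StackError and check_stack are assumptions made for this example.

```python
# Hypothetical toy instruction set: name -> (values popped, values pushed).
STACK_EFFECT = {
    "PUSH": (0, 1),
    "POP": (1, 0),
    "ADD": (2, 1),
    "JUMP": (0, 0),           # unconditional jump to the operand index
    "JUMP_IF_FALSE": (1, 0),  # pops a condition, may jump to the operand index
    "RETURN": (1, 0),         # pops the result and ends execution
}


class StackError(Exception):
    """Raised when a toy program violates a stack constraint."""


def check_stack(program, max_depth):
    """Return the deepest stack reached, or raise StackError."""
    if not program:
        raise StackError("empty program")
    depth_at = {0: 0}  # instruction index -> stack depth on entry
    worklist = [0]
    deepest = 0
    while worklist:
        index = worklist.pop()
        name, operand = program[index]
        pops, pushes = STACK_EFFECT[name]
        depth = depth_at[index]
        if depth < pops:
            raise StackError(f"underflow at {index} ({name})")
        depth = depth - pops + pushes
        if depth > max_depth:
            raise StackError(f"overflow at {index} ({name})")
        deepest = max(deepest, depth)
        if name == "RETURN":
            successors = []
        elif name == "JUMP":
            successors = [operand]
        elif name == "JUMP_IF_FALSE":
            successors = [index + 1, operand]
        else:
            successors = [index + 1]
        for target in successors:
            if not 0 <= target < len(program):
                raise StackError(f"control leaves the program from {index}")
            if target not in depth_at:
                depth_at[target] = depth
                worklist.append(target)
            elif depth_at[target] != depth:
                # Two paths reach the same instruction with different depths.
                raise StackError(f"inconsistent depth at {target}")
    return deepest


ok = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("RETURN", None)]
print(check_stack(ok, max_depth=2))
```

Each instruction is visited once per distinct entry depth, so the analysis terminates on loops. A second path that reaches an already-visited instruction with a different depth is reported as an error instead of being explored again.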
This PEP describes only a small number of verifications. While discussion and analysis will lead to many more, it is quite possible that further verifications, including custom, project-specific ones, will be needed in the future. For this reason, it might be desirable to add a verification registration interface to the test implementation to register future verifiers. The need for this is minimal, since custom verifiers can subclass and extend the current implementation for added behavior.
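The subclassing route could look roughly like the sketch below. The class names Verifier and ProjectVerifier and the check_ method-naming convention are assumptions made for this example, not part of the reference implementation.

```python
class VerificationError(Exception):
    """Raised when a check rejects a code object."""


class Verifier:
    """Runs every method whose name starts with 'check_' on a code object."""

    def verify(self, code):
        for name in sorted(dir(self)):
            if name.startswith("check_"):
                getattr(self, name)(code)

    def check_not_empty(self, code):
        if len(code.co_code) == 0:
            raise VerificationError("bytecode string is empty")


class ProjectVerifier(Verifier):
    """Adds a hypothetical project-specific rule to the base checks."""

    def check_no_eval(self, code):
        # co_names holds the global and attribute names the code refers to.
        if "eval" in code.co_names:
            raise VerificationError("code refers to 'eval'")


def risky(source):
    return eval(source)


def harmless(x):
    return x


Verifier().verify(risky.__code__)            # base checks pass
ProjectVerifier().verify(harmless.__code__)  # extended checks pass
try:
    ProjectVerifier().verify(risky.__code__)
except VerificationError as exc:
    print("rejected:", exc)
```

Because checks are discovered by name, a subclass contributes a new rule simply by defining another check_ method, which is one way to get by without a separate registration interface.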
Armin Rigo noted that several byte-codes will need modification in order for their stack effect to be statically analyzed. These are END_FINALLY, POP_BLOCK, and MAKE_CLOSURE. Armin and Guido have already agreed on how to correct the instructions. Currently the Python implementation punts on these instructions.
This PEP does not propose to add the verification step to the interpreter, but only to provide the Python implementation in the standard library for optional use. Whether or not this verification procedure is translated into C, included with the PVM or enforced in any way is left for future discussion.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0330.rst
Last modified: 2025-02-01 08:59:27 GMT