PEP 330 – Python Bytecode Verification

Author: Michel Pelletier <michel at users.sourceforge.net>
Status:

Abstract

If Python Virtual Machine (PVM) bytecode is not “well-formed”, it is possible to crash or exploit the PVM by causing various errors such as under/overflowing the value stack or reading/writing into arbitrary areas of the PVM program space. Most of these kinds of errors can be eliminated by verifying that PVM bytecode does not violate a set of simple constraints before execution.

This PEP proposes a set of constraints on the format and structure of Python Virtual Machine (PVM) bytecode and provides an implementation in Python of this verification process.

Pronouncement

Guido believes that a verification tool has some value. If someone wants to add it to Tools/scripts, no PEP is required.

Such a tool may have value for validating the output from “bytecodehacks” or from direct edits of PYC files. As a security measure, its value is somewhat limited because perfectly valid bytecode can still do horrible things. That situation could change if the concept of restricted execution were to be successfully resurrected.

Motivation

The Python Virtual Machine executes Python programs that have been compiled from the Python language into a bytecode representation. The PVM assumes that any bytecode being executed is “well-formed” with regard to a number of implicit constraints. Some of these constraints are checked at run-time, but most of them are not, due to the overhead they would create.

When running in debug mode the PVM does do several run-time checks to ensure that any particular bytecode cannot violate these constraints that, to a degree, prevent bytecode from crashing or exploiting the interpreter. These checks add a measurable overhead to the interpreter, and are typically turned off in common use.

Bytecode that is not well-formed and executed by a PVM not running in debug mode may create a variety of fatal and non-fatal errors. Typically, ill-formed code will cause the PVM to seg-fault and cause the OS to immediately and abruptly terminate the interpreter.

Conceivably, ill-formed bytecode could exploit the interpreter and allow Python bytecode to execute arbitrary C-level machine instructions or to modify private, internal data structures in the interpreter. If used cleverly this could subvert any form of security policy an application may want to apply to its objects.

Practically, it would be difficult for a malicious user to “inject” invalid bytecode into a PVM for the purposes of exploitation, but not impossible. Buffer overflow and memory overwrite attacks are commonly understood, particularly when the exploit payload is transmitted unencrypted over a network or when a file or network security permission weakness is used as a foothold for further attacks.

Ideally, no bytecode should ever be allowed to read or write underlying C-level data structures to subvert the operation of the PVM, whether the bytecode was maliciously crafted or not. A simple pre-execution verification step could ensure that bytecode cannot over/underflow the value stack or access other sensitive areas of PVM program space at run-time.

This PEP proposes several validation steps that should be taken on Python bytecode before it is executed by the PVM so that it complies with static and structural constraints on its instructions and their operands. These steps are simple and catch a large class of invalid bytecode that can cause crashes. There is also some possibility that some run-time checks can be eliminated up front by a verification pass.

There is, of course, no way to verify that bytecode is “completely safe”, for every definition of complete and safe. Even with bytecode verification, Python programs can and most likely in the future will seg-fault for a variety of reasons and continue to cause many different classes of run-time errors, fatal or not. The verification step proposed here simply plugs an easy hole that can cause a large class of fatal and subtle errors at the bytecode level.

Currently, the Java Virtual Machine (JVM) verifies Java bytecode in a way very similar to what is proposed here. The JVM Specification version 2 [1], Sections 4.8 and 4.9 were therefore used as a basis for some of the constraints explained below. Any Python bytecode verification implementation must at a minimum enforce these constraints, but need not be limited to them.

Static Constraints on Bytecode Instructions

The bytecode string must not be empty (len(co_code) > 0).
The bytecode string cannot exceed a maximum size (len(co_code) < sizeof(unsigned char) - 1).
The first instruction in the bytecode string begins at index 0.
Only valid byte-codes with the correct number of operands can be in the bytecode string.
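As an illustration only, the checks above could be sketched with the standard dis module. The function names and the VerificationError class here are hypothetical stand-ins rather than the reference implementation. Operand decoding is delegated to dis, and the maximum-size check is left out.

```python
import dis


class VerificationError(Exception):
    """Hypothetical error type, named after the one this PEP describes."""


def check_not_empty(co_code):
    # Constraint: the bytecode string must not be empty.
    if len(co_code) == 0:
        raise VerificationError("bytecode string is empty")


def check_opnames(opnames):
    # dis.opname renders unassigned opcode numbers as "<N>", so a name
    # starting with "<" marks a byte value that is not a real opcode.
    for name in opnames:
        if name.startswith("<"):
            raise VerificationError("unknown opcode %s" % name)


def check_static_instructions(code):
    """Apply the static instruction checks to one code object (a sketch)."""
    check_not_empty(code.co_code)
    instructions = list(dis.get_instructions(code))
    # Constraint: the first instruction begins at index 0.
    if instructions[0].offset != 0:
        raise VerificationError("first instruction is not at index 0")
    check_opnames(ins.opname for ins in instructions)


def sample(a, b):
    return a + b


check_static_instructions(sample.__code__)  # returns silently
```

Splitting the checks into helpers that take plain data keeps each constraint testable without having to construct a malformed code object.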

Static Constraints on Bytecode Instruction Operands

The target of a jump instruction must be within the code boundaries and must fall on an instruction, never between an instruction and its operands.
The operand of a LOAD_* instruction must be a valid index into its corresponding data structure.
The operand of a STORE_* instruction must be a valid index into its corresponding data structure.
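The operand constraints can be illustrated in the same hedged way. The helper names below are hypothetical. The sketch checks jump targets and constant-table indices only, because some CPython releases pack extra flag bits into the operands of name and local-variable instructions, which makes a plain range check on them version-dependent.

```python
import dis


class VerificationError(Exception):
    """Hypothetical error type, named after the one this PEP describes."""


def check_jump_targets(instruction_offsets, jump_targets):
    # Every target must be the offset at which some instruction starts.
    valid = set(instruction_offsets)
    for target in jump_targets:
        if target not in valid:
            raise VerificationError(
                "jump target %r is not an instruction start" % (target,))


def check_index(index, table, what):
    # The operand must be a valid index into its data structure.
    if not 0 <= index < len(table):
        raise VerificationError("%s index %r out of range" % (what, index))


def check_operands(code):
    """Check jump targets and constant indices of one code object."""
    instructions = list(dis.get_instructions(code))
    jump_ops = set(dis.hasjrel) | set(dis.hasjabs)
    # For jump instructions, dis reports the resolved target as argval.
    check_jump_targets(
        [ins.offset for ins in instructions],
        [ins.argval for ins in instructions if ins.opcode in jump_ops],
    )
    for ins in instructions:
        if ins.opcode in dis.hasconst:
            check_index(ins.arg, code.co_consts, "constant")


def sample(items):
    total = 0
    for item in items:
        if item:
            total += 1
    return total


check_operands(sample.__code__)  # returns silently
```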

Structural Constraints between Bytecode Instructions

Each instruction must only be executed with the appropriate number of arguments in the value stack, regardless of the execution path that leads to its invocation.
If an instruction can be executed along several different execution paths, the value stack must have the same depth prior to the execution of the instruction, regardless of the path taken.
At no point during execution can the value stack grow to a depth greater than that implied by co_stacksize.
Execution never falls off the bottom of co_code.
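The four structural constraints can be checked together by following every execution path with a worklist. The sketch below is not the algorithm used by the reference implementation. It runs on a deliberately simplified, hypothetical instruction model (the Op tuple, its flow field, and instruction indices in place of byte offsets are all assumptions) rather than on real PVM bytecode.

```python
from collections import namedtuple


class VerificationError(Exception):
    """Hypothetical error type, named after the one this PEP describes."""


# Hypothetical instruction model: how many values an instruction pops and
# pushes, and how control leaves it: "next", "jump", "branch" or "return".
Op = namedtuple("Op", "name pops pushes flow target", defaults=("next", None))


def check_stack(ops, stacksize):
    """Return the entry stack depth of each reachable instruction."""
    if not ops:
        raise VerificationError("no instructions")
    depth_at = {0: 0}  # stack depth on entry to each visited instruction
    worklist = [0]
    while worklist:
        index = worklist.pop()
        op = ops[index]
        depth = depth_at[index]
        if depth < op.pops:
            raise VerificationError("%s at %d underflows" % (op.name, index))
        depth += op.pushes - op.pops
        if depth > stacksize:
            raise VerificationError("%s at %d overflows" % (op.name, index))
        successors = []
        if op.flow in ("next", "branch"):
            successors.append(index + 1)
        if op.flow in ("jump", "branch"):
            successors.append(op.target)
        for nxt in successors:
            if nxt is None or not 0 <= nxt < len(ops):
                raise VerificationError("execution leaves the code at %d" % index)
            if nxt not in depth_at:
                depth_at[nxt] = depth
                worklist.append(nxt)
            elif depth_at[nxt] != depth:
                raise VerificationError("inconsistent stack depth at %d" % nxt)
    return depth_at


program = [
    Op("PUSH", 0, 1),
    Op("PUSH", 0, 1),
    Op("ADD", 2, 1),
    Op("RETURN", 1, 0, "return"),
]
depths = check_stack(program, stacksize=2)
```

Each instruction is visited at most once, since a second arrival either agrees with the recorded depth or raises an error, so the pass is linear in the number of instructions.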

Implementation

This PEP is the working document for a Python bytecode verification implementation written in Python. This implementation is not used implicitly by the PVM before executing any bytecode, but is to be used explicitly by users concerned about possibly invalid bytecode with the following snippet:

    import verify
    verify.verify(object)

The verify module provides a verify function which accepts the same kind of arguments as dis.dis: classes, methods, functions, or code objects. It verifies that the object’s bytecode is well-formed according to the specifications of this PEP.

If the code is well-formed the call to verify returns silently without error. If an error is encountered, it raises a VerificationError whose argument indicates the cause of the failure. It is up to the programmer whether or not to handle the error in some way or execute the invalid code regardless.
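The calling convention described above might look like the following sketch. Since no verify module ships with Python, a stand-in verify function is defined inline. The code_objects helper, the single emptiness check, and the greet function are assumptions made for illustration.

```python
import types


class VerificationError(Exception):
    """Stand-in for the error type this PEP describes."""


def code_objects(obj):
    """Yield the code objects behind obj, loosely mirroring dis.dis."""
    if isinstance(obj, types.CodeType):
        yield obj
    elif isinstance(obj, types.MethodType):
        yield from code_objects(obj.__func__)
    elif isinstance(obj, types.FunctionType):
        yield obj.__code__
    elif isinstance(obj, type):
        # Only plain functions defined directly on the class are visited.
        for member in vars(obj).values():
            if isinstance(member, types.FunctionType):
                yield member.__code__
    else:
        raise TypeError("cannot verify %r objects" % type(obj).__name__)


def verify(obj):
    """Return silently if obj passes; raise VerificationError otherwise."""
    for code in code_objects(obj):
        # A single placeholder check; a real verifier would apply them all.
        if len(code.co_code) == 0:
            raise VerificationError("%s: bytecode is empty" % code.co_name)


def greet(name):
    return "hello " + name


try:
    verify(greet)
except VerificationError as exc:
    print("refusing to run greet:", exc)
else:
    print(greet("world"))
```

Here the caller chooses to skip execution on failure; as the text notes, nothing prevents a program from catching the error and running the code anyway.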

Phillip Eby has proposed a pseudo-code algorithm for bytecode stack depth verification used by the reference implementation.

Verification Issues

This PEP describes only a small number of verifications. While discussion and analysis will lead to many more, it is quite possible that further verifications will be needed in the future, including custom, project-specific ones. For this reason, it might be desirable to add a verification registration interface to the test implementation to register future verifiers. The need for this is minimal since custom verifiers can subclass and extend the current implementation for added behavior.
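Both extension routes mentioned here, registration and subclassing, can be sketched in a few lines. The Verifier class, its register method, and the forbid_eval check are hypothetical and serve only to show the shape such an interface might take.

```python
class VerificationError(Exception):
    """Stand-in for the error type this PEP describes."""


class Verifier:
    """Hypothetical verifier that runs a list of checks on a code object."""

    def __init__(self):
        self.checks = [self.check_not_empty]

    def register(self, check):
        # A check is any callable that takes a code object and raises
        # VerificationError when its constraint is violated.
        self.checks.append(check)
        return check

    def check_not_empty(self, code):
        if len(code.co_code) == 0:
            raise VerificationError("bytecode string is empty")

    def verify(self, code):
        for check in self.checks:
            check(code)


def forbid_eval(code):
    # Example of a project-specific rule: reject code that names eval.
    if "eval" in code.co_names:
        raise VerificationError("%s refers to eval" % code.co_name)


# Route 1: register the extra check on an instance.
registered = Verifier()
registered.register(forbid_eval)


# Route 2: subclass and extend the base behavior.
class StrictVerifier(Verifier):
    def verify(self, code):
        super().verify(code)
        forbid_eval(code)


def plain(x):
    return x + 1


registered.verify(plain.__code__)        # returns silently
StrictVerifier().verify(plain.__code__)  # returns silently
```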

Required Changes

Armin Rigo noted that several byte-codes will need modification in order for their stack effect to be statically analyzed. These are END_FINALLY, POP_BLOCK, and MAKE_CLOSURE. Armin and Guido have already agreed on how to correct the instructions. Currently the Python implementation punts on these instructions.

This PEP does not propose to add the verification step to the interpreter, but only to provide the Python implementation in the standard library for optional use. Whether or not this verification procedure is translated into C, included with the PVM or enforced in any way is left for future discussion.

References

[1] The Java Virtual Machine Specification, 2nd Edition
    http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html

Copyright

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-0330.rst

Last modified: 2025-02-01 08:59:27 GMT
