Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable[1] source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) that encode the result of compiler parsing and semantic analysis of things like the type, scope, and nesting depth of program objects.
The name bytecode stems from instruction sets that have one-byte opcodes followed by optional parameters. Intermediate representations such as bytecode may be output by programming language implementations to ease interpretation, or they may be used to reduce hardware and operating system dependence by allowing the same code to run cross-platform, on different devices. Bytecode may often be either directly executed on a virtual machine (a p-code machine, i.e., interpreter), or it may be further compiled into machine code for better performance.
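The "one-byte opcode followed by optional parameters" layout can be illustrated with a minimal sketch. The opcode values, names, and 4-byte operand width below are hypothetical and chosen only for illustration; they are not taken from any real virtual machine.

```python
import struct

# Hypothetical opcodes for illustration only; not taken from any real VM.
OP_PUSH = 0x01   # followed by a 4-byte little-endian signed integer operand
OP_ADD = 0x02    # no operand
OP_PRINT = 0x03  # no operand
OP_HALT = 0xFF   # no operand

# Operand size in bytes for each opcode.
OPERAND_SIZE = {OP_PUSH: 4, OP_ADD: 0, OP_PRINT: 0, OP_HALT: 0}
NAMES = {OP_PUSH: "PUSH", OP_ADD: "ADD", OP_PRINT: "PRINT", OP_HALT: "HALT"}


def assemble(instructions):
    """Encode (opcode, operand-or-None) pairs into a bytes object."""
    out = bytearray()
    for opcode, operand in instructions:
        out.append(opcode)  # the opcode always occupies exactly one byte
        if OPERAND_SIZE[opcode]:
            out += struct.pack("<i", operand)
    return bytes(out)


def disassemble(code):
    """Decode a bytes object back into (offset, name, operand) tuples."""
    listing = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        size = OPERAND_SIZE[opcode]
        operand = None
        if size:
            (operand,) = struct.unpack_from("<i", code, pc + 1)
        listing.append((pc, NAMES[opcode], operand))
        pc += 1 + size  # skip the opcode byte plus its operand bytes
    return listing


program = assemble([
    (OP_PUSH, 2),
    (OP_PUSH, 3),
    (OP_ADD, None),
    (OP_PRINT, None),
    (OP_HALT, None),
])
```

Because the operand width is determined by the opcode alone, a decoder only needs to read one byte to know how many more bytes belong to the current instruction.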
Since bytecode instructions are processed by software, they may be arbitrarily complex, but are nonetheless often akin to traditional hardware instructions: virtual stack machines are the most common, but virtual register machines have also been built.[2][3] Different parts may often be stored in separate files, similar to object modules, but dynamically loaded during execution.
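As a rough illustration of the register-machine style, the sketch below evaluates (2 + 3) * 4 with three-address instructions that name their source and destination registers explicitly. A stack machine would instead push operands and let each operation consume the top of the stack. The instruction names and tuple layout are hypothetical.

```python
# Hypothetical three-address instructions:
# (operation, destination register, source_a, source_b).
def run_register_program(program, register_count=4):
    """Execute the instructions in order and return the final register file."""
    registers = [0] * register_count
    for op, dst, a, b in program:
        if op == "load":
            registers[dst] = a  # a is a constant here; b is unused
        elif op == "add":
            registers[dst] = registers[a] + registers[b]
        elif op == "mul":
            registers[dst] = registers[a] * registers[b]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return registers


# (2 + 3) * 4, with the result left in register 0.
program = [
    ("load", 0, 2, None),
    ("load", 1, 3, None),
    ("add", 2, 0, 1),
    ("load", 3, 4, None),
    ("mul", 0, 2, 3),
]
registers = run_register_program(program)
```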
A bytecode program may be executed by parsing and directly executing the instructions, one at a time. This kind of bytecode interpreter is very portable. Some systems, called dynamic translators, or just-in-time (JIT) compilers, translate bytecode into machine code as necessary at runtime. This makes the virtual machine hardware-specific but does not lose the portability of the bytecode. For example, Java and Smalltalk code is typically stored in bytecode format, which is typically then JIT-compiled into machine code before execution. This introduces a delay before a program runs, while the bytecode is compiled to native machine code, but improves execution speed considerably compared to interpreting source code directly, normally by around an order of magnitude (10x).[4]
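Executing instructions "one at a time" usually takes the form of a fetch, decode, and execute loop. The following is a minimal sketch of such a loop for a stack machine; the opcode numbering and the one-byte operand format are assumptions made for this example, not the format of any real interpreter.

```python
# Hypothetical opcodes, invented for this sketch.
PUSH, ADD, MUL, JUMP_IF_ZERO, HALT = range(5)


def run(code):
    """Fetch, decode, and execute one instruction at a time."""
    stack = []
    pc = 0  # program counter: index of the next byte to read
    while True:
        opcode = code[pc]
        pc += 1
        if opcode == PUSH:
            stack.append(code[pc])  # one-byte immediate operand
            pc += 1
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == JUMP_IF_ZERO:
            target = code[pc]  # one-byte absolute jump target
            pc += 1
            if stack.pop() == 0:
                pc = target
        elif opcode == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {opcode} at offset {pc - 1}")


# (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
result = run(program)
```

The loop itself contains nothing hardware-specific, which is why an interpreter of this kind ports easily. A JIT compiler would replace the per-instruction dispatch with generated machine code.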
Because of its performance advantage, many language implementations today execute a program in two phases, first compiling the source code into bytecode, and then passing the bytecode to the virtual machine. There are bytecode-based virtual machines of this sort for Java, Raku, Python, PHP,[a] Tcl, mawk and Forth (however, Forth is seldom compiled via bytecodes in this way, and its virtual machine is more generic instead). The implementations of Perl and Ruby 1.8 instead work by walking an abstract syntax tree representation derived from the source code.
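In CPython, the two phases can be observed separately using the built-in `compile()` and `exec()` functions. This is a small sketch rather than a description of CPython internals; the source string and the `<example>` file name are arbitrary.

```python
source = "total = sum(n * n for n in range(4))"

# Phase 1: compile source text to a code object holding bytecode.
code_object = compile(source, "<example>", "exec")
raw_bytecode = code_object.co_code  # bytes; exact contents vary by CPython version

# Phase 2: hand the compiled code object to the virtual machine.
namespace = {}
exec(code_object, namespace)
```

After the second phase, `namespace["total"]` holds the computed value. The same code object can be executed repeatedly without recompiling the source.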
More recently, the authors of V8[1] and Dart[7] have challenged the notion that intermediate bytecode is needed for fast and efficient VM implementation. Both of these language implementations currently do direct JIT compiling from source code to machine code with no bytecode intermediary.[8]
Common Lisp provides a disassemble function[10] which prints to the standard output the underlying code of a specified function. The result is implementation-dependent and may or may not resolve to bytecode. Its inspection can be utilized for debugging and optimization purposes.[11] Steel Bank Common Lisp, for instance, produces:

    (disassemble '(lambda (x) (print x)))

    ; disassembly for (LAMBDA (X))
    ; 2436F6DF:       850500000F22     TEST EAX, [#x220F0000]     ; no-arg-parsing entry point
    ;       E5:       8BD6             MOV EDX, ESI
    ;       E7:       8B05A8F63624     MOV EAX, [#x2436F6A8]      ; #<FDEFINITION object for PRINT>
    ;       ED:       B904000000       MOV ECX, 4
    ;       F2:       FF7504           PUSH DWORD PTR [EBP+4]
    ;       F5:       FF6005           JMP DWORD PTR [EAX+5]
    ;       F8:       CC0A             BREAK 10                   ; error trap
    ;       FA:       02               BYTE #X02
    ;       FB:       18               BYTE #X18                  ; INVALID-ARG-COUNT-ERROR
    ;       FC:       4F               BYTE #X4F                  ; ECX

    >>> import dis  # "dis" - Disassembler of Python byte code into mnemonics.
    >>> dis.dis('print("Hello, World!")')
      1           0 LOAD_NAME                0 (print)
                  2 LOAD_CONST               0 ('Hello, World!')
                  4 CALL_FUNCTION            1
                  6 RETURN_VALUE

[…] In fact, the format is basically the same in MS-DOS 3.3 - 8.0, PC DOS 3.3 - 2000, including Russian, Lithuanian, Chinese and Japanese issues, as well as in Windows NT, 2000, and XP […]. There are minor differences and incompatibilities, but the general format has not changed over the years. […] Some of the data entries contain normal tables […] However, most entries contain executable code interpreted by some kind of p-code interpreter at *runtime*, including conditional branches and the like. This is why the KEYB driver has such a huge memory footprint compared to table-driven keyboard drivers which can be done in 3 - 4 Kb getting the same level of function except for the interpreter. […]
[…] Matthias [R.] Paul […] warns that the IBM PC DOS version of the keyboard driver uses some internal procedures that are not recognized by the Microsoft driver, so, if possible, you should use the IBM versions of both KEYB.COM and KEYBOARD.SYS instead of mixing Microsoft and IBM versions […] (NB. What is meant by "procedures" here are some additional bytecodes in the IBM KEYBOARD.SYS file not supported by the Microsoft version of the KEYB driver.)
Multiplan wasn't compiled to machine code, but to a kind of byte-code which was run by an interpreter, in order to make Multiplan portable across the widely varying hardware of the time. This byte-code distinguished between the machine-specific floating point format to calculate on, and an external (standard) format, which was binary coded decimal (BCD). The PACK and UNPACK instructions converted between the two.