
Machine code


From Wikipedia, the free encyclopedia

Instructions directly executable by a computer

For code that is completely internal to some CPUs and normally inaccessible to programmers, see Microcode.

"Native code" redirects here. For the French colonial legal system, see Native code (France).

Machine language monitor running on a W65C816S microprocessor, displaying code disassembly and dumps of processor registers and memory


In computing, machine code is data encoded and structured to control a computer's central processing unit (CPU) via its programmable interface. A computer program consists primarily of sequences of machine-code instructions.^[1] Machine code is classified as native with respect to its host CPU since it is the language that the CPU interprets directly.^[2] Some software interpreters translate the programming language that they interpret into a virtual machine code (bytecode) and process it with a P-code machine.

A machine-code instruction causes the CPU to perform a specific task such as:

Load a word from memory to a CPU register
Execute an arithmetic logic unit (ALU) operation on one or more registers or memory locations
Jump or skip to an instruction that is not the next one
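
The three kinds of task listed above can be sketched with a toy emulator. The machine below is hypothetical: its opcodes (HALT, LOAD, ADD, JNZ), its fixed 3-byte instruction format and its four registers are invented for illustration and do not correspond to any real instruction set.

```python
# Hypothetical 3-byte instruction format: [opcode, operand_a, operand_b].
HALT, LOAD, ADD, JNZ = 0x00, 0x01, 0x02, 0x03

def run(code: bytes, memory: list) -> list:
    """Execute `code` on a toy CPU with four registers; return the registers."""
    regs = [0, 0, 0, 0]
    pc = 0  # program counter: byte offset of the next instruction
    while True:
        opcode, a, b = code[pc], code[pc + 1], code[pc + 2]
        pc += 3  # by default, continue with the next sequential instruction
        if opcode == HALT:
            return regs
        elif opcode == LOAD:   # load a word from memory into a register
            regs[a] = memory[b]
        elif opcode == ADD:    # ALU operation on two registers
            regs[a] += regs[b]
        elif opcode == JNZ:    # jump to offset b if register a is non-zero
            if regs[a] != 0:
                pc = b
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")

program = bytes([
    LOAD, 0, 0,   # offset 0:  r0 = memory[0]  (loop counter)
    LOAD, 1, 1,   # offset 3:  r1 = memory[1]  (constant -1)
    ADD,  2, 0,   # offset 6:  r2 += r0        (accumulate)
    ADD,  0, 1,   # offset 9:  r0 += r1        (count down)
    JNZ,  0, 6,   # offset 12: loop back to offset 6 while r0 != 0
    HALT, 0, 0,   # offset 15: stop
])

print(run(program, memory=[3, -1]))  # prints [0, -1, 6, 0]
```

With memory holding 3 and -1, the loop adds 3 + 2 + 1 into register 2 before the counter reaches zero and execution falls through to HALT.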

An instruction set architecture (ISA) defines the interface to a CPU and varies by groupings or families of CPU design such as x86 and ARM. Generally, machine code compatible with one family is not compatible with others, but there are exceptions. The VAX architecture includes optional support of the PDP-11 instruction set. The IA-64 architecture includes optional support of the IA-32 instruction set. And the PowerPC 615 can natively process both PowerPC and x86 instructions.

Assembly language

Translation of assembly into machine code

Assembly language provides a relatively direct mapping from human-readable source code to machine code. Assembly language source code represents the numerical codes of machine code as mnemonics and labels.^[3] For example, NOP in assembly for an x86 processor represents the x86 architecture opcode 0x90 in machine code. While it is possible to write a program in machine code, doing so is tedious and error-prone. Therefore, programs are usually written in assembly or, more commonly, in a high-level programming language.
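
The mnemonic-to-opcode mapping can be illustrated with a minimal table-driven sketch. It covers only a few one-byte x86 instructions; the function name `assemble` and the comment syntax it accepts are assumptions of this example, and a real assembler also handles operands, labels and multi-byte encodings.

```python
# One-byte x86 instructions only.
OPCODES = {
    "NOP": 0x90,   # no operation
    "RET": 0xC3,   # near return
    "HLT": 0xF4,   # halt
    "INT3": 0xCC,  # breakpoint trap
}

def assemble(source: str) -> bytes:
    """Translate newline-separated mnemonics into machine-code bytes."""
    code = bytearray()
    for line in source.splitlines():
        mnemonic = line.split(";")[0].strip().upper()  # drop ';' comments
        if mnemonic:
            code.append(OPCODES[mnemonic])  # KeyError for unknown mnemonics
    return bytes(code)

print(assemble("nop\nnop ; padding\nret").hex())  # prints 9090c3
```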

Instruction set

A machine instruction encodes an operation as a pattern of bits based on the specified format for the machine's instruction set.^{[nb 1]}^[4]

Instruction sets differ in various ways. Instructions of a set might all be the same length or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the word size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary.^[4]

An instruction set must drive the circuits at a computer's digital logic level. At the digital level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components.^[5] Machine instructions are created to control a computer's architectural features. Examples of features that are controlled using machine instructions:

segment registers^[6]
protected address mode^[7]
binary-coded decimal (BCD) arithmetic^[8]

The criteria for instruction formats include:

Instructions most commonly used should be shorter than instructions rarely used.^[4]
The memory transfer rate of the underlying hardware determines the flexibility of the memory fetch instructions.
The number of bits in the address field requires special consideration.^[9]

Determining the size of the address field is a choice between space and speed.^[9] On some computers, the number of bits in the address field may be too small to access all of the physical memory. Also, virtual address space needs to be considered. Another constraint may be a limitation on the size of registers used to construct the address. Whereas a shorter address field allows the instructions to execute more quickly, other physical properties need to be considered when designing the instruction format.
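
The reach of an address field follows directly from its width: n bits can name 2^n distinct locations. A minimal sketch of that arithmetic follows; the helper names and the 1 MiB memory size are assumptions chosen for illustration.

```python
def addressable_units(address_bits: int) -> int:
    """Number of distinct locations an address field of this width can name."""
    return 1 << address_bits

def field_reaches(address_bits: int, memory_units: int) -> bool:
    """True if the field alone can address every unit of the given memory."""
    return addressable_units(address_bits) >= memory_units

for bits in (15, 16, 26, 32):
    print(bits, addressable_units(bits))

# A 16-bit field cannot reach all of a 1 MiB memory (2**20 bytes) by itself,
# so such a machine needs another mechanism (for example, a register that
# supplies the remaining address bits).
print(field_reaches(16, 2**20))  # prints False
```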

Instructions can be separated into two types: general-purpose and special-purpose. Special-purpose instructions exploit architectural features that are unique to a computer. General-purpose instructions control architectural features common to all computers.^[10]

General-purpose instructions control:

Data movement from one place to another
Monadic operations that have one operand to produce a result
Dyadic operations that have two operands to produce a result
Comparisons and conditional jumps
Procedure calls
Loop control
Input/output

Overlapping instruction

On processor architectures with variable-length instruction sets^[11] (such as Intel's x86 processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal count,^[12]^[11]^[13]^[14]^[15] sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences.^{[nb 2]} These are called overlapping instructions, overlapping opcodes, overlapping code, overlapped code, instruction scission, or jump into the middle of an instruction.^[16]^[17]^[18]
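
As an illustrative sketch, the decoder below handles a very small subset of 32-bit x86 (NOP, RET, INC r32 and MOV r32, imm32) and decodes the same six bytes from two entry points. The byte sequence and the function `decode` are constructed for this example; it is not a general disassembler.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode(code: bytes, start: int) -> list:
    """Linear-sweep decode of a tiny 32-bit x86 subset, beginning at `start`.

    Handles only NOP (90), RET (C3), INC r32 (40+r) and MOV r32, imm32 (B8+r).
    Returns (offset, text) pairs.
    """
    out = []
    pc = start
    while pc < len(code):
        op = code[pc]
        if op == 0x90:
            text, size = "nop", 1
        elif op == 0xC3:
            text, size = "ret", 1
        elif 0x40 <= op <= 0x47:
            text, size = f"inc {REG32[op - 0x40]}", 1
        elif 0xB8 <= op <= 0xBF and pc + 5 <= len(code):
            imm = int.from_bytes(code[pc + 1:pc + 5], "little")
            text, size = f"mov {REG32[op - 0xB8]}, {imm:#010x}", 5
        else:
            raise ValueError(f"cannot decode byte {op:#04x} at offset {pc}")
        out.append((pc, text))
        pc += size
    return out

code = bytes.fromhex("b8404142c3c3")
for entry in (0, 1):
    print(f"entry at offset {entry}:")
    for offset, text in decode(code, entry):
        print(f"  {offset}: {text}")
```

Entered at offset 0, the first five bytes form a single MOV whose immediate operand is the bytes 40 41 42 C3. Entered at offset 1, the same bytes decode as three INC instructions and a RET. Both decodings meet again at offset 5, which is the resynchronization the paragraph above refers to.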

In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example was in the implementation of error tables in Microsoft's Altair BASIC, where interleaved instructions mutually shared their instruction bytes.^[19]^[11]^[16] The technique is rarely used today, but may still be needed where extreme size optimization at the byte level is required, such as in the implementation of boot loaders, which have to fit into boot sectors.^{[nb 3]}

It is also sometimes used as a code obfuscation technique, as a measure against disassembly and tampering.^[11]^[14]

The principle is also used in shared code sequences of fat binaries, which must run on multiple instruction-set-incompatible processor platforms.^{[nb 2]}

This property is also used to find unintended instructions called gadgets in existing code repositories and is used in return-oriented programming as an alternative to code injection for exploits such as return-to-libc attacks.^[20]^[11]

Microcode

In some computers, the machine code of the architecture is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models.^[21] An example of this use is the IBM System/360 family of computers and their successors.^[22]

Examples

IBM 709x

The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bits from the left as S, 1, ..., 35. Most instructions have one of two formats:

Generic: S,1-11 Opcode; 12-13 Flag, ignored in some instructions; 14-17 unused; 18-20 Tag; 21-35 Y

Index register control, other than TSX: S,1-2 Opcode; 3-17 Decrement; 18-20 Tag; 21-35 Y

For all but the IBM 7094 and 7094 II, there are three index registers designated A, B and C; indexing with multiple 1 bits in the tag subtracts the logical or of the selected index registers, and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but when they are powered on they are in multiple tag mode, in which they use only three of the index registers in a fashion compatible with earlier machines, and require a Leave Multiple Tag Mode (LMTM) instruction in order to access the other four index registers.

The effective address is normally Y-C(T), where C(T) is either 0 for a tag of 0, the logical or of the selected index registers in multiple tag mode or the selected index register if not in multiple tag mode. However, the effective address for index register control instructions is just Y.
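
The generic field layout and the effective-address rule can be sketched as follows. The helper names are hypothetical, the index-register contents are made-up values, and wrapping the subtraction to 15 bits is an assumption of this sketch rather than something stated above. Index registers are passed as a dictionary keyed by tag value.

```python
Y_MASK = 0x7FFF  # Y occupies 15 bits (bits 21-35)

def fields_generic(word: int) -> dict:
    """Split a 36-bit word using the generic format (bits S,1-35 from the left)."""
    return {
        "opcode": (word >> 24) & 0xFFF,  # bits S,1-11
        "flag":   (word >> 22) & 0x3,    # bits 12-13
        "tag":    (word >> 15) & 0x7,    # bits 18-20
        "y":      word & Y_MASK,         # bits 21-35
    }

def effective_address(y: int, tag: int, xr: dict, multiple_tag_mode: bool = True) -> int:
    """Compute Y - C(T); the 15-bit wraparound is an assumption of this sketch."""
    if tag == 0:
        c = 0
    elif multiple_tag_mode:
        c = 0
        for bit in (1, 2, 4):   # each 1 bit in the tag selects one register
            if tag & bit:
                c |= xr[bit]    # logical or of the selected registers
    else:
        c = xr[tag]             # the tag names a single register
    return (y - c) & Y_MASK

# Arbitrary example word: opcode field 0o500, tag 0b011, Y = 0o1000 (512).
word = (0o500 << 24) | (0b011 << 15) | 0o1000
f = fields_generic(word)
print(f)
print(effective_address(f["y"], f["tag"], {1: 5, 2: 6, 4: 0}))  # 512 - (5 | 6) = 505
```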

A flag with both bits 1 selects indirect addressing; the indirect address word has both a tag and a Y field.

In addition to transfer (branch) instructions, these machines have skip instructions that conditionally skip one or two words, e.g., Compare Accumulator with Storage (CAS) does a three-way compare and conditionally skips to NSI, NSI+1 or NSI+2, depending on the result.

MIPS

The MIPS architecture provides a specific example for a machine code whose instructions are always 32 bits long.^[23]^: 299 The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional funct (function) field to determine the exact operation. The fields used in these types are:

   6      5     5     5     5      6 bits
[  op  |  rs |  rt |  rd |shamt| funct]  R-type
[  op  |  rs |  rt | address/immediate]  I-type
[  op  |        target address        ]  J-type

rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.^[23]^{: 299–301}

For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:^[23]^: 554

[  op  |  rs |  rt |  rd |shamt| funct]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary

Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:^[23]^: 552

[  op  |  rs |  rt | address/immediate]
   35     3     8           68           decimal
 100011 00011 01000 00000 00001 000100   binary

Jumping to the address 1024:^[23]^: 552

[  op  |        target address        ]
    2                 1024               decimal
 000010 00000 00000 00000 10000 000000   binary
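
The three encodings above can be reproduced by packing the fields with shifts and masks. This is a minimal sketch; the function names `encode_r`, `encode_i` and `encode_j` are hypothetical helpers, not part of any MIPS toolchain.

```python
def encode_r(rs: int, rt: int, rd: int, shamt: int, funct: int, op: int = 0) -> int:
    """Pack an R-type instruction: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type instruction: op(6) rs(5) rt(5) address/immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(op: int, target: int) -> int:
    """Pack a J-type instruction: op(6) target address(26)."""
    return (op << 26) | (target & 0x3FFFFFF)

add_word = encode_r(rs=1, rt=2, rd=6, shamt=0, funct=32)  # add registers 1 and 2 into 6
load_word = encode_i(op=35, rs=3, rt=8, imm=68)           # load into 8 from 68 past register 3
jump_word = encode_j(op=2, target=1024)                   # jump with target field 1024

for word in (add_word, load_word, jump_word):
    print(f"{word:032b}  {word:#010x}")
```

The printed bit strings match the binary rows shown above once the field separators are removed.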

Bytecode

Machine code is similar to, yet fundamentally different from, bytecode. Like machine code, bytecode is typically generated (i.e., by a compiler) from source code. But, unlike machine code, bytecode is not directly executable by a CPU. An exception is if a processor is designed to use bytecode as its machine code, such as the Java processor. If bytecode is processed by a software interpreter, then that interpreter is a virtual machine for which the bytecode is its machine code.
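
A software interpreter acting as a virtual machine can be sketched in a few lines. The stack-based bytecode below is hypothetical: the opcode values and the `run_bytecode` function are invented for this example and do not match the Java virtual machine or any other real bytecode format.

```python
# Hypothetical bytecode: PUSH is followed by a one-byte operand.
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF

def run_bytecode(code: bytes) -> int:
    """Interpret the bytecode on a stack machine and return the top of stack."""
    stack = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # operand byte follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#04x}")

# Bytecode for (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
print(run_bytecode(program))  # prints 20
```

The host CPU never executes these bytes directly; it executes the interpreter's own machine code, which in turn treats the bytecode as data.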

Storage

During execution, machine code is generally stored in RAM although running from ROM is supported by some devices. Regardless, the code may also be cached in more specialized memory to enhance performance. There may be different caches for instructions and data, depending on the architecture.^[24]

From the point of view of a process, the machine code lives in code space, a designated part of its address space. In a multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.^[25]

Readability

Machine code is generally considered to be not human readable,^[26] with Douglas Hofstadter comparing it to examining the atoms of a DNA molecule.^[27] However, various tools and methods support understanding machine code.

Disassembly decodes machine code to assembly language, which is possible since assembly instructions can often be mapped one-to-one to machine instructions.^[28]
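
As a minimal sketch of that one-to-one mapping, the function below decodes the three 32-bit MIPS words from the MIPS example in this article back into assembly-like text. It recognizes only those three instructions, treats the immediate as unsigned, and prints the jump target field as stored; the function name `disassemble` is a hypothetical helper.

```python
def disassemble(word: int) -> str:
    """Decode one 32-bit MIPS word; only add, lw and j are recognized."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F
    if op == 0 and funct == 32:   # R-type add
        return f"add ${rd}, ${rs}, ${rt}"
    if op == 35:                  # I-type load word
        return f"lw ${rt}, {word & 0xFFFF}(${rs})"
    if op == 2:                   # J-type jump
        return f"j {word & 0x3FFFFFF}"
    raise ValueError(f"unsupported instruction word {word:#010x}")

for word in (0x00223020, 0x8C680044, 0x08000400):
    print(f"{word:#010x}  {disassemble(word)}")
```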

A decompiler converts machine code to a high-level language, but the result can be relatively obfuscated (hard to understand).

A program can be associated with debug symbols (either embedded in the native executable or in a separate file) that allow it to be mapped to external source code. A debugger reads the symbols to help a programmer interactively debug the program. Examples include:

The SHARE Operating System (1959) for the IBM 709, IBM 7090, and IBM 7094 computers allowed for a loadable code format named SQUOZE. SQUOZE was a compressed binary form of assembly language code and included a symbol table.
Modern IBM mainframe operating systems, such as z/OS, have available a symbol table named Associated data (ADATA). The table is stored in a file that can be produced by the IBM High-Level Assembler (HLASM),^[29]^[30] IBM's COBOL compiler,^[31] and IBM's PL/I compiler,^[32] either as a separate SYSADATA file or as ADATA records in a Generalized object output file (GOFF).^[33] This obsoletes the TEST records from OS/360, although it is still possible to request them and to use them in the TSO TEST command.
Windows uses a symbol table^[34] that is stored in a program database (.pdb) file.^[35]
Most Unix-like operating systems have available symbol table formats named stabs and DWARF. In macOS and other Darwin-based operating systems, the debug symbols are stored in DWARF format in a separate .dSYM file.

Notes

[nb 1] On early decimal machines, patterns of characters, digits and digit sign.
[nb 2] While overlapping instructions on processor architectures with variable-length instruction sets can sometimes be arranged to merge different code paths back into one through control-flow resynchronization, overlapping code for different processor architectures can sometimes also be crafted to cause execution paths to branch into different directions depending on the underlying processor, as is sometimes used in fat binaries.
[nb 3] For example, the DR-DOS master boot records (MBRs) and boot sectors (which also hold the partition table and BIOS Parameter Block, leaving less than 446 and 423 bytes, respectively, for the code) were traditionally able to locate the boot file in the FAT12 or FAT16 file system by themselves and load it into memory as a whole, in contrast to their counterparts in MS-DOS and PC DOS, which instead rely on the system files to occupy the first two directory entry locations in the file system and the first three sectors of IBMBIO.COM to be stored at the start of the data area in contiguous sectors containing a secondary loader to load the remainder of the file into memory (requiring SYS to take care of all these conditions). When FAT32 and logical block addressing (LBA) support was added, Microsoft even switched to require i386 instructions and split the boot code over two sectors for code size reasons, which was not an option for DR-DOS, as it would have broken backward- and cross-compatibility with other operating systems in multi-boot and chain load scenarios, as well as with older IBM PC–compatible PCs. Instead, the DR-DOS 7.07 boot sectors resorted to self-modifying code, opcode-level programming in machine language, controlled utilization of (documented) side effects, multi-level data/code overlapping and algorithmic folding techniques to still fit everything into a physical sector of only 512 bytes without giving up any of their extended functions.

References

[1] Stallings, William (2015). Computer Organization and Architecture 10th edition. Pearson Prentice Hall. p. 776. ISBN 978-93-325-7040-5.
[2] Gregory, Kate (2003-04-28). "Managed, Unmanaged, Native: What Kind of Code Is This?". Developer.com. Archived from the original on 2009-09-23. Retrieved 2008-09-02.
[3] Dourish, Paul (2004). Where the Action is: The Foundations of Embodied Interaction. MIT Press. p. 7. ISBN 0-262-54178-5. Retrieved 2023-03-05.
[4] Tanenbaum 1990, p. 251.
[5] Tanenbaum 1990, p. 162.
[6] Tanenbaum 1990, p. 231.
[7] Tanenbaum 1990, p. 237.
[8] Tanenbaum 1990, p. 236.
[9] Tanenbaum 1990, p. 253.
[10] Tanenbaum 1990, p. 283.
[11] Jacob, Matthias; Jakubowski, Mariusz H.; Venkatesan, Ramarathnam (20–21 September 2007). Towards Integral Binary Execution: Implementing Oblivious Hashing Using Overlapped Instruction Encodings (PDF). Proceedings of the 9th workshop on Multimedia & Security (MM&Sec '07). Dallas, Texas, US: Association for Computing Machinery. pp. 129–140. CiteSeerX 10.1.1.69.5258. doi:10.1145/1288869.1288887. ISBN 978-1-59593-857-2. S2CID 14174680. Archived (PDF) from the original on 2018-09-04. Retrieved 2021-12-25. (12 pages)
[12] Lagarias, Jeffrey "Jeff" Clark; Rains, Eric Michael; Vanderbei, Robert J. (2009) [2001-10-13]. "The Kruskal Count". In Brams, Stephen; Gehrlein, William V.; Roberts, Fred S. (eds.). The Mathematics of Preference, Choice and Order. Studies in Choice and Welfare. Berlin / Heidelberg, Germany: Springer-Verlag. pp. 371–391. arXiv:math/0110143. doi:10.1007/978-3-540-79128-7_23. ISBN 978-3-540-79127-0. (22 pages)
[13] Andriesse, Dennis; Bos, Herbert (2014-07-10). Written at Vrije Universiteit Amsterdam, Amsterdam, Netherlands. Dietrich, Sven (ed.). Instruction-Level Steganography for Covert Trigger-Based Malware (PDF). 11th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA). Lecture Notes in Computer Science. Egham, UK; Switzerland: Springer International Publishing. pp. 41–50 [45]. doi:10.1007/978-3-319-08509-8_3. eISSN 1611-3349. ISBN 978-3-31908508-1. ISSN 0302-9743. S2CID 4634611. LNCS 8550. Archived (PDF) from the original on 2023-08-26. Retrieved 2023-08-26. (10 pages)
[14] Jakubowski, Mariusz H. (February 2016). "Graph Based Model for Software Tamper Protection". Microsoft. Archived from the original on 2019-10-31. Retrieved 2023-08-19.
[15] Jämthagen, Christopher (November 2016). On Offensive and Defensive Methods in Software Security (PDF) (Thesis). Lund, Sweden: Department of Electrical and Information Technology, Lund University. p. 96. ISBN 978-91-7623-942-1. ISSN 1654-790X. Archived (PDF) from the original on 2023-08-26. Retrieved 2023-08-26. (1+xvii+1+152 pages)
[16] "Unintended Instructions on x86". Hacker News. 2021. Archived from the original on 2021-12-25. Retrieved 2021-12-24.
[17] Kinder, Johannes (2010-09-24). Static Analysis of x86 Executables [Statische Analyse von Programmen in x86 Maschinensprache] (PDF) (Dissertation). Munich, Germany: Technische Universität Darmstadt. D17. Archived from the original on 2020-11-12. Retrieved 2021-12-25. (199 pages)
[18] "What is "overlapping instructions" obfuscation?". Reverse Engineering Stack Exchange. 2013-04-07. Archived from the original on 2021-12-25. Retrieved 2021-12-25.
[19] Gates, William "Bill" Henry, Personal communication (NB. According to Jacob et al.)
[20] Shacham, Hovav (2007). The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86) (PDF). Proceedings of the ACM, CCS 2007. ACM Press. Archived (PDF) from the original on 2021-12-15. Retrieved 2021-12-24.
[21] Kent, Allen; Williams, James G. (1993-04-05). Encyclopedia of Computer Science and Technology: Volume 28 - Supplement 13: Aerospace Applications of Artificial Intelligence to Tree Structures. CRC Press. pp. 33–34. ISBN 978-0-8247-2281-4.
[22] Tucker, S. G. (1967-12-31). "Microprogram control for SYSTEM/360". IBM Systems Journal. 6 (4): 222–241. doi:10.1147/sj.64.0222. ISSN 0018-8670 – via IEEE Xplore.
[23] Harris, David; Harris, Sarah L. (2007). Digital Design and Computer Architecture. Morgan Kaufmann Publishers. ISBN 978-0-12-370497-9. Retrieved 2023-03-05.
[24] Su, Chao; Zeng, Qingkai (2021). "Survey of CPU Cache-Based Side-Channel Attacks: Systematic Analysis, Security Models, and Countermeasures". Security and Communication Networks. 2021 (1) 5559552. doi:10.1155/2021/5559552. ISSN 1939-0122.
[25] "CS 537 Notes, Section #3A: Processes and Threads". pages.cs.wisc.edu. School of Computer, Data & Information Sciences, University of Wisconsin-Madison. Retrieved 2025-07-18.
[26] Samuelson 1984, p. 683.
[27] Hofstadter 1979, p. 290.
[28] Tanenbaum 1990, p. 398.
[29] "Associated Data Architecture". High Level Assembler and Toolkit Feature.
[30] "Associated data file output" (PDF). High Level Assembler for z/OS & z/VM & z/VSE - 1.6 - HLASM Programmer's Guide (PDF) (Eighth ed.). IBM. October 2022. pp. 278–332. SC26-4941-07. Retrieved 2025-02-14.
[31] "COBOL SYSADATA file contents". Enterprise COBOL for z/OS.
[32] "SYSADATA message information". Enterprise PL/I for z/OS 6.1 information. 2025-03-17.
[33] "Appendix C. Generalized object file format (GOFF)" (PDF). z/OS - 3.1 - MVS Program Management: Advanced Facilities (PDF). IBM. 2024-12-18. pp. 201–240. SA23-1392-60. Retrieved 2025-02-14.
[34] "Symbols for Windows debugging". Microsoft Learn. 2022-12-20.
[35] "Querying the .Pdb File". Microsoft Learn. 2024-01-12.

Sources

Hofstadter, Douglas R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books. ISBN 0-465-02685-0. Retrieved 2025-02-10.
Samuelson, Pamela (1984). "CONTU Revisited: The Case Against Copyright Protection for Computer Programs in Machine-Readable Form". Duke Law Journal. 33 (4): 663–769. doi:10.2307/1372418. hdl:hein.journals/duklr1984. JSTOR 1372418. Retrieved 2025-02-10.
Tanenbaum, Andrew S. (1990). Structured Computer Organization, Third Edition. Prentice Hall. p. 398. ISBN 978-0-13-854662-5.
