
Machine code

From Wikipedia, the free encyclopedia
Instructions directly executable by a computer
For code that is completely internal to some CPUs and normally inaccessible to programmers, see Microcode.
"Native code" redirects here. For the French colonial legal system, see Native code (France).

Machine language monitor running on a W65C816S microprocessor, displaying code disassembly and dumps of processor registers and memory

In computing, machine code is data encoded and structured to control a computer's central processing unit (CPU) via its programmable interface. A computer program consists primarily of sequences of machine-code instructions.[1] Machine code is classified as native with respect to its host CPU since it is the language that the CPU interprets directly.[2] A software interpreter is a virtual machine that processes virtual machine code.

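A machine-code instruction causes the CPU to perform a specific task, such as loading a word from memory into a CPU register, executing an arithmetic logic unit (ALU) operation on one or more registers or memory locations, or jumping or skipping to an instruction that is not the next one.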

An instruction set architecture (ISA) defines the interface to a CPU and varies by groupings or families of CPU design, such as x86 and ARM. Generally, machine code compatible with one family is not compatible with others, but there are exceptions. The VAX architecture includes optional support of the PDP-11 instruction set. The IA-64 architecture includes optional support of the IA-32 instruction set. The PowerPC 615 can natively process both PowerPC and x86 instructions.

Assembly language

Translation of assembly into machine code

Assembly language provides a relatively direct mapping from human-readable source code to machine code. Assembly source code represents the numerical codes of machine code as mnemonics and labels.[3] For example, NOP in x86 assembly represents the x86 opcode 0x90 in machine code. While it is possible to write a program directly in machine code, doing so is tedious and error-prone. Therefore, programs are usually written in assembly or, more commonly, in a high-level programming language.
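As a minimal sketch of this mnemonic-to-opcode mapping, the Python below assembles and disassembles a handful of x86 instructions that each encode as a single opcode byte with no operands, NOP (0x90) among them. The table and the names `ONE_BYTE_OPCODES`, `assemble` and `disassemble` are hypothetical; a real x86 assembler must also handle prefixes, ModRM bytes, displacements and immediates.

```python
# A few x86 instructions that encode as one opcode byte with no operands.
ONE_BYTE_OPCODES = {
    "nop": 0x90,   # no operation
    "ret": 0xC3,   # near return
    "int3": 0xCC,  # breakpoint trap
    "hlt": 0xF4,   # halt
    "clc": 0xF8,   # clear carry flag
    "stc": 0xF9,   # set carry flag
}
# Reverse table used for disassembly.
MNEMONICS = {opcode: name for name, opcode in ONE_BYTE_OPCODES.items()}


def assemble(source: str) -> bytes:
    """Translate one mnemonic per line into machine-code bytes."""
    lines = [line.strip().lower() for line in source.splitlines()]
    return bytes(ONE_BYTE_OPCODES[line] for line in lines if line)


def disassemble(code: bytes) -> list[str]:
    """Map each byte back to its mnemonic (one-to-one for this subset)."""
    return [MNEMONICS[byte] for byte in code]


machine_code = assemble("NOP\nNOP\nRET")
print(machine_code.hex(" "))      # 90 90 c3
print(disassemble(machine_code))  # ['nop', 'nop', 'ret']
```

For this subset, each mnemonic corresponds to exactly one byte, which is why the mapping can be inverted with a dictionary.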

Instruction set


A machine instruction encodes an operation as a pattern of bits based on the specified format for the machine's instruction set.[nb 1][4]

Instruction sets differ in various ways. Instructions of a set might all be the same length, or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the word size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary.[4]

An instruction set must drive the circuits at a computer's digital logic level. At that level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components.[5] Machine instructions are the means by which a program controls these architectural features.

The criteria for instruction formats include:

  • Instructions most commonly used should be shorter than instructions rarely used.[4]
  • The memory transfer rate of the underlying hardware determines the flexibility of the memory fetch instructions.
  • The number of bits in the address field requires special consideration.[9]

Determining the size of the address field is a choice between space and speed.[9] On some computers, the number of bits in the address field may be too small to access all of the physical memory. The virtual address space also needs to be considered. Another constraint may be a limitation on the size of the registers used to construct the address. Although a shorter address field allows instructions to execute more quickly, other physical properties must also be considered when designing the instruction format.
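The space side of this trade-off is simple arithmetic: an n-bit address field can name at most 2^n distinct locations. The sketch below assumes word-addressed memory with no base registers or paging, and its helper names are hypothetical; it finds the narrowest field that reaches every word of a given memory.

```python
def addressable_words(field_bits: int) -> int:
    """Number of distinct locations an n-bit address field can name."""
    return 2 ** field_bits


def address_field_bits(memory_words: int) -> int:
    """Smallest address-field width that reaches every word of memory."""
    # The highest address is memory_words - 1; its bit length is the width needed.
    return max(1, (memory_words - 1).bit_length())


print(addressable_words(15))      # 32768
print(address_field_bits(32768))  # 15
print(address_field_bits(32769))  # 16: one extra word costs one extra bit
```

Each additional bit doubles the reachable memory, but it also lengthens every instruction that carries an address.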

Instructions can be separated into two types: general-purpose and special-purpose. Special-purpose instructions exploit architectural features that are unique to a computer. General-purpose instructions control architectural features common to all computers.[10]

General-purpose instructions control:

  • Data movement from one place to another
  • Monadic operations that have oneoperand to produce a result
  • Dyadic operations that have two operands to produce a result
  • Comparisons and conditional jumps
  • Procedure calls
  • Loop control
  • Input/output

Overlapping instruction


On processor architectures with variable-length instruction sets[11] (such as Intel's x86 processor family), it is sometimes possible, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal count,[12][11][13][14][15] to use opcode-level programming to deliberately arrange code so that two code paths share a common fragment of opcode sequences.[nb 2] These are called overlapping instructions, overlapping opcodes, overlapping code, overlapped code, instruction scission, or jump into the middle of an instruction.[16][17][18]
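A commonly cited x86 illustration is the three-byte sequence EB FF C0. Read linearly, EB FF is a two-byte short jump with displacement -1. That jump lands on the byte 0xFF, where FF C0 decodes as `inc eax`, so 0xFF belongs to two instructions at once. The sketch below uses a hypothetical toy decoder that understands only these encodings (32-bit mode) and marks any other byte with `db`. It compares a linear sweep with decoding in execution order and is not a real disassembler.

```python
def decode(code: bytes, pc: int) -> tuple[str, int]:
    """Toy decoder for three 32-bit x86 encodings; returns (text, length)."""
    op = code[pc]
    if op == 0x90:
        return "nop", 1
    if op == 0xEB and pc + 1 < len(code):
        # Short jump: opcode EB followed by a signed 8-bit displacement.
        rel8 = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
        return f"jmp short {rel8:+d}", 2
    if op == 0xFF and pc + 1 < len(code) and code[pc + 1] == 0xC0:
        return "inc eax", 2  # opcode FF /0 with ModRM byte C0 selects eax
    return f"db 0x{op:02x}", 1  # byte this toy decoder cannot handle


def linear_sweep(code: bytes) -> list[tuple[int, str]]:
    """Decode instructions back to back, as a naive disassembler would."""
    pc, out = 0, []
    while pc < len(code):
        text, length = decode(code, pc)
        out.append((pc, text))
        pc += length
    return out


def follow_control_flow(code: bytes, max_steps: int = 10) -> list[tuple[int, str]]:
    """Decode instructions in the order the CPU would execute them."""
    pc, out = 0, []
    while 0 <= pc < len(code) and len(out) < max_steps:
        text, length = decode(code, pc)
        out.append((pc, text))
        if text.startswith("jmp short"):
            # The target is relative to the address of the next instruction.
            rel8 = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
            pc = pc + length + rel8
        else:
            pc += length
    return out


code = bytes([0xEB, 0xFF, 0xC0])
print(linear_sweep(code))         # [(0, 'jmp short -1'), (2, 'db 0xc0')]
print(follow_control_flow(code))  # [(0, 'jmp short -1'), (1, 'inc eax')]
```

A linear sweep never sees the `inc eax` at offset 1. This is also why overlapping code can mislead disassemblers.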

In the 1970s and 1980s, overlapping instructions were sometimes used to save memory space. One example was the implementation of error tables in Microsoft's Altair BASIC, where interleaved instructions mutually shared their instruction bytes.[19][11][16] The technique is rarely used today, but it may still be necessary where extreme byte-level size optimization is required, such as in boot loaders that have to fit into boot sectors.[nb 3]

It is also sometimes used as a code obfuscation technique to hinder disassembly and tampering.[11][14]

The principle is also used in shared code sequences of fat binaries, which must run on multiple processor platforms with mutually incompatible instruction sets.[nb 2]

This property is also used to find unintended instructions, called gadgets, in existing code. Gadgets are used in return-oriented programming as an alternative to code injection for exploits such as return-to-libc attacks.[20][11]
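A first step of gadget discovery can be sketched as a byte scan. It finds every 0xC3 byte, the x86 near `ret` opcode, including those that sit inside other instructions, and collects the short byte windows that end there. The helper name `candidate_gadgets` and the window size are hypothetical. A practical tool would go on to decode each candidate and keep only valid instruction sequences.

```python
RET = 0xC3  # x86 near return opcode


def candidate_gadgets(code: bytes, max_len: int = 3) -> list[tuple[int, bytes]]:
    """Return (start offset, bytes) for every window of 1..max_len bytes ending in a ret byte."""
    candidates = []
    for end, byte in enumerate(code):
        if byte != RET:
            continue
        # Walk backwards from the ret to build progressively longer windows.
        for length in range(1, max_len + 1):
            start = end - length + 1
            if start < 0:
                break
            candidates.append((start, code[start:end + 1]))
    return candidates


# mov eax, 0xC3 (B8 C3 00 00 00) followed by an intended ret (C3).
code = bytes.fromhex("b8c3000000c3")
for start, window in candidate_gadgets(code):
    print(start, window.hex(" "))
```

The 0xC3 byte at offset 1 is unintended. It is the low byte of the `mov` instruction's immediate operand, but execution that starts there would perform a `ret`.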

Microcode


In some computers, the machine code of the architecture is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models.[21] An example of this use is the IBM System/360 family of computers and their successors.[22]
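The idea can be illustrated with two hypothetical models that implement the same machine-level 32-bit ADD instruction. One has a 32-bit datapath and needs a single micro-operation. The other has an 8-bit datapath and its microprogram adds one byte at a time with carry. A program written for the shared instruction set gets identical results on both. The model names and the micro-operation breakdown are invented for illustration and do not describe any actual System/360 model.

```python
MASK32 = 0xFFFFFFFF


def add_wide_datapath(a: int, b: int) -> int:
    """Model with a 32-bit ALU: one micro-operation performs the whole add."""
    return (a + b) & MASK32


def add_narrow_datapath(a: int, b: int) -> int:
    """Model with an 8-bit ALU: the microprogram loops over four bytes with carry."""
    result, carry = 0, 0
    for i in range(4):
        shift = 8 * i
        byte_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF) + carry
        result |= (byte_sum & 0xFF) << shift
        carry = byte_sum >> 8  # carry into the next byte
    return result  # a carry out of the top byte is discarded


# Each model's "control store" maps the same machine opcode to its own routine.
MODELS = {
    "wide": {"ADD": add_wide_datapath},
    "narrow": {"ADD": add_narrow_datapath},
}

for model, control_store in MODELS.items():
    print(model, hex(control_store["ADD"](0x0000FFFF, 0x00000001)))
```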

Examples


IBM 709x


The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bits from the left as S, 1, ..., 35. Most instructions have one of two formats:

Generic format:
  S,1-11   Opcode
  12-13    Flag (ignored in some instructions)
  14-17    Unused
  18-20    Tag
  21-35    Y

Index register control format (other than TSX):
  S,1-2    Opcode
  3-17     Decrement
  18-20    Tag
  21-35    Y
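Because IBM numbers the bits from the left, with S as bit 0, a field that spans IBM bits a through b of the 36-bit word sits 35 - b places from the right. The sketch below decodes both formats using that rule. The function names and the sample field values are hypothetical and were chosen only to show the round trip. They are not real 709x opcodes.

```python
WORD_BITS = 36


def field(word: int, first: int, last: int) -> int:
    """Extract IBM bits first..last of a 36-bit word (S is bit 0, the leftmost)."""
    width = last - first + 1
    return (word >> (WORD_BITS - 1 - last)) & ((1 << width) - 1)


def decode_generic(word: int) -> dict:
    return {
        "opcode": field(word, 0, 11),  # S,1-11
        "flag": field(word, 12, 13),
        "tag": field(word, 18, 20),
        "y": field(word, 21, 35),
    }


def decode_index_control(word: int) -> dict:
    return {
        "opcode": field(word, 0, 2),  # S,1-2
        "decrement": field(word, 3, 17),
        "tag": field(word, 18, 20),
        "y": field(word, 21, 35),
    }


# Hypothetical generic-format word: opcode 0o500, flag 0, tag 1, Y 100.
word = (0o500 << 24) | (0 << 22) | (1 << 15) | 100
print(decode_generic(word))  # {'opcode': 320, 'flag': 0, 'tag': 1, 'y': 100}
```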

For all but the IBM 7094 and 7094 II, there are three index registers, designated A, B and C. Indexing with multiple 1 bits in the tag subtracts the logical OR of the selected index registers, and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but they power on in multiple tag mode. In this mode they use only three of the index registers, in a fashion compatible with earlier machines, and they require a Leave Multiple Tag Mode (LMTM) instruction to access the other four index registers.

The effective address is normally Y − C(T). Here, C(T) is 0 for a tag of 0. In multiple tag mode, it is the logical OR of the selected index registers. Otherwise, it is the single selected index register. However, the effective address for index register control instructions is just Y.
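The rule above can be sketched in a few lines. This sketch relies on several assumptions that the text does not state. It treats addresses as 15 bits, with the subtraction taken modulo 2^15. In multiple tag mode, it assumes the tag bits with values 1, 2 and 4 each select one of the three index registers. In single tag mode, it assumes the tag value 1 to 7 names one of the 7094's seven registers. The function name is hypothetical.

```python
ADDRESS_MASK = (1 << 15) - 1  # assumption: 15-bit addresses, arithmetic modulo 2**15


def effective_address(y: int, tag: int, index: dict[int, int],
                      multiple_tag_mode: bool = True) -> int:
    """Compute Y - C(T) for an ordinary (not index-register-control) instruction.

    `index` maps register numbers to contents. In multiple tag mode the keys are
    assumed to be the tag bit values 1, 2 and 4; otherwise they are 1..7.
    """
    if tag == 0:
        c_t = 0
    elif multiple_tag_mode:
        c_t = 0
        for bit in (1, 2, 4):
            if tag & bit:
                c_t |= index[bit]  # logical OR of the selected registers
    else:
        c_t = index[tag]  # the one register named by the tag value
    return (y - c_t) & ADDRESS_MASK


regs = {1: 8, 2: 3, 4: 0}
print(effective_address(100, 3, regs))  # 89: OR of 8 and 3 is 11
print(effective_address(5, 1, regs))    # 32765: the subtraction wraps
```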

A flag with both bits 1 selects indirect addressing; the indirect address word has both a tag and a Y field.

In addition to transfer (branch) instructions, these machines have skip instructions that conditionally skip one or two words. For example, Compare Accumulator with Storage (CAS) performs a three-way comparison. Depending on the result, execution continues at the next sequential instruction (NSI), at NSI+1 or at NSI+2.

MIPS


The MIPS architecture provides a specific example of a machine code whose instructions are always 32 bits long.[23]: 299  The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional function (funct) field to determine the exact operation. The fields used in these types are:

   6      5     5     5     5      6 bits
[  op  |  rs |  rt |  rd |shamt| funct]  R-type
[  op  |  rs |  rt | address/immediate]  I-type
[  op  |        target address        ]  J-type

rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.[23]: 299–301 

For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:[23]: 554 

[  op  |  rs |  rt |  rd |shamt| funct]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary

Loading a value into register 8 from the memory location 68 bytes past the address held in register 3 is encoded:[23]: 552 

[  op  |  rs |  rt | address/immediate]
   35     3     8           68           decimal
 100011 00011 01000 00000 00001 000100   binary

Jumping to the address 1024:[23]: 552 

[  op  |        target address        ]
    2                 1024               decimal
 000010 00000 00000 00000 10000 000000   binary
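These encodings can be checked mechanically. Each format is simply its fields shifted into position and combined with a bitwise OR. The Python sketch below assumes every field value already fits its width and performs no range checking. Its function names are hypothetical. It reproduces the three binary patterns from the MIPS examples (add, load, jump).

```python
def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack an R-type instruction: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type instruction; the 16-bit immediate is kept in two's complement."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op: int, target: int) -> int:
    """Pack a J-type instruction with a 26-bit target field."""
    return (op << 26) | (target & 0x3FFFFFF)


examples = {
    "add": encode_r(0, 1, 2, 6, 0, 32),  # registers 1 + 2 -> register 6
    "load": encode_i(35, 3, 8, 68),      # register 8 <- memory[register 3 + 68]
    "jump": encode_j(2, 1024),
}
for name, word in examples.items():
    print(f"{name:5} 0x{word:08x} {word:032b}")
```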

Bytecode


Machine code is similar to, yet fundamentally different from, bytecode. Like machine code, bytecode is typically generated from source code (e.g., by a compiler). Unlike machine code, however, bytecode is not directly executable by a CPU. An exception is a processor designed to use bytecode as its machine code, such as the Java processor. If bytecode is processed by a software interpreter, then that interpreter is a virtual machine for which the bytecode is its machine code.
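To make the distinction concrete, the sketch below is a software interpreter for a small, hypothetical stack-based bytecode. The opcode numbers and names (`PUSH`, `ADD`, `MUL`, `PRINT`, `HALT`) are invented and do not correspond to Java or any other real bytecode. The interpreter plays the role of the virtual machine: it fetches, decodes and executes each bytecode instruction in the way a CPU handles machine code.

```python
# Hypothetical bytecode: PUSH takes one operand byte; the others take none.
PUSH, ADD, MUL, PRINT, HALT = 0x01, 0x02, 0x03, 0x04, 0x00


def run(bytecode: bytes) -> list[int]:
    """Interpret the bytecode and return the values it printed."""
    stack: list[int] = []
    output: list[int] = []
    pc = 0
    while True:
        op = bytecode[pc]  # fetch
        pc += 1
        if op == PUSH:     # decode and execute
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            return output
        else:
            raise ValueError(f"unknown opcode 0x{op:02x} at offset {pc - 1}")


# Computes (2 + 3) * 4.
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT])
print(run(program))  # [20]
```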

Storage


During execution, machine code is generally stored in RAM, although some devices support running it from ROM. In either case, the code may also be cached in more specialized memory to improve performance. Depending on the architecture, there may be separate caches for instructions and data.[24]

From the point of view of a process, the machine code lives in code space, a designated part of its address space. In a multi-threading environment, different threads of one process share code space along with data space. As a result, switching between threads has considerably lower context-switching overhead than switching between processes.[25]

Readability


Machine code is generally considered to be not human readable,[26] with Douglas Hofstadter comparing it to examining the atoms of a DNA molecule.[27] However, various tools and methods support understanding machine code.

Disassembly decodes machine code into assembly language. This is possible because assembly instructions can often be mapped one-to-one to machine instructions.[28]
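Each field of a fixed-format instruction sits at a known bit position, so disassembly can reverse the encoding directly. The sketch below decodes a small, arbitrarily chosen subset of MIPS into assembly syntax: the R-type add, sub, and, or and slt; the I-type lw and sw; and the J-type j. The function name and the subset are assumptions. A real disassembler covers the full instruction set and would show jump targets as addresses rather than raw field values.

```python
R_FUNCT = {32: "add", 34: "sub", 36: "and", 37: "or", 42: "slt"}
I_MEMORY_OPS = {35: "lw", 43: "sw"}


def disassemble_mips(word: int) -> str:
    op = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if op == 0:  # R-type: the funct field selects the operation
        rd = (word >> 11) & 0x1F
        name = R_FUNCT.get(word & 0x3F)
        if name is None:
            raise ValueError("funct value not in this subset")
        return f"{name} ${rd}, ${rs}, ${rt}"
    if op in I_MEMORY_OPS:  # loads and stores: rt, offset(rs)
        imm = word & 0xFFFF
        if imm & 0x8000:  # sign-extend the 16-bit offset
            imm -= 0x10000
        return f"{I_MEMORY_OPS[op]} ${rt}, {imm}(${rs})"
    if op == 2:  # J-type: print the raw 26-bit target field
        return f"j {word & 0x3FFFFFF}"
    raise ValueError(f"opcode {op} not in this subset")


for word in (0x00223020, 0x8C680044, 0x08000400):
    print(disassemble_mips(word))
```

Applied to the three MIPS example words, the loop prints `add $6, $1, $2`, `lw $8, 68($3)` and `j 1024`.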

A decompiler converts machine code to a high-level language, but the result can be relatively obfuscated, that is, hard to understand.

A program can be associated with debug symbols, either embedded in the native executable or stored in a separate file, that allow it to be mapped to external source code. A debugger reads the symbols to help a programmer interactively debug the program.
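At its core, one of the mappings a debugger performs is a lookup from a machine-code address to the source line that produced it. The sketch below assumes a hypothetical, already-parsed line table of (start address, file, line) entries sorted by address. Each entry covers addresses up to the next entry's start, and the last entry is treated as open-ended. Real symbol formats store this information in more compact and elaborate encodings.

```python
import bisect
from typing import Optional

# Hypothetical line table: (first address of the range, source file, line).
LINE_TABLE = [
    (0x1000, "main.c", 10),
    (0x1008, "main.c", 11),
    (0x1014, "main.c", 14),
    (0x1020, "util.c", 3),
]
STARTS = [entry[0] for entry in LINE_TABLE]


def source_location(address: int) -> Optional[tuple[str, int]]:
    """Return (file, line) for the range containing address, or None."""
    i = bisect.bisect_right(STARTS, address) - 1
    if i < 0:
        return None  # the address precedes the first known range
    _, file, line = LINE_TABLE[i]
    return file, line


print(source_location(0x100C))  # ('main.c', 11)
```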

See also


Notes

  1. On early decimal machines, patterns of characters, digits and digit sign.
  2. While overlapping instructions on processor architectures with variable-length instruction sets can sometimes be arranged to merge different code paths back into one through control-flow resynchronization, overlapping code for different processor architectures can sometimes also be crafted to cause execution paths to branch in different directions depending on the underlying processor, as is sometimes done in fat binaries.
  3. For example, the DR-DOS master boot records (MBRs) and boot sectors (which also hold the partition table and BIOS Parameter Block, leaving less than 446 and 423 bytes, respectively, for the code) were traditionally able to locate the boot file in the FAT12 or FAT16 file system by themselves and load it into memory as a whole. Their counterparts in MS-DOS and PC DOS instead rely on the system files occupying the first two directory entry locations in the file system, and on the first three sectors of IBMBIO.COM being stored at the start of the data area in contiguous sectors containing a secondary loader that loads the remainder of the file into memory (requiring SYS to take care of all these conditions). When FAT32 and logical block addressing (LBA) support was added, Microsoft even switched to requiring i386 instructions and split the boot code over two sectors for code size reasons. This was not an option for DR-DOS, as it would have broken backward and cross-compatibility with other operating systems in multi-boot and chain load scenarios, as well as with older IBM PC–compatible PCs. Instead, the DR-DOS 7.07 boot sectors resorted to self-modifying code, opcode-level programming in machine language, controlled utilization of (documented) side effects, multi-level data/code overlapping and algorithmic folding techniques to still fit everything into a physical sector of only 512 bytes without giving up any of their extended functions.

References

  1. Stallings, William (2015). Computer Organization and Architecture, 10th edition. Pearson Prentice Hall. p. 776. ISBN 978-93-325-7040-5.
  2. Gregory, Kate (2003-04-28). "Managed, Unmanaged, Native: What Kind of Code Is This?". Developer.com. Archived from the original on 2009-09-23. Retrieved 2008-09-02.
  3. Dourish, Paul (2004). Where the Action is: The Foundations of Embodied Interaction. MIT Press. p. 7. ISBN 0-262-54178-5. Retrieved 2023-03-05.
  4. Tanenbaum 1990, p. 251.
  5. Tanenbaum 1990, p. 162.
  6. Tanenbaum 1990, p. 231.
  7. Tanenbaum 1990, p. 237.
  8. Tanenbaum 1990, p. 236.
  9. Tanenbaum 1990, p. 253.
  10. Tanenbaum 1990, p. 283.
  11. Jacob, Matthias; Jakubowski, Mariusz H.; Venkatesan, Ramarathnam (20–21 September 2007). Towards Integral Binary Execution: Implementing Oblivious Hashing Using Overlapped Instruction Encodings (PDF). Proceedings of the 9th Workshop on Multimedia & Security (MM&Sec '07). Dallas, Texas, US: Association for Computing Machinery. pp. 129–140. CiteSeerX 10.1.1.69.5258. doi:10.1145/1288869.1288887. ISBN 978-1-59593-857-2. S2CID 14174680. Archived (PDF) from the original on 2018-09-04. Retrieved 2021-12-25. (12 pages)
  12. Lagarias, Jeffrey "Jeff" Clark; Rains, Eric Michael; Vanderbei, Robert J. (2009) [2001-10-13]. "The Kruskal Count". In Brams, Stephen; Gehrlein, William V.; Roberts, Fred S. (eds.). The Mathematics of Preference, Choice and Order. Studies in Choice and Welfare. Berlin / Heidelberg, Germany: Springer-Verlag. pp. 371–391. arXiv:math/0110143. doi:10.1007/978-3-540-79128-7_23. ISBN 978-3-540-79127-0. (22 pages)
  13. Andriesse, Dennis; Bos, Herbert (2014-07-10). Written at Vrije Universiteit Amsterdam, Amsterdam, Netherlands. Dietrich, Sven (ed.). Instruction-Level Steganography for Covert Trigger-Based Malware (PDF). 11th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA). Lecture Notes in Computer Science. Egham, UK; Switzerland: Springer International Publishing. pp. 41–50 [45]. doi:10.1007/978-3-319-08509-8_3. eISSN 1611-3349. ISBN 978-3-31908508-1. ISSN 0302-9743. S2CID 4634611. LNCS 8550. Archived (PDF) from the original on 2023-08-26. Retrieved 2023-08-26. (10 pages)
  14. Jakubowski, Mariusz H. (February 2016). "Graph Based Model for Software Tamper Protection". Microsoft. Archived from the original on 2019-10-31. Retrieved 2023-08-19.
  15. Jämthagen, Christopher (November 2016). On Offensive and Defensive Methods in Software Security (PDF) (Thesis). Lund, Sweden: Department of Electrical and Information Technology, Lund University. p. 96. ISBN 978-91-7623-942-1. ISSN 1654-790X. Archived (PDF) from the original on 2023-08-26. Retrieved 2023-08-26. (1+xvii+1+152 pages)
  16. "Unintended Instructions on x86". Hacker News. 2021. Archived from the original on 2021-12-25. Retrieved 2021-12-24.
  17. Kinder, Johannes (2010-09-24). Static Analysis of x86 Executables [Statische Analyse von Programmen in x86 Maschinensprache] (PDF) (Dissertation). Munich, Germany: Technische Universität Darmstadt. D17. Archived from the original on 2020-11-12. Retrieved 2021-12-25. (199 pages)
  18. "What is "overlapping instructions" obfuscation?". Reverse Engineering Stack Exchange. 2013-04-07. Archived from the original on 2021-12-25. Retrieved 2021-12-25.
  19. Gates, William "Bill" Henry, Personal communication (NB. According to Jacob et al.)
  20. Shacham, Hovav (2007). The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86) (PDF). Proceedings of the ACM, CCS 2007. ACM Press. Archived (PDF) from the original on 2021-12-15. Retrieved 2021-12-24.
  21. Kent, Allen; Williams, James G. (1993-04-05). Encyclopedia of Computer Science and Technology: Volume 28 - Supplement 13: Aerospace Applications of Artificial Intelligence to Tree Structures. CRC Press. pp. 33–34. ISBN 978-0-8247-2281-4.
  22. Tucker, S. G. (1967-12-31). "Microprogram control for SYSTEM/360". IBM Systems Journal. 6 (4): 222–241. doi:10.1147/sj.64.0222. ISSN 0018-8670 – via IEEE Xplore.
  23. Harris, David; Harris, Sarah L. (2007). Digital Design and Computer Architecture. Morgan Kaufmann Publishers. ISBN 978-0-12-370497-9. Retrieved 2023-03-05.
  24. Su, Chao; Zeng, Qingkai (2021). "Survey of CPU Cache-Based Side-Channel Attacks: Systematic Analysis, Security Models, and Countermeasures". Security and Communication Networks. 2021 (1) 5559552. doi:10.1155/2021/5559552. ISSN 1939-0122.
  25. "CS 537 Notes, Section #3A: Processes and Threads". pages.cs.wisc.edu. School of Computer, Data & Information Sciences, University of Wisconsin-Madison. Retrieved 2025-07-18.
  26. Samuelson 1984, p. 683.
  27. Hofstadter 1979, p. 290.
  28. Tanenbaum 1990, p. 398.
  29. "Associated Data Architecture". High Level Assembler and Toolkit Feature.
  30. "Associated data file output" (PDF). High Level Assembler for z/OS & z/VM & z/VSE - 1.6 - HLASM Programmer's Guide (PDF) (Eighth ed.). IBM. October 2022. pp. 278–332. SC26-4941-07. Retrieved 2025-02-14.
  31. "COBOL SYSADATA file contents". Enterprise COBOL for z/OS.
  32. "SYSADATA message information". Enterprise PL/I for z/OS 6.1 information. 2025-03-17.
  33. "Appendix C. Generalized object file format (GOFF)" (PDF). z/OS - 3.1 - MVS Program Management: Advanced Facilities (PDF). IBM. 2024-12-18. pp. 201–240. SA23-1392-60. Retrieved 2025-02-14.
  34. "Symbols for Windows debugging". Microsoft Learn. 2022-12-20.
  35. "Querying the .Pdb File". Microsoft Learn. 2024-01-12.

Sources


Further reading

Retrieved from "https://en.wikipedia.org/w/index.php?title=Machine_code&oldid=1324754801"