Execute an arithmetic logic unit (ALU) operation on one or more registers or memory locations
Jump or skip to an instruction that is not the next one
An instruction set architecture (ISA) defines the interface to a CPU and varies by groupings or families of CPU design such as x86 and ARM. Generally, machine code compatible with one family is not compatible with others, but there are exceptions. The VAX architecture includes optional support of the PDP-11 instruction set. The IA-64 architecture includes optional support of the IA-32 instruction set. And the PowerPC 615 can natively process both PowerPC and x86 instructions.
Assembly language provides a relatively direct mapping from human-readable source code to machine code. Assembly language source code represents the numerical codes of machine code as mnemonics and labels.[3] For example, NOP in assembly for an x86 processor represents the x86 architecture opcode 0x90 in machine code. While it is possible to write a program in machine code, doing so is tedious and error-prone. Therefore, programs are usually written in assembly or, more commonly, in a high-level programming language.
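The mnemonic-to-opcode mapping described above can be illustrated with a minimal sketch. The table holds a few single-byte x86 opcodes (NOP is the one named in the text); the `assemble` function and its one-mnemonic-per-line input format are hypothetical simplifications, not a real assembler.

```python
# Toy mnemonic-to-opcode table for a few single-byte x86 instructions.
OPCODES = {
    "NOP": 0x90,  # no operation (the example used in the text)
    "RET": 0xC3,  # near return
    "HLT": 0xF4,  # halt
}

def assemble(source: str) -> bytes:
    """Translate one mnemonic per line into machine code bytes.

    Hypothetical format: text after ';' is a comment, blank lines are skipped.
    """
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split(";")[0].strip().upper()
        if not mnemonic:
            continue
        out.append(OPCODES[mnemonic])  # KeyError for unknown mnemonics
    return bytes(out)

machine_code = assemble("nop\nnop\nret")
print(machine_code.hex(" "))  # 90 90 c3
```

A real assembler also resolves labels, encodes operands, and handles multi-byte instructions, none of which this sketch attempts.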
A machine instruction encodes an operation as a pattern of bits based on the specified format for the machine's instruction set.[nb 1][4]
Instruction sets differ in various ways. Instructions of a set might all be the same length or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the word size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary.[4]
An instruction set must drive the circuits at a computer's digital logic level. At the digital level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components.[5] Machine instructions are created to control a computer's architectural features. Examples of features that are controlled using machine instructions:
Instructions most commonly used should be shorter than instructions rarely used.[4]
The memory transfer rate of the underlying hardware determines the flexibility of the memory fetch instructions.
The number of bits in the address field requires special consideration.[9]
Determining the size of the address field is a choice between space and speed.[9] On some computers, the number of bits in the address field may be too small to access all of the physical memory. Also, virtual address space needs to be considered. Another constraint may be a limitation on the size of registers used to construct the address. Whereas a shorter address field allows the instructions to execute more quickly, other physical properties need to be considered when designing the instruction format.
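The relationship between address-field width and reachable memory is simple arithmetic: an n-bit field can name 2^n distinct locations. The helper names below are hypothetical and serve only to illustrate the trade-off.

```python
def addressable_units(address_bits: int) -> int:
    """Number of distinct locations an address field of this width can name."""
    return 2 ** address_bits

def min_address_bits(memory_units: int) -> int:
    """Smallest field width that can address every one of memory_units locations."""
    return max(1, (memory_units - 1).bit_length())

print(addressable_units(15))    # 32768
print(min_address_bits(65536))  # 16
print(min_address_bits(65537))  # 17
```

For instance, a 15-bit field cannot directly reach all of a 65,536-word memory, which is the situation the paragraph above describes.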
Instructions can be separated into two types: general-purpose and special-purpose. Special-purpose instructions exploit architectural features that are unique to a computer. General-purpose instructions control architectural features common to all computers.[10]
General-purpose instructions control:
Data movement from one place to another
Monadic operations that have one operand to produce a result
Dyadic operations that have two operands to produce a result
On processor architectures with variable-length instruction sets[11] (such as Intel's x86 processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal count,[12][11][13][14][15] sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences.[nb 2] These are called overlapping instructions, overlapping opcodes, overlapping code, overlapped code, instruction scission, or jump into the middle of an instruction.[16][17][18]
In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example was in the implementation of error tables in Microsoft's Altair BASIC, where interleaved instructions mutually shared their instruction bytes.[19][11][16] The technique is rarely used today, but may still be needed where extreme byte-level size optimization is required, such as in the implementation of boot loaders, which have to fit into boot sectors.[nb 3]
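A minimal sketch of the idea, assuming 16-bit x86 mode and a toy decoder that knows only four opcodes: the same six bytes decode to two different instruction streams depending on the entry offset, and both streams end at the same RET. The byte sequence, the `TABLE` contents, and the `decode` helper are constructed for illustration and are not taken from Altair BASIC or any other cited program.

```python
# Toy table: opcode byte -> (format template, immediate length in bytes).
# Assumes 16-bit mode, where 0x3D takes a 16-bit immediate.
TABLE = {
    0xB0: ("MOV AL, 0x{:02X}", 1),  # MOV AL, imm8
    0x3D: ("CMP AX, 0x{:04X}", 2),  # CMP AX, imm16
    0xC3: ("RET", 0),               # near return
    0x90: ("NOP", 0),
}

def decode(code: bytes, start: int) -> list[str]:
    """Linearly decode from `start` until RET or the end of the buffer."""
    out = []
    pc = start
    while pc < len(code):
        template, imm_len = TABLE[code[pc]]
        imm = int.from_bytes(code[pc + 1 : pc + 1 + imm_len], "little")
        out.append(template.format(imm))
        pc += 1 + imm_len
        if template == "RET":
            break
    return out

CODE = bytes([0xB0, 0x01, 0x3D, 0xB0, 0x00, 0xC3])

print(decode(CODE, 0))  # ['MOV AL, 0x01', 'CMP AX, 0x00B0', 'RET']
print(decode(CODE, 3))  # ['MOV AL, 0x00', 'RET']
```

Entering at offset 0, the CMP instruction swallows the bytes `B0 00` as its immediate operand and only affects flags; entering at offset 3, those same bytes execute as `MOV AL, 0`. Both paths then share the final RET.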
In some computers, the machine code of the architecture is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models.[21] An example of this use is the IBM System/360 family of computers and their successors.[22]
The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bits from the left as S, 1, ..., 35. Most instructions have one of two formats:
Generic
    S, 1-11    Opcode
    12-13      Flag, ignored in some instructions
    14-17      Unused
    18-20      Tag
    21-35      Y
Index register control, other than TSX
    S, 1-2     Opcode
    3-17       Decrement
    18-20      Tag
    21-35      Y
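The two layouts above can be expressed as bit-field extraction from a 36-bit word. This is a sketch under stated assumptions: the word is held as an unsigned Python integer with bit S as the most significant bit (treated as bit 0 of the opcode field), and the `field`, `decode_generic`, and `decode_index_control` helpers are hypothetical names. The sample words are arbitrary bit patterns chosen so each field is easy to read in octal.

```python
WORD_BITS = 36  # bits S, 1, ..., 35, numbered from the left

def field(word: int, first: int, last: int) -> int:
    """Extract bits first..last (IBM numbering, S counted as bit 0)."""
    width = last - first + 1
    shift = (WORD_BITS - 1) - last
    return (word >> shift) & ((1 << width) - 1)

def decode_generic(word: int) -> dict:
    return {
        "opcode": field(word, 0, 11),  # S, 1-11
        "flag": field(word, 12, 13),
        "tag": field(word, 18, 20),
        "y": field(word, 21, 35),
    }

def decode_index_control(word: int) -> dict:
    return {
        "opcode": field(word, 0, 2),   # S, 1-2
        "decrement": field(word, 3, 17),
        "tag": field(word, 18, 20),
        "y": field(word, 21, 35),
    }

# Octal digits: opcode 0500 | flag+unused 00 | tag 1 | Y 00100
generic = decode_generic(0o050000100100)
# Octal digits: opcode 2 | decrement 00005 | tag 4 | Y 01000
index_control = decode_index_control(0o200005401000)

print(generic)
print(index_control)
```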
For all but the IBM 7094 and 7094 II, there are three index registers designated A, B and C; indexing with multiple 1 bits in the tag subtracts the logical OR of the selected index registers, and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but when they are powered on they are in multiple tag mode, in which they use only three of the index registers in a fashion compatible with earlier machines, and require a Leave Multiple Tag Mode (LMTM) instruction in order to access the other four index registers.
The effective address is normally Y-C(T), where C(T) is 0 for a tag of 0, the logical OR of the selected index registers in multiple tag mode, or the selected index register if not in multiple tag mode. However, the effective address for index register control instructions is just Y.
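The Y-C(T) rule for multiple tag mode can be sketched as follows. Two details are assumptions not stated in the text: that tag bit values 1, 2 and 4 select index registers A, B and C respectively, and that the subtraction wraps within the 15-bit width of the Y field. The function names are hypothetical.

```python
ADDRESS_MASK = 0o77777  # 15 bits, the width of the Y field

def c_of_t(tag: int, index_registers: dict) -> int:
    """Logical OR of the index registers selected by the tag (multiple tag mode).

    Assumed mapping: tag bit 1 -> A, 2 -> B, 4 -> C.
    """
    value = 0
    for bit, name in ((1, "A"), (2, "B"), (4, "C")):
        if tag & bit:
            value |= index_registers[name]
    return value

def effective_address(y: int, tag: int, index_registers: dict,
                      index_control: bool = False) -> int:
    """Y for index register control instructions, otherwise Y - C(T)."""
    if index_control:
        return y
    # Assumed: the result wraps modulo 2**15.
    return (y - c_of_t(tag, index_registers)) & ADDRESS_MASK

regs = {"A": 0o10, "B": 0o3, "C": 0}
print(effective_address(0o100, 0, regs))  # 64 (tag 0, so C(T) is 0)
print(effective_address(0o100, 1, regs))  # 56 (64 - 8)
print(effective_address(0o100, 3, regs))  # 53 (64 - (8 | 3))
```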
A flag with both bits 1 selects indirect addressing; the indirect address word has both a tag and a Y field.
In addition to transfer (branch) instructions, these machines have skip instructions that conditionally skip one or two words; e.g., Compare Accumulator with Storage (CAS) does a three-way compare and conditionally skips to NSI, NSI+1 or NSI+2, depending on the result.
The MIPS architecture provides a specific example of a machine code whose instructions are always 32 bits long.[23]: 299 The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional function (funct) field to determine the exact operation. The fields used in these types are:
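As a sketch of how these fields are laid out, the decoder below uses the classic MIPS32 field positions: op in bits 31-26; rs, rt, rd and shamt as successive 5-bit fields; funct in bits 5-0 for R-type; a 16-bit immediate for I-type; and a 26-bit address for J-type. The `decode_mips` function is a hypothetical helper, and classifying every opcode other than 0, 2 and 3 as I-type is a simplification that ignores coprocessor and other special encodings.

```python
def decode_mips(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields."""
    op = (word >> 26) & 0x3F            # highest 6 bits
    if op == 0:                         # R-type: operation chosen by funct
        return {
            "type": "R", "op": op,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if op in (2, 3):                    # J-type: j and jal
        return {"type": "J", "op": op, "address": word & 0x03FFFFFF}
    return {                            # simplification: treat the rest as I-type
        "type": "I", "op": op,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }

print(decode_mips(0x012A4020))  # add  $t0, $t1, $t2 (rs=9, rt=10, rd=8, funct=0x20)
print(decode_mips(0x21280005))  # addi $t0, $t1, 5   (op=8, rs=9, rt=8, imm=5)
print(decode_mips(0x08000100))  # j with 26-bit address field 0x100
```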
Machine code is similar to yet fundamentally different from bytecode. Like machine code, bytecode is typically generated (i.e. by a compiler) from source code. But, unlike machine code, bytecode is not directly executable by a CPU. An exception is if a processor is designed to use bytecode as its machine code, such as the Java processor. If bytecode is processed by a software interpreter, then that interpreter is a virtual machine for which the bytecode is its machine code.
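The sense in which an interpreter acts as a virtual machine for bytecode can be shown with a minimal stack-based sketch. The four-opcode instruction set and the `run` function are invented for illustration and do not correspond to Java bytecode or any real format.

```python
# Hypothetical bytecode: one opcode byte, PUSH is followed by one operand byte.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3

def run(bytecode: bytes) -> int:
    """Fetch, decode and execute until HALT; return the top of the stack."""
    stack = []
    pc = 0  # the virtual machine's program counter
    while True:
        op = bytecode[pc]
        if op == PUSH:
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc}")

program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])  # (2 + 3) * 4
print(run(program))  # 20
```

The loop plays the role that the fetch-decode-execute hardware plays for real machine code, which is why the bytecode can be regarded as the machine code of this virtual machine.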
During execution, machine code is generally stored in RAM although running from ROM is supported by some devices. Regardless, the code may also be cached in more specialized memory to enhance performance. There may be different caches for instructions and data, depending on the architecture.[24]
From the point of view of a process, the machine code lives in code space, a designated part of its address space. In a multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.[25]
Machine code is generally considered to be not human readable,[26] with Douglas Hofstadter comparing it to examining the atoms of a DNA molecule.[27] However, various tools and methods support understanding machine code.
Disassembly decodes machine code to assembly language, which is possible since assembly instructions can often be mapped one-to-one to machine instructions.[28]
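For the simplest case, disassembly is a reverse table lookup. The sketch below handles only a handful of single-byte x86 opcodes and emits any other byte as raw data (`DB`); the `disassemble` helper is hypothetical, and a real disassembler must also decode prefixes, operands and variable-length instructions.

```python
# Single-byte x86 opcodes and their usual mnemonics.
MNEMONICS = {
    0x90: "NOP",
    0xC3: "RET",
    0xCC: "INT3",
    0xF4: "HLT",
    0xF8: "CLC",
    0xF9: "STC",
}

def disassemble(code: bytes) -> list[str]:
    """Map each byte to a mnemonic, or to a DB data directive if unknown."""
    return [MNEMONICS.get(byte, f"DB 0x{byte:02X}") for byte in code]

print(disassemble(bytes.fromhex("90 f8 c3")))  # ['NOP', 'CLC', 'RET']
```

The mapping is "often" rather than always one-to-one: on x86, for example, the byte 0x90 is also the encoding of XCHG AX, AX, so a disassembler has to pick one spelling.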
A program can be associated with debug symbols (either embedded in the native executable or in a separate file) that allow it to be mapped to external source code. A debugger reads the symbols to help a programmer interactively debug the program. Examples include:
Modern IBM mainframe operating systems, such as z/OS, have available a symbol table named Associated data (ADATA). The table is stored in a file that can be produced by the IBM High-Level Assembler (HLASM),[29][30] IBM's COBOL compiler,[31] and IBM's PL/I compiler,[32] either as a separate SYSADATA file or as ADATA records in a Generalized object output file (GOFF).[33] This obsoletes the TEST records from OS/360, although it is still possible to request them and to use them in the TSO TEST command.
Most Unix-like operating systems have available symbol table formats named stabs and DWARF. In macOS and other Darwin-based operating systems, the debug symbols are stored in DWARF format in a separate .dSYM file.
^ On early decimal machines, patterns of characters, digits and digit sign
^ While overlapping instructions on processor architectures with variable-length instruction sets can sometimes be arranged to merge different code paths back into one through control-flow resynchronization, overlapping code for different processor architectures can sometimes also be crafted to cause execution paths to branch into different directions depending on the underlying processor, as is sometimes used in fat binaries.
^ For example, the DR-DOS master boot records (MBRs) and boot sectors (which also hold the partition table and BIOS Parameter Block, leaving less than 446 and 423 bytes, respectively, for the code) were traditionally able to locate the boot file in the FAT12 or FAT16 file system by themselves and load it into memory as a whole, in contrast to their counterparts in MS-DOS and PC DOS, which instead rely on the system files to occupy the first two directory entry locations in the file system and the first three sectors of IBMBIO.COM to be stored at the start of the data area in contiguous sectors containing a secondary loader to load the remainder of the file into memory (requiring SYS to take care of all these conditions). When FAT32 and logical block addressing (LBA) support was added, Microsoft even switched to requiring i386 instructions and split the boot code over two sectors for code size reasons, which was not an option for DR-DOS, as it would have broken backward- and cross-compatibility with other operating systems in multi-boot and chain load scenarios, as well as with older IBM PC–compatible PCs. Instead, the DR-DOS 7.07 boot sectors resorted to self-modifying code, opcode-level programming in machine language, controlled utilization of (documented) side effects, multi-level data/code overlapping and algorithmic folding techniques to still fit everything into a physical sector of only 512 bytes without giving up any of their extended functions.