History of general-purpose CPUs

From Wikipedia, the free encyclopedia

The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.

1950s: Early designs

A vacuum tube module from early 700 series IBM computers

In the early 1950s, each computer design was unique. There were no upward-compatible machines or computer architectures with multiple, differing implementations. Programs written for one machine would run on no other kind, even other kinds from the same company. This was not a major drawback then because no large body of software had been developed to run on computers, so starting programming from scratch was not seen as a large barrier.

The design freedom of the time was very important because designers were very constrained by the cost of electronics, and only starting to explore how a computer could best be organized. Some of the basic features introduced during this period included index registers (on the Ferranti Mark 1), a return address saving instruction (UNIVAC I), immediate operands (IBM 704), and detecting invalid operations (IBM 650).

By the end of the 1950s, commercial builders had developed factory-constructed, truck-deliverable computers. The most widely installed computer was the IBM 650, which used drum memory onto which programs were loaded using either paper punched tape or punched cards. Some very high-end machines also included core memory which provided higher speeds. Hard disks were also starting to grow popular.

A computer is an automatic abacus. The type of number system affects the way it works. In the early 1950s, most computers were built for specific numerical processing tasks, and many machines used decimal numbers as their basic number system; that is, the mathematical functions of the machines worked in base-10 instead of base-2 as is common today. These were not merely binary-coded decimal (BCD). Most machines had ten vacuum tubes per digit in each processor register. Some early Soviet computer designers implemented systems based on ternary logic; that is, a digit (a "trit" rather than a bit) could have three states: +1, 0, or -1, corresponding to positive, zero, or negative voltage.
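
The three-state digits described above correspond to what is usually called balanced ternary. The following is a minimal sketch of converting an integer to and from that notation; the function names are hypothetical and chosen only for illustration, and the code is not a model of any particular Soviet machine.

```python
def to_balanced_ternary(n: int) -> list[int]:
    """Return the balanced-ternary digits of n, most significant first.

    Each digit is -1, 0, or +1 (illustrative helper, not a historical design).
    """
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3          # Python's % always yields 0, 1, or 2 here
        if r == 2:
            r = -1         # a remainder of 2 becomes digit -1 with a carry
        digits.append(r)
        n = (n - r) // 3
    return digits[::-1]


def from_balanced_ternary(digits: list[int]) -> int:
    """Inverse of to_balanced_ternary: evaluate the digits in base 3."""
    value = 0
    for d in digits:
        value = value * 3 + d
    return value


print(to_balanced_ternary(5))   # [1, -1, -1], i.e. 9 - 3 - 1
print(to_balanced_ternary(-5))  # [-1, 1, 1], i.e. -9 + 3 + 1
```

Negating a number in this notation only requires flipping the sign of every digit, so no separate sign bit is needed.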

An early project for the U.S. Air Force, BINAC attempted to make a lightweight, simple computer by using binary arithmetic. It deeply impressed the industry.

As late as 1970, major computer languages were unable to standardize their numeric behavior because decimal computers had groups of users too large to alienate.

Even when designers used a binary system, they still had many odd ideas. Some used sign-magnitude arithmetic (-1 = 10001), or ones' complement (-1 = 11110), rather than modern two's complement arithmetic (-1 = 11111). Most computers used six-bit character sets because they adequately encoded Hollerith punched cards. It was a major revelation to designers of this period to realize that the data word should be a multiple of the character size. They began to design computers with 12-, 24- and 36-bit data words (e.g., see the TX-2).
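
The three encodings of -1 quoted above can be reproduced with a short sketch. The `encode` helper and its scheme names are hypothetical, introduced only to illustrate the arithmetic for a 5-bit word.

```python
def encode(value: int, bits: int, scheme: str) -> str:
    """Return the bit pattern of `value` in a signed representation.

    `scheme` is one of "sign-magnitude", "ones-complement", or
    "twos-complement" (names chosen for this illustration).
    """
    magnitude_bits = bits - 1
    limit = (1 << magnitude_bits) - 1   # largest magnitude all three schemes share
    if abs(value) > limit:
        raise ValueError("value does not fit in the requested width")
    mask = (1 << bits) - 1

    if scheme == "sign-magnitude":
        # Top bit is the sign; the remaining bits hold the magnitude.
        pattern = abs(value) | ((1 << magnitude_bits) if value < 0 else 0)
    elif scheme == "ones-complement":
        # A negative number is the bitwise inverse of its magnitude.
        pattern = value if value >= 0 else (~(-value)) & mask
    elif scheme == "twos-complement":
        # A negative number is 2**bits minus its magnitude.
        pattern = value & mask
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return format(pattern, f"0{bits}b")


for scheme in ("sign-magnitude", "ones-complement", "twos-complement"):
    print(scheme, encode(-1, 5, scheme))
```

Sign-magnitude and ones' complement each have two patterns for zero (+0 and -0), which is one reason two's complement eventually prevailed.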

In this era, Grosch's law dominated computer design: computer performance increased as the square of its cost, so a machine that cost twice as much was expected to be about four times as fast.

1960s: Computer revolution and CISC

One major problem with early computers was that a program for one would work on no others. Computer companies found that their customers had little reason to remain loyal to a given brand, as the next computer they bought would be incompatible anyway. At that point, the only concerns were usually price and performance.

In 1962, IBM tried a new approach to designing computers. The plan was to make a family of computers that could all run the same software, but with different performances, and at different prices. As users' needs grew, they could move up to larger computers, and still keep all of their investment in programs, data and storage media.

To do this, they designed one reference computer named System/360 (S/360). This was a virtual computer, a reference instruction set, and abilities that all machines in the family would support. To provide different classes of machines, each computer in the family would use more or less hardware emulation, and more or less microprogram emulation, to create a machine able to run the full S/360 instruction set.

For instance, a low-end machine could include a very simple processor for low cost. However, this would require the use of a larger microcode emulator to provide the rest of the instruction set, which would slow it down. A high-end machine would use a much more complex processor that could directly process more of the S/360 design, thus running a much simpler and faster emulator.

IBM chose consciously to make the reference instruction set quite complex, and very capable. Even though the computer was complex, its control store holding the microprogram would stay relatively small and could be made with very fast memory. Another important effect was that one instruction could describe quite a complex sequence of operations. Thus the computers would generally have to fetch fewer instructions from the main memory, which could be made slower, smaller and less costly for a given mix of speed and price.

As the S/360 was to be a successor to both scientific machines like the 7090 and data processing machines like the 1401, it needed a design that could reasonably support all forms of processing. Hence the instruction set was designed to manipulate simple binary numbers, and text, scientific floating-point (similar to the numbers used in a calculator), and the binary-coded decimal arithmetic needed by accounting systems.

Almost all following computers included these innovations in some form. This basic set of features is now called complex instruction set computing (CISC, pronounced "sisk"), a term not invented until many years later, when reduced instruction set computing (RISC) began to get market share.

In many CISCs, an instruction could access either registers or memory, usually in several different ways. This made the CISCs easier to program, because a programmer could remember only thirty to a hundred instructions, and a set of three to ten addressing modes rather than thousands of distinct instructions. This was called an orthogonal instruction set. The PDP-11 and Motorola 68000 architecture are examples of nearly orthogonal instruction sets.

There was also the BUNCH (Burroughs, UNIVAC, NCR, Control Data Corporation, and Honeywell) that competed against IBM at this time; however, IBM dominated the era with S/360.

The Burroughs Corporation (which later merged with Sperry/Univac to form Unisys) offered an alternative to S/360 with their Burroughs large systems B5000 series. In 1961, the B5000 had virtual memory, symmetric multiprocessing, a multiprogramming operating system (Master Control Program (MCP)), written in ALGOL 60, and the industry's first recursive-descent compilers as early as 1964.

1970s: Microprocessor revolution

The first commercial microprocessor, the binary-coded decimal (BCD) based Intel 4004, was released by Intel in 1971.^[1]^[2] In March 1972, Intel introduced a microprocessor with an 8-bit architecture, the 8008, an integrated pMOS logic re-implementation of the transistor–transistor logic (TTL) based Datapoint 2200 CPU.

4004 designers Federico Faggin and Masatoshi Shima went on to design the 8008's successor, the Intel 8080, a slightly more minicomputer-like microprocessor, largely based on customer feedback on the limited 8008. Much like the 8008, it was used for applications such as terminals, printers, cash registers and industrial robots. However, the more able 8080 also became the original target CPU for an early de facto standard personal computer operating system called CP/M and was used for such demanding control tasks as cruise missiles, and many other uses. Released in 1974, the 8080 became one of the first really widespread microprocessors.

By the mid-1970s, the use of integrated circuits in computers was common. The decade was marked by market upheavals caused by the shrinking price of transistors.

It became possible to put an entire CPU on one printed circuit board. The result was that minicomputers, usually with 16-bit words, and 4K to 64K of memory, became common.

CISCs were believed to be the most powerful types of computers, because their microcode was small and could be stored in very high-speed memory. The CISC architecture also addressed the semantic gap as it was then perceived: the distance between the machine language and the higher-level programming languages used to program a machine. It was felt that compilers could do a better job with a richer instruction set.

Custom CISCs were commonly constructed using bit slice computer logic such as the AMD 2900 chips, with custom microcode. A bit slice component is a piece of an arithmetic logic unit (ALU), register file or microsequencer. Most bit-slice integrated circuits were 4 bits wide.

By the early 1970s, the 16-bit PDP-11 minicomputer was developed, arguably the most advanced small computer of its day. In the late 1970s, wider-word superminicomputers were introduced, such as the 32-bit VAX.

IBM continued to make large, fast computers. However, the definition of large and fast now meant more than a megabyte of RAM, clock speeds near one megahertz,^[3]^[4] and tens of megabytes of disk drives.

IBM's System/370 was a version of the 360 tweaked to run virtual computing environments. The virtual computer was developed to reduce the chances of an unrecoverable software failure.

The Burroughs large systems (B5000, B6000, B7000) series reached its largest market share. It was a stack computer whose OS was programmed in a dialect of Algol.

All these different developments competed for market share.

The first single-chip 16-bit microprocessor was introduced in 1975. Panafacom, a conglomerate formed by Japanese companies Fujitsu, Fuji Electric, and Matsushita, introduced the MN1610, a commercial 16-bit microprocessor.^[5]^[6]^[7] According to Fujitsu, it was "the world's first 16-bit microcomputer on a single chip".^[6]

The Intel 8080 was the basis for the 16-bit Intel 8086, which is a direct ancestor to today's ubiquitous x86 family (including Pentium and Intel Core). Every instruction of the 8080 has a direct equivalent in the large x86 instruction set, although the opcode values are different in the latter.

Early 1980s–1990s: Lessons of RISC

In the early 1980s, researchers at UC Berkeley and IBM both discovered that most computer language compilers and interpreters used only a small subset of the instructions of complex instruction set computing (CISC). Much of the power of the CPU was being ignored in real-world use. They realized that by making the computer simpler and less orthogonal, they could make it faster and less costly at the same time.

At the same time, CPU calculation became faster in relation to the time for needed memory accesses. Designers also experimented with using large sets of internal registers. The goal was to cache intermediate results in the registers under the control of the compiler. This also reduced the number of addressing modes and orthogonality.

The computer designs based on this theory were called reduced instruction set computing (RISC). RISCs usually had larger numbers of registers, accessed by simpler instructions, with a few instructions specifically to load and store data to memory. The result was a very simple core CPU running at very high speed, supporting the sorts of operations the compilers were using anyway.

A common variant on the RISC design employs the Harvard architecture, versus Von Neumann architecture or stored program architecture common to most other designs. In a Harvard Architecture machine, the program and data occupy separate memory devices and can be accessed simultaneously. In Von Neumann machines, the data and programs are mixed in one memory device, requiring sequential accessing which produces the so-called Von Neumann bottleneck.

One downside to the RISC design was that the programs that run on them tend to be larger. This is because compilers must generate longer sequences of the simpler instructions to achieve the same results. Since these instructions must be loaded from memory anyway, the larger code offsets some of the RISC design's fast memory handling.

In the early 1990s, engineers at Japan's Hitachi found ways to compress the reduced instruction sets so they fit in even smaller memory systems than CISCs. Such compression schemes were used for the instruction set of their SuperH series of microprocessors, introduced in 1992.^[8] The SuperH instruction set was later adapted for ARM architecture's Thumb instruction set.^[9] In applications that do not need to run older binary software, compressed RISCs are growing to dominate sales.

Another approach to RISCs was the minimal instruction set computer (MISC), niladic, or zero-operand instruction set. This approach realized that most space in an instruction was used to identify the operands of the instruction. These machines placed the operands on a push-down (last-in, first-out) stack. The instruction set was supplemented with a few instructions to fetch and store memory. Most used simple caching to provide extremely fast RISC machines, with very compact code. Another benefit was that the interrupt latencies were very small, smaller than most CISC machines (a rare trait in RISC machines). The Burroughs large systems architecture used this approach. The B5000 was designed in 1961, long before the term RISC was invented. The architecture puts six 8-bit instructions in a 48-bit word, and was a precursor to very long instruction word (VLIW) design (see below: 1990 to today).
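
A zero-operand machine of the kind described above can be sketched as a tiny interpreter. The opcode names (`push`, `load`, `store`, `add`, `mul`) and the `run` function are hypothetical, chosen for illustration; they do not correspond to the Burroughs instruction set or to any real MISC chip.

```python
def run(program, memory=None):
    """Execute a list of instructions on a toy stack machine.

    Arithmetic instructions name no operands: they pop their inputs from
    the stack and push the result. Only push/load/store carry an argument.
    """
    memory = dict(memory or {})
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":            # push an immediate constant
            stack.append(instr[1])
        elif op == "load":          # fetch a value from memory onto the stack
            stack.append(memory[instr[1]])
        elif op == "store":         # pop the top of stack into memory
            memory[instr[1]] = stack.pop()
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack, memory


# (2 + 3) * 4, then store the result at address "x"
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",), ("store", "x")]
stack, memory = run(program)
print(memory["x"])  # 20
```

Because `add` and `mul` need no operand fields, each can be encoded in very few bits, which is the source of the compact code the paragraph mentions.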

The Burroughs architecture was one of the inspirations for Charles H. Moore's programming language Forth, which in turn inspired his later MISC chip designs. For example, his f20 cores had 31 5-bit instructions, which fit four to a 20-bit word.

RISC chips now dominate the market for 32-bit embedded systems. Smaller RISC chips are even growing common in the cost-sensitive 8-bit embedded-system market. The main market for RISC CPUs has been systems that need low power or small size.

Even some CISC processors (based on architectures that were created before RISC grew dominant), such as newer x86 processors, translate instructions internally into a RISC-like instruction set.

This dominance may surprise many, because the market is perceived as desktop computers. x86 designs dominate desktop and notebook computer sales, but such computers are only a tiny fraction of the computers now sold. Most people in industrialised countries own more computers in embedded systems in their car and house than on their desks.

Mid-to-late 1980s: Exploiting instruction-level parallelism

In the mid-to-late 1980s, designers began using a technique termed instruction pipelining, in which the processor works on multiple instructions in different stages of completion. For example, the processor can retrieve the operands for the next instruction while calculating the result of the current one. Modern CPUs may use over a dozen such stages. (Pipelining was originally developed in the late 1950s by International Business Machines (IBM) on their 7030 (Stretch) mainframe computer.) Minimal instruction set computers (MISC) can execute instructions in one cycle with no need for pipelining.
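
The benefit of overlapping stages can be estimated with a back-of-the-envelope sketch. It assumes an idealized pipeline in which every stage takes one cycle and there are no stalls, hazards, or branch penalties; the function names are hypothetical.

```python
def unpipelined_cycles(instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions: int, stages: int) -> int:
    """Ideal pipeline: the first instruction needs `stages` cycles to fill the
    pipe, after which one instruction completes per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


n, k = 100, 5
print(unpipelined_cycles(n, k))  # 500
print(pipelined_cycles(n, k))    # 104
```

For long instruction streams the speedup approaches the number of stages, which is why real designs grew deeper pipelines until stalls and mispredicted branches began to dominate.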

A similar idea, introduced only a few years later, was to execute multiple instructions in parallel on separate arithmetic logic units (ALUs). Instead of operating on only one instruction at a time, the CPU will look for several similar instructions that do not depend on each other, and execute them in parallel. This approach is called superscalar processor design.

Such methods are limited by the degree of instruction-level parallelism (ILP), the number of non-dependent instructions in the program code. Some programs can run very well on superscalar processors due to their inherent high ILP, notably graphics. However, more general problems have far less ILP, thus lowering the possible speedups from these methods.
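
How dependencies cap the gain from extra ALUs can be shown with a small scheduling sketch. It makes several simplifying assumptions that are not in the text: every instruction takes one cycle, `width` instructions can issue per cycle, and each instruction is greedily placed in the earliest cycle after all of its inputs are ready. The names `schedule` and `total_cycles` are hypothetical.

```python
from collections import Counter


def schedule(deps: list[list[int]], width: int) -> list[int]:
    """Return the issue cycle of each instruction.

    deps[i] lists the indices of earlier instructions whose results
    instruction i needs. Unit latency is assumed throughout.
    """
    issue_cycle: list[int] = []
    used = Counter()                      # instructions already placed per cycle
    for needed in deps:
        # Earliest cycle in which every input has been produced.
        cycle = max((issue_cycle[i] + 1 for i in needed), default=0)
        while used[cycle] >= width:       # all ALUs busy: slip to a later cycle
            cycle += 1
        used[cycle] += 1
        issue_cycle.append(cycle)
    return issue_cycle


def total_cycles(deps: list[list[int]], width: int) -> int:
    cycles = schedule(deps, width)
    return max(cycles) + 1 if cycles else 0


independent = [[], [], [], []]            # four unrelated instructions
chain = [[], [0], [1], [2]]               # each needs the previous result
print(total_cycles(independent, width=2))  # 2
print(total_cycles(chain, width=2))        # 4
```

The dependent chain takes four cycles regardless of how many ALUs are available, which is the low-ILP case the paragraph describes.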

Branching is one major culprit. For example, a program may add two numbers and branch to a different code segment if the sum is bigger than a third number. In this case, even if the branch operation is sent to the second ALU for processing, it still must wait for the results from the addition. It thus runs no faster than if there was only one ALU. The most common solution for this type of problem is to use a type of branch prediction.

To further the efficiency of multiple functional units which are available in superscalar designs, operand register dependencies were found to be another limiting factor. To minimize these dependencies, out-of-order execution of instructions was introduced. In such a scheme, the instruction results which complete out-of-order must be re-ordered in program order by the processor for the program to be restartable after an exception. Out-of-order execution was the main advance of the computer industry during the 1990s.

A similar concept is speculative execution, where instructions from one direction of a branch (the predicted direction) are executed before the branch direction is known. When the branch direction is known, the predicted direction and the actual direction are compared. If the predicted direction was correct, the speculatively executed instructions and their results are kept; if it was incorrect, these instructions and their results are erased. Speculative execution, coupled with an accurate branch predictor, gives a large performance gain.
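
The text does not say which kind of branch predictor is meant. As one common textbook example, the sketch below models a two-bit saturating counter; the class name, the initial state, and the `accuracy` helper are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict 'not taken',
    states 2-3 predict 'taken'. One wrong guess in a strong state
    does not immediately flip the prediction."""

    def __init__(self) -> None:
        self.state = 1  # assumed starting point: weakly not taken

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes: list[bool]) -> float:
    """Fraction of branches predicted correctly over a recorded history."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1      # speculative work would be kept
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken nine times, then falls through once.
print(accuracy([True] * 9 + [False]))  # 0.8
```

Each misprediction corresponds to the case where the speculatively executed instructions and their results must be erased.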

These advances, which were originally developed from research for RISC-style designs, allow modern CISC processors to execute twelve or more instructions per clock cycle, when traditional CISC designs could take twelve or more cycles to execute one instruction.

The resulting instruction scheduling logic of these processors is large, complex and difficult to verify. Further, higher complexity needs more transistors, raising power consumption and heat. In these respects, RISC is superior because the instructions are simpler, have less interdependence, and make superscalar implementations easier. However, as Intel has demonstrated, the concepts can be applied to a complex instruction set computing (CISC) design, given enough time and money.

1990 to today: Looking forward

VLIW and EPIC

The instruction scheduling logic that makes a superscalar processor is Boolean logic. In the early 1990s, a significant innovation was to realize that the coordination of a multi-ALU computer could be moved into the compiler, the software that translates a programmer's instructions into machine-level instructions.

This type of computer is called a very long instruction word (VLIW) computer.

Scheduling instructions statically in the compiler (versus scheduling dynamically in the processor) can reduce CPU complexity. This can improve performance, and reduce heat and cost.

Unfortunately, the compiler lacks accurate knowledge of runtime scheduling issues. Merely changing the CPU core frequency multiplier will have an effect on scheduling. Operation of the program, as determined by input data, will have major effects on scheduling. To overcome these severe problems, a VLIW system may be enhanced by adding the normal dynamic scheduling, losing some of the VLIW advantages.

Static scheduling in the compiler also assumes that dynamically generated code will be uncommon. Before the creation of Java and the Java virtual machine, this was true. It was reasonable to assume that slow compiles would only affect software developers. Now, with just-in-time compilation (JIT) virtual machines being used for many languages, slow code generation affects users also.

There were several unsuccessful attempts to commercialize VLIW. The basic problem is that a VLIW computer does not scale to different price and performance points, as a dynamically scheduled computer can. Another issue is that compiler design for VLIW computers is very difficult, and compilers, as of 2005, often emit suboptimal code for these platforms.

Also, VLIW computers optimise for throughput, not low latency, so they were unattractive to engineers designing controllers and other computers embedded in machinery. The embedded systems markets had often pioneered other computer improvements by providing a large market unconcerned about compatibility with older software.

In January 2000, Transmeta Corporation took the novel step of placing a compiler in the central processing unit, and making the compiler translate from a reference byte code (in their case, x86 instructions) to an internal VLIW instruction set. This method combines the hardware simplicity, low power and speed of VLIW RISC with the compact main memory system and software backward compatibility provided by popular CISC.

Intel's Itanium chip is based on what they call an explicitly parallel instruction computing (EPIC) design. This design supposedly provides the VLIW advantage of increased instruction throughput. However, it avoids some of the issues of scaling and complexity, by explicitly providing in each bundle of instructions information concerning their dependencies. This information is calculated by the compiler, as it would be in a VLIW design. The early versions are also backward-compatible with newer x86 software by means of an on-chip emulator mode. Integer performance was disappointing and despite improvements, sales in volume markets continue to be low.

Multi-threading

Current designs work best when the computer is running only one program. However, nearly all modern operating systems allow running multiple programs together. For the CPU to change over and do work on another program requires costly context switching. In contrast, multi-threaded CPUs can handle instructions from multiple programs at once.

To do this, such CPUs include several sets of registers. When a context switch occurs, the contents of the working registers are simply copied into one of a set of registers for this purpose.

Such designs often include thousands of registers instead of hundreds as in a typical design. On the downside, registers tend to be somewhat costly in chip space needed to implement them. This chip space might be used otherwise for some other purpose.

Intel calls this technology "hyperthreading" and offers two threads per core in its current Core i3, Core i5, Core i7 and Core i9 Desktop lineup (as well as in its Core i3, Core i5 and Core i7 Mobile lineup), as well as offering up to four threads per core in high-end Xeon Phi processors.

Multi-core

Multi-core CPUs are typically multiple CPU cores on the same die, connected to each other via a shared L2 or L3 cache, an on-die bus, or an on-die crossbar switch. All the CPU cores on the die share interconnect components with which to interface to other processors and the rest of the system. These components may include a front-side bus interface, a memory controller to interface with dynamic random access memory (DRAM), a cache coherent link to other processors, and a non-coherent link to the southbridge and I/O devices. The terms multi-core and microprocessor unit (MPU) have come into general use for one die having multiple CPU cores.

The development of multi-core CPUs was largely driven by the physical and thermal limitations of increasing clock speeds. By distributing computational tasks across several cores, systems can achieve higher performance without proportionally increasing power consumption and heat generation. This parallel processing capability allows modern operating systems to schedule multiple threads concurrently, leading to improved responsiveness and throughput, especially in multi-threaded applications.

Many modern multi-core processors also incorporate simultaneous multithreading (SMT), a technology that allows each physical core to execute multiple threads concurrently. SMT enhances overall efficiency by making better use of the core’s resources during periods of low utilization, thus further optimizing performance without a significant increase in power draw.

In addition to the shared cache and interconnect components mentioned earlier, advanced interconnect technologies have played a crucial role in boosting multi-core performance. Interfaces such as Intel's QuickPath Interconnect (QPI) and AMD's Infinity Fabric have been developed to provide high-bandwidth, low-latency communication channels between cores, memory, and other system components. These innovations reduce data transfer bottlenecks and contribute to a more cohesive and efficient processing environment.

Moreover, the rise of heterogeneous computing has seen the integration of dedicated accelerators, such as GPUs and specialized co-processors, alongside multi-core CPUs. These systems offload specific tasks—like graphics rendering or machine learning computations—from the main CPU cores, allowing for a more balanced and efficient utilization of the overall system resources. This evolution in processor design continues to influence software development, where parallel and concurrent programming models are increasingly adopted to harness the full potential of multi-core architectures.^[10]

Intelligent RAM

One way to work around the Von Neumann bottleneck is to mix a processor and DRAM all on one chip.

Reconfigurable logic

Main article: Reconfigurable computing

Another track of development is to combine reconfigurable logic with a general-purpose CPU. In this scheme, a special computer language compiles fast-running subroutines into a bit-mask to configure the logic. Slower, or less-critical parts of the program can be run by sharing their time on the CPU. This process allows creating devices such as software radios, by using digital signal processing to perform functions usually performed by analog electronics.

Open source processors

As the lines between hardware and software increasingly blur due to progress in design methodology and availability of chips such as field-programmable gate arrays (FPGA) and cheaper production processes, even open source hardware has begun to appear. Loosely knit communities like OpenCores and RISC-V have recently announced fully open CPU architectures such as the OpenRISC which can be readily implemented on FPGAs or in custom produced chips, by anyone, with no license fees, and even established processor makers like Sun Microsystems have released processor designs (e.g., OpenSPARC) under open-source licenses.

Asynchronous CPUs

Main article: Asynchronous circuit

Yet another option is a clockless or asynchronous CPU. Unlike conventional processors, clockless processors have no central clock to coordinate the progress of data through the pipeline. Instead, stages of the CPU are coordinated using logic devices called pipeline controls or FIFO sequencers. Basically, the pipeline controller clocks the next stage of logic when the existing stage is complete. Thus, a central clock is unneeded.

Relative to clocked logic, it may be easier to implement high performance devices in asynchronous logic:

In a clocked CPU, no component can run faster than the clock rate. In a clockless CPU, components can run at different speeds.
In a clocked CPU, the clock can go no faster than the worst-case performance of the slowest stage. In a clockless CPU, when a stage finishes faster than normal, the next stage can immediately take the results rather than waiting for the next clock tick. A stage might finish faster than normal because of the type of data inputs (e.g., multiplication can be very fast if it occurs by 0 or 1), or because it is running at a higher voltage or lower temperature than normal.
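
The second point can be illustrated with a simple latency comparison. This is an idealized sketch: it ignores the handshake overhead a real asynchronous design would add, the stage timings are made-up numbers, and the function names are hypothetical.

```python
def clocked_latency(worst_case: list[float]) -> float:
    """Time for one item to traverse a clocked pipeline.

    The clock period must cover the slowest stage's worst case, and
    every stage then takes one full period.
    """
    period = max(worst_case)
    return period * len(worst_case)


def clockless_latency(actual: list[float]) -> float:
    """Time for one item to traverse a clockless pipeline, where each
    stage hands over its result as soon as it finishes."""
    return sum(actual)


worst_case_ns = [2.0, 5.0, 3.0]   # hypothetical per-stage worst-case delays
actual_ns = [1.5, 2.0, 3.0]       # hypothetical delays for one easy input
print(clocked_latency(worst_case_ns))  # 15.0
print(clockless_latency(actual_ns))    # 6.5
```

The gap comes from two sources the list names: stages faster than the slowest one no longer wait for it, and a stage that finishes early on favourable data passes its result on immediately.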

Asynchronous logic proponents believe these abilities would have these benefits:

lower power dissipation for a given performance
highest possible execution speeds

The biggest disadvantage of the clockless CPU is that most CPU design tools assume a clocked CPU (a synchronous circuit), so making a clockless CPU (designing an asynchronous circuit) involves modifying the design tools to handle clockless logic and doing extra testing to ensure the design avoids metastability problems.

Even so, several asynchronous CPUs have been built, including

the ORDVAC and the identical ILLIAC I (1951)
the ILLIAC II (1962), then the fastest computer on Earth
the Caltech Asynchronous Microprocessor, the world's first asynchronous microprocessor (1988)
the ARM-implementing AMULET (1993 and 2000)
the asynchronous implementation of MIPS Technologies R3000, named MiniMIPS (1998)^[11]
the SEAforth multi-core processor from Charles H. Moore^[12]

Optical communication

In theory, an optical computer's components could directly connect through a holographic or phased open-air switching system. This would provide a large increase in effective speed and design flexibility, and a large reduction in cost. Since a computer's connectors are also its most likely failure points, a busless system may be more reliable.

Further, as of 2010, modern processors use 64- or 128-bit logic. Optical wavelength superposition could allow data lanes and logic many orders of magnitude higher than electronics, with no added space or copper wires.

Optical processors

Main article: Optical computing

Another long-term option is to use light instead of electricity for digital logic. In theory, this could run about 30% faster and use less power, and allow a direct interface with quantum computing devices.

The main problems with this approach are that, for the foreseeable future, electronic computing elements are faster, smaller, cheaper, and more reliable. Such elements are already smaller than some wavelengths of light. Thus, even waveguide-based optical logic may be uneconomic relative to electronic logic. As of 2016, most development effort is for electronic circuitry.

Ionic processors

Early experimental work has been done on using ion-based chemical reactions instead of electronic or photonic actions to implement elements of a logic processor.

Belt machine architecture

Relative to conventional register machine or stack machine architecture, yet similar to Intel's Itanium architecture,^[13] a temporal register addressing scheme has been proposed by Ivan Godard and company that is intended to greatly reduce the complexity of CPU hardware (specifically the number of internal registers and the resulting huge multiplexer trees).^[14] While somewhat harder to read and debug than general-purpose register names, it aids understanding to view the belt as a moving conveyor belt where the oldest values drop off the belt and vanish. It is implemented in the Mill architecture.
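
The conveyor-belt picture can be modelled with a fixed-length queue. This is only a conceptual sketch: the `Belt` class, its methods, and the belt length of 3 used in the example are hypothetical and are not taken from the Mill architecture's actual specification.

```python
from collections import deque


class Belt:
    """Operands are addressed by how recently they were produced
    (position 0 is the newest) instead of by register name."""

    def __init__(self, length: int) -> None:
        # A deque with maxlen discards from the far end when full,
        # which models the oldest value dropping off the belt.
        self._slots = deque(maxlen=length)

    def push(self, value: int) -> None:
        self._slots.appendleft(value)

    def __getitem__(self, position: int) -> int:
        return self._slots[position]

    def add(self, i: int, j: int) -> None:
        """Read two belt positions and drop the sum onto the front."""
        self.push(self[i] + self[j])


belt = Belt(length=3)
belt.push(10)
belt.push(20)
belt.push(30)        # belt is now 30, 20, 10 (newest first)
belt.add(0, 1)       # pushes 50; the oldest value, 10, falls off
print(list(belt._slots))  # [50, 30, 20]
```

Because every result lands at position 0, an instruction never names a destination, and the positions of earlier values shift by one with each new result.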

Timeline of events

1964. IBM release the 32-bit IBM System/360 with memory protection.
1969. Intel 4004's initial design led by Intel's Ted Hoff and Busicom's Masatoshi Shima.^[15]
1970. Intel 4004's design completed by Intel's Federico Faggin and Busicom's Masatoshi Shima.^[15]
1971. IBM release the IBM System/370, successor to System/360.
1971. Intel release the 4-bit Intel 4004, the first commercial microprocessor.^[1]
1971. NEC release the μPD707 and μPD708, a two-chip 4-bit CPU.^[16]
1972. IBM announce "System/370 Advanced Function", adding support for virtual memory with demand paging.
1972. NEC release a single-chip 4-bit microprocessor, the μPD700.^[17]^[18]
1973. NEC release the 4-bit μCOM-4 (μPD751),^[17] combining the μPD707 and μPD708 into a single microprocessor.^[16]
1974. Intel release the Intel 8080, an 8-bit microprocessor, designed by Federico Faggin and Masatoshi Shima.
1975. MOS Technology release the 8-bit MOS Technology 6502, the first integrated processor with an affordable price: $25, when the rival 6800 cost $175.
1976. Zilog introduce the 8-bit Zilog Z80, designed by Federico Faggin and Masatoshi Shima.
1977. Digital Equipment Corporation introduced its first 32-bit VAX superminicomputer, the VAX-11/780.
1978. Intel introduces the Intel 8086 and Intel 8088, the first x86 chips.
1978. Fujitsu releases the MB8843 microprocessor.
1979. Zilog release the Zilog Z8000, a 16-bit microprocessor, designed by Federico Faggin and Masatoshi Shima.
1979. Motorola introduce the Motorola 68000, a 16/32-bit microprocessor.
1981. Stanford MIPS introduced, one of the first reduced instruction set computing (RISC) designs.
1982. Intel introduces the Intel 80286, which was the first Intel processor that could run all the software written for its predecessors, the 8086 and 8088.
1984. Motorola introduces the Motorola 68020, which enabled full 32-bit addressing, and the 68851 memory management unit, which supported demand paging.
1985. Intel introduces the Intel 80386, which adds a 32-bit instruction set to the x86 architecture and supports demand paging.
1985. ARM architecture introduced.
1989. Intel introduces the Intel 80486.
1992. Hitachi introduces the SuperH architecture,^[8] a precursor to ARM's Thumb instruction set.^[9]
1993. Intel launches the original Pentium microprocessor, the first processor with an x86 superscalar microarchitecture.
1994. IBM introduce the first IBM mainframe models to use single-chip microprocessors as CPUs, the IBM System/390 9672 series.
1994. ARM's Thumb instruction set introduced,^[19] inspired by Hitachi's SuperH instruction set.^[9]
1995. Intel introduces the Pentium Pro, which becomes the foundation for the Pentium II, Pentium III, Pentium M and Intel Core architectures.
2000. IBM introduce z/Architecture, the 64-bit version of their mainframe architecture.
2000. AMD announced the x86-64 64-bit extension to the x86 architecture.
2000. AMD hits 1 GHz with its Athlon microprocessor.
2000. Analog Devices introduces the Blackfin architecture.
2002. Intel released a Pentium 4 with hyper-threading, the first modern desktop processor to implement simultaneous multithreading (SMT).
2003. AMD released the Athlon 64, the first 64-bit consumer CPU.
2003. Intel introduced the Pentium M, a low-power mobile derivative of the Pentium Pro architecture.
2005. AMD announced the Athlon 64 X2, their first x86 dual-core processor.
2006. Intel introduces the Core line of CPUs based on a modified Pentium M design.
2008. Over 10 billion ARM-based CPUs shipped.
2010. Intel introduced the Core i3, i5, and i7, with 2, 4 and 4 cores respectively.
2011. ARM release ARMv8-A, supporting the 64-bit AArch64 architecture.
2011. AMD announced the world's first 8-core CPU for desktop PCs.
2017. AMD announced Ryzen processors based on the Zen architecture, with up to 16 cores.
2017. Intel's 8th generation Core i3, Core i5, Core i7 and Core i9 increased to approximately 4, 6, 8 and 8 cores respectively.
2017. Over 100 billion ARM-based CPUs shipped.^[20]
2020. Apple launched their own M1 ARMv8-based system-on-a-chip (SoC), marking the switch of their devices away from Intel CPUs.^[21]
2021. ARM release ARMv9, the first major upgrade in a decade, since Armv8 in 2011.^[22]
2021. Over 200 billion ARM-based CPUs shipped.^[23]
2022. AMD 3rd Generation EPYC 64C processors power Frontier, the world's most powerful supercomputer.^[24]
2024. Apple launched the M4, their first SoC adopting the ARMv9 CPU architecture.^[25]

References

1. Nigel Tout. "The Busicom 141-PF calculator and the Intel 4004 microprocessor". Retrieved November 15, 2009.
2. Aspray, William (1994-05-25). "Oral-History: Tadashi Sasaki". Engineering and Technology History Wiki. The Institute of Electrical and Electronics Engineers, Inc. Retrieved 2022-09-14.
3. Caswell, Wayne (18 February 2008). "Twenty Technology Trends that Affect Home Networking". Home Toys. Archived from the original on March 27, 2016.
4. "The Structure of SYSTEM/360". Computer Structures: Principles and Examples.
5. "16-bit Microprocessors". CPU Museum. Retrieved 5 October 2010.
6. "History". PFU. Retrieved 5 October 2010.
7. PANAFACOM Lkit-16, Information Processing Society of Japan.
8. "Hitachi Releases the SH-4 SH7750 Series, Offering Industry's Highest Performance of 360 MIPS for an Embedded RISC Processor, as Top-End Series in SuperH Family".
9. Nathan Willis (June 10, 2015). "Resurrecting the SuperH architecture". LWN.net.
10. "Find the best laptop in 2024". Suggesters. Retrieved 2025-02-25.
11. MiniMIPS.
12. "SEAforth Overview". Archived from the original on 2008-02-02. "... asynchronous circuit design throughout the chip. There is no central clock with billions of dumb nodes dissipating useless power. ... the processor cores are internally asynchronous themselves."
13. "RISC Processors COMP375 Computer Architecture and Organization" (PDF). Archived from the original (PDF) on October 26, 2017.
14. "The Belt".
15. Federico Faggin, The Making of the First Microprocessor, IEEE Solid-State Circuits Magazine, Winter 2009, IEEE Xplore.
16. "NEC 751 (uCOM-4)". The Antique Chip Collector's Page. Archived from the original on 2011-05-25. Retrieved 2010-06-11.
17. 1970年代マイコンの開発と発展～集積回路 [Development and evolution of microcomputers in the 1970s: integrated circuits], Semiconductor History Museum of Japan.
18. Jeffrey A. Hart & Sangbae Kim (2001), The Defense of Intellectual Property Rights in the Global Information Order, International Studies Association, Chicago.
19. ARM7TDMI Technical Reference Manual, page ii.
20. Segars, Simon (2017-02-27). "Enabling mass IoT connectivity as ARM partners ship 100 billion chips". Retrieved 2021-09-06.
21. Clover, Juli (2022-10-13). "Apple M1 Chip: Everything You Need to Know". MacRumors. Retrieved 2024-07-21.
22. Takahashi, Dean (2021-03-30). "Armv9 is Arm's first major architectural update in a decade". VentureBeat. Retrieved 2021-09-06.
23. Shilov, Anton (2021-10-20). "Over 200 Billion Arm-Based Chips Shipped". Retrieved 2022-09-16.
24. "TOP500 List - June 2022". June 2022. Retrieved 2022-05-30.
25. Larabel, Michael (2024-06-15). "Apple M4 Support Added To The LLVM Compiler, Confirming Its ISA Capabilities". Phoronix. Retrieved 2024-07-21.

External links

Great moments in microprocessor history by W. Warner, 2004
Great Microprocessors of the Past and Present (V 13.4.0) by: John Bayko, 2003
Bit by Bit: An Illustrated History of Computers, Stan Augarten, 1984. OCR with permission of the author
Gallery of CPU and related PCBs (in Italian)
