Processor design is a subfield of computer engineering and electronics that deals with creating a processor, a key component of computer hardware. While historically focused on the central processing unit (CPU), modern design often involves system-on-chip (SoC) architectures[1], which integrate multiple processing units such as CPUs, graphics processing units (GPUs), and neural processing units (NPUs)[2] onto a single die or set of chiplets.[3][4]
The design process involves choosing an instruction set and a certain execution paradigm (e.g. VLIW or RISC) and results in a microarchitecture, which might be described in e.g. VHDL or Verilog. For microprocessor design, this description is then manufactured employing some of the various semiconductor device fabrication processes, resulting in a die which is bonded onto a chip carrier. This chip carrier is then soldered onto, or inserted into a socket on, a printed circuit board (PCB).
The mode of operation of any processor is the execution of lists of instructions. Instructions typically include those to compute or manipulate data values using registers, change or retrieve values in read/write memory, perform relational tests between data values, and control program flow.
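The execution of a list of instructions can be illustrated with a minimal interpreter sketch. The instruction names (`li`, `add`, `load`, `store`, `blt`, `halt`), the tuple encoding, and the four-register machine are hypothetical choices made for this example and do not correspond to any real instruction set.

```python
def run(program, memory=None, num_registers=4):
    """Execute a list of instructions on a toy register machine (illustrative only)."""
    regs = [0] * num_registers      # register file
    mem = dict(memory or {})        # read/write memory, address -> value
    pc = 0                          # program counter
    while pc < len(program):
        op, *args = program[pc]     # fetch and decode
        pc += 1                     # default: fall through to the next instruction
        if op == "li":              # load immediate: regs[d] = value
            d, value = args
            regs[d] = value
        elif op == "add":           # compute on data values held in registers
            d, a, b = args
            regs[d] = regs[a] + regs[b]
        elif op == "load":          # retrieve a value from memory
            d, addr = args
            regs[d] = mem.get(addr, 0)
        elif op == "store":         # change a value in memory
            s, addr = args
            mem[addr] = regs[s]
        elif op == "blt":           # relational test plus control of program flow
            a, b, target = args
            if regs[a] < regs[b]:
                pc = target
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs, mem


# Sum the integers 1..5 and store the result at (hypothetical) address 100.
SUM_PROGRAM = [
    ("li", 0, 0),       # 0: r0 = 0  (running sum)
    ("li", 1, 0),       # 1: r1 = 0  (counter)
    ("li", 2, 1),       # 2: r2 = 1  (constant one)
    ("li", 3, 5),       # 3: r3 = 5  (limit)
    ("add", 1, 1, 2),   # 4: r1 = r1 + 1
    ("add", 0, 0, 1),   # 5: r0 = r0 + r1
    ("blt", 1, 3, 4),   # 6: if r1 < r3, jump back to instruction 4
    ("store", 0, 100),  # 7: mem[100] = r0
    ("halt",),          # 8: stop
]
```

Calling `run(SUM_PROGRAM)` returns the final register contents and memory. A real processor performs the same fetch, decode, and execute steps in hardware rather than in a software loop.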
Processor designs are often tested and validated on one or several FPGAs before sending the design of the processor to a foundry for semiconductor fabrication.[5]
CPU design is divided into multiple components. Information is transferred through datapaths (such as ALUs and pipelines). These datapaths are controlled through logic by control units. Memory components include register files and caches to retain information, or certain actions. Clock circuitry maintains internal rhythms and timing through clock drivers, PLLs, and clock distribution networks. Pad transceiver circuitry allows signals to be received and sent, and a logic gate cell library is used to implement the logic. Logic gates are the foundation for processor design as they are used to implement most of the processor's components.[6]
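The way logic gates build up a datapath component can be sketched in software. The example below models an 8-bit ripple-carry adder, one of the simplest ALU building blocks, using only a NAND gate as the primitive. The function names and the 8-bit width are assumptions made for illustration; real designs are written in a hardware description language and use faster adder structures.

```python
def nand(a, b):
    """The single primitive gate; inputs and output are the bits 0 or 1."""
    return 1 - (a & b)


def and_(a, b):
    n = nand(a, b)
    return nand(n, n)


def or_(a, b):
    return nand(nand(a, a), nand(b, b))


def xor(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = xor(a, b)
    sum_bit = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out


def ripple_add(x, y, width=8):
    """Add two unsigned integers bit by bit; return (result, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry
```

The carry "ripples" from the least significant bit upward, which is why the delay of this design grows with the word width.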
CPUs designed for high-performance markets might require custom (optimized or application-specific (see below)) designs for each of these items to achieve frequency, power-dissipation, and chip-area goals, whereas CPUs designed for lower-performance markets might lessen the implementation burden by purchasing some of these items as intellectual property. Control logic implementation techniques (logic synthesis using CAD tools) can be used to implement datapaths, register files, and clocks. Common logic styles used in CPU design include unstructured random logic, finite-state machines, microprogramming (common from 1965 to 1985), and programmable logic arrays (common in the 1980s, no longer common).
Modern processor designs increasingly rely on heterogeneous computing, integrating specialized accelerators alongside general-purpose cores. The most prominent addition is the neural processing unit (NPU), designed specifically to execute machine learning mathematics (matrix multiplication) more efficiently than a standard CPU. This specialization allows for significant gains in performance-per-watt for AI workloads.[7][4]
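Matrix multiplication reduces to a large number of multiply-accumulate (MAC) operations, which is the workload such accelerators are built around. The sketch below is a plain software illustration, not a model of any particular NPU; the function name `matmul_mac` and the MAC counter are assumptions added for this example.

```python
def matmul_mac(a, b):
    """Multiply matrices given as lists of rows; return (product, mac_count)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0] * cols for _ in range(rows)]
    mac_count = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0                          # accumulator for one output element
            for k in range(inner):
                acc += a[i][k] * b[k][j]     # one multiply-accumulate
                mac_count += 1
            out[i][j] = acc
    return out, mac_count
```

Multiplying an m×k matrix by a k×n matrix takes m·k·n MACs, and the MACs for different output elements are independent of each other, so hardware can execute many of them in parallel.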
Device technologies used to implement CPU logic have changed over time. Early implementations used individual relays, vacuum tubes, and discrete components (transistors and diodes), and later small-scale integration TTL chips, but these are no longer used for CPUs. Programmable array logic and other programmable logic devices are also no longer used for CPUs in this role, and ECL gate arrays are now uncommon. CMOS gate arrays are no longer used for CPUs, while CMOS mass-produced integrated circuits account for most CPUs by volume. Custom CMOS ASICs are generally practical only for high-volume applications because of the engineering cost. Field-programmable gate arrays (FPGAs) remain common for soft microprocessors and are often used for reconfigurable computing.
A CPU design project generally has these major tasks:
Re-designing a CPU core to a smaller die area helps to shrink everything (a "photomask shrink"), resulting in the same number of transistors on a smaller die. It improves performance (smaller transistors switch faster), reduces power (smaller wires have less parasitic capacitance) and reduces cost (more CPUs fit on the same wafer of silicon). Releasing a CPU on the same size die, but with a smaller CPU core, keeps the cost about the same but allows higher levels of integration within one very-large-scale integration chip (additional cache, multiple CPUs or other components), improving performance and reducing overall system cost.
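The cost effect of a die shrink can be estimated with a rough dies-per-wafer calculation. The sketch below uses a commonly quoted approximation that subtracts an edge-loss term from the simple area ratio; it ignores defects, scribe lines, and die aspect ratio. The 300 mm wafer and the die areas in the usage note are assumed values chosen only for illustration.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate count of whole dies on a round wafer (ignores yield)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Partial dies around the circumference are unusable; this term estimates them.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return max(0, math.floor(wafer_area / die_area_mm2 - edge_loss))
```

Under these assumptions, `dies_per_wafer(300, 100)` gives 640 and `dies_per_wafer(300, 50)` gives 1319, so halving the die area slightly more than doubles the number of candidate dies, because smaller dies also waste less area at the wafer edge.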
As with most complex electronic designs, the logic verification effort (proving that the design does not have bugs) now dominates the project schedule of a CPU.
Key CPU architectural innovations include accumulator, index register, general-purpose register, cache, virtual memory, instruction pipelining, superscalar, CISC, RISC, virtual machine, emulators, microprogram, and stack.
A variety of new CPU design ideas have been proposed, including reconfigurable logic, clockless CPUs, computational RAM, and optical computing.
Benchmarking is a way of testing CPU speed. Examples include SPECint and SPECfp, developed by the Standard Performance Evaluation Corporation, and ConsumerMark, developed by the Embedded Microprocessor Benchmark Consortium (EEMBC).
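Benchmark suites of this kind typically report one overall score derived from many individual programs. SPEC-style scores, for example, take the ratio of a reference machine's run time to the measured run time for each program and combine the ratios with a geometric mean. The sketch below shows only that arithmetic; the benchmark names and timings in the usage note are invented placeholders, not real results.

```python
import math


def speed_ratios(reference_seconds, measured_seconds):
    """Per-benchmark ratio of reference time to measured time (higher is faster)."""
    return {
        name: reference_seconds[name] / measured_seconds[name]
        for name in reference_seconds
    }


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overall_score(reference_seconds, measured_seconds):
    """Combine per-benchmark ratios into a single score."""
    return geometric_mean(speed_ratios(reference_seconds, measured_seconds).values())
```

For instance, with hypothetical reference times `{"bench_a": 100.0, "bench_b": 400.0}` and measured times `{"bench_a": 50.0, "bench_b": 100.0}`, the ratios are 2 and 4 and the score is their geometric mean, √8. The geometric mean keeps a single unusually large ratio from dominating the result the way it would in an arithmetic mean.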
Some of the commonly used metrics include:
There may be tradeoffs in optimizing some of these metrics. In particular, many design techniques that make a CPU run faster make the "performance per watt", "performance per dollar", and "deterministic response" much worse, and vice versa.
There are several different markets in which CPUs are used. Since each of these markets differs in its requirements for CPUs, the devices designed for one market are in most cases inappropriate for the other markets.
In the general-purpose computing market (desktop, laptop, and server computers), processors implementing the x86-64 instruction set architecture remain widely used, with Intel and AMD as the primary suppliers. Within the x86 CPU market, Mercury Research estimated that Intel held 74.4% and AMD 25.6% of unit shipments in Q3 2025.[10] Arm-based processors dominate smartphones and are also used in some PCs and servers; ABI Research forecast that Arm-based PCs would represent about 13% of total PC shipments in 2025, while IDC estimated that Arm-architecture servers would account for 21.1% of total server shipments in 2025.[11] RISC-V has also seen growing adoption in embedded systems, and some vendors have announced RISC-V-based microcontroller families for automotive applications.[12]
Since these devices are used to run countless different types of programs, these CPU designs are not specifically targeted at one type of application or one function. The demands of being able to run a wide range of programs efficiently have made these CPU designs among the more technically advanced, with the disadvantages of being relatively costly and having high power consumption.
Scientific computing is a much smaller niche market (in revenue and units shipped). It is used in government research labs and universities. Before 1990, CPU design was often done for this market, but mass market CPUs organized into large clusters have proven to be more affordable. The main remaining area of active hardware design and research for scientific computing is for high-speed data transmission systems to connect mass market CPUs.
As measured by units shipped, most CPUs are embedded in other machinery, such as telephones, clocks, appliances, vehicles, and infrastructure. Embedded processors sell in volumes of many billions of units per year, though mostly at much lower price points than general-purpose processors.
These single-function devices differ from the more familiar general-purpose CPUs in several ways:
The embedded CPU family with the largest number of total units shipped is the 8051, averaging nearly a billion units per year.[13] The 8051 is widely used because it is very inexpensive. The design time is now roughly zero, because it is widely available as commercial intellectual property. It is now often embedded as a small part of a larger system on a chip. The silicon cost of an 8051 is now as low as US$0.001, because some implementations use as few as 2,200 logic gates and take 0.4730 square millimeters of silicon.[14][15]
ARM architecture dominates embedded and mobile processor shipments globally. As of 2024, ARM-based processors account for the majority of all processor units shipped annually, driven by widespread adoption in smartphones, IoT devices, and microcontrollers. The original ARM architecture and first ARM chip were designed in approximately one and a half years with 5 human-years of effort.[16]
The 32-bit Parallax Propeller microcontroller architecture and the first chip were designed by two people in about 10 human-years of work time.[17]
The 8-bit AVR architecture and the first AVR microcontroller were conceived and designed by two students at the Norwegian Institute of Technology.
The 8-bit 6502 architecture and the first MOS Technology 6502 chip were designed in 13 months by a group of about 9 people.[18]
The 32-bit Berkeley RISC I and RISC II processors were mostly designed by a series of students as part of a four-quarter sequence of graduate courses.[19] This design became the basis of the commercial SPARC processor design.
For about a decade, every student taking the 6.004 class at MIT was part of a team; each team had one semester to design and build a simple 8-bit CPU out of 7400 series integrated circuits. One team of 4 students designed and built a simple 32-bit CPU during that semester.[20]
Some undergraduate courses require a team of 2 to 5 students to design, implement, and test a simple CPU in an FPGA in a single 15-week semester.[21]
The MultiTitan CPU was designed with 2.5 man-years of effort, which was considered "relatively little design effort" at the time.[22] 24 people contributed to the 3.5-year MultiTitan research project, which included designing and building a prototype CPU.[23]
For embedded systems, the highest performance levels are often not needed or desired because of power consumption requirements. This allows the use of processors that can be implemented entirely by logic synthesis techniques. These synthesized processors can be implemented in a much shorter amount of time, giving quicker time-to-market.