Coprocessor

From Wikipedia, the free encyclopedia

Type of computer processor

A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.

Functionality

Coprocessors vary in their degree of autonomy. Some (such as FPUs) rely on direct control via coprocessor instructions embedded in the CPU's instruction stream. Others are independent processors in their own right, capable of working asynchronously; they are still not optimized for general-purpose code, or they are incapable of it due to a limited instruction set focused on accelerating specific tasks. It is common for these to be driven by direct memory access (DMA), with the host processor (a CPU) building a command list. The PlayStation 2's Emotion Engine contained an unusual DSP-like SIMD vector unit capable of both modes of operation.
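
The asynchronous, command-list style of operation can be illustrated with a small simulation. This is only a sketch under stated assumptions: a Python thread stands in for the coprocessor, a dictionary stands in for memory shared through DMA, and the operation names (`scale2`, `sum`) and function names are hypothetical, not taken from any real device.

```python
import queue
import threading

def coprocessor_worker(commands, memory, done):
    """Hypothetical coprocessor: drains a command list and writes results to shared memory."""
    while True:
        cmd = commands.get()
        if cmd is None:              # sentinel marks the end of the command list
            break
        op, src, dst = cmd
        if op == "scale2":           # made-up fixed-function operation
            memory[dst] = [2 * x for x in memory[src]]
        elif op == "sum":            # made-up fixed-function operation
            memory[dst] = sum(memory[src])
    done.set()                       # stands in for a completion interrupt

def run_command_list(memory, command_list):
    """Host side: build the command list, start the worker, wait for completion."""
    commands = queue.Queue()
    done = threading.Event()
    worker = threading.Thread(target=coprocessor_worker, args=(commands, memory, done))
    worker.start()
    for cmd in command_list:
        commands.put(cmd)
    commands.put(None)
    # The host would be free to do unrelated work here while the worker runs.
    done.wait()
    worker.join()
    return memory

memory = {"in": [1, 2, 3]}
run_command_list(memory, [("scale2", "in", "tmp"), ("sum", "tmp", "out")])
print(memory["out"])  # 12
```

The host only describes the work and waits for a completion signal; it never touches the data while the worker is processing it, which is the property that lets a real host CPU continue with other tasks.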

History

To make the best use of mainframe computer processor time, input/output tasks were delegated to separate systems called channel I/O. The mainframe would not require any I/O processing at all; instead, it would just set parameters for an input or output operation and then signal the channel processor to carry out the whole of the operation. By dedicating relatively simple sub-processors to handle time-consuming I/O formatting and processing, overall system performance was improved.

Coprocessors for floating-point arithmetic first appeared in desktop computers in the 1970s and became common throughout the 1980s and into the early 1990s. Early 8-bit and 16-bit processors used software to carry out floating-point arithmetic operations. Where a coprocessor was supported, floating-point calculations could be carried out many times faster. Math coprocessors were popular purchases for users of computer-aided design (CAD) software and for scientific and engineering calculations. Some floating-point units, such as the AMD 9511, Intel 8231/8232 and Weitek FPUs, were treated as peripheral devices, while others, such as the Intel 8087, Motorola 68881 and National 32081, were more closely integrated with the CPU.

Another form of coprocessor was a video display coprocessor, as used in the Atari 8-bit computers, TI-99/4A, and MSX home computers, which were called "Video Display Controllers". The Amiga custom chipset includes such a unit known as the Copper, as well as a blitter for accelerating bitmap manipulation in memory.

As microprocessors developed, the cost of integrating the floating-point arithmetic functions into the processor declined. High processor speeds also made a closely integrated coprocessor difficult to implement. Separately packaged mathematics coprocessors are now uncommon in desktop computers. The demand for a dedicated graphics coprocessor has grown, however, particularly due to the increasing demand for realistic 3D graphics in computer games.

Intel

Main article: x87

The original IBM PC included a socket for the Intel 8087 floating-point coprocessor (also known as an FPU), which was a popular option for people using the PC for computer-aided design or mathematics-intensive calculations. In that architecture, the coprocessor speeds up floating-point arithmetic on the order of fiftyfold. Users who only used the PC for word processing, for example, saved the high cost of the coprocessor, which would not have accelerated performance of text manipulation operations.

The 8087 was tightly integrated with the 8086/8088 and responded to floating-point machine code operation codes inserted in the 8088 instruction stream. An 8088 processor without an 8087 could not interpret these instructions, requiring separate versions of programs for FPU and non-FPU systems, or at least a test at run time to detect the FPU and select appropriate mathematical library functions.
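
The run-time test and library selection described above can be sketched as follows. This is an illustrative model only: the detection result is passed in as a plain boolean (`fpu_present`), `math.sqrt` stands in for a routine that would issue coprocessor instructions, and a Newton iteration stands in for a software floating-point library. All function names are hypothetical.

```python
import math

def soft_sqrt(x, iterations=30):
    """Software fallback: Newton's method, standing in for a software FP library."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess

def hard_sqrt(x):
    """Stands in for a routine that issues coprocessor instructions."""
    return math.sqrt(x)

def select_math_library(fpu_present):
    """Pick the function table once, at start-up, from the detection result."""
    return {"sqrt": hard_sqrt if fpu_present else soft_sqrt}

# Assumption: detection has already happened and reported no coprocessor.
mathlib = select_math_library(fpu_present=False)
print(round(mathlib["sqrt"](2.0), 6))  # 1.414214
```

Selecting the function table once at start-up means the rest of the program calls `mathlib["sqrt"]` without caring which implementation is behind it, so a single program can serve both kinds of machine.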

Figure: Intel 80386DX CPU with 80387DX math coprocessor.

Another coprocessor for the 8086/8088 central processor was the 8089 input/output coprocessor. It used the same programming technique as the 8087 for input/output operations, such as transfer of data from memory to a peripheral device, thereby reducing the load on the CPU. However, IBM did not use it in the IBM PC design, and Intel stopped development of this type of coprocessor.

The Intel 80386 microprocessor used an optional "math" coprocessor (the 80387) to perform floating-point operations directly in hardware. The Intel 80486DX processor included floating-point hardware on the chip. Intel released a cost-reduced processor, the 80486SX, that had no floating-point hardware, and also sold an 80487SX coprocessor that essentially disabled the main processor when installed, since the 80487SX was a complete 80486DX with a different set of pin connections.^[1]

Intel processors later than the 80486 integrated floating-point hardware on the main processor chip; the advances in integration eliminated the cost advantage of selling the floating-point processor as an optional element. It would be very difficult to adapt circuit-board techniques adequate at 75 MHz processor speed to meet the time-delay, power consumption, and radio-frequency interference standards required at gigahertz-range clock speeds. These on-chip floating-point processors are still referred to as coprocessors because they operate in parallel with the main CPU.

During the era of 8- and 16-bit desktop computers another common source of floating-point coprocessors was Weitek. These coprocessors had a different instruction set from the Intel coprocessors, and used a different socket, which not all motherboards supported. The Weitek processors did not provide transcendental mathematics functions (for example, trigonometric functions) like the Intel x87 family, and required specific software libraries to support their functions.^[2]

Motorola

The Motorola 68000 family had the 68881/68882 coprocessors, which provided floating-point acceleration similar to that of the Intel coprocessors. Computers using the 68000 family but not equipped with the hardware floating-point processor could trap and emulate the floating-point instructions in software, which, although slower, allowed one binary version of the program to be distributed for both cases. The 68451 memory-management coprocessor was designed to work with the 68020 processor.^[3]
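
The trap-and-emulate approach can be modelled with a toy interpreter. The sketch below is a simplification under stated assumptions: the opcodes (`add`, `fadd`, `fmul`), the exception class and the function names are hypothetical, and the same Python arithmetic is reused for both the "hardware" path and the emulated path, whereas real emulation would implement floating point with integer instructions.

```python
# Hypothetical floating-point opcodes; reused for both the hardware and emulated paths.
FP_OPS = {"fadd": lambda a, b: a + b, "fmul": lambda a, b: a * b}

class UnimplementedInstruction(Exception):
    """Stands in for the trap raised when no coprocessor handles an opcode."""
    def __init__(self, op, a, b):
        super().__init__(op)
        self.op, self.a, self.b = op, a, b

def cpu_execute(op, a, b, has_fpu):
    """Toy CPU: integer ops always work; FP ops trap when no FPU is fitted."""
    if op == "add":
        return a + b
    if op in FP_OPS:
        if has_fpu:
            return FP_OPS[op](a, b)          # coprocessor executes it directly
        raise UnimplementedInstruction(op, a, b)
    raise ValueError(f"unknown opcode: {op}")

def trap_handler(trap, log):
    """Software emulation of the trapped instruction (the slow path)."""
    log.append(trap.op)
    return FP_OPS[trap.op](trap.a, trap.b)

def run(program, has_fpu):
    """Run the same program with or without an FPU; return results and emulated ops."""
    results, emulated = [], []
    for op, a, b in program:
        try:
            results.append(cpu_execute(op, a, b, has_fpu))
        except UnimplementedInstruction as trap:
            results.append(trap_handler(trap, emulated))
    return results, emulated

program = [("add", 1, 2), ("fadd", 1.5, 2.25), ("fmul", 2.0, 4.0)]
print(run(program, has_fpu=True))   # ([3, 3.75, 8.0], [])
print(run(program, has_fpu=False))  # ([3, 3.75, 8.0], ['fadd', 'fmul'])
```

The program itself is identical in both runs; only the path taken by the floating-point opcodes differs, which is why a single binary could be distributed for machines with and without the coprocessor.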

Modern coprocessors

As of 2001, dedicated Graphics Processing Units (GPUs) in the form of graphics cards are commonplace. Certain models of sound cards have been fitted with dedicated processors providing digital multichannel mixing and real-time DSP effects as early as 1990 to 1994 (the Gravis Ultrasound and Sound Blaster AWE32 being typical examples), while the Sound Blaster Audigy and the Sound Blaster X-Fi are more recent examples.

In 2006, AGEIA announced an add-in card for computers that it called the PhysX PPU. PhysX was designed to perform complex physics computations so that the CPU and GPU do not have to perform these time-consuming calculations. It was designed for video games, although other mathematical uses could theoretically be developed for it. In 2008, Nvidia purchased the company and phased out the PhysX card line; the functionality was added through software allowing their GPUs to render PhysX on cores normally used for graphics processing, using their Nvidia PhysX engine software.

In 2006, BigFoot Systems unveiled a PCI add-in card they christened the KillerNIC, which ran its own special Linux kernel on a FreeScale PowerQUICC running at 400 MHz, calling the FreeScale chip a Network Processing Unit or NPU.

The SpursEngine is a media-oriented add-in card with a coprocessor based on the Cell microarchitecture. The SPUs are themselves vector coprocessors.

In 2008, the Khronos Group released OpenCL with the aim of supporting general-purpose CPUs, ATI/AMD and Nvidia GPUs (and other accelerators) with a single common language for compute kernels.

In the 2010s, some mobile computation devices implemented the sensor hub as a coprocessor. Examples of coprocessors used for handling sensor integration in mobile devices include the Apple M7 and M8 motion coprocessors, the Qualcomm Snapdragon Sensor Core and Qualcomm Hexagon, and the Holographic Processing Unit for the Microsoft HoloLens.

In 2012, Intel announced the Intel Xeon Phi coprocessor.^[4]

As of 2016, various companies are developing coprocessors aimed at accelerating artificial neural networks for vision and other cognitive tasks (e.g. vision processing units, TrueNorth, and Zeroth), and as of 2018, such AI chips are in smartphones from Apple and several Android phone vendors.

Other coprocessors

The MIPS architecture supports up to four coprocessor units, used for memory management, floating-point arithmetic, and two undefined coprocessors for other tasks such as graphics accelerators.^[5]
Using FPGAs (field-programmable gate arrays), custom coprocessors can be created for acceleration of particular processing tasks such as digital signal processing (e.g. Zynq, which combines ARM cores with an FPGA on a single die).
TLS/SSL accelerators are used on servers; such accelerators used to be cards, but in modern times are implemented as cryptography instructions in mainstream CPUs.
Some multi-core chips can be programmed so that one of their processors is the primary processor, and the other processors are supporting coprocessors.
China's Matrix 2000 128-core PCI-e coprocessor is a proprietary accelerator that requires a CPU to run it, and has been employed in an upgrade of the 17,792-node Tianhe-2 supercomputer (2 Intel Knights Bridge + 2 Matrix 2000 each), now dubbed 2A, roughly doubling its speed to 95 petaflops, exceeding the world's fastest supercomputer.^[6]
A range of coprocessors were available for various models from Acorn Computers, notably the BBC Micro and BBC Master series. Rather than special-purpose graphics or arithmetic devices, these were general-purpose CPUs (principally the 6502, Zilog Z80, National Semiconductor 32016, and ARM 1) described as second processors, typically interfaced to the host system using a message passing architecture known as the Tube, with Acorn's own products providing such processors in a BBC Micro expansion unit with accompanying memory and interfacing circuitry. Software could be executed independently on the second processor, and applications could be written to offload work from the host system, leaving it to perform input/output tasks, resulting in acceleration. Since a range of CPUs were available in a variety of products, a BBC Micro fitted with such a coprocessor was able to run operating systems for other processor architectures, such as CP/M, DOS and Unix, along with accompanying software.

Trends

Over time CPUs have tended to grow to absorb the functionality of the most popular coprocessors. FPUs are now considered an integral part of a processor's main pipeline; SIMD units gave multimedia its acceleration, taking over the role of various DSP accelerator cards; and even GPUs have become integrated on CPU dies. Nonetheless, specialized units remain popular away from desktop machines, and for additional power, and allow continued evolution independently of the main processor product lines.

References

[1] Scott Mueller, Upgrading and Repairing PCs, 15th edition, Que Publishing, 2003, ISBN 0-7897-2974-1, pp. 108–110.
[2] Scott Mueller, Upgrading and Repairing PCs, Second Edition, Que Publishing, 1992, ISBN 0-88022-856-3, pp. 412–413.
[3] William Ford, William R. Topp, Assembly Language and Systems Programming for the M68000 Family, Jones & Bartlett Learning, 1992, ISBN 0-7637-0357-5, p. 892 ff.
[4] "Intel Delivers New Architecture for Discovery with Intel® Xeon Phi™ Coprocessors". Newsroom.intel.com. 2012-11-12. Archived from the original on 2013-06-03. Retrieved 2013-06-16.
[5] Erin Farquhar, Philip Bunce, The MIPS Programmer's Handbook, Morgan Kaufmann, 1994, ISBN 1-55860-297-6, appendix A3, p. 330.
[6] "China's Tianhe-2A will Use Proprietary Accelerator and Boast 95 Petaflops Peak". hpcwire.com. 25 September 2017. Archived from the original on 1 December 2020. Retrieved 7 April 2018.