Arcade system boards have used specialized graphics circuits since the 1970s. In early video game hardware, RAM for frame buffers was expensive, so video chips composited data together as the display was being scanned out on the monitor.[1]
In 1984, Hitachi released the ARTC HD63484, the first major CMOS graphics processor for personal computers. The ARTC could display up to 4K resolution in monochrome mode. It was used in a number of graphics cards and terminals during the late 1980s.[13] In 1985, the Amiga was released with a custom graphics chip including a blitter for bitmap manipulation, line drawing, and area fill. It also included a coprocessor with its own simple instruction set that was capable of manipulating graphics hardware registers in sync with the video beam (e.g. for per-scanline palette switches, sprite multiplexing, and hardware windowing), or of driving the blitter. In 1986, Texas Instruments released the TMS34010, the first fully programmable graphics processor.[14] It could run general-purpose code but also had a graphics-oriented instruction set. During 1990–1992, this chip became the basis of the Texas Instruments Graphics Architecture ("TIGA") Windows accelerator cards.
The IBM 8514 Micro Channel adapter, with memory add-on
In 1987, the IBM 8514 graphics system was released. It was one of the first video cards for IBM PC compatibles that implemented fixed-function 2D primitives in electronic hardware. Sharp's X68000, released in 1987, used a custom graphics chipset[15] with a 65,536 color palette and hardware support for sprites, scrolling, and multiple playfields.[16] It served as a development machine for Capcom's CP System arcade board. Fujitsu's FM Towns computer, released in 1989, had support for a 16,777,216 color palette.[17] In 1988, the first dedicated polygonal 3D graphics boards were introduced in arcades with the Namco System 21[18] and Taito Air System.[19]
In 1991, S3 Graphics introduced the S3 86C911, which its designers named after the Porsche 911 as an indication of the performance increase it promised.[21] The 86C911 spawned a variety of imitators: by 1995, all major PC graphics chip makers had added 2D acceleration support to their chips.[22] Fixed-function Windows accelerators surpassed expensive general-purpose graphics coprocessors in Windows performance, and such coprocessors faded from the PC market.
In October 2002, with the introduction of the ATI Radeon 9700 (also known as R300), the world's first Direct3D 9.0 accelerator, pixel and vertex shaders could implement looping and lengthy floating point math, and were quickly becoming as flexible as CPUs, yet orders of magnitude faster for image-array operations. Pixel shading is often used for bump mapping, which adds texture to make an object look shiny, dull, rough, or even round or extruded.[32]
With the introduction of the Nvidia GeForce 8 series and new generic stream processing units, GPUs became more generalized computing devices. Parallel GPUs are making computational inroads against the CPU, and a subfield of research, dubbed GPU computing or GPGPU for general purpose computing on GPU, has found applications in fields as diverse as machine learning,[33] oil exploration, scientific image processing, linear algebra,[34] statistics,[35] 3D reconstruction, and stock options pricing. GPGPU techniques were the precursor to what is now called a compute shader (e.g. CUDA, OpenCL, DirectCompute), and to a degree they abused the hardware: data was passed to algorithms as texture maps, and algorithms were executed by drawing a triangle or quad with an appropriate pixel shader. This entails some overhead, since fixed-function units such as the scan converter are involved where they are not needed, and the triangle itself matters only as a means of invoking the pixel shader.
Nvidia's CUDA platform, first introduced in 2007,[36] was the earliest widely adopted programming model for GPU computing. OpenCL is an open standard defined by the Khronos Group that allows for the development of code for both GPUs and CPUs with an emphasis on portability.[37] OpenCL solutions are supported by Intel, AMD, Nvidia, and ARM, and according to a 2011 report by Evans Data, OpenCL had become the second most popular HPC tool.[38]
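For illustration only (this sketch is not drawn from the cited sources; the array size and launch configuration are arbitrary), a minimal CUDA program shows the compute-shader style these models enable: instead of drawing a quad and repurposing a pixel shader, the same scalar function is launched across thousands of threads, one per data element.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread adds one pair of elements; the grid of blocks covers the whole array.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;                      // one million elements (illustrative size)
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);               // unified memory keeps the sketch short
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;   // enough blocks to cover all n elements
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);                // expected output: 3.000000
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

An equivalent kernel can be written in OpenCL or DirectCompute; the common idea is that data-parallel work is expressed directly rather than routed through the graphics pipeline.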
In 2010, Nvidia partnered with Audi to power their cars' dashboards, using the Tegra GPU to provide increased functionality to cars' navigation and entertainment systems.[39] Advances in GPU technology in cars helped advance self-driving technology.[40] AMD's Radeon HD 6000 series cards were released in 2010, and in 2011 AMD released its 6000M Series discrete GPUs for mobile devices.[41] The Kepler line of graphics cards by Nvidia was released in 2012 and was used in the Nvidia 600 and 700 series cards. This GPU microarchitecture included GPU Boost, a technology that adjusts the clock speed of a video card up or down according to its power draw.[42] Kepler also introduced the NVENC video encoding acceleration technology.
The PS4 and Xbox One were released in 2013; they both used GPUs based on AMD's Radeon HD 7850 and 7790.[43] Nvidia's Kepler line of GPUs was followed by the Maxwell line, manufactured on the same process. Nvidia's 28 nm chips were manufactured by TSMC in Taiwan using the 28 nm process. Compared to the previous 40 nm technology, this manufacturing process allowed a 20 percent boost in performance while drawing less power.[44][45] Virtual reality headsets have high system requirements; manufacturers recommended the GTX 970 and the R9 290X or better at the time of their release.[46][47] Cards based on the Pascal microarchitecture were released in 2016. The GeForce 10 series of cards belongs to this generation of graphics cards. They are made using the 16 nm manufacturing process, which improves upon previous microarchitectures.[48]
In 2018, Nvidia launched the RTX 20 series GPUs, which added ray tracing cores to GPUs, improving their performance on lighting effects.[49] Polaris 11 and Polaris 10 GPUs from AMD are fabricated by a 14 nm process. Their release resulted in a substantial increase in the performance per watt of AMD video cards.[50] AMD also released the Vega GPU series for the high end market as a competitor to Nvidia's high end Pascal cards, also featuring HBM2 like the Titan V.
In 2019, AMD released the successor to their Graphics Core Next (GCN) microarchitecture/instruction set. Dubbed RDNA, the first product featuring it was the Radeon RX 5000 series of video cards.[51] The company announced that the successor to the RDNA microarchitecture would be incremental (a "refresh"). AMD unveiled the Radeon RX 6000 series, its RDNA 2 graphics cards with support for hardware-accelerated ray tracing.[52] The product series, launched in late 2020, consisted of the RX 6800, RX 6800 XT, and RX 6900 XT.[53][54] The RX 6700 XT, which is based on Navi 22, was launched in early 2021.[55]
The PlayStation 5 and Xbox Series X and Series S were released in 2020; they both use GPUs based on the RDNA 2 microarchitecture with incremental improvements and different GPU configurations in each system's implementation.[56][57][58]
In the 2020s, GPUs have been increasingly used for calculations involving embarrassingly parallel problems, such as the training of neural networks on the enormous datasets needed for artificial-intelligence large language models. Specialized processing cores on some modern workstation GPUs are dedicated to deep learning, since they offer significant FLOPS performance increases by performing 4×4 matrix multiply-accumulate operations in hardware, resulting in performance of up to 128 TFLOPS in some applications.[59] These tensor cores have since appeared in consumer cards as well.[60]
Many companies have produced GPUs under a number of brand names. In 2009, Intel, Nvidia, and AMD/ATI were the market share leaders, with 49.4%, 27.8%, and 20.6% market share respectively. In addition, Matrox[61] produces GPUs. Chinese companies such as Jingjia Micro have also produced GPUs for the domestic market, although in terms of worldwide sales they lag behind the market leaders.[62]
Several factors of GPU construction affect the performance of the card for real-time rendering, such as the size of the connector pathways in the semiconductor device fabrication, the clock signal frequency, and the number and size of various on-chip memory caches. Performance is also affected by the number of streaming multiprocessors (SMs) for Nvidia GPUs, compute units (CUs) for AMD GPUs, or Xe cores for Intel discrete GPUs, which describe the number of on-silicon processor core units within the GPU chip that perform the core calculations, typically working in parallel with the other SMs/CUs on the GPU. GPU performance is typically measured in floating point operations per second (FLOPS); GPUs in the 2010s and 2020s typically deliver performance measured in teraflops (TFLOPS). This is an estimated performance measure, as other factors can affect the actual display rate.[63]
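As a rough worked example (the figures here are hypothetical, not taken from the cited sources), peak FLOPS is commonly estimated from the number of shader cores and the clock frequency, counting a fused multiply-add as two floating-point operations:

    peak FLOPS ≈ number of shader cores × clock frequency × 2

A hypothetical GPU with 2,560 shader cores running at 1.7 GHz would therefore have a theoretical peak of about 2,560 × 1.7 × 10^9 × 2 ≈ 8.7 TFLOPS; sustained throughput in real workloads is usually lower because of memory and scheduling limits.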
The ATI HD5470 GPU (above, with copper heatpipe attached) features UVD 2.1, which enables it to decode the AVC and VC-1 video formats.
In the 1970s, the term "GPU" originally stood for graphics processor unit and described a programmable processing unit working independently from the CPU that was responsible for graphics manipulation and output.[65][66] In 1994, Sony used the term (now standing for graphics processing unit) in reference to the PlayStation console's Toshiba-designed Sony GPU.[31] The term was popularized by Nvidia in 1999, which marketed the GeForce 256 as "the world's first GPU".[67] It was presented as a "single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines".[68] Rival ATI Technologies coined the term "visual processing unit" or VPU with the release of the Radeon 9700 in 2002.[69] The AMD Alveo MA35D, released in 2023, features dual VPUs, each built on the 5 nm process.[70]
In personal computers, there are two main forms of GPUs. Each has many synonyms:[71]
Dedicated graphics processing units use RAM that is dedicated to the GPU rather than relying on the computer's main system memory. This RAM is usually specially selected for the expected serial workload of the graphics card (see GDDR). Sometimes systems with dedicated discrete GPUs were called "DIS" systems, as opposed to "UMA" systems (see next section).[72]
Technologies such as Scan-Line Interleave by 3dfx, SLI and NVLink by Nvidia, and CrossFire by AMD allow multiple GPUs to draw images simultaneously for a single screen, increasing the processing power available for graphics. These technologies, however, are increasingly uncommon; most games do not fully use multiple GPUs, as most users cannot afford them.[73][74][75] Multiple GPUs are still used on supercomputers (such as Summit), on workstations to accelerate video (processing multiple videos at once)[76][77][78] and 3D rendering,[79] for VFX,[80] for GPGPU workloads and simulations,[81] and in AI to expedite training, as is the case with Nvidia's lineup of DGX workstations and servers, Tesla GPUs, and Intel's Ponte Vecchio GPUs.
The position of an integrated GPU in a northbridge/southbridge system layout
An ASRock motherboard with integrated graphics, which has HDMI, VGA, and DVI-out ports
Integrated graphics processing units (IGPU), integrated graphics, shared graphics solutions, integrated graphics processors (IGP), or unified memory architectures (UMA) use a portion of a computer's system RAM rather than dedicated graphics memory. IGPs can be integrated onto a motherboard as part of its northbridge chipset,[82] or on the same die (integrated circuit) with the CPU (like AMD APU or Intel HD Graphics). On certain motherboards,[83] AMD's IGPs can use dedicated sideport memory: a separate fixed block of high performance memory that is dedicated for use by the GPU. As of early 2007, computers with integrated graphics accounted for about 90% of all PC shipments.[84] They are less costly to implement than dedicated graphics processing, but tend to be less capable. Historically, integrated processing was considered unfit for 3D games or graphically intensive programs but could run less intensive programs such as Adobe Flash. Examples of such IGPs would be offerings from SiS and VIA circa 2004.[85] However, modern integrated graphics processors such as the AMD Accelerated Processing Unit and Intel Graphics Technology (HD, UHD, Iris, Iris Pro, Iris Plus, and Xe-LP) can handle 2D graphics or low-stress 3D graphics.
Since GPU computations are memory-intensive, integrated processing may compete with the CPU for relatively slow system RAM, as it has minimal or no dedicated video memory. IGPs use system memory with bandwidth up to a current maximum of 128 GB/s, whereas a discrete graphics card may have a bandwidth[86] of more than 1000 GB/s between its VRAM and GPU core. This memory bus bandwidth can limit the performance of the GPU, though multi-channel memory can mitigate this deficiency.[87] Older integrated graphics chipsets lacked hardware transform and lighting, but newer ones include it.[88][89]
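As a rough worked example (the configurations below are representative assumptions, not figures from the cited sources), memory bandwidth follows from the width of the memory interface and the effective transfer rate per pin:

    bandwidth ≈ (interface width in bits ÷ 8) × effective transfer rate per pin

An integrated GPU sharing dual-channel DDR4-3200 system memory (a 128-bit combined interface at 3.2 GT/s) gets roughly 16 bytes × 3.2 × 10^9 /s ≈ 51 GB/s, while a discrete card with a 256-bit GDDR6 interface at 14 Gbit/s per pin reaches about 32 bytes × 14 × 10^9 /s ≈ 448 GB/s, illustrating the gap described above.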
On systems with "Unified Memory Architecture" (UMA), including modern AMD processors with integrated graphics,[90] modern Intel processors with integrated graphics,[91] Apple processors, the PS5 and Xbox Series (among others), the CPU cores and the GPU block share the same pool of RAM and memory address space.
Stream processing and general purpose GPUs (GPGPU)
It is common to use a general purpose graphics processing unit (GPGPU) as a modified form of stream processor (or a vector processor), running compute kernels. This turns the massive computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power. In certain applications requiring massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. The two largest discrete (see "Dedicated graphics processing unit" above) GPU designers, AMD and Nvidia, are pursuing this approach with an array of applications. Both Nvidia and AMD teamed with Stanford University to create a GPU-based client for the Folding@home distributed computing project for protein folding calculations. In certain circumstances, the GPU calculates forty times faster than the CPUs traditionally used by such applications.[92][93]
GPU-based high performance computers play a significant role in large-scale modelling. Three of the ten most powerful supercomputers in the world take advantage of GPU acceleration.[94]
Since 2005 there has been interest in using the performance offered by GPUs for evolutionary computation in general, and for accelerating the fitness evaluation in genetic programming in particular. Most approaches compile linear or tree programs on the host PC and transfer the executable to the GPU to be run. Typically a performance advantage is only obtained by running the single active program simultaneously on many example problems in parallel, using the GPU's SIMD architecture.[95] However, substantial acceleration can also be obtained by not compiling the programs, and instead transferring them to the GPU to be interpreted there.[96]
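As a hedged illustration of the compile-and-run approach (the kernel below is a sketch, not taken from the cited work; the candidate function, data, and names are invented for exposition), a single candidate program can be evaluated over many fitness cases at once, one case per thread:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical candidate program f(x) = x*x + 2*x, standing in for code that a
    // genetic-programming system would generate and compile on the host.
    __device__ float candidate(float x) {
        return x * x + 2.0f * x;
    }

    // One thread per fitness case: evaluate the candidate on that case's input and
    // record the squared error against the target output.
    __global__ void evalFitness(const float* inputs, const float* targets,
                                float* errors, int nCases) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < nCases) {
            float diff = candidate(inputs[i]) - targets[i];
            errors[i] = diff * diff;
        }
    }

    int main() {
        const int nCases = 4096;                    // illustrative number of fitness cases
        size_t bytes = nCases * sizeof(float);
        float *inputs, *targets, *errors;
        cudaMallocManaged(&inputs, bytes);
        cudaMallocManaged(&targets, bytes);
        cudaMallocManaged(&errors, bytes);
        for (int i = 0; i < nCases; ++i) {
            inputs[i] = (float)i / nCases;
            targets[i] = inputs[i] * inputs[i];     // toy target function for the example
        }
        int threads = 256;
        int blocks = (nCases + threads - 1) / threads;
        evalFitness<<<blocks, threads>>>(inputs, targets, errors, nCases);
        cudaDeviceSynchronize();
        printf("error[0] = %f\n", errors[0]);       // per-case errors would then be summed
        cudaFree(inputs); cudaFree(targets); cudaFree(errors);
        return 0;
    }

In practice the per-case errors are reduced to a single fitness value, and the interpreted variant replaces the compiled candidate function with a small on-device interpreter that walks the program's instructions for each case.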
Therefore, it is desirable to attach a GPU to some external bus of a notebook. PCI Express is the only bus used for this purpose. The port may be, for example, an ExpressCard or mPCIe port (PCIe ×1, up to 5 or 2.5 Gbit/s respectively), a Thunderbolt 1, 2, or 3 port (PCIe ×4, up to 10, 20, or 40 Gbit/s respectively), a USB4 port with Thunderbolt compatibility, or an OCuLink port. Those ports are only available on certain notebook systems.[97] eGPU enclosures include their own power supply (PSU), because powerful GPUs can consume hundreds of watts.[98]
Graphics processing units (GPUs) have continued to increase in energy usage, while CPU designers have increasingly focused on improving performance per watt. High performance GPUs may draw large amounts of power, so intelligent techniques are required to manage GPU power consumption. Measures like the 3DMark 2006 score per watt can help identify more efficient GPUs.[99] However, that may not adequately capture efficiency in typical use, where much time is spent doing less demanding tasks.[100]
With modern GPUs, energy usage is an important constraint on the maximum computational capabilities that can be achieved. GPU designs are usually highly scalable, allowing the manufacturer to put multiple chips on the same video card, or to use multiple video cards that work in parallel. Peak performance of any system is essentially limited by the amount of power it can draw and the amount of heat it can dissipate. Consequently, performance per watt of a GPU design translates directly into peak performance of a system that uses that design.
Since GPUs may also be used for some general purpose computation, sometimes their performance is measured in terms also applied to CPUs, such as FLOPS per watt.
In 2013, 438.3 million GPUs were shipped globally, and the forecast for 2014 was 414.2 million. However, by the third quarter of 2022, shipments of PC GPUs totaled around 75.5 million units, down 19% year-over-year.[101][102]
Raina, Rajat; Madhavan, Anand; Ng, Andrew Y. (June 2009). "Large-scale deep unsupervised learning using graphics processors". Proceedings of the 26th Annual International Conference on Machine Learning – ICML '09. pp. 1–8. doi:10.1145/1553374.1553486. ISBN 9781605585161. S2CID 392458.
Barron, E. T.; Glorioso, R. M. (September 1973). "A micro controlled peripheral processor". Conference Record of the 6th Annual Workshop on Microprogramming – MICRO 6. pp. 122–128. doi:10.1145/800203.806247. ISBN 9781450377836. S2CID 36942876.
Garcia, V.; Debreuve, E.; Barlaud, M. (June 2008). "Fast k nearest neighbor search using GPU". Proceedings of the CVPR Workshop on Computer Vision on GPU, Anchorage, Alaska, USA.