Tesla (microarchitecture)

From Wikipedia, the free encyclopedia

GPU microarchitecture by Nvidia

This article is about the GPU microarchitecture. For GPGPU cards, see Nvidia Tesla.

Nvidia Tesla
History
Release date	November 2006
Fabrication process	90 nm, 80 nm, 65 nm, 55 nm, and 40 nm
Predecessor	Curie
Successor	Fermi
Support status	Unsupported

Tesla is the codename for a GPU microarchitecture developed by Nvidia and released in 2006 as the successor to the Curie microarchitecture. It was named after the pioneering electrical engineer Nikola Tesla.^[1] As Nvidia's first microarchitecture to implement unified shaders, it was used in the GeForce 8, GeForce 9, GeForce 100, GeForce 200, and GeForce 300 series of GPUs, collectively manufactured on 90 nm, 80 nm, 65 nm, 55 nm, and 40 nm processes. It was also used in the GeForce 405, in the Quadro FX, Quadro x000, and Quadro NVS series, and in Nvidia Tesla computing modules.

Tesla replaced the old fixed-pipeline microarchitectures, represented at the time of introduction by the GeForce 7 series. It competed directly with AMD's first unified shader microarchitecture, TeraScale, a development of ATI's work on the Xbox 360, which used a similar design. Tesla was followed by Fermi.

Overview

Tesla is Nvidia's first microarchitecture implementing the unified shader model. The driver supports Direct3D 10 Shader Model 4.0 and OpenGL 2.1 (later drivers added OpenGL 3.3 support). The design is a major shift for NVIDIA in GPU functionality and capability, the most obvious change being the move from the separate functional units (pixel shaders, vertex shaders) of previous GPUs to a homogeneous collection of universal floating-point processors (called "stream processors") that can perform a more general set of tasks.

Die shot of the GT200 GPU found inside NVIDIA GeForce GTX 280 cards, based on the Tesla microarchitecture

GeForce 8's unified shader architecture consists of a number of stream processors (SPs). Unlike the vector processing approach taken with older shader units, each SP is scalar and thus can operate on only one component at a time. This makes them less complex to build while still being quite flexible and universal. Scalar shader units also have the advantage of being more efficient in a number of cases than previous-generation vector shader units, which rely on an ideal instruction mixture and ordering to reach peak throughput. The lower maximum throughput of these scalar processors is compensated for by efficiency and by running them at a high clock speed (made possible by their simplicity). GeForce 8 runs the various parts of its core at differing clock speeds (clock domains), similar to the operation of the previous GeForce 7 series GPUs. For example, the stream processors of the GeForce 8800 GTX operate at a 1.35 GHz clock rate while the rest of the chip operates at 575 MHz.^[2]
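
The efficiency argument for scalar units can be illustrated with a deliberately simplified toy model; it is a sketch, not a cycle-accurate description of any Nvidia hardware. It assumes a hypothetical 4-wide vector ALU that issues one instruction per cycle regardless of how many components the instruction uses, and compares it with four scalar units that are always kept busy by independent work. The function names and the instruction mix are illustrative assumptions.

```python
import math

VECTOR_WIDTH = 4  # assumption: a hypothetical vec4 ALU with x, y, z, w lanes


def vector_unit_cycles(op_widths):
    """Cycles on the vec4 ALU: one instruction per cycle, unused lanes idle."""
    return len(op_widths)


def scalar_units_cycles(op_widths, num_units=VECTOR_WIDTH):
    """Cycles on num_units scalar units, one component per unit per cycle.

    Assumes enough independent work is in flight to keep every unit busy.
    """
    return math.ceil(sum(op_widths) / num_units)


def lane_utilization(op_widths):
    """Fraction of vec4 lanes doing useful work for this instruction mix."""
    return sum(op_widths) / (VECTOR_WIDTH * len(op_widths))


# Illustrative mix: number of components touched by each shader instruction.
mixed = [4, 3, 1, 2, 3, 1, 1, 1]

print(vector_unit_cycles(mixed))   # 8
print(scalar_units_cycles(mixed))  # 4
print(lane_utilization(mixed))     # 0.5
```

In this toy mix only half of the vector lanes do useful work, which is the kind of non-ideal instruction mixture the paragraph above refers to. With purely 4-component instructions the two models take the same number of cycles.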

GeForce 8 performs significantly better texture filtering than its predecessors, which used various optimizations and visual tricks to speed up rendering without impairing filtering quality. The GeForce 8 line correctly renders an angle-independent anisotropic filtering algorithm along with full trilinear texture filtering. G80, though not its smaller brethren, is equipped with much more texture filtering arithmetic ability than the GeForce 7 series. This allows high-quality filtering with a much smaller performance hit than previously.^[2]

NVIDIA has also introduced new polygon edge anti-aliasing methods, including the ability of the GPU's ROPs to perform both multisample anti-aliasing (MSAA) and HDR lighting at the same time, correcting various limitations of previous generations. GeForce 8 can perform MSAA with both FP16 and FP32 texture formats. GeForce 8 supports 128-bit HDR rendering, an increase from prior cards' 64-bit support. The chip's new anti-aliasing technology, called coverage sampling AA (CSAA), uses Z, color, and coverage information to determine final pixel color. This technique of color optimization allows 16X CSAA to look crisp and sharp.^[3]

Performance

The claimed theoretical single-precision processing power for Tesla-based cards, given in FLOPS, may be hard to reach in real-world workloads.^[4]

In G80/G90/GT200, each Streaming Multiprocessor (SM) contains 8 Shader Processors (SP, also called Unified Shader or CUDA Core) and 2 Special Function Units (SFU). Each SP can complete up to two single-precision operations per clock: 1 multiply and 1 add, using a single MAD instruction. Each SFU can complete up to four operations per clock: four MUL (multiply) instructions. One SM as a whole can therefore execute 8 MADs (16 operations) and 8 MULs (8 operations) per clock, or 24 operations per clock, which is 3 times the number of SPs. To calculate the theoretical dual-issue MAD+MUL performance in floating-point operations per second [FLOPS_sp+sfu, GFLOPS] of a graphics card with SP count [n] and shader frequency [f, GHz], the formula is: FLOPS_sp+sfu = 3 × n × f.^[5]^[6]
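
The per-SM arithmetic above can be written out as a short sketch. The constant and function names are illustrative, and the GeForce 8800 GTX stream processor count of 128 is taken from outside this article; the 1.35 GHz shader clock is the figure quoted earlier.

```python
SP_PER_SM = 8      # shader processors per streaming multiprocessor
SFU_PER_SM = 2     # special function units per streaming multiprocessor
OPS_PER_MAD = 2    # one multiply plus one add
MULS_PER_SFU = 4   # MUL operations per SFU per clock


def ops_per_sm_per_clock():
    """Peak single-precision operations one SM can complete per clock."""
    mad_ops = SP_PER_SM * OPS_PER_MAD     # 8 MADs -> 16 operations
    mul_ops = SFU_PER_SM * MULS_PER_SFU   # 8 MULs ->  8 operations
    return mad_ops + mul_ops


def peak_gflops_sp_sfu(n_sp, shader_ghz):
    """Theoretical dual-issue MAD+MUL rate: 3 * n * f, in GFLOPS."""
    ops_per_sp = ops_per_sm_per_clock() / SP_PER_SM  # 24 / 8 = 3
    return ops_per_sp * n_sp * shader_ghz


print(ops_per_sm_per_clock())                    # 24
print(round(peak_gflops_sp_sfu(128, 1.35), 1))   # 518.4
```

This is a theoretical ceiling only: it assumes every SP issues a MAD and every SFU issues four MULs on every clock.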

However, leveraging dual-issue performance such as MAD+MUL is problematic:

Dual-issuing the MUL is not available in graphics mode on G80/G90,^[7] though it was much improved in GT200.^[8]
Not all combinations of instructions like MAD+MUL can be executed in parallel on the SP and SFU, because the SFU is rather specialized as it can only handle a specific subset of instructions: 32-bit floating point multiplication, transcendental functions, interpolation for parameter blending, reciprocal, reciprocal square root, sine, cosine, etc.^[9]
The SFU could become busy for many cycles when executing these instructions, in which case it is unavailable for dual-issuing MUL instructions.^[5]

For these reasons, to estimate the performance of real-world workloads, it may be more helpful to ignore the SFU and to assume only 1 MAD (2 operations) per SP per cycle. In this case, the formula for the theoretical performance in floating-point operations per second becomes: FLOPS_sp = 2 × n × f.
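
As a rough illustration of how far apart the two estimates are, the sketch below evaluates both formulas for the same card. The function names are hypothetical, and the 128 stream processors assumed for the GeForce 8800 GTX come from outside this article.

```python
def gflops_mad_only(n_sp, shader_ghz):
    """Conservative estimate: 1 MAD (2 operations) per SP per clock."""
    return 2 * n_sp * shader_ghz


def gflops_dual_issue(n_sp, shader_ghz):
    """Headline estimate: MAD on every SP plus MULs on the SFUs."""
    return 3 * n_sp * shader_ghz


# Assumed example card: 128 SPs at a 1.35 GHz shader clock.
n, f = 128, 1.35
conservative = gflops_mad_only(n, f)
headline = gflops_dual_issue(n, f)

print(f"{conservative:.1f} GFLOPS MAD-only, {headline:.1f} GFLOPS dual-issue")
print(f"MAD-only share of the headline figure: {conservative / headline:.0%}")
```

Whatever the card, the MAD-only figure is two thirds of the dual-issue figure, since the two formulas differ only in the factor 2 versus 3.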

The theoretical double-precision processing power of a Tesla GPU is 1/8 of the single-precision performance on GT200; there is no double-precision support on G8x and G9x.^[10]
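
The double-precision rule can be sketched as a small lookup. The table and function names are hypothetical, and the article does not say which single-precision figure (MAD-only or dual-issue) the 1/8 ratio applies to, so the sketch leaves that choice to the caller.

```python
# Double-precision to single-precision throughput ratio per chip family.
# Only GT200 has an entry; G8x and G9x have no double-precision support.
DP_RATIO = {"GT200": 1 / 8}


def double_precision_gflops(chip_family, single_precision_gflops):
    """Return the theoretical double-precision GFLOPS, or None if unsupported."""
    ratio = DP_RATIO.get(chip_family)
    if ratio is None:
        return None
    return single_precision_gflops * ratio


# 800.0 is a made-up single-precision figure, not a real card's rating.
print(double_precision_gflops("GT200", 800.0))  # 100.0
print(double_precision_gflops("G80", 800.0))    # None
```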

Video decompression/compression

NVDEC

Main article: Nvidia NVDEC

NVENC

Main article: Nvidia NVENC

NVENC was only introduced in later chips.

Chips

G80
G84
G86
G92
G92B
G94
G94B
G96
G96B
G96C
G98
C77
C78
C79
C7A
C7A-ION
ION
GT200
GT200B
GT215
GT216
GT218
C87
C89

References

^ NVIDIA [@nvidia] (10 July 2017). "Happy Birthday to Nikola Tesla, an inspiring inventor and the namesake of our data center GPUs. He was born in 1856 #OnThisDay" (Tweet). Retrieved 5 April 2023 – via Twitter.
^ ^a ^b Wasson, Scott. NVIDIA's GeForce 8800 graphics processor. Archived 15 July 2007 at the Wayback Machine, Tech Report, 8 November 2007.
^ Sommefeldt, Rys. NVIDIA G80: Image Quality Analysis, Beyond3D, 12 December 2006.
^ "Beyond3D - NVIDIA GT200 GPU and Architecture Analysis". www.beyond3d.com.
^ ^a ^b Anand Lal Shimpi & Derek Wilson. "Derek Gets Technical: 15th Century Loom Technology Makes a Comeback - NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260". Archived from the original on 12 April 2010.
^ Anand Lal Shimpi & Derek Wilson. "G80: A Mile High Overview - NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10". Archived from the original on 23 November 2010.
^ Sommefeldt, Rys. NVIDIA G80: Architecture and GPU Analysis - Page 11, Beyond3D, 8 November 2006.
^ "Technical Brief NVIDIA GeForce GTX 200 GPU Architectural Overview" (PDF). May 2008. p. 15. Retrieved 5 December 2015. "The individual streaming processing cores of GeForce GTX 200 GPUs can now perform near full-speed dual-issue of multiply-add operations (MADs) and MULs (3 flops/SP)"
^ Kanter, David (8 September 2008). "NVIDIA's GT200: Inside a Parallel Processor". Real World Tech. p. 9.
^ Smith, Ryan (17 March 2015). "The NVIDIA GeForce GTX Titan X Review". AnandTech. p. 2. Archived from the original on 19 March 2015.
