Launched | October 12, 2022 |
---|---|
Designed by | Nvidia |
Manufactured by | |
Fabrication process | TSMC 4N |
Codename(s) | AD10x |
Product Series | |
Desktop | |
Professional/workstation | |
Server/datacenter | |
Specifications | |
Clock rate | 735 MHz to 2640 MHz |
L1 cache | 128 KB (per SM) |
L2 cache | 32 MB to 96 MB |
Memory support | |
Memory clock rate | 21-23 Gbit/s |
PCIe support | PCIe 4.0 |
Supported Graphics APIs | |
DirectX | DirectX 12 Ultimate (Feature Level 12_2) |
Direct3D | Direct3D 12 |
Shader Model | Shader Model 6.8 |
OpenCL | OpenCL 3.0 |
OpenGL | OpenGL 4.6 |
CUDA | Compute Capability 8.9 |
Vulkan | Vulkan 1.3 |
Supported Compute APIs | |
CUDA | CUDA Toolkit 11.6 |
DirectCompute | Yes |
Media Engine | |
Encode codecs | |
Decode codecs | |
Color bit-depth | |
Encoder(s) supported | NVENC |
Display outputs | |
History | |
Predecessor | Ampere |
Variant | Hopper (datacenter) |
Successor | Blackwell |
Support status | Supported |
Ada Lovelace, also referred to simply as Lovelace,[1] is a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Ampere architecture, officially announced on September 20, 2022. It is named after the 19th-century English mathematician Ada Lovelace,[2] one of the first computer programmers. Nvidia announced the architecture along with the GeForce RTX 40 series consumer GPUs[3] and the RTX 6000 Ada Generation workstation graphics card.[4] The Lovelace architecture is fabricated on TSMC's custom 4N process, which offers increased efficiency over the Samsung 8 nm and TSMC N7 processes that Nvidia used for its previous-generation Ampere architecture.[5]
The Ada Lovelace architecture follows on from the Ampere architecture, which was released in 2020. Nvidia CEO Jensen Huang announced it during a GTC 2022 keynote on September 20, 2022, with the architecture powering Nvidia's GPUs for gaming, workstations and datacenters.[6]
Architectural improvements of the Ada Lovelace architecture include the following:[7]
128 CUDA cores are included in each SM.
Ada Lovelace features third-generation RT cores. The RTX 4090 features 128 RT cores compared to the 84 in the previous-generation RTX 3090 Ti. These 128 RT cores can provide up to 191 TFLOPS of compute, or 1.49 TFLOPS per RT core.[14] The Lovelace architecture adds a new stage to the ray tracing pipeline called Shader Execution Reordering (SER), which Nvidia claims provides a 2x performance improvement in ray tracing workloads.[6]
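The RT core figures quoted above can be cross-checked with a few lines of arithmetic. This is only an illustrative sketch: the constant names are made up for the example, and the inputs are the numbers stated in the text, not independent measurements.

```python
# Figures quoted in the text for the RTX 4090 and RTX 3090 Ti.
RTX_4090_RT_CORES = 128
RTX_3090_TI_RT_CORES = 84
RTX_4090_RT_TFLOPS = 191.0  # quoted aggregate RT core throughput

# Throughput attributed to a single RT core.
per_core_tflops = RTX_4090_RT_TFLOPS / RTX_4090_RT_CORES

# Relative increase in RT core count over the previous generation.
core_increase_pct = (RTX_4090_RT_CORES / RTX_3090_TI_RT_CORES - 1) * 100

print(f"{per_core_tflops:.2f} TFLOPS per RT core")
print(f"{core_increase_pct:.1f}% more RT cores than the RTX 3090 Ti")
```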
Lovelace's new fourth-generation Tensor cores enable the AI technology used in DLSS 3's frame generation techniques. Much like Ampere, each SM contains 4 Tensor cores but Lovelace contains a greater number of Tensor cores overall given its increased number of SMs.
Clock speeds increase significantly with the Ada Lovelace architecture: the RTX 4090's base clock speed is higher than the boost clock speed of the RTX 3090 Ti.
Model | RTX 2080 Ti | RTX 3090 Ti | RTX 4090 |
---|---|---|---|
Architecture | Turing | Ampere | Ada Lovelace |
Base clock speed (MHz) | 1350 | 1560 | 2235 |
Boost clock speed (MHz) | 1635 | 1860 | 2520 |
Model | RTX 2080 Ti | RTX 3090 Ti | RTX 4090 |
---|---|---|---|
Architecture | Turing | Ampere | Ada Lovelace |
L1 Data Cache | 6.375 MB (96 KB per SM) | 10.5 MB (128 KB per SM) | 16 MB (128 KB per SM) |
L2 Cache | 5.5 MB | 6 MB | 72 MB |
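The L1 totals in the table are the per-SM size multiplied by the number of streaming multiprocessors, so the SM count of each card can be recovered from the two figures. The sketch below assumes binary units (1 MB = 1024 KB); the function and variable names are illustrative only.

```python
# (total L1 data cache in MB, L1 per SM in KB), as listed in the table above.
L1_CACHE = {
    "RTX 2080 Ti": (6.375, 96),
    "RTX 3090 Ti": (10.5, 128),
    "RTX 4090": (16.0, 128),
}


def implied_sm_count(total_mb: float, per_sm_kb: int) -> int:
    """Return the SM count implied by the total and per-SM L1 sizes."""
    return round(total_mb * 1024 / per_sm_kb)


for gpu, (total_mb, per_sm_kb) in L1_CACHE.items():
    print(f"{gpu}: {implied_sm_count(total_mb, per_sm_kb)} SMs")
```

The results for the RTX 3090 Ti and RTX 4090 match the RT core counts quoted earlier (84 and 128), which is consistent with one RT core per SM.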
When fully enabled, the AD102 Lovelace die features 96 MB of L2 cache, a 16x increase from the 6 MB in the Ampere-based GA102 die.[15] Quick access to a large L2 cache benefits complex operations like ray tracing, because fetching the same data from the slower GDDR video memory takes longer. Since important and frequently accessed data can stay in the cache rather than in video memory, a narrower memory bus can be paired with a large L2 cache.
Each memory controller uses a 32-bit connection, with up to 12 controllers present for a combined memory bus width of 384 bits. The Lovelace architecture can use either GDDR6 or GDDR6X memory. GDDR6X memory features on the desktop GeForce RTX 40 series, while the more energy-efficient GDDR6 memory is used on its corresponding mobile versions and on RTX 6000 Ada Generation workstation GPUs.
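The relationship between controller count, bus width and theoretical peak bandwidth can be sketched as follows. The function names are hypothetical, and the sketch assumes that the 21 to 23 Gbit/s memory rate listed in the infobox is an effective per-pin data rate; real-world throughput will be lower than this theoretical peak.

```python
CONTROLLER_WIDTH_BITS = 32  # width of one memory controller, from the text


def bus_width_bits(controllers: int) -> int:
    """Combined memory bus width for a given number of 32-bit controllers."""
    return controllers * CONTROLLER_WIDTH_BITS


def peak_bandwidth_gb_s(controllers: int, gbit_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: bus width x per-pin rate / 8 bits per byte."""
    return bus_width_bits(controllers) * gbit_per_pin / 8


print(bus_width_bits(12))             # full configuration with 12 controllers
print(peak_bandwidth_gb_s(12, 21.0))  # assumed 21 Gbit/s per pin
print(peak_bandwidth_gb_s(12, 23.0))  # assumed 23 Gbit/s per pin
```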
The Ada Lovelace architecture is able to use lower voltages compared to its predecessor.[6] Nvidia claims a 2x performance increase for the RTX 4090 at the same 450 W used by the previous-generation flagship RTX 3090 Ti.[16]
Increased power efficiency can be attributed in part to the smaller fabrication node used by the Lovelace architecture. The Ada Lovelace architecture is fabricated on TSMC's cutting-edge 4N process, a process node custom-designed for Nvidia. The previous-generation Ampere architecture used Samsung's 8 nm-based 8N process node from 2018, which was two years old by the time of Ampere's launch.[17][18] The AD102 die, with its 76.3 billion transistors, has a transistor density of 125.3 million per mm², a 178% increase in density from GA102's 45.1 million per mm².
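The density comparison can be reproduced from the transistor count above and the 609 mm² AD102 die size listed in the die table. This is a rough sketch with illustrative names; small differences from quoted figures can come from rounding of the die area.

```python
def density_mtr_per_mm2(transistors_billion: float, die_area_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_billion * 1000 / die_area_mm2


# AD102: 76.3 billion transistors on a 609 mm² die (figures from the article).
ad102_density = density_mtr_per_mm2(76.3, 609)

# GA102 density as quoted in the text.
GA102_DENSITY = 45.1

# Relative increase in density from GA102 to AD102.
increase_pct = (ad102_density / GA102_DENSITY - 1) * 100

print(f"AD102: {ad102_density:.1f} MTr/mm², {increase_pct:.0f}% higher than GA102")
```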
The Lovelace architecture uses the new 8th-generation Nvidia NVENC video encoder, while the 7th-generation NVDEC video decoder introduced with Ampere returns.[19]
NVENC AV1 hardware encoding with support for up to 8K resolution at 60 FPS in 10-bit color is added, enabling higher video fidelity at lower bit rates compared to the H.264 and H.265 codecs.[20] Nvidia claims that its NVENC AV1 encoder featured in the Lovelace architecture is 40% more efficient than the H.264 encoder in the Ampere architecture.[21]
The Lovelace architecture received criticism for not supporting the DisplayPort 2.0 connection, which offers higher display data bandwidth, and for instead using the older DisplayPort 1.4a, which is limited to a peak bandwidth of 32 Gbit/s.[22] As a result, Lovelace GPUs are limited to the refresh rates that DisplayPort 1.4a supports, even when the GPU can render higher frame rates. Intel's Arc GPUs, which were also released in October 2022, included DisplayPort 2.0. AMD's competing RDNA 3 architecture, released just two months after Lovelace, included DisplayPort 2.1.[23]
Die[24] | AD102[25] | AD103[26] | AD104[27] | AD106[28] | AD107[29] |
---|---|---|---|---|---|
Die size | 609 mm² | 379 mm² | 294 mm² | 188 mm² | 159 mm² |
Transistors | 76.3B | 45.9B | 35.8B | 22.9B | 18.9B |
Transistor density | 125.3 MTr/mm² | 121.1 MTr/mm² | 121.8 MTr/mm² | 121.8 MTr/mm² | 118.9 MTr/mm² |
Graphics processing clusters | 12 | 7 | 5 | 3 | 2 |
Streaming multiprocessors | 144 | 80 | 60 | 36 | 24 |
CUDA cores | 18432 | 10240 | 7680 | 4608 | 3072 |
Texture mapping units | 576 | 320 | 240 | 144 | 96 |
Render output units | 192 | 112 | 80 | 48 | 32 |
Tensor cores | 576 | 320 | 240 | 144 | 96 |
RT cores | 144 | 80 | 60 | 36 | 24 |
L1 cache (128 KB per SM) | 18 MB | 10 MB | 7.5 MB | 4.5 MB | 3 MB |
L2 cache | 96 MB | 64 MB | 48 MB | 32 MB | 32 MB |
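Most rows in the die table scale linearly with the streaming multiprocessor count. The sketch below rebuilds those rows from per-SM figures: 128 CUDA cores, 4 Tensor cores and 128 KB of L1 per SM are stated in the article, while the 1 RT core and 4 texture mapping units per SM are inferred from the table itself and should be treated as assumptions. All names are illustrative.

```python
# Per-SM resources. The RT core and TMU ratios are inferred from the table, not stated.
PER_SM = {"cuda_cores": 128, "tensor_cores": 4, "rt_cores": 1, "tmus": 4}
L1_KB_PER_SM = 128

# Streaming multiprocessor counts of each fully enabled die, from the table.
SM_COUNTS = {"AD102": 144, "AD103": 80, "AD104": 60, "AD106": 36, "AD107": 24}


def derive_die_totals(sm_count: int) -> dict:
    """Scale the per-SM figures up to a whole die."""
    totals = {name: per_sm * sm_count for name, per_sm in PER_SM.items()}
    totals["l1_cache_mb"] = sm_count * L1_KB_PER_SM / 1024
    return totals


for die, sms in SM_COUNTS.items():
    print(die, derive_die_totals(sms))
```

Render output units and L2 cache are left out because they do not follow a fixed per-SM ratio in the table.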
Type | AD107 | AD106 | AD104 | AD103 | AD102 |
---|---|---|---|---|---|
GeForce 40 Series (Desktop) | GeForce RTX 4060 | GeForce RTX 4060 Ti | GeForce RTX 4070, GeForce RTX 4070 Super, GeForce RTX 4070 Ti | GeForce RTX 4070 Ti Super, GeForce RTX 4080, GeForce RTX 4080 Super | GeForce RTX 4090 D, GeForce RTX 4090 |
GeForce 40 Series (Mobile) | GeForce RTX 4050, GeForce RTX 4060 | GeForce RTX 4070 | GeForce RTX 4080 | GeForce RTX 4090 | — |
Nvidia Workstation GPUs (Desktop) | RTX 2000 Ada Generation | — | RTX 4000 Ada Generation, RTX 4000 SFF Ada Generation, RTX 4500 Ada Generation | — | RTX 5000 Ada Generation, RTX 5880 Ada Generation, RTX 6000 Ada Generation |
Nvidia Workstation GPUs (Mobile) | RTX 500 Ada Generation, RTX 1000 Ada Generation | RTX 3000 Ada Generation | RTX 3500 Ada Generation, RTX 4000 Ada Generation | RTX 5000 Ada Generation | — |
Nvidia Data Center GPUs | — | — | Nvidia L4[30] | — | Nvidia L40, Nvidia L40G, Nvidia L40CNX |