CDNA (microarchitecture)

From Wikipedia, the free encyclopedia
AMD compute-focused GPU microarchitecture

AMD CDNA
Release date: November 16, 2020
Designed by: AMD
Fabrication process: TSMC N7, N6, N5
History
Predecessor: AMD FirePro
Variant: RDNA (consumer, professional)

CDNA (Compute DNA) is a compute-centered graphics processing unit (GPU) microarchitecture designed by AMD for datacenters. Mostly used in the AMD Instinct line of data center graphics cards, CDNA is a successor to the Graphics Core Next (GCN) microarchitecture; the other successor being RDNA (Radeon DNA), a consumer-graphics-focused microarchitecture.

The first generation of CDNA was announced on March 5, 2020,[2] and was featured in the AMD Instinct MI100, launched November 16, 2020.[3] The MI100 is the only CDNA 1 product produced, manufactured on TSMC's N7 FinFET process.

The second iteration of the CDNA line implemented a multi-chip module (MCM) approach, differing from its predecessor's monolithic approach. Featured in the AMD Instinct MI250X and MI250, this MCM design used an elevated fanout bridge (EFB)[4] to connect the dies. These two products were announced November 8, 2021, and launched November 11, 2021. The CDNA 2 line includes an additional latecomer using a monolithic design, the MI210.[5] The MI250X and MI250 were the first AMD products to use the Open Compute Project (OCP)'s OCP Accelerator Module (OAM) socket form factor. Lower-wattage PCIe versions are available.

The third iteration of CDNA switches to an MCM design utilizing different chiplets manufactured on multiple nodes. The series, currently consisting of the MI300X and MI300A, contains 15 unique dies connected with advanced 3D packaging techniques. The MI300 series was announced on January 5, 2023, and launched in H2 2023.

CDNA 1

AMD CDNA 1
Release date: November 16, 2020
Fabrication process: TSMC N7 (FinFET)
History
Predecessor: AMD FirePro
Successor: CDNA 2

The CDNA family consists of one die, named Arcturus. The die is 750 square millimetres, contains 25.6 billion transistors and is manufactured on TSMC's N7 node.[6] The Arcturus die possesses 120 compute units and a 4096-bit memory bus, connected to four HBM2 placements, giving the die 32 GB of memory and just over 1200 GB/s of memory bandwidth. Compared to its predecessor, CDNA has removed all hardware related to graphics acceleration. This removal includes but is not limited to: graphics caches, tessellation hardware, render output units (ROPs), and the display engine. CDNA retains the VCN media engine for HEVC, H.264, and VP9 decoding.[7] CDNA has also added dedicated matrix compute hardware, similar to that added in Nvidia's Volta architecture.

Architecture


The 120 compute units (CUs) are organized into 4 asynchronous compute engines (ACEs), each ACE maintaining its own independent command execution and dispatch. At the CU level, CDNA compute units are organized similarly to GCN units. Each CU contains four SIMD16 units, each executing a 64-thread wavefront (Wave64) over four cycles.
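The issue pattern described above can be sketched as a small model (illustrative only, not AMD code): a 64-thread wavefront mapped onto a 16-lane SIMD unit needs four cycles per instruction.

```python
# Illustrative sketch (not AMD code): a Wave64 wavefront issued on a
# 16-lane SIMD16 unit takes 64 / 16 = 4 cycles per instruction.
WAVEFRONT_SIZE = 64  # threads per wavefront (Wave64)
SIMD_LANES = 16      # lanes in one SIMD16 unit

def cycles_per_instruction(wavefront: int = WAVEFRONT_SIZE,
                           lanes: int = SIMD_LANES) -> int:
    """Cycles for one SIMD unit to issue one instruction for every thread."""
    return wavefront // lanes

print(cycles_per_instruction())  # 4
```

This four-cycle cadence is the same execution model GCN used, which is why the text describes the CUs as organized similarly to GCN units.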

Memory system


CDNA has a 20% clock bump for the HBM, resulting in a roughly 200 GB/s bandwidth increase vs. Vega 20 (GCN 5.0). The die has a shared 4 MB L2 cache that puts out 2 KB per clock to the CUs. At the CU level, each CU has its own L1 cache and a local data store (LDS) with 64 KB per CU; a 4 KB global data store (GDS) is shared by all CUs. This GDS can be used to store control data, perform reduction operations, or act as a small global shared surface.[7][8]
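The bandwidth uplift follows directly from bus width and transfer rate. A quick check, using the 4096-bit bus and the widely reported 2000 vs. 2400 MT/s HBM2 rates (the specific MT/s values are taken from the product tables, not this paragraph):

```python
def hbm_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Peak bandwidth in GB/s: (bus width in bytes) x (transfers per second)."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000

vega20 = hbm_bandwidth_gbs(4096, 2000)  # 1024.0 GB/s (GCN 5.0 baseline)
mi100 = hbm_bandwidth_gbs(4096, 2400)   # 1228.8 GB/s, "just over 1200 GB/s"
print(mi100 - vega20)                   # ~205 GB/s from the 20% clock bump
```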

Experimental PIM implementation

In October 2022, Samsung demonstrated a Processing-in-Memory (PIM) specialized version of the MI100. In December 2022, Samsung demonstrated a cluster of 96 modified MI100s, claiming large increases in processing throughput for various workloads and a significant reduction in power consumption.[9]

Changes from GCN


The individual compute units remain highly similar to GCN's, but with the addition of 4 matrix units per CU. Support for more datatypes was added, including BF16, INT8, and INT4.[7] For an extensive list of operations utilizing the matrix units and new datatypes, reference the CDNA ISA Reference Guide.

Products

AMD Instinct MI100 (Arcturus):[10][11]
- Released: Nov 16, 2020
- Architecture & fab: CDNA, TSMC N7
- Transistors & die size: 25.6×10⁹, 750 mm²
- Core config:[c] 7680:480:- (120 CU)
- Core clock:[a] 1000 MHz base (1502 MHz boost)
- Texture fillrate:[d] 480 GT/s (720.96 GT/s); pixel fillrate:[e] - (no ROPs)
- Vector processing power (TFLOPS):[a][b] FP16 ?, FP32 15.72 (23.10), FP64 7.86 (11.5)
- Matrix processing power:[a][b] INT8 122.88 (184.57) TOPS; BF16 61.44 (92.28), FP16 122.88 (184.57), FP32 30.72 (46.14), FP64 15.36 (23.07) TFLOPS
- Memory: 32 GB HBM2, 4096-bit bus, 2400 MT/s, 1228 GB/s
- TBP: 300 W
- Software interface: PCIe 4.0 ×16; physical interface: PCIe ×16

  1. ^ [a] Boost values (if available) are stated in parentheses after the base value.
  2. ^ [b] Precision performance is calculated from the base (or boost) core clock speed based on an FMA operation.
  3. ^ [c] Unified shaders : Texture mapping units : Render output units, and Compute units (CU).
  4. ^ [d] Texture fillrate is calculated as the number of texture mapping units multiplied by the base (or boost) core clock speed.
  5. ^ [e] Pixel fillrate is calculated as the number of render output units multiplied by the base (or boost) core clock speed.
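The FMA convention in footnote [b] (one fused multiply-add, i.e. two FLOPs, per shader per clock) can be checked against the MI100's 7680 shaders. This is a sanity-check sketch; it lands within rounding of AMD's quoted 23.1 TFLOPS peak FP32 figure at the 1502 MHz boost clock.

```python
def vector_tflops(shaders: int, clock_mhz: int, flops_per_clock: int = 2) -> float:
    """Peak vector TFLOPS assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * flops_per_clock * clock_mhz / 1e6

print(round(vector_tflops(7680, 1000), 2))  # 15.36 TFLOPS FP32 at base clock
print(round(vector_tflops(7680, 1502), 2))  # 23.07 TFLOPS FP32 at boost clock
```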

CDNA 2

AMD CDNA 2
Release date: November 8, 2021
Fabrication process: TSMC N6
History
Predecessor: CDNA 1
Successor: CDNA 3

Like CDNA, CDNA 2 also consists of one die, named Aldebaran. This die is estimated to be 790 square millimetres and contains 28 billion transistors while being manufactured on TSMC's N6 node.[12] The Aldebaran die contains only 112 compute units, a 6.67% decrease from Arcturus. Like the previous generation, this die contains a 4096-bit memory bus, now using HBM2e with a doubling in capacity, up to 64 GB. The largest change in CDNA 2 is the ability for two dies to be placed on the same package. The MI250X consists of 2 Aldebaran dies, 220 CUs (110 per die) and 128 GB of HBM2e. These dies are connected with 4 Infinity Fabric links and addressed as independent GPUs by the host system.[13]

Architecture


The 112 CUs are organized similarly to CDNA, into 4 asynchronous compute engines, each with 28 CUs instead of the prior generation's 30. Like CDNA, each CU contains four SIMD16 units executing a 64-thread wavefront across 4 cycles. The 4 matrix engines and vector units have added support for full-rate FP64, enabling a significant uplift over the prior generation.[14] CDNA 2 also revises multiple internal caches, doubling bandwidth across the board.

Memory system


The memory system in CDNA 2 sports across-the-board improvements, starting with the move to HBM2e, which doubles capacity to 64 GB and increases bandwidth by roughly one third (from ~1200 GB/s to ~1600 GB/s).[13] At the cache level, each GCD has a 16-way, 8 MB L2 cache that is partitioned into 32 slices. This cache puts out 4 KB per clock (128 B per clock per slice), a doubling of the bandwidth from CDNA.[13] Additionally, the 4 KB global data store was removed.[14] All caches, including the L2 and LDS, have added support for FP64 data.
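These figures are internally consistent, as a short sketch shows (the 3200 MT/s HBM2e rate is taken from the product table below; everything else comes from the paragraph above):

```python
# Per-die HBM2e: 4096-bit bus at 3200 MT/s
hbm2e_gbs = 4096 / 8 * 3200 / 1000
print(hbm2e_gbs)  # 1638.4 GB/s, i.e. the ~1600 GB/s quoted

# L2: 32 slices x 128 B per slice per clock
l2_bytes_per_clock = 32 * 128
print(l2_bytes_per_clock)  # 4096 B = 4 KB per clock, double CDNA's 2 KB
```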

Interconnect


CDNA 2 brings forth the first AMD product with multiple GPUs on the same package. The two GPU dies are connected by 4 Infinity Fabric links, with a total bidirectional bandwidth of 400 GB/s.[14] Each die contains 8 Infinity Fabric links, each physically implemented with a 16-lane Infinity Link. When paired with an AMD processor, these links act as Infinity Fabric; if paired with any other x86 processor, they fall back to 16 lanes of PCIe 4.0.[14]
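The per-link bandwidth follows by division. For scale, the sketch also compares against the PCIe 4.0 fallback path; the ~1.97 GB/s-per-lane-per-direction figure is an assumption based on PCIe 4.0's standard 16 GT/s rate with 128b/130b encoding, not a number from this article.

```python
# Die-to-die: 4 Infinity Fabric links share 400 GB/s bidirectional
per_link_bidir_gbs = 400 / 4
print(per_link_bidir_gbs)  # 100.0 GB/s bidirectional per link

# Fallback path (assumption): 16 lanes of PCIe 4.0, ~1.97 GB/s/lane/direction
pcie4_x16_gbs = 16 * 1.97
print(round(pcie4_x16_gbs, 1))  # ~31.5 GB/s per direction
```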

Changes from CDNA


The largest upfront change is the addition of full-rate FP64 support across all compute elements. This results in a 4× increase in FP64 matrix throughput, with large increases in FP64 vector performance.[13] Additionally, support for packed FP32 operations was added, with opcodes like V_PK_FMA_F32 and V_PK_MUL_F32.[15] Packed FP32 operations can enable up to 2× throughput but require code modification.[13] As with CDNA, for further information on CDNA 2 operations, reference the CDNA 2 ISA Reference Guide.
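As a toy model (not ISA-accurate semantics), a packed instruction like V_PK_FMA_F32 applies one operation to two FP32 lanes at once, which is where the up-to-2× throughput comes from: one issue slot does the work of two scalar FMAs.

```python
def pk_fma_f32(a, b, c):
    """Toy model of V_PK_FMA_F32: two fused multiply-adds per 'instruction'.

    Each operand is a pair of FP32 lane values; the result is the pair
    (a0*b0 + c0, a1*b1 + c1), computed in a single issue.
    """
    return (a[0] * b[0] + c[0], a[1] * b[1] + c[1])

print(pk_fma_f32((2.0, 3.0), (4.0, 5.0), (1.0, 1.0)))  # (9.0, 16.0)
```

The "code modification" caveat in the text reflects that compilers and kernels must arrange data into these lane pairs before the packed opcodes can be used.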

Products

AMD Instinct CDNA 2 GPU generations (MI-2xx):

MI210 (launched 2022-03-22):[16] CDNA 2, 6 nm, 104 CUs; 64 GB HBM2e at 1600 GB/s; FP16 181 TFLOPS, FP32 22.6 TFLOPS, FP32 matrix 45.3 TFLOPS, FP64 22.6 TFLOPS, FP64 matrix 45.3 TFLOPS, INT8 181 TOPS; PCIe form factor; 300 W TBP.

MI250 (launched 2021-11-08):[17] CDNA 2, 6 nm, 208 CUs; 128 GB HBM2e at 3200 GB/s; FP16 362.1 TFLOPS, FP32 45.3 TFLOPS, FP32 matrix 90.5 TFLOPS, FP64 45.3 TFLOPS, FP64 matrix 90.5 TFLOPS, INT8 362.1 TOPS; OAM form factor; 560 W TBP.

MI250X (launched 2021-11-08): CDNA 2, 6 nm, 220 CUs; 128 GB HBM2e at 3200 GB/s; FP16 383 TFLOPS, FP32 47.9 TFLOPS, FP32 matrix 95.7 TFLOPS, FP64 47.9 TFLOPS, FP64 matrix 95.7 TFLOPS, INT8 383 TOPS; OAM form factor; 560 W TBP.

CDNA 3

AMD CDNA 3
Release date: December 6, 2023
Fabrication process: TSMC N5 & N6
History
Predecessor: CDNA 2

Unlike its predecessors, CDNA 3 consists of multiple dies, used in a multi-chip system, similar to AMD's Zen 2, 3, and 4 lines of products. The MI300 package is comparatively massive, with nine chiplets produced on 5 nm placed on top of four 6 nm chiplets.[18] This is all combined with 128 GB of HBM3, using eight HBM placements.[19] The package contains an estimated 146 billion transistors. It comes in the form of the Instinct MI300X and MI300A, the latter being an APU. These products were launched on December 6, 2023.[20]

Products

AMD Instinct CDNA 3 GPU generations (MI-3xx):

MI300A (launched 2023-12-06):[21] CDNA 3, 6 & 5 nm, 228 CUs; 128 GB HBM3 at 5300 GB/s; PCIe 5.0; APU SH5 socket; FP16 980.6 TFLOPS (1961.2 TFLOPS with sparsity), FP32 122.6 TFLOPS, FP64 61.3 TFLOPS, FP64 matrix 122.6 TFLOPS, INT8 1961.2 TOPS (3922.3 TOPS with sparsity), INT4 N/A; 550 W TBP (760 W with liquid cooling).

MI300X (launched 2023-12-06): CDNA 3, 6 & 5 nm, 304 CUs; 192 GB HBM3 at 5300 GB/s; PCIe 5.0; OAM form factor; FP16 1307.4 TFLOPS (2614.9 TFLOPS with sparsity), FP32 163.4 TFLOPS, FP64 81.7 TFLOPS, FP64 matrix 163.4 TFLOPS, INT8 2614.9 TOPS (5229.8 TOPS with sparsity), INT4 N/A; 750 W TBP.

MI325X (launched 2024-10-10):[22] CDNA 3, 6 & 5 nm, 304 CUs; 256 GB HBM3E at 6000 GB/s; OAM form factor.

Product Comparisons

Tesla V100 (PCIe) (GV100):[23][24] May 10, 2017; Volta, TSMC 12 nm; 12.1×10⁹ transistors, 815 mm²; config[c] 5120:320:128:640 (80 SM); 1370 MHz;[a] texture fillrate[d] 438.4 GT/s, pixel fillrate[e] 175.36 GP/s; vector[a][b] FP16 28.06, FP32 14.03, FP64 7.01 TFLOPS; matrix[a][b] FP16 112.23 TFLOPS (INT8, BF16, FP32, FP64: N/A); 16 or 32 GB HBM2, 4096-bit, 1750 MT/s, 900 GB/s; 250 W; PCIe 3.0 ×16 (software), PCIe ×16 (physical).

Tesla V100 (SXM) (GV100):[25][26] May 10, 2017; same silicon as above; 1455 MHz; texture 465.6 GT/s, pixel 186.24 GP/s; vector FP16 29.80, FP32 14.90, FP64 7.46 TFLOPS; matrix FP16 119.19 TFLOPS; 300 W; NVLink; SXM2.

Radeon Instinct MI50 (Vega 20):[27][28][29][30][31][32] Nov 18, 2018; GCN 5, TSMC 7 nm; 13.2×10⁹ transistors, 331 mm²; config 3840:240:64 (60 CU); 1450 (1725) MHz; texture 348.0 (414.0) GT/s, pixel 92.80 (110.4) GP/s; vector FP16 22.27 (26.50), FP32 11.14 (13.25), FP64 5.568 (6.624) TFLOPS; matrix INT8 N/A, BF16 N/A, FP16 26.5, FP32 13.3, FP64 ?; 16 or 32 GB HBM2, 4096-bit, 2000 MT/s, 1024 GB/s; 300 W; PCIe 4.0 ×16 (software), PCIe ×16 (physical).

Radeon Instinct MI60 (Vega 20):[28][33][34][35] Nov 18, 2018; GCN 5, TSMC 7 nm; 13.2×10⁹ transistors, 331 mm²; config 4096:256:64 (64 CU); 1500 (1800) MHz; texture 384.0 (460.8) GT/s, pixel 96.00 (115.2) GP/s; vector FP16 24.58 (29.49), FP32 12.29 (14.75), FP64 6.144 (7.373) TFLOPS; matrix INT8 N/A, BF16 N/A, FP16 32, FP32 16, FP64 ?; 32 GB HBM2, 4096-bit, 2000 MT/s, 1024 GB/s; 300 W; PCIe 4.0 ×16.

Tesla A100 (PCIe) (GA100):[36][37] May 14, 2020; Ampere, TSMC 7 nm; 54.2×10⁹ transistors, 826 mm²; config 6912:432:-:432 (108 SM); 1065 (1410) MHz; texture 460.08 (609.12) GT/s, pixel -; vector FP16 58.89 (77.97), FP32 14.72 (19.49), FP64 7.36 (9.75) TFLOPS; matrix INT8 942.24 (1247.47) TOPS; BF16 235.56 (311.87), FP16 235.56 (311.87), FP32 117.78 (155.93), FP64 14.72 (19.49) TFLOPS; 40 or 80 GB HBM2, 5120-bit, 3186 MT/s, 2039 GB/s; 250 W; PCIe 4.0 ×16 (software), PCIe ×16 (physical).

Tesla A100 (SXM) (GA100):[38][39] May 14, 2020; same silicon as above; 1275 (1410) MHz; texture 550.80 (609.12) GT/s; vector FP16 70.50 (77.97), FP32 17.63 (19.49), FP64 8.81 (9.75) TFLOPS; matrix INT8 1128.04 (1247.47) TOPS; BF16 282.01 (311.87), FP16 282.01 (311.87), FP32 141.00 (155.93), FP64 17.63 (19.49) TFLOPS; 400 W; NVLink; SXM4.

AMD Instinct MI100 (Arcturus):[40][41] Nov 16, 2020; CDNA, TSMC 7 nm; 25.6×10⁹ transistors, 750 mm²; config 7680:480:-:480 (120 CU); 1000 (1502) MHz; texture 480 (720.96) GT/s, pixel -; vector FP16 ?, FP32 15.72 (23.10), FP64 7.86 (11.5) TFLOPS; matrix INT8 122.88 (184.57) TOPS; BF16 61.44 (92.28), FP16 122.88 (184.57), FP32 30.72 (46.14), FP64 15.36 (23.07) TFLOPS; 32 GB HBM2, 4096-bit, 2400 MT/s, 1228 GB/s; 300 W; PCIe 4.0 ×16 (software), PCIe ×16 (physical).

AMD Instinct MI250X (PCIe) (Aldebaran): Nov 8, 2021; CDNA 2, TSMC 6 nm; 58×10⁹ transistors, 1540 mm²; config 14080:880:-:880 (220 CU).

AMD Instinct MI250X (OAM) (Aldebaran): Nov 8, 2021; CDNA 2, TSMC 6 nm; 58×10⁹ transistors, 1540 mm²; config 14080:880:-:880 (220 CU).

Tesla H100 (PCIe) (GH100): Mar 22, 2022; Hopper, TSMC 4 nm; 80×10⁹ transistors, 814 mm².

Tesla H100 (SXM) (GH100): Mar 22, 2022; Hopper, TSMC 4 nm; 80×10⁹ transistors, 814 mm².

  1. ^ [a] Boost values (if available) are stated in parentheses after the base value.
  2. ^ [b] Precision performance is calculated from the base (or boost) core clock speed based on an FMA operation.
  3. ^ [c] Unified shaders : Texture mapping units : Render output units : AI accelerators, and Compute units (CU) / Streaming multiprocessors (SM).
  4. ^ [d] Texture fillrate is calculated as the number of texture mapping units multiplied by the base (or boost) core clock speed.
  5. ^ [e] Pixel fillrate is calculated as the number of render output units multiplied by the base (or boost) core clock speed.


References

  1. ^ Smith, Ryan (June 9, 2022). "AMD: Combining CDNA 3 and Zen 4 for MI300 Data Center APU in 2023". AnandTech. Retrieved December 20, 2022.
  2. ^ Smith, Ryan. "AMD Unveils CDNA GPU Architecture: A Dedicated GPU Architecture for Data Centers". www.anandtech.com. Retrieved September 20, 2022.
  3. ^ "GPU Database: AMD Radeon Instinct MI100". TechPowerUp. Retrieved September 20, 2022.
  4. ^ Smith, Ryan. "AMD Announces Instinct MI200 Accelerator Family: Taking Servers to Exascale and Beyond". www.anandtech.com. Retrieved September 21, 2022.
  5. ^ Smith, Ryan. "AMD Releases Instinct MI210 Accelerator: CDNA 2 On a PCIe Card". www.anandtech.com. Retrieved September 21, 2022.
  6. ^ Kennedy, Patrick (November 16, 2020). "AMD Instinct MI100 32GB CDNA GPU Launched". ServeTheHome. Retrieved September 22, 2022.
  7. ^ "AMD CDNA Whitepaper" (PDF). amd.com. March 5, 2020. Retrieved September 22, 2022.
  8. ^ ""AMD Instinct MI100" Instruction Set Architecture, Reference Guide" (PDF). developer.amd.com. December 14, 2020. Retrieved September 22, 2022.
  9. ^ Klotz, Aaron (December 14, 2022). "Samsung Soups Up 96 AMD MI100 GPUs With Radical Computational Memory". Tom's Hardware. Retrieved December 23, 2022.
  10. ^ "AMD Instinct MI100 Brochure" (PDF). AMD. Retrieved December 25, 2022.
  11. ^ "AMD CDNA Whitepaper" (PDF). AMD. Retrieved December 25, 2022.
  12. ^ Shilov, Anton (November 17, 2021). "AMD's Instinct MI250X OAM Card Pictured: Aldebaran's Massive Die Revealed". Tom's Hardware. Retrieved November 20, 2022.
  13. ^ "Hot Chips 34 – AMD's Instinct MI200 Architecture". Chips and Cheese. September 18, 2022. Retrieved November 10, 2022.
  14. ^ "Introducing AMD CDNA 2 Architecture" (PDF). AMD.com. Retrieved November 20, 2022.
  15. ^ ""AMD Instinct MI200" Instruction Set Architecture" (PDF). developer.amd.com. February 4, 2022. Retrieved October 11, 2022.
  16. ^ Smith, Ryan. "AMD Releases Instinct MI210 Accelerator: CDNA 2 On a PCIe Card". www.anandtech.com. Retrieved June 3, 2024.
  17. ^ Smith, Ryan. "AMD Announces Instinct MI200 Accelerator Family: Taking Servers to Exascale and Beyond". www.anandtech.com. Retrieved June 3, 2024.
  18. ^ Smith, Ryan. "CES 2023: AMD Instinct MI300 Data Center APU Silicon In Hand - 146B Transistors, Shipping H2'23". www.anandtech.com. Retrieved January 22, 2023.
  19. ^ Alcorn, Paul (January 5, 2023). "AMD Instinct MI300 Data Center APU Pictured Up Close: 13 Chiplets, 146 Billion Transistors". Tom's Hardware. Retrieved January 22, 2023.
  20. ^ Kennedy, Patrick (December 6, 2023). "AMD Instinct MI300X GPU and MI300A APUs Launched for AI Era". ServeTheHome. Retrieved April 15, 2024.
  21. ^ Smith, Ryan; Bonshor, Gavin. "The AMD Advancing AI & Instinct MI300 Launch Live Blog (Starts at 10am PT/18:00 UTC)". www.anandtech.com. Retrieved June 3, 2024.
  22. ^ Smith, Ryan. "AMD Plans Massive Memory Instinct MI325X for Q4'24, Lays Out Accelerator Roadmap to 2026". www.anandtech.com. Retrieved June 3, 2024.
  23. ^ Oh, Nate (December 16, 2022). "Nvidia Formally Announced PCIe Tesla V100". AnandTech.
  24. ^ "NVIDIA Tesla V100 PCIe 16GB". TechPowerUp.
  25. ^ Smith, Ryan (December 19, 2022). "Nvidia Volta Unveiled". AnandTech.
  26. ^ "NVIDIA Tesla V100 SXM3 32GB". TechPowerUp.
  27. ^ Walton, Jarred (January 10, 2019). "Hands on with the AMD Radeon VII". PC Gamer.
  28. ^ "Next Horizon – David Wang Presentation" (PDF). AMD.
  29. ^ "AMD Radeon Instinct MI50 Accelerator (16GB)". AMD.
  30. ^ "AMD Radeon Instinct MI50 Accelerator (32GB)". AMD.
  31. ^ "AMD Radeon Instinct MI50 Datasheet" (PDF). AMD.
  32. ^ "AMD Radeon Instinct MI50 Specs". TechPowerUp. Retrieved May 27, 2022.
  33. ^ "Radeon Instinct MI60". AMD. Archived from the original on November 22, 2018. Retrieved May 27, 2022.
  34. ^ "AMD Radeon Instinct MI60 Datasheet" (PDF). AMD.
  35. ^ "AMD Radeon Instinct MI60 Specs". TechPowerUp. Retrieved May 27, 2022.
  36. ^ "Nvidia A100 Tensor Core GPU Architecture" (PDF). Nvidia. Retrieved December 12, 2022.
  37. ^ "Nvidia A100 PCIE 80 GB Specs". TechPowerUp. Retrieved December 12, 2022.
  38. ^ "Nvidia A100 Tensor Core GPU Architecture" (PDF). Nvidia. Retrieved December 12, 2022.
  39. ^ "Nvidia A100 SXM4 80 GB Specs". TechPowerUp. Retrieved December 12, 2022.
  40. ^ "AMD Instinct MI100 Brochure" (PDF). AMD. Retrieved December 25, 2022.
  41. ^ "AMD CDNA Whitepaper" (PDF). AMD. Retrieved December 25, 2022.
