| AMD Instinct | |
|---|---|
| Release date | June 20, 2017 |
| Designed by | AMD |
| Marketed by | AMD |
| Models | MI Series |
| Cores | 36–304 compute units (CUs) |
| Predecessor | AMD FirePro S |
AMD Instinct is AMD's brand of data center GPUs.[1][2] It replaced AMD's FirePro S brand in 2016. Compared to the Radeon brand of mainstream consumer/gamer products, the Instinct product line is intended to accelerate deep learning, artificial neural network, and high-performance computing/GPGPU applications.
The AMD Instinct product line directly competes with Nvidia's Tesla and Intel's Xeon Phi and Data Center GPU lines of machine learning and GPGPU cards.
The brand was originally known as AMD Radeon Instinct, but AMD dropped the Radeon brand from the name before the AMD Instinct MI100 was introduced in November 2020.
In June 2022, supercomputers based on AMD's Epyc CPUs and Instinct GPUs took the lead on the Green500 list of the most power-efficient supercomputers, with a more than 50% lead over any other system, and held the top four spots.[3] One of them, the AMD-based Frontier, has been the fastest supercomputer in the world on the TOP500 list since June 2022 (as of 2023).[4][5]

| Accelerator | Launch date | Architecture | Lithography | Compute Units | Memory size | Memory type | Memory bandwidth (GB/s) | PCIe support | Form factor | FP16 | BF16 | FP32 | FP32 matrix | FP64 | FP64 matrix | INT8 | INT4 | TBP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MI6 | 2016-12-12[6] | GCN 4 | 14 nm | 36 | 16 GB | GDDR5 | 224 | 3.0 | PCIe | 5.7 TFLOPS | N/A | 5.7 TFLOPS | N/A | 358 GFLOPS | N/A | N/A | N/A | 150 W |
| MI8 | GCN 3 | 28 nm | 64 | 4 GB | HBM | 512 | 8.2 TFLOPS | 8.2 TFLOPS | 512 GFLOPS | 175 W | ||||||||
| MI25 | GCN 5 | 14 nm | 16 GB | HBM2 | 484 | 26.4 TFLOPS | 12.3 TFLOPS | 768 GFLOPS | 300 W | |||||||||
| MI50 | 2018-11-06[7] | 7 nm | 60 | 1024 | 4.0 | 26.5 TFLOPS | 13.3 TFLOPS | 6.6 TFLOPS | 53 TOPS | 300 W | ||||||||
| MI60 | 64 | 32 GB | 29.5 TFLOPS | 14.7 TFLOPS | 7.4 TFLOPS | 59 TOPS | 300 W | |||||||||||
| MI100 | 2020-11-16 | CDNA | 120 | 1200 | 184.6 TFLOPS | 92.3 TFLOPS | 23.1 TFLOPS | 46.1 TFLOPS | 11.5 TFLOPS | 184.6 TOPS | 300 W | |||||||
| MI210 | 2022-03-22[8] | CDNA 2 | 6 nm | 104 | 64 GB | HBM2E | 1600 | 181 TFLOPS | 22.6 TFLOPS | 45.3 TFLOPS | 22.6 TFLOPS | 45.3 TFLOPS | 181 TOPS | 300 W | ||||
| MI250 | 2021-11-08[9] | 208 | 128 GB | 3200 | OAM | 362.1 TFLOPS | 45.3 TFLOPS | 90.5 TFLOPS | 45.3 TFLOPS | 90.5 TFLOPS | 362.1 TOPS | 560 W | ||||||
| MI250X | 220 | 383 TFLOPS | 47.92 TFLOPS | 95.7 TFLOPS | 47.9 TFLOPS | 95.7 TFLOPS | 383 TOPS | 560 W | ||||||||||
| MI300A | 2023-12-06[10] | CDNA 3 | 6 & 5 nm | 228 | 128 GB | HBM3 | 5300 | 5.0 | APU SH5 socket | 980.6 TFLOPS 1961.2 TFLOPS (with Sparsity) | 122.6 TFLOPS | 61.3 TFLOPS | 122.6 TFLOPS | 1961.2 TOPS 3922.3 TOPS (with Sparsity) | N/A | 550 W 760 W (with liquid cooling) | ||
| MI300X | 304 | 192 GB | OAM | 1307.4 TFLOPS 2614.9 TFLOPS (with Sparsity) | 163.4 TFLOPS | 81.7 TFLOPS | 163.4 TFLOPS | 2614.9 TOPS 5229.8 TOPS (with Sparsity) | N/A | 750 W | ||||||||
| MI325X | 2024-10-10[11] | 256 GB | HBM3E | 6000 | ||||||||||||||
| MI350X | 2025-06-13[12] | CDNA 4 | 3 nm | 256 | 288 GB | HBM3E | 8000 | 5.0 | OAM | 2386.9 TFLOPS 4613.8 TFLOPS (with Sparsity) | 144.2 TFLOPS | 72.1 TFLOPS | 4.6137 POPS 9.2274 POPS (with Sparsity) | 1000 W | ||||
| MI355X | 2516.6 TFLOPS 5033.2 TFLOPS (with Sparsity) | 157.3 TFLOPS | 78.6 TFLOPS | 5.0332 POPS 10.066 POPS (with Sparsity) | 1400 W | |||||||||||||
The three initial Radeon Instinct products were announced on December 12, 2016, and released on June 20, 2017, with each based on a different architecture.[13][14]
The MI6 is a passively cooled, Polaris 10-based card with 16 GB of GDDR5 memory and a TDP under 150 W.[1][2] At 5.7 TFLOPS (FP16 and FP32), the MI6 is expected to be used primarily for inference rather than neural-network training. The MI6 has a peak double-precision (FP64) compute performance of 358 GFLOPS.[15]
The MI8 is a Fiji-based card, analogous to the R9 Nano, with a TDP under 175 W.[1] It has 4 GB of High Bandwidth Memory. At 8.2 TFLOPS (FP16 and FP32), the MI8 is marketed toward inference. Its peak double-precision (FP64) compute performance is 512 GFLOPS.[16]
The MI25 is a Vega-based card using HBM2 memory. Its FP32 performance is expected to be 12.3 TFLOPS. In contrast to the MI6 and MI8, the MI25 is able to increase performance when using lower-precision numbers, and accordingly is expected to reach 24.6 TFLOPS with FP16. The MI25 is rated at under 300 W TDP with passive cooling. It also provides 768 GFLOPS peak double precision (FP64) at 1/16th rate.[17]
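The headline numbers above follow directly from the CU count and clock. As a rough sketch (the 64-lanes-per-CU and 2-FLOPs-per-cycle-per-lane factors are standard for GCN fused multiply-add, but the helper below is illustrative, not an official AMD formula):

```python
# Peak vector throughput for a GCN-based card. Each compute unit (CU) has
# 64 shader lanes, and a fused multiply-add counts as 2 floating-point
# operations per cycle. Illustrative sketch, not an AMD specification.

def peak_tflops(compute_units, boost_clock_mhz, lanes_per_cu=64, ops_per_cycle=2):
    """Peak TFLOPS = CUs x lanes x FLOPs/cycle x clock (MHz) / 1e6."""
    return compute_units * lanes_per_cu * ops_per_cycle * boost_clock_mhz / 1e6

# MI25: 64 CUs at a 1500 MHz boost clock -> ~12.3 FP32 TFLOPS
mi25_fp32 = peak_tflops(64, 1500)
# Vega runs FP16 at double rate and FP64 at 1/16th rate
mi25_fp16 = mi25_fp32 * 2      # ~24.6 TFLOPS
mi25_fp64 = mi25_fp32 / 16     # ~768 GFLOPS
```

The same formula reproduces the MI6 figure (36 CUs at 1233 MHz gives about 5.7 TFLOPS), since Polaris runs FP16 at the same rate as FP32.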
The MI50 and MI60 are based on the Vega 20 variant of GCN 5. They support 1/2-rate FP64 and are the last Instinct cards to bear the Radeon branding, as well as the last with the ability to produce display output.
The CDNA 1 cards removed all rendering-related resources while adding matrix processing units.

The MI300A and MI300X are data center accelerators that use the CDNA 3 architecture, which is optimized for high-performance computing (HPC) and generative artificial intelligence (AI) workloads. The CDNA 3 architecture features a scalable chiplet design that leverages TSMC’s advanced packaging technologies, such as CoWoS (chip-on-wafer-on-substrate) and InFO (integrated fan-out), to combine multiple chiplets on a single interposer. The chiplets are interconnected by AMD’s Infinity Fabric, which enables high-speed and low-latency data transfer between the chiplets and the host system.
The MI300A is an accelerated processing unit (APU) that integrates 24 Zen 4 CPU cores with six CDNA 3 GPU chiplets (XCDs), for a total of 228 CUs in the GPU section, plus 128 GB of HBM3 memory. The Zen 4 CPU cores are built on the 5 nm process node and support the x86-64 instruction set, as well as the AVX-512 and BFloat16 extensions. They can run general-purpose applications and provide host-side computation for the GPU cores. The MI300A has a peak performance of 61.3 TFLOPS of FP64 (122.6 TFLOPS FP64 matrix) and 980.6 TFLOPS of FP16 (1961.2 TFLOPS with sparsity), as well as 5.3 TB/s of memory bandwidth. The MI300A supports PCIe 5.0 and CXL 2.0 interfaces, which allow it to communicate with other devices and accelerators in a heterogeneous system.
The MI300X is a dedicated generative AI accelerator that replaces the CPU cores with additional GPU chiplets and HBM memory, resulting in a total of 304 CUs (64 shaders per CU) and 192 GB of HBM3 memory. It is designed to accelerate generative AI applications such as natural language processing, computer vision, and deep learning. The MI300X has a peak performance of 653.7 TFLOPS of TF32 (1307.4 TFLOPS with sparsity) and 1307.4 TFLOPS of FP16 (2614.9 TFLOPS with sparsity), as well as 5.3 TB/s of memory bandwidth. The MI300X also supports PCIe 5.0 and CXL 2.0 interfaces, as well as AMD’s ROCm software stack, which provides a unified programming model and tools for developing and deploying generative AI applications on AMD hardware.[18][19][20]
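The 5.3 TB/s figure quoted for both MI300 parts follows from the width and data rate of the HBM3 interface. A minimal sketch, assuming the 8192-bit bus from the spec table and an effective pin rate of about 5.2 GT/s:

```python
# Peak HBM bandwidth = bus width (bits) x data rate (GT/s) / 8 bits per byte.
# The helper is illustrative; bus width and data rate come from the spec table.

def hbm_bandwidth_gbs(bus_width_bits, data_rate_gtps):
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits * data_rate_gtps / 8

# MI300A/MI300X: 8192-bit HBM3 interface at ~5.2 GT/s -> ~5.3 TB/s
mi300_bw = hbm_bandwidth_gbs(8192, 5.2)   # 5324.8 GB/s
```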
The MI350X and MI355X are data center accelerators built on the CDNA 4 architecture, targeting advanced AI training and inference workloads. Manufactured on TSMC’s 3 nm (N3) process, they incorporate a high-performance chiplet design and feature 288 GB of HBM3E memory with 8 TB/s of bandwidth.[21] CDNA 4 introduces native support for the low-precision FP4 and FP6 formats, in addition to FP8 and FP16, boosting FP4 compute to up to 9.2 PetaFLOPS on the MI355X.[22] The architecture retains AMD’s Infinity Fabric interconnect for high-speed, low-latency data transfer between GPU chiplets and the host system. This design builds on CDNA 3, advancing both scalability and energy efficiency for large-scale AI deployments.
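The doubled "with Sparsity" figures in the tables assume structured sparsity, in which each group of four weights contains at most two non-zero values, letting the matrix engines skip half the multiplications. A minimal pruning sketch in pure Python (illustrative only; the function name is hypothetical, and real structured sparsity is applied by the hardware, not by host code):

```python
# 2:4 structured sparsity sketch: in every group of 4 weights, keep only
# the 2 largest-magnitude values and zero the rest. Hypothetical helper
# for illustration; not part of any AMD API.

def prune_2_of_4(weights):
    """Zero out the 2 smallest-magnitude values in each group of 4."""
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

dense = [0.9, -0.1, 0.05, -0.7, 0.3, 0.2, -0.8, 0.01]
sparse = prune_2_of_4(dense)
# Each 4-wide group of `sparse` now has exactly 2 non-zero entries
```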
The following software is, as of 2022, regrouped under the ROCm (Radeon Open Compute) meta-project.
The MI6, MI8, and MI25 products all support AMD's MxGPU virtualization technology, enabling sharing of GPU resources across multiple users.[1][23]
MIOpen is AMD's deep learning library to enable GPU acceleration of deep learning.[1] Much of this extends GPUOpen's Boltzmann Initiative software.[23] It is intended to compete with the deep learning portions of Nvidia's CUDA library. It supports the deep learning frameworks Theano, Caffe, TensorFlow, MXNet, Microsoft Cognitive Toolkit, Torch, and Chainer. Programming is supported in OpenCL and Python, in addition to supporting the compilation of CUDA through AMD's Heterogeneous-compute Interface for Portability (HIP) and the Heterogeneous Compute Compiler.

| Model (Code name) | Launch | Architecture & fab | LLVM target[24] | Transistors & die size | Config[e] | Clock[a] (MHz) | Texture fillrate[a][b] (GT/s) | Pixel fillrate[a][c] (GP/s) | FP16 TFLOPS[a][d] | FP32 TFLOPS | FP64 TFLOPS | Memory size (GB) | Bus type & width | Bandwidth (GB/s) | Memory clock (MT/s) | TBP | Bus interface |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Radeon Instinct MI6 (Polaris 10)[25][26][27][28][29][30] | Jun 20, 2017 | GCN 4 GloFo 14LP | gfx803 | 5.7×109 232 mm2 | 2304:144:32 36 CU | 1120 1233 | 161.3 177.6 | 35.84 39.46 | 5.161 5.682 | 5.161 5.682 | 0.323 0.355 | 16 | GDDR5 256-bit | 224 | 7000 | 150 W | PCIe 3.0 ×16 |
| Radeon Instinct MI8 (Fiji)[25][26][27][31][32][33] | GCN 3 TSMC 28 nm | gfx803 | 8.9×109 596 mm2 | 4096:256:64 64 CU | 1000 | 256.0 | 64.00 | 8.192 | 8.192 | 0.512 | 4 | HBM 4096-bit | 512 | 1000 | 175 W | ||
| Radeon Instinct MI25 (Vega 10)[25][26][27][34][35][36][37] | GCN 5 GloFo 14LP | gfx900 | 12.5×109 510 mm2 | 1400 1500 | 358.4 384.0 | 89.60 96.00 | 22.94 24.58 | 11.47 12.29 | 0.717 0.768 | 16 | HBM2 2048-bit | 484 | 1890 | 300 W | |||
| Radeon Instinct MI50 (Vega 20)[38][39][40][41][42][43] | Nov 18, 2018 | GCN 5 TSMC N7 | gfx906 | 13.2×109 331 mm2 | 3840:240:64 60 CU | 1450 1725 | 348.0 414.0 | 92.80 110.4 | 22.27 26.50 | 11.14 13.25 | 5.568 6.624 | 16 32 | HBM2 4096-bit | 1024 | 2000 | 300 W | PCIe 4.0 ×16 |
| Radeon Instinct MI60 (Vega 20)[39][44][45][46] | 4096:256:64 64 CU | 1500 1800 | 384.0 460.8 | 96.00 115.2 | 24.58 29.49 | 12.29 14.75 | 6.144 7.373 | 32 | |||||||||
| Model (Code name) | Launch | Architecture & fab | LLVM target[24] | Transistors & die size | Config[e] | Clock[a] (MHz) | INT8 TOPS[a][g] | FP16 TFLOPS[a][h] | FP32 TFLOPS | FP64 TFLOPS | FP32 matrix speedup[f] | FP64 matrix speedup[f] | Structured-sparsity speedup[f] | Memory size (GB) | Bus type & width | Bandwidth (GB/s) | Memory clock (MT/s) | TBP | Bus interface |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMD Instinct MI100 (Arcturus)[47][48][49] | Nov 16, 2020 | CDNA 1 TSMC N7 | gfx908 | 25.6×109 750 mm2 | 7680:480:- 120 CU | 1000 1502 | 122.9 184.6 | 122.9 184.6 | 15.36 23.07 | 7.680 11.54 | 2× | 2× | 1× | 32 | HBM2 4096-bit | 1228.8 | 2400 | 300 W | PCIe 4.0 ×16 |
| AMD Instinct MI210 (Aldebaran)[50][51][52] | Mar 22, 2022 | CDNA 2 TSMC N6 | gfx90a | 28 × 109 ~770 mm2 | 6656:416:- 104 CU (1 ×GCD)[i] | 1000 1700 | 106.5 181.0 | 106.5 181.0 | 13.31 22.63 | 13.31 22.63 | 2× | 2× | 1× | 64 | HBM2E 4096-bit | 1638.4 | 3200 | 300 W | |
| AMD Instinct MI250 (Aldebaran)[53][54][55] | Nov 8, 2021 | 58 × 109 1540 mm2 | 13312:832:- 208 CU (2 ×GCD) | 213.0 362.1 | 213.0 362.1 | 26.62 45.26 | 26.62 45.26 | 2× | 2× | 1× | 2 × 64 | HBM2E 2 × 4096-bit[j] | 2 × 1638.4 | 500 W 560 W (Peak) | |||||
| AMD Instinct MI250X (Aldebaran)[56][54][57] | 14080:880:- 220 CU (2 ×GCD) | 225.3 383.0 | 225.3 383.0 | 28.16 47.87 | 28.16 47.87 | 2× | 2× | 1× | |||||||||||
| AMD Instinct MI300A (Antares)[58][59][60][61] | Dec 6, 2023 | CDNA 3 TSMC N5 &N6 | gfx942 | 146 × 109 1017 mm2 | 14592:912:- 228 CU (6 ×XCD) | 2100 | 1961.2 | 980.6 | 122.6 | 61.3 | 1× | 2× | 2× | 128 | HBM3 8192-bit | 5300 | 5200 | 550 W 760 W (Liquid Cooling) | PCIe 5.0 ×16 |
| AMD Instinct MI300X (Aqua Vanjaram)[62][63][64][65] | 153 × 109 1017 mm2 | 19456:1216:- 304 CU (8 ×XCD) | 2614.9 | 1307.4 | 163.4 | 81.7 | 1× | 2× | 2× | 192 | 750 W | ||||||||
| AMD Instinct MI350X[66][67] | CDNA 4 TSMC N3 &N6 | gfx950 | 185 × 109 1017 mm2 | 16384:1024:- 256 CU (8 ×XCD) | 2200 | 4600[k] | 144.2 | 144.2 | 72.1 | 1× | 1× | 2× | 288 | HBM3e 8192-bit | 8000 | 8000 | 1000 W | PCIe 5.0 ×16 (OAM) | |
| AMD Instinct MI355X | 2400 | 1× | 1× | 2× | 288 | 1400 W | |||||||||||||