Ampere (microarchitecture)

From Wikipedia, the free encyclopedia
GPU microarchitecture by Nvidia

Ampere
Launched: May 14, 2020
Designed by: Nvidia
Fabrication process: TSMC N7 (professional), Samsung 8N (consumer)
Codename: GA10x
Product series
  • Professional/workstation: RTX A series
  • Server/datacenter: A100
Specifications
  • L1 cache: 192 KB per SM (professional), 128 KB per SM (consumer)
  • L2 cache: 2 MB to 6 MB
  • PCIe support: PCIe 4.0
Supported graphics APIs
  • DirectX: DirectX 12 Ultimate (Feature Level 12_2)
  • Direct3D: Direct3D 12.0
  • Shader Model: Shader Model 6.8
  • OpenGL: OpenGL 4.6
  • CUDA: Compute Capability 8.6
  • Vulkan: Vulkan 1.3
Supported compute APIs
  • OpenCL: OpenCL 3.0
Media engine
  • Color bit-depth: 8-bit, 10-bit
  • Encoder supported: NVENC
History
  • Predecessor: Turing (consumer), Volta (professional)
  • Successor: Ada Lovelace (consumer), Hopper (datacenter)
Support status: Supported

Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020, and is named after French mathematician and physicist André-Marie Ampère.[1][2]

Nvidia announced the Ampere architecture GeForce 30 series consumer GPUs at a GeForce Special Event on September 1, 2020.[3][4] Nvidia announced the A100 80 GB GPU at SC20 on November 16, 2020.[5] Mobile RTX graphics cards and the RTX 3060 based on the Ampere architecture were revealed on January 12, 2021.[6]

Nvidia previewed "Ampere Next Next" (Blackwell), targeted for a 2024 release, at GPU Technology Conference (GTC) 2021, and announced Ampere's direct successor, Hopper, at GTC 2022.

Details


Architectural improvements of the Ampere architecture include the following:

  • CUDA Compute Capability 8.0 for A100 and 8.6 for the GeForce 30 series[7]
  • TSMC's 7 nm FinFET process for A100
  • Custom version of Samsung's 8 nm process (8N) for the GeForce 30 series[8]
  • Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration.[9] At 256 FP16 FMA operations per clock, each Tensor Core has 4x the processing power of previous Tensor Core generations on GA100 (2x on GA10x); the Tensor Core count is reduced to one per SM partition (four per SM).
  • Second-generation ray tracing cores; concurrent ray tracing, shading, and compute for the GeForce 30 series
  • High Bandwidth Memory 2 (HBM2) on A100 40 GB & A100 80 GB
  • GDDR6X memory for GeForce RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti
  • Double FP32 cores per SM on GA10x GPUs
  • NVLink 3.0 with a throughput of 50 Gbit/s per pair[9]
  • PCI Express 4.0 with SR-IOV support (SR-IOV is reserved for A100 only)
  • Multi-instance GPU (MIG) virtualization and spatial GPU partitioning feature in A100 supporting up to seven instances
  • PureVideo feature set K hardware video decoding with AV1 hardware decoding[10] for the GeForce 30 series and feature set J for A100
  • 5 NVDEC decoders for A100
  • New hardware-based 5-core JPEG decoder (NVJPG) with support for YUV420, YUV422, YUV444, YUV400 and RGBA. It should not be confused with Nvidia NVJPEG, a GPU-accelerated library for JPEG encoding and decoding
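
The per-clock Tensor Core figure in the list above can be turned into a peak throughput estimate. The following is a minimal sketch, not a vendor formula: it assumes one FMA counts as two floating-point operations, takes the SM count (108) and boost clock (1410 MHz) from the A100 row of the DGX accelerator comparison in this article, and takes four Tensor Cores per SM from the dies table (512 Tensor Cores across 128 SMs on GA100). The function name is hypothetical.

```python
def tensor_peak_tflops(sms, tensor_cores_per_sm, fma_per_clock, clock_ghz):
    """Estimate peak dense tensor throughput in TFLOPS.

    Assumption: each fused multiply-add (FMA) counts as 2 floating-point operations.
    """
    flops_per_clock = sms * tensor_cores_per_sm * fma_per_clock * 2
    return flops_per_clock * clock_ghz * 1e9 / 1e12


# A100 inputs as listed in this article: 108 enabled SMs, 4 Tensor Cores per SM,
# 256 FP16 FMA operations per Tensor Core per clock, 1.41 GHz boost clock.
a100_fp16 = tensor_peak_tflops(sms=108, tensor_cores_per_sm=4,
                               fma_per_clock=256, clock_ghz=1.41)
print(f"A100 FP16 dense tensor peak: {a100_fp16:.1f} TFLOPS")
```

Under these assumptions the estimate rounds to the 312 TFLOPS listed for FP16 dense tensor throughput on the A100.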

Chips

  • GA100[11]
  • GA102
  • GA103
  • GA104
  • GA106
  • GA107
  • GA10B

Comparison of Compute Capability: GP100 vs GV100 vs GA100[12]

GPU features | Nvidia Tesla P100 | Nvidia Tesla V100 | Nvidia A100
GPU codename | GP100 | GV100 | GA100
GPU architecture | Pascal | Volta | Ampere
Compute capability | 6.0 | 7.0 | 8.0
Threads / warp | 32 | 32 | 32
Max warps / SM | 64 | 64 | 64
Max threads / SM | 2048 | 2048 | 2048
Max thread blocks / SM | 32 | 32 | 32
Max 32-bit registers / SM | 65536 | 65536 | 65536
Max registers / block | 65536 | 65536 | 65536
Max registers / thread | 255 | 255 | 255
Max thread block size | 1024 | 1024 | 1024
FP32 cores / SM | 64 | 64 | 64
Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024
Shared memory size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB
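
The limits in the table above interact: a kernel that uses many registers per thread cannot keep all 64 warps resident on an SM. The sketch below is a simplified illustration using only the GA100 column of the table. It assumes registers are the sole limiting resource and ignores register allocation granularity, shared memory, and thread block limits, so real occupancy can be lower. The function name is hypothetical.

```python
# Per-SM limits for compute capability 8.0, taken from the table above.
MAX_WARPS_PER_SM = 64
THREADS_PER_WARP = 32
REGISTERS_PER_SM = 65536
MAX_REGISTERS_PER_THREAD = 255


def resident_warps(registers_per_thread):
    """Upper bound on resident warps per SM when only registers limit occupancy."""
    if not 1 <= registers_per_thread <= MAX_REGISTERS_PER_THREAD:
        raise ValueError("registers_per_thread must be between 1 and 255")
    registers_per_warp = registers_per_thread * THREADS_PER_WARP
    return min(MAX_WARPS_PER_SM, REGISTERS_PER_SM // registers_per_warp)


for regs in (32, 64, 128, 255):
    warps = resident_warps(regs)
    print(f"{regs:3d} registers/thread -> {warps:2d} warps, "
          f"{warps * THREADS_PER_WARP:4d} threads per SM")
```

At 32 registers per thread the full 2048 threads fit, which matches the table: 65536 registers divided by 2048 threads is exactly 32.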

Comparison of Precision Support Matrix[13][14]

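Supported CUDA Core precisions

GPU | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16
Nvidia Tesla P4 | No | Yes | Yes | No | No | Yes | No | No
Nvidia P100 | Yes | Yes | Yes | No | No | No | No | No
Nvidia Volta | Yes | Yes | Yes | No | No | Yes | No | No
Nvidia Turing | Yes | Yes | Yes | No | No | No | No | No
Nvidia A100 | Yes | Yes | Yes | No | No | Yes | No | Yes

Supported Tensor Core precisions

GPU | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16
Nvidia Tesla P4 | No | No | No | No | No | No | No | No
Nvidia P100 | No | No | No | No | No | No | No | No
Nvidia Volta | Yes | No | No | No | No | No | No | No
Nvidia Turing | Yes | No | No | Yes | Yes | Yes | No | No
Nvidia A100 | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes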

Legend:

  • FPnn: floating point with nn bits
  • INTn: integer with n bits
  • INT1: binary
  • TF32: TensorFloat32
  • BF16: bfloat16
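
To give a feel for what the reduced-precision formats in the legend keep, the sketch below truncates a 32-bit float to fewer mantissa bits. The bit layouts are background assumptions not stated in this article: BF16 keeps 8 exponent bits and 7 mantissa bits, TF32 keeps 8 exponent bits and 10 mantissa bits. Truncation toward zero is used here for simplicity; hardware may round differently, so this illustrates precision only and does not model Tensor Core behavior. All function names are hypothetical.

```python
import struct


def truncate_mantissa(value, kept_mantissa_bits):
    """Truncate a float32 value to the given number of mantissa bits (toward zero)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    dropped = 23 - kept_mantissa_bits          # float32 has 23 mantissa bits
    bits &= ~((1 << dropped) - 1) & 0xFFFFFFFF  # clear the low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bf16(value):
    """Assumed BF16 layout: 1 sign, 8 exponent, 7 mantissa bits."""
    return truncate_mantissa(value, 7)


def to_tf32(value):
    """Assumed TF32 layout: 1 sign, 8 exponent, 10 mantissa bits."""
    return truncate_mantissa(value, 10)


x = 1.0 / 3.0
print("float32:", truncate_mantissa(x, 23))
print("TF32   :", to_tf32(x))
print("BF16   :", to_bf16(x))
```

Both formats keep the float32 exponent range, so the difference shows up only in how many significant digits survive.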

Comparison of Decode Performance

Concurrent streams | H.264 decode (1080p30) | H.265 (HEVC) decode (1080p30) | VP9 decode (1080p30)
V100 | 16 | 22 | 22
A100 | 75 | 157 | 108

Ampere dies

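Die | GA100[15] | GA102[16] | GA103[17] | GA104[18] | GA106[19] | GA107[20] | GA10B[21] | GA10F
Die size | 826 mm² | 628 mm² | 496 mm² | 392 mm² | 276 mm² | 200 mm² | 448 mm² | ?
Transistors | 54.2B | 28.3B | 22B | 17.4B | 12B | 8.7B | 21B | ?
Transistor density | 65.6 MTr/mm² | 45.1 MTr/mm² | 44.4 MTr/mm² | 44.4 MTr/mm² | 43.5 MTr/mm² | 43.5 MTr/mm² | 46.9 MTr/mm² | ?
Graphics processing clusters | 8 | 7 | 6 | 6 | 3 | 2 | 2 | 1
Streaming multiprocessors | 128 | 84 | 60 | 48 | 30 | 20 | 16 | 12
CUDA cores | 8192 | 10752 | 7680 | 6144 | 3840 | 2560 | 2048 | 1536
Texture mapping units | 512 | 336 | 240 | 192 | 120 | 80 | 64 | 48
Render output units | 192 | 112 | 96 | 96 | 48 | 32 | 32 | 16
Tensor cores | 512 | 336 | 240 | 192 | 120 | 80 | 64 | 48
RT cores | N/A | 84 | 60 | 48 | 30 | 20 | 8 | 12
L1 cache (total) | 24 MB | 10.5 MB | 7.5 MB | 6 MB | 3 MB | 2.5 MB | 3 MB | 1.5 MB
L1 cache (per SM) | 192 KB | 128 KB | 128 KB | 128 KB | 128 KB | 128 KB | 192 KB | 128 KB
L2 cache | 40 MB | 6 MB | 4 MB | 4 MB | 3 MB | 2 MB | 4 MB | 1 MB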
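
Several rows of the dies table are derived from others. As a rough cross-check, the sketch below recomputes transistor density (transistors divided by die area) and CUDA cores per SM for a few dies, using only figures from the table. The dictionary layout and function names are illustrative, not part of any Nvidia tool.

```python
# Selected columns from the dies table above.
DIES = {
    "GA100": {"die_mm2": 826, "transistors_b": 54.2, "sms": 128, "cuda_cores": 8192},
    "GA102": {"die_mm2": 628, "transistors_b": 28.3, "sms": 84, "cuda_cores": 10752},
    "GA104": {"die_mm2": 392, "transistors_b": 17.4, "sms": 48, "cuda_cores": 6144},
    "GA107": {"die_mm2": 200, "transistors_b": 8.7, "sms": 20, "cuda_cores": 2560},
}


def density_mtr_per_mm2(die):
    """Transistor density in millions of transistors per square millimetre."""
    return die["transistors_b"] * 1000 / die["die_mm2"]


def cuda_cores_per_sm(die):
    """CUDA (FP32) cores per streaming multiprocessor."""
    return die["cuda_cores"] // die["sms"]


for name, die in DIES.items():
    print(f"{name}: {density_mtr_per_mm2(die):.1f} MTr/mm², "
          f"{cuda_cores_per_sm(die)} CUDA cores per SM")
```

The cores-per-SM result makes the "double FP32 cores per SM on GA10x" point concrete: 64 on GA100 versus 128 on the GA10x dies.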

A100 accelerator and DGX A100


The Ampere-based A100 accelerator was announced and released on May 14, 2020.[9] The A100 features 19.5 teraflops of FP32 performance, 6912 FP32/INT32 CUDA cores, 3456 FP64 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics memory bandwidth.[22] The A100 accelerator was initially available only in the third generation of the DGX server, which includes eight A100s.[9] The DGX A100 also includes 15 TB of PCIe Gen 4 NVMe storage,[22] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[9]
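
The quoted FP32 figure follows from the core count and clock. The sketch below is an approximation under two stated assumptions: each CUDA core retires one FMA (two floating-point operations) per clock, and the A100 runs at the 1410 MHz boost clock listed for it in the DGX accelerator comparison in this article. The function name and constants are illustrative.

```python
def peak_tflops(cores, boost_mhz, flops_per_core_per_clock=2):
    """Peak throughput in TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * flops_per_core_per_clock * boost_mhz * 1e6 / 1e12


A100_BOOST_MHZ = 1410      # boost clock as listed for the A100 in this article
GPUS_PER_DGX_A100 = 8      # the DGX A100 includes eight A100s

fp32 = peak_tflops(6912, A100_BOOST_MHZ)
fp64 = peak_tflops(3456, A100_BOOST_MHZ)

print(f"A100 FP32 peak: {fp32:.2f} TFLOPS")
print(f"A100 FP64 peak: {fp64:.2f} TFLOPS")
print(f"DGX A100 FP32 peak: {fp32 * GPUS_PER_DGX_A100:.1f} TFLOPS")
print(f"DGX A100 GPU memory: {40 * GPUS_PER_DGX_A100} GB")
```

Rounded to one decimal place, the per-GPU results agree with the 19.5 TFLOPS (FP32) and 9.7 TFLOPS (FP64) figures given for the A100.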

Comparison of accelerators used in DGX:[23][24][25]

Model | P100 | V100 16GB | V100 32GB | A100 40GB | A100 80GB | H100 | H200 | B100 | B200
Architecture | Pascal | Volta | Volta | Ampere | Ampere | Hopper | Hopper | Blackwell | Blackwell
Socket | SXM/SXM2 | SXM2 | SXM3 | SXM4 | SXM4 | SXM5 | SXM5 | SXM6 | SXM6
FP32 CUDA cores | 3584 | 5120 | 5120 | 6912 | 6912 | 16896 | 16896 | N/A | N/A
FP64 cores (excl. tensor) | 1792 | 2560 | 2560 | 3456 | 3456 | 4608 | 4608 | N/A | N/A
Mixed INT32/FP32 cores | N/A | N/A | N/A | 6912 | 6912 | 16896 | 16896 | N/A | N/A
INT32 cores | N/A | 5120 | 5120 | N/A | N/A | N/A | N/A | N/A | N/A
Boost clock | 1480 MHz | 1530 MHz | 1530 MHz | 1410 MHz | 1410 MHz | 1980 MHz | 1980 MHz | N/A | N/A
Memory clock | 1.4 Gbit/s HBM2 | 1.75 Gbit/s HBM2 | 1.75 Gbit/s HBM2 | 2.4 Gbit/s HBM2 | 3.2 Gbit/s HBM2e | 5.2 Gbit/s HBM3 | 6.3 Gbit/s HBM3e | 8 Gbit/s HBM3e | 8 Gbit/s HBM3e
Memory bus width | 4096-bit | 4096-bit | 4096-bit | 5120-bit | 5120-bit | 5120-bit | 6144-bit | 8192-bit | 8192-bit
Memory bandwidth | 720 GB/sec | 900 GB/sec | 900 GB/sec | 1.52 TB/sec | 1.52 TB/sec | 3.35 TB/sec | 4.8 TB/sec | 8 TB/sec | 8 TB/sec
VRAM | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 | 40 GB HBM2 | 80 GB HBM2e | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e | 192 GB HBM3e
Single precision (FP32) | 10.6 TFLOPS | 15.7 TFLOPS | 15.7 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS | 67 TFLOPS | 67 TFLOPS | N/A | N/A
Double precision (FP64) | 5.3 TFLOPS | 7.8 TFLOPS | 7.8 TFLOPS | 9.7 TFLOPS | 9.7 TFLOPS | 34 TFLOPS | 34 TFLOPS | N/A | N/A
INT8 (non-tensor) | N/A | 62 TOPS | 62 TOPS | N/A | N/A | N/A | N/A | N/A | N/A
INT8 dense tensor | N/A | N/A | N/A | 624 TOPS | 624 TOPS | 1.98 POPS | 1.98 POPS | 3.5 POPS | 4.5 POPS
INT32 | N/A | 15.7 TOPS | 15.7 TOPS | 19.5 TOPS | 19.5 TOPS | N/A | N/A | N/A | N/A
FP4 dense tensor | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 7 PFLOPS | 9 PFLOPS
FP16 | 21.2 TFLOPS | 31.4 TFLOPS | 31.4 TFLOPS | 78 TFLOPS | 78 TFLOPS | N/A | N/A | N/A | N/A
FP16 dense tensor | N/A | 125 TFLOPS | 125 TFLOPS | 312 TFLOPS | 312 TFLOPS | 990 TFLOPS | 990 TFLOPS | 1.98 PFLOPS | 2.25 PFLOPS
bfloat16 dense tensor | N/A | N/A | N/A | 312 TFLOPS | 312 TFLOPS | 990 TFLOPS | 990 TFLOPS | 1.98 PFLOPS | 2.25 PFLOPS
TensorFloat-32 (TF32) dense tensor | N/A | N/A | N/A | 156 TFLOPS | 156 TFLOPS | 495 TFLOPS | 495 TFLOPS | 989 TFLOPS | 1.2 PFLOPS
FP64 dense tensor | N/A | N/A | N/A | 19.5 TFLOPS | 19.5 TFLOPS | 67 TFLOPS | 67 TFLOPS | 30 TFLOPS | 40 TFLOPS
Interconnect (NVLink) | 160 GB/sec | 300 GB/sec | 300 GB/sec | 600 GB/sec | 600 GB/sec | 900 GB/sec | 900 GB/sec | 1.8 TB/sec | 1.8 TB/sec
GPU | GP100 | GV100 | GV100 | GA100 | GA100 | GH100 | GH100 | GB100 | GB100
L1 cache | 1344 KB (24 KB × 56) | 10240 KB (128 KB × 80) | 10240 KB (128 KB × 80) | 20736 KB (192 KB × 108) | 20736 KB (192 KB × 108) | 25344 KB (192 KB × 132) | 25344 KB (192 KB × 132) | N/A | N/A
L2 cache | 4096 KB | 6144 KB | 6144 KB | 40960 KB | 40960 KB | 51200 KB | 51200 KB | N/A | N/A
TDP | 300 W | 300 W | 350 W | 400 W | 400 W | 700 W | 1000 W | 700 W | 1000 W
Die size | 610 mm² | 815 mm² | 815 mm² | 826 mm² | 826 mm² | 814 mm² | 814 mm² | N/A | N/A
Transistor count | 15.3 B | 21.1 B | 21.1 B | 54.2 B | 54.2 B | 80 B | 80 B | 208 B | 208 B
Process | TSMC 16FF+ | TSMC 12FFN | TSMC 12FFN | TSMC N7 | TSMC N7 | TSMC 4N | TSMC 4N | TSMC 4NP | TSMC 4NP
Launched | Q2 2016 | Q3 2017 | | Q1 2020 | | Q3 2022 | Q3 2023 | Q4 2024 |
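
The memory clock, bus width, and bandwidth figures in the comparison above are related: bandwidth is approximately the per-pin data rate multiplied by the bus width, divided by 8 bits per byte. The sketch below applies that relation to a few rows. It is an approximation; the listed clocks are rounded, so the results land close to, but not exactly on, the listed bandwidths. The function name and the selection of rows are illustrative.

```python
def memory_bandwidth_gb_s(data_rate_gbit_s, bus_width_bits):
    """Approximate memory bandwidth in GB/s from per-pin data rate and bus width."""
    return data_rate_gbit_s * bus_width_bits / 8


# (per-pin data rate in Gbit/s, bus width in bits), as listed in the comparison above.
ACCELERATORS = {
    "P100": (1.4, 4096),
    "V100": (1.75, 4096),
    "A100 40GB": (2.4, 5120),
    "H100": (5.2, 5120),
    "H200": (6.3, 6144),
}

for model, (rate, width) in ACCELERATORS.items():
    print(f"{model}: about {memory_bandwidth_gb_s(rate, width):.0f} GB/s")
```

For example, the A100 40GB works out to 1536 GB/s, in line with the roughly 1.5 to 1.6 TB/s quoted for it in this article.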

Products using Ampere

  • GeForce MX series
    • GeForce MX570 (mobile) (GA107)
  • GeForce 20 series
    • GeForce RTX 2050 (mobile) (GA107)
  • GeForce 30 series
    • GeForce RTX 3050 Laptop GPU (GA107)
    • GeForce RTX 3050 (GA106 or GA107)[26]
    • GeForce RTX 3050 Ti Laptop GPU (GA107)
    • GeForce RTX 3060 Laptop GPU (GA106)
    • GeForce RTX 3060 (GA106 or GA104)[27]
    • GeForce RTX 3060 Ti (GA104 or GA103)[28]
    • GeForce RTX 3070 Laptop GPU (GA104)
    • GeForce RTX 3070 (GA104)
    • GeForce RTX 3070 Ti Laptop GPU (GA104)
    • GeForce RTX 3070 Ti (GA104 or GA102)[29]
    • GeForce RTX 3080 Laptop GPU (GA104)
    • GeForce RTX 3080 (GA102)
    • GeForce RTX 3080 12 GB (GA102)
    • GeForce RTX 3080 Ti Laptop GPU (GA103)
    • GeForce RTX 3080 Ti (GA102)
    • GeForce RTX 3090 (GA102)
    • GeForce RTX 3090 Ti (GA102)
  • Nvidia Workstation GPUs (formerly Quadro)
    • RTX A1000 (mobile) (GA107)
    • RTX A2000 (mobile) (GA106)
    • RTX A2000 (GA106)
    • RTX A3000 (mobile) (GA104)
    • RTX A4000 (mobile) (GA104)
    • RTX A4000 (GA104)
    • RTX A5000 (mobile) (GA104)
    • RTX A5500 (mobile) (GA103)
    • RTX A4500 (GA102)
    • RTX A5000 (GA102)
    • RTX A5500 (GA102)
    • RTX A6000 (GA102)
    • A800 Active
  • Nvidia Data Center GPUs (formerly Tesla)
    • Nvidia A2 (GA107)
    • Nvidia A10 (GA102)
    • Nvidia A16 (4 × GA107)
    • Nvidia A30 (GA100)
    • Nvidia A40 (GA102)
    • Nvidia A100 (GA100)
    • Nvidia A100 80 GB (GA100)
    • Nvidia A100X
    • Nvidia A30X

Products using Ampere (per chip)

  • GA10B
    • Tegra SoCs: AGX Orin, Orin NX, Orin Nano
  • GA107
    • GeForce MX series: GeForce MX570 (mobile)
    • GeForce 20 series: GeForce RTX 2050 (mobile)
    • GeForce 30 series: GeForce RTX 3050 Laptop, GeForce RTX 3050, GeForce RTX 3050 Ti Laptop
    • Nvidia Workstation GPUs: RTX A1000 (mobile)
    • Nvidia Data Center GPUs: Nvidia A2, Nvidia A16
  • GA106
    • GeForce 30 series: GeForce RTX 3050, GeForce RTX 3060 Laptop, GeForce RTX 3060
    • Nvidia Workstation GPUs: RTX A2000 (mobile), RTX A2000
  • GA104
    • GeForce 30 series: GeForce RTX 3060, GeForce RTX 3060 Ti, GeForce RTX 3070 Laptop, GeForce RTX 3070, GeForce RTX 3070 Ti Laptop, GeForce RTX 3070 Ti, GeForce RTX 3080 Laptop
    • Nvidia Workstation GPUs: RTX A3000 (mobile), RTX A4000 (mobile), RTX A4000, RTX A5000 (mobile)
  • GA103
    • GeForce 30 series: GeForce RTX 3060 Ti, GeForce RTX 3080 Ti Laptop
    • Nvidia Workstation GPUs: RTX A5500 (mobile)
  • GA102
    • GeForce 30 series: GeForce RTX 3070 Ti, GeForce RTX 3080, GeForce RTX 3080 Ti, GeForce RTX 3090, GeForce RTX 3090 Ti
    • Nvidia Workstation GPUs: RTX A4500, RTX A5000, RTX A5500, RTX A6000
    • Nvidia Data Center GPUs: Nvidia A10, Nvidia A40
  • GA100
    • Nvidia Data Center GPUs: Nvidia A30, Nvidia A100

References

  1. "NVIDIA's New Ampere Data Center GPU in Full Production". NVIDIA News. May 14, 2020.
  2. Krashinsky, Ronny; Giroux, Olivier; Jones, Stephen; Stam, Nick; Ramaswamy, Sridhar (May 14, 2020). "NVIDIA Ampere Architecture In-Depth". NVIDIA Developer Blog.
  3. "NVIDIA Delivers Greatest-Ever Generational Leap with GeForce RTX 30 Series GPUs". Nvidia Newsroom. September 1, 2020. Retrieved April 9, 2023.
  4. "NVIDIA GeForce Ultimate Countdown". Nvidia.
  5. "NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World's Most Powerful GPU for AI Supercomputing". Nvidia Newsroom. November 16, 2020. Retrieved April 9, 2023.
  6. "NVIDIA GeForce Beyond at CES 2023". NVIDIA.
  7. "I.7. Compute Capability 8.x". Nvidia. Retrieved September 23, 2020.
  8. Bosnjak, Dominik (September 1, 2020). "Samsung's old 8nm tech at the heart of NVIDIA's monstrous Ampere cards". SamMobile. Retrieved September 19, 2020.
  9. Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech. Archived from the original on May 14, 2020.
  10. Delgado, Gerardo (September 1, 2020). "GeForce RTX 30 Series GPUs: Ushering In A New Era of Video Content With AV1 Decode". Nvidia. Retrieved April 9, 2023.
  11. Morgan, Timothy Prickett (May 29, 2020). "Diving Deep Into The Nvidia Ampere GPU Architecture". The Next Platform. Retrieved March 24, 2022.
  12. "NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration at Every Scale" (PDF). Nvidia. Retrieved September 18, 2020.
  13. "NVIDIA Tensor Cores: Versatility for HPC & AI". NVIDIA.
  14. "Abstract". docs.nvidia.com.
  15. "NVIDIA A100 Tensor Core GPU Architecture" (PDF). NVIDIA Corporation. Retrieved April 29, 2024.
  16. "NVIDIA GA102 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  17. "NVIDIA GA103 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  18. "NVIDIA GA104 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  19. "NVIDIA GA106 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  20. "NVIDIA GA107 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
  21. "NVIDIA AGX Orin Series Technical Brief v1.2" (PDF). NVIDIA Corporation. Retrieved April 29, 2024.
  22. Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.
  23. Smith, Ryan (March 22, 2022). "NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder". AnandTech.
  24. Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
  25. "NVIDIA Tesla V100 tested: near unbelievable GPU power". TweakTown. September 17, 2017.
  26. Igor, Wallossek (February 13, 2022). "The two faces of the GeForce RTX 3050 8GB". Igor's Lab. Retrieved February 23, 2022.
  27. Shilov, Anton (September 25, 2021). "Gainward and Galax List GeForce RTX 3060 Cards With GA104 GPU". Tom's Hardware. Retrieved September 23, 2022.
  28. Tyson, Mark (February 23, 2022). "Zotac Debuts First RTX 3060 Ti Desktop Cards With GA103 GPU". Tom's Hardware. Retrieved September 23, 2022.
  29. WhyCry (October 26, 2022). "ZOTAC launches GeForce RTX 3070 Ti with GA102-150 GPU". VideoCardz. Retrieved May 21, 2023.
  30. "Nintendo Switch 2 teardown confirms Nvidia Tegra T239 chip, SK Hynix memory, more details". TechSpot. April 24, 2025. Retrieved May 31, 2025.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Ampere_(microarchitecture)&oldid=1316143897"