
HyperTransport (HT), formerly known as Lightning Data Transport, is a technology for interconnection of computer processors. It is a bidirectional serial/parallel high-bandwidth, low-latency point-to-point link that was introduced on April 2, 2001.[1] The HyperTransport Consortium is in charge of promoting and developing HyperTransport technology.
HyperTransport is best known as the system bus architecture of AMD central processing units (CPUs) from Athlon 64 through AMD FX and the associated motherboard chipsets. HyperTransport has also been used by IBM and Apple for the Power Mac G5 machines, as well as a number of modern MIPS systems.
The current specification, HTX 3.1, remained competitive with 2014 high-speed DDR4 RAM (2666 and 3200 MT/s, or about 10.4 GB/s and 12.8 GB/s) and with slower technology (around 1 GB/s,[1] similar to high-end PCIe SSDs and ULLtraDIMM flash RAM), covering a wider range of RAM speeds on a common CPU bus than any Intel front-side bus. Intel technologies require each speed range of RAM to have its own interface, resulting in a more complex motherboard layout but with fewer bottlenecks. HTX 3.1 at 26 GB/s can serve as a unified bus for as many as four DDR4 sticks running at the fastest proposed speeds. Beyond that, DDR4 RAM may require two or more HTX 3.1 buses, diminishing its value as a unified transport.
HyperTransport comes in four versions (1.x, 2.0, 3.0, and 3.1), which run from 200 MHz to 3.2 GHz. It is also a DDR or "double data rate" connection, meaning it sends data on both the rising and falling edges of the clock signal. This allows for a maximum data rate of 6400 MT/s when running at 3.2 GHz. In current implementations, the operating frequency is autonegotiated with the motherboard chipset (northbridge).
HyperTransport supports an autonegotiated bit width, ranging from 2 to 32 bits per link; there are two unidirectional links per HyperTransport bus. With the advent of version 3.1, using full 32-bit links and the full HyperTransport 3.1 specification's operating frequency, the theoretical transfer rate is 25.6 GB/s (3.2 GHz × 2 transfers per clock cycle × 32 bits per link) per direction, or 51.2 GB/s aggregated throughput, making it faster than most existing bus standards for PC workstations and servers, as well as most bus standards for high-performance computing and networking.
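A minimal Python sketch of the arithmetic behind these figures (6400 MT/s, 25.6 GB/s per direction, 51.2 GB/s aggregate); the function name and return format are illustrative, not part of the specification:

```python
# Illustrative arithmetic for HyperTransport peak figures; theoretical maxima only.

def ht_peak(clock_ghz: float, link_width_bits: int):
    """Return (transfer rate in MT/s, GB/s per direction, aggregate GB/s)."""
    mt_s = clock_ghz * 1000 * 2                       # DDR: two transfers per clock
    gb_s_per_dir = mt_s * 1e6 * link_width_bits / 8 / 1e9
    return mt_s, gb_s_per_dir, 2 * gb_s_per_dir       # two unidirectional links per bus

print(ht_peak(3.2, 32))   # (6400.0, 25.6, 51.2) -> HyperTransport 3.1, 32-bit links
```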
Links of various widths can be mixed together in a single system configuration, for instance one 16-bit link to another CPU and one 8-bit link to a peripheral device, which allows a wider interconnect between CPUs and a lower-bandwidth interconnect to peripherals as appropriate. It also supports link splitting, where a single 16-bit link can be divided into two 8-bit links. The technology also typically has lower latency than other solutions due to its lower overhead.
Electrically, HyperTransport is similar to low-voltage differential signaling (LVDS) operating at 1.2 V.[2] HyperTransport 2.0 added post-cursor transmitter deemphasis. HyperTransport 3.0 added scrambling and receiver phase alignment as well as optional transmitter precursor deemphasis.
HyperTransport is packet-based, where each packet consists of a set of 32-bit words, regardless of the physical width of the link. The first word in a packet always contains a command field. Many packets contain a 40-bit address. An additional 32-bit control packet is prepended when 64-bit addressing is required. The data payload is sent after the control packet. Transfers are always padded to a multiple of 32 bits, regardless of their actual length.
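A sketch of this packetization in Python, assuming a simplified field packing; the bit layout below is illustrative and does not reproduce the exact format defined in the HyperTransport specification:

```python
# Simplified illustration of HyperTransport packetization: 32-bit words,
# a command field in the first word, a 40-bit address, an extra prepended
# control word when 64-bit addressing is needed, and payload padded to a
# multiple of 32 bits. Field packing is an assumption, not the spec layout.

def build_packet(command: int, address: int, payload: bytes) -> list[int]:
    control = (command & 0x3F) | ((address & ((1 << 40) - 1)) << 6)
    words = [control & 0xFFFFFFFF, (control >> 32) & 0xFFFFFFFF]
    if address >> 40:                                  # 64-bit addressing needs one
        words.insert(0, (address >> 40) & 0xFFFFFF)    # more control word, prepended
    padded = payload + b"\x00" * (-len(payload) % 4)   # pad to a 32-bit boundary
    words += [int.from_bytes(padded[i:i + 4], "little")
              for i in range(0, len(padded), 4)]
    return words

print(len(build_packet(0x2C, 0x12_3456_7890, b"hello")))  # 4 words: 2 control + 2 data
```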
HyperTransport packets enter the interconnect in segments known as bit times. The number of bit times required depends on the link width. HyperTransport also supports system management messaging, signaling interrupts, issuing probes to adjacent devices or processors, I/O transactions, and general data transactions. There are two kinds of write commands supported: posted and non-posted. Posted writes do not require a response from the target. This is usually used for high bandwidth devices such as uniform memory access traffic or direct memory access transfers. Non-posted writes require a response from the receiver in the form of a "target done" response. Reads also require a response, containing the read data. HyperTransport supports the PCI consumer/producer ordering model.
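A minimal sketch of the bit-time relationship, assuming a packet size expressed in 32-bit words (the 10-word example packet is an assumption):

```python
# Each 32-bit packet word takes 32 / link_width bit times on the wire,
# so narrower links need proportionally more bit times per packet.

def bit_times(packet_words: int, link_width_bits: int) -> int:
    return packet_words * 32 // link_width_bits

for width in (2, 8, 16, 32):
    print(f"{width:2d}-bit link: {bit_times(10, width)} bit times")  # 160, 40, 20, 10
```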
HyperTransport also facilitates power management as it is compliant with the Advanced Configuration and Power Interface specification. This means that changes in processor sleep states (C states) can signal changes in device states (D states), e.g. powering off disks when the CPU goes to sleep. HyperTransport 3.0 added further capabilities to allow a centralized power management controller to implement power management policies.
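A purely illustrative sketch of such a policy, mapping processor C states to device D states; the specific mapping is an assumption for illustration, not part of the ACPI or HyperTransport specifications:

```python
# Illustrative only: deeper CPU sleep states drive devices into deeper power
# states (e.g. spinning down disks). The mapping itself is an assumption.
C_TO_D_POLICY = {"C0": "D0", "C2": "D1", "C3": "D3"}

def device_state_for(c_state: str) -> str:
    return C_TO_D_POLICY.get(c_state, "D0")   # default: devices fully on

print(device_state_for("C3"))                 # D3 -> e.g. power off the disk
```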
The primary use for HyperTransport is to replace the Intel-defined front-side bus, which is different for every type of Intel processor. For instance, a Pentium cannot be plugged into a PCI Express bus directly, but must first go through an adapter to expand the system. The proprietary front-side bus must connect through adapters for the various standard buses, like AGP or PCI Express. These are typically included in the respective controller functions, namely the northbridge and southbridge.
In contrast, HyperTransport is an open specification, published by a multi-company consortium. A single HyperTransport adapter chip will work with a wide spectrum of HyperTransport-enabled microprocessors.
AMD used HyperTransport to replace the front-side bus in their Opteron, Athlon 64, Athlon II, Sempron 64, Turion 64, Phenom, Phenom II and FX families of microprocessors.
Another use for HyperTransport is as an interconnect for NUMA multiprocessor computers. AMD used HyperTransport with a proprietary cache coherency extension as part of their Direct Connect Architecture in their Opteron and Athlon 64 FX (Dual Socket Direct Connect (DSDC) Architecture) line of processors. Infinity Fabric used with the EPYC server CPUs is a superset of HyperTransport. The HORUS interconnect from Newisys extends this concept to larger clusters. The Aqua device from 3Leaf Systems virtualizes and interconnects CPUs, memory, and I/O.
HyperTransport can also be used as a bus in routers and switches. Routers and switches have multiple network interfaces, and must forward data between these ports as fast as possible. For example, a four-port, 1000 Mbit/s Ethernet router needs a maximum of 8000 Mbit/s of internal bandwidth (1000 Mbit/s × 4 ports × 2 directions); HyperTransport greatly exceeds the bandwidth this application requires. However, a 4 + 1 port 10 Gb router would require 100 Gbit/s of internal bandwidth. Adding 802.11ac with eight antennas and the 60 GHz WiGig standard (802.11ad) makes HyperTransport more feasible for such designs (with anywhere between 20 and 24 lanes used for the needed bandwidth).
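A back-of-the-envelope check of these internal-bandwidth figures, assuming every port sends and receives at line rate (the 4 + 1 port case is treated as five 10 Gbit/s ports):

```python
# Worst case: every port transmitting and receiving at full line rate.
def internal_bandwidth_gbit(ports: int, port_speed_gbit: float) -> float:
    return ports * port_speed_gbit * 2

print(internal_bandwidth_gbit(4, 1))    # 8   -> four-port gigabit router (Gbit/s)
print(internal_bandwidth_gbit(5, 10))   # 100 -> "4 + 1 port" 10 Gb router (Gbit/s)
```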
The issue of latency and bandwidth between CPUs and co-processors has usually been the major stumbling block to their practical implementation. Co-processors such as FPGAs have appeared that can access the HyperTransport bus and become integrated on the motherboard. Current-generation FPGAs from both main manufacturers (Altera and Xilinx) directly support the HyperTransport interface, and have IP cores available. Companies such as XtremeData, Inc. and DRC take these FPGAs (Xilinx in DRC's case) and create a module that allows FPGAs to plug directly into the Opteron socket.
AMD started an initiative named Torrenza on September 21, 2006, to further promote the usage of HyperTransport for plug-in cards and coprocessors. This initiative opened their "Socket F" to plug-in boards such as those from XtremeData and DRC.

A connector specification that allows a slot-based peripheral to have direct connection to a microprocessor using a HyperTransport interface was released by the HyperTransport Consortium. It is known as HyperTransport eXpansion (HTX). Using a reversed instance of the same mechanical connector as a 16-lane PCI Express slot (plus an x1 connector for power pins), HTX allows development of plug-in cards that support direct access to a CPU and DMA to the system RAM. The initial card for this slot was the QLogic InfiniPath InfiniBand HCA. IBM and HP, among others, have released HTX-compliant systems.
The original HTX standard is limited to 16 bits and 800 MHz.[3]
In August 2008, the HyperTransport Consortium released HTX3, which extends the clock rate of HTX to 2.6 GHz (5.2 GT/s) and retains backwards compatibility.[4]
The "DUT" test connector[5] is defined to enable standardized functional test system interconnection.
| HyperTransport version | Year | Max. HT frequency | Max. link width | Max. aggregate bandwidth, bi-directional (GB/s) | Max. aggregate bandwidth, 16-bit unidirectional (GB/s) | Max. aggregate bandwidth, 32-bit unidirectional* (GB/s) |
|---|---|---|---|---|---|---|
| 1.0 | 2001 | 800 MHz | 32-bit | 12.8 | 3.2 | 6.4 |
| 1.1 | 2002 | 800 MHz | 32-bit | 12.8 | 3.2 | 6.4 |
| 2.0 | 2004 | 1.4 GHz | 32-bit | 22.4 | 5.6 | 11.2 |
| 3.0 | 2006 | 2.6 GHz | 32-bit | 41.6 | 10.4 | 20.8 |
| 3.1 | 2008 | 3.2 GHz | 32-bit | 51.2 | 12.8 | 25.6 |
* AMD Athlon 64, Athlon 64 FX, Athlon 64 X2, Athlon X2, Athlon II, Phenom, Phenom II, Sempron, Turion series and later use one 16-bit HyperTransport link. AMD Athlon 64 FX (1207) and Opteron use up to three 16-bit HyperTransport links. Common clock rates for these processor links are 800 MHz to 1 GHz (older single- and multi-socket systems on 754/939/940 links) and 1.6 GHz to 2.0 GHz (newer single-socket systems on AM2+/AM3 links; most newer CPUs use 2.0 GHz). While HyperTransport itself is capable of 32-bit width links, that width is not currently utilized by any AMD processors. Some chipsets, though, do not even utilize the 16-bit width used by the processors. Those include the Nvidia nForce3 150, nForce3 Pro 150, and the ULi M1689, which use a 16-bit HyperTransport downstream link but limit the HyperTransport upstream link to 8 bits.
There has been some marketing confusion[citation needed] between the use of HT referring to HyperTransport and the later use of HT to refer to Intel's Hyper-Threading feature on some Pentium 4-based and the newer Nehalem and Westmere-based Intel Core microprocessors. Hyper-Threading is officially known as Hyper-Threading Technology (HTT) or HT Technology. Because of this potential for confusion, the HyperTransport Consortium always uses the written-out form: "HyperTransport."
Infinity Fabric (IF) is a superset of HyperTransport announced by AMD in 2016 as an interconnect for its GPUs and CPUs. When used internally it is called a Global Memory Interconnect (GMI).[7] It is also usable as an interchip interconnect for communication between CPUs and CPUs, GPUs and GPUs, or CPUs and GPUs (for Heterogeneous System Architecture), an arrangement known as Infinity Architecture, with the links known as External Global Memory Interconnect (xGMI).[8][9][10][11] The company said the Infinity Fabric would scale from 30 GB/s to 512 GB/s, and be used in the Zen-based CPUs and Vega GPUs which were subsequently released in 2017.
On Zen and Zen+ CPUs, the "SDF" data interconnects are run at the same frequency as the DRAM memory clock (MEMCLK), a decision made to remove the latency caused by different clock speeds. As a result, using a faster RAM module makes the entire bus faster. The links are 32-bit wide, as in HT, but 8 transfers are done per cycle (128-bit packets) compared to the original 2. Electrical changes are made for higher power efficiency.[12] On Zen 2 and Zen 3 CPUs, the IF bus is on a separate clock (FCLK) and so is the unified memory controller (UCLK). The UCLK is either in a 1:1 or a 2:1 ratio to the DRAM clock (MCLK). This avoids a limitation on desktop platforms where maximum DRAM speeds were in practice limited by the IF speed. The bus width has also been doubled.[13] A latency penalty is present when the FCLK is not synchronized with the UCLK.[14] On Zen 4 and later CPUs, the IF bus is able to run at an asynchronous clock to the DRAM, to allow the higher clock speeds that DDR5 is capable of.[15]
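A minimal sketch of the Zen 2/Zen 3 clock relationship described above, under the assumption that the 2:1 mode halves UCLK relative to MCLK; the DDR4-3200 and DDR4-3600 examples and the function name are illustrative:

```python
# Sketch of FCLK/UCLK/MCLK on Zen 2/Zen 3. Assumes the 2:1 mode halves UCLK
# relative to MCLK; example DDR speeds are illustrative.

def zen_clocks(ddr_mt_s, uclk_divider=1, fclk_mhz=None):
    mclk = ddr_mt_s / 2                   # DDR: memory clock is half the transfer rate
    uclk = mclk / uclk_divider            # 1:1 or 2:1 MCLK:UCLK
    fclk = uclk if fclk_mhz is None else fclk_mhz
    synchronized = (fclk == uclk)         # False -> latency penalty
    return mclk, uclk, fclk, synchronized

print(zen_clocks(3200))                                   # (1600.0, 1600.0, 1600.0, True)
print(zen_clocks(3600, uclk_divider=2, fclk_mhz=1800.0))  # (1800.0, 900.0, 1800.0, False)
```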
Professional/Workstation models of AMD GPUs include an Infinity Fabric Link edge connector for connecting the Infinity Fabric buses of GPUs together, bypassing the host PCIe bus. The Link "Bridge" device itself is a printed circuit board with 2 or 4 matching slots.[16] Each GPU family uses a different connector and the Bridge/Link generally only works between GPUs of the same model. It is therefore similar to the plug-in board version of NVLink.
Epyc CPUs based on Zen 5 have internal Infinity Fabric connections of 36 GB/s per core. Each IO die has external Infinity Fabric connectivity on its multifunctional PCIe 5.0/Infinity Fabric serializer/deserializers (SerDes), reusing the PCIe physical layer. It is used for interprocessor communication in two-socket systems, providing 3 or 4 links of 64 GB/s each.[7]
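The arithmetic behind the two-socket case is simply the link count times the per-link rate:

```python
# Aggregate socket-to-socket bandwidth for the configurations quoted above.
for links in (3, 4):
    print(f"{links} xGMI links x 64 GB/s = {links * 64} GB/s")   # 192 and 256 GB/s
```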
Each Instinct MI250 has four lanes of Infinity Fabric Link of 50 GB/s each for mesh interconnection running the xGMI protocol. It connects to the host through PCIe Gen 4 x16 or Infinity Fabric on top of the PCIe PHY. The bandwidth from multiple links, passing through different intermediate GPUs, can be aggregated.[17] For actually achievable performance figures, see Schieffer et al. (2024).[18]
UALink utilizes Infinity Fabric/xGMI as one of its shared memory protocols.
Broadcom produces PCIe switches and network interface cards with xGMI support.[19][20]