The Cray-3 was a vector supercomputer, Seymour Cray's designated successor to the Cray-2. The system was one of the first major applications of gallium arsenide (GaAs) semiconductors in computing, using hundreds of custom-built ICs packed into a 1 cubic foot (0.028 m³) CPU. The design goal was performance of around 16 GFLOPS, about 12 times that of the Cray-2.
Work started on the Cray-3 in 1988 at Cray Research's (CRI) development labs in Chippewa Falls, Wisconsin. Other teams at the lab were working on designs with similar performance. To focus the teams, the Cray-3 effort was moved to a new lab in Colorado Springs, Colorado, later that year. Shortly thereafter, the corporate headquarters in Minneapolis decided to end work on the Cray-3 in favor of another design, the Cray C90. In 1989 the Cray-3 effort was spun off to a newly formed company, Cray Computer Corporation (CCC).
The launch customer, Lawrence Livermore National Laboratory, cancelled its order in 1991, and a number of company executives left shortly thereafter. The first machine was finally ready in 1993, but with no launch customer, it was instead loaned as a demonstration unit to the nearby National Center for Atmospheric Research in Boulder. The company went bankrupt in March 1995, and the machine was officially decommissioned.
With the delivery of the first Cray-3, Seymour Cray immediately moved on to the similar but improved Cray-4 design, but the company went bankrupt before it was completely tested.[1] The Cray-3 was Cray's last completed design; after CCC's bankruptcy, he formed SRC Computers to concentrate on parallel designs, but died in a car accident in 1996 before this work was delivered.[2]
Seymour Cray began the design of the Cray-3 in 1985, as soon as the Cray-2 reached production.[3] Cray generally set himself the goal of producing new machines with ten times the performance of the previous models. Although the machines did not always meet this goal, it was a useful technique for defining the project and clarifying what sort of process improvements would be needed to meet it.[4] For the Cray-3, he set an even higher goal: a 12x performance increase over the Cray-2.[5]
Cray had always attacked the problem of increased speed with three simultaneous advances: more execution units to give the system higher parallelism, tighter packaging to decrease signal delays, and faster components to allow for a higher clock speed. Of the three, Cray was normally least aggressive on the last; his designs tended to use components that were already in widespread use, as opposed to leading-edge designs.[4]
For the Cray-2, he had introduced a novel 3D packaging system for its integrated circuits to allow higher densities,[6] and it appeared that there was still some room for improvement in this process. For the new design, he stated that all wires would be limited to a maximum length of 1 foot (0.30 m). This would demand that the processor fit into a 1 cubic foot (0.028 m³) block, about 1⁄3 that of the Cray-2 CPU. This would not only increase performance but make the system 27 times smaller.[7]
For a 12x performance increase, the packaging alone would not be enough; the circuits on the chips themselves would also have to speed up. The Cray-2 appeared to be pushing the limits of the speed of silicon-based transistors at a 4.1 ns cycle time (244 MHz), and it did not appear that anything more than another 2x would be possible. If the goal of 12x was to be met, more radical changes would be needed, and a "high tech" approach would have to be used.[8]
Cray had intended to use gallium arsenide circuitry in the Cray-2; it would not only offer much higher switching speeds but also use less energy and thus run cooler. At the time the Cray-2 was being designed, however, the state of GaAs manufacturing simply was not up to the task of supplying a supercomputer.[9] By the mid-1980s, things had changed and Cray decided it was the only way forward.[10] Given a lack of investment on the part of large chip makers, Cray decided to invest in a GaAs chipmaking startup, GigaBit Logic, and use it as an internal supplier.[11]
Describing the system in November 1988, Cray stated that the 12-fold performance increase would come from a threefold increase due to GaAs circuits and a fourfold increase due to the use of more processors. One of the problems with the Cray-2 had been poor multiprocessing performance due to limited bandwidth between the processors; to address this, the Cray-3 would adopt the much faster architecture used in the Cray Y-MP. This would provide a design performance of 8000 MIPS, or 16 GFLOPS.[7]
The Cray-3 was originally slated for delivery in 1991.[12] This was during a time when growth in the supercomputer market was slowing rapidly, from 50% annually in 1980 to 10% in 1988.[10] At the same time, Cray Research was also working on the Y-MP, a faster multiprocessor version of the system architecture that traced its ancestry to the original Cray-1. In order to focus the Y-MP and Cray-3 groups, and with Cray's personal support,[13] the Cray-3 project moved to a new research center in Colorado Springs.[3]
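The quoted design figures are mutually consistent, as the short calculation below checks. The 12x target is the product of the two stated gains, and the 16 GFLOPS design figure is exactly twice the 8000 MIPS figure; the second line is plain arithmetic on the quoted numbers, not a claim from the source about how either figure was derived.

```latex
\begin{aligned}
\underbrace{3}_{\text{GaAs circuits}} \times \underbrace{4}_{\text{more processors}} &= 12 \\
\frac{16\ \text{GFLOPS}}{8000\ \text{MIPS}} = \frac{16 \times 10^{9}}{8 \times 10^{9}} &= 2
\end{aligned}
```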
By 1989, the Y-MP was starting deliveries, and the main CRI lab in Chippewa Falls, Wisconsin, moved on to the C90, a further improvement in the Y-MP series.[14][15] With only 25 Cray-2s sold, management decided that the Cray-3 should be put on "low priority" development. In November 1989, the Colorado Springs lab was spun off as Cray Computer Corporation (CCC), with CRI retaining 10% of the new company's stock and providing an $85 million promissory note to fund development.[3] Cray himself was not a shareholder in the new company, and worked under contract.[16][17] As CRI retained the lease on the original building, the new company had to move once again, introducing further delays.[3][6]
By 1991, development was behind schedule.[18] Development slowed even more when Lawrence Livermore National Laboratory cancelled its order for the first machine[19] in favor of the C90. Several executives, including the CEO, left the company.[16] The company then announced that it would be looking for a customer that needed a smaller version of the machine, with four to eight processors.[20]
The first (and only) production model (serial number S5, named Graywolf) was loaned to NCAR as a demonstration system in May 1993. NCAR's version was configured with 4 processors and a 128 MWord (64-bit words, 1 GB) common memory.[21] In service, the static RAM proved to be problematic. It was also discovered that the square root code contained a bug that caused 1 in 60 million calculations to be wrong. Additionally, one of the four CPUs was not running reliably.[22]
CCC declared bankruptcy in March 1995, after spending about $300 million of financing. NCAR's machine was officially decommissioned the next day.[23] Seven system cabinets, or "tanks", serial numbers S1 to S7, were built for Cray-3 machines. Most were for smaller two-CPU machines. Three of the smaller tanks were used on the Cray-4 project,[24] essentially a Cray-3 with 64 faster CPUs running at 1 ns (1 GHz) and packed into an even smaller space.[25] Another was used for the Cray-3/SSS project.[26]
The failure of the Cray-3 was in large part due to the changing political and technical climate. The machine was being designed during the collapse of the Warsaw Pact and the end of the Cold War, which led to a massive downsizing in supercomputer purchases.[20][27] At the same time, the market was increasingly investing in massively parallel (MP or MPP) designs. Cray was critical of this approach, and was quoted by The Wall Street Journal as saying that MPP systems had not yet proven their supremacy over vector computers, noting the difficulty many users had had programming large parallel machines: "I don't think they'll ever be universally successful, at least not in my lifetime".[27]
The Cray-3 system architecture comprised a foreground processing system, up to 16 background processors and up to 2 gigawords (16 GB) of common memory. The foreground system was dedicated to input/output and system management. It included a 32-bit processor and four synchronous data channels for mass storage and network devices, primarily via HiPPI channels.[28]
Each background processor consisted of a computation section, a control section and local memory. The computation section performed 64-bit scalar, floating-point and vector arithmetic. The control section provided instruction buffers, memory management functions and a real-time clock. Each background processor also incorporated 16 kilowords (128 kB) of high-speed local memory for use as temporary scratch memory.[29]
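Memory sizes in this section are quoted in 64-bit words. The sketch below converts them to bytes, assuming binary prefixes (a kiloword is 1,024 words, a megaword 2^20, a gigaword 2^30), which reproduces the byte figures the article gives; the helper names `words_to_bytes` and `human` are illustrative.

```python
# Sketch: converting the article's word-based memory sizes to bytes.
# Assumption: binary prefixes (kilo = 2**10, mega = 2**20, giga = 2**30).

BITS_PER_WORD = 64
BYTES_PER_WORD = BITS_PER_WORD // 8  # 8 bytes per 64-bit word

KILO, MEGA, GIGA = 2**10, 2**20, 2**30


def words_to_bytes(words: int) -> int:
    """Return the number of bytes occupied by `words` 64-bit words."""
    return words * BYTES_PER_WORD


def human(n_bytes: int) -> str:
    """Format a byte count using the largest binary unit that divides it."""
    for unit, size in (("GiB", GIGA), ("MiB", MEGA), ("KiB", KILO)):
        if n_bytes >= size and n_bytes % size == 0:
            return f"{n_bytes // size} {unit}"
    return f"{n_bytes} B"


if __name__ == "__main__":
    sizes = {
        "local memory per background processor": 16 * KILO,
        "NCAR common memory": 128 * MEGA,
        "maximum common memory": 2 * GIGA,
    }
    for label, words in sizes.items():
        print(f"{label}: {human(words_to_bytes(words))}")
```

With these assumptions, 16 kilowords come to 128 KiB, the NCAR machine's 128 megawords to 1 GiB, and the 2-gigaword maximum to 16 GiB.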
Common memory consisted of silicon CMOS SRAM, organized into octants of 64 banks each, with up to eight octants possible. The word size was 64 bits plus eight error-correction bits, and total memory bandwidth was rated at 128 gigabytes per second.[30]
As with previous designs, the core of the Cray-3 consisted of a number of modules, each containing several circuit boards packed with parts. To increase density, the individual GaAs chips were not packaged; instead, several were mounted directly, with ultrasonic gold bonding, to a board approximately 1 inch (25 mm) square. The boards were then turned over and mated to a second board carrying the electrical wiring, with wires on this card running through holes to the "bottom" (opposite the chips) side of the chip carrier, where they were bonded, sandwiching the chips between the two layers of board. These submodules were then stacked four-deep and, as in the Cray-2, wired to each other to make a 3D circuit.[21]
Unlike the Cray-2, the Cray-3 modules also included edge connectors. Sixteen such submodules were connected together in a 4×4 array to make a single module measuring 121 by 107 by 7 millimetres (4.76 in × 4.21 in × 0.28 in). Even with this advanced packaging, the circuit density was low even by 1990s standards, at about 96,000 gates per cubic inch.[31] Modern CPUs offer gate counts of millions per square inch, and the move to 3D circuits was still just being considered as of 2017.[32]
Thirty-two such modules were then stacked and wired together with a mass of twisted-pair wires to form a single processor. The basic cycle time was 2.11 ns, or 474 MHz, allowing each processor to reach about 0.948 GFLOPS, and a 16-processor machine a theoretical 15.17 GFLOPS. Key to the high performance was high-speed access to main memory, which allowed each processor to burst up to 8 GB/s.[33]
The modules were held together in an aluminum chassis known as a "brick". The bricks were immersed in liquid Fluorinert for cooling, as in the Cray-2. A four-processor system with 64 memory modules dissipated about 88 kW of power.[21] The entire four-processor system was about 20 inches (510 mm) tall and front-to-back, and a little over 2 feet (0.61 m) wide.[34]
For systems with up to four processors, the processor assembly sat under a translucent bronzed acrylic cover at the top of a cabinet 42 inches (1.1 m) wide, 28 inches (0.71 m) deep and 50 inches (1.3 m) high,[34] with the memory below it and the power supplies and cooling systems at the bottom. Eight- and 16-processor systems would have been housed in a larger octagonal cabinet. All in all, the Cray-3 was considerably smaller than the Cray-2, which was itself relatively small compared to other supercomputers.[34]
In addition to the system cabinet, a Cray-3 system also needed one or two (depending on the number of processors) system control pods (or "C-Pods"), 52.5 inches (1.33 m) square and 55.3 inches (1.40 m) high, containing power and cooling control equipment.[34]
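The clock and peak-performance figures follow from simple arithmetic on the cycle time. The sketch below reproduces them, and also converts the Cray-2's 4.1 ns cycle time to its 244 MHz clock. It assumes, by inference rather than from the source, that each processor completes two floating-point operations per cycle, since 0.948 GFLOPS is about twice the 474 MHz clock rate; the function names are illustrative.

```python
# Sketch: peak-performance arithmetic for the quoted Cray-3 figures.
# Assumption (inferred, not stated in the source): two floating-point
# operations per clock cycle per processor.

FLOPS_PER_CYCLE = 2


def clock_mhz(cycle_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1_000.0 / cycle_ns


def peak_gflops(cycle_ns: float, processors: int,
                flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Theoretical peak in GFLOPS: processors x FLOPs/cycle x clock in GHz."""
    return processors * flops_per_cycle * clock_mhz(cycle_ns) / 1_000.0


if __name__ == "__main__":
    print(f"Cray-2 clock: {clock_mhz(4.1):.0f} MHz")   # 4.1 ns cycle time
    print(f"Cray-3 clock: {clock_mhz(2.11):.0f} MHz")  # 2.11 ns cycle time
    for n in (1, 4, 16):
        print(f"{n:2d} processor(s): {peak_gflops(2.11, n):.3f} GFLOPS")
```

The same multiplication applied to the 8 GB/s per-processor burst rate gives 16 × 8 = 128 GB/s for a full machine, which matches the total common-memory bandwidth quoted for the system; the source does not say whether the two figures are related in this way.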
The following possible Cray-3 configurations were officially specified:[35]
Name | CPUs | Memory (Mwords) | I/O Modules |
---|---|---|---|
Cray-3/1-256 | 1 | 256 | 1 |
Cray-3/2-256 | 2 | 256 | 1 |
Cray-3/4-512 | 4 | 512 | 3 |
Cray-3/4-1024 | 4 | 1024 | 3 |
Cray-3/4-2048 | 4 | 2048 | 3 |
Cray-3/8-1024 | 8 | 1024 | 7 |
Cray-3/8-2048 | 8 | 2048 | 7 |
Cray-3/16-2048 | 16 | 2048 | 15 |
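The model names in the table appear to encode the CPU count and the memory size in megawords, as in "Cray-3/<CPUs>-<Mwords>"; this reading is inferred from the table rather than stated by the source. The sketch below transcribes the table, checks each name against its CPU and memory columns, and converts memory to GiB assuming 64-bit words and binary prefixes. The names `Config`, `parse_name` and `memory_gib` are hypothetical.

```python
import re
from typing import NamedTuple, Tuple

BYTES_PER_WORD = 8  # 64-bit words
MEGA = 2**20


class Config(NamedTuple):
    name: str
    cpus: int
    memory_mwords: int
    io_modules: int


# Rows transcribed from the configuration table.
CONFIGS = [
    Config("Cray-3/1-256", 1, 256, 1),
    Config("Cray-3/2-256", 2, 256, 1),
    Config("Cray-3/4-512", 4, 512, 3),
    Config("Cray-3/4-1024", 4, 1024, 3),
    Config("Cray-3/4-2048", 4, 2048, 3),
    Config("Cray-3/8-1024", 8, 1024, 7),
    Config("Cray-3/8-2048", 8, 2048, 7),
    Config("Cray-3/16-2048", 16, 2048, 15),
]

# Hypothetical reading of the model names: "Cray-3/<CPUs>-<memory in Mwords>".
NAME_PATTERN = re.compile(r"Cray-3/(\d+)-(\d+)")


def parse_name(name: str) -> Tuple[int, int]:
    """Return (cpus, memory_mwords) encoded in a model name."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def memory_gib(mwords: int) -> int:
    """Memory size in GiB for a size given in megawords of 64 bits."""
    return mwords * MEGA * BYTES_PER_WORD // 2**30


if __name__ == "__main__":
    for cfg in CONFIGS:
        cpus, mwords = parse_name(cfg.name)
        # The name should agree with the CPU and memory columns.
        assert (cpus, mwords) == (cfg.cpus, cfg.memory_mwords)
        print(f"{cfg.name}: {cpus} CPU(s), {memory_gib(mwords)} GiB, "
              f"{cfg.io_modules} I/O module(s)")
```

Under these assumptions the 2048-Mword ceiling works out to 16 GiB, consistent with the 2 gigawords (16 GB) maximum quoted for common memory.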
The Cray-3 ran the Colorado Springs Operating System (CSOS), which was based on Cray Research's UNICOS operating system version 5.0. A major difference between CSOS and UNICOS was that CSOS was ported to standard C, with all the PCC extensions used in UNICOS removed.[36]
Much of the software available on the Cray-3 was derived from Cray Research and included, for instance, the X Window System, vectorizing FORTRAN and C compilers, NFS and a TCP/IP stack.[37][36]