The Whetstone benchmark is a synthetic benchmark for evaluating the performance of computers.[1] It was first written in ALGOL 60 in 1972 at the Technical Support Unit (TSU) of the Department of Trade and Industry (later part of the Central Computer and Telecommunications Agency (CCTA)) in the United Kingdom. It was derived from statistics on program behaviour gathered on the KDF9 computer at the National Physical Laboratory (NPL), using a modified version of its Whetstone ALGOL 60 compiler.[2] The workload on the machine was represented as a set of frequencies of execution of the 124 instructions of the Whetstone Code. The Whetstone Compiler was built at the Atomic Power Division of the English Electric Company in Whetstone, Leicestershire, England,[3] hence its name. Dr. B.A. Wichmann at NPL produced a set of 42 simple ALGOL 60 statements, which in a suitable combination matched the execution statistics.
To make a more practical benchmark, Harold Curnow of TSU wrote a program incorporating the 42 statements. This program worked in its ALGOL 60 version, but when translated into FORTRAN and compiled by IBM's optimizing FORTRAN compiler, calculations whose results were not output were optimized out by the compiler. He then produced a set of program fragments which were more like real code and which collectively matched the original 124 Whetstone instructions. Timing this program gave a measure of the machine's speed in thousands of Whetstone instructions per second (kWIPS). The FORTRAN version became the first general-purpose benchmark that set industry standards of computer system performance. Further development was carried out by Roy Longbottom, also of TSU/CCTA, who became the official design authority.
In July 2010, the original ALGOL 60 program ran once again under the Whetstone compiler, 30 years after the shutdown of the last KDF9 machine. The program was executed by a KDF9 emulator.[4]
The benchmark employs 8 test procedures:[5]
The original version reported only the parameters used for each test, the numeric results produced, and the overall KWIPS performance rating.[5]
In 1978, the program was updated to log the running time of each of the tests. As a result, each individual test could report a score. The three floating-point tests each reported one version of the MFLOPS (Millions of Floating Point Operations Per Second) statistic. The rest each reported a MOPS statistic. In 1980, the MOPS of tests 3, 4 and 7 were combined to form the VAX MIPS, a Millions of Integer Instructions Per Second measure calibrated to read the Digital VAX 11/780 as 1 MIPS.[5]
Note that there are other versions of the Whetstone Benchmark available online, some claiming copyright, without reference to CCTA or the design authority.
In conjunction with the undertaking controlled by the Contracts Division, CCTA engineers were responsible for designing and supervising acceptance trials[6] of all UK Government computers and those centrally funded for universities and Research Councils, with systems varying from minicomputers to supercomputers. This provided the opportunity to gather verified Whetstone Benchmark results. Other results were obtained via new computer system appraisal activities.
CCTA records are now available in the UK National Archives,[7] including technical reports. Original Whetstone Benchmark results are in the 1985 CCTA Technical Memorandum 1182, where overall speeds are shown only as MWIPS. This contains more than 1000 results for 244 computers from 32 manufacturers, including the first for PCs and supercomputers.
As mentioned above, per item scoring was added in 1978.
The VAX-11/780 was the first commercially available 32-bit computer to demonstrate 1 MIPS,[8] as measured by performance relative to the IBM System/370 architecture (it was specifically as fast as the model 158–3). In 1980, Whetstone added a feature where the MOPS of tests 3, 4 and 7 were combined to form the VAX MIPS, millions of instructions using VAX-11/780 as the definition of 1 MIPS.[5]
Roy Longbottom converted the original Whetstone Benchmark to fully exploit the capabilities of the vector processors introduced since 1972.[5] Results were included in the paper “Performance of Multi-User Supercomputing Facilities” presented at the 1989 Fourth International Conference on Supercomputing, Santa Clara.[9][10]
This was also repeated in the Harold Curnow paper “Whither Whetstone? The synthetic benchmark after 15 years” presented at the “Evaluating supercomputers: strategies for exploiting, evaluating and benchmarking computers with advanced architecture” conference in 1990 and published in book form.[11]
In 1987, code changes were also carried out, including by Bangor University, to identify unexpected behaviour without changing the implementation of the original 124 Whetstone instructions. One necessary change was to maintain measurement accuracy at increasing CPU speeds, using self-calibration so that the benchmark runs for a noticeable time, typically 10 seconds, or 100 seconds for early PCs with low timer resolution.
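The self-calibration idea described above can be sketched as follows. This is a minimal illustration and not the benchmark's actual code: the function names, the doubling strategy, and the 0.1 second measurability threshold are all assumptions made for the example. The pass count is doubled until a trial run is long enough to time reliably, then scaled to the target duration.

```python
import time


def calibrate(workload, target_seconds=10.0, min_measurable=0.1,
              timer=time.perf_counter):
    """Estimate a pass count that runs for roughly target_seconds.

    workload(passes) is a hypothetical callable that performs `passes`
    repetitions of the timed work. The doubling loop and the
    min_measurable threshold are illustrative assumptions.
    """
    passes = 1
    while True:
        start = timer()
        workload(passes)
        elapsed = timer() - start
        if elapsed >= min_measurable:
            break
        passes *= 2  # too fast to time reliably, so try twice the work
    # Scale the trial pass count linearly to the target duration.
    return max(1, round(passes * target_seconds / elapsed))


def sample_workload(passes):
    """Stand-in floating-point loop, not one of the Whetstone tests."""
    x = 1.0
    for _ in range(passes * 1000):
        x = (x + 1.0 / x) * 0.5
    return x


if __name__ == "__main__":
    # Use a short target here so the demonstration finishes quickly.
    print(calibrate(sample_workload, target_seconds=0.5))
```

On a machine with a coarse timer, raising `min_measurable` and `target_seconds` (for example to the 100 seconds mentioned above) reduces the relative timing error at the cost of a longer run.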
Following retirement from CCTA, Roy Longbottom continued providing free benchmarking and stress testing programs available on his web site, latterly roylongbottom.org.uk, with most development using C on PCs running Microsoft Windows and Linux. This was initially in conjunction with the Compuserve Benchmarks and Standards Forum,[12] covering PC hardware 1997 to 2008, providing numerous new benchmark results.
From 2008 to 2013 further PC results were collected privately. By then, PC processor operating clock speeds reached 4 GHz and did not increase that much by the 2020s, reducing the need to gather results of the original scalar benchmark. In 2017 "Whetstone Benchmark History and Results"[13] was published for public access, with the identified year of first delivery and purchase prices added, also doubling the number of computers covered in the CCTA report. The most notable citation for this was by Tony Voellm, then Google Cloud Performance Engineering Manager, entitled "Cloud Benchmarking: Fight the black hole".[14] This considered available benchmarks and performance by time with detailed graphs, including those from the Whetstone reports. At a later stage, 504 of the results, by year, were included in the report "Techniques used for analyzing basic performance measurements".[15]
During this period, versions of the Whetstone Benchmark were produced to make use of multiprocessor/multi-core systems and CPU multithreading, initially for PCs running under Microsoft Windows, the latest supporting up to 8 CPUs or CPU cores, particularly for those known as 4 core/8 thread varieties.
The History report includes new sections for PC results, with CPUs from 1979, particularly those produced by up to 12 different compilers or interpreters, covering C/C++ (up to 64-bit SSE level), Old Fortran, BASIC, and Java. These are based on the ratio MWIPS per MHz (multiplied by 100) to represent efficiency. The bottom line is an entry for a Core i7 CPU, with ratings varying from 0.39, via the BASIC interpreter, to 311, via C using 64-bit SSE options, and then 1003 with the multithreading benchmarks using all four CPU cores.
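The efficiency ratio described above is simple arithmetic, sketched here for clarity. The function name and the sample figures are hypothetical and are not taken from the report.

```python
def mwips_per_mhz_x100(mwips, clock_mhz):
    """Efficiency rating: MWIPS per MHz, multiplied by 100."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return 100.0 * mwips / clock_mhz


if __name__ == "__main__":
    # Hypothetical figures: 3000 MWIPS measured on a 3000 MHz CPU.
    print(mwips_per_mhz_x100(3000.0, 3000.0))
```

Normalising by clock speed lets results from CPUs of very different eras be compared on work done per cycle rather than on raw speed.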
Another report, "Whetstone Benchmark Detailed Later Results",[16] was produced in 2017. This document provides a summary of speeds of the eight test loops in the benchmark, as MFLOPS or MOPS, plus the MWIPS ratings. There are 22 pages of results covering the same Windows-based PCs as the History report, with different compilers and compiling options, some with multithreaded versions. Later results cover PCs using Linux. Then there are others for a sample of Android phones and tablets and, at the time, the full range of Raspberry Pi computers. For the latter, Roy Longbottom had been recruited as a voluntary member of the Raspberry Pi Foundation's new products Alpha Testing Team.
Later scalar, vector and multithreading results were included in a 2022 report "Cray 1 Supercomputer Performance Comparisons With Home Computers Phones and Tablets".[17] This included the following, originally in a report on the first Raspberry Pi computer:
In 1978, the Cray 1 supercomputer cost $7 Million, weighed 10,500 pounds and had a 115 kilowatt power supply. It was, by far, the fastest computer in the world. The Raspberry Pi 4B costs around $70 (CPU board, case, power supply, SD card), weighs a few ounces, uses a 5 watt power supply and is more than 4.5 times faster than the Cray 1.
This result was based on the official average performance of the Livermore loops benchmark that was used to demonstrate that the first Cray-1 met its contractual requirements. The scalar Whetstone benchmark achieved a much higher gain of 16.7 times. The report includes comparisons with other supercomputers, a modern fairly fast laptop PC and the 2020 Raspberry Pi 400, where the latter obtained MWIPS gains over the Cray-1 of 155 times scalar, 38 vector and 593 scalar multithreading (4 CPU cores versus 1). The quad-core laptop, using advanced SIMD compilations, obtained gains of 400, 215 and 3520 times respectively.
The integer-only Dhrystone also adopted the idea of using VAX 11/780 as the performance reference for "1 MIPS". The VAX 11/780 ran at 1757 Dhrystones Per Second, hence each 1757 Dhrystones Per Second is equivalent to one Dhrystone MIPS (DMIPS).
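The conversion described above amounts to a single division. A minimal sketch follows; the function and constant names are illustrative, and only the 1757 reference figure comes from the text.

```python
# Reference score of the VAX 11/780, as stated in the text above.
VAX_11_780_DHRYSTONES_PER_SECOND = 1757


def dmips(dhrystones_per_second):
    """Convert a Dhrystones-per-second score to Dhrystone MIPS (DMIPS)."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


if __name__ == "__main__":
    # A machine scoring twice the VAX 11/780 figure rates 2 DMIPS.
    print(dmips(2 * 1757))
```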
As used in modern times, the Whetstone benchmark is generally run to obtain its FLOPS metrics (floating-point performance). Although Whetstone also offers integer performance measures, the Dhrystone benchmark, being more focused on integer and string operations, is more commonly used for this purpose.
Harold Curnow also reported comments from the 1989 conference “Software for Parallel Computers” in a presentation by Gordon Bell, designer of the Digital Equipment Corporation VAX range of minicomputers, indicating that the range was designed to perform well on the Whetstone Benchmark.
The Whetstone Benchmark also had high visibility concerning floating point performance of Intel CPUs and PCs, starting with the 1980 Intel 8087 coprocessor. This was reported in the 1986 Intel Application Report “High Speed Numerics with the 80186/80188 and 8087”.[18] The latter includes hardware functions for exponential, logarithmic or trigonometric calculations, as used in two of the eight Whetstone Benchmark tests, where these can dominate running time. Only two other benchmarks were included in the Intel procedures, showing huge gains over the earlier software-based routines on all three programs.
Later tests, by a SSMEC Laboratory, evaluated Intel 80486 compatible CPU chips using their Universal Chip Analyzer.[19] Considering two floating point benchmarks, as used by Intel in the above report, they preferred Whetstone, stating "Whetstone utilizes the complete set of instructions available on early x87 FPUs". This might suggest that the Whetstone Benchmark influenced the hardware instruction set.
By the 1990s the Whetstone Benchmark and results had become relatively popular. A notable quotation in 1985 was in "A portable seismic computing benchmark" stating "The only commonly used benchmark to my knowledge is the venerable Whetstone benchmark, designed many years ago to test floating point operations" in the European Association of Geoscientists and Engineers Journal.[20]
Details of the Vector Whetstone Benchmark performance were also repeated, by Roy Longbottom, at the June 1990 Advanced Computing Seminar at Natural Environment Research Council Wallingford. This led to Council for the Central Laboratory of the Research Councils Distributed Computing Support collecting results from running “on a variety of machines, including vector supercomputers, minisupers, super-workstations and workstations, together with that obtained on a number of vector CPUs and on single nodes of various MPP machines”. More than 200 results, up to 2006, are included in the report, in entries extending to at least the year 2007 section.[21] The report also indicated "The wide variety of standard functions exercised (sqrt, exp, cos etc.) consume a far larger fraction of the reported times.".