Integer micro-benchmarks [Smalltalk Multi-Precision Numerics vis-a-vis Highly Optimized Fixed Size Integer C++ Numerics]

David Simmons pulsar at qks.com
Mon Apr 23 21:26:10 EDT 2001


This benchmark is actually a reasonable test of compiler (loop) optimization
and integer numerics optimizations.

On a 1.2GHz Athlon processor with:
    512 MB memory
    Dual Raid ATA 100 IBM Hard drives
    Windows 2000 Professional Service Pack 2

I ran two different tests. The first one loops to generate "only"
SmallInteger sums; the (900,000) loop. The second test is based on the
originally posted sample and performs the 1,000,000 count loop, and thus
generates a significant number of large integers.

For each test I launched the application, ran it once, discarded the
result, and shut down. Then I repeated this same procedure three more times
and used the mean of the three runs.

Unoptimized C++ code: (Visual Studio 6 Service Pack 5 w/processor pack 1)
    730ms for loop over 900,000
    808ms for loop over 1,000,000

Best optimized C++ inlined code: (Visual Studio 6 Service Pack 5 w/processor
pack 1)
    332ms for loop over 900,000
        2.1988 : 1 Unoptimized C++ Ratio
    374ms for loop over 1,000,000
        2.1604 : 1 Unoptimized C++ Ratio

**Variations in the 3rd decimal place of the above numbers are the result of
Windows performance counter precision and OS process/thread slicing
variations.

For the C++ code and SmallScript the QueryPerformanceCounter call was used
to obtain the millisecond timings. Presumably similar timers are used within
VisualWorks and Dolphin (but I don't know).

NOTE: Use of GetTickCount() has rounding loss that can result in reporting
of times which are up to 20ms less than actual time.**

In the following tables, "shorter" times and "smaller" ratios are better.

In summary, a Smalltalk VM (without using adaptive type based inlining) can
achieve roughly a 4:1 ratio of performance relative to highly optimized C++
code performing "pure" integer numerics.
SmallScript's v4 AOS Platform VM/Jitter *has not* been aggressively tuned
for numerics or similar operations, so I would expect nominal improvements
of some form as it matures.

To get a sense of what further adaptive inline compilation can achieve, I
note that SmallScript on the v4 AOS Platform VM will execute the 900,000
loop case in 1,215ms when the triangle method is inlined. If we assumed that
the JIT/compiler was capable of hoisting the invariant calculation of
"10 triangle" out of the loop, then we would see a time of 95ms for
SmallScript on the v4 AOS Platform VM.

Remember: The Smalltalk code is performing multi-precision arithmetic and
thus has a significant number of overflow and type checks it performs. An
aggressive adaptive inlining JIT compiler could dynamically eliminate many of
the type checks by calculating the type graph for the code-flow tree and
generating separate versions based on likely types. Presumably this would
also allow most of the intermediate values to be retained in registers
rather than in stack local memory.
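The hoisting point above can be sketched roughly as follows. The original benchmark source is not part of this message, so everything here is hypothetical: `triangle` is assumed to compute the n-th triangular number, and the loop shape is an illustration only, not the posted benchmark.

```python
def triangle(n):
    """Hypothetical stand-in for the benchmark's triangle method: 1 + 2 + ... + n."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def loop_naive(count):
    """Recompute the loop-invariant triangle(10) on every iteration."""
    total = 0
    for _ in range(count):
        total += triangle(10)
    return total


def loop_hoisted(count):
    """Same result, with the invariant computed once before the loop."""
    t = triangle(10)  # hoisted out of the loop
    total = 0
    for _ in range(count):
        total += t
    return total
```

Both functions return the same sum; the hoisted version simply does the inner calculation once, which is the kind of transformation the 95ms estimate assumes the JIT/compiler could perform on its own.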
With such a JIT, the resulting ratio for multi-precision numerics would most
likely be somewhere around 2 : 1 against highly optimized C++ performing
*non-multi-precision-arithmetic*.

I should also point out that this kind of test represents a *worst-case*
type of scenario for Smalltalk (dynamic/script language) performance (where
it is handling arbitrary/multi-precision arithmetic) vis-a-vis statically
typed and highly optimized C++ code performing fixed size integer truncated
arithmetic.

=================================

SmallScript v4 AOS Platform VM
    1,328ms for loop over 900,000
        1.819 : 1 Unoptimized C++ Ratio
        4.000 : 1 Optimized C++ Ratio
    1,576ms for loop over 1,000,000 (GC tuning for tight memory raises this
    to 1,874ms)
        2.159 : 1 Unoptimized C++ Ratio (2.567 : 1)
        4.747 : 1 Optimized C++ Ratio (5.644 : 1)

Cincom VisualWorks 5i3NC
    1,457ms for loop over 900,000
        1.9959 : 1 Unoptimized C++ Ratio
        4.3886 : 1 Optimized C++ Ratio
    1,789ms for loop over 1,000,000
        2.4507 : 1 Unoptimized C++ Ratio
        5.3886 : 1 Optimized C++ Ratio

Dolphin Smalltalk Professional 4.01
    12,086ms for loop over 900,000
        16.556 : 1 Unoptimized C++ Ratio
        36.404 : 1 Optimized C++ Ratio
    13,434ms for loop over 1,000,000
        18.403 : 1 Unoptimized C++ Ratio
        40.464 : 1 Optimized C++ Ratio

I did not test VisualAge, Squeak, GNU, Gemstone.
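For anyone wanting to recheck the ratios in the tables, here is a minimal sketch that recomputes them from the raw timings. The names (`CPP_MS`, `VM_MS`, `ratios`) are made up for illustration. Working through the arithmetic, the posted 1,000,000-loop ratios appear to be taken against the 900,000-loop C++ times (730ms and 332ms) rather than the 808ms and 374ms figures, so the function accepts an optional baseline to show both views.

```python
# C++ timings in milliseconds, as posted above.
CPP_MS = {
    900_000: {"unoptimized": 730, "optimized": 332},
    1_000_000: {"unoptimized": 808, "optimized": 374},
}

# Smalltalk VM timings in milliseconds, as posted above.
VM_MS = {
    "SmallScript v4 AOS Platform VM": {900_000: 1328, 1_000_000: 1576},
    "Cincom VisualWorks 5i3NC": {900_000: 1457, 1_000_000: 1789},
    "Dolphin Smalltalk Professional 4.01": {900_000: 12086, 1_000_000: 13434},
}


def ratios(baseline_count=None):
    """Return {(vm, loop_count): (vs_unoptimized, vs_optimized)}.

    If baseline_count is None, each loop is compared with the C++ timing
    for the same loop count; otherwise every loop is compared with the
    C++ timing for baseline_count.
    """
    result = {}
    for name, timings in VM_MS.items():
        for count, ms in timings.items():
            base = CPP_MS[count if baseline_count is None else baseline_count]
            result[(name, count)] = (
                round(ms / base["unoptimized"], 3),
                round(ms / base["optimized"], 3),
            )
    return result


if __name__ == "__main__":
    for key, value in ratios(baseline_count=900_000).items():
        print(key, value)
```

Calling `ratios(baseline_count=900_000)` reproduces the posted figures to three decimals, while `ratios()` gives the like-for-like comparison for each loop count.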
I did an empirical (1,000,000) loop test with Smalltalk/X from
CampSmalltalk #1 and it ran in roughly 5212ms (this is only rough because I
had to run the test on different hardware using SmallScript and C++ as a
baseline for scaling the result).

-- Dave Simmons
www.qks.com / www.smallscript.com

"Bob Nemec" <bobn at home.com> wrote in message
news:D72166C0036950F6.D162DFAAECC52CD9.F2A613E2A85334A6 at lp.airnews.net...
> brangdon at cix.co.uk says...
> > To make the measurements less dependent on specific hardware, I suggest
> > we express speeds as a proportion of C++'s speed doing the same thing but
> > using fixed integers.
> >
> Interesting idea: a standard set of cross-language, cross-platform
> benchmarks.
>
> However, in general I think benchmarks do Smalltalk a disservice.
> Small tight independent chunks of code are not Smalltalk's strength;
> large complex systems are.
>
> The standard argument (which I agree with) is that Smalltalk can scale
> better than any other language, and that the truth of that statement
> becomes more self evident the larger your systems get.
>
> FWIW: I ran your little benchmark on VA, VW, Squeak and GemStone
> (care to publish a Windows EXE with your C++ code? ... no compiler on this
> machine).
> The ratios are:
> VW: 1.0
> VA: 3.37
> Squeak: 18.13
> GS:  25.61
>
> Details...
> "VW 3500"
> #(550000000 3500)
> #(550000000 3475)
> #(550000000 3712)
>
> "VA 11790"
> (550000000 11790)
> (550000000 11797)
> (550000000 11787)
>
> "Squeak 63440"
> #(550000000 63411)
> #(550000000 63471)
>
> "GemStone 89650"
> anArray( 550000000, 88897)
> anArray( 550000000, 90400)
> --
> Bob Nemec
> Newcastle Objects
> bobn at home.com


More information about the Python-list mailing list
