
























![Hardware parameter tuning](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-26-2048.jpg&f=jpg&w=240)

Hardware parameter tuning: behavior of a simple application with two kernels

- Low computational intensity (CI): DGEMV
- High computational intensity (CI): DGEMM

Tuning of three parameters, visualized by RADAR:

- Core frequency (CF)
- Uncore frequency (UCF)
- Number of OpenMP threads

Figure: three panels (Low CI (DGEMV), High CI (DGEMM), static tuning for both kernels), each plotting compute node energy consumption [J] against CPU core frequency [GHz]. The settings marked on the panels are:

| Panel | OpenMP threads | UCF | CF |
|---|---|---|---|
| Low CI (DGEMV) | 10 | 2.2 GHz | 1.2 GHz |
| High CI (DGEMM) | 12 | 1.2 GHz | 2.5 GHz |
| Static tuning for both kernels | 12 | 2.2 GHz | 2.4 GHz |

Two kernels with a 1:1 workload ratio (note: the runtime of both kernels was equal for default settings):

| Settings | Energy consumption | Energy savings (absolute) | Energy savings (relative) |
|---|---|---|---|
| Default settings | 2017 J | - | - |
| Static optimal | 1833 J | 179 J | 9% |
| Dynamic optimal | 1612 J | 221 J | 12% |
| Total savings | - | 400 J | 20% |
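The savings in the two-kernel energy table above can be rechecked from the energy column alone. The following is a minimal sketch, not part of the original slides; it assumes that the static saving is measured against the default run, the dynamic saving against the static optimum, and the total against the default run. The names `ENERGY_J` and `saving` are illustrative.

```python
# Energy consumption in joules, taken from the two-kernel table.
ENERGY_J = {"default": 2017, "static": 1833, "dynamic": 1612}


def saving(before_j, after_j):
    """Return (absolute saving in J, saving as a percentage of before_j)."""
    diff = before_j - after_j
    return diff, 100.0 * diff / before_j


# Assumed baselines: default -> static, static -> dynamic, default -> dynamic.
static_j, static_pct = saving(ENERGY_J["default"], ENERGY_J["static"])
dynamic_j, dynamic_pct = saving(ENERGY_J["static"], ENERGY_J["dynamic"])
total_j, total_pct = saving(ENERGY_J["default"], ENERGY_J["dynamic"])

print(f"static vs default:  {static_j} J ({static_pct:.1f}%)")
print(f"dynamic vs static:  {dynamic_j} J ({dynamic_pct:.1f}%)")
print(f"dynamic vs default: {total_j} J ({total_pct:.1f}%)")
```

Recomputed this way, the static step comes to 184 J and the total to 405 J, whereas the table prints 179 J and 400 J. The rounded percentages (9%, 12%, 20%) agree with the table under the assumed baselines.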


![Tuning of COMPUTE bound workload](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-29-2048.jpg&f=jpg&w=240)

Tuning of COMPUTE bound workload: behavior of the platform when running a compute bound workload

- under 145 W (TDP level, no power cap)
- under three different power cap levels: 100 W, 80 W and 60 W

Figure: three panels, "Tuning of compute bound region" under 80 W, 100 W and 60 W power caps. Each panel plots energy consumption [J] and runtime [s] against frequency [GHz] (1.0 to 3.0) for the series EXP0 - default, EXP1 - default Pcap, EXP5 - DVFS under Pcap, EXP6 - UCF under Pcap, EXP7 - DVFS & UCF - min UCF, and EXP7 - DVFS & UCF - max UCF.

Annotations on the panels:

- 80 W: 8.5% energy savings and 8.4% time savings; 12.9% energy savings and 10.9% time extension
- 100 W: 14.9% energy savings and 4.5% time savings; 21.3% energy savings and 21.1% time extension
- 60 W: 9.1% energy savings and 9.4% time savings
![Tuning of COMPUTE bound workload under 100W power cap](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-30-2048.jpg&f=jpg&w=240)

Figure: tuning of compute bound region under 100 W power cap. Energy consumption [J] and runtime [s] against frequency [GHz] for EXP0 - default, EXP1 - default Pcap, EXP5 - DVFS under Pcap, EXP6 - UCF under Pcap, EXP7 - DVFS & UCF - min UCF, and EXP7 - DVFS & UCF - max UCF.

- Runtime labels on the plot: 3.268 s, 3.450 s, 3.450 s, 7.411 s, 3.293 s, 4.378 s, 7.698 s
- Energy labels on the plot: 363.4 J, 344.4 J, 344.4 J, 300.4 J, 293.0 J, 305.4 J, 271.0 J, 297 J
- Annotations: 14.9% energy savings and 4.5% time savings; 21.3% energy savings and 21.1% time extension

![Tuning of COMPUTE bound workload under 80W power cap](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-31-2048.jpg&f=jpg&w=240)

Figure: tuning of compute bound region under 80 W power cap, with the same axes and series as the 100 W plot.

- Runtime labels on the plot: 3.268 s, 3.268 s, 3.903 s, 3.903 s, 7.409 s, 3.577 s, 7.693 s, 4.379 s, 3.653 s
- Energy labels on the plot: 363.4 J, 311.8 J, 311.8 J, 285.4 J, 304.2 J, 271.6 J, 290.0 J
- Annotations: 8.5% energy savings and 8.4% time savings; 12.9% energy savings and 10.9% time extension

![Tuning of COMPUTE bound workload under 60W power cap](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-32-2048.jpg&f=jpg&w=240)

Figure: tuning of compute bound region under 60 W power cap, with the same axes and series as the 100 W plot.

- Runtime labels on the plot: 3.268 s, 4.944 s, 4.944 s, 7.410 s, 4.849 s, 4.477 s, 7.692 s, 4.565 s, 4.606 s
- Energy labels on the plot: 363.4 J, 296 J, 295.0 J, 268.0 J, 303.0 J, 270.4 J
- Annotations: 9.1% energy savings and 9.4% time savings

![Tuning of memory bound workload](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-34-2048.jpg&f=jpg&w=240)

Tuning of memory bound workload: behavior of the platform when running a memory bound workload

- under 145 W (TDP level, no power cap)
- under three different power cap levels: 100 W, 80 W and 60 W

Figure: three panels, "Tuning of memory bound region" under 100 W, 80 W and 60 W power caps. Each panel plots energy consumption [J] and runtime [s] against frequency [GHz] (1.0 to 3.0) for the series EXP0 - default, EXP1 - default Pcap, EXP5 - DVFS under Pcap, EXP6 - UCF under Pcap, EXP7 - DVFS & UCF - UCF = 2.2 GHz, and EXP7 - DVFS & UCF - max UCF.

Annotations on the panels:

- 100 W: 38.7% energy savings and 3.6% time extension
- 80 W: 21.6% energy savings and 3.6% time extension
- 60 W: 22.2% energy savings and 22.2% time savings
![Tuning of memory bound workload under 100W power cap](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-35-2048.jpg&f=jpg&w=240)

Figure: tuning of memory bound region under 100 W power cap. Energy consumption [J] and runtime [s] against frequency [GHz] for EXP0 - default, EXP1 - default Pcap, EXP5 - DVFS under Pcap, EXP6 - UCF under Pcap, EXP7 - DVFS & UCF - UCF = 2.2 GHz, and EXP7 - DVFS & UCF - max UCF.

- Runtime labels on the plot: 1.886 s, 1.959 s, 1.886 s
- Energy labels on the plot: 197.6 J, 188.2 J, 188.2 J, 148.6 J, 115.2 J, 170 J, 145.6 J
- Annotations: 38.7% energy savings and 3.6% time extension

![Tuning of memory bound workload under 80W power cap](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-36-2048.jpg&f=jpg&w=240)

Figure: tuning of memory bound region under 80 W power cap, with the same axes and series as the 100 W plot.

- Runtime labels on the plot: 1.886 s, 1.920 s, 1.890 s, 1.959 s
- Energy labels on the plot: 197.6 J, 153.2 J, 114.4 J, 146.0 J
- Annotations: 21.6% energy savings and 3.6% time extension

![Tuning of memory bound workload under 60W power cap](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-37-2048.jpg&f=jpg&w=240)

Figure: tuning of memory bound region under 60 W power cap, with the same axes and series as the 100 W plot.

- Runtime labels on the plot: 1.886 s, 2.475 s, 2.475 s, 1.945 s, 2.397 s, 1.925 s
- Energy labels on the plot: 197.6 J, 147.8 J, 116.2 J, 115.0 J
- Annotations: 22.2% energy savings and 22.2% time savings


![BEM4I Application: application runtime](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-40-2048.jpg&f=jpg&w=240)

BEM4I Application: application runtime

| | assemble_k [s] | assemble_v [s] | gmres_solve [s] | print_vtu [s] | main [s] |
|---|---|---|---|---|---|
| default runtime | 5.4 | 5.9 | 10.2 | 5.6 | 27.3 |
| static tuning runtime | 9.8 | 10.6 | 6.1 | 2.4 | 29.0 |
| dynamic tuning runtime | 7.0 | 7.2 | 7.9 | 2.1 | 24.3 |
| static savings [%] | -82.3% | -79.1% | 40.5% | 56.8% | -6.2% |
| dynamic savings [%] | -30.6% | -20.9% | 23.2% | 62.9% | 10.9% |

Hardware: dual socket system with 2x12 CPU cores, "standard HW" in HPC centres.

Region description:

- assemble_k and assemble_v: high utilization of vector units and an extreme level of optimization; fully compute bound, with great utilization of both sockets and all cores
- gmres_solve: uses DGEMV from MKL; memory bound and suffers from the NUMA effect; this routine is more efficient on a single socket
- print_vtu: single threaded, I/O and network bound region, which stores data to a file on a Lustre file system

Tuning settings, one static configuration and one per region. Frequencies are given in units of 100 MHz, so "25" is 2.5 GHz and "22" is 2.2 GHz; NUM_THREADS is the number of OpenMP threads.

| Configuration | FREQUENCY | NUM_THREADS | UNCORE_FREQUENCY |
|---|---|---|---|
| static | 25 | 12 | 22 |
| assemble_k | 23 | 24 | 16 |
| assemble_v | 25 | 24 | 14 |
| gmres_solve | 17 | 8 | 22 |
| print_vtu | 25 | 6 | 24 |
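The per-region settings on this slide are shown as JSON-like key/value pairs. As an illustration of how such a configuration could be read, the sketch below parses the values with Python's standard `json` module and converts the frequency codes to GHz. This is not the READEX file format or API: the flat layout of `TUNING_MODEL`, the helper names `decode` and `settings_for`, and the fallback to the static entry for regions without their own settings are all assumptions made for the example.

```python
import json

# Values transcribed from the slide; the flat JSON layout is an assumption.
TUNING_MODEL = """
{
  "static":      {"FREQUENCY": "25", "NUM_THREADS": "12", "UNCORE_FREQUENCY": "22"},
  "assemble_k":  {"FREQUENCY": "23", "NUM_THREADS": "24", "UNCORE_FREQUENCY": "16"},
  "assemble_v":  {"FREQUENCY": "25", "NUM_THREADS": "24", "UNCORE_FREQUENCY": "14"},
  "gmres_solve": {"FREQUENCY": "17", "NUM_THREADS": "8",  "UNCORE_FREQUENCY": "22"},
  "print_vtu":   {"FREQUENCY": "25", "NUM_THREADS": "6",  "UNCORE_FREQUENCY": "24"}
}
"""


def decode(entry):
    """Return (core GHz, uncore GHz, OpenMP threads) for one entry.

    Frequency codes are in units of 100 MHz, as annotated on the slide
    ("25" means 2.5 GHz).
    """
    core_ghz = int(entry["FREQUENCY"]) / 10
    uncore_ghz = int(entry["UNCORE_FREQUENCY"]) / 10
    threads = int(entry["NUM_THREADS"])
    return core_ghz, uncore_ghz, threads


def settings_for(model, region):
    """Look up a region; fall back to the static entry (assumed behavior)."""
    return decode(model.get(region, model["static"]))


model = json.loads(TUNING_MODEL)
for region in ("assemble_k", "assemble_v", "gmres_solve", "print_vtu", "main"):
    core, uncore, threads = settings_for(model, region)
    print(f"{region:12s} CF {core:.1f} GHz, UCF {uncore:.1f} GHz, {threads} threads")
```

In this sketch `main` has no entry of its own, so it resolves to the static configuration (2.5 GHz core, 2.2 GHz uncore, 12 threads).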
![BEM4I Application: compute node energy](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-41-2048.jpg&f=jpg&w=240)

BEM4I Application: compute node energy

| | assemble_k [J] | assemble_v [J] | gmres_solve [J] | print_vtu [J] | main [J] |
|---|---|---|---|---|---|
| default energy | 1476 | 1484 | 2733 | 1142 | 6872 |
| static tuning energy | 1962 | 2015 | 1366 | 420 | 5792 |
| dynamic tuning energy | 1467 | 1462 | 1259 | 293 | 4531 |
| static savings [%] | -33.8% | -35.8% | 50.0% | 63.2% | 15.7% |
| dynamic savings [%] | 0.6% | 1.5% | 53.9% | 74.3% | 34.1% |

The large energy savings are a combination of optimal HW settings and runtime savings; the runtime savings come from mitigating the NUMA effect through optimal settings of OpenMP threading.

Without the savings in runtime (here caused by mitigating the NUMA effect), a similar application will see:

- Energy savings of approx. 15 to 20%
- Runtime savings of approx. -15%

The static and per-region tuning settings (FREQUENCY, NUM_THREADS, UNCORE_FREQUENCY) are the same as on the application runtime slide: static 25 / 12 / 22, assemble_k 23 / 24 / 16, assemble_v 25 / 24 / 14, gmres_solve 17 / 8 / 22, print_vtu 25 / 6 / 24.
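The savings rows of the compute node energy table can be recomputed per region from the three energy rows. The sketch below does this in Python, assuming that savings are defined as (default - tuned) / default, so that a negative value means the tuned run used more energy. The names `ENERGY_J`, `REGIONS` and `savings_pct` are illustrative and not taken from the slides.

```python
REGIONS = ("assemble_k", "assemble_v", "gmres_solve", "print_vtu", "main")

# Compute node energy in joules, one row per tuning mode, from the table.
ENERGY_J = {
    "default": (1476, 1484, 2733, 1142, 6872),
    "static": (1962, 2015, 1366, 420, 5792),
    "dynamic": (1467, 1462, 1259, 293, 4531),
}


def savings_pct(default_j, tuned_j):
    """Saving relative to the default run, in percent (negative = more energy)."""
    return 100.0 * (default_j - tuned_j) / default_j


# Map each tuning mode to {region: saving in percent}.
pct = {
    mode: {
        region: savings_pct(base, tuned)
        for region, base, tuned in zip(REGIONS, ENERGY_J["default"], ENERGY_J[mode])
    }
    for mode in ("static", "dynamic")
}

for mode, row in pct.items():
    cells = ", ".join(f"{region} {value:.1f}%" for region, value in row.items())
    print(f"{mode} savings: {cells}")
```

With these inputs every recomputed value matches the table to one decimal place except the static figure for assemble_k, which comes out at -32.9% against the printed -33.8%.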




![Application parameters tuning of the ESPRESO](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fpop-webinar-readex-v2-200406164655%2f75%2fEnergy-Efficient-Computing-using-Dynamic-Tuning-46-2048.jpg&f=jpg&w=240)

Application parameters tuning of the ESPRESO

- 50% - 66% against "reasonable" settings
- 86% against the worst case

Figure: energy consumption [kJ] (0 to 300) against configuration index (0 to 3500), with the "reasonable" settings and the optimal settings marked.

9 parameters, 3840 combinations:

- FETI METHOD 2x
- PRECONDITIONER 5x
- ITERATIVE SOLVER TYPE 2x
- HFETI type 2x
- NON-UNIFORM PARTS 6x
- REDUNDANT LAGRANGE 2x
- SCALING 2x
- B0_TYPE 2x
- ADAPTIVE PRECISION 2x
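The figure of 3840 combinations is the product of the option counts of the nine ESPRESO parameters listed on this slide. A short Python sketch of that count, and of enumerating the search space, follows. The slide does not give the option values themselves, so integer indices stand in for them here; `OPTION_COUNTS` and `configurations` are illustrative names.

```python
from itertools import product
from math import prod

# Number of options per parameter, as listed on the slide.
OPTION_COUNTS = {
    "FETI METHOD": 2,
    "PRECONDITIONER": 5,
    "ITERATIVE SOLVER TYPE": 2,
    "HFETI type": 2,
    "NON-UNIFORM PARTS": 6,
    "REDUNDANT LAGRANGE": 2,
    "SCALING": 2,
    "B0_TYPE": 2,
    "ADAPTIVE PRECISION": 2,
}

# Size of the search space: the product of all option counts.
total = prod(OPTION_COUNTS.values())

# Enumerate every configuration; placeholder indices replace the real
# option values, which the slide does not list.
configurations = [
    dict(zip(OPTION_COUNTS, choice))
    for choice in product(*(range(n) for n in OPTION_COUNTS.values()))
]

print(len(OPTION_COUNTS), "parameters,", total, "combinations")
```

An exhaustive sweep like the one plotted on the slide would evaluate each entry of `configurations` once and record its energy consumption.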



The document outlines a methodology for dynamic tuning of high-performance computing (HPC) applications to improve energy efficiency and resource utilization. It discusses the READEX project, which develops tools for automatic tuning of applications, and evaluates the impact of tuning on energy consumption and runtime across several applications. Key findings include the energy savings achieved through dynamic tuning, which reach 34.1% of compute node energy for the BEM4I application, and the importance of adjusting hardware parameters (core frequency, uncore frequency and the number of OpenMP threads) to each region of an application.


























