Dynamic Loop Self-scheduling For Load Balancing (DLS4LB) is an MPI-based load balancing library. It is implemented in the C and FORTRAN (F90) programming languages to support scientific applications executed on High Performance Computing (HPC) systems.
DLS4LB uses reproduction as a means of verifying the implementation of the scheduling techniques in the library [1]. Therefore, the scheduling techniques implemented in DLS4LB conform to their original implementations as published decades ago. DLS4LB is developed by Ali Mohammed (ali.mohammed@unibas.ch) and Florina M. Ciorba (florina.ciorba@unibas.ch). It includes 14 scheduling techniques and supports Simulation-assisted scheduling Algorithm Selection (SimAS) as the fifteenth DLS technique [2]. With SimAS, the most efficient DLS technique is selected dynamically during execution based on simulation results. Please see publication [2] for more details.
The DLS4LB library is based on the DLB_tool developed by Ricolindo L. Carino (rlc@erc.msstate.edu) and Ioana Banicescu (ioana@cse.msstate.edu); see publication [3]. It was modified and extended by Ali Mohammed (ali.mohammed@unibas.ch) to support more scheduling techniques and to fix several bugs; see publication [1].
The DLS4LB parallelizes and load balances scientific applications that contain simple parallel loops (1D loops) or nested parallel loops (2D loops). The tool employs a master-worker model where workers request work from the master whenever they become free. The master serves work requests and assigns workers chunks of loop iterations according to the selected DLS technique. The master also doubles as a worker and executes chunks of loop iterations when it is not serving any requests.
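This request-serve pattern is independent of any particular scheduling technique. Below is a minimal, self-contained sketch of such a pattern in plain MPI Fortran. It is only an illustration of the idea and not DLS4LB's internal code; the fixed chunk size, message tags, and variable names are assumptions, and for brevity the master does not execute chunks itself here.

    ! Illustrative sketch only -- NOT DLS4LB internals.
    ! A generic MPI master-worker self-scheduling loop: workers request work,
    ! the master hands out fixed-size chunks until no iterations remain.
    program self_scheduling_sketch
    implicit none
    include 'mpif.h'
    integer, parameter :: N = 1000       ! total loop iterations (assumed)
    integer, parameter :: CHUNK = 10     ! fixed chunk size (assumed)
    integer, parameter :: TAG_REQ = 1, TAG_WORK = 2
    integer :: rank, nprocs, ierr, status(MPI_STATUS_SIZE)
    integer :: next, served, dummy, bounds(2), i

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    if (rank == 0) then
       ! master: answer each request with the next chunk, or with an
       ! empty chunk once all iterations have been assigned
       next = 1
       served = 0
       do while (served < nprocs - 1)
          call MPI_Recv(dummy, 1, MPI_INTEGER, MPI_ANY_SOURCE, TAG_REQ, &
                        MPI_COMM_WORLD, status, ierr)
          if (next <= N) then
             bounds(1) = next
             bounds(2) = min(next + CHUNK - 1, N)
             next = bounds(2) + 1
          else
             bounds(1) = 1               ! empty range signals "no more work"
             bounds(2) = 0
             served = served + 1
          end if
          call MPI_Send(bounds, 2, MPI_INTEGER, status(MPI_SOURCE), TAG_WORK, &
                        MPI_COMM_WORLD, ierr)
       end do
    else
       ! worker: keep requesting chunks and executing them
       do
          call MPI_Send(rank, 1, MPI_INTEGER, 0, TAG_REQ, MPI_COMM_WORLD, ierr)
          call MPI_Recv(bounds, 2, MPI_INTEGER, 0, TAG_WORK, &
                        MPI_COMM_WORLD, status, ierr)
          if (bounds(1) > bounds(2)) exit
          do i = bounds(1), bounds(2)
             ! ... execute loop iteration i ...
          end do
       end do
    end if

    call MPI_Finalize(ierr)
    end program self_scheduling_sketch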
The DLS4LB library supports the following fourteen scheduling techniques (a sketch of a few of the underlying chunk-size rules follows the list):
- Straightforward static scheduling (STATIC)
- Self Scheduling (SS)
- Fixed size chunk (FSC)
- Modified fixed size chunk (mFSC)
- Guided self-scheduling (GSS)
- Trapezoid self-scheduling (TSS)
- Factoring (FAC)
- Weighted factoring (WF)
- Adaptive weighted factoring (AWF)
- Adaptive weighted factoring - Batch (AWF-B)
- Adaptive weighted factoring - Chunk (AWF-C)
- Adaptive weighted factoring - Batch with scheduling overhead (AWF-D)
- Adaptive weighted factoring - Chunk with scheduling overhead (AWF-E)
- Adaptive factoring (AF)
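These techniques differ mainly in the rule used to compute the size of the next chunk from the number of remaining iterations and the number of workers. The following standalone sketch illustrates three of the classic rules (GSS, FAC in its practical form, and TSS) as they are commonly stated in the literature; it is not DLS4LB code, and the example values of N and P are arbitrary.

    ! Illustrative sketch only -- NOT DLS4LB code.
    ! Prints the chunk-size sequences produced by three classic rules:
    !   GSS : next chunk = ceiling(remaining / P)
    !   FAC : (practical variant) each batch of P equal chunks covers half
    !         of the remaining iterations
    !   TSS : chunk sizes decrease linearly from about N/(2P) down to 1
    program chunk_size_sketch
    implicit none
    integer, parameter :: N = 100   ! loop iterations (arbitrary)
    integer, parameter :: P = 4     ! number of workers (arbitrary)
    integer :: remaining, chunk, first, last, nchunks, k
    real :: decrement

    ! GSS (guided self-scheduling)
    remaining = N
    print *, 'GSS chunks:'
    do while (remaining > 0)
       chunk = min(remaining, max(1, (remaining + P - 1) / P))
       print *, chunk
       remaining = remaining - chunk
    end do

    ! FAC (factoring, practical variant)
    remaining = N
    print *, 'FAC chunks:'
    do while (remaining > 0)
       chunk = max(1, (remaining + 2*P - 1) / (2*P))
       do k = 1, P                        ! one batch of P equal chunks
          if (remaining <= 0) exit
          print *, min(chunk, remaining)
          remaining = remaining - min(chunk, remaining)
       end do
    end do

    ! TSS (trapezoid self-scheduling)
    first   = (N + 2*P - 1) / (2*P)                        ! first chunk
    last    = 1                                            ! last chunk
    nchunks = (2*N + first + last - 1) / (first + last)    ! ceiling(2N/(first+last))
    decrement = real(first - last) / real(max(1, nchunks - 1))
    remaining = N
    k = 0
    print *, 'TSS chunks:'
    do while (remaining > 0)
       chunk = min(remaining, max(last, first - nint(real(k) * decrement)))
       print *, chunk
       remaining = remaining - chunk
       k = k + 1
    end do

    end program chunk_size_sketch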
The DLS4LB is designed to load balance scientific applications with minimal code changes. Below is a simple example of how to use the DLS4LB tool for simple 1D loops (Type I) and nested 2D loops (Type II).
Type I:

    ! begin i-loop
    do i = 1, N
       ... i-iterate
    end do
    ! end i-loop
Type II:

    ! begin j-loop
    do j = 1, M
       ... part of j-iterate
       ! begin i-loop
       do i = 1, N(j)
          ... i-iterate of j-iterate
       end do
       ! end i-loop
       ... part of j-iterate
    end do
    ! end j-loop
Dynamic load balancing is achieved in the application by invoking DLS routines to manipulate the loop indices.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
For Type I loops, the application must be modified as:
    use DLS
    include 'mpif.h'
    ...
    type (infoDLS) info
    integer method, iStart, iSize
    integer iIters
    double precision iTime
    ...
    method = ... (choice of loop scheduling technique)
    call DLS_Setup (MPI_COMM_WORLD, info)
    call DLS_StartLoop (info, 1, N, method)
    do while ( .not. DLS_Terminated(info) )
       call DLS_StartChunk (info, iStart, iSize)
       ! begin i-loop, new extents
       do i = iStart, iStart+iSize-1   ! 1,N
          ... i-iterate
       end do
       ! end i-loop
       call DLS_EndChunk (info)
    end do   ! while ( .not. DLS_Terminated(info) )
    call DLS_EndLoop (info, iIters, iTime)
where:
- DLS_Setup (MPI_COMM_WORLD, info): Initializes a dynamic load balancing environment on MPI_COMM_WORLD. Information about this environment is stored in "info" (see the infoDLS type declaration below).
- DLS_StartLoop (info, 1, N, method): The synchronization point to start loop execution. (1, N) is the loop range, and "method" is a user-specified index [1..9] for the loop scheduling method (see the array DLS_LongName() below).
- DLS_Terminated(info): Returns .TRUE. if all loop iterates have been executed.
- DLS_StartChunk (info, iStart, iSize): Returns a range for a chunk of iterates, starting at "iStart", with size "iSize".
- do i=iStart, iStart+iSize-1 ... end do: The original "serial" code, but now executing a chunk of iterates instead of "i=1,N".
- DLS_EndChunk (info): Indicates the end of execution of a chunk of iterates.
- DLS_EndLoop (info, iIters, iTime): The synchronization point to end loop execution. "iIters" is the number of iterates done by the calling processor, and "iTime" is its cost (in seconds).
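Assembled from the calls above, a complete Type I example might look as follows. This is only a sketch: the loop body (a running sum), the problem size, the method index, and the assumption that the application itself calls MPI_Init/MPI_Finalize are placeholders to be adapted to the actual application.

    ! A hypothetical, complete Type I example built from the calls above.
    ! The loop body, problem size, and method index are placeholders.
    program dls4lb_type1_sketch
    use DLS
    implicit none
    include 'mpif.h'
    integer, parameter :: N = 100000    ! loop iterations (placeholder)
    type (infoDLS) info
    integer :: method, iStart, iSize, iIters, i, ierr
    double precision :: iTime, partial_sum

    call MPI_Init(ierr)                 ! assumes the application manages MPI

    method = 5                          ! placeholder index of the chosen DLS technique
    partial_sum = 0.0d0

    call DLS_Setup (MPI_COMM_WORLD, info)
    call DLS_StartLoop (info, 1, N, method)
    do while ( .not. DLS_Terminated(info) )
       call DLS_StartChunk (info, iStart, iSize)
       do i = iStart, iStart + iSize - 1        ! chunk of the original 1..N loop
          partial_sum = partial_sum + dble(i)   ! placeholder i-iterate
       end do
       call DLS_EndChunk (info)
    end do
    call DLS_EndLoop (info, iIters, iTime)

    print *, 'rank did', iIters, 'iterates in', iTime, 'seconds'

    call MPI_Finalize(ierr)
    end program dls4lb_type1_sketch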
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
For Type II loops, the application must be modified as:
    use DLS
    include 'mpif.h'
    ...
    type (infoDLS) iInfo, jInfo
    integer iComm, jComm   ! communicators
    integer iMethod, iStart, iSize, jMethod, jStart, jSize
    integer iIters, jIters
    double precision iTime, jTime
    integer coordinator
    ...
    iMethod = ... (choice of loop scheduling technique)
    jMethod = ... (choice of loop scheduling technique)
    coordinator = 0
    call DLS_GroupSetup (MPI_COMM_WORLD, coordinator, jInfo, iInfo)
    call DLS_StartLoop (jInfo, 1, M, jMethod)
    do while ( .not. DLS_Terminated(jInfo) )
       call DLS_StartChunk (jInfo, jStart, jSize)

       ! begin j-loop code, new extents
       do j = jStart, jStart+jSize-1   ! 1,M
          ... part of j-iterate

          call DLS_StartLoop (iInfo, 1, N(j), iMethod)
          do while ( .not. DLS_Terminated(iInfo) )
             call DLS_StartChunk (iInfo, iStart, iSize)
             ! begin i-loop, new extents
             do i = iStart, iStart+iSize-1   ! 1,N
                ... i-iterate of j-iterate
             end do
             ! end i-loop
             call DLS_EndChunk (iInfo)
          end do   ! while ( .not. DLS_Terminated(iInfo) )
          call DLS_EndLoop (iInfo, iIters, iTime)

          ... part of j-iterate
       end do
       ! end j-loop

       call DLS_EndChunk (jInfo)
    end do   ! while ( .not. DLS_Terminated(jInfo) )
    call DLS_EndLoop (jInfo, jIters, jTime)
where:
- DLS_GroupSetup (MPI_COMM_WORLD, coordinator, jInfo, iInfo): Splits MPI_COMM_WORLD into non-overlapping "iComm"s and a "jComm" comprised of the "coordinator" and the foremen (rank 0 of the iComms). Similar to DLS_Setup(), DLS_GroupSetup() initializes load balancing environments in "jComm" and "iComm", keeping the data in "jInfo" and "iInfo", respectively.
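For example, with one coordinator and eight further processes split into two groups, the resulting structure might look like this (an illustrative layout; the number and size of the groups are assumptions):

    jComm  :  coordinator -- foreman A -- foreman B
                                 |            |
    iComm A:  foreman A, worker A1, worker A2, worker A3
    iComm B:  foreman B, worker B1, worker B2, worker B3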
[1] Ali Mohammed, Ahmed Eleliemy, Florina M. Ciorba, Franziska Kasielke, and Ioana Banicescu, "An Approach for Realistically Simulating the Performance of Scientific Applications on High Performance Computing Systems", Future Generation Computer Systems (FGCS), 2019, https://doi.org/10.1016/j.future.2019.10.007 (Open Access)
[2] Ali Mohammed and Florina M. Ciorba, "SimAS: A simulation-assisted approach for the scheduling algorithm selection under perturbations", Concurrency and Computation: Practice and Experience (CPE), 2020, https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.5648 (Open Access)
[3] Ricolindo L. Cariño and Ioana Banicescu, "A Tool for a Two-level Dynamic Load Balancing Strategy in Scientific Applications", Scalable Computing: Practice and Experience, 8(3), 2007.