This is the accompanying code for the paper submission 'SPEL: Software Tool for Porting ELM with OpenACC in a Function Unit Test Framework'. This software tool builds on previous work by Dali Wang and Yao Cindy to create a robust method for porting the E3SM Land Model (ELM) to GPUs.
SPEL contains the folders:
./SourceFiles/: folder contains the ELM Fortran source files and the GPU-ready ELM test modules
./scripts/: folder contains the SPEL Python scripts and a few Fortran modules used to generate ELM test modules
./modified-files/: folder created to hold optimized versions of source files -- created as needed
./scripts/script-output: created at first run to hold temporary files
./unit-tests/: Created at first run and will contain directories for all cases.
SPEL setup and instructions
Currently, these SPEL Python scripts are used to:
Extract and prepare ELM files to compile and run without MPI and NetCDF.
Modify ELM routines to remove modules that cannot, or should not, run on the GPU.
Perform automatic OpenACC acceleration using either the routine directive or parallel loop directives.
Help the user understand the code by generating a simple call tree and dependency graph of the modules.
Write the Fortran routines that generate the input/output needed to initialize variables and verify the results, along with the required Makefile.
Note: The current intended workflow is to generate a unit test with opt = False and then re-run with opt = True as desired. Currently, opt = True bypasses many of the other processing scripts, as it is meant to be run multiple times as the user recognizes and resolves issues with the code.
Setup: In the scripts directory, edit mod_config.py with the specific file layout as needed, and edit UnitTestforELM.py with a list of subroutines to parse (only the parent subroutine needs to be listed) and a name for the case. Running python3 UnitTestforELM.py creates a directory ./unit-tests/{casename} to contain the Function Unit Test program.
A Makefile will automatically be generated for the chosen subroutines to test. elm_initializeMod.F90 and main.F90 will be modified by the scripts to use and allocate only the variables that are needed.
readMod.F90, writeMod.F90, and verificationMod.F90 are generated by the scripts to create the appropriate I/O and validation functions. duplicationMod.F90 is generated to duplicate the same variables as many times as desired at run-time.
Get Reference Data: Compile ELM with writeMod.F90 and place a call to subroutine write_vars() before the subroutine used for the Unit Test.
In addition to the scripts, the main.F90 file was created to effectively replace the lnd_cpl_mct and elm_drv routines; it is where all testing and configuration should be done. Compiling the unit test requires only the NV Fortran compiler, CUDA 10+, and potentially LAPACK.
The make command creates elmtest.exe, which is then run with ./elmtest.exe [numSetsOfSites] [clump-pproc], where numSetsOfSites controls the number of unique sites used to compute the reference output and clump-pproc (optional, default = 1) is the number of clumps per MPI task.
The Makefile defaults to a CPU-only build in debug mode. To switch to OpenACC, edit the Makefile to compile with FC_FLAGS_ACC.
Unit Test Example:
mpirun -n 1 ./elmtest.exe 2   # perform a Unit Test for 2 sets of the 42 AmeriFlux sites on one MPI task
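The command line described above can be assembled programmatically when several runs are scripted. The following is a minimal sketch, not part of SPEL: the function names `build_elmtest_command` and `total_sites` are hypothetical, and the constant of 42 sites per set is taken from the example above.

```python
import shlex

# Assumption: each "set" holds the 42 AmeriFlux sites mentioned in the example.
SITES_PER_SET = 42


def build_elmtest_command(num_sets, clumps_per_proc=1, mpi_tasks=None,
                          exe="./elmtest.exe"):
    """Return the argument list for one elmtest.exe run (hypothetical helper)."""
    if num_sets < 1 or clumps_per_proc < 1:
        raise ValueError("num_sets and clumps_per_proc must be >= 1")
    cmd = []
    if mpi_tasks is not None:
        cmd += ["mpirun", "-n", str(mpi_tasks)]
    cmd += [exe, str(num_sets)]
    # clump-pproc is optional and defaults to 1, so only pass it when it differs.
    if clumps_per_proc != 1:
        cmd.append(str(clumps_per_proc))
    return cmd


def total_sites(num_sets):
    """Number of sites simulated for a given number of sets."""
    return SITES_PER_SET * num_sets


if __name__ == "__main__":
    cmd = build_elmtest_command(2, mpi_tasks=1)
    print(shlex.join(cmd), "->", total_sites(2), "sites")
```

The returned list can be passed directly to `subprocess.run` from the case directory.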
Notes on LakeTemperature Example
Example reference simulation data for LakeTemperature come with SPEL, called E3SM_constants.txt and output_LakeTemperature_vars.txt. These must be in the same directory as the executable.
elm_initializationMod.F90 and main.F90 are hard-coded with SPEL output to avoid having to make changes for this example. (SPEL will be updated to handle this automatically in the future.)
Since the optimizations are only semi-automatic and require some familiarity to fully implement, an optimized version of LakeTemperature is provided in the ./scripts directory, called LakeTemperature.OPT.F90.
An original version of LakeTemperatureMod.F90 is in the main directory, called CPU-LakeTemperatureMod.F90.
SPEL Script Description
edit_file.py :
Contains functions that are intended to be used on entire .F90 files rather than on specific subroutines. These functions were created for the purpose of preparing ELM files to work with the unit-test. The user must provide a list of the modules and subroutines/functions that need to be removed, and the Python functions can then comment them out (entire subroutines for some modules) with a '!#py ' comment. If a module is encountered that is not present in the SourceFiles/ directory, the user will be asked whether this module is necessary; SPEL adds it to the omit list if not and exits if it is.
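The '!#py ' tagging described above can be illustrated with a short sketch. This is not the code in edit_file.py: the function name `comment_out_modules` is hypothetical, only single-line `use` statements are handled (no `&` continuations), and removal of whole subroutines is left out.

```python
import re

PY_TAG = "!#py "

# Matches "use mod", "use :: mod" and "use, intrinsic :: mod" (case-insensitive).
USE_RE = re.compile(
    r"^\s*use\b\s*(?:,\s*(?:non_)?intrinsic\s*)?(?:::)?\s*(\w+)",
    re.IGNORECASE,
)


def comment_out_modules(lines, omit_modules):
    """Prefix every `use` line that names an omitted module with '!#py '."""
    omit = {name.lower() for name in omit_modules}
    result = []
    for line in lines:
        match = USE_RE.match(line)
        if match and match.group(1).lower() in omit:
            result.append(PY_TAG + line)
        else:
            result.append(line)
    return result


if __name__ == "__main__":
    source = [
        "  use spmdMod, only : masterproc",
        "  use shr_kind_mod, only : r8 => shr_kind_r8",
    ]
    for line in comment_out_modules(source, ["spmdMod"]):
        print(line)
```

Because a tagged line no longer starts with `use`, running the function twice leaves the file unchanged, and the tag can be searched for later to restore the original statement.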
The file keeps track of any mods used in the file that have not been processed and will recursively process them. Currently, the user must manually keep track of what has been processed in a separate file.
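The recursive processing of used modules can be pictured as a depth-first walk over `use` statements. The sketch below is an illustration under assumptions, not SPEL's implementation: sources are held in an in-memory dictionary keyed by lower-case module name instead of being read from SourceFiles/, and the names `direct_dependencies` and `collect_dependencies` are hypothetical.

```python
import re

USE_RE = re.compile(
    r"^\s*use\b\s*(?:,\s*(?:non_)?intrinsic\s*)?(?:::)?\s*(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def direct_dependencies(source_text):
    """Lower-case names of the modules referenced by `use` statements."""
    return sorted({name.lower() for name in USE_RE.findall(source_text)})


def collect_dependencies(module, sources, processed=None, missing=None):
    """Depth-first walk over `use` statements.

    `sources` maps lower-case module names to their Fortran source text.
    Returns (processed, missing): modules visited in order, and modules that
    are used somewhere but have no source available.
    """
    if processed is None:
        processed = []
    if missing is None:
        missing = []
    name = module.lower()
    if name in processed or name in missing:
        return processed, missing
    if name not in sources:
        missing.append(name)
        return processed, missing
    processed.append(name)
    for dep in direct_dependencies(sources[name]):
        collect_dependencies(dep, sources, processed, missing)
    return processed, missing
```

The `missing` list corresponds to the modules for which the user would be prompted, and the `processed` list is the kind of record the user currently has to keep by hand.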
There are special comments for BeTR and FATES additions to ELM to allow for easy search and replace to enable them.
analyze_subroutines.py :
Contains a class Subroutine designed to hold all the relevant info and functions needed to analyze subroutines for the unit-test and OpenACC, such as derived types and components read/written to and any other subroutines called (and their variable info).
The functions will add !$acc routine info to each subroutine (if not present) and do the necessary edits to the subroutines required for GPU compilation. Mostly, this means changing subroutine calls containing array bounds. The Python functions operate recursively on all subroutines called by the main one.
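As an illustration of the "add if not present" step, the sketch below inserts an OpenACC routine directive after each subroutine statement. It is a simplified stand-in for what analyze_subroutines.py does: the function name `add_acc_routine` and the default `seq` clause are assumptions, only the line immediately after the subroutine statement is checked for an existing directive, and continuation lines with trailing comments are not handled.

```python
import re

# "end subroutine ..." does not match because the line must begin with an
# optional prefix (pure/elemental/recursive) followed by "subroutine".
SUB_RE = re.compile(
    r"^\s*(?:(?:pure|elemental|recursive)\s+)*subroutine\s+(\w+)",
    re.IGNORECASE,
)
ACC_RE = re.compile(r"^\s*!\$acc\s+routine\b", re.IGNORECASE)


def add_acc_routine(lines, clause="seq"):
    """Insert '!$acc routine <clause>' after each subroutine statement."""
    out = []
    i = 0
    n = len(lines)
    while i < n:
        line = lines[i]
        out.append(line)
        if SUB_RE.match(line):
            # Copy continuation lines so the directive lands after the
            # complete subroutine statement.
            while out[-1].rstrip().endswith("&") and i + 1 < n:
                i += 1
                out.append(lines[i])
            following = lines[i + 1] if i + 1 < n else ""
            if not ACC_RE.match(following):
                indent = re.match(r"\s*", line).group(0) + "  "
                out.append(f"{indent}!$acc routine {clause}")
        i += 1
    return out
```

Checking for an existing directive keeps the edit idempotent, which matters because the scripts are meant to be re-run as the user resolves issues.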
The examineLoops function is used for further optimization to go beyond the naive implementation. There is also functionality to automatically replace bounds allocations with the compact filter allocation if needed. Currently, if examineLoops is called with add_acc enabled, it will only accelerate loops in which no race condition (reduction operation) is detected. Loops where one is detected are listed in yellow for the user to examine afterwards.
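One simple way to spot reduction-like statements is to look for assignments whose target also appears on the right-hand side but is not indexed by the loop variable. The heuristic below is only a sketch of that idea and is not the detection logic used by examineLoops: the function name `find_reductions` is hypothetical, each statement is assumed to fit on one line, and a `!` inside a string literal would be mistaken for a comment.

```python
import re

# target name (possibly a derived-type component), optional index list, "=", rhs
ASSIGN_RE = re.compile(r"^\s*([A-Za-z_][\w%]*)\s*(\([^=]*\))?\s*=(?!=)\s*(.+)$")


def find_reductions(loop_var, body_lines):
    """Return targets that look like reductions over `loop_var`.

    A statement is flagged when the assigned variable is read on the
    right-hand side and its index list does not contain the loop variable,
    so different iterations would update the same memory location.
    """
    flagged = []
    loop_re = re.compile(rf"\b{re.escape(loop_var)}\b", re.IGNORECASE)
    for line in body_lines:
        code = line.split("!", 1)[0]  # drop trailing comments
        match = ASSIGN_RE.match(code)
        if not match:
            continue
        name, index, rhs = match.group(1), match.group(2) or "", match.group(3)
        reads_itself = re.search(rf"\b{re.escape(name)}\b", rhs, re.IGNORECASE)
        if reads_itself and not loop_re.search(index):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    body = [
        "total = total + flux(c)",
        "temp(c) = temp(c) + dt * flux(c)",
        "sum_col(g) = sum_col(g) + w(c)",
    ]
    print(find_reductions("c", body))
```

Flagged loops are candidates for a `reduction` clause or an atomic update, which is why leaving them for the user to inspect is a conservative default.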
UnitTestforELM.py :
This has the main Python function that calls the others. This is where the user sets which subroutines they wish to create a Unit Test for, using the sub_name_list list variable. If opt = True, then SPEL will attempt to parse and accelerate subroutines on a loop-by-loop level, using some features that are incompatible with the "routine" directive. Then a Makefile, verification routines, I/O routines, and the other files necessary for a Unit Test are created.
DerivedType.py :
Contains DerivedType Class used for processing the ELM data types
LoopConstructs.py :
Contains Loop Class used for processing and modifying loops in ELM functions
errorAnalysis.py :
Functions used for analyzing output from verificationMod.F90
mod_config.py :
Configure location of source files and essential files.
process_associate.py :
Holds function to obtain global variables in associate list.
variable_analysis.py :
Functions to find global variables that aren't derived types.
write_routines.py :
Functions that write needed .F90 files (e.g., duplicateMod.F90, Makefile)
interfaces.py :
Functions that help to resolve which subroutine in an interface is actually being called.
utilityFunctions.py :
Holds functions that may be used in many of the other modules.
Notes
SPEL has been developed mainly on the Summit computer at the Oak Ridge National Laboratory. Summit has 4,608 computing nodes, most of which contain two 22-core IBM POWER9 CPUs, six 16-GB NVIDIA Volta GPUs, and 512 GB of shared memory. The software environment includes NVIDIA HPC 21.3 and several libraries: spectrum-mpi (10.4), NetCDF (4.8), pnetcdf (1.12), HDF (1.10), and CUDA (11.1).
SPEL uses CUDA Fortran (NVIDIA HPC package) to manage memory.
Instructions to create an example test module (LakeTemperature) on Summit
cd SPEL_OpenACC/scripts
python3 UnitTestforELM.py   # creates a LakeTemperature module at SPEL_OpenACC/unit-tests/LakeTemperature
cd SPEL_OpenACC/unit-tests/LakeTemperature
Compilation on Summit
module load nvhpc   # load the appropriate module file; currently it is nvhpc/21.3
make clean; make    # make the CPU-version test module
Make GPU-version test module
Change the compiler flag in the Makefile.
From: FC_FLAGS = $(FC_FLAGS_DEBUG) $(MODEL_FLAGS)
To:   FC_FLAGS = $(FC_FLAGS_ACC) $(MODEL_FLAGS)
make clean; make
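The flag change above can also be scripted when switching between CPU and GPU builds repeatedly. The following sketch assumes the generated Makefile contains the `FC_FLAGS = $(FC_FLAGS_DEBUG) $(MODEL_FLAGS)` line exactly as shown; the function name `select_fc_flags` and the mode names "debug" and "acc" are hypothetical.

```python
import re

# Only the "FC_FLAGS = $(FC_FLAGS_...)" assignment is rewritten; the
# definitions of FC_FLAGS_DEBUG and FC_FLAGS_ACC themselves are left alone.
FC_FLAGS_RE = re.compile(
    r"^(FC_FLAGS\s*=\s*)\$\(FC_FLAGS_(?:DEBUG|ACC)\)",
    re.MULTILINE,
)


def select_fc_flags(makefile_text, mode):
    """Return the Makefile text with FC_FLAGS pointing at the chosen flag set."""
    flag_var = {"debug": "FC_FLAGS_DEBUG", "acc": "FC_FLAGS_ACC"}[mode]
    new_text, count = FC_FLAGS_RE.subn(
        lambda m: f"{m.group(1)}$({flag_var})", makefile_text
    )
    if count == 0:
        raise ValueError("no FC_FLAGS assignment found in the Makefile text")
    return new_text


if __name__ == "__main__":
    example = "FC_FLAGS = $(FC_FLAGS_DEBUG) $(MODEL_FLAGS)\n"
    print(select_fc_flags(example, "acc"), end="")
```

After rewriting the file, `make clean; make` is still required so that every object is rebuilt with the new flags.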
Go to the test module directory and copy (reference) data for the test module (LakeTemperature)
cd SPEL_OpenACC/unit-tests/LakeTemperature
cp ../../*.txt .   # copy reference/input data (E3SM_constants.txt and output_LakeTemperature_vars.txt)
Run the test module (both CPU Version and GPU version)
./elmtest.exe 2   # run the LakeTemperature module using 2 sets of 42 AmeriFlux datasets, 84 sites in total
(For a larger dataset (> 4 sets), the CUDA heap size needs to be increased accordingly via cudaDeviceSetLimit.)
PS: Verification is done by inserting “call update_vars_LakeTemperature(flag, tag)” (from verificationMod.F90) into main.F90 after the call to the tested routine (such as “LakeTemperature”), which outputs a file that holds all variables modified by LakeTemperature. We run the same verification procedure for both the CPU-version and GPU-version code, producing two output files, and then call errorAnalysis.py to analyze the differences between these two outputs.
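The comparison step can be sketched as a per-variable maximum relative error between the CPU and GPU outputs. This is not the code in errorAnalysis.py, and the layout of the verification files is not assumed here: the values are taken to be already parsed into dictionaries mapping variable names to lists of floats. The function names, the tolerance, and the variable names in the example are all hypothetical.

```python
def relative_errors(reference, candidate, floor=1e-30):
    """Maximum relative error per variable between two runs.

    `reference` and `candidate` map variable names to equal-length lists of
    floats (for example CPU and GPU results). `floor` avoids division by zero
    when a reference value is exactly 0.
    """
    report = {}
    for name, ref_values in reference.items():
        cand_values = candidate[name]
        if len(ref_values) != len(cand_values):
            raise ValueError(f"size mismatch for {name}")
        worst = 0.0
        for ref, cand in zip(ref_values, cand_values):
            worst = max(worst, abs(cand - ref) / max(abs(ref), floor))
        report[name] = worst
    return report


def failing_variables(report, tol=1e-12):
    """Names of variables whose maximum relative error exceeds `tol`."""
    return sorted(name for name, err in report.items() if err > tol)


if __name__ == "__main__":
    cpu = {"t_lake": [280.0, 281.5], "savedtke1": [0.5]}
    gpu = {"t_lake": [280.0, 281.5], "savedtke1": [0.75]}
    print(failing_variables(relative_errors(cpu, gpu)))
```

A relative measure is used because the modified variables can span many orders of magnitude; the appropriate tolerance depends on the compiler flags and the arithmetic reordering introduced on the GPU.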
The final optimized LakeTemperature code is saved at SPEL_OpenACC/scripts/LakeTemperatureMod.OPT.F90. It can be used to replace the LakeTemperatureMod.F90 in the test module directory to create a new elmtest.exe.
About
Python tool designed for E3SM Land Model to create unit-tests and code-insertion of GPU compiler directives.