This repository was archived by the owner on Sep 25, 2023. It is now read-only.
See a NERSC staff member for the correct values to gain access to the NERSC jobs database.
Step 2. Generate job summary files
Next, we generate a summary JSON file for each Darshan log. This takes a significant amount of time because it involves opening every Darshan log, then collecting metrics from across the system that correspond to that job. To do this in parallel, use the included parallel_summarize_job.sh script, e.g.,
$ ./parallel_summarize_job.sh edison 2>&1 | tee -a summarize_jobs-edison.log
mkdir: created directory '/global/project/projectdirs/m888/glock/tokio-year/summaries/edison'
Generating /global/project/projectdirs/m888/glock/tokio-year/summaries/edison/glock_ior_id3906633_2-14-63024-14939811182217632593_1.json
Generating /global/project/projectdirs/m888/glock/tokio-year/summaries/edison/glock_ior_id4048967_2-19-64883-9509271909828150823_1.json
Generating /global/project/projectdirs/m888/glock/tokio-year/summaries/edison/glock_ior_id4015752_2-18-63772-1376825852187540237_1.json
...
This script is just a parallel wrapper around summarize_job.py and is invoked on each Darshan log with options similar to the following:
The summarize_job.py script retrieves and indexes data from each connector, but it does not attempt to synthesize cross-connector metrics such as coverage factors. That happens in the analysis we perform later on.
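The parallel-wrapper pattern used in this step can be sketched in Python as follows. This is only an illustration of the idea, not the contents of parallel_summarize_job.sh: `run_one` and `run_parallel` are hypothetical helpers, and the commands in the demo are placeholders rather than real summarize_job.py invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one command to completion and return (cmd, exit code)."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return cmd, result.returncode


def run_parallel(commands, max_workers=4):
    """Run each command in its own subprocess, at most max_workers at a time.

    Threads suffice here because each one only waits on a child process.
    Results are returned in the same order as the input commands.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder commands standing in for one per-log invocation each.
    demo = [[sys.executable, "-c", "print(%d)" % i] for i in range(3)]
    for cmd, code in run_parallel(demo):
        print(code, " ".join(cmd))
```

In practice each entry in `commands` would be the per-log command line, and a nonzero exit code flags a log whose summary needs to be regenerated.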
Step 3. Collate job summaries
We then convert the collection of per-job summary JSON files into a normalized collection of records in CSV format.
The normalize_job_summaries.py script takes any number of JSON files generated by summarize_job.py, finds all of the fields that were populated, and creates a Pandas DataFrame from all of those records. Any record that is missing one or more keys from summarize_job.py simply has those fields left as NaN.
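A minimal sketch of this collation step, assuming each summary file holds a single flat JSON object (the actual layout written by summarize_job.py may differ). The function name `collate_summaries` is hypothetical and is not taken from normalize_job_summaries.py.

```python
import json

import pandas as pd


def collate_summaries(json_paths):
    """Load one record per JSON file and combine them into a DataFrame.

    Columns are the union of all keys seen across the files; a record
    that lacks a key gets NaN in that column.
    """
    records = []
    for path in json_paths:
        with open(path) as handle:
            records.append(json.load(handle))
    return pd.DataFrame(records)
```

Building the frame from a list of dicts lets pandas handle the key union and the NaN filling, so no explicit pass over the field names is needed.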
The --output argument allows you to specify a file name to which the normalized data should be written in CSV format. If the --output file name contains a %s, this is replaced by the date range represented in the normalized data.
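The %s substitution can be illustrated with a short sketch. The helper `resolve_output_name` is hypothetical, and the `YYYY-MM-DD_YYYY-MM-DD` form of the date range is an assumption made for illustration; the exact string the script produces is not stated here.

```python
import datetime


def resolve_output_name(template, start, end):
    """Replace %s in template with a date-range string, if %s is present.

    The "start_end" ISO-date format is an assumed convention, not
    necessarily the one normalize_job_summaries.py uses.
    """
    if "%s" not in template:
        return template
    date_range = "%s_%s" % (start.isoformat(), end.isoformat())
    return template.replace("%s", date_range)


if __name__ == "__main__":
    name = resolve_output_name(
        "summaries_%s.csv",
        datetime.date(2017, 1, 1),
        datetime.date(2017, 12, 31),
    )
    print(name)
```

A file name without %s is returned unchanged, which matches the behavior described above where the placeholder is optional.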