Make batch.id robust to warning messages from sbatch #314
HenrikBengtsson commented Sep 5, 2025 • edited
You might want to create an issue for this that references this pull request. At least I tend to miss or forget about PR-only issues over time, and I know other repos like an issue with details where discussions can take place. Now, I had a look at Line 44 in 7763ed8
That captures both stdout and stderr. It could be that it would be more sane if those two are captured separately, e.g. something like

$ sbatch --time=00:01:00 --mem=128G --wrap="hostname" > stdout.log 2> stderr.log

What does

$ cat stdout.log
$ cat stderr.log

output? With Slurm, you should see "Submitted batch job ..." in
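The same separate-capture experiment can be tried from R. This is a minimal sketch, assuming base R's system2() with file names given for stdout and stderr. The sh command is only a stand-in for sbatch: the notice text and the job ID 12345 are made up for illustration and are not real cluster output.

```r
# Stand-in for sbatch: writes a notice to stderr and the submission line to
# stdout. Both strings are hypothetical.
cmd <- 'echo "NOTE: hypothetical site message" 1>&2; echo "Submitted batch job 12345"'

out_file <- tempfile(fileext = ".log")
err_file <- tempfile(fileext = ".log")

# Passing a file name as 'stdout' or 'stderr' redirects that stream to the
# file, so the two streams are never interleaved.
status <- system2("sh", args = c("-c", shQuote(cmd)),
                  stdout = out_file, stderr = err_file)

out <- readLines(out_file)
err <- readLines(err_file)
unlink(c(out_file, err_file))

print(out)
print(err)
```

If a site notice shows up in err rather than out, capturing stdout alone would already be enough to keep it away from the job ID parsing.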
bwcompton commented Sep 10, 2025
Nice! It looks like you can do a cleaner fix than what I came up with.
HenrikBengtsson commented Sep 12, 2025
I've been prototyping with a more flexible runOSCommand() in my future.batchtools package. It has new arguments stdout and stderr with default stdout = TRUE and stderr = TRUE (backward compatible). The special stderr = NA will capture stderr separately from stdout.

@bwcompton, although it's future.batchtools and not batchtools, could you please give it a spin? If it works, then I can propose this newer runOSCommand() version to batchtools, plus adjustments to makeClusterFunctionsSlurm(), which I also patch in future.batchtools.

To try it out, install it as:

remotes::install_github("futureverse/future.batchtools", ref="develop")

and then try it as:

library(future)
plan(future.batchtools::batchtools_slurm)
f <- future({ Sys.info()[["nodename"]] })
v <- value(f)
print(v)

See https://future.batchtools.futureverse.org/reference/batchtools_slurm.html for how to control sbatch resource specifications.

bwcompton commented Sep 12, 2025 via email
Thanks! I tried your code snippet, and it can't find slurm_script. Am I missing something?

Brad

> library(future)
> plan(future.batchtools::batchtools_slurm)
> f <- future({ Sys.info()[["nodename"]] })
> v <- value(f)
Error: Future (<unnamed-1>) of class BatchtoolsSlurmFuture expired, which indicates that it crashed or was killed. Post-mortem details:
Future state: ‘running’
Batchtools status: ‘defined’, ‘expired’, ‘submitted’
Slurm job ID: [n=1] ‘43049392’
Slurm 'squeue' job status: <empty>
Slurm 'sacct' job status: 43049392|FAILED|1:0
The last few lines of the logged output:
Session information:
- timestamp: 2025-09-12 14:36:54+0000
- hostname: cpu016
- Rscript path: /var/spool/slurm/slurmd/job43049392/slurm_script: line 20: Rscript: command not found
- Rscript version: /var/spool/slurm/slurmd/job43049392/slurm_script: line 21: Rscript: command not found
- Rscript library paths:
Rscript -e 'batchtools::doJobCollection()' ...
- job name: 'jobb9686511f15322fe9d3568b52c61e703'
- job log file: '/work/pi_cschweik_umass_edu/marsh_mapping/salt-marsh-mapping/.future/20250912_143653-MdNjCh/batchtools_1109039380/logs/jobb9686511f15322fe9d3568b52c61e703.log'
- job uri: '/work/pi_cschwe
In addition: Warning messages:
1: batchtools::waitForJobs(..., timeout = 2592000) returned FALSE
2: In delete.BatchtoolsFuture(future) : Will not remove batchtools registry, because the status of the batchtools was ‘error’, ‘defined’, ‘expired’, ‘submitted’ and future backend argument 'delete' is ‘on-success’: ‘/work/pi_cschweik_umass_edu/marsh_mapping/salt-marsh-mapping/.future/20250912_143653-MdNjCh/batchtools_1109039380’
HenrikBengtsson commented Sep 12, 2025
R is not available by default in your jobs. Do you load an environment module to get access to R? If so, specify that I'm in the

This is illustrated also in https://future.batchtools.futureverse.org/reference/batchtools_slurm.html

If you use other techniques to make R available in a job script, please let me know.
HenrikBengtsson commented Sep 12, 2025
That said, the job submission itself actually worked! It's just that R didn't start, which means the patch works.
bwcompton commented Sep 12, 2025
Great news that the patch works. Here's what I've got in my template,
HenrikBengtsson commented Sep 12, 2025
Unfortunately not possible today; you'd have to create your own custom template file. But, I've created futureverse/future.batchtools#99 to add support for this too. Stay tuned.
bwcompton commented Sep 12, 2025
Okay, I'll look forward to future.batchtools in the future. Do you have what you need from me to address the original issue in this PR?
HenrikBengtsson commented Sep 12, 2025
Yes, I'd like to have a success story over at future.batchtools first, ideally some mileage from other users, and have my patch "ripe" enough, before I "bug" the batchtools maintainers here. So, I'll ping you again over at futureverse/future.batchtools#99 for you to test. Thanks.
bwcompton commented Sep 12, 2025
Deal! Thanks so much for your help with this.
I ran into a crazy bug today: getJobStatus gave me batch.id = "that". It turns out that when I requested a large amount of memory, sbatch returned this um, helpful message:

clusterFunctionsSlurm was pulling the 4th word of the first line, which should have been the Slurm jobid, but instead was "that". It wanted, of course, the last line.

This really isn't a bug in batchtools, as the sysops inserted an informational message in a crazy place. But I suspect if the smart, on the ball people at the UMass Unity cluster are doing this, others probably are too. It'd be nice for batchtools to be robust to such shenanigans. Alternatively, I suppose it could throw an error if batch.id is non-numeric and print the message from sbatch.

My suggested change looks for a line beginning with "Submitted batch job" and pulls the 4th word as the batch.id.

I've tested this change against the following:

as well as against real-life submitJobs calls, both with and without the informational message.
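The parsing idea described above, plus the alternative of erroring on a non-numeric batch.id, could look roughly like the sketch below. It is not the code from this PR or from batchtools: parse_sbatch_id is a hypothetical helper name, and the notice line in the example is a made-up placeholder (chosen so its 4th word is "that"), not the actual message from the cluster.

```r
# Extract the Slurm job ID from captured sbatch output.
# 'parse_sbatch_id' is a hypothetical helper, not a batchtools function.
parse_sbatch_id <- function(output) {
  # Keep only lines that start with the confirmation text.
  hits <- grep("^Submitted batch job", output, value = TRUE)
  if (length(hits) == 0L) {
    stop("No 'Submitted batch job' line in sbatch output:\n",
         paste(output, collapse = "\n"))
  }
  # The 4th whitespace-separated word of that line is the job ID.
  words <- strsplit(trimws(hits[length(hits)]), "[[:space:]]+")[[1L]]
  id <- words[4L]
  # Fail loudly rather than return a non-numeric ID such as "that".
  if (is.na(id) || !grepl("^[0-9]+$", id)) {
    stop("Unexpected batch.id '", id, "' in sbatch output:\n",
         paste(output, collapse = "\n"))
  }
  id
}

# Made-up example output: the first line is a placeholder notice, not the
# real message from the cluster.
noisy <- c("NOTE: please check that your request is reasonable",
           "Submitted batch job 12345")
clean <- "Submitted batch job 67890"

parse_sbatch_id(noisy)
parse_sbatch_id(clean)
```

Matching on the line prefix rather than on line position means it does not matter whether a site prints its notice before or after the confirmation line.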