[MINC-users] Announce - qbatch 1.0 - Execute shell command lines in parallel (serial farm) on SGE/PBS clusters

Andrew Janke a.janke at gmail.com
Wed May 11 21:54:24 EDT 2016


> Whereas qbatch(python) expects commands to be in a file, this makes it
> difficult to use things like globs with qbatch:
>
> $ qbatch --logfile blah.log -- do_something_args.pl *.mnc
>
> How hard would it be to support a single command after "--"? I note
> that you can pipe things to qbatch via echo but this is a real pain to
> use in things like perl and python scripts. OK in shell though.
>
> We do support job lists input via STDIN
>
> #From this:
> $ for i in *.mnc; do process_something.sh -args "$i"; done
> #To this
> $ for i in *.mnc; do echo process_something.sh -args "$i"; done | qbatch -
>
> Would this work for you?

In most cases, yes. If I'm just bashing away at the command line it'll
work. It gets hard when you try to use this incantation style in a
perl/python script (but it's OK in sh/bash).

i.e. you'd have to do this:

   system('for i in *.mnc; do echo process_something.sh -args "$i"; done | qbatch -')

Which is fraught with danger regarding quoting. Pipes are a pain
unless you are on a command line. It would mean that for me to use
this in a perl script I'd first have to create a commands.txt file (in
a temp dir -- and clean it up) and then call it. This to me seems
annoying as it's yet more tiny files everywhere. I also think it's
going to bite many users in the rear as how do you handle quoting?

print FH "mincmath \"dumb mincfile with space in name.mnc\" \"Macintosh HD/out.mnc\"";

And then what happens if you are using a variable? Just quote always?

print FH "mincmath \"$infile\" \"$outfile\"\n";

?
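
For what it's worth, in python the "just quote always" approach can be
sketched with the standard library's shlex.quote. This is only a sketch: it
assumes each job-list line ends up being interpreted by a POSIX shell, and
joblist_line is a made-up helper name, not part of qbatch.

```python
import shlex

def joblist_line(argv):
    """Return one shell-safe job-list line for an argument list.

    Hypothetical helper: every argument is passed through shlex.quote,
    so spaces and shell metacharacters survive a POSIX shell.
    """
    return " ".join(shlex.quote(arg) for arg in argv)

infile = "dumb mincfile with space in name.mnc"
outfile = "Macintosh HD/out.mnc"

line = joblist_line(["mincmath", infile, outfile])
print(line)

# Round trip: a POSIX-style split gives back the original arguments.
assert shlex.split(line) == ["mincmath", infile, outfile]
```

Quoting every argument unconditionally means a variable never has to be
inspected before it is written out.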

I much prefer using system's array format (in both perl and python) as
it gets around this. Note that I still have to attempt to "fortify"
input commands as such:

   https://github.com/andrewjanke/qbatch/blob/master/qbatch#L59

There's no easy answer to this stuff.
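
As a rough illustration of the array style in python: expand the glob in the
script itself, quote each command, and hand the job list to "qbatch -" on
STDIN via subprocess, so there is no shell pipeline and no temporary
commands.txt. The helper names (build_commands, submit_jobs) and the
submit_argv parameter are made up for this sketch; the only qbatch behaviour
assumed is the "read the job list from STDIN" form mentioned above.

```python
import glob
import shlex
import subprocess

def build_commands(pattern="*.mnc"):
    """One argument list per matching file (the glob is expanded here, not by a shell)."""
    return [["process_something.sh", "-args", path]
            for path in sorted(glob.glob(pattern))]

def submit_jobs(commands, submit_argv=("qbatch", "-")):
    """Send one quoted command per line to the submitter's STDIN.

    submit_argv is passed as a list, so no shell parses it; the job-list
    lines themselves are quoted with shlex.quote.
    """
    joblist = "".join(
        " ".join(shlex.quote(arg) for arg in argv) + "\n"
        for argv in commands
    )
    return subprocess.run(
        list(submit_argv),
        input=joblist,
        text=True,
        capture_output=True,
        check=True,  # raise if the submitter exits non-zero
    )

# Typical call (needs qbatch on the PATH):
#   submit_jobs(build_commands("*.mnc"))
```

Keeping the submitter's argument list as a parameter also makes it easy to
dry-run against something harmless like cat before pointing it at the
cluster.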

> On our main PBS cluster the prolog and epilog scripts provide this for
> us, I’ve actually been scraping the net for prolog and epilog examples
> for SGE and yet to find them.

Short version: don't. I remember trying this also and giving up and
going back to doing it myself.

> Would you be amenable to allowing --header-file and --footer-file
> inserts into the joblist? I’m not sure about hard coding that kind of
> status lines into the tool.

By all means, so long as I can make it default in my ~/.config/qbatch.cfg :)

> if we were using PBS’s awful qstat text outputs which are indeed
> truncated, that would be the case, however, we’re parsing PBS’s XML
> qstat output, which gives the full jobname properly (at least on our
> version of 4.2.something)

Ah, OK, makes sense.

> I hadn’t thought at all about job re-running. Currently the dependency
> support relies on the cluster, so I’m not sure of the heuristics of a
> “soft failure” that allows a re-run, does it get back its old job
> number? If it does, this should work, if it doesn’t I presume the
> cluster kills the dependent job. Will need to investigate this
> further.

Your grepping for job IDs should get you around this. There are also
interesting PBSPro installs on HPC systems that will crash the
scheduler if you specify a job ID as a dependency that doesn't exist!
It's a known issue that they are "working on". It has been about 5 years
since I reported the bug...



a

