[MINC-users] Announce - qbatch 1.0 - Execute shell command lines in parallel (serial farm) on SGE/PBS clusters

Gabriel A. Devenyi gdevenyi at gmail.com
Wed May 11 22:06:04 EDT 2016


On Wed, May 11, 2016 at 9:54 PM, Andrew Janke <a.janke at gmail.com> wrote:

> > Whereas qbatch(python) expects commands to be in a file, this makes it
> > difficult to use things like globs with qbatch:
> >
> > $ qbatch --logfile blah.log -- do_something_args.pl *.mnc
> >
> > How hard would it be to support a single command after "--"? I note
> > that you can pipe things to qbatch via echo but this is a real pain to
> > use in things like perl and python scripts. OK in shell though.
> >
> > We do support job lists input via STDIN
> >
> > #From this:
> > $ for i in *.mnc; do process_something.sh -args $i; done
> > #To this:
> > $ for i in *.mnc; do echo process_something.sh -args $i; done | qbatch -
> >
> > Would this work for you?
>
> In most cases yes. If I'm just bashing away at the command line it'll
> work. It gets hard when you try to use this incantation style in a
> perl/python script (but it's OK in sh/bash).
>
> i.e. you'd have to do this:
>
>    system("for i in *.mnc; do echo process_something.sh -args *.mnc;
> done | qbatch -")
>
> Which is fraught with danger regarding quoting. Pipes are a pain
> unless you are on a command line. It would mean that for me to use
> this in a perl script I'd first have to create a commands.txt file (in
> a temp dir -- and clean it up) and then call it. This to me seems
> annoying as it's yet more tiny files everywhere. I also think it's
> going to bite many users in the rear: how do you handle quoting?
>
> print FH "mincmath \"dumb mincfile with space in name.mnc\" \"Macintosh HD/out.mnc\"\n";
>
> And then what happens if you are using a variable? just quote always?
>
> print FH "mincmath \"$infile\" \"$outfile\"\n";
>
> ?
>
> I much prefer using system's array format (in both perl and python) as
> it gets around this. Note that I still have to attempt to "fortify"
> input commands as such:
>
>    https://github.com/andrewjanke/qbatch/blob/master/qbatch#L59
>
> There's no easy answer to this stuff.
>
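
For callers that are not a shell, here is a minimal sketch in Python of one way around the quoting problem described above: quote each argument with `shlex.quote`, build the job list in memory, and hand it to `qbatch -` on STDIN using the list form of `subprocess.run`, so there is no temporary commands file and no shell pipe to quote. The helper names `build_joblist` and `submit` are made up for illustration, and the sketch assumes that qbatch passes each job-list line to a shell.

```python
import shlex
import subprocess


def build_joblist(commands):
    """Turn a list of argv lists into one shell-quoted command per line."""
    return "".join(
        " ".join(shlex.quote(arg) for arg in argv) + "\n" for argv in commands
    )


def submit(commands, qbatch="qbatch"):
    """Send the job list to `qbatch -` on STDIN (hypothetical helper)."""
    # List-form argv: no shell is involved in launching qbatch itself.
    return subprocess.run(
        [qbatch, "-"], input=build_joblist(commands), text=True, check=True
    )


if __name__ == "__main__":
    jobs = [
        ["mincmath", "dumb mincfile with space in name.mnc", "Macintosh HD/out.mnc"],
    ]
    # Print the job list instead of submitting it, to inspect the quoting.
    print(build_joblist(jobs), end="")
```

Variables are handled the same way as literals: every argument goes through `shlex.quote`, so the "just quote always?" question is answered by the helper rather than by each caller.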


I see the problem with only allowing piping. We will definitely add the
"--" method; I see the benefits for non-shell uses.


>
> > On our main PBS cluster the prolog and epilog scripts provide this for
> > us. I’ve actually been scraping the net for prolog and epilog examples
> > for SGE and have yet to find any.
>
> Short version: don't. I remember trying this also and giving up and
> going back to doing it myself.
>
> > Would you be amenable to allowing --header-file and --footer-file
> > inserts into the joblist? I’m not sure about hard coding that kind of
> > status lines into the tool.
>
> By all means, so long as I can make it default in my ~/.config/qbatch.cfg
> :)
>

So far we handle defaults via environment variables. We may add another
environment variable, or we might add a config file; we need to discuss it.
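
To make the two options concrete, here is a rough sketch of how an environment variable could be layered over a config file such as the `~/.config/qbatch.cfg` suggested above. Everything in it is hypothetical: the `QBATCH_` prefix scheme, the `[defaults]` section, and the `header_file` key are made-up names for illustration, not existing qbatch settings.

```python
import configparser
import os


def resolve_default(key, env=None, config_path="~/.config/qbatch.cfg", fallback=None):
    """Return a default for `key`: environment first, then config file, then fallback."""
    env = os.environ if env is None else env
    env_name = "QBATCH_" + key.upper()  # hypothetical naming scheme
    if env_name in env:
        return env[env_name]
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist.
    parser.read(os.path.expanduser(config_path))
    # Hypothetical [defaults] section; a missing section or key yields `fallback`.
    return parser.get("defaults", key, fallback=fallback)
```

With this ordering a per-session `export` still overrides whatever is in the config file, which keeps the current environment-variable behaviour intact.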


>
> > If we were using PBS’s awful qstat text outputs which are indeed
> > truncated, that would be the case, however, we’re parsing PBS’s XML
> > qstat output, which gives the full jobname properly (at least on our
> > version of 4.2.something)
>
> Ah, OK, makes sense.
>
> > I hadn’t thought at all about job re-running. Currently the dependency
> > support relies on the cluster, so I’m not sure of the heuristics of a
> > “soft failure” that allows a re-run. Does it get back its old job
> > number? If it does, this should work; if it doesn’t, I presume the
> > cluster kills the dependent job. Will need to investigate this
> > further.
>
> Your grepping for job IDs should get you around this. There are also
> interesting PBSPro installs on HPC systems that will crash the
> scheduler if you specify a jobID as a dependency that doesn't exist!
> It's a known issue that they are "working on". It has been about 5 years
> since I reported the bug...
>
>
>
> a
>
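
On the dependency point, a minimal sketch of looking jobs up by their full name in the XML qstat output before passing any job ID as a dependency, so that an ID the scheduler no longer knows about is never submitted. This assumes the layout that `qstat -x` prints on TORQUE-style PBS, with `Job` elements containing `Job_Id`, `Job_Name` and `job_state`, and it assumes `C` marks a completed job. The sample document and the `live_job_ids` helper are illustrative only, and other PBS variants may lay the output out differently.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Illustrative sample only; real `qstat -x` output carries many more fields.
SAMPLE_QSTAT_XML = """<Data>
  <Job><Job_Id>101.head</Job_Id><Job_Name>a_very_long_preprocessing_job_name_0</Job_Name><job_state>R</job_state></Job>
  <Job><Job_Id>102.head</Job_Id><Job_Name>a_very_long_preprocessing_job_name_1</Job_Name><job_state>C</job_state></Job>
  <Job><Job_Id>103.head</Job_Id><Job_Name>other</Job_Name><job_state>Q</job_state></Job>
</Data>"""


def live_job_ids(qstat_xml, name_pattern, finished_states=("C",)):
    """Return IDs of jobs whose full name matches a glob and that are not finished."""
    ids = []
    for job in ET.fromstring(qstat_xml).iter("Job"):
        name = job.findtext("Job_Name", default="")
        state = job.findtext("job_state", default="")
        if fnmatch.fnmatchcase(name, name_pattern) and state not in finished_states:
            ids.append(job.findtext("Job_Id"))
    return ids


# Only jobs the scheduler still tracks as unfinished are kept as dependencies.
print(live_job_ids(SAMPLE_QSTAT_XML, "a_very_long_preprocessing_job_name_*"))
```

An empty result means there is nothing to depend on, and the dependency option can be left off the submission entirely.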

