[MINC-development] -O3 breaks N3

Sat Apr 14 06:47:04 EDT 2012

On Fri, Apr 13, 2012 at 6:23 PM, Vladimir S. FONOV
<vladimir.fonov at gmail.com> wrote:
> Hello,
>
>
> On 12-04-13 12:03 PM, Claude LEPAGE wrote:
>>>
>>> looks like compiling EBTKS and/or N3 with -O3 flag (default for release
>>> build in CMake) breaks nu_correct completely (i.e. it doesn't converge).
>>> Any ideas why it happens and how to fix it, apart from the obvious
>>> suggestion of using -O2 flag of course.
>>
>>
>> I prefer not to touch that code. It's quite a mess with the templates.
>> Basically, it's very poorly designed C++ code. Perhaps running valgrind
>> on it would give you clues where to look. Otherwise, be happy to use -O2.

[ of course I could argue that the design of the original (now EBTKS)
template code was actually quite elegant and working beautifully with
Sun's compilers 20 years back (pre-gcc); and that it's since been
broken due to gcc's flight-of-the-bumblebee development; but why
bother ;-) ]

I think there are two issues here: (1) what to do with the legacy C++
template code which has been causing plenty of headaches over the
years; and (2) how the MINC/CIVET/whatever code behaves under the
influence of changing environments (OS, compilers, math libraries,
etc).

Re: (1), I think that ideally EBTKS would be written out of the
MINC/etc codebase altogether. I suspect that would primarily affect
N3, so at the very least it would require a substantial rewrite of N3.
However, I do think there may be a shortcut to be taken here, which
would be to leave the code pretty much as it is, but just work the C++
template pieces out of it. That could probably be done with a
manageable amount of copy/paste/search/replace; basically
instantiating the templates (only) for the types needed (which
probably aren't many in the end). Question is, who? Does John need a
sabbatical perhaps? ;-)
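
To make the shortcut above a bit more concrete, here is a minimal sketch
of what working the templates out by hand could look like. TArray and
DoubleArray are hypothetical stand-ins invented for illustration, not
actual EBTKS classes; the point is only that the concrete class is the
template body with T searched-and-replaced by double.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Before": a hypothetical templated container (a stand-in, NOT EBTKS code).
template <class T>
class TArray {
public:
    explicit TArray(std::size_t n, T fill = T()) : data_(n, fill) {}

    std::size_t size() const { return data_.size(); }

    T& operator[](std::size_t i) {
        assert(i < data_.size());  // bounds check in debug builds
        return data_[i];
    }

    // Sum of all elements, accumulated in index order.
    T sum() const {
        T total = T();
        for (std::size_t i = 0; i < data_.size(); ++i) total = total + data_[i];
        return total;
    }

private:
    std::vector<T> data_;
};

// "After": the same class with T replaced by double, no templates left.
// (Hypothetical name; one such class per element type actually needed.)
class DoubleArray {
public:
    explicit DoubleArray(std::size_t n, double fill = 0.0) : data_(n, fill) {}

    std::size_t size() const { return data_.size(); }

    double& operator[](std::size_t i) {
        assert(i < data_.size());  // bounds check in debug builds
        return data_[i];
    }

    // Sum of all elements, accumulated in index order.
    double sum() const {
        double total = 0.0;
        for (std::size_t i = 0; i < data_.size(); ++i) total = total + data_[i];
        return total;
    }

private:
    std::vector<double> data_;
};
```

A less invasive variant, if the template code itself still compiles
cleanly, would be explicit instantiation (e.g. "template class
TArray<double>;" in a single source file), though that keeps the
template machinery around rather than removing it.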

> Also, it turns out that even when nu_correct produces results, they differ
> slightly between the two versions. I assume that the difference comes from
> using different versions of gcc or math libraries...
>
> Since nu_correct is usually the first thing applied to an MRI scan,
> it raises some interesting questions about the reproducibility of our research...

So re:(2): a very valid point, but not specific to N3. Some years ago
I actually ran some tests and iirc N3 was not the only bit of code
subject to changing results as a function of -OX compiler flags;
various other pieces of the MINC (and CIVET) codebase also produce
different results when built with different flags or against different
library versions. In practice one shouldn't mix and match software
builds within the same study population of course; but that still
leaves the question of "now I reprocessed my 500 scans with this new
build and all my results are different", which I am sure has happened
to all of us (and perhaps unknowingly). I quarantine all software
builds but even with that you can get bitten by changing
OS/hardware/libraries - and of course you really don't want to be
stuck with old builds as bugs are also fixed all the time...
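
As a generic illustration of why bit-identical output across builds is
a lot to ask: floating-point addition is not associative, so anything
that changes the order of evaluation or the precision of intermediates
can change the answer. The sketch below sums the same three numbers in
two orders. It is not a diagnosis of what actually happens in N3 or
anywhere else in MINC; the function names are made up, and the volatile
accumulator is only there to force every partial sum to be rounded to
double.

```cpp
#include <cstddef>

// Sum v[0..n-1] from the first element to the last.
double sum_left_to_right(const double* v, std::size_t n) {
    volatile double acc = 0.0;  // volatile: round each partial sum to double
    for (std::size_t i = 0; i < n; ++i) acc = acc + v[i];
    return acc;
}

// Sum the same elements from the last to the first.
double sum_right_to_left(const double* v, std::size_t n) {
    volatile double acc = 0.0;  // volatile: round each partial sum to double
    for (std::size_t i = n; i > 0; --i) acc = acc + v[i - 1];
    return acc;
}
```

With the input {1e16, -1e16, 1.0} the first function returns 1.0 and the
second returns 0.0, because 1.0 is below the spacing of doubles near
1e16 and is lost when it is added to -1e16 first.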

Regression testing, anyone? Would be nice to set up some specific
tests that could be part of the MINC install/distribution; but I fear
that byte-for-byte identical results will be difficult to obtain,
which leaves the question of how to judge the impact of any
differences that will show up. Ideas anybody?
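
One possible starting point for judging differences that are not
byte-for-byte: compare a new output against a stored reference and
report a few summary numbers, then pass or fail on explicit tolerances.
A minimal sketch, assuming both volumes have already been read into
flat arrays of doubles; the names, the choice of metrics, and the idea
of a fixed threshold are all assumptions for illustration, and picking
meaningful thresholds remains the open question.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of voxel-wise differences between two volumes (hypothetical type).
struct DiffSummary {
    double max_abs;        // largest |ref - test| over all voxels
    double rms;            // root-mean-square of the differences
    std::size_t n_differ;  // number of voxels whose values differ at all
};

// Compare two equally sized volumes stored as flat arrays.
DiffSummary compare_volumes(const std::vector<double>& ref,
                            const std::vector<double>& test) {
    assert(ref.size() == test.size());
    DiffSummary s = {0.0, 0.0, 0};
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = std::fabs(ref[i] - test[i]);
        if (d != 0.0) ++s.n_differ;
        s.max_abs = std::max(s.max_abs, d);
        sum_sq += d * d;
    }
    if (!ref.empty()) s.rms = std::sqrt(sum_sq / static_cast<double>(ref.size()));
    return s;
}

// Pass only if both the worst-case and the average difference are small.
bool within_tolerance(const DiffSummary& s, double max_abs_tol, double rms_tol) {
    return s.max_abs <= max_abs_tol && s.rms <= rms_tol;
}
```

Reporting n_differ alongside the tolerances at least separates "nothing
changed" from "changed, but below threshold", which is useful to know
when a new compiler or library lands.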

-- A