[MINC-development] MINC2 file with floating-point voxels and slice normalization

Alex Zijdenbos zijdenbos at gmail.com
Wed Mar 20 22:09:03 EDT 2013


On Wed, Mar 20, 2013 at 9:12 PM, Andrew Janke <a.janke at gmail.com> wrote:
>> Yup. voxel_loop is designed to be memory efficient (back in the days
>> when 32 MB was a lot of memory, or something - it's a bit hazy now).
>
> To me this was a good decision and has allowed MINC to stand the test
> of time. Over time we have of course increased this memory "chunk"
> size but the ability to mash through 100 files and not have to read
> them all into memory just to average/compute across them is something
> that "the others" don't have.

Agreed; of course the issue I raised concerns only the writing of a
volume, not the reading. Obviously when running mincaverage on
100s of volumes you have no hope of reading them all into memory;
however, most of these tools create only a single output volume (I
suppose minccalc could create an arbitrary number of them in
principle). One crude workaround would be to write a float output
volume (assuming that slice scaling on those is turned off, as per
the beginning of this discussion), and then discretize that;
assuming you have sufficient temp space, that would be fairly
transparent. It could be handled by wrapper scripts, but would be
ugly either way.

>> I believe that voxel_loop is designed to do no more than an image at a
>> time. Perhaps it could be modified to slurp up more data in one shot -
>> not sure how hard that would be.
>
> image? I think you mean slice?

On that note - how does voxel_loop divvy things up on >3D volumes?
Still per 2D slice? Or per unit of the slowest-varying dimension
(i.e., a volume)?

>> You would end up discretizing the data a second time - that's not ideal.
>
> Agree. It's not me that wants this change, it's for this reason that
> I'll keep suggesting that people who think they don't want slice
> scaling buy two disks and use float. :)

Yah - with a rapidly growing number of 1000-cubed data sets, that
would get a little pricey :) Definitely agree that
double-discretization should be avoided.

>> Is it because minc does not support integer, boolean
>> or label data? For years, I have felt that this is the big gap in minc
>> and should really be addressed. If you know that the data is integer
>> or labels (IDs for which proximity in value means nothing, so
>> interpolating between 10 and 12 to get 11 is nonsense), don't do nasty
>> things to it like scale it to real values or interpolate it.
>
> Agree. And the shift to HDF was a big step in this direction, one of
> the initial problems was that MINC2 was still very strongly tied to
> netCDF data types. Vlad has a working port of minc_lite that is HDF
> only that we are using in production here now. Once we are happy with
> its stability we should think pretty seriously about releasing this
> code as "MINC2 proper" as it will allow all the recoding that will be
> needed to add such discrete types into MINC.

Clearly, label volumes are by far the most affected by slice-based
scaling; however, I often get bitten by the same effect on
"continuous" data as well - especially masked volumes, or volumes with
poor dynamic range, where single slices may suddenly be discretized
differently, causing odd striping artifacts. Similarly, if you have a
volume with poor dynamic range and slice scaling that you resample at
an angle to the original slice direction, you may actually start
seeing those slice boundaries in the data. My point really is:
discretizing data is a necessary evil, but if we need to do it, I
would prefer to apply the evil in a consistent way across a volume.

But I'm a bit confused about the label volume argument. Presumably if
MINC would properly handle label volumes (and Vlad mentioned that
support for that already exists in MINC2), the slice-based scaling
issue would have to be dealt with? Isn't a central issue of properly
handling label volumes the ability to turn off slice-based scaling?

> For now pretty much everything can be done with labels but you need to
> be aware of tools like resample_labels from conglomerate.
>
>> I suspect that the code changes would be easier and
>> would also benefit the big data people (like Andrew showing off with
>> his monster volumes :)).
>
> I'll bet it's not only me.

Definitely not :)

> Change mincaverage/mincmath/minccalc such that I can't perform a
> calculation across 10x 12GB volumes because I'd need 120GB of RAM?  I
> think not...  it'd be quicker to just use niftilib.

Memory is cheap, much like disks - perhaps think SSD ;-)

More seriously, I never suggested we take any functionality away; just
that the slice-based scaling (or any scaling, in the case of label
volumes) be under user control. Hey - it will even make mincheader
output easier to read.

But from what I gather this would (at minimum) require control over
the size of the sliding window in voxel_loop, possibly even on a
per-volume basis. I have no idea how feasible that would be, but it
sounds daunting (and I say that recalling the meeting at the MNI where
Peter first presented his new invention, "voxel_loop", when very few
of us - if any - were able to follow along at the time :-) )

-- A
