[MINC-development] MINC 2.0 draft

Steve ROBBINS minc-development@bic.mni.mcgill.ca
Tue, 14 Jan 2003 01:21:40 -0500


Howdy,

I've finally read the proposal and the various emails about it.

The changes proposed for the library code itself all seem reasonable.
I worry a bit about changing the on-disk layout, since that will 
add complexity that can never be removed.


On Tue, Jan 07, 2003 at 04:46:54PM -0500, Leila Baghdadi wrote:
 
> I also like the idea of an API definition. I think it will simplify things 
> and make our development process faster which is one of our main 
> priorities.

I wonder if it would also be valuable to sketch out the kinds of
applications that use MINC and what the requirements of each would be.
It might be helpful in thinking about the on-disk layout and algorithms
appropriate to the lowest level of MINC.

I can only think of a few kinds of applications.

1. Simple processing.  Scan through a file performing a computation on
each voxel (or a neighbourhood of, say, 5x5x5 voxels) and perhaps writing the
result to a new file.  Could be multiple input files.
Think of mincmath or minccalc.

2. Visualization.  Typically wants planar slices through the data.
Read-only.

3. Complicated processing.  Input files might be read in arbitrary
order.  For example, one input may be scanned and locations mapped
through a transform into the second volume -- imagine computing an image
similarity measure between one image and a transformed version of a
second image.

I'm surely oversimplifying here.  What kinds of access patterns
occur in your applications?


Now I'm trying to figure out how the three listed requirements
(large datasets, multi-resolution, and compression) impact these
kinds of applications.

1. Simple processing.  Can handle large files with little memory
using voxel_loop (or a generalization).  Reading a file in multiresolution
form seems likely to require more disk seeking, which would be detrimental.
Ditto for "blocked" files, unless you get lucky and all the inputs
have the same block structure.  Ditto for block-oriented compression.

2. Visualization.  For large files, a multiresolution scheme that
lets you navigate through a low-res version and progressively
fill in detail seems like a good idea.  It's less clear to me whether
a block-structured file would help.  Would it?  Compression would
likely be detrimental.

3. Complicated processing.  If the volume that you need to access in an
unpredictable fashion won't fit in memory, you'd pretty much be forced
to do a lot of disk seeks, I suspect.  If the file was multi-resolution
or compressed, you'd expect even more slowdowns.  I suspect that
whether the file was block structured or not wouldn't matter much.
I'm just guessing here -- does anyone have hard data on this?


In summary, my limited understanding suggests that applications
in the first category are best served by the simplest disk layout,
such as the current MINC format.  Visualization would likely benefit
from a multi-res layout when the file is too large to fit in
memory.  And category #3 is doomed no matter what you do.

To help better balance the inevitable tradeoffs, we should also
consider how often each type of application is used.  Visualization
is clearly important and speed-sensitive.  For the
"processing" applications, the ones that I can think of either
fall into the "simple" category (e.g. most of the minc tools)
or they keep the entire volume(s) in memory (minctracc, for
example).

Could we possibly come up with an extension to MINC
that is both forward- and backward-compatible?  For example,
use the current MINC format with some extra variables that
multi-res-aware visualization tools can take advantage of?
[Of course, compression would break forward compatibility,
but presumably one could use a "mincuncompress" utility to
get an uncompressed new-style MINC file.]
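To make the idea concrete, the extension might show up in an ordinary header dump as a couple of extra variables that old tools simply skip over.  This CDL-style fragment is purely hypothetical; every name in it is invented for illustration:

```
// Hypothetical header fragment (ncdump/CDL style).  The standard image
// variable is untouched; the extra variables carry a precomputed coarse
// level that a multi-res-aware viewer could read first.  Tools that
// don't know these variables ignore them, so the file stays readable.
variables:
        short image(zspace, yspace, xspace) ;        // usual full-res image
        short lowres_image(zspace_lr, yspace_lr, xspace_lr) ; // invented
        double lowres_step ;                         // invented: scale factor
```

The coarse level is redundant data, so such a file is larger than a plain MINC file, but an old-style tool would still process the full-resolution image correctly.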


-Steve