[MINC-development] Welcome

John G. Sled <minc-development@bic.mni.mcgill.ca>
Tue, 19 Nov 2002 09:32:13 -0500


On Tue, Nov 19, 2002 at 01:19:55AM -0500, Steve ROBBINS wrote:

> Peter has raised one big difference between netcdf and MINC, namely
> that MINC supports independent scaling on each slice of an image,
> while netcdf "conventions" allow only a single overall scale for
> packed data.  I wonder if Peter has some feeling about whether
> we could work with the netcdf folks to get the MINC view blessed.  I'm
> not sure this really buys us much in the short run, since the
> netcdf "convention" for packed data is not actually implemented
> in the core library -- it is up to the application to support such
> a convention.  At least, that's my understanding.  However, in the
> long run, a layer of code supporting minc-style rescaling for packed
> data may appear in netcdf.  Heck, we could supply them such a layer ;-)
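
For reference, the convention Steve is describing is just a pair of
scale_factor/add_offset attributes on the variable, and it is indeed
up to the application to apply them -- the core library never does.
A minimal sketch (the file name, variable name, and voxel count are
all invented):

/*
 * Sketch of the netcdf "packed data" convention.  The core library
 * does not apply scale_factor/add_offset itself; the application
 * must.  File name, variable name, and NVOX are invented here.
 */
#include <stdio.h>
#include <netcdf.h>

#define NVOX 1000               /* illustrative voxel count */

int main(void)
{
    int ncid, varid, i;
    double scale = 1.0, offset = 0.0;   /* defaults if attributes unset */
    short packed[NVOX];
    double real[NVOX];

    if (nc_open("volume.nc", NC_NOWRITE, &ncid) != NC_NOERR)
        return 1;
    if (nc_inq_varid(ncid, "image", &varid) != NC_NOERR)
        return 1;

    /* One scale/offset pair for the entire variable. */
    nc_get_att_double(ncid, varid, "scale_factor", &scale);
    nc_get_att_double(ncid, varid, "add_offset", &offset);

    nc_get_var_short(ncid, varid, packed);
    for (i = 0; i < NVOX; i++)
        real[i] = packed[i] * scale + offset;

    nc_close(ncid);
    printf("first voxel: %g\n", real[0]);
    return 0;
}

Note the single scale/offset pair for the whole variable -- that is
the crux of the difference from MINC's per-slice image-min/image-max.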

Perhaps we should take the easier route and revert to the netcdf
scaling convention (i.e. one scale factor for the whole volume).  All
of the programs that use volume_io work this way already.  Although a
separate scale factor for each slice makes good use of the available
numerical precision, if one considers digitization to be a source of
noise, then per-slice scaling adds that noise nonuniformly across the
slices.  Another disadvantage of the scheme is that reformatting the
data, for example from sagittal to transverse and back, is not an
identity operation, since the resampled slices acquire new min/max
values and the voxels must be requantized against them.
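
To see the nonuniformity concretely, compare the quantization step
you get when packing to unsigned shorts with one scale for the volume
versus one per slice (a toy example; the intensity ranges are
invented):

#include <stdio.h>

/* Quantization step when packing the range [lo, hi] into an
 * unsigned short; this step size is the digitization "noise". */
static double quant_step(double lo, double hi)
{
    return (hi - lo) / 65535.0;
}

int main(void)
{
    /* Pretend slice 0 is dim (0..10) and slice 1 bright (0..1000). */
    printf("one scale for the volume: step = %g everywhere\n",
           quant_step(0.0, 1000.0));
    printf("per-slice: slice 0 step = %g, slice 1 step = %g\n",
           quant_step(0.0, 10.0), quant_step(0.0, 1000.0));
    return 0;
}

The dim slice gets digitized about 100 times more finely than the
bright one -- good for precision, but it means the digitization noise
varies from slice to slice.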


> > 3. MINC requires extensions to support files larger than core.
> 
> Notwithstanding your later disclaimer that the term "MINC"
> refers to the file format, the libraries, and the apps alike, I think
> this point needs some precision.  I don't think there is anything
> in the file format, nor in netcdf, nor even in MINC that prevents
> one from processing a file larger than core.  I'm not too sure about
> volume_io, but I think it also allows this (?).

Volume_io does allow one to work with files larger than core; however,
the layout of the MINC file and the implementation of caching make
this slow for complicated 3D algorithms and visualization.  The
algorithms that do work well are mostly the ones that could have been
implemented with voxel_loop.
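
For concreteness, here is roughly what a voxel_loop-style program
looks like -- a sketch from memory of how mincmath and friends use
the interface, so treat the exact signatures as an assumption rather
than a reference:

#include <stdlib.h>
#include <voxel_loop.h>

/* The callback gets a buffer of voxels at a time, so the whole file
 * never has to fit in core. */
static void scale_voxels(void *caller_data, long num_voxels,
                         int input_num_buffers, int input_vector_length,
                         double *input_data[],
                         int output_num_buffers, int output_vector_length,
                         double *output_data[],
                         Loop_Info *loop_info)
{
    long ivox;
    double factor = *(double *) caller_data;

    for (ivox = 0; ivox < num_voxels * input_vector_length; ivox++)
        output_data[0][ivox] = factor * input_data[0][ivox];
}

int main(int argc, char *argv[])
{
    char *infiles[]  = { "in.mnc" };
    char *outfiles[] = { "out.mnc" };
    double factor = 2.0;
    Loop_Options *options = create_loop_options();

    voxel_loop(1, infiles, 1, outfiles, argv[0], options,
               scale_voxels, (void *) &factor);
    free_loop_options(options);
    return 0;
}

The point is that the callback only ever sees a buffer of voxels at a
time, so memory use is independent of file size; what it cannot do is
wander around the volume the way a registration or rendering
algorithm wants to.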

> 
> Isn't it also true that all the minc programs are carefully
> written (using voxel_loop) so that processing files larger than
> core is feasible?
> 
> So, what's left?  ;-)
> 
All of the complicated 3D algorithms and visualization tools that 
we haven't written yet -- plus a few that we have ;)

> 
> > 4. MINC requires extensions to support files larger than 2 GB.
> 
> It seems that netcdf itself needs modifications to allow this
> in full generality.  The netcdf FAQ is a little cagey about this,
> saying that "large files can be written" (with certain restrictions,
> like that the last variable may extend past the 2GB limit, but it
> must start before that mark, etc).
> 
> What escapes me, though, is: are there restrictions on how you
> can USE such a large file?  Netcdf stores the offset to the beginning
> of the variable in a signed int (thus the 2GB restriction).  But
> what about accessing a given slice?  Is seeking going on -- and
> if so, are these offsets correctly calculated using off_t in netcdf?
> If yes, this is good news for us!

Yes, recent versions of netcdf can be compiled to use 64-bit versions
of seek, so that even 32-bit programs can access the whole volume.
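
The underlying mechanism is ordinary POSIX large-file support: build
netcdf (and the program) with a 64-bit off_t and seeks go through the
64-bit interfaces.  A sketch of the idea, independent of netcdf (the
file name is made up):

/* Build with -D_FILE_OFFSET_BITS=64 (or define it before the
 * includes, as here) and off_t becomes 64 bits even in a 32-bit
 * program, so seeks past 2 GB work. */
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE       /* for fseeko on some systems */

#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    FILE *fp = fopen("big.nc", "rb");
    off_t past_2gb = (off_t) 3 * 1024 * 1024 * 1024;    /* 3 GB */

    if (fp == NULL)
        return 1;

    printf("sizeof(off_t) = %d\n", (int) sizeof(off_t));

    /* fseeko takes an off_t rather than a long, so this offset
     * does not overflow. */
    if (fseeko(fp, past_2gb, SEEK_SET) != 0)
        perror("fseeko");

    fclose(fp);
    return 0;
}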

> 
> > 6. Given #3, block-structuring MINC data will improve file access speed.
> 
> ... for some access patterns.  It may degrade other access patterns,
> e.g. traditional voxel_loop processing.

I don't think that voxel_loop would be affected by this, since in most
cases the loop could proceed through the voxels in their natural
(i.e. file) order.
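
Some toy offset arithmetic makes the point (all sizes invented): the
layout changes how file offsets are computed, but as long as the loop
walks the file in whichever order is sequential on disk, blocking
shouldn't hurt it.

#include <stdio.h>

#define NX 256
#define NY 256
#define B   32          /* invented block edge length */

/* Conventional layout: voxels stored in z, y, x (slice-major) order. */
static long flat_offset(long z, long y, long x)
{
    return (z * NY + y) * NX + x;
}

/* Blocked layout: the file is a sequence of B*B*B blocks, with the
 * same z, y, x order inside each block. */
static long block_offset(long z, long y, long x)
{
    long block  = ((z / B) * (NY / B) + (y / B)) * (NX / B) + (x / B);
    long within = ((z % B) * B + (y % B)) * B + (x % B);

    return block * (long) (B * B * B) + within;
}

int main(void)
{
    printf("flat:    %ld\n", flat_offset(1, 2, 3));
    printf("blocked: %ld\n", block_offset(1, 2, 3));
    return 0;
}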

> > What other major or minor tasks are obvious to people?
>  
> On a much longer time scale / more speculative note: should
> MINC remain based on netcdf, or is something like HDF more
> suitable?  I don't know, but some of the proposed additions
> like compression or sparse data storage suggest that the netcdf
> view of the world (i.e. data is a dense matrix and fast random
> access is key) is not fitting well.  It also doesn't fit well
> for polyhedral data, of course.

I don't know the answer to this, but now would be a good time to make
these decisions given that MINC 3.0 is likely a long way off.

cheers,

John