[MINC-development] Welcome

Steve ROBBINS minc-development@bic.mni.mcgill.ca
Tue, 19 Nov 2002 01:19:55 -0500


On Mon, Nov 11, 2002 at 02:39:04PM -0500, Robert VINCENT wrote:
> Hi everyone,
> 
> Welcome to the list.  I thought I'd start off by posting a few guesses
> about the goals for MINC development.
> 
> To test my understanding of the current state of things, I'd like to see
> if there is consensus about the following claims:
> 
> 1. MINC could be better documented.

I'm sure you'll get no disagreement on this point.  Andrew started
extending the existing documentation (I've lost the URL for this,
though).  I finished off the missing manpages for all the tools
in MINC when I made the Debian packages.  I keep meaning to
put them in CVS, but have not yet done so.

My personal wish about documentation, however, is to better
document the actual file structure.  Yes, it is based on NetCDF,
but there seem to be some corner cases, e.g. "vector" volumes
must have the vector dimension last.  And it may be just me, but 
I have a hard time working out exactly how the image scaling works.
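
For the record, here is the mapping as I currently understand it -- a
sketch of my reading, not an authoritative statement of what the
library does.  Each slice carries an image-min/image-max pair, and a
voxel value in the image variable's valid_range maps linearly onto
that real range:

    /* My reading of MINC slice scaling -- a sketch, not gospel.
     * valid_min/valid_max come from the image variable's valid_range;
     * image_min/image_max are that slice's real-value bounds. */
    double voxel_to_real(double voxel,
                         double valid_min, double valid_max,
                         double image_min, double image_max)
    {
        double frac = (voxel - valid_min) / (valid_max - valid_min);
        return image_min + frac * (image_max - image_min);
    }

If that is indeed the rule, writing it down in one place like this
would go a long way.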

Like Andrew, I feel that MINC would benefit from moving back towards
netcdf standards.  One reason this appeals to me is that we could
possibly take advantage of the *already existing* bindings that
netcdf has for various languages, e.g. C++, Java, Python, and Perl.
The thinner the "minc layer" on top of bare netcdf, the less code one
has to maintain for all these other languages.
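
To make the "thin layer" point concrete: since a MINC volume is an
ordinary netcdf file, the stock netcdf C API can already read one.
A sketch (filename made up, error checks omitted; "image" is the
conventional MINC variable name):

    #include <netcdf.h>

    /* Open a MINC volume with nothing but bare netcdf. */
    int peek(void)
    {
        int ncid, varid;
        nc_open("brain.mnc", NC_NOWRITE, &ncid);
        nc_inq_varid(ncid, "image", &varid);
        /* ... nc_get_vara_*() calls to read hyperslabs ... */
        return nc_close(ncid);
    }

Any language with netcdf bindings gets this much for free; only the
dimension and scaling conventions would need a wrapper.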

Peter has raised one big difference between netcdf and MINC, namely
that MINC supports independent scaling on each slice of an image,
while netcdf "conventions" allow only a single overall scale for
packed data.  I wonder if Peter has some feeling about whether
we could work with the netcdf folks to get the MINC view blessed.  I'm
not sure this buys us much in the short run, since the
netcdf "convention" for packed data is not actually implemented
in the core library -- it is up to the application to support such
a convention.  At least, that's my understanding.  In the long run,
however, a layer of code supporting minc-style rescaling for packed
data may appear in netcdf.  Heck, we could supply them such a layer ;-)
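
For contrast, the netcdf packing convention, as I read the User's
Guide, is just two per-variable attributes that the application
applies itself:

    /* netcdf convention: one affine map for the whole variable,
     * applied by the application, not by the library. */
    real_value = packed_value * scale_factor + add_offset;

Generalizing scale_factor/add_offset from per-variable to per-slice
is, in effect, what we would be asking them to bless.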



> 3. MINC requires extensions to support files larger than core.

Notwithstanding your later disclaimer that the term "MINC"
covers the file structure, the libraries, and the apps alike, I think
this point needs some precision.  I don't think there is anything
in the file format, nor in netcdf, nor even in the MINC library,
that prevents one from processing a file larger than core.  I'm not
too sure about volume_io, but I think it also allows this (?).

Isn't it also true that all the minc programs are carefully
written (using voxel_loop) so that processing files larger than
core is feasible?
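
My understanding of why this works: netcdf's hyperslab reads let a
tool pull one slice at a time instead of the whole variable, which is
essentially what voxel_loop arranges.  Roughly (ncid/varid as in the
sketch above; dimension names made up):

    #include <stdlib.h>
    #include <netcdf.h>

    /* Slice-at-a-time sweep: memory use is one slice, not one file. */
    void sweep(int ncid, int varid, size_t nz, size_t ny, size_t nx)
    {
        double *slice = malloc(ny * nx * sizeof(*slice));
        size_t start[3] = {0, 0, 0};
        size_t count[3] = {1, ny, nx};   /* one z-slice per read */
        for (size_t z = 0; z < nz; z++) {
            start[0] = z;
            nc_get_vara_double(ncid, varid, start, count, slice);
            /* ... process this slice's voxels ... */
        }
        free(slice);
    }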

So, what's left?  ;-)



> 4. MINC requires extensions to support files larger than 2 GB.

It seems that netcdf itself needs modifications to allow this
in full generality.  The netcdf FAQ is a little cagey about this,
saying that "large files can be written" (with certain restrictions,
e.g. the last variable may extend past the 2 GB limit, but it
must start before that mark, etc).

What escapes me, though, is: are there restrictions on how you
can USE such a large file?  Netcdf stores the offset to the beginning
of the variable in a signed int (thus the 2GB restriction).  But
what about accessing a given slice?  Is seeking going on -- and
if so, are these offsets correctly calculated using off_t in netcdf?
If yes, this is good news for us!
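
Concretely, the failure mode I worry about looks like this (a made-up
illustration, not actual netcdf code):

    #include <sys/types.h>
    #include <stdio.h>

    int main(void)
    {
        /* If any intermediate product is computed in 32-bit int,
         * the seek target wraps even though off_t could hold it. */
        int nx = 1024, ny = 1024, z = 600;
        off_t bad  = z * ny * nx * 4;          /* int math: overflows */
        off_t good = (off_t) z * ny * nx * 4;  /* widened first: fine */
        printf("%lld vs %lld\n", (long long) bad, (long long) good);
        return 0;
    }

So the question is whether the offset arithmetic is done in off_t
throughout, not just at the final seek.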



> 6. Given #3, block-structuring MINC data will improve file access speed.

... for some access patterns.  It may degrade other access patterns,
e.g. traditional slice-wise voxel_loop processing, where each full
slice would straddle many blocks (a back-of-envelope sketch below).
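
To put a rough number on it, suppose a hypothetical B-cubed blocking
(B and the dimensions here are invented for illustration):

    /* Voxel (x,y,z) lives in block (x/B, y/B, z/B).  A full z-slice
     * touches (nx/B)*(ny/B) blocks, so a slice-by-slice sweep reads
     * each block B times over unless whole blocks are cached. */
    int block_index(int x, int y, int z, int B, int nbx, int nby)
    {
        return (z / B) * nby * nbx + (y / B) * nbx + (x / B);
    }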


 
> What other major or minor tasks are obvious to people?
 
On a much longer time scale / more speculative note: should
MINC remain based on netcdf, or is something like HDF more
suitable?  I don't know, but some of the proposed additions,
like compression or sparse data storage, suggest that the netcdf
view of the world (i.e. data is a dense matrix and fast random
access is key) no longer fits well.  It also fits poorly for
polyhedral data, of course.


That's enough babbling for now,
-Steve