[MINC-development] Draft proposal for MINC 2.0

David Gobbi minc-development@bic.mni.mcgill.ca
Thu, 2 Jan 2003 10:18:10 -0500 (EST)


It looks like a good proposal overall, but I have a couple of comments,
particularly about Compression and Labels:

Compression (page 7):
  "uncompressing the whole dataset into a temporary disk space"

  This doesn't sound reasonable to me, since disk access is so slow.
  (In fact, disk speed has hardly changed over the past 5 years, while
  the amount of core memory in most PCs has climbed from 32MB to 256MB
  and CPU speeds have increased from 300 MHz to 2 GHz.)  There is
  a lot of CPU power available to do the decompression and a fair bit
  of memory for caching the uncompressed data.

  Simple math: a 2GB data set, written out at a typical disk write
  speed of around 10MB/s, takes 200s (a bit over 3 min) to write and
  then about another 1.5 min to read back, for a total of about 5 min
  before the user can view the data.
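
  The estimate above can be checked with a few lines of arithmetic.  This
  is only a sketch: the variable names are made up for illustration, 2GB
  is taken as 2000MB, and the read rate is not stated above but is implied
  by the quoted 1.5 min read time.

```python
# Figures taken from the estimate above; names are illustrative only.
DATASET_MB = 2000        # 2GB data set, in decimal megabytes (assumption)
WRITE_MB_PER_S = 10      # typical disk write speed quoted above
READ_SECONDS = 90        # "about another 1.5 min to read"

write_seconds = DATASET_MB / WRITE_MB_PER_S
total_seconds = write_seconds + READ_SECONDS
# Read rate implied by the 1.5 min figure (not stated in the text).
implied_read_mb_per_s = DATASET_MB / READ_SECONDS

print(f"write: {write_seconds:.0f} s ({write_seconds / 60:.1f} min)")
print(f"total: {total_seconds:.0f} s ({total_seconds / 60:.1f} min)")
print(f"implied read rate: {implied_read_mb_per_s:.1f} MB/s")
```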

  A caching scheme can be used that keeps a chunk of data in memory and
  uncompresses data from the file on-the-fly into the cache as necessary.
  There will have to be a way of telling the volume_io library what the
  maximum cache size can be (and ideally, the volume_io library should be
  able to figure out for itself how much core memory is available).  I
  don't know enough about volume_io to say how well its current caching
  scheme fits into this.
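
  To make the idea concrete, here is a minimal sketch of such a cache.  It
  is not volume_io code: the class ChunkCache, its max_bytes limit and the
  use of zlib are all assumptions chosen for illustration.  Chunks are
  decompressed on first access and the least recently used ones are
  dropped once the memory limit is exceeded.

```python
import zlib
from collections import OrderedDict


class ChunkCache:
    """Hypothetical LRU cache of decompressed chunks, bounded by total bytes."""

    def __init__(self, compressed_chunks, max_bytes):
        self._chunks = compressed_chunks   # list of zlib-compressed byte strings
        self._max_bytes = max_bytes        # caller-supplied memory limit
        self._cache = OrderedDict()        # chunk index -> decompressed bytes
        self._used = 0                     # bytes currently held in the cache
        self.decompress_count = 0          # how often we had to decompress

    def get(self, index):
        if index in self._cache:
            self._cache.move_to_end(index)  # mark as most recently used
            return self._cache[index]
        data = zlib.decompress(self._chunks[index])
        self.decompress_count += 1
        self._cache[index] = data
        self._used += len(data)
        # Evict least recently used chunks until we are back under the
        # limit, but always keep the chunk that was just requested.
        while self._used > self._max_bytes and len(self._cache) > 1:
            _, evicted = self._cache.popitem(last=False)
            self._used -= len(evicted)
        return data


# Four 1000-byte chunks, with room in the cache for only two of them.
chunks = [zlib.compress(bytes([i]) * 1000) for i in range(4)]
cache = ChunkCache(chunks, max_bytes=2000)
cache.get(0)
cache.get(1)
cache.get(0)   # served from memory, no decompression
print(cache.decompress_count)
```

  A real implementation would read the compressed chunks from the file
  rather than hold them in a list, and would need some way to choose
  max_bytes from the memory actually available.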

  And the tradeoff between compression and random-access speed is not
  so bad given a fast CPU and caching of the uncompressed data.
  Saying that good compression is incompatible with random access is
  going a bit far.  For example, movie files are compressed and have key
  frames to make random access possible.  Likewise, JPEG images are
  broken into small blocks (8x8 DCT blocks, grouped into coding units of
  up to 16x16 pixels).
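
  The same trick works for volume data: compress fixed-size chunks
  independently, so that a read only has to decompress the chunks it
  touches.  The sketch below assumes zlib and an arbitrary 4096-byte
  chunk size; the function names are hypothetical and are not part of
  MINC or NetCDF.

```python
import zlib

CHUNK = 4096  # hypothetical chunk size in bytes


def compress_chunked(data, chunk=CHUNK):
    """Compress each chunk independently, much like key frames in a movie."""
    return [zlib.compress(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def read_range(chunks, start, length, chunk=CHUNK):
    """Return bytes [start, start + length), decompressing only the chunks needed."""
    if length <= 0:
        return b""
    first = start // chunk
    last = (start + length - 1) // chunk
    buf = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    offset = start - first * chunk
    return buf[offset:offset + length]


data = bytes(i % 251 for i in range(20000))
chunks = compress_chunked(data)
# Only the first chunk is decompressed to satisfy this read.
print(read_range(chunks, 4000, 50) == data[4000:4050])
```

  Smaller chunks give faster random access but usually a worse
  compression ratio, which is the tradeoff discussed above.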

Label format (page 7):

  Specific voxel values cannot be used as labels, since a voxel must
  contain the data as well as the label.  Maybe you meant that some of
  the bits in the voxel would be used to store the label, e.g. the file
  could contain 16-bit voxels which have 12 bits for data and 4 bits for
  labels.  But 4 bits means only 15 different labels (16 values, one of
  which has to mean "no label"), which isn't very many, assuming that
  those 4 bits are even available.
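
  For illustration, this is roughly what a 12+4 bit split would look
  like.  The layout (label in the high 4 bits, data in the low 12) and
  the function names are assumptions; the proposal does not specify one.

```python
DATA_BITS = 12
LABEL_BITS = 4
DATA_MASK = (1 << DATA_BITS) - 1     # 0x0FFF
LABEL_MASK = (1 << LABEL_BITS) - 1   # 0x000F


def pack(data, label):
    """Pack a 12-bit data value and a 4-bit label into one 16-bit voxel."""
    if not 0 <= data <= DATA_MASK:
        raise ValueError("data does not fit in 12 bits")
    if not 0 <= label <= LABEL_MASK:
        raise ValueError("label does not fit in 4 bits")
    return (label << DATA_BITS) | data


def unpack(voxel):
    """Split a 16-bit voxel back into (data, label)."""
    return voxel & DATA_MASK, (voxel >> DATA_BITS) & LABEL_MASK


voxel = pack(0x123, 5)
print(hex(voxel), unpack(voxel))
```

  Any image that already needs more than 12 bits of dynamic range has
  no room left for the label, which is the problem with this approach.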

  I think it's best if the labels are stored in a separate variable
  within the NetCDF file.  Some reasons for this are:
  1) it is not guaranteed that there will be any extra voxel bits
     available for labels, on top of those bits required for image data
  2) label data will compress extremely well if kept separate from the
     data (because labels are generally large blobs of the same color)
  3) if labels are stored in a separate variable, that makes it very easy
     for the MINC tools that don't care about labels to ignore the labels
     and simply copy them from the input file to the output file
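
  Point 2 is easy to try out.  The sketch below builds a small synthetic
  label volume (a single cubic blob in an empty background) and
  compresses it with zlib.  The volume size, the blob and the label
  value are made up, and real label maps will of course vary.

```python
import zlib

N = 32  # hypothetical volume edge length, in voxels


def in_blob(x, y, z):
    """A single cubic 'structure' in the middle of the volume."""
    return 8 <= x < 24 and 8 <= y < 24 and 8 <= z < 24


# One byte per voxel: label 3 inside the blob, 0 (no label) elsewhere.
labels = bytes(3 if in_blob(x, y, z) else 0
               for z in range(N) for y in range(N) for x in range(N))

compressed = zlib.compress(labels)
ratio = len(labels) / len(compressed)
print(f"{len(labels)} bytes -> {len(compressed)} bytes ({ratio:.0f}:1)")
```

  If the same label bits were interleaved with noisy image data in each
  voxel, the long runs of identical bytes would be broken up and most
  of this gain would be lost.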

Slice scaling (page 8):

  It will be necessary to at least read per-slice scaled MINC files in
  order to maintain backwards compatibility with existing MINC files.  But
  I think that all files that are written by the new MINC tools should use
  just one scale factor for the entire data set.  If per-slice scaling is
  applied to the data before it is compressed, the compression will
  probably suffer as a result.
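
  As a sketch of what reading per-slice scaled data and rewriting it
  with one scale factor involves, the code below assumes a linear
  mapping between a voxel range and a per-slice real range.  The
  function names and the 8-bit voxel range are hypothetical; this is
  not the MINC API.

```python
def to_real(voxels, vmin, vmax, real_min, real_max):
    """Map integer voxel values linearly onto [real_min, real_max]."""
    return [real_min + (v - vmin) * (real_max - real_min) / (vmax - vmin)
            for v in voxels]


def rescale_global(slices, ranges, vmin=0, vmax=255):
    """Convert per-slice scaled slices to a single data-set-wide scale.

    slices: list of lists of integer voxel values, one list per slice
    ranges: list of (real_min, real_max) pairs, one pair per slice
    """
    real = [to_real(s, vmin, vmax, lo, hi) for s, (lo, hi) in zip(slices, ranges)]
    g_lo = min(lo for lo, _ in ranges)
    g_hi = max(hi for _, hi in ranges)
    voxels = [[round(vmin + (r - g_lo) * (vmax - vmin) / (g_hi - g_lo)) for r in s]
              for s in real]
    return voxels, (g_lo, g_hi)


slices = [[0, 255], [0, 255]]
ranges = [(0.0, 1.0), (0.0, 5.0)]   # the second slice is five times brighter
voxels, global_range = rescale_global(slices, ranges)
print(voxels, global_range)
```

  With per-slice scaling, both slices hold identical voxel values even
  though their real intensities differ.  With one scale factor, the
  stored values track the real intensities directly, at the cost of
  fewer quantization levels for dim slices.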

I hope these comments are useful, and that I'm not stepping on anyone's
toes.

 - David

-- 
  David Gobbi, MSc                dgobbi@imaging.robarts.ca
  Advanced Imaging Research Group
  Robarts Research Institute, University of Western Ontario

On Mon, 23 Dec 2002, John G. Sled wrote:

>
>
> Hi everyone,
>
> Last time I was in Montreal, I had a chance to meet with Jason and
> Bert and put together an outline for MINC 2.0.  Based on that
> discussion, Leila and I have written a proposal outlining the
> requirements and design of MINC 2.0.  I've attached it with this
> email.  Please comment.  I'm hoping that this will be the basis of
> a meeting in January in which we can all get together to hammer out the
> details.
>
> cheers,
>
> John
>