[MINC-development] Draft proposal for MINC 2.0
David Gobbi
minc-development@bic.mni.mcgill.ca
Thu, 2 Jan 2003 10:18:10 -0500 (EST)
It looks like a good proposal overall, but I have a few comments,
particularly about compression and labels:
Compression (page 7):
"uncompressing the whole dataset into a temporary disk space"
This doesn't sound reasonable to me, since disk access is so slow
(in fact, disk speed has hardly changed over the past 5 years, while the
amount of core memory in most PCs has climbed from 32MB to 256MB
and CPU speeds have increased from 300 MHz to 2 GHz). There is
a lot of CPU power available to do the decompression and a fair bit
of memory for caching the uncompressed data.
Simple math: a 2GB data set, written out at a typical disk write
speed of around 10MB/s, takes 200s (a bit over 3 min) to write and then
about another 1.5 min to read back, for a total of about 5 min before
the user can view the data.
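
As a rough check of the arithmetic above, here is a minimal sketch. The
10MB/s write speed comes from the text; the read speed is not stated, so
the value used here is an assumption chosen to match the quoted read time
of about 1.5 min.

```python
# Back-of-the-envelope timing for uncompressing a whole data set to
# temporary disk space before the user can view it.
DATASET_MB = 2000.0       # 2GB data set
WRITE_MB_PER_S = 10.0     # typical disk write speed (from the text)
READ_MB_PER_S = 22.0      # ASSUMED read speed, not stated in the text

write_s = DATASET_MB / WRITE_MB_PER_S   # time to write the uncompressed copy
read_s = DATASET_MB / READ_MB_PER_S     # time to read it back
total_min = (write_s + read_s) / 60.0

print(f"write: {write_s:.0f} s, read: {read_s:.0f} s, total: {total_min:.1f} min")
```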
A caching scheme can be used instead, one that keeps a chunk of data in
memory and uncompresses data from the file on-the-fly into the cache as
necessary. There will have to be a way of telling the volume_io library
what the maximum cache size can be (and ideally, the volume_io library
should be able to figure out for itself how much core memory is
available). I don't know enough about volume_io to say how well its
current caching scheme fits into this.
And the tradeoff between compression and random-access speed is not
so bad given a fast CPU and caching of the uncompressed data.
Saying that good compression is incompatible with random access is
going a bit far: for example, movie files are compressed and have key
frames to make random access possible. Likewise, JPEG images are
broken into small blocks (8x8 pixels, grouped into coding units of up
to 16x16 pixels when the chroma is subsampled).
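
The idea of combining compression with random access can be sketched as
follows. This is only an illustration under stated assumptions, not how
volume_io works: the class name ChunkedVolume, the chunk size, and the
cache limit are all hypothetical, and zlib stands in for whatever
compressor MINC 2.0 would actually use. Each chunk is compressed
independently (much like a key frame), and a small least-recently-used
cache holds the uncompressed chunks.

```python
import zlib
from collections import OrderedDict


class ChunkedVolume:
    """Hypothetical reader: chunks are compressed independently, so any
    chunk can be uncompressed without touching the others."""

    def __init__(self, data: bytes, chunk_size: int, max_cached_chunks: int):
        self.chunk_size = chunk_size
        self.length = len(data)
        self.max_cached_chunks = max_cached_chunks
        # Stand-in for the file on disk: a list of compressed chunks.
        self.compressed = [
            zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)
        ]
        self.cache = OrderedDict()   # chunk index -> uncompressed bytes
        self.decompress_count = 0    # how often we actually had to decompress

    def _chunk(self, index: int) -> bytes:
        if index in self.cache:
            self.cache.move_to_end(index)     # mark as most recently used
            return self.cache[index]
        chunk = zlib.decompress(self.compressed[index])
        self.decompress_count += 1
        self.cache[index] = chunk
        if len(self.cache) > self.max_cached_chunks:
            self.cache.popitem(last=False)    # evict least recently used
        return chunk

    def read(self, offset: int, size: int) -> bytes:
        """Random-access read of up to `size` bytes starting at `offset`."""
        end = min(offset + size, self.length)
        out = bytearray()
        while offset < end:
            index, start = divmod(offset, self.chunk_size)
            take = min(self.chunk_size - start, end - offset)
            out += self._chunk(index)[start:start + take]
            offset += take
        return bytes(out)


data = bytes(range(256)) * 64                 # 16 KiB of sample data
vol = ChunkedVolume(data, chunk_size=1024, max_cached_chunks=4)
assert vol.read(5000, 100) == data[5000:5100]  # touches a single chunk
```

Only the chunks that a read touches are uncompressed, and the cache limit
bounds the memory used, which is the knob the library would need to expose.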
Label format (page 7):
Specific voxel values cannot be used as labels, since a voxel must
contain the data as well as the label. Maybe you meant that some of
the bits in the voxel would be used to store the label, e.g. the file
could contain 16-bit voxels which have 12 bits for data and 4 bits for
labels. But 4 bits allows only 15 distinct labels (16 values, presumably
with one reserved for "no label"), which isn't very many, and that
assumes those 4 bits are even available.
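
To make the bit-splitting idea concrete, here is a minimal sketch of
packing 12 bits of data and 4 bits of label into one 16-bit voxel. The
layout (label in the high bits) and the function names are assumptions
for illustration only; the proposal does not specify them.

```python
DATA_BITS = 12
LABEL_BITS = 4
DATA_MASK = (1 << DATA_BITS) - 1     # 0x0FFF: largest storable data value
LABEL_MASK = (1 << LABEL_BITS) - 1   # 0x000F: largest storable label value


def pack_voxel(data: int, label: int) -> int:
    """Combine a 12-bit data value and a 4-bit label into a 16-bit voxel."""
    if not 0 <= data <= DATA_MASK:
        raise ValueError("data does not fit in 12 bits")
    if not 0 <= label <= LABEL_MASK:
        raise ValueError("label does not fit in 4 bits")
    return (label << DATA_BITS) | data


def unpack_voxel(voxel: int) -> tuple:
    """Split a 16-bit voxel back into (data, label)."""
    return voxel & DATA_MASK, (voxel >> DATA_BITS) & LABEL_MASK


voxel = pack_voxel(1234, 5)
assert unpack_voxel(voxel) == (1234, 5)
```

Any image that needs more than 12 bits of intensity leaves no room for the
label at all, which is the limitation described above.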
I think it's best if the labels are stored in a separate variable
within the NetCDF file. Some reasons for this are:
1) it is not guaranteed that there will be any extra voxel bits
available for labels, on top of those bits required for image data
2) label data will compress extremely well if kept separate from the
data (because labels are generally large blobs of the same color)
3) if labels are stored in a separate variable, that makes it very easy
for the MINC tools that don't care about labels to ignore the labels
and simply copy them from the input file to the output file
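
Points 2 and 3 can be illustrated with a small sketch. Everything here is
hypothetical: the 64x64 slice with a single square blob is made-up data,
zlib stands in for the actual compressor, and a plain dictionary stands in
for the variables of a NetCDF file.

```python
import zlib

WIDTH = HEIGHT = 64


def make_label_slice() -> bytes:
    """Made-up 64x64 label slice: background 0 with one square blob of label 3."""
    rows = []
    for y in range(HEIGHT):
        if 16 <= y < 48:
            rows.append(bytes([0] * 16 + [3] * 32 + [0] * 16))
        else:
            rows.append(bytes(WIDTH))
    return b"".join(rows)


labels = make_label_slice()
compressed = zlib.compress(labels)
print(len(labels), "bytes raw ->", len(compressed), "bytes compressed")


def process_image_only(variables: dict, image_filter) -> dict:
    """A tool that ignores labels: it filters the image and carries every
    other variable (labels included) over to the output unchanged."""
    out = dict(variables)
    out["image"] = image_filter(variables["image"])
    return out
```

Because the label variable is separate, the tool never has to understand
its contents in order to preserve it.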
Slice scaling (page 8):
It will be necessary to at least read per-slice scaled MINC files in
order to maintain backwards compatibility with existing MINC files. But
I think that all files that are written by the new MINC tools should use
just one scale factor for the entire data set. If per-slice scaling is
applied to the data before it is compressed, the compression will
probably suffer as a result.
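
One possible mechanism, offered here as an assumption rather than a
measured result: with per-slice scaling, the same real intensity is stored
as a different integer in each slice, so similar slices no longer look
similar to the compressor. The sketch below uses made-up values and an
assumed unsigned 16-bit storage type.

```python
MAX_STORED = 65535   # ASSUMED storage type: unsigned 16-bit voxels


def quantize(values, lo, hi):
    """Map real values in [lo, hi] onto integers in [0, MAX_STORED]."""
    scale = (hi - lo) / MAX_STORED
    return [round((v - lo) / scale) for v in values]


# Two made-up slices that share the real value 25.0 but have different ranges.
slices = [[0.0, 25.0, 100.0], [0.0, 25.0, 50.0]]

# Per-slice scaling: each slice uses its own min and max.
per_slice = [quantize(s, min(s), max(s)) for s in slices]

# One scale factor for the entire data set.
global_lo = min(min(s) for s in slices)
global_hi = max(max(s) for s in slices)
whole_set = [quantize(s, global_lo, global_hi) for s in slices]

print("per-slice:", per_slice)
print("global:   ", whole_set)
```

With the global scale, the value 25.0 is stored as the same integer in both
slices; with per-slice scaling it is not.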
I hope these comments are useful, and that I'm not stepping on anyone's
toes.
- David
--
David Gobbi, MSc dgobbi@imaging.robarts.ca
Advanced Imaging Research Group
Robarts Research Institute, University of Western Ontario
On Mon, 23 Dec 2002, John G. Sled wrote:
>
>
> Hi everyone,
>
> Last time I was in Montreal, I had a chance to meet with Jason and
> Bert and put together an outline for MINC 2.0. Based on that
> discussion, Leila and I have written a proposal outlining the
> requirements and design of MINC 2.0. I've attached it with this
> email. Please comment. I'm hoping that this will be the basis of
> a meeting in January in which we can all get together to hammer out the
> details.
>
> cheers,
>
> John
>