[MINC-users] minccomplete's completeness

Andrew Janke a.janke at gmail.com
Sat May 14 06:49:09 EDT 2016


Hi Alex (and others)

Peter and I talked about checksums in MINC many years back, and I think
I had subsequent discussions with Bert. At the time we were looking at
creating md5 signatures of the data, of the header, and of both jointly,
predominantly for reasons of data integrity. A prototype of this was
implemented at one point, but I don't think it ever made it into the
main distribution. No idea where it is now; possibly in one of Peter
Neelin's directories somewhere.

Part of the discussion then turned to a better way to determine whether
a file was complete, and the best we could come up with is the current
"write a flag when done" approach. I have seen something similar to the
error you are seeing, but I tended to find that it was related to
reading broken MINC1 files with MINC2. Your problem looks different,
though. It seems to be missing data chunks?
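To make the limitation of a "write a flag when done" check concrete, here is a minimal Python sketch. It is not how MINC actually records its completion flag; the trailer marker `FLAG` and the functions `write_with_flag`/`is_complete` are hypothetical. Because the flag is written after the data no matter what later happens to the data bytes, a checker that only looks at the flag cannot see damage inside the data.

```python
import os
import tempfile

# Hypothetical trailer marker; MINC records its flag differently.
FLAG = b"COMPLETE"


def write_with_flag(path: str, payload: bytes) -> None:
    """Write the payload first and the completion flag last."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out before the flag
        f.write(FLAG)


def is_complete(path: str) -> bool:
    """Look only at the trailing flag, as a flag-based checker would."""
    with open(path, "rb") as f:
        return f.read().endswith(FLAG)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "vol.bin")
        write_with_flag(path, bytes(64))
        # Simulate silent damage inside the data region.
        with open(path, "r+b") as f:
            f.seek(10)
            f.write(b"\xff")
        print(is_complete(path))  # True: the flag survived, the data did not
```

This is exactly the situation in the thread: the completeness check passes while the data is unreadable.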

There is no good way that I can think of to determine whether a MINC
file's data is "good" other than to write an MD5 at write time along
with the complete flag. From memory, the plan was:

   image:md5_data   -- md5 of the data part only; minccmp/mincdiff could make use of it.
   image:md5_header -- md5 of just the header.
   image:md5_file   -- md5 of all of it.
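The three proposed attributes can be sketched with Python's `hashlib`. This is a minimal illustration, assuming the header and data have already been serialized to byte strings; how a real MINC file would be split into "header" and "data" bytes is not specified in the thread, and `minc_checksums` is a hypothetical helper.

```python
import hashlib


def minc_checksums(header: bytes, data: bytes) -> dict:
    """Compute the three proposed digests for a serialized header and data blob.

    Attribute names follow the proposal; the header/data split is assumed.
    """
    return {
        "image:md5_data": hashlib.md5(data).hexdigest(),
        "image:md5_header": hashlib.md5(header).hexdigest(),
        # Digest of header followed by data, taken before these attributes
        # are added (otherwise the digest would have to cover itself).
        "image:md5_file": hashlib.md5(header + data).hexdigest(),
    }


if __name__ == "__main__":
    for name, digest in minc_checksums(b"hdr", b"voxels").items():
        print(name, digest)
```

Computing `md5_file` before the checksum attributes are written is one way around the self-reference problem; the original plan may have handled this differently.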

We could then add a -md5 option to minccmp to check that what is in
your file is indeed what was in RAM while it was being written out. It
would make minccmp slower, as it would have to read through the whole
file, but it would achieve your aim. In your case, the call to mincmd5
would then bomb on the broken file, which would achieve your aim and
perhaps a more noble goal as well.
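The verification step behind a `-md5` option amounts to streaming the file through MD5 and comparing against the stored value. A minimal sketch, assuming for simplicity that the whole file is hashed (rather than only the data part) and that the stored digest is passed in as an argument instead of being read from a MINC attribute; `md5_of_file` and `verify_md5` are hypothetical names.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a large volume never has to fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_md5(path: str, stored_digest: str) -> bool:
    """True if the file still matches the digest recorded at write time."""
    return md5_of_file(path) == stored_digest.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "vol.bin")
        with open(path, "wb") as f:
            f.write(b"abc")
        print(verify_md5(path, "900150983cd24fb0d6963f7d28e17f72"))  # True
```

The full read is where the slowdown mentioned above comes from: every byte has to pass through the hash.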

And then, of course, do we go all meta and also check the value of the
md5 after we have written it, to make sure it was written correctly?
And then again in case we read it wrong?
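The "check it after we have written it" idea can be sketched as a write followed by an immediate read-back and digest comparison. This is a hypothetical helper (`write_and_verify`), not an existing MINC function. One caveat worth stating: an immediate read-back is usually served from the operating system's page cache, so it checks the write path rather than guaranteeing what ended up on the physical disk.

```python
import hashlib
import os
import tempfile


def write_and_verify(path: str, payload: bytes) -> str:
    """Write payload, read it back, and compare digests; return the digest."""
    expected = hashlib.md5(payload).hexdigest()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    # This read is usually served from the page cache, so it checks the
    # write path, not necessarily what is stored on the physical disk.
    with open(path, "rb") as f:
        actual = hashlib.md5(f.read()).hexdigest()
    if actual != expected:
        raise IOError(f"checksum mismatch for {path}: {actual} != {expected}")
    return expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_and_verify(os.path.join(d, "vol.bin"), b"abc"))
```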

And should we restrict ourselves to md5, or use something more general
such as image:checksum_data?
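One way to avoid hard-wiring md5 into the attribute name is to store a self-describing value such as `sha256:<hex>` under a generic attribute like the `image:checksum_data` suggested above. The `algorithm:digest` format and the helpers below are assumptions for illustration; `hashlib.new` accepts any algorithm name the local Python build supports.

```python
import hashlib


def make_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return a self-describing value such as 'sha256:<hex digest>'."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def check_checksum(data: bytes, value: str) -> bool:
    """Recompute with whatever algorithm the stored value names."""
    algorithm, _, expected = value.partition(":")
    return hashlib.new(algorithm, data).hexdigest() == expected


if __name__ == "__main__":
    value = make_checksum(b"abc")
    print(value)
    print(check_checksum(b"abc", value))  # True
```

A reader can then verify files written with md5 today and with a stronger hash later, without a new attribute per algorithm.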

I'm all for this extra functionality; it seems a logical extension for
a file format that aims to be as reproducible as possible. There were
also plans to use this as part of checking the provenance of a MINC
file, but in favour of pragmatism we went for :ident, as implemented by
Bert many years ago.


a

PS: Claude, yes, your suggestion is also a good heuristic!

On 14 May 2016 at 04:35, Alex Zijdenbos <zijdenbos at gmail.com> wrote:
> Hi Claude,
>
> That's a useful idea, but this specific test doesn't actually find the
> errors in the volumes I have, so still incomplete :-)
>
> -- A
>
> On Fri, May 13, 2016 at 4:25 PM, Claude LEPAGE <claude at bic.mni.mcgill.ca>
> wrote:
>
>> Alex,
>>
>> Here is what I've been doing to check for completeness of minc2 files
>> (for BigBrain 7404 slices):
>>
>> sub minc_valid {
>>
>>   my $input = shift;
>>
>>   # List every HDF5 object; quote the file name for the shell.
>>   my $ret = `h5ls -r "$input"`;
>>   # \s after each path stops ".../image" from also matching ".../image-min".
>>   return( 0 ) if( !( $ret =~ m{/minc-2\.0/info\s} ) );
>>   return( 0 ) if( !( $ret =~ m{/minc-2\.0/image/0/image\s} ) );
>>   return( 0 ) if( !( $ret =~ m{/minc-2\.0/image/0/image-min\s} ) );
>>   return( 0 ) if( !( $ret =~ m{/minc-2\.0/image/0/image-max\s} ) );
>>
>>   return( 1 );
>> }
>>
>> h5ls is an HDF5 tool.
>>
>> Claude
>>
>> >
>> > Hi all,
>> >
>> > I managed to generate a large number of broken MINC files, possibly/likely
>> > due to a filesystem problem. The processes that created them (e.g.,
>> > mincaverage) did not produce any warnings and completed successfully; in
>> > addition, minccomplete tells me that the files are complete.
>> >
>> > Unfortunately, trying to read these files throws HDF5 and miicv errors
>> > (see below), and they are obviously corrupt.
>> >
>> > I am thinking that it would be useful to complete minccomplete by having
>> > it actually test-read the data, so that it would report on file
>> > integrity? This would make it easy to find these kinds of corruption,
>> > and one could even tack that on at the end of scripts to make sure
>> > outputs are intact. I'm currently using 'mincstats -quiet -min' to
>> > locate them, but it seems the natural place for this test would
>> > actually be minccomplete.
>> >
>> > -- A
>> >
>> > HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0:
>> >   #000: H5Dio.c line 174 in H5Dread(): can't read data
>> >     major: Dataset
>> >     minor: Read failed
>> >   #001: H5Dio.c line 449 in H5D_read(): can't read data
>> >     major: Dataset
>> >     minor: Read failed
>> >   #002: H5Dchunk.c line 1729 in H5D_chunk_read(): unable to read raw data chunk
>> >     major: Low-level I/O
>> >     minor: Read failed
>> >   #003: H5Dchunk.c line 2760 in H5D_chunk_lock(): data pipeline read failed
>> >     major: Data filters
>> >     minor: Filter operation failed
>> >   #004: H5Z.c line 1120 in H5Z_pipeline(): filter returned failure during read
>> >     major: Data filters
>> >     minor: Read failed
>> >   #005: H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed
>> >     major: Data filters
>> >     minor: Unable to initialize object
>> > mincstats (from miicv_get): Can't read dataset /minc-2.0/image/0/image
>> > _______________________________________________
>> > MINC-users at bic.mni.mcgill.ca
>> > http://www.bic.mni.mcgill.ca/mailman/listinfo/minc-users
>> >
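The error log shows the failure happening in HDF5's deflate filter (`inflate() failed`), i.e. a compressed data chunk that can no longer be decompressed. The essence of the proposed test-read can be sketched in Python with `zlib`: try to inflate every chunk and report the ones that fail. Each byte string below stands in for one compressed HDF5 chunk; a real checker would obtain the chunks, or simply read the whole dataset, through the HDF5 library, as mincstats does. `find_bad_chunks` is a hypothetical helper.

```python
import zlib


def find_bad_chunks(chunks):
    """Return indices of deflate (zlib) compressed chunks that fail to inflate.

    Each element stands in for one compressed HDF5 chunk (an assumption;
    real code would go through the HDF5 library to get at the chunks).
    """
    bad = []
    for i, chunk in enumerate(chunks):
        try:
            zlib.decompress(chunk)
        except zlib.error:
            bad.append(i)
    return bad


if __name__ == "__main__":
    good = zlib.compress(bytes(range(256)) * 16)
    damaged = good[: len(good) // 2]  # truncated stream, as after a failed write
    print(find_bad_chunks([good, damaged, good]))  # [1]
```

Unlike a flag or structure check, this touches every data byte, which is why it catches the corruption but also costs a full read of the volume.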
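Claude's Perl check can be ported to Python as below. This sketch matches each required object path exactly against the first column of `h5ls -r` output, so `/minc-2.0/image/0/image` is not satisfied by `/minc-2.0/image/0/image-min`, and it passes the file name as an argument list so no shell quoting is involved. As Alex found, this only verifies that the objects exist; it never reads the compressed data, so it will not catch the chunk corruption shown in the log. Function names are hypothetical; only the required paths come from the thread.

```python
import subprocess
import sys

# Paths that Claude's Perl check looks for in `h5ls -r` output.
REQUIRED = (
    "/minc-2.0/info",
    "/minc-2.0/image/0/image",
    "/minc-2.0/image/0/image-min",
    "/minc-2.0/image/0/image-max",
)


def listed_paths(h5ls_output: str) -> set:
    """Collect the object path (first column) from each line of `h5ls -r` output."""
    return {line.split()[0] for line in h5ls_output.splitlines() if line.strip()}


def minc_valid(path: str) -> bool:
    """True if h5ls lists every required object in the file.

    A missing h5ls binary raises FileNotFoundError rather than being
    reported as a broken file.
    """
    result = subprocess.run(["h5ls", "-r", path], capture_output=True, text=True)
    if result.returncode != 0:
        return False
    return set(REQUIRED) <= listed_paths(result.stdout)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "ok" if minc_valid(name) else "BROKEN")
```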

