[Loris-dev] Error "not a unique file" inserting segmentation files in LORIS

Cecile Madjar cecile.madjar at mcin.ca
Tue Sep 10 11:59:12 EDT 2019


Hi Alfredo,

Sorry, I just came back from vacation. I don't know if you still have
issues inserting the data.

For reference, the MD5 hash of a MINC file is computed from the following
list of MINC headers rather than from the file contents:

   - patient:full_name
   - study:start_time
   - patient:identification
   - patient:sex
   - patient:age
   - patient:birthdate
   - study_instance_uid
   - series_description
   - processing:intergradient_rejected

The list of headers used to compute the MD5 hash is in the MRI.pm library,
in the compute_hash function
<https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1162>.

When I developed the insertion of processed DWI images into the database,
most of the headers used were identical to those of the native image, which
produced the same duplicate error message you got. To work around that, I
added a new MINC header (under a processing category) to the processed
images I wanted to insert, holding parameters specific to each processed
image, and modified the compute_hash
<https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1162>
function to include that new header when computing the MD5 hash. You would
just need to add an additional if statement like the one at line 1198 of
MRI.pm
<https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1198>.
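
To illustrate why two different files can end up with the same hash, here
is a minimal bash sketch. It assumes the hash is simply an MD5 over the
concatenated header values; header_md5 is a hypothetical helper, not the
actual compute_hash, and every header value below is made up.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for compute_hash: hashes header VALUES, never file bytes.
header_md5() {
    # printf repeats '%s' for each argument, so the values are
    # concatenated with no separator before being hashed.
    local joined
    joined=$(printf '%s' "$@")
    printf '%s' "$joined" | md5sum | cut -d' ' -f1
}

# Made-up values for headers that two different processed files share.
shared=("IPMSA_123456" "20190819 101153" "123456" "1.2.3.4" "t1c")

# Two distinct files with identical headers: the hashes collide.
hash_a=$(header_md5 "${shared[@]}")
hash_b=$(header_md5 "${shared[@]}")

# Same headers plus one file-specific value (a hypothetical processing
# header): the hash now differs.
hash_c=$(header_md5 "${shared[@]}" "ConsensusGd:gvf:w024")

if [ "$hash_a" = "$hash_b" ]; then echo "identical headers: same hash"; fi
if [ "$hash_a" != "$hash_c" ]; then echo "extra header: different hash"; fi
```

The point is only that the file contents never enter the hash, so a header
that differs between processed files has to be part of the input.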

Let me know if you have questions or if something is not clear.

Best,

Cécile

On Fri, Aug 30, 2019 at 3:35 PM Christine Rogers, Ms. <
christine.rogers at mcgill.ca> wrote:

> Hi Alfredo,
>
> Great, this is helpful to know. If I understand you - the filename is the
> same for all the packages of processed data you are trying to upload, for
> various subject-visits.
> e.g. you may have organized them in subdirectories like:
> $subject/$visit/processed_data.gz
>
>> One solution could be to make a copy with a different name before
>> uploading.
>
>
> Instead of making a copy (or renaming), try a unique soft link for each
> package. I think that would work (though I can't confirm right now).
> e.g. Use a simple bash command to execute for each package:
>   ln -s $subject/$visit/processed_data.gz ./${subject}_${visit}_processed_data.gz
> (then tell the pipeline to load each ${subject}_${visit}_processed_data.gz)
>
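
A minimal bash sketch of the soft-link idea above, assuming the
$subject/$visit/processed_data.gz layout from the example. The function name
link_packages and both directory arguments are hypothetical, and whether
uniquely named links actually avoid the error is not confirmed in this
thread.

```bash
#!/usr/bin/env bash
# Create one uniquely named symlink per <subject>/<visit>/processed_data.gz.
link_packages() {
    local src_root=$1 link_dir=$2
    local pkg pkg_dir visit subject
    mkdir -p "$link_dir"
    for pkg in "$src_root"/*/*/processed_data.gz; do
        [ -e "$pkg" ] || continue    # the glob matched nothing
        # Resolve the package directory to an absolute path so the link
        # stays valid wherever link_dir lives.
        pkg_dir=$(cd "$(dirname "$pkg")" && pwd)
        visit=$(basename "$pkg_dir")
        subject=$(basename "$(dirname "$pkg_dir")")
        # Braces are required: "$subject_" would be parsed as a variable
        # named "subject_" and expand to an empty string.
        ln -sf "$pkg_dir/processed_data.gz" \
               "$link_dir/${subject}_${visit}_processed_data.gz"
    done
}

# Example call (paths are placeholders):
#   link_packages /path/to/packages /path/to/links
```

Each link keeps the original file in place, so the existing naming
convention is untouched.
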
> Again, this no-duplicate check exists to help ensure duplicate data isn't
> inserted, which is a protection we try to preserve as much as possible.
> Since all the code is customizable, this check could be manually bypassed,
> but ideally this workaround will help keep the error-proofing intact.
>
> Let us know how this goes -
> We'll be offline for Labour day weekend but back in the office next
> Tuesday.
> Best,
> Christine
>
>
> On Fri, Aug 30, 2019 at 12:53 PM Morales Pinzon, Alfredo <
> AMORALESPINZON at bwh.harvard.edu> wrote:
>
>> Hi Christine,
>>
>> Thank you for your answer.
>>
>> Here are the answers to your questions:
>>
>> Can you use distinct filenames ?
>> We could but we already have all the images, thousands, with a
>> pre-defined convention. One solution could be to make a copy with a
>> different name before uploading.
>>
>> Are you trying to load additional data for a participant session
>> (i.e. same IDs and visit label) ?
>> Yes, we are uploading the result of a couple of pipelines for each
>> participant for each label.
>>
>> Let me know if you can find a workaround. In the meantime, I will check
>> with Pisti whether we can make a copy of the files with a different name
>> before uploading.
>>
>> Best,
>> Alfredo.
>>
>> ------------------------------
>> *From:* Christine Rogers, Ms. <christine.rogers at mcgill.ca>
>> *Sent:* Friday, August 30, 2019 11:17 AM
>> *To:* Morales Pinzon, Alfredo <AMORALESPINZON at BWH.HARVARD.EDU>
>> *Cc:* loris-dev at bic.mni.mcgill.ca <loris-dev at bic.mni.mcgill.ca>; Cecile
>> Madjar <cecile.madjar at mcin.ca>; Sridar Narayanan, Dr. <
>> sridar.narayanan at mcgill.ca>; Rozie Arnaoutelis, Ms. <
>> rozie.arnaoutelis at mcgill.ca>; Douglas Arnold, Dr. <
>> douglas.arnold at mcgill.ca>; Guttmann, Charles,M.D. <
>> guttmann at bwh.harvard.edu>
>> *Subject:* Re: [Loris-dev] Error "not a unique file" inserting
>> segmentation files in LORIS
>>
>> Hi Alfredo,
>>
>> Could you please help me insert those files, which are different in
>> size and md5 from previously uploaded files?
>>
>> The only similarity between the previously uploaded files and the ones
>> that could not be uploaded is the filename.
>>
>>
>> To provide a quick answer (since most of our imaging team is on vacation
>> this week) :
>>
>> Yes, the MD5 hash seems to require a unique filename (* below).
>> Can you use distinct filenames? i.e. Are you trying to load
>> additional data for a participant session (i.e. same IDs and visit label)?
>> Or are you trying to load more than one participant/session at a time?
>>
>> (*) This line in the actual MRIProcessingUtility library
>> <https://github.com/aces/Loris-MRI/blob/master/uploadNeuroDB/NeuroDB/MRIProcessingUtility.pm#L617>:
>> (around Line 617)
>>
>> $md5hash = &NeuroDB::MRI::compute_hash(\$file);
>>
>> and
>>
>> my $unique = &NeuroDB::MRI::is_unique_hash(\$file);
>>
>>
>>
>> I'll check with other imaging devs to see if we have a workaround while
>> our senior devs are away -- I think there must be some solution...
>> Meanwhile, the MD5hash for imaging files is documented here (per this
>> script documentation
>> <https://github.com/aces/Loris-MRI/blob/21.0-dev/docs/scripts_md/MRIProcessingUtility.md#computemd5hashfile-upload_id>)
>> :
>>
>> computeMd5Hash($file, $upload_id)
>> Computes the MD5 hash of a file and makes sure it is unique.
>> INPUTS:
>>
>>    - $file : file to use to compute the MD5 hash
>>
>>
>>    - $upload_id: upload ID of the study
>>
>> RETURNS: 1 if the file is unique, 0 otherwise
>>
>>
>>
>> Best,
>> Christine
>>
>>
>>
>> On Thu, Aug 29, 2019 at 9:46 PM Morales Pinzon, Alfredo <
>> AMORALESPINZON at bwh.harvard.edu> wrote:
>>
>> Dear DevLoris Team,
>>
>> I started uploading processed files (segmentations and
>> transformations) using the script "register_processed_data.pl", but some
>> files are not being uploaded. The log in
>> "data/logs/registerProcessed" shows the following for one of the files
>> that could not be inserted:
>>
>>
>> -------------------------------------------------------------------------------
>> ==> Successfully connected to database
>> Log file, 2019-08-29_19:08:17
>>
>>
>> ==>Mapped DICOM parameters
>>  -> using user-defined filterParameters for
>> /xxx/w024/gvf_ISPC-stx152lsq6.mnc.gz
>>
>> ==> Verifying acquisition center
>>  - Center Name  : UNKN
>>  - CenterID     : 0
>>  -> Set ScannerID to 0.
>>
>> ==> Data found for candidate   : 123456 - Visit: w024
>>  -> Set SessionID to 28269.
>>  -> Set SourceFileID to 49598.
>>  -> Set AcquisitionProtocolID to 1013.
>>  -> Set CoordinateSpace to stx152lsq6.
>>  -> Set SourcePipeline to ConsensusGd.
>>  -> Set PipelineDate to 2019-08-29.
>>  -> Set OutputType to gvf.
>>  -> Set md5hash to *b877648ed0ef9a7458ad4931f4dbfd11*.
>>
>> ==> NeuroDB::File=HASH(0x136baa8) is not a unique file and will not be
>> added to database.
>>
>> -------------------------------------------------------------------------------
>>
>> I checked the log of a previously uploaded file, which is different
>> from the one above, and the same "md5hash" was calculated for it. See the
>> following log:
>>
>>
>> -------------------------------------------------------------------------------
>>
>> ==> Successfully connected to database
>> Log file, 2019-08-19_10:11:53
>>
>>
>> ==>Mapped DICOM parameters
>>  -> using user-defined filterParameters for
>>
>> ==> Verifying acquisition center
>>  - Center Name  : UNKN
>>  - CenterID     : 0
>>  -> Set ScannerID to 0.
>>
>> ==> Data found for candidate   : 123456 - Visit: baseline
>>  -> Set SessionID to 28268.
>>  -> Set SourceFileID to 49593.
>>  -> Set AcquisitionProtocolID to 1013.
>>  -> Set CoordinateSpace to stx152lsq6.
>>  -> Set SourcePipeline to T2Vol.
>>  -> Set PipelineDate to 2019-08-19.
>>  -> Set OutputType to gvf.
>>  -> Set md5hash to *b877648ed0ef9a7458ad4931f4dbfd11*.
>> File /xxx/baseline/gvf_ISPC-stx152lsq6.mnc.gz
>>  moved to:
>>
>>  /yyy/data/assembly/123456/baseline/mri/processed/T2Vol/IPMSA_123456_baseline_t1c_001_gvf_001.mnc.gz
>>
>> ==> FAILED TO INSERT INTERMEDIARY FILES FOR 112945!
>>
>> Making JIV
>>
>>  ==> Registered
>> /data_/ipmsa/loris_data/IPMSA/data/assembly/307024/baseline/mri/processed/T2Vol/IPMSA_307024_baseline_t1c_001_gvf_001.mnc.gz
>> in database, given FileID: 112945
>>
>> -------------------------------------------------------------------------------
>>
>> Here are the corresponding md5 for each file calculated using the command
>> md5sum:
>> /xxx/w024/gvf_ISPC-stx152lsq6.mnc.gz   7306010346c4614d8e338595b386ca5c
>> /xxx/baseline/gvf_ISPC-stx152lsq6.mnc.gz 22107253f8b87c9918ac7f82fdc22c36
>>
>> Could you please help me insert those files, which are different in
>> size and md5 from previously uploaded files? The only similarity between
>> the previously uploaded files and the ones that could not be uploaded is
>> the filename.
>>
>> Let me know if you need more information.
>>
>> Regards,
>> Alfredo.
>>
>> _______________________________________________
>> Loris-dev mailing list
>> Loris-dev at bic.mni.mcgill.ca
>> https://mailman.bic.mni.mcgill.ca/mailman/listinfo/loris-dev
>>
>>
>>
>> --
>>
>> christine.rogers at mcgill.ca
>> McGill Centre for Integrative Neuroscience | MCIN.ca
>> Montreal Neurological Institute
>> McGill University | Montreal | Canada
>>
>