[Loris-dev] Error "not a unique file" inserting segmentation files in LORIS

Cecile Madjar cecile.madjar at mcin.ca
Tue Sep 17 15:15:52 EDT 2019


Hi Alfredo,

The "processing" entry I was talking about would need to be created by your
wrapper script, as a new MINC header in the processed data, before it calls
register_processed_data. This is similar to what is done in this function
<https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/DTIPrep/DTI/DTI.pm#L694>
of the DTIPrep pipeline used with LORIS.
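
As a rough illustration of what the wrapper script could do (a sketch only,
written in Python rather than the Perl used in LORIS-MRI): the snippet below
builds a minc_modify_header call that writes string attributes under a
"processing" variable. The attribute names pipeline and output_type are
hypothetical, the values are taken from the log earlier in this thread, and
I am assuming the .mnc.gz has been decompressed before the header is
modified.

```python
import shlex


def header_insert_command(minc_path, attributes):
    """Build a minc_modify_header call that writes string attributes."""
    cmd = ["minc_modify_header"]
    for name, value in sorted(attributes.items()):
        # -sinsert stores a string attribute, given as variable:attribute=value
        cmd += ["-sinsert", f"processing:{name}={value}"]
    cmd.append(minc_path)
    return cmd


# Hypothetical attribute names; values come from the log in this thread.
cmd = header_insert_command(
    "gvf_ISPC-stx152lsq6.mnc",
    {"pipeline": "ConsensusGd", "output_type": "gvf"},
)
print(shlex.join(cmd))
```

The command is only built and printed here; passing the list to
subprocess.run would execute it on a machine where the MINC tools are
installed.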

Then you could use a combination of these new processing tags in
compute_hash to differentiate the processed data from the native data.
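
To illustrate why the extra tags help, here is a small Python sketch. It is
not the Perl code in MRI.pm: the way the values are joined is my own
assumption, and processing:pipeline is a hypothetical tag name. Only the
list of header fields comes from my previous email. When none of the listed
fields are present in the header, every file hashes to the same value;
adding a processing tag to the list separates them.

```python
import hashlib

# Header fields listed for compute_hash earlier in this thread.
HASH_FIELDS = [
    "patient:full_name", "study:start_time", "patient:identification",
    "patient:sex", "patient:age", "patient:birthdate",
    "study_instance_uid", "series_description",
    "processing:intergradient_rejected",
]


def header_hash(headers, extra_fields=()):
    """MD5 over the selected header values; missing fields count as empty."""
    fields = list(HASH_FIELDS) + list(extra_fields)
    # Assumed joining scheme: values separated by a unit-separator character.
    joined = "\x1f".join(headers.get(field, "") for field in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Two processed files whose headers carry none of the listed fields.
baseline = {"processing:pipeline": "T2Vol"}
w024 = {"processing:pipeline": "ConsensusGd"}

collide = header_hash(baseline) == header_hash(w024)
extra = ["processing:pipeline"]
distinct = header_hash(baseline, extra) != header_hash(w024, extra)
print(collide, distinct)
```

Both comparisons come out true: the two files collide with the default
field list and get distinct hashes once the hypothetical tag is included.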

You could always bypass the md5sum check if you prefer, but then you would
no longer be able to detect duplicate files when inserting a new file.

Hope this helps! Let me know how things go,

Cécile

On Tue, Sep 17, 2019 at 8:08 AM Morales Pinzon, Alfredo <
AMORALESPINZON at bwh.harvard.edu> wrote:

> Hi Cécile,
>
> I tried your solution but it didn't work, as the mnc headers do not contain
> any information about the patient or "processing". The headers only have
> information about the spacing and intensity of the image and are nearly
> identical across files. I'm attaching the headers of two different mnc
> files.
>
> Is there another way to make the images have different hashes? Or maybe
> bypass the hash check?
>
> Best,
> Alfredo.
> ------------------------------
> *From:* Cecile Madjar <cecile.madjar at mcin.ca>
> *Sent:* Tuesday, September 10, 2019 11:59 AM
> *To:* Christine Rogers, Ms. <christine.rogers at mcgill.ca>
> *Cc:* Morales Pinzon, Alfredo <AMORALESPINZON at BWH.HARVARD.EDU>;
> loris-dev at bic.mni.mcgill.ca <loris-dev at bic.mni.mcgill.ca>; Rozie
> Arnaoutelis, Ms. <rozie.arnaoutelis at mcgill.ca>; Sridar Narayanan, Dr. <
> sridar.narayanan at mcgill.ca>; Douglas Arnold, Dr. <douglas.arnold at mcgill.ca>;
> Guttmann, Charles,M.D. <guttmann at bwh.harvard.edu>
> *Subject:* Re: [Loris-dev] Error "not a unique file" inserting
> segmentation files in LORIS
>
> Hi Alfredo,
>
> Sorry, I just came back from vacation. I don't know if you still have
> issues inserting the data.
>
> Just in case and to let you know, the MD5SUM is computed on the MINC files
> based on the following list of MINC headers:
>
>    - patient:full_name
>    - study:start_time
>    - patient:identification
>    - patient:sex
>    - patient:age
>    - patient:birthdate
>    - study_instance_uid
>    - series_description
>    - processing:intergradient_rejected
>
> The code with the list of headers to use to create an MD5sum is in the
> MRI.pm library (function called compute_hash
> <https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1162>
> ).
>
> When I had to develop the insertion of processed DWI images into the
> database, most of the headers used were identical to those of the native
> image, which resulted in the duplicate error message you got. To get around
> that, I created a new MINC header (under a processing category) in the
> processed images that I wanted to insert, with some parameters specific to
> that processed image, and modified the code in the compute_hash
> <https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1162>
> function to include that new header when computing the MD5sum. You would
> just need to add an additional if statement like the one on line 1198 of
> MRI.pm
> <https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1198>
> .
>
> Let me know if you have questions or if something is not clear.
>
> Best,
>
> Cécile
>
> On Fri, Aug 30, 2019 at 3:35 PM Christine Rogers, Ms. <
> christine.rogers at mcgill.ca> wrote:
>
> Hi Alfredo,
>
> Great, this is helpful to know. If I understand you - the filename is the
> same for all the packages of processed data you are trying to upload, for
> various subject-visits.
> e.g. you may have organized them in subdirectories like:
> $subject/$visit/processed_data.gz
>
> > One solution could be to make a copy with a different name before
> > uploading.
>
>
> Instead of making a copy (or renaming) -- try a unique soft-link for each
> package.  I should think that would work (though can't confirm right now).
> e.g. Use a simple bash command to execute for each package:
>   ln -s "$subject/$visit/processed_data.gz" "./${subject}_${visit}_processed_data.gz"
> (then tell the pipeline to load each ${subject}_${visit}_processed_data.gz;
> the braces keep bash from reading $subject_ and $visit_ as variable names)
>
> Again, this no-duplicate check exists to help ensure duplicate data isn't
> inserted, which is a protection we try to preserve as much as possible.
> Since all the code is customizable, this check could be manually bypassed,
> but ideally this workaround will help keep the error-proofing intact.
>
> Let us know how this goes -
> We'll be offline for Labour day weekend but back in the office next
> Tuesday.
> Best,
> Christine
>
>
> On Fri, Aug 30, 2019 at 12:53 PM Morales Pinzon, Alfredo <
> AMORALESPINZON at bwh.harvard.edu> wrote:
>
> Hi Christine,
>
> Thank you for your answer.
>
> Here are the answers to your questions:
>
> Can you use distinct filenames?
> We could, but we already have all the images, thousands of them, named with
> a pre-defined convention. One solution could be to make a copy with a
> different name before uploading.
>
> Are you trying to load additional data for a participant session (i.e.
> same IDs and visit label)?
> Yes, we are uploading the results of a couple of pipelines for each
> participant and each visit label.
>
> Let me know if you can find a workaround. In the meantime I will check
> with Pisti if we can make a copy of the files with a different name before
> uploading.
>
> Best,
> Alfredo.
>
> ------------------------------
> *From:* Christine Rogers, Ms. <christine.rogers at mcgill.ca>
> *Sent:* Friday, August 30, 2019 11:17 AM
> *To:* Morales Pinzon, Alfredo <AMORALESPINZON at BWH.HARVARD.EDU>
> *Cc:* loris-dev at bic.mni.mcgill.ca <loris-dev at bic.mni.mcgill.ca>; Cecile
> Madjar <cecile.madjar at mcin.ca>; Sridar Narayanan, Dr. <
> sridar.narayanan at mcgill.ca>; Rozie Arnaoutelis, Ms. <
> rozie.arnaoutelis at mcgill.ca>; Douglas Arnold, Dr. <
> douglas.arnold at mcgill.ca>; Guttmann, Charles,M.D. <
> guttmann at bwh.harvard.edu>
> *Subject:* Re: [Loris-dev] Error "not a unique file" inserting
> segmentation files in LORIS
>
> Hi Alfredo,
>
> > Could you please help me insert those files, which are different in size
> > and md5 from previously uploaded files?
> > The only similarity between the previously uploaded files and the ones
> > that could not be uploaded is the filename.
>
>
> To provide a quick answer (since most of our imaging team is on vacation
> this week) :
>
> Yes, the MD5hash seems to require a unique filename (* below).
> Can you use distinct filenames?  i.e. Are you trying to load
> additional data for a participant session (i.e. same IDs and visit label)?
> Or, are you trying to load more than one participant/session at a time?
>
> (*) These lines in the current MRIProcessingUtility library
> <https://github.com/aces/Loris-MRI/blob/master/uploadNeuroDB/NeuroDB/MRIProcessingUtility.pm#L617>
> (around line 617):
>
> $md5hash = &NeuroDB::MRI::compute_hash(\$file);
>
> and
>
> my $unique = &NeuroDB::MRI::is_unique_hash(\$file);
>
>
>
> I'll check with other imaging devs to see if we have a workaround while
> our senior devs are away -- I think there must be some solution...
> Meanwhile, the MD5hash for imaging files is documented here (per this
> script documentation
> <https://github.com/aces/Loris-MRI/blob/21.0-dev/docs/scripts_md/MRIProcessingUtility.md#computemd5hashfile-upload_id>)
> :
>
> computeMd5Hash($file, $upload_id)
> Computes the MD5 hash of a file and makes sure it is unique.
> INPUTS:
>
>    - $file : file to use to compute the MD5 hash
>
>
>    - $upload_id: upload ID of the study
>
> RETURNS: 1 if the file is unique, 0 otherwise
>
>
>
> Best,
> Christine
>
>
>
> On Thu, Aug 29, 2019 at 9:46 PM Morales Pinzon, Alfredo <
> AMORALESPINZON at bwh.harvard.edu> wrote:
>
> Dear DevLoris Team,
>
> I started uploading processed files (segmentations and transformations)
> using the script "register_processed_data.pl", but some files are not being
> uploaded. The log in "data/logs/registerProcessed" shows the following for
> one of the files that could not be inserted:
>
>
> -------------------------------------------------------------------------------
> ==> Successfully connected to database
> Log file, 2019-08-29_19:08:17
>
>
> ==>Mapped DICOM parameters
>  -> using user-defined filterParameters for
> /xxx/w024/gvf_ISPC-stx152lsq6.mnc.gz
>
> ==> Verifying acquisition center
>  - Center Name  : UNKN
>  - CenterID     : 0
>  -> Set ScannerID to 0.
>
> ==> Data found for candidate   : 123456 - Visit: w024
>  -> Set SessionID to 28269.
>  -> Set SourceFileID to 49598.
>  -> Set AcquisitionProtocolID to 1013.
>  -> Set CoordinateSpace to stx152lsq6.
>  -> Set SourcePipeline to ConsensusGd.
>  -> Set PipelineDate to 2019-08-29.
>  -> Set OutputType to gvf.
>  -> Set md5hash to *b877648ed0ef9a7458ad4931f4dbfd11*.
>
> ==> NeuroDB::File=HASH(0x136baa8) is not a unique file and will not be
> added to database.
>
> -------------------------------------------------------------------------------
>
> I checked the md5hash of a previously uploaded file, which is a different
> file from the one above, and the same "md5hash" was calculated for it. See
> the following log:
>
>
> -------------------------------------------------------------------------------
>
> ==> Successfully connected to database
> Log file, 2019-08-19_10:11:53
>
>
> ==>Mapped DICOM parameters
>  -> using user-defined filterParameters for
>
> ==> Verifying acquisition center
>  - Center Name  : UNKN
>  - CenterID     : 0
>  -> Set ScannerID to 0.
>
> ==> Data found for candidate   : 123456 - Visit: baseline
>  -> Set SessionID to 28268.
>  -> Set SourceFileID to 49593.
>  -> Set AcquisitionProtocolID to 1013.
>  -> Set CoordinateSpace to stx152lsq6.
>  -> Set SourcePipeline to T2Vol.
>  -> Set PipelineDate to 2019-08-19.
>  -> Set OutputType to gvf.
>  -> Set md5hash to *b877648ed0ef9a7458ad4931f4dbfd11*.
> File /xxx/baseline/gvf_ISPC-stx152lsq6.mnc.gz
>  moved to:
>
>  /yyy/data/assembly/123456/baseline/mri/processed/T2Vol/IPMSA_123456_baseline_t1c_001_gvf_001.mnc.gz
>
> ==> FAILED TO INSERT INTERMEDIARY FILES FOR 112945!
>
> Making JIV
>
>  ==> Registered
> /data_/ipmsa/loris_data/IPMSA/data/assembly/307024/baseline/mri/processed/T2Vol/IPMSA_307024_baseline_t1c_001_gvf_001.mnc.gz
> in database, given FileID: 112945
>
> -------------------------------------------------------------------------------
>
> Here are the md5 checksums of each file, calculated with the md5sum
> command:
> /xxx/w024/gvf_ISPC-stx152lsq6.mnc.gz   7306010346c4614d8e338595b386ca5c
> /xxx/baseline/gvf_ISPC-stx152lsq6.mnc.gz 22107253f8b87c9918ac7f82fdc22c36
>
> Could you please help me insert those files, which are different in size
> and md5 from previously uploaded files? The only similarity between the
> previously uploaded files and the ones that could not be uploaded is the
> filename.
>
> Let me know if you need more information.
>
> Regards,
> Alfredo.
>
> _______________________________________________
> Loris-dev mailing list
> Loris-dev at bic.mni.mcgill.ca
> https://mailman.bic.mni.mcgill.ca/mailman/listinfo/loris-dev
>
>
>
> --
>
> christine.rogers at mcgill.ca
> McGill Centre for Integrative Neuroscience | MCIN.ca
> Montreal Neurological Institute
> McGill University | Montreal | Canada
>