[Loris-dev] Error "not a unique file" inserting segmentation files in LORIS

Morales Pinzon, Alfredo AMORALESPINZON at bwh.harvard.edu
Wed Sep 18 16:06:54 EDT 2019


Hi Cécile,

I added the "filename" (the 'File' field) to the hash calculation so that the files could be inserted. Once I finish inserting the transformations and masks I will remove the line so that it does not change the import behavior for other images. With the modification, the exact same file is still rejected as a duplicate, based on its filepath. This is the modification:

_____________________________________
sub compute_hash {
    ...
    if ($fileType eq 'mnc') {
        ...
        # Temporary addition: also feed the 'File' field (the file path) into the hash
        $ctx->add($file->getFileDatum('File'));
    }
    ...
}
_____________________________________
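
The effect of this change can be sketched outside LORIS. The following is a minimal Python illustration, not the LORIS-MRI Perl code: `hash_with_path` and the shared header value are hypothetical, and the two paths are the ones from the logs later in this thread. It shows that two files with identical header values get different hashes once the path is added, while re-submitting the same path still yields the same hash.

```python
import hashlib

def hash_with_path(header_values, file_path):
    """MD5 over the given header values, followed by the file path (hypothetical helper)."""
    ctx = hashlib.md5()
    for value in header_values:
        ctx.update(value.encode("utf-8"))
    # The extra step: the path itself becomes part of the hashed data.
    ctx.update(file_path.encode("utf-8"))
    return ctx.hexdigest()

# Assumption: both processed files expose the same header values.
shared_headers = ["stx152lsq6"]

h_w024 = hash_with_path(shared_headers, "/xxx/w024/gvf_ISPC-stx152lsq6.mnc.gz")
h_base = hash_with_path(shared_headers, "/xxx/baseline/gvf_ISPC-stx152lsq6.mnc.gz")
h_again = hash_with_path(shared_headers, "/xxx/w024/gvf_ISPC-stx152lsq6.mnc.gz")

print(h_w024 != h_base)   # different paths give different hashes
print(h_w024 == h_again)  # the same path is still detected as a duplicate
```

The trade-off is that identical data stored under a different path would no longer be flagged, which is why the change is meant to be temporary.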

Thank you for the help.

Best,
Alfredo.
________________________________
From: Cecile Madjar <cecile.madjar at mcin.ca>
Sent: Tuesday, September 17, 2019 3:15 PM
To: Morales Pinzon, Alfredo <AMORALESPINZON at BWH.HARVARD.EDU>
Cc: Christine Rogers, Ms. <christine.rogers at mcgill.ca>; loris-dev at bic.mni.mcgill.ca <loris-dev at bic.mni.mcgill.ca>; Rozie Arnaoutelis, Ms. <rozie.arnaoutelis at mcgill.ca>; Sridar Narayanan, Dr. <sridar.narayanan at mcgill.ca>; Douglas Arnold, Dr. <douglas.arnold at mcgill.ca>; Guttmann, Charles,M.D. <guttmann at bwh.harvard.edu>
Subject: Re: [Loris-dev] Error "not a unique file" inserting segmentation files in LORIS


Hi Alfredo,

Your wrapper script would need to create the "processing" entry I was talking about, i.e. add that MINC header to the processed data, before calling register_processed_data. This is a bit like what is done in that function<https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/DTIPrep/DTI/DTI.pm#L694> of the DTIPrep pipeline used with LORIS.

Then you could use a combination of these new processing tags to differentiate the processed data from the native one.

You could always bypass the md5sum check if you prefer, but then you would not be able to detect duplicate files when inserting a new file.

Hope this helps! Let me know how things go,

Cécile

On Tue, Sep 17, 2019 at 8:08 AM Morales Pinzon, Alfredo <AMORALESPINZON at bwh.harvard.edu> wrote:
Hi Cécile,

I tried your solution but it didn't work, as the mnc headers do not contain any information about the patient or "processing". The headers only have information about the spacing and intensity of the image and are nearly identical to one another. I am attaching the headers of two different mnc files.

Is there another way to make the images have different hashes? Or maybe a way to bypass the hash check?

Best,
Alfredo.
________________________________
From: Cecile Madjar <cecile.madjar at mcin.ca>
Sent: Tuesday, September 10, 2019 11:59 AM
To: Christine Rogers, Ms. <christine.rogers at mcgill.ca>
Cc: Morales Pinzon, Alfredo <AMORALESPINZON at BWH.HARVARD.EDU>; loris-dev at bic.mni.mcgill.ca <loris-dev at bic.mni.mcgill.ca>; Rozie Arnaoutelis, Ms. <rozie.arnaoutelis at mcgill.ca>; Sridar Narayanan, Dr. <sridar.narayanan at mcgill.ca>; Douglas Arnold, Dr. <douglas.arnold at mcgill.ca>; Guttmann, Charles,M.D. <guttmann at bwh.harvard.edu>
Subject: Re: [Loris-dev] Error "not a unique file" inserting segmentation files in LORIS


Hi Alfredo,

Sorry, I just came back from vacation. I don't know if you still have issues inserting the data.

Just in case it is useful: the MD5SUM for MINC files is computed from the following list of MINC headers:

  *   patient:full_name
  *   study:start_time
  *   patient:identification
  *   patient:sex
  *   patient:age
  *   patient:birthdate
  *   study_instance_uid
  *   series_description
  *   processing:intergradient_rejected

The code with the list of headers to use to create an MD5sum is in the MRI.pm library (function called compute_hash<https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1162>).

When I had to develop the insertion of processed DWI images into the database, most of the headers used were identical to those of the native image, which resulted in the duplicate error message you got. To get around that, I created a new MINC header (under a processing category) in the processed images that I wanted to insert, with some parameters specific to that processed image, and modified the code in the compute_hash<https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1162> function to include that new header when computing the MD5sum. You would just need to add an additional if statement like the one at line 1198 of MRI.pm<https://github.com/aces/Loris-MRI/blob/442860c60d66dca13b6523ad1c087e1b28628f26/uploadNeuroDB/NeuroDB/MRI.pm#L1198>.
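
To illustrate why a header-based MD5 can collide for files whose contents differ, here is a minimal Python sketch. It is not the LORIS-MRI Perl implementation: `header_hash`, the example header dictionaries, and the `processing:pipeline` field name are all hypothetical, and it assumes that missing headers simply contribute nothing to the hash.

```python
import hashlib

# The header fields listed above as inputs to the MD5 (the order is illustrative).
HASH_FIELDS = [
    "patient:full_name", "study:start_time", "patient:identification",
    "patient:sex", "patient:age", "patient:birthdate",
    "study_instance_uid", "series_description",
    "processing:intergradient_rejected",
]

def header_hash(headers, extra_fields=()):
    """MD5 over selected header values; absent fields are skipped (assumption)."""
    ctx = hashlib.md5()
    for field in list(HASH_FIELDS) + list(extra_fields):
        value = headers.get(field)
        if value is not None:
            ctx.update(str(value).encode("utf-8"))
    return ctx.hexdigest()

# Two processed files whose headers carry none of the listed fields.
file_a = {"xspace:step": "1.0", "processing:pipeline": "T2Vol"}
file_b = {"xspace:step": "1.0", "processing:pipeline": "ConsensusGd"}

# Without a distinguishing field, both hashes are identical.
print(header_hash(file_a) == header_hash(file_b))

# Including a processing-specific header separates them.
extra = ["processing:pipeline"]
print(header_hash(file_a, extra) != header_hash(file_b, extra))
```

Adding a processing header keeps duplicate detection meaningful, because two insertions of the same processed image would still hash identically.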

Let me know if you have questions or if something is not clear.

Best,

Cécile

On Fri, Aug 30, 2019 at 3:35 PM Christine Rogers, Ms. <christine.rogers at mcgill.ca> wrote:
Hi Alfredo,

Great, this is helpful to know. If I understand you - the filename is the same for all the packages of processed data you are trying to upload, for various subject-visits.
e.g. you may have organized them in subdirectories like: $subject/$visit/processed_data.gz

One solution could be to make a copy with a different name before uploading.

Instead of making a copy (or renaming) -- try a unique soft-link for each package.  I should think that would work (though can't confirm right now).
e.g. Use a simple bash command to execute for each package:
>  ln -s  "$subject/$visit/processed_data.gz"  "./${subject}_${visit}_processed_data.gz"
(then tell the pipeline to load each ${subject}_${visit}_processed_data.gz )
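
For thousands of packages, the same idea can be scripted. The sketch below is a hedged Python illustration that assumes the `$subject/$visit/processed_data.gz` layout described above; `link_packages` and the `links` output directory are hypothetical names, and whether the LORIS pipeline accepts symlinks as input is not confirmed in this thread.

```python
import tempfile
from pathlib import Path

def link_packages(root, out_dir, name="processed_data.gz"):
    """Create out_dir/<subject>_<visit>_<name> symlinks for root/<subject>/<visit>/<name>."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for package in sorted(Path(root).glob(f"*/*/{name}")):
        visit = package.parent.name
        subject = package.parent.parent.name
        link = out_dir / f"{subject}_{visit}_{name}"
        # Skip links that already exist so the function can be re-run safely.
        if not (link.is_symlink() or link.exists()):
            link.symlink_to(package.resolve())
        links.append(link)
    return links

# Small demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "packages"
    for visit in ("baseline", "w024"):
        (root / "123456" / visit).mkdir(parents=True)
        (root / "123456" / visit / "processed_data.gz").write_bytes(b"demo")
    demo_names = [link.name for link in link_packages(root, Path(tmp) / "links")]

print(demo_names)
```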

Again, this no-duplicate check exists to help ensure duplicate data isn't inserted, which is a protection we try to preserve as much as possible.
Since all the code is customizable, this check could be manually bypassed, but ideally this workaround will help keep the error-proofing intact.

Let us know how this goes -
We'll be offline for the Labour Day weekend but back in the office next Tuesday.
Best,
Christine


On Fri, Aug 30, 2019 at 12:53 PM Morales Pinzon, Alfredo <AMORALESPINZON at bwh.harvard.edu> wrote:
Hi Christine,

Thank you for your answer.

Here are the answers to your questions:

Can you use distinct filenames?
We could, but we already have all the images (thousands of them) named with a pre-defined convention. One solution could be to make a copy with a different name before uploading.

Are you trying to load additional data for a participant session (i.e. same IDs and visit label)?
Yes, we are uploading the results of a couple of pipelines for each participant and each label.

Let me know if you can find a workaround. In the meantime, I will check with Pisti whether we can make a copy of the files with a different name before uploading.

Best,
Alfredo.

________________________________
From: Christine Rogers, Ms. <christine.rogers at mcgill.ca>
Sent: Friday, August 30, 2019 11:17 AM
To: Morales Pinzon, Alfredo <AMORALESPINZON at BWH.HARVARD.EDU>
Cc: loris-dev at bic.mni.mcgill.ca <loris-dev at bic.mni.mcgill.ca>; Cecile Madjar <cecile.madjar at mcin.ca>; Sridar Narayanan, Dr. <sridar.narayanan at mcgill.ca>; Rozie Arnaoutelis, Ms. <rozie.arnaoutelis at mcgill.ca>; Douglas Arnold, Dr. <douglas.arnold at mcgill.ca>; Guttmann, Charles,M.D. <guttmann at bwh.harvard.edu>
Subject: Re: [Loris-dev] Error "not a unique file" inserting segmentation files in LORIS


Hi Alfredo,

> Could you please help me insert these files, which differ in size and md5 from the previously uploaded files?
>
> The only similarity between the previously uploaded files and the ones that could not be uploaded is the filename.

To provide a quick answer (since most of our imaging team is on vacation this week):

Yes, the MD5hash seems to require a unique filename (* below).
Can you use distinct filenames? I.e., are you trying to load additional data for a participant session (same IDs and visit label)?
Or are you trying to load more than one participant/session at a time?

(*) These lines in the actual MRIProcessingUtility library<https://github.com/aces/Loris-MRI/blob/master/uploadNeuroDB/NeuroDB/MRIProcessingUtility.pm#L617> (around line 617):
$md5hash = &NeuroDB::MRI::compute_hash(\$file);
and
my $unique = &NeuroDB::MRI::is_unique_hash(\$file);


I'll check with other imaging devs to see if we have a workaround while our senior devs are away -- I think there must be some solution...
Meanwhile, the MD5hash for imaging files is documented here (per this script documentation<https://github.com/aces/Loris-MRI/blob/21.0-dev/docs/scripts_md/MRIProcessingUtility.md#computemd5hashfile-upload_id>):

computeMd5Hash($file, $upload_id)
Computes the MD5 hash of a file and makes sure it is unique.
INPUTS:

  *   $file : file to use to compute the MD5 hash

  *   $upload_id: upload ID of the study

RETURNS: 1 if the file is unique, 0 otherwise


Best,
Christine



On Thu, Aug 29, 2019 at 9:46 PM Morales Pinzon, Alfredo <AMORALESPINZON at bwh.harvard.edu> wrote:
Dear DevLoris Team,

I started uploading processed files (segmentations and transformations) using the script "register_processed_data.pl", but some files are not being uploaded. The log in "data/logs/registerProcessed" shows the following for one of the files that could not be inserted:

-------------------------------------------------------------------------------
==> Successfully connected to database
Log file, 2019-08-29_19:08:17


==>Mapped DICOM parameters
 -> using user-defined filterParameters for /xxx/w024/gvf_ISPC-stx152lsq6.mnc.gz

==> Verifying acquisition center
 - Center Name  : UNKN
 - CenterID     : 0
 -> Set ScannerID to 0.

==> Data found for candidate   : 123456 - Visit: w024
 -> Set SessionID to 28269.
 -> Set SourceFileID to 49598.
 -> Set AcquisitionProtocolID to 1013.
 -> Set CoordinateSpace to stx152lsq6.
 -> Set SourcePipeline to ConsensusGd.
 -> Set PipelineDate to 2019-08-29.
 -> Set OutputType to gvf.
 -> Set md5hash to b877648ed0ef9a7458ad4931f4dbfd11.

==> NeuroDB::File=HASH(0x136baa8) is not a unique file and will not be added to database.
-------------------------------------------------------------------------------

I checked the md5hash of a previously uploaded file, which is a different file from the one above, and the same "md5hash" was calculated. See the following log:

-------------------------------------------------------------------------------

==> Successfully connected to database
Log file, 2019-08-19_10:11:53


==>Mapped DICOM parameters
 -> using user-defined filterParameters for

==> Verifying acquisition center
 - Center Name  : UNKN
 - CenterID     : 0
 -> Set ScannerID to 0.

==> Data found for candidate   : 123456 - Visit: baseline
 -> Set SessionID to 28268.
 -> Set SourceFileID to 49593.
 -> Set AcquisitionProtocolID to 1013.
 -> Set CoordinateSpace to stx152lsq6.
 -> Set SourcePipeline to T2Vol.
 -> Set PipelineDate to 2019-08-19.
 -> Set OutputType to gvf.
 -> Set md5hash to b877648ed0ef9a7458ad4931f4dbfd11.
File /xxx/baseline/gvf_ISPC-stx152lsq6.mnc.gz
 moved to:
 /yyy/data/assembly/123456/baseline/mri/processed/T2Vol/IPMSA_123456_baseline_t1c_001_gvf_001.mnc.gz

==> FAILED TO INSERT INTERMEDIARY FILES FOR 112945!

Making JIV

 ==> Registered /data_/ipmsa/loris_data/IPMSA/data/assembly/307024/baseline/mri/processed/T2Vol/IPMSA_307024_baseline_t1c_001_gvf_001.mnc.gz in database, given FileID: 112945
-------------------------------------------------------------------------------

Here are the corresponding md5 values for each file, calculated with the md5sum command:
/xxx/w024/gvf_ISPC-stx152lsq6.mnc.gz   7306010346c4614d8e338595b386ca5c
/xxx/baseline/gvf_ISPC-stx152lsq6.mnc.gz 22107253f8b87c9918ac7f82fdc22c36
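
For comparison, `md5sum` hashes the raw bytes of each file, which is why these two values differ even though the "md5hash" in the logs is the same. A minimal Python equivalent of that content-based checksum is sketched below; `content_md5` is a hypothetical helper and the temporary file only stands in for a real image.

```python
import hashlib
import os
import tempfile

def content_md5(path, chunk_size=1 << 20):
    """MD5 of a file's bytes, read in chunks so large images fit in memory."""
    ctx = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            ctx.update(chunk)
    return ctx.hexdigest()

# Demonstration on a small temporary file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"segmentation bytes")
demo_digest = content_md5(demo_path)
os.remove(demo_path)

print(demo_digest)
```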

Could you please help me insert these files, which differ in size and md5 from the previously uploaded files? The only similarity between the previously uploaded files and the ones that could not be uploaded is the filename.

Let me know if you need more information.

Regards,
Alfredo.



_______________________________________________
Loris-dev mailing list
Loris-dev at bic.mni.mcgill.ca
https://mailman.bic.mni.mcgill.ca/mailman/listinfo/loris-dev


--

christine.rogers at mcgill.ca
McGill Centre for Integrative Neuroscience | MCIN.ca
Montreal Neurological Institute
McGill University | Montreal | Canada