The DAM Forum
Author Topic: ImageVerifier  (Read 12199 times)
peterkrogh
« on: November 01, 2007, 07:56:11 PM »

I've been working with ImageVerifier for the last couple of weeks, and I think it's a real step forward in the preservation of digital images.

Here's what it does:

1. It checks file structure. It can look at files on your drive and see if each one is a valid TIFF, JPEG, DNG, or, to a lesser extent, other kind of image file. It returns a report on each suspect file.

2. More importantly, it makes a "hash" of each file. This is a numeric code that is generated by running the file's numbers through an equation. The resulting number can be used to check the file at a later date to see if anything has changed in the file - down to a single bit. If the file checks out, it's in exactly the same state it was in when the hash was first created.

It can run hash comparisons that tell you if the file has been changed in any way. Reasons for a change include:
a. The user wanted to change the file (I suggest not doing this to backup files)
b. There was a virus
c. The media that the file is stored on, like CD, DVD, or hard drive, is starting to fail.
d. There was some unintended human error that changed the file.
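
To make the hash idea concrete, here is a minimal sketch in Python. It is not ImageVerifier's code: the choice of SHA-256 and the function names `file_hash` and `is_unchanged` are assumptions made for illustration only.

```python
import hashlib


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MB at a time so large image files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_hash):
    """True if the file still produces the hash recorded earlier."""
    return file_hash(path) == recorded_hash
```

You would record `file_hash(path)` once when the backup is made, then call `is_unchanged(path, recorded_hash)` at each later check. A mismatch says that something changed, not which of the causes above was responsible.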

I'd be very interested in hearing what experience others are having in using ImageVerifier as a data integrity validation tool.

Peter
danaltick
« Reply #1 on: November 01, 2007, 08:14:38 PM »

Peter,

Thanks for the info.  I will try to get a little time to test it out over the next week with my archive.  I'll let you know how it goes.

Dan

WindowsXP, ImageIngester Pro, RapidFixer, IVMP 3, ACR4, Photoshop CS4, Controlled Keyword Catalog, Canon EOS50D
Marc Rochkind
« Reply #2 on: November 01, 2007, 10:58:59 PM »

Thanks for taking the time to try IV out, Peter... My focus has been on IIP 2.3 for the last half-year or so, and IV hasn't gotten much visibility. The hash approach is unique, as far as I know. Lloyd Chambers (diglloyd.com) talked about something similar a while back, but as far as I know never implemented it.

What's unusual about IV's approach is that the hash is associated with an ID for the image file that's location independent, so if you move the file from one archive to another or back it up, IV can still find its previously-recorded hash.

--Marc

peterkrogh
« Reply #3 on: November 02, 2007, 04:40:03 AM »

Marc,
Thanks for that info.
I have a few other questions.

Your post seems to indicate that I could use the same hash for both the DVD copy of the files, and the Hard Drive version of the files.  If the hard drive verified multiple buckets in a single run (say, 033-064), how do I ask IV to only do the comparison for bucket Raw_039 when I put the disk Raw_039 in?

I've been pointing IV at entire backup drives and running.  When I do this, I have been making new Jobs for each drive.  Is this necessary?  What is the point of a different "job"?

When my backup drives come on-site for periodic validation, there's a fair amount of processing to do, and it would be nice to distribute the load to multiple computers.  I assume that there's no way that I could run IV with different machines, but collect all the hashes onto a single machine. 

A representative of the Drobo company asked me if there is any error-checking in the storage of the hash.  The question was, "if you're seeing hash errors, is there any way to know for sure that it's the target drive producing the errors, rather than the drive that the hashes are stored on?"  I thought that was a good question.

I'd love a little nagware pop-up that would remind me it's time to run a validation check every x months.

Thanks again for this ground-breaking program.

Peter
Marc Rochkind
« Reply #4 on: November 02, 2007, 07:40:25 AM »

Peter--

First, for those following this thread, there's a detailed explanation of how IV works here: http://basepath.com/ImageIngester/ivinfo.php.

To answer your questions:

A job is a collection of folders to verify along with a few other parameters. If you put in a DVD for bucket 39, say, then you just tell IV to scan that folder. (On the Mac, you have to choose it, as there's no way in IV to just say "scan the DVD I just put in." On Windows, there is, since you can refer to the drive letter.)

IV doesn't distribute across computers, but it will use all the CPUs on a single computer.
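
As a rough illustration of spreading hash work across the CPUs of one machine, the sketch below hashes several files concurrently. It is an assumption-laden example, not IV's design: it uses Python threads (hashlib releases the interpreter lock while hashing large buffers), SHA-256, and the hypothetical names `sha256_of` and `hash_many`.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path):
    """Hash one file in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_many(paths, workers=None):
    """Return {path: hash}, hashing files concurrently.

    workers defaults to the number of CPUs reported by the OS.
    """
    paths = list(paths)
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its hash.
        return dict(zip(paths, pool.map(sha256_of, paths)))
```

On a single slow disk or DVD the drive, not the CPU, is likely to be the limit, so extra workers may not help much there.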

It's true that the keys and hashes could be corrupted, as they are stored in a SQLite database. However, if this occurs, verification will fail, so you'll know it, although in theory you won't be able to tell whether it's the hash or the image that is wrong (or both). But, in practice, it's much less likely that a SQLite database could be corrupted without the SQLite system detecting it than it is for an image to be corrupted. This is because images have large areas of data that can contain anything at all, whereas the SQLite database used by IV is mostly structural information about how the columns and rows in the database relate, with very little data. (There's some more info about this on the web page, cited above.)
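
For anyone who wants to check a SQLite file directly, SQLite has a built-in `PRAGMA integrity_check`. The sketch below runs it from Python. The function name `database_is_intact` and the idea of pointing it at a hash database are assumptions for illustration; the thread does not say where IV keeps its database or whether IV runs this check itself.

```python
import sqlite3


def database_is_intact(db_path):
    """Run SQLite's structural integrity check; True if it reports 'ok'."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a readable SQLite database.
        return False
    finally:
        conn.close()
    # A healthy database yields a single row containing the text 'ok'.
    return rows == [("ok",)]
```

This checks the database structure only. A flipped bit inside a stored hash value would pass this check and instead show up as a failed image verification.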

A reminder popup is a good idea... You might just use iCal for this.

I'm really glad that there's some interest in IV!

By the way, my plan is to keep IV free until at least the end of this year, and then bundle it in with IIP at no additional charge. Also, the next update of the IIP User's Manual, which I'm editing now, will have a section on IV.

--Marc



peterkrogh
« Reply #5 on: November 02, 2007, 09:29:00 AM »

>If you put in a DVD for bucket 39, say, then you just tell IV to scan that folder. (On the Mac, you have to choose it, as there's no way in IV to just say "scan the DVD I just put in." On Windows, there is, since you can refer to the drive letter.)

Should it be able to find the hash for the included files by itself, even if they were first hashed on a different volume?
Peter
Marc Rochkind
« Reply #6 on: November 02, 2007, 10:16:20 AM »

Peter--

Yes, IV has no trouble finding the hash. I thought you were asking about the folder of images to be verified. It's never necessary to tell IV where the hashes are, as there's only one database per user per machine.

Given an image, IV calculates the key (ID), composed of the file name (without the path), size, modification date, and a few other things. It uses that to look up the stored hash.
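
A location-independent lookup of this kind could be sketched as follows. This is only an illustration under assumptions: the key here uses just the file name, size, and modification time in whole seconds (the "few other things" are omitted), the store is a plain dict rather than a database, and `image_key`, `record`, and `lookup` are hypothetical names.

```python
import hashlib
import os


def image_key(path):
    """Build a key that does not depend on the folder or volume."""
    st = os.stat(path)
    # Whole seconds, since some media store timestamps more coarsely.
    return (os.path.basename(path), st.st_size, st.st_mtime_ns // 1_000_000_000)


def record(path, store):
    """Hash the file and file the result under its key."""
    with open(path, "rb") as f:
        store[image_key(path)] = hashlib.sha256(f.read()).hexdigest()


def lookup(path, store):
    """Return the stored hash for this file, or None if it was never recorded."""
    return store.get(image_key(path))
```

Because the path is not part of the key, a copy that keeps the same name, size, and modification date finds the hash recorded for the original.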

--Marc

danaltick
« Reply #7 on: November 02, 2007, 02:20:02 PM »

Marc,

I'm assuming IV takes into account the embedded metadata when performing a hash check.  If so, I think it would be nice if there were an option to turn that off.  I always sync my metadata from iView on my DNG's before creating derivatives in PS.  This would cause the primaries and backups to differ for those DNG's.  If I could tell IV to just check the Raw data, the hashes should match... correct?  Also, if I'm doing a check on just the primary, I could use this to find out if hash differences are a result of the metadata, and if so, go ahead and create a new hash for those images.  I guess the way to implement this would be to either ignore the metadata on hash storage and checks, or generate two hashes for each file, one with the metadata and one without.  If I'm on the wrong track here, let me know.

Also, I just ran IV for the first time on a folder and I got a few aborts on a few of the DNG's.  All the aborts said, "ERROR: Error reading response from pipe --Permission denied."  The files appear to be fine when I edit them in ACR.

Dan
« Last Edit: November 02, 2007, 02:34:00 PM by danaltick »
Marc Rochkind
« Reply #8 on: November 02, 2007, 02:33:56 PM »

Dan--

It's a good idea, but well beyond my resources to implement. It would involve getting into each image format to isolate and skip over the metadata. I suppose I could do it for DNGs, but even that isn't something I could possibly fit in. (My experience extracting thumbnails for IIP is enough to scare me away from doing something even more ambitious in IV.)

Regarding the error: I assume you were on XP? Can you retry it running as administrator? (If you were already running that way, let me know.) If you were on Vista, same thing.

(The error is one where IV had internal problems; it's unrelated to any actual image problem.)

--Marc

danaltick
« Reply #9 on: November 02, 2007, 02:34:58 PM »

Marc,

I am already administrator.  If you would like me to send you a download link to one of the offending DNG's, I can do that later this evening.

Actually, on second thought, if the DNG's metadata changes, its modification date would also change.  This would be the indicator that it is ok to re-check the structure and generate a new hash for these files...correct?  If so, how difficult would it be to automate this in IV?

Dan
« Last Edit: November 02, 2007, 02:46:51 PM by danaltick »
Marc Rochkind
« Reply #10 on: November 02, 2007, 04:25:14 PM »

Dan--

The error isn't caused by any file. I'm assuming IV did not run for you at all... is that the case, or did it verify some files and then quit?

I guess it would be possible to add a feature to IV whereby it would update hashes only for files that have changed. (However, I'm not working on IV now... too much to do on IIP.)
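
Such a feature does not exist in IV as far as this thread says, but the logic Dan describes might look roughly like this. It is a sketch under assumptions: the store is a dict keyed by path holding the modification time and hash, and `check_or_refresh` is a hypothetical name.

```python
import hashlib
import os


def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def check_or_refresh(path, store):
    """Classify a file as 'new', 'updated', 'ok', or 'corrupt'.

    store maps path -> (mtime_ns, hash) and is updated in place.
    """
    mtime = os.stat(path).st_mtime_ns
    current = sha256_of(path)
    if path not in store:
        store[path] = (mtime, current)
        return "new"
    old_mtime, old_hash = store[path]
    if mtime != old_mtime:
        # The file was deliberately rewritten, so record a fresh hash.
        store[path] = (mtime, current)
        return "updated"
    # Same modification date: any difference in content is unexplained.
    return "ok" if current == old_hash else "corrupt"
```

The weak point is that the modification date is trusted. Anything that alters the content and also touches the date is silently accepted as an intentional edit, which is one reason an untouched backup gives a stronger guarantee.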

--Marc

danaltick
« Reply #11 on: November 02, 2007, 04:57:24 PM »

Marc,

It appeared to run through all 38 DNG's in the selected folder.  The verification progress dialog says it processed all 38 files with 30 valid ones.  The dialog then hangs waiting on the subprocesses to finish, but they never finish because I got four error dialogs.  If I click Close on all four of them, I get a popup that says, "Error reading response from pipe (Verification Cancelled)."  If I click Ok on that dialog, it says, "Work path not completed (Verification Cancelled)".  When I click Ok on this dialog, it finishes with the following log:

Distribution:
   Worker 0: 3
   Worker 1: 2
   Worker 2: 9
   Worker 3: 2
   Worker 4: 3
   Worker 5: 10
   Worker 6: 5
   Worker 7: 4
Verification Checks: Structure

Total processed: 38
Total skipped: 0
Total valid: 30 (hashes stored)
Total invalid: 0
Total indeterminate: 0
Total error: 0
Total hash error: 0

ERROR: File count discrepancy

Valid DNGs: 30

Total time: 00:05:19
Time per image: 8.39 seconds

I definitely don't want to break your concentration on IIP, so do please keep working on that.  I'm just recording recommendations and issues so you can look back at these when you have the time.

Dan
« Last Edit: November 02, 2007, 05:00:01 PM by danaltick »
Marc Rochkind
« Reply #12 on: November 02, 2007, 05:15:36 PM »

Dan--

Thanks for all this detail! I will take a look and get back to you. I'm definitely supporting IV fully... just not adding new features right now. Should have made that more clear.

--Marc

peterkrogh
« Reply #13 on: November 03, 2007, 04:20:56 AM »

Dan,
I think the idea behind using IV really has most to do with validating backup files.  Primary versions of files are going to change periodically for many people, so there is just no way to have the program provide the same level of total certainty that it can with backup files.

I'd suggest that the highest value archive structure would look like this:

• The Primary version of the files gets partially validated by running (1) iView or other catalog software to check for completeness, (2) disk utilities to determine the general drive/volume health, and (3) perhaps an ImageVerifier check for file structure integrity. Since Primary versions of the files are going to change (metadata added, for instance), I think it's probably a waste of time to run the hash check.

• The backup files should be put away (on two separate media) and the hash should be run on these. Unless there's a *really good* reason to update the files, I would suggest that they *never* get changed. If that's the case, then you can run an ImageVerifier hash on the files. By running the files through the hash periodically (not sure what the right interval is yet), you can have absolute certainty that the files have integrity.

If the hash test fails, it's time to make new copies immediately, and figure out what's going on.

• Part of what makes this work is the use of Lightroom to readjust all image files after initial archiving.  Since the Lightroom database records all changes to the Parametric Image Edits (PIE), you can protect this work from loss by periodically saving out a copy of the Lightroom lrcat file.

Peter
danaltick
« Reply #14 on: November 03, 2007, 06:02:54 AM »

Peter,

Makes sense, but it does seem that it would be beneficial if IV could automate the structure check and hash creation based on a file's modification date.  If IV could do this, then wouldn't it be just as reliable to run the hash check on the primary as it would on the backups?  IV should be able to do this completely hands-off from the user; possibly through a selectable option.  Of course there could be more to it than this.  This is the first time I've given image verification any real thought.

Dan
« Last Edit: November 03, 2007, 06:12:55 AM by danaltick »