The DAM Forum
Topic: Welcome to the data validation board
peterkrogh
« on: November 01, 2007, 07:46:12 PM »

I've been pretty deep in this subject for the last couple of weeks, and I thought I'd put a board up.

As part of the process of researching for the new book, and researching for the Library of Congress project I'm working on (as well as checking on the health of my own archive), I've been running some data validation tasks. I thought I'd share what I am finding, particularly because some of the results are a bit alarming.

The critical question that I'm trying to answer is, "what is the state of the archive - both primary and backup files?". This covers issues of completeness, as well as data integrity.

At the moment, there are no comprehensive tools to automatically determine the integrity of the image files. There are some workarounds and kludges, but they are time-consuming to run, and not foolproof. I hope that this exercise will yield some workflow principles that can be turned into automatic tools to let the user know when problems are arising, and how to recover from them.

Here are some of the issues:
1. Is the archive complete? In other words, is everything that's supposed to be there actually there? This one is relatively easy to answer if you use cataloging software. You can have it run through the archive and tell you if anything is missing. (This is easier with iView/Expression than with, say, Lightroom at the moment.)

To confirm this, I run periodic "Find Missing Items" checks on all primary and backup drives with iView MediaPro. I will also make sure that the contents of all buckets are, in fact, included in the proper catalogs. I have found a few errors.

2. Is the drive volume structure sound? Basically, this is asking if the "table of contents" of the drive is accurate and without defect. There are utilities that can help with this. On Mac (which is what my server runs) I can use Disk Warrior to determine the condition of the volume structure.

In general, my results have shown the volume structure to be sound. This can be a bit of false reassurance, as drives that check out fine may still have serious problems.

3. Is the media sound? This one gets a bit trickier, since media can degrade over time. You can run disk utilities to check that the media (the hard drive or the DVD or CD) is still readable. On Mac, I'm using TechTool Pro to do this confirmation.

Again, this does not always flag a problem with the drive. I'm looking to get to the bottom of this one.

4. Are the files exactly as you left them? This is a larger question that actually combines all of the previous issues into one. It's particularly important for backup media, since that's the stuff that you really want to remain stable, complete, and with full integrity.

I've been using ImageVerifier as a tool for this, and it has found a number of unexpected errors recently. In one case, it caught an image that did not copy correctly to the backup drive, and in two other cases, it seems to have flagged drives that are starting to fail.

In fact, I've come to the conclusion that the only files you can verify with absolute certainty are backup files. (And all that you can really verify with absolute certainty is that they have not been altered since you first hashed them - more on that in an ImageVerifier thread.)
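
To make the "hash once, verify later" idea concrete, here is a minimal Python sketch. It is not how ImageVerifier works internally (I have not seen its code); the function names, the choice of SHA-256, and the report layout are all assumptions made for illustration. It records a digest for every file under a folder, and later reports which files are missing, unexpected, or changed, which touches both the completeness question (1) and the integrity question (4).

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Compare the files under root against a previously built manifest."""
    current = build_manifest(root)
    return {
        # In the manifest but no longer on disk
        "missing": sorted(set(manifest) - set(current)),
        # On disk but never recorded in the manifest
        "unexpected": sorted(set(current) - set(manifest)),
        # Present in both, but the contents no longer hash the same
        "changed": sorted(
            name
            for name in set(manifest) & set(current)
            if manifest[name] != current[name]
        ),
    }
```

The manifest is an ordinary dictionary, so it could be saved (as JSON, say) somewhere outside the folder being checked. Note that a clean report only shows the files match what was hashed at the time; it says nothing about whether they were sound before that.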

I'm not sure where this discussion will lead, but here are a few thoughts:
1. I'd like to hear about anybody's data validation tools that seem to be working.
2. I'll post any relevant information I gather here, as I work through the process of data validation.
3. I'd like to hear other people's data validation results here. Try running ImageVerifier (it's free - www.imageingester.com) on your primary and backup drives and see what you find.

Anyway, that's all for now.
Peter
danaltick
« Reply #1 on: November 11, 2007, 05:50:56 PM »

Peter,

I've been using "chkdsk" under Windows. It appears to do a good job. I would expect it to still be available under Vista, since the file system is still NTFS. In case you haven't used it, here is some info on it: http://en.wikipedia.org/wiki/CHKDSK

Dan

WindowsXP, ImageIngester Pro, RapidFixer, IVMP 3, ACR4, Photoshop CS4, Controlled Keyword Catalog, Canon EOS50D
danaltick
« Reply #2 on: November 12, 2007, 06:58:18 PM »

Peter,

I spent quite a bit of time at www.grc.com a few years ago while studying PC security. Steve Gibson has been around for a while and has developed some pretty impressive utilities, including this one: http://www.grc.com/sr/spinrite.htm. I would recommend downloading the two 10-minute video clips and taking a look.

Dan
danaltick
« Reply #3 on: December 17, 2007, 06:45:24 PM »

Peter,

I've come across another utility, which John Harrington mentions in his book, for comparing backup files to primary files after the copy: http://www.zizasoft.com/products/zsCompare/index.shtml. However, based on your conclusion:
Quote
"In fact, I've come to the conclusion that the only files you can verify with absolute certainty are backup files. (And all that you can really verify with absolute certainty is that they have not been altered since you first hashed them"
Do you feel there is any real benefit to a file comparison like this immediately after the backup copy is made? I would assume that if you are copying the files, the O/S or TCP/IP would ensure the integrity of the copy.
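
For anyone who would rather test that assumption than rely on it, here is a minimal Python sketch of a post-copy comparison. It is not how zsCompare works; `compare_trees` and its return format are made up for illustration. It walks the primary folder and compares each file's contents with the corresponding file in the backup.

```python
import filecmp
from pathlib import Path


def compare_trees(primary, backup):
    """Return (missing, different): relative paths of files that are absent
    from the backup, and of files whose contents differ from the primary."""
    primary, backup = Path(primary), Path(backup)
    # Drop results cached by earlier calls so every file is re-read.
    filecmp.clear_cache()
    missing, different = [], []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = backup / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        # shallow=False compares the file contents instead of trusting
        # a matching size and modification time.
        elif not filecmp.cmp(src, dst, shallow=False):
            different.append(rel.as_posix())
    return missing, different
```

One caveat: a comparison run immediately after the copy may be served partly from the operating system's cache rather than read back from the backup media, so it may be more convincing after the drive has been unmounted and remounted.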

Dan
« Last Edit: December 17, 2007, 06:49:42 PM by danaltick »
peterkrogh
« Reply #4 on: December 17, 2007, 11:23:15 PM »

Dan,
I'd say that in general, it's probably overkill to do this.
If there is any doubt about the integrity of the copy process, or if the files are particularly valuable, it might make sense to do.
Ultimately, this should be part of the asset management process, without an additional application.
Peter
Marc Rochkind
« Reply #5 on: December 21, 2007, 12:02:07 AM »

Peter--

I've just had an experience that indicates that errors in copying can go undetected. See my new topic in this section posted today.

--Marc
matthewjheaney
« Reply #6 on: March 28, 2008, 11:30:26 AM »

Quote
"3. Is the Media sound?  This one gets a bit trickier, since media can degrade over time.  You can run disk utilities to check that the media (the Hard Drive or the DVD or CD) is still readable.  On Mac, I'm using TechTool Pro to do this confirmation."

I've just started using the free Microsoft utility fciv.exe to make an XML database of MD5 and SHA-1 keys for the images I burn to a CD or DVD.  It has a verify mode in which you can compare the file (on a DVD, say) to the MD5 key you made when you burned the DVD, allowing you to detect any problems with the optical media.

The Knowledge Base article 841290 has more info about the "File Checksum Integrity Verifier" utility:

http://support.microsoft.com/default.aspx?scid=kb;en-us;841290

I've been loosely following the workflow techniques you describe in your DAM book, and also the workflow techniques enunciated by Kevin Ames.  For example, I burn my raw .NEFs (and .JPGs made in-camera) to DVD, and then make DNGs from the NEFs on the DVD, rather than making DNGs from the raw files on my hard drive.  (I use ImgBurn for burning DVDs, as it also includes a verify mode, so this process for making DNGs provides an additional level of redundancy). 

http://www.iview-multimedia.com/showcase/ames/index.php
http://www.lexar.com/dp/tips_lessons/kames_protect.html

I embed the NEF files in the DNG during the conversion, and make an md5 database for the DNGs, and then burn the DNG files to DVD too.  So even if a DVD becomes corrupt (either the DVD of NEFs, or the DVD of DNGs), I can still get back to the original NEF.

One thing I haven't tried is extracting the NEF from the DNG, and then comparing the md5 key of the original NEF to the extracted NEF, to see if they match.  I'll post again once I do that...

-Matt
matthewjheaney
« Reply #7 on: March 29, 2008, 05:44:00 PM »

Quote
"One thing I haven't tried is extracting the NEF from the DNG, and then comparing the md5 key of the original NEF to the extracted NEF, to see if they match.  I'll post again once I do that..."

fciv.exe is telling me that checksums of the NEFs extracted from the DNGs match the hash keys generated earlier from the original NEFs, so it looks like DNG Converter really is making a binary copy of the original.
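
For readers without fciv.exe, the same kind of check can be sketched in a few lines of Python. This is only an illustration of the comparison, not what fciv.exe does internally; `md5_of`, `compare_by_name`, and the folder layout (originals in one folder, extracted files in another, matched by file name) are assumptions.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_by_name(original_dir, extracted_dir, suffix=".nef"):
    """Return the names of extracted files that have no original of the
    same name, or whose MD5 differs from that original."""
    originals = {
        p.name.lower(): p
        for p in Path(original_dir).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    }
    mismatched = []
    for extracted in sorted(Path(extracted_dir).iterdir()):
        if not extracted.is_file() or extracted.suffix.lower() != suffix:
            continue
        original = originals.get(extracted.name.lower())
        if original is None or md5_of(original) != md5_of(extracted):
            mismatched.append(extracted.name)
    return mismatched
```

An empty list means every extracted file hashed the same as the original with the same name.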

fciv.exe does complain (with a warning) if it can't find files named in the original database.  This happened to me because I originally told fciv.exe to make a database comprising hash keys for both JPGs and NEFs, and the directory I was checking only had NEFs (because they were extracted from DNGs).  I was able to work around that by using XmlList (a Microsoft library for C++ developers) to extract the hash keys for only the NEF files named in the original database, and had fciv.exe use that database instead.

In the future, I'll probably just create a separate database for each type of file.  fciv.exe lets you do that with the "-type" switch.
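
The splitting itself is simple to picture. Here is a minimal Python sketch, independent of fciv.exe and its XML format, that assumes the hash database has already been loaded into a plain dictionary of file name to hash (that in-memory layout and the name `split_by_type` are assumptions for illustration):

```python
from collections import defaultdict
from pathlib import PurePath


def split_by_type(manifest):
    """Group a {file name: hash} mapping into one mapping per extension."""
    groups = defaultdict(dict)
    for name, digest in manifest.items():
        # Lower-case the extension so .NEF and .nef land in the same group
        ext = PurePath(name).suffix.lower() or "(none)"
        groups[ext][name] = digest
    return dict(groups)
```

Verifying a folder of extracted NEFs would then only consult the ".nef" group, so missing JPGs would not trigger warnings.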

Regards,
Matt


matthewjheaney
« Reply #8 on: April 02, 2008, 09:35:24 PM »

Quote
"1. I'd like to hear about anybody's data validation tools that seem to be working."

Well, I did some VBScript hacking tonight and was able to use xMedia custom fields for verification purposes.  What I did was to create a custom field to store the MD5 hash of a file.  If you select a file in the catalog, and then run the script, it calls fciv.exe to generate the MD5 value, captures the output, and then writes it into the catalog.  (For now the script just does a single file, but this was just a proof of concept anyway.)

I then wrote another script to call fciv.exe to generate an MD5 hash, and then display whether the re-generated value matches what's in the catalog.

So that would be one way to verify the integrity of your files.  I checked that it's working by going into Bridge and changing the rating; the verification then fails because the file has been changed.

I mentioned in a previous post that fciv.exe will store the MD5 values for the files in a directory in an XML file.  My next plan is to extract that MD5 value for a file from that XML database, and use that as the catalog value, or to compare to the MD5 value already in the catalog.  I'll post when I get that working.  The scripts are listed below.

Cheers,
Matt


'MJH_Set_MD5_From_File.vbs
Option Explicit

Const kTitle = "MJH: Set MD5 From File (via FCIV.EXE)"

Main

Sub Main()
   Dim app, cat, sel, item, ff, f, re, mm, m
   Dim WshShell, execObj, line, name, StdOut

   Set app = CreateObject("ExpressionMedia.Application")

   If app.Catalogs.count <= 0 Then
      MsgBox "No catalog; please start Expression Media.", vbCritical, kTitle
      Exit Sub
   End If

   Set cat = app.ActiveCatalog
   Set sel = cat.Selection

   If sel.Count <= 0 Then
      MsgBox "No items selected.", vbCritical, kTitle
      Exit Sub
   End If

   If sel.Count > 1 Then
      MsgBox "Only 1 item may be selected", vbCritical, kTitle
      Exit Sub
   End If
   
   Set item = sel.Item(1)  'yes, this is a 1-based index
   Set ff = item.CustomFields

   If ff.Count <= 0 Then
      MsgBox "No custom fields defined", vbOKOnly, kTitle
      Exit Sub
   End If

   '2008/04/02
   'Here is where we should check the name, but xMedia 1.0 SP1 has a
   'bug such that custom field names are changed (it appends a junk character
   'to the name) when the catalog is saved and then re-opened.  A work-around
   'is to use the index of the custom field instead of the name.

   'MD5 custom field
   'TODO when xMedia 2.0 is released 2008 Q2
   'f = ff.Item("MD5")
   'workaround:
   Set f = ff.Item(CInt(1))  'CustomFieldIndexOrName
   
   Set WshShell = CreateObject("WScript.Shell")

   'TODO: the shell for this app is briefly visible while the command
   'executes.  See if there are some vbs hacks to prevent the window from
   'being visible.  (I think WshShell.Run allows you to do that, but we need
   'to use Exec because only that function allows you to capture stdout.)
   Set execObj = WshShell.Exec("fciv -wp """ & item.path & """")

   Set StdOut = execObj.StdOut

   'This is written with a certain amount of paranoia, but that's appropriate
   'here, since we're generating a hash value that will be used to verify
   'the integrity of this file, now and forever.

   line = StdOut.ReadLine()

   If StrComp(line, "//", vbTextCompare) Then
      MsgBox "bad FCIV result (1st line)", vbCritical, kTitle
      Exit Sub
   End If

   line = StdOut.ReadLine()

   If StrComp(line, "// File Checksum Integrity Verifier version 2.05.", vbTextCompare) Then
      MsgBox "bad FCIV result (2nd line)", vbCritical, kTitle
      Exit Sub
   End If

   line = StdOut.ReadLine()

   If StrComp(line, "//", vbTextCompare) Then
      MsgBox "bad FCIV result (3rd line)", vbCritical, kTitle
      Exit Sub
   End If

   line = StdOut.ReadLine()

   If not StdOut.AtEndOfStream Then
      MsgBox "bad FCIV result (end-of-stream)", vbCritical, kTitle
      Exit Sub
   End If

   '2008/04/02
   'xMedia 1.0 SP1 also has a bug (I think caused by QuickTime) such
   'that Item.Name returns the DOS 8.3 name, so as a work-around we
   'use the full path and strip off the leading path-part
   name = NamePart(item.path) 

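   Set re = New RegExp
   'Group 1 is the hash; group 2 is the file name, which may contain spaces
   re.Pattern = "^(\S+) (.+)$"  'parens turn on SubMatches

   Set mm = re.Execute(line)

   'Guard against an unexpected output line; mm(0) would raise an error
   If mm.Count = 0 Then
      MsgBox "bad FCIV result (4th line - no match)", vbCritical, kTitle
      Exit Sub
   End If

   Set m = mm(0)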

   'TODO: can we say mm.Item(1) instead?
   If StrComp(LCase(m.SubMatches(1)), LCase(name), vbTextCompare) Then
      MsgBox "bad FCIV result (4th line - no submatch)", vbCritical, kTitle
      Exit Sub
   End If

   f.Value = m.SubMatches(0)
End Sub


function NamePart(Path)
    Dim intFileNamePos

    intFileNamePos = InStrRev(Path, "\", -1, vbTextCompare)
    NamePart = mid(Path, intFileNamePos + 1)
End function



'MJH_Check_MD5.vbs
Option Explicit

Const kTitle = "MJH: Check MD5 Of File (via FCIV.EXE)"

Main

Sub Main()
   Dim app, cat, sel, item, ff, f, re, mm, m
   Dim WshShell, execObj, line, name, StdOut

   Set app = CreateObject("ExpressionMedia.Application")

   If app.Catalogs.count <= 0 Then
      MsgBox "No catalog; please start Expression Media.", vbCritical, kTitle
      Exit Sub
   End If

   Set cat = app.ActiveCatalog
   Set sel = cat.Selection

   If sel.Count <= 0 Then
      MsgBox "No items selected.", vbCritical, kTitle
      Exit Sub
   End If

   If sel.Count > 1 Then
      MsgBox "Only 1 item may be selected", vbCritical, kTitle
      Exit Sub
   End If
   
   Set item = sel.Item(1)  'yes, this is a 1-based index
   Set ff = item.CustomFields

   If ff.Count <= 0 Then
      MsgBox "No custom fields defined", vbOKOnly, kTitle
      Exit Sub
   End If

   '2008/04/02
   'Here is where we should check the name, but xMedia 1.0 SP1 has a
   'bug such that custom field names are changed (it appends a junk character
   'to the name) when the catalog is saved and then re-opened.  A work-around
   'is to use the index of the custom field instead of the name.

   'MD5 custom field
   'TODO when xMedia 2.0 is released 2008 Q2
   'f = ff.Item("MD5")
   'workaround:
   Set f = ff.Item(CInt(1))  'MD5
   
   Set WshShell = CreateObject("WScript.Shell")
   Set execObj = WshShell.Exec("fciv -wp """ & item.path & """")
   Set StdOut = execObj.StdOut

   line = StdOut.ReadLine()

   If StrComp(line, "//", vbTextCompare) Then
      MsgBox "bad FCIV result (1st line)", vbCritical, kTitle
      Exit Sub
   End If

   line = StdOut.ReadLine()

   If StrComp(line, "// File Checksum Integrity Verifier version 2.05.", vbTextCompare) Then
      MsgBox "bad FCIV result (2nd line)", vbCritical, kTitle
      Exit Sub
   End If

   line = StdOut.ReadLine()

   If StrComp(line, "//", vbTextCompare) Then
      MsgBox "bad FCIV result (3rd line)", vbCritical, kTitle
      Exit Sub
   End If

   line = StdOut.ReadLine()

   If not StdOut.AtEndOfStream Then
      MsgBox "bad FCIV result (end-of-stream)", vbCritical, kTitle
      Exit Sub
   End If

   '2008/04/02
   'xMedia 1.0 SP1 also has a bug (I think caused by QuickTime) such
   'that Item.Name returns the DOS 8.3 name, so as a work-around we
   'use the full path and strip off the leading path-part
   name = NamePart(item.path) 

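   Set re = New RegExp
   'Group 1 is the hash; group 2 is the file name, which may contain spaces
   re.Pattern = "^(\S+) (.+)$"  'parens turn on SubMatches
   're.IgnoreCase = True
   're.Global = False

   Set mm = re.Execute(line)

   'Guard against an unexpected output line; mm(0) would raise an error
   If mm.Count = 0 Then
      MsgBox "bad FCIV result (4th line - no match)", vbCritical, kTitle
      Exit Sub
   End If

   Set m = mm(0)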

   'TODO: can we say mm.Item(1) instead?
   If StrComp(LCase(m.SubMatches(1)), LCase(name), vbTextCompare) Then
      MsgBox "bad FCIV result (4th line - no submatch)", vbCritical, kTitle
      Exit Sub
   End If

   If f.Value = m.SubMatches(0) Then
      MsgBox "MD5 of file matches catalog", vbOKOnly, kTitle
   Else
      MsgBox "MD5 of file does not match catalog", vbCritical, kTitle
   End If
End Sub


function NamePart(Path)
    Dim intFileNamePos

    intFileNamePos = InStrRev(Path, "\", -1, vbTextCompare)
    NamePart = mid(Path, intFileNamePos + 1)
End function


Dennis O'Clair
« Reply #9 on: June 01, 2008, 05:01:19 AM »

Peter,

I haven't been here in a while so when I checked the message board I found this new subject heading, Data Validation, to be just what I was looking for.

Here is my current situation.  In my active archive, I've recently come across a few PSD files that have disappeared.  The file name is there, but the file size is 0 bytes and of course you can't access the file.  I'm assuming that somehow the file was corrupted or there is a bad sector on the drive which is rendering the file useless.  To make matters worse, when backing up the drive the bad files are copied to the backup, overwriting existing "good" files.  I use Retrospect and apparently the software thinks that the corrupted source file has changed and therefore needs to be updated on the backup copy.

Is there some way to prevent this inadvertent copying of corrupted files?  Is there a data validation check that can be run before backing up?
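
To show the kind of check I have in mind, here is a rough Python sketch that lists zero-byte files before a backup is allowed to run. It is only an illustration: `find_empty_files` is a made-up name, it has nothing to do with Retrospect, and it only catches the 0-byte symptom described above, not other kinds of corruption.

```python
from pathlib import Path


def find_empty_files(root, suffixes=(".psd",)):
    """Return relative paths of zero-byte files under root whose extension
    is in suffixes (compared case-insensitively)."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in suffixes
        and p.stat().st_size == 0
    )
```

If the returned list is not empty, the backup could be skipped until the files have been looked at, so that good backup copies are not overwritten.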

Is it possible to recover these missing files using Image Rescue or other software without losing other data on the drive?

Dennis O'Clair 
Dennis O'Clair
« Reply #10 on: June 01, 2008, 05:08:22 AM »

Peter,

Started a new thread on the previous post.  Sorry about the cross posting.

Dennis O'Clair