I’ve been looking forward to the day this can be announced since 2007. In Lightroom 5, there is now a one-click solution to verify an entire collection of DNG files. It’s a really simple idea, with pretty huge ramifications from a data management standpoint. Interestingly, it’s nearly absent from any Adobe marketing materials for LR 5.
Near the bottom of Lightroom 5’s Library menu is an item that lets you validate an entire collection of DNG files with a single click. It’s right below the “Find Missing” command. These two tools, when used together, offer an excellent verification workflow.
As I have written before, one of the most vexing problems in asset management is knowing whether the files you stored are still sitting on the drive in an uncorrupted state. It’s possible that they can appear to be fine even though they are suffering from bit rot, were damaged in transfer, or have been corrupted by a virus or some other volume error. It’s a more common problem than many people think.
The way big institutions such as university media collections handle this issue is with a checksum. Basically, you run all the bits of the file through an equation, and it produces a short result that is effectively unique to that exact sequence of bits. By recomputing the checksum periodically, you can see if any of the bits have changed. Any unwanted change indicates a problem in the storage or transfer of the file, and that should trigger a restoration of the file from backup, as well as some detective work to find out where the problem came from.
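To make the idea concrete, here is a minimal Python sketch of whole-file checksumming. It is only an illustration, not how any particular institution or Adobe does it: the function names are made up for this example, and SHA-256 is simply one reasonable choice of algorithm.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Record a checksum for every file under `folder` (hypothetical helper)."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(folder, manifest):
    """Return the relative paths that are missing or whose bits have changed."""
    root = Path(folder)
    problems = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or file_checksum(path) != expected:
            problems.append(name)
    return problems
```

You would build the manifest once, store it somewhere safe, and run the verification on a schedule. Anything it reports is a candidate for restoration from backup.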
This process is called “fixity checking” and it is the subject of lots of research and thesis writing due to the difficulty of actually implementing it. It’s always a major topic at the Library of Congress storage conferences I go to.
The biggest issue with fixity checking is that sometimes you change a file on purpose. Let’s say that you have embedded your contact information in a file, and your address changes. You might want to update the embedded information. Unfortunately, any change to the file triggers a checksum mismatch. At that point the checksum becomes worthless, because it only tells you that the file has changed; it can’t distinguish between a good change and a bad one.
I’d been working on this problem a lot after the publication of the first DAM Book, and Marc Rochkind and I came up with an interesting solution. In a DNG file, the important part of the file – the part with the original image data – is never supposed to change. So the checksum could look at that part, and ignore all the other parts, such as the metadata headers. This way you could rename the file, update metadata, even update the embedded previews, and the checksum would still tell you if the most important part of the file remains uncorrupted.
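Here is a toy Python sketch of that idea. To be clear about the assumptions: this does not parse real DNG files or reproduce the digest defined in the DNG specification. The "file" is just a dictionary with a `metadata` part and an `image_data` part, and the function names and the choice of SHA-256 are invented for illustration.

```python
import hashlib


def whole_file_checksum(record):
    """Checksum over everything: metadata and image data together."""
    digest = hashlib.sha256()
    for key in sorted(record["metadata"]):
        digest.update(f"{key}={record['metadata'][key]}".encode("utf-8"))
    digest.update(record["image_data"])
    return digest.hexdigest()


def image_data_checksum(record):
    """Checksum over the image data only; metadata is ignored."""
    return hashlib.sha256(record["image_data"]).hexdigest()


# A stand-in for a raw file: editable metadata plus image data that
# should never change after capture.
original = {
    "metadata": {"creator": "Example Photographer", "address": "1 Old Street"},
    "image_data": bytes(range(256)),
}

# A deliberate, harmless edit: only the embedded address changes.
updated = {
    "metadata": dict(original["metadata"], address="2 New Avenue"),
    "image_data": original["image_data"],
}

# An unwanted change: one byte of image data is altered.
damaged_bytes = bytearray(original["image_data"])
damaged_bytes[10] ^= 0xFF
damaged = {"metadata": original["metadata"], "image_data": bytes(damaged_bytes)}

metadata_edit_flagged_by_whole = (
    whole_file_checksum(original) != whole_file_checksum(updated)
)
metadata_edit_flagged_by_image = (
    image_data_checksum(original) != image_data_checksum(updated)
)
damage_flagged_by_image = (
    image_data_checksum(original) != image_data_checksum(damaged)
)
```

In this sketch the whole-file checksum flags the address update as a change, while the image-only checksum stays quiet for the metadata edit and still catches the damaged byte.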
Marc and I took the idea to Thomas Knoll, who wrote it into the draft DNG 1.2 specification within two days. (Thomas may have already been working on it, but the email record indicates it was a new idea.) Lightroom and Camera Raw have been using the checksum for years to verify individual files when they are opened into the Develop module or into ACR. But until now there was no easy way to check through an entire collection of DNG files to find the bad ones.
Now it’s part of Lightroom 5. With this one simple feature, Adobe gives Lightroom users something that has been a decades-long quest in the digital preservation space, and not all that well implemented even in smart institutions.
One-click verification has enormous implications for long-term preservation of image collections. It was the milestone I was waiting for to simplify (or even scrap) the bucket system and, if implemented properly, to move safely away from optical media backup. It’s taken six years and six months to be fully implemented, but now it’s here.
I will write more about this as time permits.