The DAM Forum
May 24, 2013, 10:27:54 AM

Topic: War on Duplicate Images
Sam Faulkner
« on: March 03, 2009, 03:54:53 PM »

I am in the process of organising my image archive (maybe 200,000 images) and instituting a completely scalable and future-proof (yeah right!) structure. I need to find a way of identifying lots of duplicate images. Not all of the duplicates were shot digitally; some were scanned. There is also a problem with the creation dates of some images reverting to a default.
I have tried to use Duplicate Image Detector, but it has crashed every time I have tried it on a large catalogue. I am running it on a Mac Pro with 2x 3.2 GHz quad-core Intel processors and 32 GB of RAM. If it ran, I think it would be the answer.
I have Aperture 2 and the demo of Lightroom, but can't find a decent way of finding dupe images using either. Is there a plugin that can do it?
The search needs to compare metadata, checksums, or visual similarity, not just file names. I think I managed to eliminate the files with the same name using EasyFind.
Any thoughts on how to do this?
peterkrogh
« Reply #1 on: March 03, 2009, 04:16:21 PM »

Sam,
A couple things:
First, hunting for individual dupes may not be worth that much. Consider that storage costs about $0.10/GB. Even in triplicate, that is only $0.30 per GB. To pay yourself $5/hr, you'd need to delete about 17 GB every hour. Does that make financial sense to you?
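
As a rough check of that arithmetic, here is a minimal sketch. The prices and the hourly rate are the figures quoted in the post; the function name `break_even_gb` is just an illustrative label. It lands in the same ballpark as the rough figure above.

```python
# Figures quoted in the post: $0.10 per GB, three copies kept, $5 per hour.
COST_PER_GB = 0.10
COPIES = 3
HOURLY_RATE = 5.00


def break_even_gb(hourly_rate: float, cost_per_gb: float, copies: int) -> float:
    """GB you must delete per hour for the storage saved to equal the hourly rate."""
    return hourly_rate / (cost_per_gb * copies)


if __name__ == "__main__":
    gb = break_even_gb(HOURLY_RATE, COST_PER_GB, COPIES)
    print(f"Break-even: about {gb:.1f} GB deleted per hour")
```

If your own storage price or backup count differs, changing the three constants shows how quickly the break-even point moves.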

Are you using Expression Media? 

I'd do this.

1. If you can find lots of duplicated structures (like two drives with pretty much the same files on both in the same folder structure) that's a good place to work. Make a catalog of each drive and open them side-by-side. Check dates, filetypes, number of items in the catalog and see if they are the same.  If you can delete here, it makes good sense.

2. For multiple variations of the same file, it's probably better to just collect them than to delete.  Set your View Options to show File Size, Dimensions, Color Space, Created Date, Modified Date and you can pretty quickly sort between multiple variations to see which is the best one.  Use Catalog Sets to narrow from the many to the few. You could also use color labels for this, or maybe even custom fields.

3. Lightroom and Aperture will be of limited help sorting between duplicates.

Peter
rogerhoward
« Reply #2 on: March 04, 2009, 09:52:16 AM »

If you've got the potential for duplicate files which have had metadata embedded *since* the duplication, then a checksum based approach won't help - change one bit in the file (including metadata) and the checksum will differ. Likewise, if you're trying to also sort out derivative images which may have found their way back into your catalog, checksums won't help.
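
A small sketch of why this happens, using made-up byte strings rather than real image files (the "caption" prefix is a stand-in for embedded metadata, and SHA-256 is just one possible checksum):

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


# Stand-in data, not a real image format: the same "pixel data" with
# two different "metadata" prefixes in front of it.
pixels = b"\x00\x01\x02\x03" * 1024
original = b"caption=" + pixels
retagged = b"caption=Paris;" + pixels

same_pixels = sha256_hex(pixels) == sha256_hex(pixels)
same_files = sha256_hex(original) == sha256_hex(retagged)

print(same_pixels, same_files)  # prints: True False
```

The image content is identical in both "files", but the whole-file checksums no longer match once the metadata differs.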

Without knowing more about your situation, and how technically-oriented you are (are you comfortable with the command line and scripting?) I can offer these bits of advice:

1. Checksums are great and easy for a first pass
2. If you've got multiple copies of a DNG, perhaps even with different settings/metadata applied, look into the DNG tag "RawDataUniqueID" as reported by exiftool - this should remain the same value across multiple edits of the DNG
3. If you have a mix of different file formats, sizes, etc, and need a visual similarity tool, I've found VSDIF (Windows only) to be EXCELLENT - available at http://www.mindgems.com/products/VS-Duplicate-Image-Finder/VSDIF-About.htm
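
The checksum first pass from point 1 could be sketched roughly as follows. This is only one way to do it: the function names `file_sha256` and `find_exact_duplicates` are illustrative, SHA-256 is an assumed choice of checksum, and grouping by file size first is an optimisation I have added to avoid hashing files that cannot match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large images are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list:
    """Return groups (lists of paths) of byte-for-byte identical files under root."""
    # Files of different sizes cannot be identical, so group by size first.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Only hash files that share a size with at least one other file.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_sha256(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_exact_duplicates(Path("/path/to/archive"))` (a placeholder path) returns the groups for review. As noted above, this only catches byte-for-byte copies, so re-tagged files and derivatives will slip through.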

-----

Roger Howard
WaltSorensen
« Reply #3 on: August 20, 2009, 10:21:10 AM »

Roger,

Great tool suggestion for Windows users. I think it would be nice to see a visual similarity tool added to Lightroom as a plug-in.

When I have an ambitious day, I like to attempt to undo the years of bad DAM practices I had, where I copied files to new folders for printing, portfolio selections, different file sizes and edits. That is in addition to the 3 months of shooting in JPEG and duplicating the files to TIFF when I first went digital.

I'll have to pick up VSDIF; it sounds like a one-time, must-have DAM cleanup tool.

~Walt
ronandownes
« Reply #4 on: September 20, 2009, 01:33:02 PM »

I just rename in LR.

I rename to YYYYMMDDHHMMSS, and any file that ends in -something is a duplicate.

There is one caveat: multiple files taken in the same second. Inspection usually shows which case it is and whether the files are valuable. If in doubt, keep. This worked for me.
Ronan
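
For anyone who wants to list the suspects after such a rename, here is a minimal sketch. It assumes file names are a 14-digit timestamp, optionally followed by a hyphen and a suffix, then an extension; the exact suffix Lightroom adds is not stated above, so the pattern accepts anything after the hyphen. The name `suspected_duplicates` is illustrative.

```python
import re
from collections import defaultdict

# 14 digits (YYYYMMDDHHMMSS), optionally followed by "-suffix".
STAMP = re.compile(r"^(\d{14})(?:-(.+))?$")


def suspected_duplicates(filenames):
    """Map each timestamp to the file names that carry a '-suffix'."""
    flagged = defaultdict(list)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]  # drop the extension, if any
        match = STAMP.match(stem)
        if match and match.group(2) is not None:
            flagged[match.group(1)].append(name)
    return dict(flagged)


names = [
    "20090920133302.dng",
    "20090920133302-2.dng",
    "20090101120000.dng",
    "IMG_0001.jpg",
]
print(suspected_duplicates(names))
```

The result is only a list of candidates. Because of the same-second caveat, each flagged file still needs a look before anything is deleted.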
rogerhoward
« Reply #5 on: September 22, 2009, 09:18:51 AM »

Ronan - good practices are important going forward, or in a workflow you can control, but sometimes you have a legacy mess that makes it impossible to identify derivatives based solely on naming/metadata...

Even in a controlled workflow, a good example where these tools can help is if you receive requests for images based solely on a thumbnail someone harvested on Google Images - I get this all the time... I can plug such a request into a visual search index and get an answer in seconds whether I have that image or not in my collection.

- Roger Howard
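
To give a feel for how a visual comparison can work at all, here is a toy sketch of one simple, generic technique, an "average hash". This is not a description of how VSDIF or any particular visual search index works. It assumes each image has already been converted to a small grayscale grid (a real tool would use an imaging library for that step), and the function names are illustrative.

```python
def average_hash(gray):
    """Hash a small grayscale grid: 1 where a pixel is brighter than the mean."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    return [1 if value > mean else 0 for value in pixels]


def hamming_distance(hash_a, hash_b):
    """Count positions where two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Tiny made-up 2x2 "images": a thumbnail, a brighter copy, and a different picture.
thumbnail = [[10, 200], [220, 30]]
brighter_copy = [[30, 220], [240, 50]]
different = [[200, 10], [30, 220]]

print(hamming_distance(average_hash(thumbnail), average_hash(brighter_copy)))  # prints: 0
print(hamming_distance(average_hash(thumbnail), average_hash(different)))      # prints: 4
```

A small distance suggests the same picture even when the files differ in size, format, or brightness, which is exactly the case checksums cannot handle.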
ronandownes
« Reply #6 on: September 23, 2009, 01:35:04 PM »

Hi Roger,

Good points.

For me it was loads of files and folders with copies of DNGs everywhere. Lightroom put them into dated boxes and named them for me, and by and large anything with a - in the name was a duplicate. Sure, Image Verifier should have been used, but even though I own all of Marc's apps I didn't realise the importance and luckily got away with it.

I appreciate that, with metadata getting stripped from files on export, my approach was only any good for a local mess. Also, like I said, I risked deleting many unique files shot within one second.

Hey, why does PM allow naming to sub-second precision ({ssec}) and not LR? Maybe Image Ingester does?

Ronan