The DAM Forum
Topic: Keywording Nightmare - How do/should I manage thousands of keywords.
dcthompson
Newbie
« on: September 23, 2007, 09:50:09 AM »

Hi All,

First time poster so, I believe, an introduction is in order.  I am an advanced amateur, former semi-pro (I haven't pursued any business related photography since '95).  From the late '70s until '95 I did weddings, portraits, and some magazine one-offs as a side business (I am a software engineer in real life).  My primary photographic interests are wildlife, nature, landscape, and railroads.  It is the last one on this list that is giving me fits.  I have been looking through all of the forums/messages that I can find and I don't see anyone that has addressed a situation similar to mine (I know that others must have dealt with similar problems but I can't find 'em).  The workflow I am testing is based on Peter's fine book, suggestions found throughout the forum, IIP (soon), LR, DNG converter, and iVMP.  Photoshop will be used for creating masters when I find the time.

In order to handle, prepare stock, etc., I need to be able to locate railroad photographs by specific content (i.e. railroad name and number - for equipment such as locomotives, etc.).  This requirement will lead to many thousands of keywords since 1. there are a ton of railroad companies that I have photographed, and 2. most railroads number their locomotives with a four-digit number since they have thousands of them.  I thought about using name and number separately but that would require a significant amount of manual selection once the search was completed.  The preferred method would be to have keywords that combine the name and number.  An example is probably easier to follow than my ramblings so...

Photographs with BNSF 5404 and UP 3416 should not show up in a search result if the search was for BNSF 3416.  If the keywords were split (e.g. "BNSF", "UP", "5404", "3416") then this is what would happen if the search was for "BNSF and 3416".  Further complicating matters, if the photos being searched for contained a "UP" keyword because they had BNSF 3416 and UP 9722 in them, I would not be able to search for "BNSF and 3416 and not UP" since the desired photos would be missed.  If I were to use the split keyword approach I would have (# of railroads + ~10,000) keywords.  If the keywords that would get me to what I want were used (e.g. "BNSF 3416", "BNSF 5404", "UP 3416", "UP 9722") I would end up with (# of railroads x something less than 10,000) keywords.  Neither of these solutions puts a big smile on my face.  And, yes, I do have enough photographs of railroads to make this doomsday scenario happen, pretty quickly too (a recent "quick" trip from Plano, TX to San Francisco, CA and back resulted in ~1600 photos of which ~1000 were railroad related and had, on average, four locomotives per picture).  Factor in camera scans of a portion of well over 100,000 images taken since the early '70s and I am headed for a, you guessed it, train wreck ;-)
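
The two keyword counts above can be put into rough numbers.  This is only a back-of-the-envelope sketch: the figure of 50 railroads is an assumption made up for illustration, and 10,000 is simply the number of possible four-digit locomotive numbers.

```python
# Rough upper bounds on keyword-list size for the two schemes.
# num_railroads = 50 is an assumed figure, used for illustration only.

def split_keyword_count(num_railroads: int, num_numbers: int) -> int:
    """Separate keywords: one per railroad plus one per distinct number."""
    return num_railroads + num_numbers


def combined_keyword_count(num_railroads: int, num_numbers: int) -> int:
    """Combined keywords: worst case pairs every railroad with every number."""
    return num_railroads * num_numbers


num_railroads = 50      # assumption
num_numbers = 10_000    # four-digit numbers 0000-9999

split_total = split_keyword_count(num_railroads, num_numbers)
combined_total = combined_keyword_count(num_railroads, num_numbers)

print(split_total)      # 10050
print(combined_total)   # 500000
```

The combined figure is a worst case.  In practice the list can never be longer than the number of distinct locomotives actually photographed.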

I am moving on with a plan to add the keywords in question after initial ingestion since 1.  I must get this mess in order, and 2. my daughter has recently begun to show an increased interest in photography and we have discussed kick-starting my old business.

Also, I would like the solution to be compatible across my existing workflow so, I think, hierarchical keywording is out since I use iVMP.  Any opinions?

Thanks,

David C. Thompson
peterkrogh
Administrator
Hero Member
« Reply #1 on: September 23, 2007, 12:53:13 PM »

David,
First, you have to determine if the work to code each photo will ever pay you back.  Does it really make sense to essentially pre-search the entire archive?  Are you actually going to get a request for BNSF 5404 from someone who would not be interested in BNSF 5401?

Maybe it would be a good first step to just tag the images by railroad company, and then assign the number as any particular project gives you a reason to.

If this is the case, then I'd suggest using the company as a keyword, and the number as a separate keyword.  Then you could search by Keywords contains "BNSF" AND Keywords contains "5404".

I would also suggest that by using Ratings, you will be able to substantially narrow the number of images you need to annotate.  Do all 100,000 images offer an equal prospect of sale? Are you really going to put that many online?  I'd start with the best and most salable images, and work out from there.

Peter
dcthompson
Newbie
« Reply #2 on: September 23, 2007, 06:02:02 PM »

Peter,

Good points.  I have thought about the return on investment quite a bit.  I am sure that, from a financial perspective, adding the keywords up front is not wise.  The problem is that I also look at these from a personal/hobby perspective and most of them are directly related to trip logs that I, and others with similar interests, enjoy reading.  And, yes, it is often the case that a specific engine has become newsworthy for various reasons (personal and commercial).  A current example would be my personal interest in a specific locomotive that I photographed last week.  I googled for it in order to find the lineage and was surprised to find that it was a locomotive that I saw several owners ago (and about 25 years ago) in California (last week it was in Texas).  Unfortunately, as is the case with many things, the "virtual metadata" has faded to the point that I have only a general idea (and 18 years' worth of images from while I lived in California) as to where to find photographs that I may have taken of it in the past.  While this case is not one that has any immediate commercial impact, it does happen.  I recently saw a current photograph of a locomotive (new owner, paint, location) that I had had published in two railroad publications in 1983.  I do think the rating idea has merit because there is a definite range of quality/interest.  I am just concerned that, if I am not more specific than a railroad name, it will be next to impossible to locate something specific in the future.  I would guess that I have several thousand BNSF related images, and the thought of going through all of them to find a specific one is somewhat scary.

On another note.  Thank you for helping me with the forum registration.  It is good to finally be here!

David
johnbeardy
Administrator
Hero Member
« Reply #3 on: September 24, 2007, 12:41:22 AM »

David

While you can determine the project's worth in terms of monetary value or simply wanting to do it, Peter also pointed to something that is really worth emphasizing (I was half way through a post when I saw he'd got there first). That is that in iView you can do an "and" query for the railroad and the number. So keywords contains "BNSF" AND keywords contains "3416". This means an item's keywords would have to contain both individual keywords.

I decided long ago that a long keywords list goes with the DAM territory and it's not something to lose sleep over. But it's funny how often people offer the excuse of not wanting a long keyword list when they try to get you to help them do something stupid (eg bug you to devote a valuable chunk of your life to writing a script for them that would be so edifying for them to piece together themselves from others you've written).

You may consider efficient ways to enrich your data. For example, if your filenames or folder paths (or other fields) include information such as "D:/Railroads/BNSF" that describes the images, then you can use scripts to populate the keywords. Another technique is to start by building up catalog sets - I often group up similar images in subsets, and only then apply keywords, selecting each subset and applying common words. You see new patterns and groupings in the pictures, build a hierarchy which may be valuable in itself, and it's a whole lot more efficient and consistent to apply keywords to groups. Also look at using metadata templates (eg one for the BNSF railroad) and the Vocabulary Editor (eg comma separated entries would allow you to enter "BNSF" and the program would add its full name and other synonyms).
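
As a rough illustration of the folder-path idea, here is a minimal Python sketch.  It is not an iView script; the file name img_0001.dng, the function name keywords_from_path and the small synonym table are all assumptions invented for the example, and the synonym table only imitates the comma separated vocabulary entries in spirit.

```python
from pathlib import PurePosixPath

# Hypothetical synonym table: short name -> extra keywords to add with it.
SYNONYMS = {"BNSF": ["Burlington Northern Santa Fe"]}


def keywords_from_path(path: str) -> list[str]:
    """Derive keywords from the folder names in a forward-slash path."""
    folders = PurePosixPath(path).parts[:-1]  # drop the file name
    keywords: list[str] = []
    for folder in folders:
        # Skip the root and drive-letter components such as "D:".
        if folder == "/" or folder.endswith(":"):
            continue
        # Add the folder name itself, then any synonyms, without duplicates.
        for word in [folder, *SYNONYMS.get(folder, [])]:
            if word not in keywords:
                keywords.append(word)
    return keywords


example = keywords_from_path("D:/Railroads/BNSF/img_0001.dng")
print(example)  # ['Railroads', 'BNSF', 'Burlington Northern Santa Fe']
```

A real script would write the resulting list back into each item's keywords field, which depends on the scripting interface of whatever catalog program is in use.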

John

dcthompson
Newbie
« Reply #4 on: September 24, 2007, 11:53:39 PM »

John,

Thank you for the additional suggestions.  I was aware of, and planning to use, the synonyms (a great time-saver) for the railroad names (i.e. BNSF, Burlington Northern Santa Fe, etc.).  The concern was more with the individual pieces of equipment, primarily the locomotives, in the photographs.  By using the separate keywords, as suggested by both you and Peter, I would have many false positives (see example in the original post) but...  I think that the false positives could be dealt with rather easily when I needed to narrow the search so...  I think (always leaving the door open) that I have settled on a hybrid approach that combines your thoughts and Peter's thoughts yet still addresses my paranoia.  The steps would be as follows:

1. Apply railroad names (with synonyms) to the photographs, as applicable.
2. Add individual numbers to those photographs that deserve (very subjective) them.
3. Utilize ratings to select photographs that are of high quality or non-pedestrian content and add more specific keywords so that they can be located without sifting through false-positives.
4. Make use of catalog sets to create sets of interest (another way to get back to selects without the false positives).
5. Wonder where all my free time has gone ;-)

Any additional thoughts on 1-4 above?

As to the directory structure, no joy there since I hadn't gotten things organized well enough that the structure would have been of any use (backed up yes, organized, not so much).  I am implementing a bucket strategy right off the bat since I feel that it has the most scalability.

Though I am a new member of the forum, I have spent many hours reading posts and have noticed all the scripting requests that are fielded by the more experienced among you.  As a software engineer, I firmly believe that the best way to succeed is to look at examples, ask questions, read manuals, and create the scripts on your own.  That said, I find that I am often a focal point for those that are scared of scripts/software or just plain lazy, so I certainly feel your pain.

I have known for a long time (I started developing a SQL solution for my data years ago) that the keyword list was going to be huge.  My concern lies mostly, if not completely, in my lack of experience with IVMP, LR, etc. and what types of problems I might run into (i.e. sluggish software, corruption, database bloat, etc.).  I don't want to get deeply into the keywording and then find that I have hit some hidden limitation.  I know that the documents say that keywords and catalog sets have no limits but... documents and software implementations have a way of diverging when least expected.  Have you or Peter, or anyone else for that matter, ever stressed IVMP or LR to determine what a practical limit is?

Thanks for all the input,

David


johnbeardy
Administrator
Hero Member
« Reply #5 on: September 25, 2007, 12:44:26 AM »

David

I don't think you will get false positives - that's why I emphasized the point. Not with the Find dialog anyway. Have you tested this? In any case it would be possible to write a script to test for the presence of more than one word in the keywords string (though that would involve looping through large datasets).
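
A script of that kind might look something like the following Python sketch.  It is not an iView script; the record layout (a file name paired with its keywords joined into one string) and the function name has_all_words are assumptions made for illustration.

```python
def has_all_words(keyword_string: str, words: list[str]) -> bool:
    """True if every word appears somewhere in the keyword string."""
    return all(word in keyword_string for word in words)


# Hypothetical records: (file name, keywords joined into a single string).
records = [
    ("img_0001.dng", "BNSF, 5404"),
    ("img_0002.dng", "UP, 9722"),
    ("img_0003.dng", "BNSF, 3416"),
]

# Loop through every record; this is the linear scan over the dataset.
matches = [name for name, kw in records if has_all_words(kw, ["BNSF", "3416"])]
print(matches)  # ['img_0003.dng']
```

The check only confirms that both words occur in the keyword string.  It does not know whether the two words describe the same locomotive.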

Of your 1-4, I don't see any problem, though I'd tend to put 4 first for the reasons I mentioned earlier. It may not be important if you only shot a subject once, and all the images are together, but more helpful if those pictures are shot over a period and not all together in one bucket.

As for the limits, I find it hard to be sure what the practical limits are, and it's hard to pin anyone down. 60,000 items or a 1.5GB catalog seems fair for iView, but LR is a lot more variable. Its database should be capable of very many records, but the water is muddied by all the file access that's going on, both with regard to the originals and the previews, and you hit a variety of hardware and networking constraints and inconsistencies. My biggest test catalogue has contained 80,000 raw and dng files, but that was only for a test and while performance was OK, I suspect it would have been better on a multi-core PC. Now when I get round to building that 256-core Niagara Falls cooled monster....

John
dcthompson
Newbie
« Reply #6 on: September 25, 2007, 04:29:23 PM »

John,

You are absolutely correct, I should be creating/updating the catalogs earlier in the process.  It was late (or early depending upon how you look at it) and my brain tends to fry about 2am.  On to the false positives...

I have tested the search and do get false positives but it is entirely possible that I have missed something.  I have boiled down my original example to contents/keywords/results below.  If you see a way to get the correct results with separate keywords, please, enlighten me.

Image 1 contains locomotives BNSF 5404 and UP 3416.
Image 2 contains locomotives BNSF 3416 and UP 9722.

The objective is to find Image 2 with BNSF 3416 in it.

Case 1 - Separate Keywords - Image 1 ( "BNSF", "5404", "UP", "3416" ), Image 2 ( "BNSF", "3416", "UP", "9722" )
A search for "contains BNSF AND 3416" will return Image 1 and Image 2 - Image 1 is a false positive.

Case 2 - Combined Keywords - Image 1 ( "BNSF 5404", "UP 3416" ), Image 2 ( "BNSF 3416", "UP 9722" )
A search for "an exact match of BNSF 3416" will return Image 2 and ignores Image 1 as it should.

So...  as far as I can tell there is no way to get away from some ( likely large ) number of false positives unless I use combined keywords together with an "exact match" search.  Clearly, individual catalogs for each railroad with child catalogs for each locomotive could be used but I don't see that it would be any easier to maintain.  Further, I was hoping for a solution, such as flat keywords, that would be easily portable as DAM software evolves.
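
The two cases can be reproduced with a short Python sketch that models each image as a set of keywords.  The function names search_all and search_exact are made up for the example, and the set-based matching is only an assumed stand-in for the catalog program's own search.

```python
def search_all(images: dict[str, set[str]], terms: set[str]) -> list[str]:
    """AND search: images whose keyword set contains every search term."""
    return sorted(name for name, kws in images.items() if terms <= kws)


def search_exact(images: dict[str, set[str]], keyword: str) -> list[str]:
    """Exact match: images carrying the keyword exactly as given."""
    return sorted(name for name, kws in images.items() if keyword in kws)


# Case 1 - separate keywords
separate = {
    "Image 1": {"BNSF", "5404", "UP", "3416"},
    "Image 2": {"BNSF", "3416", "UP", "9722"},
}

# Case 2 - combined keywords
combined = {
    "Image 1": {"BNSF 5404", "UP 3416"},
    "Image 2": {"BNSF 3416", "UP 9722"},
}

case1 = search_all(separate, {"BNSF", "3416"})
case2 = search_exact(combined, "BNSF 3416")

print(case1)  # ['Image 1', 'Image 2']  (Image 1 is the false positive)
print(case2)  # ['Image 2']
```

With separate keywords the link between a railroad and its number is lost, so any image that has both words somewhere among its keywords will match.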

Any final thoughts before I dive in?


Thanks,

David
peterkrogh
Administrator
Hero Member
« Reply #7 on: September 29, 2007, 04:41:52 AM »

David,
Dive in.
Peter
mikeseb
Jr. Member
« Reply #8 on: September 29, 2007, 06:30:38 AM »

I suggest this with trepidation, given my recent aggravation with Expression Media, the tedious recounting of which I'll spare you here; but the railroad-and-engine-number dilemma seems ideal for a Custom Fields solution in iVMP.

You'd make a Custom Field for Railroad Name, eg BNSF, CSX, or whatever; and another CF for Engine Number. You can limit these fields, if desired, to a defined list of terms, which is nice for preventing data-entry mistakes. The upside is that you can then sort your entire catalog quickly and zoom in on just the engine you seek. The downside is the questionable portability of Custom Field data from iVMP/ExMedia to whatever succeeds it.

I've reread the thread, but perhaps I've missed something; could you not construct composite keywords like BNSF_5412, or simply do a Boolean search on the terms as John suggested? (Assuming in constructing such keywords you'd not be undoing massive amounts of "sunk" work!) It seems to me that from a flexibility and portability standpoint, separate keywords make the most sense, eg "BNSF" and "5412".

Mike S.

Michael Sebastian (http://www.michaelsebastian.com)
danaltick
Hero Member
« Reply #9 on: September 29, 2007, 06:57:57 AM »

David,

The way I would address that problem is by adding more identifying keywords that you can "AND" in.  My father is a model railroader.  I used to spend countless hours building scale models for his railroad.  I love trains too.  I'm sure there are many more ways to differentiate locomotives other than just a 4-letter acronym and number... what about the wheel count, color, style, diesel, steam, etc.

Dan

WindowsXP, ImageIngester Pro, RapidFixer, IVMP 3, ACR4, Photoshop CS4, Controlled Keyword Catalog, Canon EOS50D
dcthompson
Newbie
« Reply #10 on: September 30, 2007, 10:51:14 AM »

All,

First of all, I thank all of you for your suggestions.  Some will be used, some won't, but I am always open to other opinions - what's the point of a forum if you're not?

Dan,

Glad to know I am not the only person out here that likes railroads.  I was planning on adding other notations beyond railroad name/number but view that as less important for the first pass.  I think that by having the name/numbers and date on the images it will be easier to create catalogs and apply other information in bulk.  As I am sure you know, similar equipment is usually grouped together by number within a railroad and date range ( date range since the numbers are often reused over the years ).  However, your point that having this information would help narrow the search if the name and number were separate is one that hadn't clicked in my DAM thought process.  All the more reason to go with a separate name and number. Thanks :-)

Mike,

I had considered custom fields a few workflow iterations ago but decided against them due to the portability issues you cited.  The composite keyword idea is what I was originally considering, and may still use to some extent.  The search based on logically ANDing the search terms works, but leaves me with "false positives" which may be more trouble than they are worth ( see my previous post on September 25th for an example ).  I may do a small test in an attempt to quantify the percentage of "false positives" that I may run into but... can you spell P-A-I-N-F-U-L?  As to the flexibility of composite versus separate keywords, I think that I would be o.k. with composites since I can do a "contains" type of search to extract them if I construct the composite keyword with a space ( i.e. "BNSF 3416" ).  I have tested both of these keywording schemes and have been able to verify that the searches do produce the results I have put forth.  The problem is that I can't see the future and don't know how bad the "false positive" problem will be.  Thanks for the input :-)

Peter,

Donning scuba gear, preparing to dive ;-)  Thanks for the push - I think I have been dealing with a severe case of analysis paralysis regarding the keywording ( and the photographs just keep accumulating ).


Again, thanks to all, see you in the next thread ( when I come up for air ).

David

johnbeardy
Administrator
Hero Member
« Reply #11 on: September 30, 2007, 11:33:57 AM »

As a point of detail regarding portability, data in custom fields is very easy and (unlike catalog sets) reliable to script over to keywords. I'd also point out that custom fields are single value, so there's less risk of assigning the same picture to two different railroad companies, which can happen with sets and keywords.
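
To give a feel for how little such a script has to do, here is a minimal Python sketch.  It does not use iView's scripting interface; the record layout (a dict with "keywords" and "custom_fields" entries) and the function name custom_fields_to_keywords are assumptions invented for illustration, and the field names are the ones suggested earlier in the thread.

```python
def custom_fields_to_keywords(record: dict, field_names: list[str]) -> list[str]:
    """Return the record's keywords with the chosen custom field values added."""
    keywords = list(record.get("keywords", []))
    for field in field_names:
        # Each custom field holds a single value (or nothing at all).
        value = record.get("custom_fields", {}).get(field, "").strip()
        if value and value not in keywords:
            keywords.append(value)
    return keywords


# Hypothetical catalog record
record = {
    "keywords": ["locomotive"],
    "custom_fields": {"Railroad Name": "BNSF", "Engine Number": "3416"},
}

result = custom_fields_to_keywords(record, ["Railroad Name", "Engine Number"])
print(result)  # ['locomotive', 'BNSF', '3416']
```

Because each field holds one value, the copy is a simple one-to-one append and there is nothing to split or parse.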

John
mikeseb
Jr. Member
« Reply #12 on: October 01, 2007, 07:53:47 AM »

John,

Do you know of any existing Mac scripts to get Custom Fields data (or People, or Catalog Sets--though I have few enough of the latter that I could do it manually) into keywords? I've not been able to find any; and I'm at the toe end of the learning curve on scripting! :-)

Thanks,
Mike

johnbeardy
Administrator
Hero Member
« Reply #13 on: October 01, 2007, 07:57:33 AM »

Not off the top of my head, but I don't think people or custom fields would be hard to figure out - they'd be good for learning. Sets might be more difficult but I suspect there is one for that somewhere. Best to check the iView AppleScript forum.

John
danaltick
Hero Member
« Reply #14 on: October 01, 2007, 05:58:33 PM »

John,

Thanks for pointing out the single-value nature of custom fields.  I knew there was a difference, I just couldn't quite put my finger on it.  Now I'll know better when to use them and when not.  I was actually using them a while back for model release names, when one day I took some photos of more than one model.  At that time I moved them over to keywords and didn't give it any more thought; hence, leaving me open to make the same mistake again.  I seem to do that more and more as I get older, but that's another story ;-).

Dan
« Last Edit: October 01, 2007, 06:05:48 PM by danaltick »
