All posts by admin

Using Google Cloud Vision for OCR

Editor’s Note: This post combines a couple of threads I’ve been writing about. I’ll provide some real-world methods for converting visible text to searchable metadata, as discussed in Digitizing Your Photos. In the course of this, I’ll also be fleshing out a real-world workflow for Computational Tagging, as discussed in The DAM Book 3 Sneak Peeks.

In the book Digitizing Your Photos, I made the case for digitizing textual documents as part of any scanning project. This includes newspaper clippings, invitations, yearbooks and other memorabilia. These items can provide important context for your image archive and the people and events that are pictured.

Ideally, you’ll want to change the visible text into searchable metadata. Instead of typing it out, you can use Optical Character Recognition (OCR) to automate the process. OCR is one of the earliest Machine Learning technologies, and it’s commonly found in scanners, fax machines and PDF software. But there have not been easy ways to automatically convert OCR text to searchable image metadata.

In Digitizing Your Photos, I show how you can manually run images through Machine Learning services and convert any text in the image into metadata through cut-and-paste. And I promised to post new methods for automating this process as I found them. Here’s the first entry in that series.

The Any Vision Lightroom Plugin
I’ve been testing a Lightroom plugin that automates the process of reading visible text in an image and writing it into a metadata field. Any Vision, from developer John Ellis, uses Google’s Cloud Vision service to tag your images with several types of information, including visible text. You can tell Any Vision where you want the text to be written, choosing one of four fields, as shown below.

Here is part of the Any Vision interface, with only OCR selected. As you can see, you can target any found text to the Caption, Headline, Title or Source field. I have opted to use the Headline field myself, since I don’t use it for anything else.
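
Any Vision handles all the plumbing inside Lightroom, but if you’re curious what a plugin like this is doing under the hood, here is a rough sketch of calling the Cloud Vision REST endpoint yourself. This is my own illustration, not Any Vision’s actual code; the API key and file path are placeholders.

```python
import base64

# REST endpoint from Google's Cloud Vision documentation
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a TEXT_DETECTION request."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
        }]
    }

def extract_text(response: dict) -> str:
    """Pull the full recognized text out of a Cloud Vision response.
    Returns an empty string when no text was found."""
    annotation = response.get("responses", [{}])[0].get("fullTextAnnotation", {})
    return annotation.get("text", "")

# To actually call the service, POST the body with your API key, e.g. with the
# requests library (API key and file path are placeholders):
#   requests.post(f"{VISION_ENDPOINT}?key=YOUR_API_KEY",
#                 json=build_ocr_request(open("scan.jpg", "rb").read()))
```

The text that `extract_text` returns is what would then be pasted into the Headline (or other) field.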

Results
Here are my findings, in brief:

  • Text that appears in real-life photos (as opposed to copies of textual documents) might be readable, but the results seem a lot less useful.
  • Google does a very good job reading text on many typewritten or typeset documents. If you have scanned clippings or a scrapbook, yearbook or other typeset documents, standard fonts seem to be translated reasonably well.
  • Google mostly did a poor job of organizing columns of text. It simply read across the columns as though they were one long line of nonsensical text. Microsoft Cognitive Services does a better job, but I’m not aware of an easy way to bring its results into image metadata.
  • Handwriting is typically ignored.
  • For some reason, the translate function did not work for me. I was scanning some Danish newspapers and the text was transcribed but not translated. I will test this further.

Examples
(Click on images to see a larger version)

Let’s start with an image that shows why I’m targeting the Headline field rather than the caption field. This image by Paul H. J. Krogh already has a caption, and adding a bunch of junk to it would not be helping anybody.
You can also see that the sign in the background is partially recognized, but lettering in red is not seen and player numbers are ignored even though they are easily readable.

In the example below, from my mother’s Hollins College yearbook, you can see that the text is read straight across, creating a bit of nonsense. However, since the text is searchable, this would still make it easy to find individual names or unbroken phrases in a search of the archive.
You can also see that the handwriting on the page is not picked up at all.

In the next example, you can see that Google was able to use the boxes of text, changes of font and use of underline to help parse the text more accurately.

And in this last example you can see that Google is having a terrible time with the gothic font in this certificate, only picking out a small fraction of letters properly. 

The Bottom Line
If you have a collection of scanned articles or other scanned textual documents in Lightroom, this is a great way to make visible text searchable. While Google’s is not the best OCR engine, thanks to Any Vision, it’s the easiest way I know of to add the text to image metadata automatically.

Any Vision is pretty geeky to install and use, but the developer has laid out very clear instructions for getting it up and running and for signing up for a Google Cloud Vision account. Read all about it here.

Cost
Google’s Cloud Vision is very inexpensive – it’s priced at $0.0015 per image (which works out to $1.50 for 1,000 images). Google will currently give you a $300 credit when you create an account, so you can test it very thoroughly before you run up much of a bill.
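
If you want to estimate the cost of OCRing a whole archive before you start, the arithmetic is simple. A quick sketch, using the per-image price quoted above (which may change):

```python
PRICE_PER_IMAGE = 0.0015  # USD per image, as quoted above; check current Google pricing

def ocr_cost(num_images: int) -> float:
    """Estimated Cloud Vision cost in dollars for a batch of images."""
    return num_images * PRICE_PER_IMAGE

# A 10,000-image scanning project would cost about $15 -
# well inside the introductory credit.
print(f"${ocr_cost(10_000):.2f}")  # $15.00
```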

Watch for another upcoming post where I outline some of the other uses of Any Vision’s tagging tools.


Camera Scanning the Nixon Legacy

I had the pleasure of making a visit to the National Archives and Records Administration (NARA) early this month, meeting with Steve Greene and Cary McStay, who are in charge of scanning the official photos from the Nixon administration. They are using a digital camera to do the scanning, in much the same way as I outline in Digitizing Your Photos.

The film from the White House Photo Office was transferred to NARA so that it would be sure to be preserved. (This was done along with the audio tapes, which are still being digitized.) There were 258,318 images on 14,526 rolls. Most of it is 35mm b&w or C22 negative film. There is also some 4×5 film.

NARA first used conventional scanners for the project, but it was clear that conventional scanning was going to take too long to accomplish. They began testing camera scans, and became very comfortable that the quality coming out of a D810 was high enough for the vast majority of uses of the archive.

Shown below are some of the photos that were scanned as part of this project. Eventually, the images will be transferred to the Nixon Library in Yorba Linda, where they will be available to scholars, authors and others interested in this period in our history.

March 31, 1969. Various interior scenes from National Cathedral during former President Eisenhower’s funeral services.
March 11, 1969. A musical group of Girl Scouts entertain guests and members of the press at a ceremony donating land to the Girl Scouts of America. NARA is scanning the film with a film stage that allows the original frame numbers to be seen in the scan.
February 19, 1969. White House photographer Robert Knudsen and Photo Office office manager Buck May, sitting in the White House Photo office. Various portraits and moments of President Nixon, family members, inauguration, and Nixon with Heads of State, taken by WHPO photographers are tacked on the wall behind them.
February 6, 1969. President Nixon standing and gesturing while speaking at a press conference.
June 28, 1970. Pat Nixon speaks with the press corps aboard Air Force One en route to Lima, Peru with earthquake disaster relief supplies.
May 20, 1971. President Nixon at a podium announcing an agreement between the governments of the United States and the Soviet Union on the Strategic Arms Limitation Talks (SALT). These remarks were being broadcast by radio and television.
February 28, 1969. A cutaway shot of a spectator holding a Stars and Stripes umbrella.
March 20, 1969. President Nixon meeting with White House News Photographers Association (WHNPA) contest winners and WHNPA President Sam Stearns in the Roosevelt Room.

PS – NARA is also digitizing the Nixon audio tapes. These are going more slowly due to a number of factors, and are still in production.

The window into a hallway at NARA counts down the remaining Nixon tapes to digitize. I had not noticed it at the time, but there is a nice collection of Nixon-themed figurines sitting on the window sill, including the presidential series of Pez dispensers and Futurama figurines.

Computational Tagging – What is it good for? (Absolutely something!)

This post is adapted from the forthcoming The DAM Book 3.

There is a lot of hype and hazy discussion about the future of AI, but it’s often very loosely defined. In a previous blog post, I made the case for lumping a lot of this into a category I’m calling Computational Tagging. In the second post, I made a distinction between Artificial Intelligence, Machine Learning, and Deep Learning. In this post, I’ll outline a number of the capabilities that fall under the rubric of Computational Tagging.

What can computers tag for?

The subject matter will be an ever-growing list, determined in large part by the willingness of people and companies to pay for these services. But as of this writing, the following categories are becoming pretty common.

  • Objects shown – This was one of the first goals of AI services, and has come a long way. Most computational tagging services can identify objects, landscapes and other generically identifiable elements.
  • People and activities shown – AI services can usually identify if a person appears in a photo, although they may not know who it is unless it is a celebrity or unless the service has been trained for that particular person. Many activities can now be recognized by AI services, running the gamut from sports to work to leisure.
  • Specific People – Some services can be trained to recognize specific people in your library. Face tagging is part of most consumer-level services and is also found in some trainable enterprise services.
  • Species shown – Not long ago, it was hard for Artificial Intelligence to tell the difference between a cat and a dog. Now it’s common for services to be able to tell you which breed of cat or dog (as well as to identify many other animals and plants). This is a natural fit for a machine learning project, since plants and animals make a well-categorized training set and there are a lot of apparent use cases.
  • Adult content – Many computational tagging services can identify adult content, which is quite useful for automatic filtering. Of course, notions of what constitutes adult content varies greatly by culture.
  • Readable text – Optical Character Recognition has been a staple of AI services since the very beginning. This is now being extended to handwriting recognition.
  • Natural Language Processing – It’s one thing to be able to read text; it’s another thing to understand its meaning. Natural Language Processing (NLP) is the study of the way that we use language. NLP allows us to understand slang and metaphors in addition to strict literal meaning (e.g. we can understand what the phrase “how much did those shoes set you back?” is really asking). NLP is important in tagging, but even more important in the search process.
  • Sentiment analysis – Tagging systems may be able to add some tags that describe sentiments. (e.g. It’s getting common for services to categorize facial expressions as being happy, sad or mad.) Some services may also be able to assign an emotion tag to images based upon subject matter, such as adding the keyword “sad” to a photo of a funeral.
  • Situational analysis – One of the next great leaps in Computational Tagging will be true machine learning capability for situational analysis. Some of this is straightforward (e.g. “this is a soccer game”). Some is more difficult (“this is a dangerous situation”). At the moment, a lot of situational analysis is actually rule-based (e.g. add the keyword “vacation” when you see a photo of a beach).
  • Celebrities – There is a big market for celebrity photos, and there are excellent training sets.
  • Trademarks and products – Trademarks are also easy to identify, and there is a ready market for trademark identification (e.g. alert me whenever our trademark shows up in someone’s Instagram feed). When you get to specific products, you probably need to have a trainable system.
  • Graphic elements – ML services can evaluate images according to nearly any graphic component, including the shapes and colors in an image. These can be used to find similar images across a single collection or on the web at large. This was an early capability of rule-based AI services, and remains an important goal for both ML and DL services.
  • Aesthetic ranking – Computer vision can do some evaluation of image quality. It can find faces, blinks and smiles. It can also check for color, exposure and composition and make some programmatic ranking assessments.
  • Image Matching services – Image matching as a technology is pretty mature, but the services built on image matching are just beginning. Used on the open web, for instance, image matching can tell you about the spread of an idea or meme. It can also help you find duplicate or similar images within your own system, company or library.
  • Linked data – There is an unlimited body of knowledge about the people, places and events shown in an image collection – far more than could ever be stuffed into a database. Linking media objects to data stacks will be a key tool for understanding the subject matter of a photo in a programmatic context.
  • Data exhaust – I use this term to mean the personal data that you create as you move through the world, which could be used to help understand the meaning and context of an image. Your calendar entries, texts or emails all contain information that is useful for automatically tagging images. There are lots of difficult privacy issues related to this, but it’s the most promising way to attach knowledge specific to the creator to the object.
  • Language Translation – We’re probably all familiar with the ability to use Google Translate to change a phrase from one language to another. Building language translation into image semantics will help to make it a truly transcultural communication system.
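
The rule-based situational analysis mentioned above is easy to illustrate. Here’s a toy sketch (the rules and tag names are invented for illustration): given tags a service has already produced, a rule table adds higher-level keywords.

```python
# Hypothetical rule table: when every trigger tag is present, add the derived tag.
RULES = [
    ({"beach"}, "vacation"),
    ({"ball", "goal"}, "soccer game"),
    ({"funeral"}, "sad"),
]

def apply_rules(tags: set) -> set:
    """Return the input tags plus any keywords the rules derive from them."""
    derived = {keyword for triggers, keyword in RULES if triggers <= tags}
    return tags | derived

print(sorted(apply_rules({"beach", "sand"})))  # ['beach', 'sand', 'vacation']
```

This is the “intelligent-looking but not learned” end of the spectrum: no training set, just rules someone wrote down.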

Update on DAM Book 3

It is with a healthy dose of chagrin that I report that the publication of The DAM Book 3 will be postponed yet again. I have been working on the book full time for the last three months (and quite a bit before that), and it is simply taking a long time to get it done properly.

When I announced an outline and publication date in early September, I was assuming that I could reuse as much as 40% of the copy in the book. As it currently stands, that number is hovering close to 1%. Changes in the digital photography ecosystem and in the book’s scope have driven a need to rewrite everything.

Not only has the rewriting been time-consuming, but the changes in imaging and associated technologies have required a lot of research. I’ve been chasing down a lot of details on topics like artificial intelligence and machine learning, new technologies like depth mapping, and the state of the art in emerging metadata standards. It’s been a lot more work than I anticipated.

We saw a couple of late-breaking changes that have been very important to include in the book. October’s release of a cloud-native version of Lightroom helps to complete the puzzle of where imaging and media management are headed.

Complicating matters, I’m going in for ankle replacement surgery in early December. I’ll be finishing the book while my leg is healing. But the pace at which I can work while recuperating is unknown, so I’m not prepared to make another announcement about publication dates.

In the end, I’ve had to choose between hitting a deadline and making the book be as good as possible. I’ve opted for quality.

Sneak Peek blog posts

I’ve been working with my editor to identify and publish content from the new book as we continue in production. The first series of these posts will provide some insight on Computational Tagging, a subject I first posted about last month.

Computational Tagging – Artificial Intelligence, Machine Learning, and Deep learning

This post is adapted from the forthcoming The DAM Book 3.

There is a lot of hype and hazy discussion about the future of AI, but it’s often very loosely defined. In a previous blog post, I made the case for lumping a lot of this into a category I’m calling Computational Tagging. In this post, I’ll split that into some large component parts. (Read the next post here.)

What’s the difference between Computational Tagging, Artificial Intelligence, Machine Learning, and Deep Learning?

While the definitions of these processes have a lot of overlap, we can draw some useful distinctions. Let’s use a Venn diagram to illustrate the relationships.

Computational tagging refers to any system of automated tagging that is done by a computer. This includes the metadata added by your camera. It also includes information like a Wikipedia page or other network-accessible information that could be added by simple linking.

Artificial Intelligence (AI) encompasses any computer technology that appears to emulate human reasoning. AI could be as simple as a set of rules that create intelligent-looking behavior (e.g. a self-driving car could be taught the “rule” that you don’t want to cross a double yellow line). AI also includes the more complex services outlined below.

Machine Learning (ML) is a subset of AI that is more complex. Instead of just following an established set of rules, in an ML environment the system can be trained to discover the rules. An ML system for identifying species, for instance, uses a training set of tagged images to figure out what a Labrador retriever looks like.

Deep Learning (DL) is a specific type of ML that makes use of a predictive model in its learning process. This process actually mimics the way the brain works. In Deep Learning, the system does not just look at results; it uses a predictive model to train itself. It is constantly testing a hypothesis against results, and adjusting the hypothesis according to those results.
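
This predict-compare-adjust loop can be sketched in miniature. The toy example below is not a real neural network, but it shows the core mechanic: make a prediction, measure the error against reality, and nudge the hypothesis accordingly.

```python
def learn_multiplier(examples, steps=200, rate=0.1):
    """Learn y = w * x from (x, y) pairs by predict-compare-adjust."""
    w = 0.0  # initial hypothesis
    for _ in range(steps):
        for x, y in examples:
            prediction = w * x          # predict what we expect to see
            error = y - prediction      # compare against the real result
            w += rate * error * x       # adjust the hypothesis toward the error
    return w

# Data generated by the hidden rule y = 3x; the loop recovers w close to 3.
w = learn_multiplier([(1, 3), (2, 6), (3, 9)])
print(round(w, 2))  # 3.0
```

Real Deep Learning systems do this with millions of weights at once, but the hypothesis-testing loop is the same idea.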

Here’s how it works in your brain. The central nervous system provides constant input stimulus. Your brain then makes constant predictions about what the next input should be. When the input does not match the prediction, it recalibrates. You experience this process when you taste something you expect to be sweet and it’s salty, or when you take a step and the level of the ground is not where you expect it to be.

Read the next post here.

ICYMI – D850 scan test results

Last month I published the results of my tests of the Nikon D850 “negative digitizer” on PetaPixel. Judging by the dialog and web traffic, a lot of people saw it.  I’m putting up a short synopsis here, along with a link for those who missed it.

The bottom line: great camera for scanning, digitizer not ready yet.

The Nikon D850 is a truly wonderful camera for scanning photographs as I outline in my book Digitizing Your Photos. It offers a significant bump in resolution over the Nikon D800 which I have been using. If you are looking for the highest quality camera scan from a 35mm style DSLR, then the D850 would be an excellent choice (as would the Canon 5DSR, or the Sony a7r II mirrorless).

The camera includes a “negative digitizer” feature which can flip B&W and color negatives into positives in the camera. In theory, this could provide some real workflow advantages over manual conversion. However, there are some problems in the current implementation. Here they are in brief:

  • Clips highlights and shadows too aggressively
  • Can only shoot JPEG, which means highlights and shadows not recoverable
  • Exposure compensation and contrast control disabled in digitizer feature
  • Uncontrolled variation between frames means that batch corrections are not possible
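
That “not recoverable” point is worth spelling out: a JPEG stores each channel as an 8-bit value from 0 to 255, so any highlight driven past the top at capture is stored as exactly 255, identical to every other blown pixel. A tiny sketch of the information loss:

```python
def clip_to_jpeg(value: float) -> int:
    """Quantize a linear sensor value to an 8-bit JPEG level (0-255)."""
    return max(0, min(255, round(value)))

# Three different scene brightnesses, all crushed to the same stored value -
# once written as JPEG, the original differences are gone for good.
levels = [clip_to_jpeg(v) for v in (260.0, 300.0, 500.0)]
print(levels)  # [255, 255, 255]
```

A raw file keeps extra headroom above the rendered white point, which is why raw capture would make these highlights recoverable.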

Blown highlights were a particular problem with flash photos. Because they are JPEG files, they are not recoverable. 

Here are two images from the same negative strip. As you can see, the color is rendered very differently. This makes it impossible to batch correct with a consistent look. 

Nikon’s unofficial response

At the PhotoPlus Expo, I got a chance to have an extended dialog with Nikon representatives. They had seen the PetaPixel article and understood the issues I raised. More important, they indicated that it’s essential to fix these problems. I left the meetings with the impression that there would be a concerted effort by the Nikon USA staff to address these problems.

When I hear back from Nikon, I’ll be sure to post an update on the situation. In the meantime, if you are interested in using your D850 (or any other camera) as a scanner, the best methodology is outlined in my newest book, Digitizing Your Photos.

Lightroom and the Innovator’s Dilemma

Adobe announced some big changes to Lightroom today, including a new cloud-native version (Lightroom CC) as well as a re-branding of the familiar desktop version (Lightroom Classic). Additionally, they have discontinued development of a “perpetual” version and all new versions will be licensed on a subscription basis. What gives?

The Innovator’s Dilemma
Clayton Christensen’s 1997 book, The Innovator’s Dilemma, helps to shed some light on Adobe’s behavior. In the book, Christensen tracks the rise and fall of disruptive innovation: rapid growth of successful products, followed by an eventual leveling off as the market becomes saturated. Eventually, changes in the market landscape allow new competitors to arise, and the incumbent becomes vulnerable to disruptive innovation and the loss of market dominance. A company that does not innovate just as aggressively faces real danger; its very market dominance may prevent it from creating new software while it focuses on maintaining the success of the old product.

The digital photography revolution
Lightroom was, in large part, an earlier response to the innovator’s dilemma. Photoshop was the clear leader in imaging software, but it was developed before the advent of digital cameras. Camera Raw was developed as a companion application to Photoshop to deal with raw files, but ultimately the very structure of Photoshop was incompatible with the needs of busy digital photographers. Its one-at-a-time file handling was insufficient for many workflows.
Lightroom was developed in response to this new market reality. Adobe took the Camera Raw engine from Photoshop and grafted it onto a database, creating one of the most successful applications in the company’s history. Lightroom was developed by a small team working inside Adobe, essentially functioning as competition to the flagship product. If Adobe had put all their effort into shoring up Photoshop, they would be in very serious trouble right now as a preferred tool for digital photographers.

Mobile>digital
We are now at another inflection point, and this one, I believe, is even more transformational. The use of photography as a language, created and consumed on smartphones, has changed the way we communicate. One of the primary needs in this new world is continuous access and connectivity. Dependence on desktop software is incompatible with many of the important uses of photography. Often, we simply can’t wait until we get back to the home or office to send photos. And a great collection of images is frustratingly out of reach if you are away from your computer.
In order to serve the needs of mobile photographic communication, the Lightroom team has spent years working on ways to create an integrated cloud component to Lightroom. Publish Services allow the extension of Lightroom to integrate with a wide variety of other applications, including many cloud offerings. And the introduction of Lightroom Mobile, along with some integration with traditional Lightroom catalogs, offered some seamless interchange.
But the architecture of Lightroom as a desktop application simply cannot be stretched enough to create a great mobile application. The desktop flexibility that powers such a wide array of workflows can’t be shoehorned into full cloud compatibility. The freedom to set up your drives, files and folders as you wish makes a nightmare for seamless access. And the flexibility to create massive keyword and collection taxonomies does not work with small mobile screens. After years of experimentation, the only good answer was the creation of a new cloud-native architecture. As with the creation of the original Lightroom, this was done by taking the existing Camera Raw imaging engine and bolting it onto a new chassis – this time a cloud-native architecture.

Managed file storage
In order to have “my stuff everywhere,” the new application has to be cloud native. The primary storage of your images and videos is now in the cloud. This allows Lightroom to have seamless access on multiple devices. And in order to allow Lightroom to push these files around, you need to give up control over the configuration of folders. By giving that control over to Lightroom, the application itself can help to manage the transfer of files between devices, using downsized versions when storage space is not adequate for full-size copies. (And, yes, you can have a complete full-sized archive on your own drives, which is something I would suggest.)

Computational Tagging
Lightroom has also made a major break with the metadata methods of the past, opting for a computational tagging system. Some of this is familiar – the use of date-time stamps and GPS tags to organize photos. Some is new, like the Machine Learning and Artificial Intelligence tagging that can automagically find images according to content. While these tools and techniques are pretty rudimentary now, we can expect them to mature quickly and continually. (Google Photos, for instance, just announced that they can now identify photos of your pets, and, voila, the tags simply appear.)

Not the end of desktop Lightroom
Just as the advent of Lightroom did not kill Photoshop, the introduction of Lightroom CC will not kill Lightroom Classic. It’s a hugely popular program for an important part of Adobe’s customer base. And creating a cloud-native version of the software, instead of trying to shoehorn the program into a workflow it did not fit, frees up resources to make Lightroom a better desktop application. The Camera Raw development team can continue to make improvements to the engine, and each of the chassis builders – Camera Raw + Bridge, Lightroom Classic and Lightroom CC – can focus on building workflows for their customers’ needs.
There are a number of important uses of Lightroom that are pretty far off for Lightroom CC. Many power users depend on custom keyword taxonomies and deep collection hierarchies, and these may never appear in Lightroom CC. And there are lots of existing integrations of Lightroom through Publish Services that won’t be easy to migrate. There are also a ton of clever and useful Lightroom plugins that may be impossible to add to the cloud version.
For my own workflow, I’ll be sticking to Lightroom Classic as far as the eye can see. But I expect that my wife and kids will be happier with Lightroom CC.

The end of Perpetual Lightroom
There is certain to be some unhappiness with the discontinuation of the perpetual versions of Lightroom. For those who don’t want cloud connectivity or who don’t use smart phones, this change forces them into a subscription service that may be unwanted. I feel your pain.
But the world is changing, and photography is becoming a more important part of it. I’ve spent the last four months working on The DAM Book 3, writing about these tectonic changes, and I’m convinced that mobile imaging (and image consumption) is a driving force. Adobe is in a position to help us take advantage of that change and make the most of it. If they did not accept the evolution of the imaging landscape, they could be in real trouble. As it is, it will still be a challenge to maintain their leadership in such a fast-moving market.
Although Lightroom CC does introduce some black-box functionality, Adobe is still a clear leader in “you own your stuff, and you can take it with you.” I think this attitude, central to Adobe’s products since Geschke and Warnock left Xerox PARC to found the company, remains one of the strongest reasons to use their tools. Mobile and cloud computing have changed the landscape, but this attitude remains intact.
Note – If you want a more granular description of the changes to Lightroom, check out the ever-comprehensive Victoria Bampton’s post here. 

Computational Tagging

In my SXSW panel this year, Ramesh Jain and Anna Dickson and I delved into the implications of Artificial Intelligence (AI) becoming a commodity, which will be a commonplace reality by the end of 2017.  We looked at several classes of services and considered what they were good for.

I’ve been spending a lot of time on the subject over the last few months writing The DAM Book 3. Clearly AI will be important in collection management and the deployment of images for various types of communication.

But I hate using the term AI to describe the array of services that help you make sense of your photos. There’s actually a bunch of useful stuff that is not technically AI. Adding date or GPS info is definitely not AI. And linking to other data (like a Wikipedia page) is not really AI (it’s actually just linking). Machine Learning and programmatic tagging come in a lot of forms – some are really basic, and some are complex.

The term Computational Imaging was pretty obscure when the last version of The DAM Book was published, but it’s become a very common term. I think this is a useful concept to extend to the whole AI/Machine Learning/Data Scraping/Programmatic Tagging stack.

In The DAM Book 3, I’m using the term Computational Tagging to refer to all the computer-based tagging methods that involve some level of automation. This runs from the tags made by the computer in my camera to the sophisticated AI environments of the future. At the moment, it’s not a widely used term (Google shows 138 instances on the web), but I think it’s the best general description for the automatic and computer-assisted tagging that is becoming an essential part of working with images.

Reminder – Two upcoming presentations on scanning

I’ve got two presentations scheduled for October. The first is a free two hour seminar on scanning with a digital camera at the Click! Photo festival in Durham, NC. It takes place 10am-noon Oct 6. Here’s a link.

And I’ll be in New York at PhotoPlus, doing a tag-team presentation with Katrin Eismann called Preserving Your Photographic History. I’ll show how to scan with a digital camera, and then Katrin will demonstrate repair and restoration techniques from her revised book on retouching. Here’s a link for that.

Use this custom landing page for a 15% discount and free show pass.

Testing Nikon D850 for Camera Scans

Nikon sent me a D850 to do some camera scan testing. My initial impression is that it is really promising, but I have not been able to get it to do exactly what I want. It does a pretty good job on most negatives, but it’s having problems with dark images.

I’ll run a number of rolls through it over the weekend and report back. I will say that even if it’s not perfect now, I can tell this is going to be a great solution for camera scanning color negatives, particularly in conjunction with Lightroom.