Category Archives: Camera Scanning

Using Google Cloud Vision for OCR

Editor’s Note: This post combines a couple of threads I’ve been writing about. I’ll provide some real-world methods for converting visible text to searchable metadata, as discussed in Digitizing Your Photos. Along the way, I’ll also flesh out a real-world workflow for Computational Tagging, as discussed in The DAM Book 3 Sneak Peeks.

In the book Digitizing Your Photos, I made the case for digitizing textual documents as part of any scanning project. This includes newspaper clippings, invitations, yearbooks and other memorabilia. These items can provide important context for your image archive and the people and events that are pictured.

Ideally, you’ll want to convert the visible text into searchable metadata. Instead of typing it out, you can use Optical Character Recognition (OCR) to automate the process. OCR is one of the earliest Machine Learning technologies, and it’s commonly found in scanners, fax machines and PDF software. But there have not been easy ways to automatically convert OCR text into searchable image metadata.

In Digitizing Your Photos, I show how you can manually run images through Machine Learning services and convert any text in the image into metadata by cutting and pasting. I also promised to post new methods for automating this process as I found them. Here’s the first entry in that series.

The Any Vision Lightroom Plugin
I’ve been testing a Lightroom plugin that automates the process of reading visible text in an image and pasting it into a metadata field. Any Vision from developer John Ellis uses Google’s Cloud Vision service to tag your images for several types of information, including visible text. You can tell Any Vision where you want the text to be written, choosing one of four fields, as shown below.

Here is part of the Any Vision interface, with only OCR selected. As you can see, you can send any text it finds to the Caption, Headline, Title or Source field. I have opted to use the Headline field myself, since I don’t use it for anything else.
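For the curious, here is a rough sketch of the kind of request a tool like Any Vision sends to Google. This is not Any Vision’s code (it’s a Lightroom plugin, not Python). It’s a minimal illustration assuming Google’s public Cloud Vision REST endpoint (`images:annotate`), authentication with an API key, and the `TEXT_DETECTION` feature. The function names are my own, and writing the result into the Headline field is left to whatever tool you use.

```python
import base64
import json
import urllib.request

# Public Cloud Vision REST endpoint. Using an API key for auth is an assumption;
# a service account also works but takes more setup.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_text_request(image_bytes: bytes, feature: str = "TEXT_DETECTION") -> dict:
    """Build an images:annotate request body that asks only for OCR."""
    return {
        "requests": [
            {
                # Image bytes travel inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return all text found in the first image, or '' if nothing was read."""
    first = (response.get("responses") or [{}])[0]
    # fullTextAnnotation.text holds the whole page as one string.
    full = first.get("fullTextAnnotation", {}).get("text")
    if full:
        return full.strip()
    # Fallback: the first textAnnotations entry also covers the whole image.
    annotations = first.get("textAnnotations", [])
    return annotations[0].get("description", "").strip() if annotations else ""


def annotate(image_path: str, api_key: str) -> str:
    """Send one image to Cloud Vision and return its text (needs network access)."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_text_request(f.read())).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

Calling `annotate("clipping.jpg", "YOUR_API_KEY")` would return the detected text as one string, ready to be written to a field like Headline. The file name and key are placeholders.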

Results
Here are my findings, in brief:

  • Text that appears in real-life photos (as opposed to copies of textual documents) might be readable, but the results seem a lot less useful.
  • Google does a very good job reading text on many typewritten or typeset documents. If you have scanned clippings or a scrapbook, yearbook or other typeset documents, standard fonts seem to be read reasonably well.
  • Google mostly did a poor job of organizing columns of text. It simply read across the columns as though they were one long line of nonsensical text. Microsoft Cognitive Services does a better job, but I’m not aware of an easy way to bring its results into image metadata.
  • Handwriting is typically ignored.
  • For some reason, the translate function did not work for me. I was scanning some Danish newspapers and the text was transcribed but not translated. I will test this further.
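Google’s response does carry layout information, so in principle columns can be kept apart. Here is a hedged sketch, assuming the `fullTextAnnotation` structure (pages, blocks, paragraphs, words, symbols) that Cloud Vision returns; Google describes its `DOCUMENT_TEXT_DETECTION` feature as optimized for dense text and documents. Whether a newspaper or yearbook column actually lands in its own block depends on Google’s page segmentation, which I haven’t tested, and the `text_by_block` helper is hypothetical, not something Any Vision offers.

```python
def text_by_block(full_text_annotation: dict) -> list[str]:
    """Rebuild OCR text one layout block at a time instead of one long run.

    Simplification: words are joined with single spaces, so punctuation that
    Google returns as its own word may pick up an extra space.
    """
    blocks = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            paragraphs = []
            for para in block.get("paragraphs", []):
                # Each word is a list of symbols (usually single characters).
                words = [
                    "".join(s.get("text", "") for s in word.get("symbols", []))
                    for word in para.get("words", [])
                ]
                paragraphs.append(" ".join(words))
            text = "\n".join(p for p in paragraphs if p)
            if text:
                blocks.append(text)
    return blocks
```

Storing the blocks separated by blank lines would keep a column’s names and phrases together, which is what makes them findable in a search.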

Examples

Let’s start with an image that shows why I’m targeting the Headline field rather than the Caption field. This image by Paul H. J. Krogh already has a caption, and adding a bunch of junk to it would not help anybody.
You can also see that the sign in the background is partially recognized, but lettering in red is not seen and player numbers are ignored even though they are easily readable.

In the example below, from my mother’s Hollins College yearbook, you can see that the text is read straight across, creating a bit of nonsense. However, since the text is searchable, this would still make it easy to find individual names or unbroken phrases in a search of the archive.
You can also see that the handwriting on the page is not picked up at all.

In the next example, you can see that Google was able to use the boxes of text, changes of font and underlining to help parse the text more accurately.

And in this last example you can see that Google is having a terrible time with the gothic font in this certificate, only picking out a small fraction of letters properly. 

The Bottom Line
If you have a collection of scanned articles or other scanned textual documents in Lightroom, this is a great way to make visible text searchable. While Google is not the best OCR, thanks to Any Vision, it’s the easiest way I know of to add the text to image metadata automatically.

Any Vision is pretty geeky to install and use, but the developer has laid out very clear instructions for getting it up and running and for signing up for a Google Cloud Vision account. Read all about it here.

Cost
Google’s Cloud Vision is very inexpensive: it’s priced at $0.0015 per image (which works out to $1.50 per 1,000 images). Google will currently give you a $300 credit when you create an account, so you can test this very thoroughly before you run up much of a bill.
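The arithmetic is easy to check, and it shows how far the credit goes. This sketch uses the flat $0.0015 per-image figure quoted above and ignores any free tier or volume pricing; if you turn on several Any Vision tagging features at once, Google may bill each one separately, so check the current pricing page before budgeting a big archive.

```python
from decimal import Decimal

# Flat per-image price quoted in this post (an assumption; Google's pricing changes).
PRICE_PER_IMAGE = Decimal("0.0015")


def ocr_cost(images: int, price: Decimal = PRICE_PER_IMAGE) -> Decimal:
    """Dollar cost of running OCR on `images` images at a flat per-image price."""
    return images * price


def images_covered(credit: Decimal, price: Decimal = PRICE_PER_IMAGE) -> int:
    """How many whole images a given credit pays for at a flat per-image price."""
    return int(credit // price)


print(ocr_cost(1000))                # 1.5000
print(images_covered(Decimal(300)))  # 200000
```

Decimal is used instead of float so that money amounts don’t pick up rounding noise. At this rate, the $300 credit would cover about 200,000 images.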

Watch for an upcoming post where I outline some of the other uses of Any Vision’s tagging tools.

Camera Scanning the Nixon Legacy

I had the pleasure of visiting the National Archives and Records Administration (NARA) early this month, meeting with Steve Greene and Cary McStay, who are in charge of scanning the official photos from the Nixon administration. They are using a digital camera to do the scanning, in much the same way as I outline in Digitizing Your Photos.

The film from the White House Photo Office was transferred to NARA to ensure that it would be preserved. (This was done along with the audio tapes, which are still being digitized.) There were 258,318 images on 14,526 rolls. Most of it is 35mm B&W or C-22 negative film. There is also some 4×5 film.

NARA first used conventional scanners for the project, but it became clear that conventional scanning would take too long. They began testing camera scans and became confident that the quality coming out of a D810 was high enough for the vast majority of uses of the archive.

Shown below are some of the photos that were scanned as part of this project. Eventually, the images will be transferred to the Nixon Library in Yorba Linda, where they will be available to scholars, authors and others interested in this period in our history.

March 31, 1969. Various interior scenes from National Cathedral during former President Eisenhower’s funeral services.
March 11, 1969. A musical group of Girl Scouts entertain guests and members of the press at a ceremony donating land to the Girl Scouts of America. NARA is scanning the film with a film stage that allows the original frame numbers to be seen in the scan.
February 19, 1969. White House photographer Robert Knudsen and Photo Office manager Buck May, sitting in the White House Photo Office. Various portraits and moments of President Nixon, family members, the inauguration, and Nixon with Heads of State, taken by WHPO photographers, are tacked on the wall behind them.
February 6, 1969. President Nixon standing and gesturing while speaking at a press conference.
June 28, 1970. Pat Nixon speaks with the press corps aboard Air Force One en route to Lima, Peru with earthquake disaster relief supplies.
May 20, 1971. President Nixon at a podium announcing an agreement between the governments of the United States and the Soviet Union on the Strategic Arms Limitation Talks (SALT). These remarks were being broadcast by radio and television.
February 28, 1969. A cutaway shot of a spectator holding a Stars and Stripes umbrella.
March 20, 1969. President Nixon meeting with White House News Photographers Association (WHNPA) contest winners and WHNPA President Sam Stearns in the Roosevelt Room.

PS – NARA is also digitizing the Nixon audio tapes. These are going more slowly due to a number of factors, and are still in production.

The window into a hallway at NARA counts down the remaining Nixon tapes to digitize. I had not noticed it at the time, but there is a nice collection of Nixon-themed figurines sitting on the window sill, including the presidential series of Pez dispensers and Futurama figurines.

ICYMI – D850 scan test results

Last month I published the results of my tests of the Nikon D850 “negative digitizer” on PetaPixel. Judging by the dialog and web traffic, a lot of people saw it. I’m putting up a short synopsis here, along with a link for those who missed it.

The bottom line: great camera for scanning, digitizer not ready yet.

The Nikon D850 is a truly wonderful camera for scanning photographs, as I outline in my book Digitizing Your Photos. It offers a significant bump in resolution over the Nikon D800 I have been using. If you are looking for the highest quality camera scan from a 35mm-style DSLR, the D850 would be an excellent choice (as would the Canon 5DS R or the Sony a7R II mirrorless).

The camera includes a “negative digitizer” feature which can flip B&W and color negatives into positives in the camera. In theory, this could provide some real workflow advantages over manual conversion. However, there are some problems in the current implementation. Here they are in brief:

  • Clips highlights and shadows too aggressively
  • Shoots JPEG only, so clipped highlights and shadows are not recoverable
  • Exposure compensation and contrast control are disabled in the digitizer feature
  • Uncontrolled variation between frames makes batch corrections impossible

Blown highlights were a particular problem with flash photos. Because they are JPEG files, they are not recoverable. 

Here are two images from the same negative strip. As you can see, the color is rendered very differently. This makes it impossible to batch correct with a consistent look. 
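The batch-correction problem is easiest to see with a simplified model of negative inversion. In this sketch (my own illustration, not Nikon’s algorithm), every frame is inverted using per-channel readings of the clear film base and the densest part of the roll, measured once and reused for the whole strip. Because the same numbers go into every frame, frames from one strip come out consistently and one correction can be synced across them; an in-camera conversion that varies from frame to frame loses that consistency. Pixel values are assumed to be linear and scaled 0 to 1, and real conversions usually add a tone curve on top of this.

```python
def invert_pixel(rgb, film_base, dmax):
    """Convert one linear RGB negative pixel (values 0..1) into a positive.

    film_base: per-channel reading of clear film base (the brightest part of
               the negative, which becomes black in the positive).
    dmax:      per-channel reading of the densest part of the roll (the darkest
               part of the negative, which becomes white).
    """
    out = []
    for value, base, dense in zip(rgb, film_base, dmax):
        if base <= dense:
            raise ValueError("film base must read brighter than the densest area")
        # Linear rescale: film base -> 0.0, densest area -> 1.0, then clip to 0..1.
        t = (base - value) / (base - dense)
        out.append(min(1.0, max(0.0, t)))
    return tuple(out)


def invert_roll(frames, film_base, dmax):
    """Invert every frame with the SAME roll-level settings so results stay consistent."""
    return [[invert_pixel(p, film_base, dmax) for p in frame] for frame in frames]
```

The per-channel film base reading is what removes the orange mask of color negative film in this simple model; the specific values would come from measuring an unexposed edge of your own roll.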

Nikon’s unofficial response

At the PhotoPlus Expo, I got a chance to have an extended dialog with Nikon representatives. They had seen the PetaPixel article and understood the issues I raised. More important, they indicated that it’s essential to fix these problems. I left the meetings with the impression that there would be a concerted effort by the Nikon USA staff to address these problems.

When I hear back from Nikon, I’ll be sure to post an update on the situation. In the meantime, if you are interested in using your D850 (or any other camera) as a scanner, the best methodology is outlined in my newest book, Digitizing Your Photos.

Scanning your photos with a digital camera – a comprehensive guide by Peter Krogh

Reminder – Two upcoming presentations on scanning

I’ve got two presentations scheduled for October. The first is a free two hour seminar on scanning with a digital camera at the Click! Photo festival in Durham, NC. It takes place 10am-noon Oct 6. Here’s a link.

And I’ll be in New York at PhotoPlus, doing a tag-team presentation with Katrin Eismann called Preserving Your Photographic History. I’ll show how to scan with a digital camera, and then Katrin will demonstrate repair and restoration techniques from her revised book on retouching. Here’s a link for that.

Use this custom landing page for a 15% discount and free show pass.

Testing Nikon D850 for Camera Scans

Nikon sent me a D850 to do some camera scan testing. My initial impression is that it is really promising, but I have not been able to get it to do exactly what I want. It does a pretty good job on most negatives, but it’s having problems with dark images.

I’ll run a number of rolls through it over the weekend and report back. I will say that even if it’s not perfect now, I can tell this is going to be a great solution for camera scanning color negatives, particularly in conjunction with Lightroom.

Nikon D850 – Built as a scanner?

The Nikon D850 has been announced, and it looks like a heck of a nice camera. The headline stuff includes everything we’ve come to expect from the next magical generation of digital SLR cameras – 45 megapixels, 7 frames per second, ISO 25,600, 8k video, touch screen, and so much more.

But tucked away on page 85 of the PDF brochure is this: a negative digitizer! Apparently the camera has a built-in algorithm for flipping negatives positive.

Some seasoned photographers may be exploring ways to convert their film assets created with old cameras into digital data. Taking advantage of its high-pixel count of 45 megapixels, the D850 offers an option for digitizing film (35mm-format), which can handle color and monochrome negatives. First, set an optional ES-2 Film Digitizing Adapter onto a lens such as the AF-S Micro NIKKOR 60mm f/2.8G ED attached to the D850. Then, insert the film to be digitized in an FH-4 Strip Film Holder or FH-5 Slide Mount Holder, and shoot. The camera’s digitizing function automatically reverses the colors and stores them as JPEG images. This once time-consuming process involving a film scanner can be done much more quickly. You can enjoy pictures with family and friends while selecting and digitizing by displaying them on a large TV monitor connected via an HDMI cable. Enjoy your old film images by digitizing them with the D850.

There are several items above that I’d love to test. If it could handle color negatives reasonably well, this could be a major workflow improvement for camera scanning.

I don’t really like the ES-2, but there’s no reason you couldn’t use a rail system or a copystand and lightbox instead.

In any case, it’s very exciting to see Nikon acknowledge this missing market niche, especially when so many photographers have mourned the loss of the Nikon scanner line.

Photo Scanning Webinar


September 13th, I’m presenting at B&H’s Event Space in NYC to share techniques from my new book Digitizing Your Photos with your Camera and Lightroom. You can come see it live if you’re in New York, or see it on the web.

I’ll be presenting material from my book on scanning photos with a digital camera. In the webinar we’ll cover:

  • The camera scanning advantage
  • Hardware setups for scanning prints, slides and negatives
  • How to ensure top quality
  • Using Lightroom for camera scans
  • Tagging your images
  • Publishing and sharing your scans

When: Wednesday, September 13, 2017, 1:00p – 3:00p
Skill Level: Basic, Intermediate, Advanced – Everyone will get something out of it
Location: B&H Event Space
Address: Second Floor of B&H NYC SuperStore at 420 9th Avenue, New York NY 10001

Register Here

FYI
All of their events are FREE! If you want to guarantee a seat for an event, please register ASAP. Their events can fill up fast.

Can’t get to NY? The event will be streamed. Register to watch online.

Not available on the 13th? B&H will post the video on their website.

Other questions? See B&H’s FAQ for Event Space details.

New high-CRI lights for film scanning

LED lighting is a fast-moving product landscape, with prices plummeting and quality increasing faster than anything I’ve ever seen in photography. I was over at B&H last week getting things set up for my September 13 presentation at the Event Space, and I took the opportunity to look at the LED lights on display. I found a nice little unit from Dracast that should be great for camera scanning transparencies on a rail system. At $68, it looked well made. Even better, it listed a CRI of 95, which is very high for a light at this price.

I was talking to Gary on the sales floor and wondered aloud whether this light was really 95 CRI. He smiled and said he’d be back in a minute. When he returned, he had a $2,200 Sekonic C‑7000 SpectroMaster color meter in hand. “Let’s find out,” he said. We took readings of the light and, sure enough, it showed a CRI over 97.

I’m going to buy one of these lights and take it for a spin. Note that because this light is designed to be used on-camera, it does not come with an AC adapter. I checked with the company and they tell me it takes a 12 volt 10 watt power supply. I have a bunch of old 12 volt power supplies lying around, so I’ll test with these when the light arrives.
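Before plugging in an old adapter, it’s worth doing the arithmetic: a 10 watt load at 12 volts draws 10 / 12 ≈ 0.83 amps, so the supply’s label should read 12 V and at least 0.83 A. Here’s a tiny check that encodes that rule. It only covers voltage and current; plug size and polarity (center positive or center negative) are things you’d still need to confirm against Dracast’s spec, and the function name is just my own.

```python
def supply_is_adequate(supply_volts, supply_amps, load_volts=12.0, load_watts=10.0):
    """True if a DC supply matches the load voltage and can deliver the load's current.

    Does NOT check plug size or polarity; confirm those separately.
    """
    required_amps = load_watts / load_volts  # 10 W / 12 V is about 0.83 A
    return supply_volts == load_volts and supply_amps >= required_amps


print(supply_is_adequate(12, 1.0))  # True: 1 A covers the ~0.83 A draw
print(supply_is_adequate(12, 0.5))  # False: too little current
```

A supply rated for more current than the light needs is fine; the light only draws what it uses. The voltage is the number that has to match.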

There are several variations of the Camlux light from Dracast, in daylight or bicolor versions. For camera scanning, I’m not interested in the bicolor models, but they would be useful for shooting. Here are the links:

160 LED Bicolor $69
160 LED Daylight $68
160 LED Bi-color with battery and charger $89

ASMP Webinar July 26 – Digitizing Photo Archives

I’m happy to be back in the ASMP fold, doing a webinar next week on digitizing photo collections. Of course this will be based on our new book, Digitizing Your Photos, but with a special emphasis on the relevance to professional photographers.

I’ll be demonstrating how camera scanning can allow for large-scale conversion of film and print originals to digital images, which is important for those of us who have large film archives. I’ve digitized more than 50,000 of my own images, and continue to add new images.

I’ll also be touching on new services that photographers can offer their clients. There are a lot of companies and institutions with large collections of physical photos, and I’ve been able to help some of my clients with the process as part of my professional services. I’ll discuss some business models for adding these services.