Performing OCR with Google Search vs Commercial OCR Software

I earlier recommended using the built-in OCR (Optical Character Recognition) engine of Google Web Search to convert scanned PDFs into text. You had to upload the scanned documents to a website and then wait for Google bots to index them.

Now assuming that you know how to extract text from scanned PDF images via Google OCR, the next important question is how good (and reliable) is Google’s text recognition technology vis-a-vis other commercial OCR software like Abbyy FineReader or Adobe Acrobat Professional.

For comparison sake, I chose this scanned PDF* as it contains a mix of tables, images and text of different sizes. The resolution of the scanned paper document is fairly poor as you can easily make it out from the document snapshot:

Scanned PDF for Text Recognition

*The PDF document was initially available on the Hindu website from where Google crawlers picked up the document and converted it into a HTML version.

Google OCR

This is the digitized version of the scanned PDF created using Google OCR.

Google’s software (or rather web search engine) could successfully recognize most of the text and tables in the scanned image though, as expected, it did skip the images in the PDF document. There were a couple of junk characters included in the extracted version but I think that’s more due to the poor scan resolution.

OCR in Adobe Acrobat

I then tried using the OCR feature of Adobe Acrobat to extract text from the scanned PDF and here’s the resulting Word document.

Acrobat could recognize pages in the PDF document that had images and exported these pages as such to Microsoft Word. In some cases, it even recognized the text captions beneath the images and exported them as searchable text but overall, the results were too disappointing. The formatting was not preserved on most pages and there were just too many junk characters added to the extracted version.

Abbyy FineReader OCR

After Acrobat, I used Abbyy FineReader to digitize the scanned PDF and here’s the result. Abbyy, being a commercial OCR software, delivered the best performance - it retained the layout on almost every page, removed unnecessary line breaks and added minimal number of junk characters to just a few pages.

There’s however one area where Google OCR software definitely scored above Abbyy FineReader - recognizing image captions. One of the pages in the scanned PDF had around six images with text captions  - FineReader recognized the whole page as one image while Google OCR could extract all these individual captions as text. And when compared with Adobe Acrobat, Google OCR was definitely a better choice.

Google’s online OCR is both free and requires no installation. If you have access to a public web server and can afford to wait for a couple of days for Google to convert your scanned PDF files, there’s really no need to hunt for free OCR alternatives anymore.

Also see: Software Tools for a Paperless Office

Amit Agarwal

Amit Agarwal

Google Developer Expert, Google Cloud Champion

Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India.

Amit has developed several popular Google add-ons including Mail Merge for Gmail and Document Studio. Read more on Lifehacker and YourStory

0

Awards & Titles

Digital Inspiration has won several awards since it's launch in 2004.

Google Developer Expert

Google Developer Expert

Google awarded us the Google Developer Expert award recogizing our work in Google Workspace.

ProductHunt Golden Kitty

ProductHunt Golden Kitty

Our Gmail tool won the Lifehack of the Year award at ProductHunt Golden Kitty Awards in 2017.

Microsoft MVP Alumni

Microsoft MVP Alumni

Microsoft awarded us the Most Valuable Professional (MVP) title for 5 years in a row.

Google Cloud Champion

Google Cloud Champion

Google awarded us the Champion Innovator title recognizing our technical skill and expertise.

Email Newsletter

Sign up for our email newsletter to stay up to date.

We will never send any spam emails. Promise.