Performing OCR with Google Search vs Commercial OCR Software

Published in: OCR - PDF

I earlier recommended using the built-in OCR (Optical Character Recognition) engine of Google Web Search to convert scanned PDFs into text. You had to upload the scanned documents to a website and then wait for Google bots to index them.

Now assuming that you know how to extract text from scanned PDF images via Google OCR, the next important question is how good (and reliable) is Google’s text recognition technology vis-a-vis other commercial OCR software like Abbyy FineReader or Adobe Acrobat Professional.

For comparison sake, I chose this scanned PDF* as it contains a mix of tables, images and text of different sizes. The resolution of the scanned paper document is fairly poor as you can easily make it out from the document snapshot:

Scanned PDF for Text Recognition

*The PDF document was initially available on the Hindu website from where Google crawlers picked up the document and converted it into a HTML version.

Google OCR

This is the digitized version of the scanned PDF created using Google OCR.

Google’s software (or rather web search engine) could successfully recognize most of the text and tables in the scanned image though, as expected, it did skip the images in the PDF document. There were a couple of junk characters included in the extracted version but I think that’s more due to the poor scan resolution.

OCR in Adobe Acrobat

I then tried using the OCR feature of Adobe Acrobat to extract text from the scanned PDF and here’s the resulting Word document.

Acrobat could recognize pages in the PDF document that had images and exported these pages as such to Microsoft Word. In some cases, it even recognized the text captions beneath the images and exported them as searchable text but overall, the results were too disappointing. The formatting was not preserved on most pages and there were just too many junk characters added to the extracted version.

Abbyy FineReader OCR

After Acrobat, I used Abbyy FineReader to digitize the scanned PDF and here’s the result. Abbyy, being a commercial OCR software, delivered the best performance - it retained the layout on almost every page, removed unnecessary line breaks and added minimal number of junk characters to just a few pages.

There’s however one area where Google OCR software definitely scored above Abbyy FineReader - recognizing image captions. One of the pages in the scanned PDF had around six images with text captions  - FineReader recognized the whole page as one image while Google OCR could extract all these individual captions as text. And when compared with Adobe Acrobat, Google OCR was definitely a better choice.

Google’s online OCR is both free and requires no installation. If you have access to a public web server and can afford to wait for a couple of days for Google to convert your scanned PDF files, there’s really no need to hunt for free OCR alternatives anymore.

Also see: Software Tools for a Paperless Office

📮  Subscribe to our Email Newsletter for Google tips and tutorials!
Published in: OCR - PDF

Looking for something? Find here!

Meet the Author

Web Geek, Google Developer Expert
Amit Agarwal

Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India. He is the developer of Mail Merge for Gmail and Document Studio. Read more on Lifehacker and YourStory

Get in touch

Google Add-ons

Do more with your Gmail and GSuite account

We build bespoke solutions that use the capabilities and the features of Google Workspace for automating business processes and driving work productivity.

  1. Mail Merge with Attachments
    Send personalized email to your Google Contact with a Google Sheet and Gmail
  2. Save Emails and Attachments
    Download email messages and file attachments from Gmail to your Google Drive
  3. Google Forms Email Notifications
    Send email notifications to multiple people when a new Google Form is submitted
  4. Document Studio
    Create beautiful pixel perfect documents merging data from Google Sheets and Google Forms
  5. Creator Studio for Google Slides
    Turn your Google Slides presentations into animated GIFs and videos for uploading to YouTube