Convert Scanned PDF Documents to Text with Google OCR
There are two types of PDF documents: those created by sending Office files, images, etc. to an Acrobat-like PDF printer, and those created by scanning physical paper, such as pages of a book or legal documents.
Since scanned PDFs are nothing but images, don’t be surprised if Google adds a “search by text” function to its Image Search engine, similar to OneNote or Evernote. That would surely be huge.
Now, if you have a bunch of scanned PDF files on your hard drive and no OCR software, here’s what you can do to convert them into recognizable text.
Create a folder on your website (say abc.com/pdf) and upload all the scanned PDF files to that folder. Now create a public web page that links to all the PDF files. Then wait for the Google bots to spider your stuff.
Once the files are indexed, type the query “site:abc.com/pdf filetype:pdf” into Google to see the PDF documents as HTML.
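If the folder holds more than a handful of files, writing the public page of links by hand gets tedious. Here is a minimal Python sketch that builds such a page from a local folder of PDFs. The function name build_index, the page title, and the https:// scheme are illustrative assumptions; abc.com/pdf is the placeholder address used above.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(pdf_dir: str, base_url: str) -> str:
    """Return a simple HTML page that links to every .pdf file in pdf_dir.

    base_url is the public address of the folder, e.g. the placeholder
    https://abc.com/pdf (an assumption; use your own site here).
    """
    base = base_url.rstrip("/")
    items = []
    # Sort so the page is stable from run to run.
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # Percent-encode the file name so spaces etc. produce a valid URL.
        href = f"{base}/{quote(pdf.name)}"
        items.append(
            f'<li><a href="{html.escape(href)}">{html.escape(pdf.name)}</a></li>'
        )
    links = "\n".join(items)
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>Scanned PDFs</title></head>\n"
        f"<body>\n<ul>\n{links}\n</ul>\n</body></html>\n"
    )
```

Call build_index("pdf", "https://abc.com/pdf"), save the returned string as index.html, and upload it alongside the PDF files so the crawler has a page to follow the links from.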
Google Developer Expert, Google Cloud Champion
Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India.