Extract Text from PDF files with Google Apps Script

You can use Google Apps Script to extract text from a PDF file and save the extracted text as a new document in Google Drive. The new document also retains the simple formatting of the PDF file.

The following script shows how to use the Google Drive API as an OCR engine to extract text from a PDF file on the Internet. The code can be modified to convert PDF files already stored in Google Drive into editable documents.

function extractTextFromPDF() {
  // PDF File URL
  // You can also pull PDFs from Google Drive
  var url = 'https://img.labnol.org/files/Most-Useful-Websites.pdf';

  var blob = UrlFetchApp.fetch(url).getBlob();
  var resource = {
    title: blob.getName(),
    mimeType: blob.getContentType(),
  };

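  // Requires the Advanced Drive Service (Drive API v2) to be enabled for this project.
  // With ocr: true, Drive converts the upload into a Google Doc containing the recognized text.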
  var file = Drive.Files.insert(resource, blob, { ocr: true, ocrLanguage: 'en' });

  // Extract Text from PDF file
  var doc = DocumentApp.openById(file.id);
  var text = doc.getBody().getText();

  return text;
}
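
The same approach can be adapted for a PDF that is already in Google Drive. The following is a minimal sketch, not a definitive implementation: the function name extractTextFromDrivePDF and its fileId parameter are hypothetical, and it assumes the Advanced Drive Service (Drive API v2) is enabled, as in the script above.

```javascript
// Sketch: run OCR on a PDF that is already stored in Google Drive.
// Assumes the Advanced Drive Service (Drive API v2) is enabled.
function extractTextFromDrivePDF(fileId) {
  // fileId is a hypothetical parameter: the Drive ID of the source PDF
  var blob = DriveApp.getFileById(fileId).getBlob();
  var resource = {
    title: blob.getName(),
    mimeType: blob.getContentType(),
  };

  // Same OCR upload as in the URL-based script; creates a new Google Doc
  var file = Drive.Files.insert(resource, blob, { ocr: true, ocrLanguage: 'en' });

  // Read the recognized text back from the new Google Doc
  return DocumentApp.openById(file.id).getBody().getText();
}
```

Call it with the ID of a PDF in your Drive, for example from another function that logs the result with Logger.log. The original PDF is left untouched; the text lives in the newly created document.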

The Google Drive API can perform OCR on JPG, PNG, GIF and PDF files. Set the ocrLanguage property to a language code (such as 'en' in the script above) to tell the OCR engine which language to expect.

Combine this with the doGet method and you have an HTTP REST API that can perform OCR on any web document with a simple GET request. This can be modified to work with file upload forms as well.
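
One way the doGet idea could look is sketched below. The query parameter names url and lang, and the helper functions ocrFromUrl and jsonOutput, are assumptions made for this example rather than part of the original script. It also assumes the Advanced Drive Service (Drive API v2) is enabled and the script is deployed as a web app.

```javascript
// Sketch of a web app endpoint: GET <web-app-url>?url=<file-url>&lang=<code>
// The "url" and "lang" parameter names are assumptions for this example.
function doGet(e) {
  var params = (e && e.parameter) || {};
  if (!params.url) {
    return jsonOutput({ error: 'Missing "url" parameter' });
  }
  try {
    // Default to English when no language is supplied
    var text = ocrFromUrl(params.url, params.lang || 'en');
    return jsonOutput({ text: text });
  } catch (err) {
    return jsonOutput({ error: String(err) });
  }
}

// Hypothetical helper: fetch a file and run it through Drive OCR (Drive API v2)
function ocrFromUrl(url, language) {
  var blob = UrlFetchApp.fetch(url).getBlob();
  var resource = { title: blob.getName(), mimeType: blob.getContentType() };
  var file = Drive.Files.insert(resource, blob, { ocr: true, ocrLanguage: language });
  return DocumentApp.openById(file.id).getBody().getText();
}

// Hypothetical helper: wrap an object as a JSON response
function jsonOutput(payload) {
  return ContentService.createTextOutput(JSON.stringify(payload))
    .setMimeType(ContentService.MimeType.JSON);
}
```

Note that every request creates a new Google Doc in the Drive of the account running the script, so you may want to clean up those files periodically.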

Amit Agarwal

Google Developer Expert, Google Cloud Champion

Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India.

Amit has developed several popular Google add-ons, including Mail Merge for Gmail and Document Studio.
