Convert HTML Content into Plain Text

Published in: JavaScript

Say you have an HTML snippet and you would like to extract the plain text from it without any of the HTML tags. This can come in handy when you are sending mail through a program that doesn’t support HTML mail.

The easiest way is to strip all the HTML tags using JavaScript's string replace() method with a regular expression. The pattern matches every opening or closing tag enclosed in angle brackets and replaces it with a space.

var text = html.replace(/<\/?[^>]+>/ig, " ");
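To see how the pattern behaves, here is a small sketch that runs it against three sample inputs. The helper name stripTags and the sample strings are assumptions made for illustration. The first input is well-formed, while the other two are inputs where the output is probably not what you want.

```javascript
// Hypothetical helper wrapping the same pattern; the name is an assumption.
function stripTags(html) {
  return html.replace(/<\/?[^>]+>/g, " ");
}

// Well-formed markup: tags become spaces, so collapse and trim afterwards.
var clean = stripTags("<h1>Title</h1><p>Some <em>plain</em> text</p>")
  .replace(/\s+/g, " ")
  .trim();
console.log(clean); // "Title Some plain text"

// Entities are not tags, so they pass through undecoded.
var withEntities = stripTags("<p>Tom &amp; Jerry</p>").trim();
console.log(withEntities); // "Tom &amp; Jerry"

// A ">" inside an attribute value ends the match early and leaks markup.
var leaky = stripTags('<a title="a > b">link</a>').trim();
console.log(leaky); // 'b">link'
```

Because every removed tag leaves a space behind, collapsing whitespace after the replacement is usually worth the extra step.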

The problem with the above approach is that it can fail on malformed HTML, and it leaves HTML entities (such as &amp; for an ampersand or &mdash; for a dash) undecoded in the output. The workaround is simple though: let the browser parse the HTML and then read back the text.

   function htmlToText(html) {
     // Parse the markup inside a detached element, then read its text
     var temp = document.createElement("div");
     temp.innerHTML = html;
     return temp.textContent || temp.innerText || "";
   }
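
One caveat: assigning untrusted markup to innerHTML, even on a detached element, can still trigger resource loads and inline handlers such as onerror on an image. A possible alternative, sketched below under the assumption that the environment provides DOMParser, is to parse the markup into a separate document, which browsers generally treat as inert. The function name htmlToPlainText and the fallback branch for environments without a DOM are assumptions for illustration, not part of the original snippet.

```javascript
// Hypothetical helper; the name and the fallback branch are assumptions.
function htmlToPlainText(html) {
  if (typeof DOMParser === "undefined") {
    // No DOM here (for example plain Node.js): fall back to tag stripping.
    // Entities stay undecoded in this branch.
    return html.replace(/<\/?[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  }
  // Parse into a separate document rather than into the live page.
  var doc = new DOMParser().parseFromString(html, "text/html");
  return (doc.body.textContent || "").trim();
}

console.log(htmlToPlainText("<p>Hello <b>world</b></p>")); // "Hello world"
```

In a browser the DOM branch also decodes entities, which the regular expression fallback does not.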


Meet the Author

Web Geek, Google Developer Expert
Amit Agarwal

Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India. He is the developer of Mail Merge for Gmail and Document Studio. Read more on Lifehacker and YourStory.
