Scrape Web Pages with YQL and Apps Script

Published in: Google Apps Script

Some web services, such as Google Search and Amazon price listings, may not offer an API, and those that do may not expose every detail shown on their web pages. In such cases, you can combine YQL (Yahoo Query Language) with Google Apps Script to scrape the data directly from the pages.

You need to specify the URL of the page that you wish to scrape and the XPath of the element to extract. If you are not familiar with XPath, inspect the element in Chrome Dev Tools, right-click the node in the DOM tree and choose Copy XPath (see screenshot).

[Screenshot: Copy XPath in Chrome Dev Tools]
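The two inputs described above, the page URL and the XPath, can be wrapped in a small reusable helper. This is only a sketch: `buildYqlUrl` is a hypothetical name, and the code assumes that neither argument contains a single quote, since it does no escaping inside the YQL statement.

```javascript
// Hypothetical helper: builds the YQL request URL for a page URL and an XPath.
// Assumes neither argument contains a single quote (no escaping is done).
function buildYqlUrl(pageUrl, xpath) {
  // The YQL statement selects the nodes that match the XPath on the page
  var query = "select * from html where url = '" + pageUrl +
              "' and xpath = '" + xpath + "'";

  // Request JSON output and URL-encode the statement
  return "https://query.yahooapis.com/v1/public/yql?format=json&q=" +
         encodeURIComponent(query);
}

var yqlUrl = buildYqlUrl(
  "http://www.nytimes.com/pages/technology/index.html",
  '//div[@class="story"]//h3/a'
);
```

Keeping the XPath in single quotes in the script lets you paste expressions that contain double-quoted attribute values, such as `@class="story"`, without escaping them.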

In the snippet below, we fetch the home page of the New York Times technology section as JSON through YQL and parse the results with Google Apps Script.

/*
   Paste it in the Google Script Editor and choose Run -> scrapeTheWeb
*/

function scrapeTheWeb() {

  // The URL of the page to scrape
  var url   = "http://www.nytimes.com/pages/technology/index.html";

  // The XPATH for the data to extract
  var xpath = '//div[@class="story"]//h3/a';

  // Construct the YQL query
  var query = "select * from html where url = '" + url + "' and xpath = '" + xpath + "'";

  // Notice that we request the data in JSON format
  var yql   = "https://query.yahooapis.com/v1/public/yql?format=json&q=" + encodeURIComponent(query);

  var response = UrlFetchApp.fetch(yql);

  // Parse the JSON response from YQL
  var json = JSON.parse(response.getContentText());

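  // YQL may return null when nothing matches, a single object for one
  // match and an array for several matches, so normalize to an array
  var results = json.query.results;
  var links = results ? [].concat(results.a) : [];

  for (var i = 0; i < links.length; i++) {

    // Output the scraped titles and URLs
    Logger.log(links[i].content + " - " + links[i].href);

  }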

}
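If you want to reuse the scraped links rather than only log them, the parsing step can be pulled out into its own function. The sketch below is an illustration, not part of the original script: `extractLinks` is a hypothetical helper, and the sample responses are made up to show the shapes the `results` field is assumed to take (an array, a single object, or null).

```javascript
// Hypothetical helper: turns a parsed YQL response into an array of
// { title, href } objects, whatever shape "results" takes.
function extractLinks(json) {
  var results = json && json.query ? json.query.results : null;
  if (!results || !results.a) {
    return [];
  }
  // [].concat wraps a single object in an array and leaves an array as is
  return [].concat(results.a).map(function (a) {
    return { title: a.content, href: a.href };
  });
}

// Made-up sample responses for illustration only, not real YQL output
var several = { query: { count: 2, results: { a: [
  { href: "http://example.com/one", content: "First story" },
  { href: "http://example.com/two", content: "Second story" }
] } } };
var single = { query: { count: 1, results: { a:
  { href: "http://example.com/one", content: "First story" }
} } };
var none = { query: { count: 0, results: null } };

var linksFromSeveral = extractLinks(several);
var linksFromSingle = extractLinks(single);
var linksFromNone = extractLinks(none);
```

Inside Apps Script you would call `extractLinks(JSON.parse(response.getContentText()))` and then loop over the returned array, for example to log each title or write it to a spreadsheet.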

Meet the Author

Web Geek, Google Developer Expert
Amit Agarwal

Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India. He is the developer of Mail Merge for Gmail and Document Studio. Read more on Lifehacker and YourStory
