Last month, my story about Facebook hitting a trillion page-views received an unexpected amount of interest around the web. I obviously don’t have inside sources nor did anyone contact me with those numbers – so how did I get the news about the trillion milestone before anyone else on the web?
The answer is simple. I use web monitoring software that tracks a list of web pages (URLs) at set intervals and alerts me whenever content is added to or deleted from these pages. In the above case, the monitoring utility was watching a page on google.com and the moment Google uploaded the new numbers, I got an alert on my desktop.
The utility that I have on my Windows machine is called Website Watcher from Aegnis.com. A single-user license for the basic edition of Website Watcher is about €30, and it supports all types of web addresses, including secure HTTP (HTTPS) and FTP URLs.
A web monitoring tool, in simple English, works something like this. You specify the address (URL) of a web page that you would like to track and how frequently the tool should ping the given page to determine if the content has changed.
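If you are curious what that loop looks like in code, here is a minimal Python sketch of the idea. It is not how Website Watcher or any other tool mentioned here is actually implemented; the function names (`fetch_page`, `fingerprint`, `has_changed`, `watch`) are my own and purely illustrative. It fetches a page, reduces the content to a hash, and compares that hash against the one from the previous check.

```python
import hashlib
import time
import urllib.request


def fetch_page(url):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fingerprint(text):
    """Reduce page content to a short hash that is cheap to store and compare."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(url, last_fingerprint, fetch=fetch_page):
    """Fetch the page once and return (changed, new_fingerprint)."""
    current = fingerprint(fetch(url))
    return current != last_fingerprint, current


def watch(url, interval_seconds, fetch=fetch_page, max_checks=None):
    """Poll the page at a fixed interval and print an alert on each change."""
    last = fingerprint(fetch(url))
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval_seconds)
        changed, last = has_changed(url, last, fetch)
        if changed:
            print(f"Content changed: {url}")
        checks += 1
```

Calling `watch("https://example.com/", 3600)` would check the page once an hour. The `fetch` parameter is there only so that the download step can be swapped out, which makes the logic easy to try without hitting a real site.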
In the case of Website Watcher, you can also visually specify the portions of a page that should be ignored for tracking (like the sidebar or the footer). Later, if the tool detects that a page has changed, you can compare the before and after versions of the page side-by-side and, like any other diff tool, the changed text is highlighted for quick comparison.
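The before-and-after comparison can be approximated with Python's standard `difflib` module. The sketch below is only an illustration of the idea, not Website Watcher's method: the `diff_pages` function and its `is_ignored` parameter are hypothetical names, and the "ignored region" is simulated by dropping any line that a caller-supplied test rejects, which is far cruder than selecting a sidebar visually.

```python
import difflib


def diff_pages(before, after, is_ignored=lambda line: False):
    """Return (added, removed) lines between two snapshots of a page.

    Lines for which is_ignored(line) is true are dropped from both
    snapshots first, standing in for an excluded sidebar or footer.
    """
    old = [line for line in before.splitlines() if not is_ignored(line)]
    new = [line for line in after.splitlines() if not is_ignored(line)]
    added, removed = [], []
    for entry in difflib.ndiff(old, new):
        if entry.startswith("+ "):
            added.append(entry[2:])    # present only in the new snapshot
        elif entry.startswith("- "):
            removed.append(entry[2:])  # present only in the old snapshot
    return added, removed
```

Because the function reports removed lines as well as added ones, it covers both directions of a change. A tool that shows only additions would tell you just half the story.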
Website Watcher works well but if you are looking for a free alternative, check out NotiPage. This is again a Windows-only utility with basic monitoring features, but it has one limitation: NotiPage only highlights new content that has been added to a page, so you won’t be able to figure out what has been removed from it.
If you are monitoring a page that follows a regular pattern – like a Google Search results page where all the different results are rendered using a similar set of HTML tags – you may also use Google Docs as a page monitor. You essentially scrape the page content into Google Docs with the help of the ImportXML function and then track changes through RSS. This does, however, require some knowledge of XPath and CSS.
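In a spreadsheet, ImportXML takes a URL and an XPath query, along the lines of `=IMPORTXML(url, xpath_query)`. To get a feel for what such a query selects, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath. Everything in it is an assumption for illustration: the sample markup, the `result` class name and the `extract_results` function are made up, and real-world HTML is rarely well-formed enough for this parser, so a proper HTML parser would be needed in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a page of repeated results.
SAMPLE_PAGE = """
<html><body>
  <div class="result"><a href="https://example.com/a">First result</a></div>
  <div class="result"><a href="https://example.com/b">Second result</a></div>
  <div class="ad"><a href="https://example.com/ad">Sponsored</a></div>
</body></html>
"""


def extract_results(page, xpath=".//div[@class='result']/a"):
    """Return (title, link) pairs for every element matching the XPath."""
    root = ET.fromstring(page.strip())
    return [(a.text, a.get("href")) for a in root.findall(xpath)]


for title, link in extract_results(SAMPLE_PAGE):
    print(title, link)
```

The XPath picks out only the links inside `div` elements whose class is `result`, so the sponsored entry is skipped. That is the same trick that lets a regularly patterned page be pulled into rows of a sheet.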
Versionista is another awesome web-based tool for tracking web pages. It lets you compare two versions of a page side-by-side so you can see what has been added to, or removed from, a page since you last viewed it. Versionista also lets you apply regular expression based filters to specify what kind of page edits the tool should ignore during the comparison.
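To show how a regular expression filter can suppress noisy edits, here is a rough Python sketch. It is not Versionista's filter syntax or implementation; the patterns and the `normalize` and `meaningful_change` functions are hypothetical examples of the general technique, which is to strip out the parts you don't care about before comparing the two versions.

```python
import re

# Hypothetical patterns for content that changes on every visit.
IGNORE_PATTERNS = [
    re.compile(r"Last updated: .*"),  # timestamp line
    re.compile(r"\d+ comments"),      # comment counter
]


def normalize(text, patterns=IGNORE_PATTERNS):
    """Blank out every match of the ignore patterns."""
    for pattern in patterns:
        text = pattern.sub("", text)
    return text


def meaningful_change(before, after, patterns=IGNORE_PATTERNS):
    """True only if the pages differ after ignored content is removed."""
    return normalize(before, patterns) != normalize(after, patterns)
```

With these patterns, a page whose only differences are a new timestamp and a higher comment count would not trigger an alert, while an edit to the body text still would.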