This will come in handy when you are trying to recover an accidentally deleted website or need to retrieve a web page that no longer exists at its original location.
You open a web page on the Internet but the server hosting the site returns a 404 error, meaning that the web page has either been removed or moved to a different location.
To recover the lost page, your best option is to search for it across all three major search engines (Google, Yahoo, Windows Live Search) and hope that a copy of the web page exists in one of their caches.
All major search engines store cached copies of web pages.
If the original page is not available in any of the search engines’ caches, you can repeat the search at the Internet Archive’s Wayback Machine, the largest web repository, which holds snapshots (or backups) of over 10 billion web pages.
The Internet Archive doesn’t store web pages created or modified in the past 6-12 months while search engines may have the most recent version of the web pages in their cache.
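If you would rather script the Wayback Machine lookup than check pages by hand, here is a minimal Python sketch. It assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available), which is not mentioned in the article above, and it assumes the JSON reply has the shape noted in the comments. The function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Internet Archive's public availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the lookup URL for a page; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the archived URL from a decoded JSON reply, or None if there is none.

    Assumed reply shape:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url, timestamp=None):
    """Ask the archive for the snapshot closest to the timestamp (needs network access)."""
    with urlopen(build_query(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

Calling find_archived_copy with the address of the missing page should return the address of an archived copy, or None when the archive has nothing for that page.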

Recover Deleted Websites Automatically
While it is often possible to recover lost websites using a combination of search engine caches and web archives, the process can be very time-consuming, especially if you are trying to recover a large site that had more than a few dozen web pages.
To ease the site recovery process, Frank McCown at Harding University created a tool called Warrick that lets you reconstruct any lost website (or single web page) automatically. Simply type the URL of the website and Warrick will let you know via email once the recovery process is over.
The tool is essentially a web crawler that scans and collects missing web pages from all four web repositories: Internet Archive, Google, Live Search, and Yahoo. If a web page is found in more than one web repository, Warrick saves the copy with the most recent date.
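The "most recent copy wins" rule is easy to picture in code. The Python sketch below is only an illustration of that rule, not Warrick's actual code (Warrick is written in Perl). The CachedCopy type, its fields, and the sample dates are all hypothetical.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class CachedCopy(NamedTuple):
    """One stored copy of a page (hypothetical structure for illustration)."""
    repository: str   # e.g. "Internet Archive", "Google"
    stored_on: date   # date the repository saved the copy
    content: str      # the saved HTML


def pick_most_recent(copies: List[CachedCopy]) -> Optional[CachedCopy]:
    """Return the copy with the latest stored_on date, or None if no copy was found."""
    if not copies:
        return None
    return max(copies, key=lambda copy: copy.stored_on)


# Made-up sample data: the same page found in three repositories.
found = [
    CachedCopy("Internet Archive", date(2008, 3, 2), "<html>old</html>"),
    CachedCopy("Google", date(2009, 1, 10), "<html>newest</html>"),
    CachedCopy("Yahoo", date(2008, 12, 20), "<html>newer</html>"),
]
best = pick_most_recent(found)
print(best.repository)
```

A real crawler would also have to fetch each copy and strip the markup that caches add around the page. The sketch leaves those steps out.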
The recovery process may take some time for large websites. For instance, I tried Warrick for reconstructing Digital Inspiration and it took about a week to complete the job. The recovered web pages were provided as a zipped archive (~50 MB).
Warrick is available as an online service, or you can download the Perl source files and run them locally on your own computer.
If you have accidentally deleted or overwritten your web pages, make sure you run Warrick before Google and other search bots re-crawl the site and replace their cached copies with something else.
Also see: Archive Web Pages Permanently with Iterasi
Find this article at: http://www.labnol.org/internet/recover-deleted-webpages-from-internet/6529/
web: http://www.labnol.org/ email: amit@labnol.org


Reader Comments
Amit,
you know who would need this article most now? The folks at Journalspace.com !! They recently wiped all their data and lost their users’ blogs since they set the hard disks in a RAID array and wiped data on one of the disks. The other RAID mirrored disk promptly erased itself soon after, leaving all their users with no easy way to recover the data.
-Tipscurry
Written by Tipscurry on 01.14.09
Amit. Are you aware one UK ISP has banned the Internet Wayback Machine? Demon Internet, owned by British Telcom company, is blocking all 85 million pages that Wayback has in its archive.
Written by Neeta on 01.14.09
Totally agree with Amit (the commenter above)
But I think I read in TC that Journalspace didn’t have anything on Google cache or Wayback Machine..
And did you send them the link to this post..? They would be very grateful to you.
Cheers
Written by Arun Basil Lal on 01.14.09
Last year, my blog was on Blogspot and it was suddenly removed. I then used the same technique (Google cache) to get all my posts.
Thanks for mentioning this tool.
Written by Nihar on 01.14.09
I cannot speak more gratefully about Web.archive’s usefulness. Nirantar.org’s old issues were entirely deleted and we never kept a backup. Through Web archive I was able to retrieve each and every page of those 5 editions, 2 years after they were created. It’s an incredible service.
Written by Debashish on 01.15.09
I wish that Magnolia would have done this and maybe have gotten back a load of data for at least some of its users. That, or users should have done it on their profiles once Magnolia went down.
Written by Chacha on 03.18.09