How to Beat Content Scrapers with Fat Pings
When you publish an article on your blog, the RSS feed is updated and everyone subscribed to that feed will instantly know that there’s something new worth checking out on your blog.
A majority of your blog feed subscribers will be human beings who are genuinely interested in reading your articles, but some bots may also be watching your feed, and their only intention is to scrape web content and republish it on their own websites.
These content scraping bots can sometimes confuse search engines. You write a story on your website and, because of these bots, a dozen sites manage to copy that story word for word, often within seconds of you hitting the publish button. You can add links in the feed to say that you are the original author, but scrapers can easily remove them with a simple preg_replace. What next?
Google engineer Matt Cutts, during his keynote at PubCon 2011, suggested that website owners set up fat pings, which essentially means that you inform Google about new content as soon as you publish it on the web. And since it’s a fat ping, you don’t just inform Google but also send them the entire content of your blog post.
If your blog is hosted on Blogger or on WordPress.com, you don’t have to do anything as they are already PubSubHubbub-enabled. That is, when you publish a blog post, the platform instantly pings Google, and that is a strong signal that you are the original author.
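If you are unsure whether a feed is already PubSubHubbub-enabled, one rough check is to look for a hub declaration inside the feed itself. The sketch below assumes the feed advertises its hub through an Atom link element with rel="hub", which is how PubSubHubbub discovery is commonly done. The sample feed, the function name and the hub URL are illustrative, not taken from any particular blog.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def find_hub_links(feed_xml: str) -> list:
    """Return the href of every Atom <link rel="hub"> found in the feed."""
    root = ET.fromstring(feed_xml)
    hubs = []
    # iter() walks the whole tree, so this covers Atom feeds (link under
    # <feed>) as well as RSS feeds (atom:link under <channel>).
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "hub" and link.get("href"):
            hubs.append(link.get("href"))
    return hubs


# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <atom:link rel="self" href="https://example.com/feed/" />
    <atom:link rel="hub" href="https://pubsubhubbub.appspot.com/" />
  </channel>
</rss>"""

if __name__ == "__main__":
    print(find_hub_links(SAMPLE_FEED))
```

An empty list suggests the feed does not declare a hub, in which case nothing is being pinged on your behalf.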
Self-hosted WordPress (WordPress.org) users may install this plugin to let Google know in real time that their blog has been updated. However, if you are using a caching plugin, make sure that you disable feed caching; otherwise Google’s hub won’t see your updated feed content.
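For readers curious about what such a ping looks like on the wire, here is a minimal sketch of a PubSubHubbub publish notification. It assumes the standard publisher flow, in which you POST hub.mode=publish and hub.url (your feed address) to the hub, and the hub then fetches the feed itself. That second step is why a cached, stale feed defeats the purpose. The hub URL, the feed URL and both function names are assumptions for illustration, and this is not the code of any specific plugin.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a public hub endpoint; use whichever hub your feed declares.
HUB_URL = "https://pubsubhubbub.appspot.com/"


def build_publish_ping(feed_url: str, hub_url: str = HUB_URL) -> Request:
    """Build (but do not send) the POST telling a hub that a feed changed."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_publish_ping(feed_url: str, hub_url: str = HUB_URL) -> int:
    """Send the ping and return the HTTP status code reported by the hub."""
    with urlopen(build_publish_ping(feed_url, hub_url), timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Only builds and prints the request; nothing is sent over the network.
    request = build_publish_ping("https://example.com/feed/")
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
```

Calling send_publish_ping right after publishing should return a 2xx status (commonly 204) when the hub accepts the notification. Keeping the request builder separate from the sender makes it easy to inspect the payload without contacting the hub.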
If you are syndicating feeds via FeedBurner, you can go to feedburner.com and activate the PingShot service under the Publicize tab. Thus, when FeedBurner discovers any new post in your raw RSS feed, it will ping Google’s hub with the content of that post.
That said, relying on FeedBurner alone for fat pings may not be a great idea. That’s because FeedBurner may take its own sweet time to poll your RSS feed for new content, and scraper bots have an opportunity to republish your content in the meantime.
This is much like participating in a race where scrapers are your rivals. Google is standing at the finish line and you have to run really fast to prove that you are the original content creator.
Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India.