Always Block Google from Accessing your Site's Search Results

If you are using Google Custom Search or another site search service on your website, make sure that the search results pages - like the one available here - are not accessible to Googlebot. This is necessary because spam domains can otherwise create serious problems for your website through no fault of yours.

A few days ago, I got an automatically generated email from Google Webmaster Tools saying that Googlebot was having trouble indexing my website labnol.org because it had found a large number of new URLs. The message said:

Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site’s URL structure… As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.

This was a worrying signal because it meant that tons of new pages had been added to the website without my knowledge. I logged into Webmaster Tools and, as expected, there were thousands of pages sitting in Google's crawling queue.

Here’s what happened.

Some spam domains had suddenly started linking to the search page of my website using search queries in Chinese that obviously returned no results. Each search link is technically a separate web page - since each one has a unique address - and hence Googlebot was trying to crawl them all, thinking they were different pages.

External Domains blocked with robots.txt

Because thousands of such fake links were generated in a short span of time, Googlebot assumed that this many pages had suddenly been added to the site and hence flagged the warning message.

There are two solutions to the problem.

I could either get Google to not crawl links found on spam domains, something that is obviously not possible, or I could prevent Googlebot from indexing these non-existent search pages on my website. The latter is possible, so I fired up my Vim editor, opened the robots.txt file and added these lines at the top. You'll find this file in the root folder of your website.

User-agent: *
Disallow: /?s=*

Block Search pages from Google with robots.txt

The directive essentially prevents Googlebot, and any other search engine bot, from crawling links that have the "s" parameter in the URL query string. If your site uses "q" or "search" or something else as the search variable, you may have to replace "s" with that variable.
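For instance, a site whose search URLs use a different query parameter, or a dedicated search path, could adapt the rule along these lines. This is only a rough sketch - the "q" parameter and the /search/ path below are illustrative, so match them to whatever your own search URLs actually look like:

User-agent: *
# If your search pages use the "q" parameter instead of "s" (illustrative)
Disallow: /?q=*
# Or, if your search results live under a dedicated path (illustrative)
Disallow: /search/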

The other option is to add the NOINDEX meta tag, but that wouldn't have been an effective solution since Google would still have to crawl the page before deciding not to index it. Also, this is a WordPress-specific issue because the Blogger robots.txt already blocks search engines from crawling the results pages.
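For reference, the NOINDEX approach would mean serving a robots meta tag in the head of the search results template, roughly like this (a minimal sketch - where exactly you add it depends on your theme or search service):

<!-- Tells search engines not to index this page, while still following its links -->
<meta name="robots" content="noindex, follow">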

Related: CSS for Google Custom Search

Soft 404 Errors - Google Webmaster Tools
