What URLs should you put in the XML sitemap of your website / blog? All pages, just the important pages, or only pages that have not yet been crawled or indexed by search engine spiders?

October 15, 2008

"What URLs should you put in the XML sitemap of your website / blog?" The three confusing, non-overlapping options are (a) everything, (b) just the important pages or (c) URLs that have not been crawled and indexed yet.

A Sitemap, as you know, is a simple text file in XML format that contains a list of URLs that are part of a website. It’s extremely important that you create Sitemaps for your site for two reasons:

1. They help you bring pages that may otherwise be ignored to the notice of search engines.

2. If there are duplicate URL issues with your site (for instance, if abc.com?p=123 and abc.com/123/ point to the same page), you may use Sitemaps to specify the version that gets preference in search engines.
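To make the format concrete, here is a minimal Python sketch that writes a Sitemap for a short list of URLs. The `build_sitemap` helper and the abc.com URLs are made up for illustration; the namespace is the one published in the sitemaps.org protocol. Note that only the preferred version of the duplicate page (abc.com/123/) is listed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return Sitemap XML as a string, listing each URL once, in order.

    Hypothetical helper for illustration, not part of any library.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip URLs that were already added
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# List only the preferred version of each page.
sitemap_xml = build_sitemap([
    "http://abc.com/",
    "http://abc.com/123/",
])
print(sitemap_xml)
```

A real Sitemap file would normally also start with an XML declaration and may carry optional fields such as the last-modified date; they are left out here to keep the sketch short.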

Sean Suchter of the Yahoo! web search team recently suggested that webmasters should put only the important pages in the Sitemap, rather than every page of the website, because Yahoo uses Sitemaps to figure out which pages are valuable on a site.

I asked Vanessa Fox for her opinion on what should really go in a Sitemap, and her preferred approach is for website owners to put a comprehensive list of URLs in the Sitemap.

"Why not tell search engines what the definitive list of pages on your site is? Why limit it to really important ones? One benefit to this is that there’s at least one place other than crawling that Sitemaps can be helpful, and that’s canonicalization. If a search engine has detected that several URLs display the same page, the version of the URL that’s in the Sitemap is a signal as to which is the canonical version."
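The canonicalization signal Vanessa describes can be pictured with a small Python sketch. This is only an illustration of the idea, not how any search engine actually works; `pick_canonical` and the URLs are hypothetical.

```python
def pick_canonical(duplicate_urls, sitemap_urls):
    """Pick the canonical URL from a group of duplicates.

    If exactly one of the duplicates is listed in the Sitemap, treat it
    as the canonical version. Otherwise the Sitemap gives no clear
    signal, so return None.
    """
    listed = [url for url in duplicate_urls if url in sitemap_urls]
    return listed[0] if len(listed) == 1 else None


# Two URLs that display the same page; only one is in the Sitemap.
duplicates = ["http://abc.com?p=123", "http://abc.com/123/"]
sitemap_urls = {"http://abc.com/", "http://abc.com/123/"}

canonical = pick_canonical(duplicates, sitemap_urls)
print(canonical)
```

The sketch also shows why listing both versions of a page in the Sitemap would defeat the purpose: the signal becomes ambiguous.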

Now these may sound like contradictory opinions and, unfortunately, one site can’t maintain multiple Sitemaps to fit the needs of every search engine. What may help here, as Vanessa points out, is for search engines to give us more details about how they use Sitemaps and what some of the "common" best practices are.

Until that happens, I will probably continue to dump all URLs in my XML Sitemap, including pages for tags and categories, which are not very important from an organic rankings point of view.

