Google has a sitemap.xml file!


This should surprise no one, but check out hxxp:// (replace xx with tt). The file is about 4 MB in size. If you expected a sitemap index file pointing to thousands of sitemaps, you’d be mistaken.

The file is 142,111 lines long, which means there are 35,527 URL entries in it. What are the interesting pages?

  • looks interesting, but try loading it in your browser and you are taken to
  • doesn’t load, but you end up at Weird.
  • leads to, which happens to be a 404. Will Google get penalised? Will it lose PR? [I am just parodying forum newbies, relax.]
  • There are plenty of pages relating to ads – AdWords and AdSense, which is to be expected. The usual corporate pages, April Fool gags, zeitgeist, etc.
  • Numerous foreign-language versions of its content for its overseas markets.
  • Numerous university searches, such as – where is Gopher these days?
  • Only the home page has a priority of 1.0; the rest are all 0.5.
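The entry count and the priority split noted above can be checked with a short script. This is a minimal sketch in Python, not the method used for this post: `summarize_sitemap`, `entries_from_line_count` and the `SAMPLE` data are hypothetical names invented for illustration, and the layout of four lines per entry plus three wrapper lines is an assumption that happens to fit the 142,111 and 35,527 figures.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_sitemap(xml_text):
    """Return (number of <url> entries, Counter of <priority> values)."""
    root = ET.fromstring(xml_text)
    url_count = 0
    priorities = Counter()
    for entry in root:
        if local_name(entry.tag) != "url":
            continue
        url_count += 1
        for child in entry:
            if local_name(child.tag) == "priority":
                priorities[(child.text or "").strip()] += 1
    return url_count, priorities


def entries_from_line_count(lines, lines_per_entry=4, wrapper_lines=3):
    """Estimate entries from a line count.

    Assumes each entry spans <url>, <loc>, <priority>, </url> on separate
    lines, and that the XML declaration, <urlset> and </urlset> take one
    line each. Both defaults are assumptions, not facts from the file.
    """
    return (lines - wrapper_lines) // lines_per_entry


# Made-up sample data in the same shape as a small sitemap.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://example.com/</loc>
<priority>1.0</priority>
</url>
<url>
<loc>http://example.com/a</loc>
<priority>0.5</priority>
</url>
<url>
<loc>http://example.com/b</loc>
<priority>0.5</priority>
</url>
</urlset>"""

if __name__ == "__main__":
    count, priorities = summarize_sitemap(SAMPLE)
    print(count, dict(priorities))
    print(entries_from_line_count(142111))
```

To run it against a real file, read the downloaded sitemap into a string and pass it to `summarize_sitemap` in place of `SAMPLE`. Matching on the local tag name keeps the sketch independent of whichever sitemap namespace the file declares.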

Google also has a robots.txt file, but it doesn’t reference this sitemap.
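Whether a robots.txt file references a sitemap comes down to whether it contains a `Sitemap:` line. The sketch below shows one way to check that in Python. The function name `sitemap_directives` and the sample robots.txt bodies are hypothetical, and the comment handling is simplified (it would truncate a URL that itself contains a `#`).

```python
def sitemap_directives(robots_txt):
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


# Made-up examples: one file without a sitemap reference, one with.
WITHOUT_SITEMAP = "User-agent: *\nDisallow: /search\n"
WITH_SITEMAP = WITHOUT_SITEMAP + "Sitemap: http://example.com/sitemap.xml\n"

if __name__ == "__main__":
    print(sitemap_directives(WITHOUT_SITEMAP))
    print(sitemap_directives(WITH_SITEMAP))
```

An empty list means the file does not advertise any sitemap, which is the situation described above.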


Yes, a pretty small site, if you take out the non-English content. It all fits in a single sitemap.xml file. :lol:


Ash Nallawalla

Search strategist experienced in large, complex websites. SEO consultant.
