Google has a sitemap.xml file!


This factoid should surprise no one, but check out hxxp://www.google.com/sitemap.xml (replace xx with tt). It is 4 MB in size. If you thought it would be a sitemap index file pointing to thousands of sitemaps, you’d be mistaken: it is a single flat file.
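For anyone who wants to tell the two kinds of file apart without eyeballing 4 MB of XML, here is a minimal sketch. It assumes the file follows the sitemaps.org protocol, where a plain sitemap has a `urlset` root and an index has a `sitemapindex` root. The function name `sitemap_kind` and the example.com samples are made up for illustration.

```python
import xml.etree.ElementTree as ET


def sitemap_kind(xml_text):
    """Return the root element's local name, e.g. 'urlset' or 'sitemapindex'."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as '{namespace}local'; keep the local part.
    return root.tag.rsplit("}", 1)[-1]


# Hypothetical samples, not taken from Google's file.
SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
</urlset>"""

SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemap1.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    print(sitemap_kind(SAMPLE_URLSET))
    print(sitemap_kind(SAMPLE_INDEX))
```

Stripping the namespace rather than matching it exactly is a deliberate shortcut, since older sitemaps may declare a different schema URL.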

The file is 142,111 lines long, which, assuming four lines per entry plus three lines of XML wrapper, means there are 35,527 URL entries in it (4 × 35,527 + 3 = 142,111). What are the interesting pages?

  • http://www.google.com/a/help/intl/en/admins/overview.html looks interesting, but try loading it in your browser and you are taken to http://www.google.com/a/help/intl/en/index.html
  • http://www.google.com/a/cpanel/domain doesn’t load; instead you end up at http://www.google.com/a/cpanel/domain/new. Weird.
  • http://www.google.com/a/interest leads to http://www.google.com/a/cpanel/interest, which happens to be a 404. Will Google get penalised? Will it lose PR? [I am just parodying forum newbies, relax.]
  • Plenty of pages relate to ads (AdWords and AdSense), which is to be expected, alongside the usual corporate pages, April Fool gags, Zeitgeist, etc.
  • Numerous foreign-language versions of its content for its overseas markets.
  • Numerous university searches, such as http://www.google.com/univ/calpoly – where is Gopher these days?
  • Only the home page has a priority of 1.0; the rest are all 0.5.
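Counting lines is a rough way to count entries, so here is a hedged sketch of counting them properly and tallying the priority values at the same time. It assumes a sitemaps.org-style `urlset`, and it treats a missing `<priority>` as 0.5, which is the protocol's documented default. The names `priority_tally` and `local_name` and the example.com sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def priority_tally(xml_text):
    """Map each priority value to the number of <url> entries that carry it."""
    root = ET.fromstring(xml_text)
    tally = Counter()
    for url in root:
        if local_name(url.tag) != "url":
            continue
        priority = 0.5  # protocol default when <priority> is absent
        for child in url:
            if local_name(child.tag) == "priority" and child.text:
                priority = float(child.text.strip())
        tally[priority] += 1
    return tally


# Hypothetical sample, not taken from Google's file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><priority>1.0</priority></url>
  <url><loc>http://www.example.com/a</loc><priority>0.5</priority></url>
  <url><loc>http://www.example.com/b</loc></url>
</urlset>"""

if __name__ == "__main__":
    tally = priority_tally(SAMPLE)
    print("entries:", sum(tally.values()))
    print("by priority:", dict(tally))
```

The sum of the tally is the number of URL entries, so one pass answers both questions. For a 4 MB file, loading the whole tree into memory like this should still be fine.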

Google also has a robots.txt file, but it doesn’t reference this sitemap.
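Checking whether a robots.txt references a sitemap can be sketched with Python's standard library. This assumes Python 3.8 or later, where `RobotFileParser.site_maps()` is available. The wrapper name `sitemaps_in_robots` and the sample robots.txt bodies are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the URLs named in 'Sitemap:' lines, or an empty list if none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt bodies, not Google's.
WITH_SITEMAP = """User-agent: *
Disallow: /search
Sitemap: http://www.example.com/sitemap.xml
"""

WITHOUT_SITEMAP = """User-agent: *
Disallow: /search
"""

if __name__ == "__main__":
    print(sitemaps_in_robots(WITH_SITEMAP))
    print(sitemaps_in_robots(WITHOUT_SITEMAP))
```

An empty list for a real site's robots.txt would match what is described above: the sitemap exists, but crawlers have to find it some other way.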

Yes, a pretty small site, if you took out the non-English content. It all fits in a single sitemap.xml file. :lol:
