This should not come as a surprise, but check out hxxp://www.google.com/sitemap.xml (replace xx with tt). It is 4 MB in size. If you expected a sitemap index file pointing to thousands of individual sitemaps, you’d be mistaken.
The file is 142,111 lines long, which means there are 35,527 URL entries in it. What are the interesting pages?
http://www.google.com/a/help/intl/en/admins/overview.html looks interesting, but try loading it in your browser and you are taken to http://www.google.com/a/help/intl/en/index.html
http://www.google.com/a/cpanel/domain doesn’t load either; you end up at http://www.google.com/a/cpanel/domain/new instead. Weird.
http://www.google.com/a/interest leads to http://www.google.com/a/cpanel/interest, which happens to be a 404. Will Google get penalised? Will it lose PR? [I am just parodying forum newbies, relax.]
There are plenty of pages relating to ads (AdWords and AdSense), which is to be expected, along with the usual corporate pages, April Fools’ gags, Zeitgeist, etc.
There are numerous foreign-language versions of its content for its overseas markets.
Numerous university searches, such as http://www.google.com/univ/calpoly – where is Gopher these days?
Only the home page has a priority of 1.0; the rest are all 0.5.
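If you want to check the priority spread on a sitemap yourself, a rough sketch like the one below will tally the `<priority>` values. The sample XML and the function names are made up for illustration, not taken from Google's file. Entries without a `<priority>` tag are counted as 0.5, which is the default in the sitemap protocol.

```python
from collections import Counter
import xml.etree.ElementTree as ET

DEFAULT_PRIORITY = "0.5"  # sitemap protocol default when <priority> is omitted


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def priority_counts(sitemap_xml: str) -> Counter:
    """Tally how many <url> entries carry each priority value."""
    counts = Counter()
    root = ET.fromstring(sitemap_xml)
    for url in root:
        if local_name(url.tag) != "url":
            continue
        priority = DEFAULT_PRIORITY
        for child in url:
            if local_name(child.tag) == "priority" and child.text:
                priority = child.text.strip()
        counts[priority] += 1
    return counts


# Hypothetical three-entry sample, not Google's real file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://www.example.com/contact.html</loc>
  </url>
</urlset>"""

print(priority_counts(SAMPLE))
```

On the sample, one entry lands on 1.0 and two on 0.5, which is the same shape as the pattern described above.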
Google also has a robots.txt file, but it doesn’t reference this sitemap.
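For reference, a robots.txt file can point crawlers to a sitemap with a `Sitemap:` line. The sketch below is one simple way to look for such lines in the text of a robots.txt file. The function name and both sample files are hypothetical, and the parsing is deliberately naive (it treats everything after a `#` as a comment).

```python
from typing import List


def sitemap_urls_in_robots(robots_txt: str) -> List[str]:
    """Return the values of all 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt bodies for illustration.
WITHOUT_SITEMAP = """User-agent: *
Disallow: /search
"""

WITH_SITEMAP = """User-agent: *
Disallow: /search
Sitemap: http://www.example.com/sitemap.xml
"""

print(sitemap_urls_in_robots(WITHOUT_SITEMAP))
print(sitemap_urls_in_robots(WITH_SITEMAP))
```

An empty list means the file does not reference any sitemap, which is the situation described above.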
Yes, it is a pretty small site if you take out the non-English content. It all fits in a single sitemap.xml file.
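The jump from 142,111 lines to 35,527 URL entries is easy to sanity-check. The sketch below assumes each entry takes four lines and that three more lines go to the XML declaration and the opening and closing `<urlset>` tags. That layout is my guess from the arithmetic, not something confirmed against the file. Counting the `<url>` elements directly is the more reliable way to do it, so both are shown. The sample XML and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_url_entries(sitemap_xml: str) -> int:
    """Count the <url> children of the root <urlset> element."""
    root = ET.fromstring(sitemap_xml)
    return sum(1 for child in root if local_name(child.tag) == "url")


def entries_from_line_count(total_lines: int,
                            lines_per_entry: int = 4,
                            wrapper_lines: int = 3) -> int:
    """Estimate the entry count from a line count (layout is an assumption)."""
    return (total_lines - wrapper_lines) // lines_per_entry


# Hypothetical two-entry sample, not Google's real file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
    <priority>0.5</priority>
  </url>
</urlset>"""

print(count_url_entries(SAMPLE))        # 2
print(entries_from_line_count(142111))  # 35527
```

For comparison, the sitemap protocol caps a single sitemap file at 50,000 URLs, so 35,527 entries fit in one file without needing an index.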