
Bing Wrong About Large Website Sitemaps


Bing (Microsoft's search engine) is wrong about large website sitemaps, going by its latest blog post. The post uses all the right buzzwords and even a Social Science number for emphasis, but it misses one fundamental thing about large websites and their sitemaps: the importance of the lastmod element.

Bing's post claims: "The main problem with extra-large sitemaps is that search engines are often not able to discover all links in them as it takes time to download all these sitemaps each day."

Operators of large websites typically have a well-defined sitemap strategy that prioritises changed content and additions over unchanged sitemaps. The sitemap index files are used, in effect, to signal to search engines which sitemaps have changed. Thus, after the initial download of the sitemaps, the search engine only needs to check the sitemap index files for changed sitemaps, and then download and process the changed sitemaps only.
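
To make that concrete, here is a minimal sketch in Python of the index check, assuming a hypothetical index_url and a stored, timezone-aware last_crawl timestamp from the previous visit. It is one way a crawler could honour the protocol, not a description of any search engine's actual code.

import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# XML namespace defined by the sitemap protocol (sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_lastmod(value):
    # W3C datetime; assume UTC when no offset is given (e.g. "2005-01-01").
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

def changed_sitemaps(index_url, last_crawl):
    # Download the index and keep only the sitemaps whose <lastmod> is
    # newer than the (timezone-aware) timestamp of the previous crawl.
    with urllib.request.urlopen(index_url) as resp:
        root = ET.parse(resp).getroot()
    changed = []
    for entry in root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        # A missing lastmod gives the crawler nothing to skip on, so fetch it.
        if lastmod is None or parse_lastmod(lastmod) > last_crawl:
            changed.append(loc)
    return changed

After the initial full download, each visit then costs one index fetch plus only the files that actually changed.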

The Bing post states: "Search engines cannot download thousands of sitemaps in a few seconds or minutes to avoid over crawling web sites; the total size of sitemap XML files can reach more than 100 Giga-Bytes."

That "more than 100 Giga-Bytes" is a Social Science number. It sounds impressive, but it is not based on reality. When you have a large site with large numbers of sitemaps, you count bytes.
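
To see why counting bytes favours the lastmod approach, here is a back-of-envelope sketch. Apart from Bing's own 100 Giga-Bytes figure, the numbers (a 1% daily change rate, a 10 MB index) are illustrative assumptions, not measurements.

# Illustrative arithmetic only: every figure here except Bing's 100 GB
# total is an assumption chosen to show the shape of the saving.
total_sitemap_bytes = 100 * 10**9   # Bing's "more than 100 Giga-Bytes"
changed_fraction = 0.01             # assume 1% of sitemaps change per day
index_bytes = 10 * 10**6            # assume a 10 MB sitemap index

naive_daily = total_sitemap_bytes                            # re-fetch it all
lastmod_daily = index_bytes + changed_fraction * total_sitemap_bytes

print(f"naive crawl:   {naive_daily / 10**9:.2f} GB/day")
print(f"lastmod crawl: {lastmod_daily / 10**9:.2f} GB/day")

Under those assumptions the lastmod-aware crawl moves about 1 GB a day instead of 100 GB. The exact ratio will vary from site to site, but that is the point of counting bytes.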

The post continues: "Between the time we download the sitemaps index file to discover sitemaps files URLs, and the time we downloaded these sitemap files, these sitemaps may have expired or be over-written."

This isn’t just wrong. It is ignorant of the importance of structure in sitemaps and how it applies to large websites.

These are the important elements in a sitemap file, quoting from the sitemap protocol:
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>

They tell the search engine when the content at a URL was last modified and how frequently it is likely to change. Both are optional, and changefreq is a hint rather than a demand. A large site relies on sitemap index files, and those prioritise the use of lastmod. That means they already tell the search engine which sitemap files have changed, so all the search engine has to do is hit the sitemap index file to find out which file(s) to download. Unless the search engine has completely broken its sitemap parsing and misunderstood the sitemap protocol, this works well for both the site owner and the search engine.
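
Because changefreq is only a hint, each crawler decides what it means in practice. The sketch below shows one plausible interpretation; the intervals and the next_check helper are assumptions for illustration, not something the protocol mandates.

from datetime import timedelta

# One plausible reading of the changefreq hint as a recheck interval.
# The intervals are assumptions; the protocol leaves interpretation to
# the crawler, and "always"/"never" are special cases by definition.
CHANGEFREQ_INTERVAL = {
    "always": timedelta(0),          # may be rechecked on every visit
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}

def next_check(last_checked, changefreq):
    # Earliest time a crawler honouring the hint would look again.
    if changefreq == "never":
        return None  # archived URL: no scheduled recheck
    return last_checked + CHANGEFREQ_INTERVAL.get(changefreq, timedelta(days=1))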

Quoting again from the sitemap protocol, this is what the important data in a sitemap index file looks like:
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>

That lastmod element is important because it tells the search engine, or at least a properly designed one, when the sitemap was last modified. Simple, isn’t it?

Bing again: "Additionally search engines don't download sitemaps at specific time of the day; they are so often not in sync with web sites sitemaps generation process."

Again, the lastmod time is crucial for large websites, but Bing does not seem to understand that simple concept.

And again: "Having fixed names for sitemaps files does not often solve the issue as files, and so URLs listed, can be overwritten during the download process."

So from merely being wrong, Bing takes a detour down the road to complete cluelessness. It is not difficult to see why Google is beating it in the search engine business. One of the key aspects of building large sitemaps is to split the pages into sitemap files and maintain them in those files. Sitemap generation is typically run as a cron job at specific times. That keeps a continuity of structure, and new content gets included in new files where necessary. The sitemaps grow in sync with the architecture of the websites, the new sitemap files get added to the sitemap index files, and those index files are the ones that should be checked first for updates.
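
As an illustration, here is a hedged sketch of such a cron-driven generator. The paths, URLs, and helper names are assumptions; the atomic rename is one common way to answer the overwriting complaint in Bing's post, and publishing each file's mtime as its lastmod keeps the index honest without extra bookkeeping.

import os
from datetime import datetime, timezone
from xml.sax.saxutils import escape

SITEMAP_DIR = "/var/www/example.com/sitemaps"   # assumed filesystem path
BASE_URL = "https://www.example.com/sitemaps"   # assumed public URL
URLS_PER_FILE = 50_000                          # the protocol's per-file cap

def write_atomically(path, text):
    # Write to a temp file, then rename: os.replace is atomic on POSIX,
    # so a crawler mid-download never reads a half-written sitemap.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)

def lastmod_of(path):
    # Advertise the file's mtime as its lastmod, in W3C datetime format.
    ts = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%S+00:00")

def regenerate(urls):
    # Split the pages into fixed-name sitemap files, rewrite only the
    # files whose content changed, and rebuild the index so each entry
    # carries an honest lastmod.
    entries = []
    for i in range(0, len(urls), URLS_PER_FILE):
        name = f"sitemap{i // URLS_PER_FILE + 1}.xml"
        path = os.path.join(SITEMAP_DIR, name)
        xml = (
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(f"<url><loc>{escape(u)}</loc></url>"
                        for u in urls[i:i + URLS_PER_FILE])
            + "\n</urlset>"
        )
        old = None
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                old = f.read()
        if xml != old:  # unchanged files keep their mtime, so lastmod is stable
            write_atomically(path, xml)
        entries.append(f"<sitemap><loc>{BASE_URL}/{name}</loc>"
                       f"<lastmod>{lastmod_of(path)}</lastmod></sitemap>")
    index = (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</sitemapindex>"
    )
    write_atomically(os.path.join(SITEMAP_DIR, "sitemap_index.xml"), index)

Run from cron, a generator along these lines leaves unchanged files untouched, so the only lastmod values that move in the index are the ones a search engine actually needs to act on.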

The Bing sitemaps post is not a reliable guide to sitemap practices for large websites, and it is quite wrong in critical places. It might be better for Bing to actually talk to people who know about large websites and building sitemaps for them. It could save a lot of bandwidth, both for the Bing search engine and, more importantly, for large website owners.

