Using the Meta Robots Tag in Blogger
You may be surprised to learn that many blogs have problems with duplicate content. This is because they aren't set up to prevent search engines from indexing their archived content pages. There may be other reasons to restrict search engine spiders from indexing some of the pages on your blog as well. If you use Blogger, this article will show you how to do it and fine-tune your control over what the search engines will and will not crawl on your blog.
Some Important Background about the Robots Exclusion Protocol
In Google Blogger, a blog can be hosted on servers other than Blogspot's. This offers several useful advantages, such as the ability to upload a robots.txt file to the root directory of the server. Robots.txt is one implementation of the Robots Exclusion Protocol; it instructs compliant bots not to access specific (or sometimes any) files and directories on the site. There are several reasons to impose such restrictions, for example:
Confidentiality of Information – You might have files containing Social Security numbers, credit card information, personal records and other sensitive items. By using the robots exclusion protocol, compliant bots will not crawl these files, so their contents will not be exposed in any manner, such as appearing in search engine results. (Note that this is not a security measure: the files remain publicly accessible to anyone who requests them directly.)
Prevent Stealing of Bandwidth - Bots visiting the site consume bandwidth, which can slow the site down for human visitors. By using robots.txt, you can disallow most bots so that only a limited number of useful bots crawl the site. Examples of useful bots include Googlebot, Yahoo! Slurp and MSN/Live Bot. Other bots would just consume server resources and slow down the site.
Prevent Duplicate Content - This is one of the most useful features of any robots exclusion mechanism, whether robots.txt or the meta robots tag. Duplicate content causes filtering issues with search engines such as Google: it dilutes the relevance of an important page by spreading similar information across many pages of the site. Although search engines like Google use PageRank to determine the page with the highest authority (often the one with the most incoming links), sometimes the original version of a page is less "authoritative" than its duplicates. By using robots.txt or the meta robots tag, duplicate pages will not be crawled, concentrating importance on the authoritative pages.
Increase Search Engine Bots' Crawling Efficiency – Search engine bots download web pages from the server; this is called the "crawling process." Small sites of around 20 pages can be crawled without problems, but crawling a big e-commerce site takes time, and deep pages end up being crawled less often than pages near the root directory. Googlebot crawls from the top down, with top-level pages crawled most frequently because they receive the most inbound links. By excluding unimportant files from crawling, you let Googlebot spend its time on the pages that matter, which increases crawling efficiency and gets deep pages crawled sooner.
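To make the restrictions above concrete, here is a minimal robots.txt sketch. The directory paths are hypothetical examples, not taken from any real site; a compliant bot matches the most specific User-agent record that applies to it:

```text
# Googlebot gets full access (an empty Disallow allows everything)
User-agent: Googlebot
Disallow:

# All other bots are kept out of sensitive and duplicate-prone areas
User-agent: *
Disallow: /private/
Disallow: /archive/
```

Remember that each User-agent record stands alone: a bot that matches the Googlebot record ignores the rules in the wildcard record entirely.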
Robots exclusion standards can be implemented in two ways:
Using Robots.txt – The crawling instructions are written in a text file and uploaded to the root directory of the server.
Using Meta Robots Tag – The crawling instructions are embedded in the HTML code or website template using the <meta name="robots" content="robots-terms"> tag. This is especially useful if the blog is hosted on Blogspot, where authors cannot upload a robots.txt file.
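In the second method, the robots-terms value is a comma-separated list of directives such as index, noindex, follow and nofollow. For example, to keep a page out of the search index while still letting bots follow the links on it, you would place this in the page's head section:

```html
<head>
  <!-- Exclude this page from the index, but crawl its outgoing links -->
  <meta name="robots" content="noindex,follow">
</head>
```

The default behavior, with no tag present, is equivalent to content="index,follow".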
This article covers the implementation and techniques of using the meta robots tag in a Blogger blog hosted on Blogspot.com. The main objective is to block duplicate content and present unique, accurate information to search engines for the best crawling efficiency.
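As a rough sketch of the approach, a Blogger layout template can wrap the meta robots tag in a <b:if> condition so that it is emitted only on certain page types. The snippet below assumes the standard Blogger template data tag data:blog.pageType; treat it as an illustration rather than a drop-in solution:

```html
<!-- Inside the <head> of the Blogger template -->
<b:if cond='data:blog.pageType == &quot;archive&quot;'>
  <!-- Keep archive pages (a common source of duplicate content) out of the index -->
  <meta content='noindex,follow' name='robots'/>
</b:if>
```

Item (post) pages and the blog's home page are left untouched by this condition, so they continue to be indexed normally.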
More By Codex-M