Search Engine Optimization - Creating a robots.txt file
By: Sumantra Roy
2004-01-26

    Introduction

    Some people believe in creating a different page for each search engine, with each page optimized for one keyword and one engine. While I don't recommend creating different pages for different search engines, if you do decide to create such pages, there is one issue you need to be aware of.

    These pages, although optimized for different search engines, often turn out to be quite similar to each other. Search engines can now detect when a site has created such similar-looking pages and may penalize or even ban the site. To prevent your site from being penalized for spamming, you need to stop each search engine's spider from indexing the pages that are not meant for it, i.e. you need to prevent AltaVista from indexing pages meant for Google and vice versa. The best way to do that is with a robots.txt file.

    Create your robots.txt file

    Create your robots.txt file with a plain text editor such as Windows Notepad, and save it under exactly that name: robots.txt. Don't use a word processor to create the file, because a word processor adds formatting that the spiders cannot read.

    Here is the basic syntax of the robots.txt file:

    User-Agent: [Spider Name]
    Disallow: [File Name]

    For instance, to tell AltaVista's spider, Scooter, not to spider the file named myfile1.html residing in the root directory of the server, you would write

    User-Agent: Scooter
    Disallow: /myfile1.html

    To tell Google's spider, called Googlebot, not to spider the files myfile2.html and myfile3.html, you would write

    User-Agent: Googlebot
    Disallow: /myfile2.html
    Disallow: /myfile3.html

    You can, of course, put multiple User-Agent statements in the same robots.txt file. Hence, to tell AltaVista not to spider the file named myfile1.html, and to tell Google not to spider the files myfile2.html and myfile3.html, you would write

    User-Agent: Scooter
    Disallow: /myfile1.html

    User-Agent: Googlebot
    Disallow: /myfile2.html
    Disallow: /myfile3.html

    If you want to prevent all robots from spidering the file named myfile4.html, you can use the * wildcard character in the User-Agent line, i.e. you would write

    User-Agent: *
    Disallow: /myfile4.html

    However, you cannot use the wildcard character in the Disallow line.
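
    Although wildcards are not allowed there, the Disallow line works by prefix matching: a robot skips any URL whose path begins with the value you specify. So, taking a hypothetical /private/ directory as an example, you could keep all robots out of that directory by writing

    User-Agent: *
    Disallow: /private/

    and a single Disallow: / line would keep them out of the entire site.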

    Once you have created the robots.txt file, you should upload it to the root directory of your domain. Uploading it to any sub-directory won't work - the robots.txt file needs to be in the root directory.
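
    In other words, robots only look for the file at the top level of your site. Taking www.yourdomain.com as a placeholder for your own domain, the file must be reachable at

    http://www.yourdomain.com/robots.txt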

    I won't discuss the syntax and structure of the robots.txt file any further - you can get the complete specifications from http://www.robotstxt.org/wc/norobots.html

    Now we come to how the robots.txt file can be used to keep your site from being penalized for spamming when you create different pages for different search engines. What you need to do is prevent each search engine from spidering the pages that are not meant for it.

    For simplicity, let's assume that you are targeting only two keywords: "tourism in Australia" and "travel to Australia". Also, let's assume that you are targeting only three of the major search engines: AltaVista, HotBot and Google.

    Now, suppose you have followed this convention for naming the files: each page is named after the keyword it is optimized for, with the individual words separated by hyphens, followed by the first two letters of the name of the search engine the page is optimized for.

    Hence, the files for AltaVista are

    tourism-in-australia-al.html
    travel-to-australia-al.html

    The files for HotBot are

    tourism-in-australia-ho.html
    travel-to-australia-ho.html

    The files for Google are

    tourism-in-australia-go.html
    travel-to-australia-go.html

    As I noted earlier, AltaVista's spider is called Scooter and Google's spider is called Googlebot.

    A list of spiders for the major search engines can be found at http://www.jafsoft.com/searchengines/webbots.html

    Now, we know that HotBot uses Inktomi, and from this list we find that Inktomi's spider is called Slurp. Using this knowledge, here's what the robots.txt file should contain:

    User-Agent: Scooter
    Disallow: /tourism-in-australia-ho.html
    Disallow: /travel-to-australia-ho.html
    Disallow: /tourism-in-australia-go.html
    Disallow: /travel-to-australia-go.html

    User-Agent: Slurp
    Disallow: /tourism-in-australia-al.html
    Disallow: /travel-to-australia-al.html
    Disallow: /tourism-in-australia-go.html
    Disallow: /travel-to-australia-go.html

    User-Agent: Googlebot
    Disallow: /tourism-in-australia-al.html
    Disallow: /travel-to-australia-al.html
    Disallow: /tourism-in-australia-ho.html
    Disallow: /travel-to-australia-ho.html

    When you put the above lines in the robots.txt file, you instruct each search engine not to spider the files meant for the other search engines.

    When you have finished creating the robots.txt file, double-check to ensure that you have not made any errors anywhere in it. A small error can have disastrous consequences: a search engine may spider files that are not meant for it, in which case it can penalize your site for spamming, or it may not spider any files at all, in which case you won't get top rankings in that search engine.

    A useful tool for checking the syntax of your robots.txt file can be found at http://tool.motoricerca.info/robots-checker.phtml. While it will help you correct syntactical errors in the robots.txt file, it won't help you correct any logical errors, for which you will still need to go through the robots.txt file thoroughly, as mentioned above.
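
    A typical logical error would be listing a page under the wrong record. To take a hypothetical slip with the files above as an example:

    User-Agent: Googlebot
    Disallow: /tourism-in-australia-go.html
    Disallow: /travel-to-australia-al.html

    This would pass any syntax check, yet it blocks Google from the very page that was optimized for it, while leaving several of the pages meant for the other engines open to Googlebot.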


    About the Author
    Sumantra is one of the most respected search engine positioning specialists on the Internet. To have Sumantra's company place your site at the top of the search engines, go to http://www.1stSearchRanking.com/t.cgi?3741 For more advice on how you can take your web site to the top of the search engines, subscribe to his FREE newsletter by going to http://www.1stSearchRanking.com/t.cgi?3741&newsletter.htm

