Search Engine Tricks

How to Control Search Engine Robots
By: Michael Rock
    2005-05-14




    How to Control Search Engine Robots - You can deter...


    (Page 3 of 4)

    If you would like to have the robots.txt file created for you, visit www.rietta.com/robogen. To validate your robots.txt file and make sure it works properly, visit www.searchengineworld.com/cgi-bin/robotcheck.cgi. To deter crawlers from indexing the 'duplicate' directory, add these lines to your robots.txt file:

    User-agent: *
    Disallow: /duplicate/

    The * after User-agent means that the rule applies to all crawlers, and /duplicate/ after Disallow tells those crawlers to ignore this directory and not crawl it. Each record (a User-agent line together with its Disallow lines) must be separated from the next record by a blank line in order for the file to work correctly. This is how you would combine the rule above with a rule that blocks the Wayback Machine's crawler in a single robots.txt file:

    # this identifies the wayback machine
    User-agent: ia_archiver
    Disallow: /

    User-agent: *
    Disallow: /duplicate/
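
As a rough way to check how a file like the one above is interpreted, the sketch below feeds the same rules to Python's standard `urllib.robotparser` module. The helper name `is_allowed` and the user-agent name `ExampleBot` are assumptions made for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above.
ROBOTS_TXT = """\
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules.

    Hypothetical helper: it parses the text directly instead of
    downloading robots.txt from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    checks = [
        ("ia_archiver", "/index.html"),
        ("ExampleBot", "/duplicate/page.html"),
        ("ExampleBot", "/index.html"),
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(ROBOTS_TXT, agent, path))
```

With these rules, ia_archiver is refused everywhere, while any other crawler is refused only for paths that start with /duplicate/.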

    One very important point: anyone can read a site's robots.txt file. If you have directories that you don't want anyone to know about, don't list them in the robots.txt file. If such a directory is not linked to from your web site, crawlers are unlikely to find and index it anyway.

    An alternative to blocking crawlers with robots.txt is to put a robots meta tag into the page itself. It looks like this: <meta name="robots" content="noindex,nofollow">

    Put this tag inside the head section of your web page. It tells crawlers not to index the page (that is, not to list it in search results) and not to follow any of the hyperlinks on the page. As another example, <meta name="robots" content="noindex,follow"> tells crawlers not to index the page but to follow the hyperlinks on it.
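
To see which directives a page actually declares, a small script can read the robots meta tag back out of the HTML. This is a minimal sketch using Python's standard `html.parser` module. The class name `RobotsMetaParser` and the function `robots_directives` are hypothetical names chosen for this example, and it only looks at the generic `robots` meta tag, not crawler-specific ones.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when an attribute has no value.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<meta name="robots" content="noindex,follow">'
        "</head><body></body></html>"
    )
    found = robots_directives(page)
    print("may index:", "noindex" not in found)
    print("may follow links:", "nofollow" not in found)
```

For the sample page in the script, the result is that the page should not be indexed but its links may be followed, which matches the noindex,follow example in the text.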

    More By Jase Dow


     

       

    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 4 Hosted by Hostway