Maximize Crawlability of WordPress Blogs and Prevent Duplicate Content
By: Codex-M
    2009-04-13

    Table of Contents:
  • Maximize Crawlability of WordPress Blogs and Prevent Duplicate Content
  • Possible WordPress Crawling Issues
  • Solutions to WordPress Crawling Issues
  • More WordPress Crawling Issue Solutions



    Possible WordPress Crawling Issues


    (Page 2 of 4 )

    1. Category, post, and archive pages contain the same content.

    • As on other blog publishing platforms, a new post in WordPress appears by default at its post URL, in its categories, and in the archives. The post URL is the actual location of the post. This is the URL that search engines should index, because it can contain keywords.


    • WordPress category pages contain the same posts, grouped by topic. Their purpose is to classify posts, which greatly helps blog visitors. When search engine bots crawl these category pages, however, they create duplicate content problems, because the pages contain the same information as the posts themselves.


    • Google uses the "PageRank" system to score and classify document importance. PageRank is a measure of popularity in terms of the number of back links pointing to a URL. If a WordPress category page earns a higher PageRank than the post, the category page will be placed in the main Google index, while the much more important post will be buried deep in the index and treated as a "second priority" document.
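    The overlap described above can be illustrated with a small sketch that groups URLs by a fingerprint of their text. The URLs and page text are hypothetical, and real category or archive pages usually list several posts, so an exact-match comparison like this is a simplification rather than how a search engine detects duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Return sorted lists of URLs whose text is identical after normalization."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical URLs and text, for illustration only.
pages = {
    "http://www.thisisasampleblog.com/my-first-post/": "Hello world. This is my first post.",
    "http://www.thisisasampleblog.com/category/news/": "Hello world.  This is my first post.",
    "http://www.thisisasampleblog.com/2009/04/": "Hello world. This is my FIRST post.",
    "http://www.thisisasampleblog.com/about/": "About this blog.",
}
duplicates = find_duplicates(pages)
```

    Here the post, category and archive URLs fall into one group, while the unrelated page does not.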

    2. Default URLs are complex, long, and do not contain the targeted keywords.

    • WordPress's default URLs are unfriendly and confusing. They contain query strings ("?") and other characters that do not help search engines crawl and index them, and long URLs make the problem worse. The longer the URL, the greater the risk that it will be treated as a "second priority" document in the crawling process.
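    As a rough illustration, the sketch below tells a query-string URL apart from a keyword-based one and shows how a title can be turned into a URL slug. The parameter names "p", "cat" and "m" are assumed here to be the post, category and archive parameters of the default scheme, and slugify is a hypothetical helper, not a WordPress function.

```python
import re
from urllib.parse import urlparse, parse_qs


def is_default_style(url):
    """True when the URL identifies content through a query string such as ?p=123."""
    query = parse_qs(urlparse(url).query)
    # Assumed parameter names of the default URL scheme: post, category, archive.
    return any(key in query for key in ("p", "cat", "m"))


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated URL slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


default_url = "http://www.thisisasampleblog.com/?p=123"
friendly_url = "http://www.thisisasampleblog.com/" + slugify("Maximize Crawlability of WordPress Blogs!") + "/"
```

    The second URL carries the keywords of the title, which the first one cannot do.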

    3. WordPress's front page contains the same content as the post, archive and category pages.

    • By default, WordPress's front page contains the same content as the post pages. This setup creates even more problems with Google because of its PageRank system. If the front page has more PageRank than the URLs of your posts (which is nearly always the case), the front page will be prioritized in the search results instead of the post URLs. Again, the post URLs will be classified as "second priority" documents and placed in the supplemental index. The risk is lower if you post daily, because the front page changes constantly and readers then tend to bookmark or link to the post URLs instead.

    4. RSS feeds and trackback pages get crawled and contain the same information as the front page, category pages and archive pages.

    • WordPress always provides RSS feeds by default. These feeds are indexable by default and, again, contain the same content as the post, front page and archive pages. RSS feeds exist only to syndicate content and to give users updated content, so their URLs are not as important as the post URLs and the front page.
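    To see which URLs in a crawl fall into this group, a minimal sketch can label each URL by its path. It assumes feed URLs contain a "feed" path segment or a "feed" query parameter and that trackback URLs end in "trackback"; a post whose own slug happens to be "feed" would be mislabeled.

```python
from urllib.parse import urlparse, parse_qs


def url_kind(url):
    """Label a URL as 'feed', 'trackback' or 'content' from its path and query."""
    parts = urlparse(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    query = parse_qs(parts.query)
    if "feed" in segments or "feed" in query:
        return "feed"
    if segments and segments[-1] == "trackback":
        return "trackback"
    return "content"


# Hypothetical URLs, for illustration only.
kinds = {
    url: url_kind(url)
    for url in (
        "http://www.thisisasampleblog.com/feed/",
        "http://www.thisisasampleblog.com/?feed=rss2",
        "http://www.thisisasampleblog.com/my-first-post/trackback/",
        "http://www.thisisasampleblog.com/my-first-post/",
    )
}
```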

    5. WordPress admin pages, such as wp-login, are indexable.

    • The WordPress admin pages (e.g. http://www.thisisasampleblog.com/wp-login.php) are indexable by default. Having them in the index looks unprofessional and exposes security problems, so they should not be crawled.
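    One way to check whether a crawler is allowed to reach such a page is to test the URL against the blog's robots.txt rules with Python's standard urllib.robotparser module. Both rule sets below are hypothetical examples written for this sketch, not the contents of any real blog's robots.txt.

```python
from urllib.robotparser import RobotFileParser


def crawlable(robots_txt, url, agent="Googlebot"):
    """Return True when the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rule sets: one that blocks nothing, one that blocks admin pages.
OPEN_RULES = "User-agent: *\nDisallow:\n"
BLOCKING_RULES = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /wp-login.php\n"

login_url = "http://www.thisisasampleblog.com/wp-login.php"
post_url = "http://www.thisisasampleblog.com/my-first-post/"

open_result = crawlable(OPEN_RULES, login_url)
blocked_result = crawlable(BLOCKING_RULES, login_url)
post_result = crawlable(BLOCKING_RULES, post_url)
```

    With the permissive rules the login page can be fetched; with the blocking rules it cannot, while the post stays reachable.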

    6. Dynamic URLs in WordPress are also crawlable.

    • Dynamic URLs have no place in the index if you want to maximize search engine crawling of your blog. WordPress uses dynamic URLs for previews, blog search results and admin pages, none of which are important pages.
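    A simple way to separate dynamic URLs from the rest of a crawl list is to treat any URL with a query string as dynamic. This is a sketch under that assumption; the search ("?s=") and preview ("?p=...&preview=true") URLs are illustrative examples.

```python
from urllib.parse import urlsplit


def is_dynamic(url):
    """True when the URL carries a query string."""
    return bool(urlsplit(url).query)


def static_only(urls):
    """Keep only the URLs without a query string, preserving their order."""
    return [url for url in urls if not is_dynamic(url)]


# Hypothetical crawl list, for illustration only.
crawl_list = [
    "http://www.thisisasampleblog.com/my-first-post/",
    "http://www.thisisasampleblog.com/?s=seo",
    "http://www.thisisasampleblog.com/?p=12&preview=true",
]
important = static_only(crawl_list)
```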

    7. A default WordPress installation does not 301-redirect non-www pages to the www version.

    • Have you noticed that typing the non-www version of your blog's address into your browser shows the same content as the www version? This creates crawling problems and reduces how thoroughly search engine bots crawl your site, particularly Google's. If search engines crawl the non-www pages, those pages end up as duplicate content and signal that your blog is confusing to crawl. Search engines cannot always determine which version is the right one to index, so you must provide some guidance. In the SEO industry, having both www and non-www pages indexed is called a "canonical issue."
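    The redirect that is missing can be described as a small function: given a requested URL, return the 301 status and the www target, or nothing when the URL is already canonical. This is only a sketch of the logic with a hypothetical function name; on a real blog the redirect would be configured on the web server or in WordPress itself.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect(url):
    """Return (301, target) when the host lacks 'www.', otherwise None."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("www."):
        return None  # Already the canonical version; no redirect needed.
    target = urlunsplit((parts.scheme, "www." + host, parts.path, parts.query, parts.fragment))
    return 301, target


redirect = www_redirect("http://thisisasampleblog.com/my-first-post/")
no_redirect = www_redirect("http://www.thisisasampleblog.com/my-first-post/")
```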

    8. Some WordPress installations return a 200 header status instead of a 404, even though the URL does not exist at all.

    • On some servers, a WordPress blog will not return a 404 header status even when the page does not exist. This is strange and rare, but if your blog is affected, your pages will not get the maximum crawling priority. The problem is that if search engines have indexed a URL that used to return 200 (OK) and should now return 404, it will still return 200 (OK), and depending on your setup it could serve live content from your blog, creating duplicate content issues.
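    A blog can be tested for this problem by requesting a path that almost certainly does not exist and looking at the status code. The sketch below uses only the Python standard library; the function names are hypothetical, and the status lookup is passed in as a parameter so the check can be tried without a live server.

```python
import urllib.error
import urllib.request
import uuid


def http_status(url):
    """Return the HTTP status code for a GET request to the URL."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx and 5xx responses; the code is on the exception.
        return error.code


def returns_soft_404(base_url, get_status=http_status):
    """Request a random path that should not exist and report whether it answers 200."""
    probe = base_url.rstrip("/") + "/" + uuid.uuid4().hex + "/"
    return get_status(probe) == 200
```

    Calling returns_soft_404("http://www.thisisasampleblog.com") against a live blog would perform a real request. A True result means a missing page answered 200 (OK), possibly after a redirect, since urlopen follows redirects.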

    9. The current WordPress installation does not offer XML or static sitemaps to guide visitors and search engine crawlers.

    • This is particularly annoying, because by default the WordPress admin page has no feature for automatically creating a sitemap. Googlebot uses XML sitemaps to find the important URLs in your blog, which helps maximize how many of your blog's URLs get crawled.
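    For reference, an XML sitemap is a short file that lists URLs inside a urlset element. The sketch below builds a minimal one with Python's standard xml.etree.ElementTree module; the post URLs are hypothetical, and optional fields such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical post URLs, for illustration only.
sitemap_xml = build_sitemap([
    "http://www.thisisasampleblog.com/my-first-post/",
    "http://www.thisisasampleblog.com/my-second-post/",
])
```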

    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 2 Hosted by Hostway