Maximize Crawlability of WordPress Blogs and Prevent Duplicate Content - Solutions to WordPress Crawling Issues
(Page 3 of 4)
For issues #1, #4, #5 and #6:
Use a robots.txt file and upload it to your server's root directory. The suggested robots.txt file below addresses the issues cited above for WordPress, provided you have installed WordPress in the server root directory.
User-agent: *
# Block all dynamically generated URLs (any URL containing a query string).
Disallow: /*?
# Block the top-level feed.
Disallow: /feed
# Block feed pages in deeper directories.
Disallow: */feed
# Block WordPress admin and system paths (everything beginning with /wp-).
Disallow: /wp-
# Block the XML-RPC endpoint.
Disallow: /xmlrpc.php
# Block all trackback URLs.
Disallow: */trackback
Robots.txt implements the robots exclusion protocol, which works on any web server platform. The file tells crawlers such as Googlebot which pages not to crawl, so that their crawling is concentrated on the important pages of the site.
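Before uploading the file, it can help to check which URLs the rules would actually block. The following is a minimal Python sketch, not Google's own matcher: it assumes the wildcard behavior that Google documents ("*" matches any run of characters, a trailing "$" anchors the end, everything else matches as a prefix). The names DISALLOW_PATTERNS, pattern_to_regex and is_blocked are made up for this illustration, and Allow rules are not modeled.

```python
import re

# Disallow patterns taken from the suggested robots.txt file.
DISALLOW_PATTERNS = [
    "/*?",
    "/feed",
    "*/feed",
    "/wp-",
    "/xmlrpc.php",
    "*/trackback",
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regular expression.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; the rest is literal text.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, patterns=DISALLOW_PATTERNS):
    """Return True if any Disallow pattern matches from the start of path."""
    # re.match only matches at the beginning, which gives prefix semantics.
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    for path in ["/thisismypost/", "/?p=123", "/thisismypost/feed/",
                 "/wp-admin/", "/thisismypost/trackback/"]:
        print(path, "blocked" if is_blocked(path) else "crawlable")
```

Because the rules are prefix matches, "Disallow: /feed" would also catch a post whose URL merely begins with /feed (for example /feedback/), so it is worth running your own post URLs through a check like this.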
For issues #2 and #7:
To shorten and simplify post URLs, customize the WordPress permalink structure. For example, change:
www.thisismyblogexample.com/2008/08/thisismypost/
to:
www.thisismyblogexample.com/thisismypost/
Go to Dashboard > Settings > Permalinks, then under "Common settings", select "Custom structure" and paste in:
/%postname%/
You can then set up 301 (permanent) redirects of the simple redirection type from the old (source) URLs to the new (target) URLs using the WordPress Redirection plugin. It is a convenient tool for creating 301 redirects without hand-writing .htaccess mod_rewrite rules. You can download the Redirection plugin here: http://urbangiraffe.com/plugins/redirection/
Also create 301 redirects from all non-www URLs to the www URLs of your blog: in the WordPress Redirection plugin, go to Manage > Redirection > Options and set the Root domain to "www."
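To see which source and target pairs the redirects need to cover, the mapping from old to new URLs can be sketched in a few lines of Python. This is only an illustration of the mapping, not how the Redirection plugin works internally. It assumes the old structure was the date-based /year/month/postname/ form shown in the example above; DATED_PERMALINK and new_permalink are hypothetical names.

```python
import re

# Assumption: old permalinks look like /2008/08/thisismypost/
# (four-digit year, two-digit month, then the post name).
DATED_PERMALINK = re.compile(r"^/(\d{4})/(\d{2})/([^/]+)/?$")


def new_permalink(old_path):
    """Return the /%postname%/ path for a date-based path, or None."""
    match = DATED_PERMALINK.match(old_path)
    if match is None:
        # Not a date-based post URL, so no redirect is needed.
        return None
    return "/" + match.group(3) + "/"


if __name__ == "__main__":
    old = "/2008/08/thisismypost/"
    # Each pair is one 301 redirect: source path, then target path.
    print(old, "->", new_permalink(old))
```

Each resulting pair corresponds to one 301 redirect entry, with the old path as the source and the new path as the target.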
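The non-www to www rule can be illustrated the same way. The sketch below only shows the URL rewrite that such a redirect performs. It assumes the blog lives on the bare domain (not on another subdomain), and www_redirect_target is a hypothetical helper name, not part of the plugin.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect_target(url):
    """Return the www version of url, or None if it already uses www."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host or host.startswith("www."):
        # Already canonical (or not an absolute URL): nothing to redirect.
        return None
    netloc = "www." + host
    if parts.port is not None:
        # Preserve an explicit port if the URL had one.
        netloc += ":" + str(parts.port)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(www_redirect_target("http://thisismyblogexample.com/thisismypost/"))
```

Only the host changes; the path and query string are carried over, so every non-www URL lands on its exact www counterpart.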
More By Codex-M