The Apache server power commander part 2
By Dirk Brockhausen
In the last instalment of this tutorial we started off with a discussion of the basics of the module mod_rewrite. In the example reviewed there, we used a rule
which, spelled out in words, states:
"If access to file .htaccess is attempted, return an error message stating that access is denied."
This rule applies globally, i.e. everyone who requests the file will receive the specified error message.
We can, however, restrict a rule with so-called "rule conditions"; the rule is then executed only if its conditions are actually met.
Syntax: The conditions must precede the rule! A RewriteCond applies only to the RewriteRule that immediately follows it.
Let us illustrate this with an example.
(The lines below are entries in the file ".htaccess".)
RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
RewriteRule ^.*$ - [F]
The first three lines were covered in detail in Part 1 of this tutorial. Their function is to initialize the rewriting engine.
The last two lines deny access to any spider whose UserAgent begins with "EmailSiphon". This particular spider is an email harvester that collects addresses from web pages.
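For readers who think in code, the effect of these two lines can be modelled in a few lines of Python. This is a minimal sketch, not how Apache works internally: the function name `status_for` is hypothetical, Python's `re` module stands in for Apache's regular expression engine (assumed to behave the same for these simple patterns), and a request that is not blocked is assumed to be served normally with status 200.

```python
import re

# Hypothetical model of:
#   RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
#   RewriteRule ^.*$ - [F]
COND_PATTERN = re.compile(r"^EmailSiphon")  # CondPattern, tested against the UserAgent
RULE_PATTERN = re.compile(r"^.*$")          # rule pattern, tested against the path


def status_for(user_agent: str, path: str) -> int:
    """Return the HTTP status the rule set would produce (sketch only)."""
    # The rule fires only if its own pattern matches the requested path
    # AND the preceding condition is met.
    if RULE_PATTERN.search(path) and COND_PATTERN.search(user_agent):
        return 403  # [F]: "Forbidden"
    return 200      # assumption: the request is otherwise served normally


print(status_for("EmailSiphon", "index.html"))  # 403
print(status_for("Mozilla/5.0", "index.html"))  # 200
```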
Our line:
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
is made up of the following three parts:
Directive: RewriteCond
TestString: %{HTTP_USER_AGENT}
CondPattern: ^EmailSiphon
In our example, the TestString is a server variable, written in the general form "%{NAME_OF_VARIABLE}".
Here we have used "HTTP_USER_AGENT" as NAME_OF_VARIABLE; it holds the UserAgent string the client sends with its request.
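To make the "%{NAME_OF_VARIABLE}" notation concrete, the sketch below expands such references from a table of server variables. Apache performs this expansion itself; the helper `expand_test_string` and the variable values are hypothetical, and the choice to expand unknown variables to an empty string is an assumption of this sketch only.

```python
import re
from typing import Mapping

# Matches a %{NAME_OF_VARIABLE} reference, e.g. %{HTTP_USER_AGENT}.
VAR_REF = re.compile(r"%\{([A-Z_]+)\}")


def expand_test_string(test_string: str, server_vars: Mapping[str, str]) -> str:
    """Replace each %{NAME} with its value (hypothetical helper).

    In this sketch, unknown variables expand to an empty string.
    """
    return VAR_REF.sub(lambda m: server_vars.get(m.group(1), ""), test_string)


# Made-up values for a single request.
request_vars = {"HTTP_USER_AGENT": "EmailSiphon", "REMOTE_ADDR": "216.32.64.10"}
print(expand_test_string("%{HTTP_USER_AGENT}", request_vars))  # EmailSiphon
```

The expanded value is what the CondPattern is then tested against.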
CondPattern is a regular expression. Before we continue with its specifics, let us take a look at regular expressions and their function in general.
Regular expressions
Regular expressions are a means of describing text patterns. They are used to check whether a pattern is present in a given text. Once found, the matching text can then be manipulated.
In effect, regular expressions form a small, compact programming language in their own right.
For example, the substitution command "s/abc/xyz/g" (as used in Perl or sed) replaces every occurrence of the string "abc" in a text with "xyz".
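The same substitution can be tried in Python, where `re.sub` replaces every match by default, much like the "g" modifier; passing `count=1` limits it to the first match, as Perl does without "g". A small sketch with a made-up sample text:

```python
import re

text = "abc def abc"

# Equivalent of s/abc/xyz/g: re.sub replaces every match by default.
print(re.sub(r"abc", "xyz", text))           # xyz def xyz

# Without the g modifier only the first match is replaced; count=1 mimics that.
print(re.sub(r"abc", "xyz", text, count=1))  # xyz def abc
```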
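Here is an overview of the most important elements, with some examples:
. (dot) - wildcard (matches any single character)
| - alternation (e.g. /abc|def/ matches "abc" or "def")
* - quantifier (the preceding item may occur any number of times, including zero)
^ $ - anchors (^ marks the beginning, $ the end of the line)
s - substitution operator (s/string1/string2/ replaces string1 by string2)
g - modifier ("global": replace every match in the text, not just the first)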
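The matching elements from the list can be tried out directly. In this sketch Python's `re` module is assumed to treat ".", "|", "*", "^" and "$" the same way Apache does; the sample patterns and texts are made up for illustration:

```python
import re

# (pattern, text) pairs; the comment gives the result of re.search.
examples = [
    (r"a.c", "abc"),        # .   any single character         -> True
    (r"abc|def", "xdefx"),  # |   "abc" or "def"               -> True
    (r"^ab*c$", "ac"),      # *   zero b's                     -> True
    (r"^ab*c$", "abbbc"),   # *   several b's                  -> True
    (r"^abc$", "xabc"),     # ^ $ text must start with "abc"   -> False
]

for pattern, text in examples:
    print(f"{pattern:10} {text:6} {bool(re.search(pattern, text))}")
```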
Regular expressions are built from these elements combined with literal characters such as letters and digits.
Regular expressions are not used on their own; instead, they are built into other tools, e.g. languages like Perl or text editors such as Emacs.
In mod_rewrite they are used in the RewriteRule and RewriteCond directives.
Back to our CondPattern "^EmailSiphon": "^" represents the beginning of a string. It follows that the UserAgent must begin with the string "EmailSiphon". A UserAgent such as "NewEmailSiphon" would not match, so in that case the condition would not be met.
However, since this regular expression does not end with "$" (the end-of-line anchor), anything may follow "EmailSiphon"; the UserAgent could, for example, be "EmailSiphon2".
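The effect of the missing "$" anchor shows up clearly when the tutorial's pattern is compared with an anchored variant. The variant with "$" is added here only for comparison, and Python's `re` stands in for Apache's regex engine:

```python
import re

open_ended = re.compile(r"^EmailSiphon")  # the tutorial's CondPattern
exact = re.compile(r"^EmailSiphon$")      # variant with $, for comparison only

for agent in ["EmailSiphon", "EmailSiphon2", "NewEmailSiphon"]:
    print(agent, bool(open_ended.search(agent)), bool(exact.search(agent)))
# EmailSiphon True True
# EmailSiphon2 True False
# NewEmailSiphon False False
```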
The last script line
RewriteRule ^.*$ - [F]
defines what happens when such a spider requests access.
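The regular expression "^.*$" matches any requested path. The "-" means that no substitution takes place, and the flag [F] makes the server answer with "403 Forbidden".
In other words: whichever file is requested, the error message "Forbidden" will be displayed.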
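That "^.*$" matches every path can be checked quickly. A short sketch, assuming Python's `re` behaves like Apache's engine for this pattern; the request paths are made up:

```python
import re

match_all = re.compile(r"^.*$")

# Made-up request paths, including an empty one.
paths = ["index.html", "images/logo.gif", "cgi-bin/search.pl", ""]
for path in paths:
    print(repr(path), bool(match_all.search(path)))  # True for every path
```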
The dot "." in the regular expression is a metacharacter (wildcard) that matches any single character.
"*" means that the preceding item, here the dot, may occur any number of times, including zero. Together, ".*" matches any string, so regardless of which page is requested, the error message will be displayed.
EmailSiphon is, of course, not the only email harvester. Another famous member of this family is "ExtractorPro".
So let's say we want to fend off this spider as well. In this case we add a second condition and link the two so that either one suffices.
This gives us the following entries in the file ".htaccess":
RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro
RewriteRule ^.*$ - [F]
The third argument ([OR]) in the line
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
is termed a "flag". For conditions, the two most important flags are:
NC (nocase)
OR (ornext, i.e. "or next condition")
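Conditions without a flag are combined with AND; [OR] links a condition to the next one instead. The sketch below is a simplified model of that logic, not Apache's source code: the list `CONDITIONS` and the function `conditions_met` are hypothetical, Python's `re` stands in for Apache's engine, and it assumes (as in the tutorial) that the last condition carries no [OR].

```python
import re
from typing import List, Tuple

# Each entry models one RewriteCond line: (CondPattern, has_OR_flag).
CONDITIONS: List[Tuple[str, bool]] = [
    (r"^EmailSiphon", True),    # RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    (r"^ExtractorPro", False),  # RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro
]


def conditions_met(user_agent: str, conditions: List[Tuple[str, bool]] = CONDITIONS) -> bool:
    """Simplified model: [OR] links a condition to the next one;
    the resulting groups are combined with AND."""
    group: List[bool] = []
    for pattern, has_or in conditions:
        group.append(bool(re.search(pattern, user_agent)))
        if not has_or:          # a condition without [OR] closes the group
            if not any(group):  # one failed group makes the whole set fail
                return False
            group = []
    # Assumption: as in the tutorial, the last condition carries no [OR].
    return True


for agent in ["EmailSiphon", "ExtractorPro", "Mozilla/5.0"]:
    print(agent, conditions_met(agent))
# EmailSiphon True
# ExtractorPro True
# Mozilla/5.0 False
```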
The flag "NC" makes the comparison of TestString and CondPattern case-insensitive.
Example
RewriteCond %{HTTP_USER_AGENT} ^emailsiphon [NC]
With this line, "emailsiphon", "EmailSiphon" and any other capitalization, such as "EMAILSIPHON", are recognized.
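In Python the same case-insensitive test is expressed with the `re.IGNORECASE` flag, which plays the role of [NC] in this sketch. The sample UserAgents are made up:

```python
import re

# re.IGNORECASE plays the role of [NC] in this sketch.
nc_pattern = re.compile(r"^emailsiphon", re.IGNORECASE)

for agent in ["emailsiphon", "EmailSiphon", "EMAILSIPHON", "NewEmailSiphon"]:
    print(agent, bool(nc_pattern.search(agent)))
# emailsiphon True
# EmailSiphon True
# EMAILSIPHON True
# NewEmailSiphon False
```

The "^" anchor still applies, so "NewEmailSiphon" is rejected regardless of case.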
To use multiple flags, separate them with commas:
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro
There is no fixed limit on the number of conditions, so you could block 10, 100, 1000 or more known email harvesters. Whether 1000 conditions are practical depends on server performance (Apache reads and evaluates the ".htaccess" file on every request) and on how readable ".htaccess" remains.
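With long lists it is easy to forget that every condition except the last needs [OR]. A hypothetical helper like the one below can generate the lines instead. It is a sketch under stated assumptions: the name `build_block_rules` is made up, it applies [NC] to every condition, and it accepts only plain alphanumeric agent names so that no regex escaping or quoting is needed.

```python
from typing import List


def build_block_rules(agents: List[str]) -> str:
    """Build .htaccess lines that block the given user agents (hypothetical helper).

    Every condition gets [NC]; all but the last also get [OR], so the
    chain is closed before the RewriteRule.
    """
    if not agents:
        raise ValueError("at least one user agent is required")
    lines = []
    for i, agent in enumerate(agents):
        # Assumption: plain alphanumeric names, so no regex escaping or
        # quoting is needed in the generated lines.
        if not agent.isalnum():
            raise ValueError(f"unsupported user agent name: {agent!r}")
        flags = "[NC,OR]" if i < len(agents) - 1 else "[NC]"
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} ^{agent} {flags}")
    lines.append("RewriteRule ^.*$ - [F]")
    return "\n".join(lines)


print(build_block_rules(["EmailSiphon", "ExtractorPro"]))
```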
In the examples so far, the server variable "HTTP_USER_AGENT" has been used.
Further server variables include:
REMOTE_HOST (the host name of the client)
REMOTE_ADDR (the IP address of the client)
For example, if you want to block the spider coming from www.cyveillance.com, use the variable "REMOTE_HOST":
RewriteCond %{REMOTE_HOST} ^www\.cyveillance\.com$
RewriteRule ^.*$ - [F]
The dots "." in the domain name must be escaped with a backslash "\"; otherwise each would be treated as a metacharacter that matches any character.
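The difference an unescaped dot makes can be demonstrated with a deliberately odd host name. In this sketch "wwwXcyveillanceYcom" is made up purely for illustration, and Python's `re` stands in for Apache's engine:

```python
import re

unescaped = re.compile(r"^www.cyveillance.com$")   # dots act as wildcards
escaped = re.compile(r"^www\.cyveillance\.com$")   # dots match only "."

# "wwwXcyveillanceYcom" is a made-up host name.
for host in ["www.cyveillance.com", "wwwXcyveillanceYcom"]:
    print(host, bool(unescaped.search(host)), bool(escaped.search(host)))
# www.cyveillance.com True True
# wwwXcyveillanceYcom True False
```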
To block a specific IP address, the condition reads:
RewriteCond %{REMOTE_ADDR} ^216\.32\.64\.10$
RewriteRule ^.*$ - [F]
In the regular expression, enter the complete IP address between the anchors "^" and "$", so that only this exact address matches.
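Why the "$" anchor matters here: without it, the pattern would also catch longer addresses that merely start with the same digits. The variant without "$" below exists only for comparison, and Python's `re` is assumed to match like Apache's engine:

```python
import re

exact_ip = re.compile(r"^216\.32\.64\.10$")
no_end_anchor = re.compile(r"^216\.32\.64\.10")  # variant without $, for comparison only

for addr in ["216.32.64.10", "216.32.64.100", "216.32.64.1"]:
    print(addr, bool(exact_ip.search(addr)), bool(no_end_anchor.search(addr)))
# 216.32.64.10 True True
# 216.32.64.100 False True
# 216.32.64.1 False False
```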
You may even exclude a whole IP range from access:
RewriteCond %{REMOTE_ADDR} ^216\.32\.64\.
RewriteRule ^.*$ - [F]
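A range condition like this one can be checked against every address it is meant to cover. A sketch, assuming Python's `re` behaves like Apache's engine; the neighbouring addresses are chosen for illustration:

```python
import re

range_pattern = re.compile(r"^216\.32\.64\.")

# Every address from 216.32.64.0 through 216.32.64.255 matches ...
in_range = [f"216.32.64.{n}" for n in range(256)]
print(all(range_pattern.search(addr) for addr in in_range))  # True

# ... while addresses in neighbouring ranges do not.
for addr in ["216.32.63.255", "216.32.65.0"]:
    print(addr, bool(range_pattern.search(addr)))  # False
```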
The condition "^216\.32\.64\." covers all individual IPs from "216.32.64.0" through "216.32.64.255".
Here's a little teaser quiz to test your skills. (The solution will be given in the next part of our tutorial.) Enjoy!
RewriteCond %{REMOTE_ADDR} ^216\.32\.64
RewriteRule ^.*$ - [F]
Quiz question:
--------------
If we write "^216\.32\.64", as in the configuration above, instead of "^216\.32\.64\." for the regular expression, do we get the identical effect, i.e. will this exclude the same IPs?
So far we have only used a simple RewriteRule that generates an error message. In Part 3 of our tutorial we will analyze how RewriteRule can be used to redirect visitors to specific files.
DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.