The Apache server power commander part 3
By Dirk Brockhausen
In the two preceding parts of this tutorial we explained the basics of Rules and Conditions.
We will now follow up with two examples to illustrate their use for somewhat more complex applications.
The first example deals with dynamically generated pages, while the second covers requests for ".txt" files.
For our first example, let's assume that you want to sell several items of merchandise on your web site.
Your clients are guided to various detailed product descriptions via a script:
http://www.yoursite.com/cgi-bin/shop.cgi?product1
http://www.yoursite.com/cgi-bin/shop.cgi?product2
http://www.yoursite.com/cgi-bin/shop.cgi?product3
These URLs are included as links on your site.
If you want to submit these dynamic pages to the search engines, you face the problem that most of them will not accept URLs containing the "?" character.
However, it would be perfectly possible to submit a URL of the following format:
http://www.yoursite.com/cgi-bin/shop.cgi/product1
Here, the "?" character has been replaced by "/".
Yet more pleasing to the eye would be a URL of this type:
http://www.yoursite.com/shop/product1
To the search engine, this appears to be just another acceptable hyperlink, with "shop" representing a directory that contains the files "product1", "product2", etc.
If a visitor clicks this link on a search engine's results page, the URL must be translated back so that "shop.cgi?product1" is actually called.
To do this, we use mod_rewrite with the following entries in the ".htaccess" file:
RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteRule ^(.*)shop/(.*)$ $1cgi-bin/shop.cgi?$2
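The variables $1 and $2 are so-called "backreferences": they refer to the text matched by the parenthesized groups in the pattern.
Whatever precedes "shop/" in the requested URL is stored in $1, and whatever follows "shop/" is stored in $2. Because the rule sits in a ".htaccess" file in the document root, Apache strips the leading "/" before matching, so for "/shop/product1" $1 is empty and $2 contains "product1".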
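To make the backreferences concrete, here is a small Perl sketch that emulates the rule outside Apache. It is an illustration, not mod_rewrite itself: the helper name rewrite_shop is hypothetical, and it assumes the ".htaccess" file lives in DocumentRoot, so the pattern sees the path without its leading "/" and RewriteBase / is put back in front of the relative result.

```perl
#!/usr/bin/perl
# Sketch: emulate RewriteRule ^(.*)shop/(.*)$ $1cgi-bin/shop.cgi?$2
# outside Apache. Assumes the .htaccess file is in DocumentRoot, so the
# leading "/" is stripped before matching and RewriteBase / is prepended
# to the relative result. rewrite_shop is a hypothetical helper.
use strict;
use warnings;

sub rewrite_shop {
    my ($path) = @_;                       # e.g. "shop/product1"
    return unless $path =~ m{^(.*)shop/(.*)$};
    # $1 = text before "shop/", $2 = text after it
    return "/" . $1 . "cgi-bin/shop.cgi?" . $2;
}

for my $path ("shop/product1", "catalog/shop/product2", "about.html") {
    my $result = rewrite_shop($path);
    print "$path -> ", (defined $result ? $result : "(no match)"), "\n";
}
```

For "shop/product1" the sketch returns "/cgi-bin/shop.cgi?product1". The second path shows that the leading "(.*)" accepts anything before "shop/", so even "workshop/x" would be turned into "/workcgi-bin/shop.cgi?x". If only the top-level "shop" directory is meant, a pattern such as ^shop/(.*)$ with the substitution cgi-bin/shop.cgi?$1 would be stricter.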
Up to this point, our examples used rules such as this one:
RewriteRule ^.htaccess*$ - [F]
A rule like this does not switch one URL to another: the "-" leaves the URL unchanged, and the [F] flag simply answers the request with "403 Forbidden".
For the entry in our current example:
RewriteRule ^(.*)shop/(.*)$ $1cgi-bin/shop.cgi?$2
this general syntax applies:
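RewriteRule currentURL rewrittenURL
Here "currentURL" is a regular expression matched against the requested path, and "rewrittenURL" is the URL Apache serves in its place. As you can see, this command performs a real rewrite.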
In addition to installing the ".htaccess" file, all links in your normal HTML pages which follow the format "cgi-bin/shop.cgi?product" must be changed to: "shop/product" (without the quotes).
When a spider visits such a page, it will also follow the product links, because they no longer contain a question mark that would stop it from doing so.
So employing this method you can convert dynamically generated product descriptions into seemingly static web pages and feed them to the search engines.
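If many pages contain shop links, the link change described above can be scripted. The following is a minimal Perl sketch, not part of the original tutorial; the helper name static_links is hypothetical, and it assumes product names consist only of letters, digits, "_" and "-" and that the links are written exactly like the examples above. Try it on copies of your pages first.

```perl
#!/usr/bin/perl
# Sketch: convert links "cgi-bin/shop.cgi?NAME" to "shop/NAME" in HTML.
# Assumes product names consist only of letters, digits, "_" and "-";
# static_links is a hypothetical helper name.
use strict;
use warnings;

sub static_links {
    my ($html) = @_;
    # Whatever precedes "cgi-bin/" (e.g. "/" or "http://host/") is left as is.
    $html =~ s{cgi-bin/shop\.cgi\?([\w-]+)}{shop/$1}g;
    return $html;
}

my $page = '<a href="/cgi-bin/shop.cgi?product1">Product 1</a>';
print static_links($page), "\n";
```

Because only the "cgi-bin/shop.cgi?" part is replaced, relative links and absolute links such as "http://www.yoursite.com/cgi-bin/shop.cgi?product3" are both converted while their prefixes stay intact.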
In our second example we will discuss how to route requests for ".txt" files to a program script.
Many web hosting providers running Apache only supply log files in the Common Log Format, which does not record the visitor's Referer and User-Agent.
For "robots.txt" requests, however, this information is exactly what you want: it tells you far more about visiting spiders than their IP addresses alone.
To capture it, the entries in ".htaccess" should be as follows:
RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteRule ^robots\.txt$ /text.cgi?%{REQUEST_URI}
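Now, when "robots.txt" is requested, the rule internally rewrites the request to the program script "text.cgi". Since no [R] flag is used, this is not a redirect: the visitor or spider still sees only "robots.txt".
In addition, a value is passed to the script as a query string.
%{REQUEST_URI} expands to the path that was requested, in our example "/robots.txt". The script receives this value in the QUERY_STRING environment variable and logs it as the document name.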
The script will now read the contents of "robots.txt" and will forward them to the web browser or the search engine spider.
Finally, the script records the hit in its own log file. To do this, it reads CGI environment variables such as $ENV{'HTTP_USER_AGENT'} and $ENV{'HTTP_REFERER'}, which supply the information missing from the server logs.
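To see exactly what the script receives, the substitution step can be emulated in a few lines of Perl. This is a simplified sketch, not mod_rewrite itself: the helper name expand_robots_rule is hypothetical, and it assumes the ".htaccess" file sits in DocumentRoot, so the pattern is matched against "robots.txt" without the leading slash while %{REQUEST_URI} still holds "/robots.txt".

```perl
#!/usr/bin/perl
# Sketch: emulate RewriteRule ^robots\.txt$ /text.cgi?%{REQUEST_URI}
# (hypothetical helper; assumes the .htaccess file is in DocumentRoot).
use strict;
use warnings;

# Returns the script path and the query string the script would see,
# or an empty list if the rule does not match.
sub expand_robots_rule {
    my ($path, $request_uri) = @_;         # e.g. ("robots.txt", "/robots.txt")
    return () unless $path =~ /^robots\.txt$/;
    my $target = "/text.cgi?$request_uri"; # %{REQUEST_URI} substituted
    my ($script, $query) = split /\?/, $target, 2;
    return ($script, $query);              # $query becomes QUERY_STRING
}

my ($script, $query) = expand_robots_rule("robots.txt", "/robots.txt");
print "script=$script QUERY_STRING=$query\n";
```

The document name that ends up in the log therefore carries a leading slash ("/robots.txt").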
Here is the source code for the cgi script mentioned above:
<BEGIN SOURCE CODE>
#!/usr/bin/perl
# If required, adjust line above to point to Perl 5.
######################################################
# (c) Copyright 2000 by fantomaster.com #
# All rights reserved. #
######################################################
use strict;
use warnings;

# Log file location, relative to the script's directory.
my $stats_dir = "stats";
my $log_file  = "stats.log";

# Visitor details from the CGI environment ("" if not set).
my $remote_host   = $ENV{'REMOTE_HOST'}     || "";
my $remote_addr   = $ENV{'REMOTE_ADDR'}     || "";
my $user_agent    = $ENV{'HTTP_USER_AGENT'} || "";
my $referer       = $ENV{'HTTP_REFERER'}    || "";
my $document_name = $ENV{'QUERY_STRING'}    || "";

# Read the real robots.txt; send an empty body if it cannot be opened.
my @TEXT;
if (open(my $fh, '<', "robots.txt")) {
    @TEXT = <$fh>;
    close($fh);
}

my $date = get_date();
log_hits("$date $remote_host $remote_addr $user_agent $referer $document_name\n");

print "Content-type: text/plain\n\n";
print @TEXT;
exit;

# Current local time as "YYYY-MM-DD, hh:mm:ss".
sub get_date {
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
    return sprintf("%04d-%02d-%02d, %02d:%02d:%02d",
                   $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
}

# Append one line to the hit log; skip logging if the file cannot be opened.
sub log_hits {
    open(my $hits, '>>', "$stats_dir/$log_file") or return;
    print $hits @_;
    close($hits);
}
<END SOURCE CODE>
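To install the script, save it as "text.cgi", upload it to your web site's main (DocumentRoot) directory by FTP, and change its file permissions to 755.
Next, create the directory "stats" in the same directory and make sure the web server is allowed to write to it; otherwise no hits will be logged.
A more detailed description of how to install a script can be found in our online manuals, e.g. here: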
< http://www.fantomaster.com/fantomasSuite/logFrog/lfhelp.txt >
If your server's configuration does not permit execution of Perl or CGI scripts in the main directory (DocumentRoot), you may wish to try the following RewriteRule instead:
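RewriteRule ^robots\.txt$ /cgi-bin/text.cgi?%{REQUEST_URI}
Note, however, that in this case you will have to adjust the paths in the program script: it looks for "robots.txt" and the "stats" directory relative to its own location, which is normally the directory the script runs from, now "cgi-bin".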
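One way to adjust the path is to build it from DOCUMENT_ROOT, an environment variable Apache normally passes to CGI scripts. The Perl sketch below is an assumption-laden illustration: the helper name robots_path is hypothetical, and the "../robots.txt" fallback only works if "cgi-bin" sits directly below DocumentRoot.

```perl
#!/usr/bin/perl
# Sketch: build the path to robots.txt for a copy of text.cgi that runs
# from cgi-bin. Uses DOCUMENT_ROOT, which Apache normally passes to CGI
# scripts; the "../robots.txt" fallback assumes cgi-bin sits directly
# below DocumentRoot. robots_path is a hypothetical helper.
use strict;
use warnings;

sub robots_path {
    my $root = $ENV{'DOCUMENT_ROOT'};
    if (defined $root && length $root) {
        $root =~ s{/+$}{};                 # drop trailing slashes
        return "$root/robots.txt";
    }
    return "../robots.txt";
}

print robots_path(), "\n";
```

In the script you would then open robots_path() instead of the bare name "robots.txt"; the "stats" directory can either get the same treatment or simply be created inside "cgi-bin".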
Finally, here's the solution to our quiz from the previous issue of fantomNews:
RewriteCond %{REMOTE_ADDR} ^216\.32\.64
RewriteRule ^.*$ - [F]
Quiz question: If we don't write "^216\.32\.64\." for a regular expression in the configuration above, but "^216\.32\.64" instead, will we get the identical effect, i.e. will this exclude the same IPs?
The regular expression ^216\.32\.64 matches, for example, all of the following strings:
216.32.64
216.32.640
216.32.641
216.32.64a
216.32.64abc
216.32.64.12
216.32.642.12
Hence, the final "4" may be followed by any string of characters.
However, each of the four numbers in an IP address can only range from 0 to 255 (the maximum address is 255.255.255.255), which implies that 216.32.642.12, for example, is not a valid IP. The only valid IP in the list above is 216.32.64.12!
Although the two regular expressions "^216\.32\.64\." and "^216\.32\.64" match different strings, they exclude exactly the same IP addresses: a third number beginning with "64" can only be 64 itself (640 and above exceed 255), so in a valid address "216.32.64" is always followed by a dot.
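This argument can also be checked mechanically. The Perl sketch below (helper names is_valid_ipv4 and patterns_agree are hypothetical) prints, for each quiz string, whether the loose and the strict pattern match and whether the string is a valid IPv4 address, then tries every address from 216.32.0.0 to 216.32.255.255 to confirm that both patterns catch the same valid addresses.

```perl
#!/usr/bin/perl
# Sketch: compare the loose pattern ^216\.32\.64 with the strict
# pattern ^216\.32\.64\. on the quiz strings and on every valid
# 216.32.x.y address. Helper names are hypothetical.
use strict;
use warnings;

# True if $s is a dotted quad whose four numbers are all 0-255.
sub is_valid_ipv4 {
    my ($s) = @_;
    my @octets = $s =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        or return 0;
    my $too_big = grep { $_ > 255 } @octets;
    return $too_big ? 0 : 1;
}

# True if both patterns accept exactly the same valid 216.32.x.y addresses.
sub patterns_agree {
    for my $x (0 .. 255) {
        for my $y (0 .. 255) {
            my $ip     = "216.32.$x.$y";
            my $loose  = $ip =~ /^216\.32\.64/   ? 1 : 0;
            my $strict = $ip =~ /^216\.32\.64\./ ? 1 : 0;
            return 0 if $loose != $strict;
        }
    }
    return 1;
}

for my $s (qw(216.32.64 216.32.640 216.32.641 216.32.64a
              216.32.64abc 216.32.64.12 216.32.642.12)) {
    printf "%-14s loose=%d strict=%d valid=%d\n", $s,
        ($s =~ /^216\.32\.64/   ? 1 : 0),
        ($s =~ /^216\.32\.64\./ ? 1 : 0),
        is_valid_ipv4($s);
}
print "same valid IPs excluded: ", (patterns_agree() ? "yes" : "no"), "\n";
```

The brute-force loop over 65,536 addresses runs in well under a second and avoids having to trust the reasoning about octet ranges.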
DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.