After much testing, I understand it much better now. Having said that, I do not pretend to know it perfectly. I also make mistakes regularly.
So, although I tested all of these on my local server and on PowWeb, do not trust what I say blindly. Rather, please use it as a suggestion or a basis for your own experiments.
Please see this page for commonly requested rules, such as the missing trailing slash fix and forcing the URL with or without 'www'.
    '''Options FollowSymLinks'''
    # You need this in your .htaccess on some servers.
    # You don't need it on PowWeb because it's already set.

    '''RewriteEngine on'''
    # You must put this line '''once''' before any RewriteRule.

    '''RewriteBase /'''
    # You don't have to put this in many cases.
    # But specifying it can reduce the risk of endless looping.
    #
    # Also, certain internal redirect rules and conditions
    # will not work well if you don't use this line.
    # So, put it, and forget about adding / at the beginning of the
    # substitute string.
    # With "RewriteBase /", the substitute string of RewriteRule will have
    # a forward slash added automatically.
    #
    # But this has no effect on %{REQUEST_URI}, which starts with a
    # forward slash, unlike the URL of RewriteRule.
    #
    # RewriteRule URL: "/" is stripped automatically.
    # RewriteRule substitute: "/" is added automatically.
    # %{REQUEST_URI}: always starts with "/".
RewriteRules
    # RewriteRule has the following format.
    # Each part is separated by a space.
    '''RewriteRule''' REGEX substitution [options if any]

    # REGEX is a Perl compatible regular expression.
    # ex. RewriteRule ^(abc.*)$ xxx$1 [R]
    #
    # ^ == beginning of URL
    # $ == end of URL
    # . (period) == matches any one character
    # * == matches zero or more of the previous character.
    #
    # The URL is matched against REGEX without "http://host.com/",
    # and with the QUERY_STRING part (the part after ? ) stripped.
    # ie. The address: http://zoro.com'''/faint.cgi'''?a=2&b=xxx
    # The URL used in the rule: '''/faint.cgi'''
    #
    # So, this matches any URL starting with "abc".
    # And the entire URL is grouped by ( ) to be used in the substitution
    # (referenced by $1).
    # As an option, this specifies an external redirect 302 (Temporary).

    # RewriteRules may have additional conditions.
    RewriteCond STRING REGEX [optional flags]
    RewriteCond STRING expression [optional flags]
    RewriteRule REGEX substitution [optional flags]

    # For the details of REGEX, expressions, substitutions and options,
    # you should read the Apache module mod_rewrite documentation.

mod_rewrite documentation http://httpd.apache.org/docs/mod/mod_rewrite.html
URL rewriting Guide http://httpd.apache.org/docs/misc/rewriteguide.html
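The ^(abc.*)$ example above can be tried outside Apache. This is only a rough sketch in Python, whose `re` module is close to, but not identical with, the Perl compatible REGEX used by mod_rewrite. The function name `apply_rule` and the sample URLs are made up for illustration.

```python
import re

def apply_rule(url, pattern, substitution):
    """Return the rewritten URL, or None when the pattern does not match.

    The substitution uses Python's backslash-number backreferences
    where a RewriteRule would use $1, $2, ...
    """
    match = re.search(pattern, url)
    if match is None:
        return None
    return match.expand(substitution)

# Python spelling of: RewriteRule ^(abc.*)$ xxx$1
print(apply_rule("abcdef.html", r"^(abc.*)$", r"xxx\1"))  # xxxabcdef.html
print(apply_rule("zzzabc.html", r"^(abc.*)$", r"xxx\1"))  # None
```

The second URL contains "abc" but does not start with it, so the ^ anchor keeps the rule from matching.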
What happens when the rules modify URL (in .htaccess)
When a RewriteRule that changes the URL is matched, the modified URL will go through the next round of processing from the beginning of the rule set, again.

This is a very important point.

It's because of the way Apache handles per-directory context (.htaccess, or in a <Directory> section). It has to do per-directory auth and other processes for the newly generated path. In per-server or virtual host context (in httpd.conf), this doesn't happen.

ex.
    RewriteRule ^(.*)$ index.cgi?page=$1 [QSA]

In this example, the URL is modified and the RewriteRule will be applied again.

ex.
    RewriteRule ^(.*)$ $1?Added_QUERY_STRING [QSA]

In this example, the URL is unchanged. (For this document, URL is the part of the address without the "http://host.com/" and "?QUERY_STRING" portions.) Thus, it will not go through a 2nd round of processing (the modification of QUERY_STRING does not affect this), unless it's a URL for a directory that is transformed according to the DirectoryIndex directive, or the resulting filepath doesn't exist.

ex.
    RewriteRule ^(abc.*)$ xxx/$1 [L]
    RewriteRule ^xxx/(.*)$ yyy/$1

As the 1st rule changes the URL, the modified URL "/xxx/abcSomething" will go through the next round and be checked from the 1st rule, again. So, even though the 1st rule has the [L] (last) option to indicate that the following rules are to be skipped, '''in the 2nd round''' the URL will match the next rule, and be modified to "/yyy/abcSomething". Then, this URL will go through a 3rd round. But it will not match any rules, and the processing stops there.
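The rounds described above can be imitated with a toy model. This is a sketch of the idea only, not Apache's real algorithm: the rule tuples, the function names and the cap of 10 rounds are assumptions made for illustration, and the URLs are written without the leading slash, as a RewriteRule in .htaccess sees them.

```python
import re

# Each rule is (pattern, substitution, has_L_flag).
RULES = [
    (r"^(abc.*)$", r"xxx/\1", True),    # RewriteRule ^(abc.*)$ xxx/$1 [L]
    (r"^xxx/(.*)$", r"yyy/\1", False),  # RewriteRule ^xxx/(.*)$ yyy/$1
]

def one_round(url, rules):
    """Apply the rules once, top to bottom; [L] ends the round early."""
    for pattern, substitution, is_last in rules:
        match = re.search(pattern, url)
        if match:
            url = match.expand(substitution)
            if is_last:
                break
    return url

def rewrite(url, rules, max_rounds=10):
    """Repeat rounds until the URL stops changing; return every URL seen."""
    history = [url]
    for _ in range(max_rounds):
        new_url = one_round(url, rules)
        if new_url == url:
            return history
        history.append(new_url)
        url = new_url
    raise RuntimeError("too many internal redirects")

print(rewrite("abcSomething", RULES))
# ['abcSomething', 'xxx/abcSomething', 'yyy/abcSomething']
```

In this model [L] only ends the current round, so the second rule still gets its turn in the next round. A rule set whose URL keeps changing on every round ends in the RuntimeError, which stands in for the 500 error.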
How to avoid endless processing
If you make a mistake and the processing goes into an endless loop, Apache will stop after a preset number of rounds and issue a 500 (Internal Server Error) with a log entry saying
"mod_rewrite: maximum number of internal redirects reached. ....".

If you want to see that, try this. (on your home machine...)

ex.
    RewriteRule ^(.*)$ xyz/$1

One of the easiest methods to avoid endless looping is using %{ENV:REDIRECT_STATUS}.

ex1.
    # Add a loop stopper condition
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    RewriteRule ^(.*)\.html xyz/whatever.cgi?$1 [L]

ex2.
    # The loop stopper rule for many following rules
    RewriteCond %{ENV:REDIRECT_STATUS} ^.
    RewriteRule ^ - [L]

    RewriteRule ^(.*)/abc\.html$ abc/$1 [L]
    RewriteRule ^(.*)$ xyz/$1 [L]
    ....

Often RewriteCond with %{REQUEST_URI} has been used.
ex.
    RewriteCond %{REQUEST_URI} !^/xyz/
    RewriteRule ^(.*)$ xyz/$1

Note 1. With RewriteBase /, we don't need to put / in front of xyz on the right side of the RewriteRule line. If we put it, it does no harm in most cases. But it makes an endless loop easier to create, and it will break other rules that do not expect multiple / in front of the URL. So, it's better not to put it.

Note 2. REQUEST_URI contains "/" + URL. To check from the beginning of REQUEST_URI, we must use "/", like: ^/something

However, the example code can be written like this, saving one REGEX processing.

ex2.
    RewriteRule !^/*xyz/ xyz%{REQUEST_URI}

Or, a RewriteRule that does not change the URL, with the [L] option, will do the same.

ex.
    RewriteRule ^/*xyz/ - [L]
    RewriteRule ^(.*)$ xyz/$1

ex2.
    RewriteRule \. - [L]
    RewriteRule ^(.*)$ index.php?p=$1

Often, PHP people use many, many, way too many RewriteRules to achieve the SEO friendly URL hype. By placing a simple rule that excludes any URL with a dot from being processed, you can save lots of wasteful REGEX processing for normal files, such as .html, .jpg, .css.

Example of this loop stopping method: Generic .htaccess Method for sub/pointed domains

However, you can't use these tricks in some cases. (a.html => b.html, b.html => a.html)
ex.
    RewriteRule a.html b.html [L]
    RewriteRule b.html a.html

One alternative is using %{THE_REQUEST}.

ex.
    RewriteRule a.html b.html [L]
    RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /b.html
    RewriteRule b.html a.html

THE_REQUEST contains the first line of the HTTP request header. It is something like "GET /index.html HTTP/1.1". So, by verifying this variable, we can make sure that the URL "b.html" is not coming from the internal redirect but from the original request.

Note. To match a string with a space, just escape it with "\" as shown in the example above.

We can use %{QUERY_STRING} to check if it is the first round or a subsequent one in some cases. But this method alone can't treat some cases.

    # Stop processing if QUERY_STRING ends with a certain string.
    # And remove that string from QUERY_STRING
    # ($1 is from the RewriteRule line, and %1 is from the RewriteCond line.)
    #
    # The first rule is needed to stop a URL that ends with a slash.
    #
    RewriteCond %{QUERY_STRING} __XXX__$
    RewriteRule /$ - [L]
    RewriteCond %{QUERY_STRING} ^(.*)__XXX__$
    RewriteRule ^(.*)$ $1?%1 [L]

    # Add the key string to the QUERY_STRING
    RewriteRule ^(.*)$ $1?%{QUERY_STRING}__XXX__

    # The following rules will be checked only once,
    # as long as QUERY_STRING is unmodified or the key string is kept.

    # No modification to the QUERY_STRING
    RewriteRule ^(other.*)$ rules/$1

    # QUERY_STRING is conserved
    RewriteRule ^(more.*)$ rules.cgi?$1%{QUERY_STRING}

    # The key string is placed explicitly.
    RewriteRule ^(yetmore.*)$ rules.cgi?$1__XXX__

I tried to use [E=ENVVAR:STRING] to distinguish the subsequent round, but ENV variables seem to be reset on each round... So, the following example for preventing a 2nd round does not work.
    RewriteCond %{ENV:DONE} YES
    RewriteRule ^.*$ - [L]
    RewriteRule ^.*$ - [L,E=DONE:YES]

This trick can be used to check if a certain rule is matched in the same round, though.

    RewriteRule ^pattern$ substitute [E=DONE:YES]
    RewriteRule ^pattern1$ substitute1
    RewriteRule ^pattern2$ substitute2
    RewriteRule ^pattern3$ substitute3
    RewriteCond %{ENV:DONE} YES
    RewriteRule ^patternX$ substituteX

But you can use [S] (skip) and/or [C] (chain) in most cases instead of this %{ENV:VAR} trick.

Rant
I think mod_rewrite is BADLY designed. It doesn't have a definitive way to control looping, and we can't use variables on the right hand side of the RewriteCond. Also, the fact that ENV variables get reset on each round is stupid ...
How to give new ENV variable to cgi
Note. On a server with suExec, most env variables are cleansed by suExec. You should prefix the env var with 'HTTP_' and it will survive! We can put any information we want to pass to CGI in QUERY_STRING, though.

The [E=VAR:STRING] option can be used to set an ENV variable. But it will not go to the CGI if it's set in a URL changing rule, or in a round that has a URL changing rule.
These rules should be placed at the beginning of the ruleset so that they are set again without fail in the final round.

    # Example of setting the 'HTTP_AUTH' ENV variable to the Authorization header.
    # As this is not usually available to CGI programs,
    # it is very useful in DIY authentication with CGI.
    RewriteRule ^.*$ - [E=HTTP_AUTH:%{HTTP:Authorization}]

    # Example of setting 'HTTP_TIME' to the TIME variable
    # available in RewriteRules.
    RewriteRule ^.*$ - [E=HTTP_TIME:%{TIME}]

QUERY_STRING can be used for passing parameters, too.

    RewriteRule ^(.*)$ $1?AUTH=%{HTTP:Authorization} [QSA]

or

    RewriteRule ^(.*)$ $1?%{HTTP:Authorization}__SEPARATOR__%{QUERY_STRING}

__SEPARATOR__ can be anything you want. You don't need it if QUERY_STRING is empty.
You can pass any ENV variable, such as THE_REQUEST or TIME, this way.
This can be useful for debugging RewriteRules.
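On the CGI side, a variable passed this way is read from the environment. A minimal sketch in Python, assuming HTTP Basic authentication and assuming the value really reaches the script under the name HTTP_AUTH (the name used in the example rule above). The function name `basic_credentials` is made up for illustration.

```python
import base64
import os

def basic_credentials(environ):
    """Return (user, password) from a Basic Authorization value, or None."""
    header = environ.get("HTTP_AUTH", "")
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded).decode("utf-8")
    except ValueError:
        # Covers both malformed base64 and bytes that are not valid UTF-8.
        return None
    user, separator, password = decoded.partition(":")
    if not separator:
        return None
    return user, password

if __name__ == "__main__":
    # In a real CGI run, os.environ is filled in by the web server.
    print(basic_credentials(os.environ))
```

Checking the user and password against your own list is left out; the sketch only shows how the header value set by the rule can be taken apart.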
How to make rules more efficient
It is similar to other programming languages.
Identify the resource consuming part, and try to minimize the traffic that goes through that part.
For RewriteCond, "=" is probably the least time consuming of all, and -f, -d, -s, and others are more time consuming. -U and -F could be the most costly ones.
REGEX may be pretty heavy if the string checked is long and the pattern is complex. Note that %{REQUEST_URI} in mod_rewrite holds only the URL part, without the QUERY_STRING. Checking against %{THE_REQUEST} can be heavier because it includes both the URL part and the QUERY_STRING, which can be very, very long...
By using ^ to do forward matching, it may require less backtracking, and thus be more efficient. (On powerful servers, the difference can be invisible...)

    RewriteCond %{REQUEST_URI} !subdir/
    RewriteRule ^(.*)$ subdir/$1
    # This is not efficient... and may not work sometimes
    # because it also matches "/sub/sub-subdir/"
    # and "/abcdefgsubdir/" and so on.

    RewriteCond %{REQUEST_URI} !^/subdir/
    RewriteRule ^(.*)$ subdir/$1
    # Now, it is more efficient and there is no room for confusion.
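The difference between the two conditions can be checked with a few sample paths. A small sketch in Python, assuming its `re` module behaves like the REGEX of mod_rewrite for these simple patterns; the sample paths and the name `would_rewrite` are made up for illustration.

```python
import re

LOOSE = re.compile(r"subdir/")        # like: RewriteCond %{REQUEST_URI} !subdir/
ANCHORED = re.compile(r"^/subdir/")   # like: RewriteCond %{REQUEST_URI} !^/subdir/

def would_rewrite(pattern, request_uri):
    """The condition is negated with "!", so the rule fires when there is no match."""
    return pattern.search(request_uri) is None

for uri in ("/subdir/page.html", "/abcdefgsubdir/page.html", "/page.html"):
    print(uri, would_rewrite(LOOSE, uri), would_rewrite(ANCHORED, uri))
# /subdir/page.html False False
# /abcdefgsubdir/page.html False True
# /page.html True True
```

With the loose pattern, "/abcdefgsubdir/page.html" is taken for something already inside subdir/ and is never rewritten. The anchored pattern treats it correctly.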
Secret directory
I wrote a separate page for the secret directory: secretdir.html
This can be used with CGI Authentication and other tricks.
Anti-Leech, bandwidth saving, Referer blocking
I understand the desire to do these things. However, it's not really effective, and it often causes more headaches.
I do not recommend it unless you know rewriting well and understand its limited effectiveness and potential problems.
If you are a user of PowWeb, we have enough bandwidth allowance to cope with the usual "leeching".
If you dislike leeching, maybe you can add your URL to the picture using ImageMagick or the sitebuilder tool soon available from PowWeb!
More about Anti-Leech measures

However, kicking off certain robots is a good practice. Some robots will access well over several hundred items per minute. If your script is hit like this, the server may experience a lot of load. Although there seems to be a built-in safety cut-off mechanism at PowWeb, we can do our part in this.

    ## Keep bad robots off.
    ## Give them a blank page instead of 403. Costs less for the server.
    RewriteEngine on
    RewriteBase /
    RewriteRule ^blank\.txt - [L]
    RewriteCond %{HTTP_USER_AGENT} (MSIECrawler|Ninja|Microsoft|MSFront|WebCopier|Pockey) [NC]
    RewriteRule ^(.*)$ blank.txt [L]

Usually, the last rule is like this, "RewriteRule ^(.*)$ - [F]", and it gives a 403 Forbidden error with an error_log entry.
I don't like to see massive entries in my error_log, because detecting more serious trouble becomes harder with too many garbage entries.
So, I decided to send them a nice white blank page without any data.
This saves a bit of bandwidth, simplifies my error_log, and costs a little less for the server because there is no need to make two log entries.
And as it is a little more polite to send a blank page with a 200 OK response code than 403 Forbidden, it may even reduce the risk of attacks by frustrated youth.
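Before putting a User-Agent pattern on a live site, it can be tried against a few strings. A small sketch in Python, assuming `re.IGNORECASE` plays the role of the [NC] flag; the sample User-Agent strings and the name `is_bad_robot` are made up for illustration.

```python
import re

# Same alternatives as in the RewriteCond above; IGNORECASE stands in for [NC].
BAD_ROBOTS = re.compile(
    r"(MSIECrawler|Ninja|Microsoft|MSFront|WebCopier|Pockey)",
    re.IGNORECASE,
)

def is_bad_robot(user_agent):
    """True when the pattern is found anywhere in the User-Agent string."""
    return BAD_ROBOTS.search(user_agent) is not None

print(is_bad_robot("Mozilla/4.0 (compatible; MSIE 6.0; msiecrawler)"))     # True
print(is_bad_robot("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))  # False
```

The pattern is not anchored, so any User-Agent that contains one of the words anywhere is caught.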
Search Engine friendly URL
It seems to be another hot topic among PHP users.
I think it's better to parse the URL in PHP rather than trying to do something with mod_rewrite.

The idea is, using such a URL
http://host.net/aa/bb/cc/dd
is better than the usual php thing
http://host.net/index.php?aa=bb&cc=dd

And with mod_rewrite, do something like this.

    RewriteRule ^/*index.php - [L]
    RewriteRule ^/*([^/]+)/+([^/]+)/+([^/]+)/*(.*)$ index.php?$1=$2&$3=$4 [L]

While a simple example like this works, more complex rules could be tricky. Although the string parsing power of mod_rewrite is not that bad, it should be much easier to do your own parsing in php using $_SERVER['QUERY_STRING'] or $_SERVER['REQUEST_URI'] and other variables. The same thing applies to Perl and other languages. But php people seem to be more eager to do this... somehow. Maybe they don't want to change their script, or they don't know what to change, how to change...

Oh well, here is a very inefficient but flexible version. This one can treat any number of parameters. But I think it is a resource consuming hog.

    RewriteRule ^/*index.php - [L]
    RewriteRule ^/*([^/]+)/+([^/]+)(.*)/*$ $3?$1=$2 [L,QSA]
    RewriteRule ^([^/]+)$ index.php?$1 [L,QSA]

Trying to double the parameters treated in one round.

    RewriteRule ^/*index.php - [L]
    RewriteRule ^/*([^/]+)/+([^/]+)/+([^/]+)/+([^/]+)(.*)/*$ $5?$1=$2&$3=$4 [L,QSA]
    RewriteRule ^([^/]+)$ index.php?$1 [L,QSA]

A little better idea is, using such a URL
http://host.net/bb/dd (instead of http://host.net/aa/bb/cc/dd)
to obtain this.
http://host.net/index.php?aa=bb&cc=dd

While the URL looks better and is more efficient, this one is not flexible.

    RewriteRule ^/*index.php - [L]
    RewriteRule ^/*([^/]+)/+([^/]+)/*(.*)$ index.php?aa=$1&cc=$2&$3 [L]
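The "parse it in the script" idea from this section takes only a few lines. A minimal sketch in Python rather than php, assuming the script somehow receives the path part of the request (for example from REQUEST_URI with the query string removed). The function name `path_to_params` is made up for illustration.

```python
from urllib.parse import unquote

def path_to_params(path):
    """Turn "/aa/bb/cc/dd" into {"aa": "bb", "cc": "dd"}.

    Empty segments (from repeated or trailing slashes) are ignored, and a
    trailing key without a value gets an empty string.
    """
    segments = [unquote(segment) for segment in path.split("/") if segment]
    params = {}
    for i in range(0, len(segments), 2):
        value = segments[i + 1] if i + 1 < len(segments) else ""
        params[segments[i]] = value
    return params

print(path_to_params("/aa/bb/cc/dd"))  # {'aa': 'bb', 'cc': 'dd'}
print(path_to_params("/aa//bb/cc/"))   # {'aa': 'bb', 'cc': ''}
```

This handles any number of parameters in one pass, without the repeated rounds of the flexible RewriteRule version.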
Serve dynamic page statically
Generating a page for each and every request is a pure waste of server resources unless there is a good reason.
Most of the time, exactly the same page can be served many times. So, it makes sense to implement a "generate once, serve many times" system.
I wrote an example of such a system (a very simple one) for someone.
Please take a look if you are interested. [cache.html]
ampescape ?
I saw a question about RewriteMap recently. As we can't use RewriteMap in .htaccess, I wrote a RewriteRule that escapes & to %26, as ampescape would do.

    RewriteRule ^/*whatever\.php - [L]
    RewriteRule ^([^&]*)&(.*)$ $1\%26$2 [NE,N]
    RewriteRule ^([^&]*)$ whatever.php?title=$1 [L]

Some wiki and php scripts have difficulty with a URL containing & because they split QUERY_STRING on it.
But this solution is not so efficient.
You had better modify the script, together with this type of rule.

    RewriteRule ^(.*)$ whatever.php?$1 [L]

And use $_SERVER['QUERY_STRING'] in place of $_GET['title'], $_POST['title'], or $_REQUEST['title'].

Note. $_SERVER['QUERY_STRING'] may contain "%20" in it. If so, you may have to replace it with ' ' (space). (php may URI_decode %20 automatically... but I don't know php well.)
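Why a raw & breaks the title, and what %26 changes, can be seen with any query string parser. A small sketch using Python's standard `urllib.parse` instead of php; the title "Tom&Jerry" and the name `naive_title` are made up for illustration.

```python
from urllib.parse import parse_qs, quote, unquote

def naive_title(query_string):
    """What a script sees when it lets the query string be split on "&"."""
    return parse_qs(query_string).get("title", [""])[0]

print(naive_title("title=Tom&Jerry"))    # Tom
print(naive_title("title=Tom%26Jerry"))  # Tom&Jerry

# Escaping by hand, and decoding a raw query string by hand:
print(quote("Tom&Jerry", safe=""))       # Tom%26Jerry
print(unquote("Tom%20%26%20Jerry"))      # Tom & Jerry
```

The last line is the "use the raw QUERY_STRING" approach: the script takes the whole string as the title and decodes %20 and %26 itself.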
Banner rotation
Simple (and not so simple) banner rotation can be done with RewriteRule.
See BannerRotation
Troubleshooting
If the rewrite has an external redirect, it is essential to have a tool that shows response headers.
See RedirectProblems
What else?
You had better consult the mod_rewrite documentation and the URL rewriting Guide, as well as other web pages and forums.
mod_rewrite documentation http://httpd.apache.org/docs/mod/mod_rewrite.html
URL rewriting Guide http://httpd.apache.org/docs/misc/rewriteguide.html
modrewrite.com Forum. Lots of questions. Some answers. Some of the answers are not really correct... http://forum.modrewrite.com/
Questionable color of this page is dictated by blueberry cream cake, my favorite dessert.
This page is http://Check-these.info/mod_rewrite-basic.html
My main site is hosted by PowWeb, one of the best low budget hosts!
Last modified: 2006-04-06_06:18:42 Powered by Wikiciter CMS