Mod Rewrite

From GoBlueMich Wiki

What exactly is mod_rewrite?

The Apache module mod_rewrite allows you to completely modify a request made to Apache. Don't want specific file types being requested? Rewrite/redirect those to nothing. Changed your permalink style in your CMS and now a bunch of links on some social media platform no longer work? Let's rewrite those so they do work!

Basic Regex

You'll need a little bit of familiarity with regex if you're going to be writing rewrite rules. However, since this page is about mod_rewrite, I won't be going too deep into the regex.

^ - beginning of line
$ - end of line
\ - escapes the next character
( ) - grouping
. (period) - any character
? - zero or one of the preceding character
* - zero or more of the preceding character
+ - one or more of the preceding character
[ ] - character class
{ } - range/amount of preceding character
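
A quick way to get a feel for these is to try them outside Apache. The sketch below uses Python's re module as a stand-in for Apache's regex engine (that's an assumption: the two agree on everything in the list above, but they are not identical in general). The sample strings are made up for illustration.

```python
import re

# Each entry: (pattern, a string that should match, a string that should not).
EXAMPLES = [
    (r"^abc",       "abcdef", "xabc"),     # ^ anchors at the beginning
    (r"abc$",       "xyzabc", "abcx"),     # $ anchors at the end
    (r"a\.b",       "a.b",    "axb"),      # \ makes the dot literal
    (r"^a.c$",      "axc",    "ac"),       # . matches any one character
    (r"^colou?r$",  "color",  "colouur"),  # ? zero or one "u"
    (r"^ab*c$",     "ac",     "adc"),      # * zero or more "b"
    (r"^ab+c$",     "abbc",   "ac"),       # + one or more "b"
    (r"^[0-9]{4}$", "2016",   "201"),      # character class plus a count
    (r"^(ab)+$",    "abab",   "aba"),      # grouping: + applies to "ab"
]

def all_behave(examples):
    """True if every pattern matches its good sample and rejects its bad one."""
    return all(
        re.search(pattern, good) is not None and re.search(pattern, bad) is None
        for pattern, good, bad in examples
    )

print(all_behave(EXAMPLES))
```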

Rewrite Rule

Description
Defines rules for the rewriting engine
Syntax
RewriteRule Pattern Substitution [flags]

Example:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^oldpage\.html$ /newpage.html [R=301,L]
</IfModule>

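The Substitution may be a file-system path, a URL-path, an absolute URL, or a single dash (-), which means no substitution is performed.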

The previous example redirects http://domain.tld/oldpage.html to http://domain.tld/newpage.html. The Pattern that it is looking for is beginning of string followed by the word oldpage with .html as the end of the line. The beginning and end of line characters are good to have because it lets you specify exactly what you're looking for. In this case, only if oldpage.html is requested will it redirect.

If you left off the beginning of line character, multiple things could be matched, like superoldpage.html or 123-oldpage.html. This is because the pattern isn't looking for a beginning of line, it's simply saying "I want to match oldpage.html where html is at the end of the line." In this case oldpage.html is part of the strings superoldpage.html and 123-oldpage.html, and if those were requested, they would redirect.
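
To see the difference the ^ makes, here is a small sketch that runs both versions of the pattern against a handful of request paths. It uses Python's re module in place of Apache (an assumption for illustration), and oldpage.html.bak is an extra made-up path to show what the $ is doing.

```python
import re

# Request paths as a rule in a document-root .htaccess would see them
# (no leading slash). The last one is a hypothetical extra.
requests = ["oldpage.html", "superoldpage.html", "123-oldpage.html", "oldpage.html.bak"]

anchored = re.compile(r"^oldpage\.html$")    # the pattern from the example rule
unanchored = re.compile(r"oldpage\.html$")   # same pattern without the ^

anchored_hits = [r for r in requests if anchored.search(r)]
unanchored_hits = [r for r in requests if unanchored.search(r)]

print(anchored_hits)
print(unanchored_hits)
```

Only the anchored pattern limits the match to the single page we meant.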

RewriteCond

Description
Defines a condition under which rewriting will take place
Syntax
RewriteCond TestString CondPattern [flags]

The most common TestString you may see in a RewriteCond is %{HTTP_HOST}. It is used to compare the requested domain or subdomain against your CondPattern. A good example would be if you were looking to redirect domain.tld to www.domain.tld:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.tld$
RewriteRule ^(.*)$ http://www.domain.tld/$1 [R=301,L]
</IfModule>

This is checking if the requested domain is domain.tld and, if so, the following rewrite rule applies. In the rewrite rule, you can see that the pattern it's looking to match is anything. Any requested URI will be redirected to www.domain.tld. But wait, what's that $1 mean? That is called a back reference, which is a variable that equals whatever was matched inside the parentheses in the pattern. If you have more than one back reference (set of parentheses), you can use $1-$9 to pull the information. More examples later.
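
If it helps, the condition-then-rule flow can be sketched in a few lines of Python. This is only a model of the behaviour, not how Apache implements it: redirect_to_www is a hypothetical helper, Python writes the back reference as \1 where mod_rewrite writes $1, and the target is assumed to be a full URL starting with http://.

```python
import re

def redirect_to_www(host, path):
    """Model of: RewriteCond %{HTTP_HOST} ^domain\\.tld$ followed by a
    RewriteRule that captures the whole path and appends it to the target.
    Returns the redirect target, or None when the condition does not match."""
    if re.search(r"^domain\.tld$", host) is None:   # the RewriteCond
        return None
    match = re.search(r"^(.*)$", path)              # the RewriteRule pattern
    # \1 here plays the role of $1 in the Substitution.
    return match.expand(r"http://www.domain.tld/\1")

print(redirect_to_www("domain.tld", "about/team.html"))
print(redirect_to_www("www.domain.tld", "about/team.html"))
```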

Regex is only used in the pattern fields (Pattern and CondPattern). The Substitution field of a RewriteRule is a plain string: it can contain back references such as $1 and server variables such as %{HTTP_HOST}, but it doesn't have to.

RewriteBase

Description
Sets the base URL for per-directory rewrites
Syntax
RewriteBase URL-path

Example:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /yolo/
RewriteRule ^index\.php$ welcome.php [R=301,L]
</IfModule>

For the example above, assuming the .htaccess sits in the document root, any request for domain.tld/index.php is going to be redirected to domain.tld/yolo/welcome.php. This could be useful if you moved specific pages into a subdirectory and you wanted to make sure that any old links out on the web are still able to get to the right place.

A typical RewriteBase is the document root, written as RewriteBase /, so that relative file references resolve against the document root for the domain.
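
As a rough mental model (a deliberate simplification, not Apache's actual algorithm), RewriteBase only matters when the Substitution is a relative path. The hypothetical apply_base helper below sketches that idea in Python.

```python
def apply_base(rewrite_base, substitution):
    """Simplified model: prefix a relative Substitution with the RewriteBase.
    Substitutions that start with / or contain a scheme are left untouched."""
    if substitution.startswith("/") or "://" in substitution:
        return substitution
    return rewrite_base.rstrip("/") + "/" + substitution

print(apply_base("/yolo/", "welcome.php"))
print(apply_base("/", "welcome.php"))
print(apply_base("/yolo/", "/newpage.html"))
```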

Examples

Typically the rewrite rules you put in .htaccess are not going to need domain checking, since you typically have a one-to-one relationship between domains and document roots. However, sometimes you will have parked domains and since they share the docroot of the domain you parked it on, maybe you'll want domain checking. Either way, let's get into it.

Redirect single page

To redirect a specific page, we just need to make the Pattern match that file name. Let's say we want to redirect http://www.dom.tld/prices.html to http://www.dom.tld/products.php. We'll be looking for:

^ - start of string
prices - file name
\.html - extension
$ - end of line

Put it together and what do we got?

^prices\.html$

Since the Substitution doesn't require regex, I don't use it. Again, the format for a RewriteRule is:

RewriteRule Pattern Substitution [flags]

Let's fill in the "missing" pieces:

RewriteRule ^prices\.html$ http://www.dom.tld/products.php [R=301,L]

Any time http://www.dom.tld/prices.html is requested, it will get redirected to http://www.dom.tld/products.php.

Rewrite a URL

If permalinks have been changed from a /year/month/postname format to the more common /postname, all those links you posted on Facetigram and Twitterbook and Pinspace won't change, and your CMS may not have anything built in to redirect a request in the old format to the one now in use. There are a few different ways you could use back references to strip certain parts out of the URI, but I'll cover the one I consider the most specific and the easiest.

We want to match the format /2016/01/crackers, but instead of crackers, we want any of the post names.

We need to match:

^ - beginning of the string
[0-9]{4} - 4 digits
/ - slash
[0-9]{2} - 2 digits
/ - slash
.+ - any characters
$ - end of the string

If we put that together, we get:

^[0-9]{4}/[0-9]{2}/.+$

We still need to somehow reference the post name in the Substitution and that's where back references come into play. Let's put parentheses around the .+ so we can call whatever is matched:

^[0-9]{4}/[0-9]{2}/(.+)$

Now, let's put that into a rewrite rule:

<IfModule mod_rewrite.c>
    RewriteEngine on
    RewriteRule ^[0-9]{4}/[0-9]{2}/(.+)$ http://www.domain.tld/$1 [R=301,L]
</IfModule>

Because there is only one set of parentheses and that's what we want to append to the redirect, we put $1 in the URI spot in the Substitution.
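
Before putting a pattern like this on a live site, it can be worth checking it against a few sample paths. Here is a minimal sketch using Python's re module as a stand-in for Apache (an assumption); new_location is a hypothetical helper and the sample paths are made up.

```python
import re

# Same pattern as the rule: 4 digits, slash, 2 digits, slash, captured post name.
PERMALINK = re.compile(r"^[0-9]{4}/[0-9]{2}/(.+)$")

def new_location(path):
    """Return the redirect target for an old-style permalink, else None."""
    match = PERMALINK.search(path)
    if match is None:
        return None
    return "http://www.domain.tld/" + match.group(1)   # group(1) is $1

for path in ["2016/01/crackers", "crackers", "16/01/crackers", "2016/01/"]:
    print(path, "->", new_location(path))
```

Only the first path has the year/month/postname shape, so it is the only one that produces a redirect.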

You may have noticed I put the rewrite rule in an IfModule section and turned RewriteEngine on. The IfModule wrapper is optional, but I use it as best practice: it keeps the site from returning a 500 error if the server doesn't actually have mod_rewrite, although on a cPanel server it should be there. RewriteEngine is off by default, so it has to be turned on somewhere that applies to this directory; putting it in the .htaccess itself makes sure it is.

Force WWW

On all pages

There are a few different ways to accomplish this, but overall, you need to check the %{HTTP_HOST} variable to see what is being requested. This variable contains the domain only, such as www.domain.tld or domain.tld or even sub2.sub.domain.tld. What we need to look for is if the beginning of the line doesn't start with www, so we're basically just looking for the domain:

^ - beginning of line
domain\.tld - domain
$ - end of line

Putting that together gives us:

^domain\.tld$

Our RewriteCond will look like:

RewriteCond %{HTTP_HOST} ^domain\.tld$ [NC]

The [NC] stands for NoCase, or case-insensitive. Now, since we want to force WWW on any request, we need to match any string for the URI and call that with a back reference in the Substitution, like so:

RewriteCond %{HTTP_HOST} ^domain\.tld$ [NC]
RewriteRule ^(.*)$ http://www.domain.tld/$1 [R=301,L]

The R=301 means Redirect, with 301 (Moved Permanently) as the HTTP status code. The L stands for Last, or last rule that should be applied to the request. There are many other flags that can be used, and Apache's documentation is fantastic for that.

On a specific domain

As an alternative to the above example, you can name the www domain you want to end up on and redirect any request whose host is not exactly that. This can be helpful if you have addon domain or subdomain document roots within the main document root of your site.

 RewriteEngine on
 RewriteCond %{HTTP_HOST} !^www\.your_domain\.com$
 RewriteRule ^(.*)$ http://www.your_domain.com/$1 [R=301,L]

Strip WWW from all pages

In this scenario, we want to remove WWW from all requests. The rule is almost the same as forcing WWW but reverse (obviously, I know).

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]

A version that doesn't hard-code the domain or the protocol:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.(.*) [NC]
RewriteRule ^(.*)$ %{REQUEST_SCHEME}://%1/$1 [R=301,NC,L]
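
The generic version uses two kinds of back reference: %1 comes from the parentheses in the RewriteCond and $1 from the parentheses in the RewriteRule. A small Python sketch of that flow follows; strip_www is a hypothetical helper, and the scheme is passed in by hand where Apache would supply %{REQUEST_SCHEME}.

```python
import re

def strip_www(scheme, host, path):
    """Model of the generic strip-www rule.
    Returns the redirect target, or None if the host has no www. prefix."""
    cond = re.search(r"^www\.(.*)", host, re.IGNORECASE)   # RewriteCond ... [NC]
    if cond is None:
        return None
    rule = re.search(r"^(.*)$", path)                      # RewriteRule pattern
    # cond.group(1) is %1 (from the condition); rule.group(1) is $1 (from the rule).
    return f"{scheme}://{cond.group(1)}/{rule.group(1)}"

print(strip_www("https", "www.example.org", "blog/post"))
print(strip_www("http", "example.org", "blog/post"))
```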

Force SSL

On all pages

This rule checks that (2) the request was not made over HTTPS directly and (3) the request was not made over HTTPS via a proxy; if both are true, (4) the request is redirected to HTTPS. The numbers refer to the lines of the rule below. We also use %{HTTP_HOST} in line (4) to avoid typos in the domain name.

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

On a specific domain

Handy if you have a WP multisite with only some domains using https:

 RewriteEngine on
 RewriteCond %{HTTP_HOST} ^(www\.)?domain\.com$
 RewriteCond %{HTTPS} off
 RewriteCond %{HTTP:X-Forwarded-Proto} !https
 RewriteRule ^ https://www.domain.com%{REQUEST_URI} [NC,L,R=301]

Force WWW and SSL on all pages

This is actually two rules that handle three possibilities: (1) both "www" and "https" are missing, (2) only "www" is missing, and (3) only "https" is missing. The first rule will catch #1 and #2 redirecting to www with https, and the second rule catches case #3 adding the missing https.

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule (.*) https://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
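
To convince yourself that the two rules really cover all three cases, you can walk each case through a small model. The Python sketch below is only that, a model: next_hop is a hypothetical helper, the https flag stands in for %{HTTPS}, and forwarded_proto stands in for the X-Forwarded-Proto header.

```python
def next_hop(host, https, forwarded_proto, uri):
    """Return the redirect target for one request, or None if no rule fires."""
    # Rule 1: host does not start with www. -> add www and force https.
    if not host.startswith("www."):
        return f"https://www.{host}{uri}"
    # Rule 2: both conditions must match (AND is the default between RewriteConds).
    if not https and "https" not in forwarded_proto:
        return f"https://{host}{uri}"
    return None

print(next_hop("domain.tld", False, "", "/a"))       # case 1: both missing
print(next_hop("domain.tld", True, "", "/a"))        # case 2: only www missing
print(next_hop("www.domain.tld", False, "", "/a"))   # case 3: only https missing
print(next_hop("www.domain.tld", True, "", "/a"))    # nothing to fix
```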

Remove SSL from pages

The condition and what we're rewriting to are just the reverse of forcing SSL. Here we set [OR] on the condition in line 2, since HTTPS is in use either when the request arrives directly over HTTPS or when a proxy sets the X-Forwarded-Proto HTTP header. Requiring both to match (aka "AND", which is the default) would be a mistake: it would not catch HTTPS requests that a proxy forwards over HTTP.

RewriteEngine On
RewriteCond %{HTTPS} on [OR]
RewriteCond %{HTTP:X-Forwarded-Proto} https
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]
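
The [OR] is easy to get wrong, so here is the logic spelled out as a tiny Python sketch. It is a model under assumptions: sees_https is a hypothetical helper, https_var stands in for %{HTTPS} ("on" or "off"), and forwarded_proto for the X-Forwarded-Proto header.

```python
def sees_https(https_var, forwarded_proto, use_or=True):
    """True if the conditions consider the request to be HTTPS.
    use_or=True models [OR]; use_or=False models the default AND."""
    direct = https_var == "on"               # RewriteCond %{HTTPS} on
    proxied = "https" in forwarded_proto     # RewriteCond %{HTTP:X-Forwarded-Proto} https
    return (direct or proxied) if use_or else (direct and proxied)

# HTTPS terminated at a proxy: Apache itself sees plain HTTP.
print(sees_https("off", "https"))                 # with [OR]
print(sees_https("off", "https", use_or=False))   # with the default AND
```

With the default AND, the proxied request slips through, which is the mistake described above.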

Redirect Non-CMS pages to CMS pages

Just like with anything else, there are a few ways to accomplish this, so I'll give a few examples.

Example 1

This first example will rewrite any html or htm pages to their respective posts in the CMS (using WordPress format as examples):

RewriteRule ^(.+)\.html?$ http://%{HTTP_HOST}/$1 [R=301,L]

Example 2

Going off the earlier permalink format rewrite, let's say you need to strip out part of a file name AND the extension. Sounds rough, eh? Surprisingly not. We just need to surround the parts of the request we want to keep with parentheses. So, we have a file called order-form-ab.html and we want to redirect it to just http://domain.tld/order-form. We need to match:

^(order-form) - order-form at the beginning of line, captured
-ab - the literal text -ab
\.html$ - .html extension at end of line

If we put that regex together and into a rewrite rule, we get:

RewriteRule ^(order-form)-ab\.html$ http://domain.tld/$1 [R=301,L]

In short, this will strip off the -ab.html of the request and rewrite to domain.tld/order-form. Instead of using the back reference, you could add the actual text, but again, this can help prevent typos and possibly save some time. This will only work correctly if the post names in the CMS are named that first part, otherwise your CMS will probably give a 404.

Example 3

If you have a lot of files in the previously mentioned format, file-ab.html, you can simply replace the mentioned order-form text with (.+) to match any characters:

RewriteRule ^(.+)-ab\.html$ http://domain.tld/$1 [R=301,L]

Helpful links

Mod_rewrite cheatsheet

Redirect generator good for long lists of redirects