I host my personal site on a shared Dreamhost instance. I like it because it's relatively inexpensive, I have low traffic, and I'm not doing anything particularly complicated. Having full root access by hosting on a Virtual Private Server (VPS) or Dedicated Server doesn't make sense for what I'm doing. However, I still want to have some server level control over what's available from my website and how. That's where .htaccess comes in.
Update: as of early 2019 my personal site is using an nginx setup on Digital Ocean, rather than an apache setup on Dreamhost. So far I'm liking the Digital Ocean experience a lot better, for whatever that's worth. :)
What is .htaccess?
Many hosting options use Apache HTTP Servers on the back end to serve up your website. At the most basic level, your website is a collection of files sitting on a computer somewhere. You want people on other computers to be able to access those files over the internet. The web server is the software that receives requests, finds the correct files, and returns the content to the client. .htaccess is a file used by Apache to give you more control over how those requests are processed. An .htaccess file sets the configuration for all requests for files in the directory where it lives, along with any subdirectories beneath it.
Why should I care?
Your website is going to be able to function without an .htaccess file to fine tune your server configuration. But taking advantage of its features can enable cosmetic improvements like forcibly adding or removing www from your url or eliminating the need for .html file extensions when linking to pages on your site.
It can also provide security benefits. If you use git to control your site deploys, you might accidentally be exposing credentials or sensitive information to the open internet. You can also force people to use https when connecting to your site, which, in addition to being the right move security-wise, can help boost your SEO ranking.
Limitations and drawbacks of .htaccess (and shared hosting)
Some people are quick to point out that using .htaccess for configuration rather than using the main Apache configuration incurs a performance hit. This performance hit gets worse when your server is under high load. In general, you'll probably be leaning on .htaccess most heavily if you're in a shared hosting environment. If you're just looking to host a small portfolio site or a toy app, diving into cloud computing, getting root access, and becoming your own Apache sysadmin is probably overkill.
If you're attracting a lot of traffic and/or having difficulties with high load on your server, you probably should move to a VPS where you have root access and use the main server configuration. If this is a problem for you, you're probably also growing beyond a one-person operation and should enlist someone to help with site reliability. But if you're just running a small, mostly static site, it's not worth worrying about performance degradation as a result of using .htaccess.
.htaccess is more limited in scope than your main Apache configuration in httpd.conf. Some directives, such as <Directory> blocks, are only allowed in the main configuration and can't be used in .htaccess. If you need to do a lot of that kind of server level access control, a more robust hosting option makes sense. But unless you're working in a complex environment and are looking to do a lot of site reliability fine tuning, you're going to be fine without root access.
Common use case cheat sheet
Here are some things that you might want to use .htaccess for. Feel free to copy/paste directly from here, as I have copy/pasted from those who came before me.
Many of these snippets require you to turn the RewriteEngine on. You only need to do this once per file, and can put all of your rewrite rules under it. For ease of copy/pasting, I'm including RewriteEngine On in all the relevant snippets. Make sure you don't copy it more than once.
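As a rough sketch of what that looks like in practice, here is a single .htaccess file that combines two of the rewrite snippets from this cheat sheet (forcing https and dropping .html) under one RewriteEngine On. The comments are added for illustration. Note that each RewriteCond only applies to the RewriteRule that immediately follows it.

```apache
# Turn the rewrite engine on once, at the top of the file
RewriteEngine On

# Send insecure requests to https
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Serve /info from info.html when that file exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
```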
Protect yourself from bandwidth theft
Bandwidth theft, or hotlinking, is the practice of linking to an image on someone else's website and displaying it on your own. This uses the bandwidth of the server you're linking to instead of your own. It's pretty rude!
However, you can protect your site from having any of your images hotlinked.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?mysite\.com [NC]
RewriteRule .*\.(jp(e)?g|gif|bmp|png)$ https://imgur.com/ZtXiCBw.gif [L,R]
(Make sure to replace mysite\.com with your actual url)
The above code checks where the request is coming from, or the referer. If the referer is neither empty nor your own site, the visitor gets whatever image you redirect to instead of the one they hotlinked.
The redirect target in the snippet, https://imgur.com/ZtXiCBw.gif, is an image hosted on imgur.
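If you'd rather not point hotlinkers at a replacement image at all, one alternative is to refuse the request outright. This is a minimal sketch using the [F] flag, which returns a 403 Forbidden. As before, mysite\.com is a placeholder for your own domain.

```apache
RewriteEngine On
# Let requests with an empty referer through (direct visits, some privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Let requests from your own site through
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?mysite\.com [NC]
# Everything else asking for an image gets a 403 Forbidden
RewriteRule \.(jpe?g|gif|bmp|png)$ - [F,NC]
```

The - in place of a target URL tells Apache not to rewrite the path, so the only effect of the rule is the 403.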
SEO improvements
Redirect old pages to new location
One of the most basic and useful things you can do with .htaccess is redirect a page that used to exist at one location to a new URL. This is extremely important for SEO. If you wrote a popular blog post that you're moving elsewhere, you don't want to lose the link equity - or the influence a link has on a search engine - just because your content is changing locations.
In general, you'll probably want to use either a 301 or a 302 redirect. A 301 is a permanent redirect. A 302 redirect is temporary and will not preserve your link equity. However, when mucking around with redirects, you should always use a 302 first and test your changes. If everything looks good, then change it to a 301. You should do this so you don't accidentally forward the link equity from the old location to the wrong place.
When formatting a Redirect statement, the order of arguments is as follows:
- status code (301, 302, 404, etc.)
- old file location
- new file location
Redirect 302 /2016/08/02/rad-old-blog-post/ https://newdomain.party/rad-old-blog-post/
Redirect 301 /menu/dessert.html https://new-restaurant-website.supply/menu/dessert/
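Redirect matches on a path prefix, so it can also move a whole section of a site at once: anything after the matched prefix is appended to the new location. For patterns, Apache also provides RedirectMatch, which takes a regular expression. Here is a minimal sketch using a made-up /blog path and the placeholder domain from above. Both rules use 302 so they can be tested before being made permanent.

```apache
# /blog/anything -> https://newdomain.party/blog/anything
Redirect 302 /blog https://newdomain.party/blog

# /2016/08/02/some-post/ -> https://newdomain.party/some-post/
RedirectMatch 302 ^/\d{4}/\d{2}/\d{2}/(.+)$ https://newdomain.party/$1
```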
Enforce https
Some hosting options allow you to force https when people connect to your site. Others merely allow https without strictly enforcing its use. Using https is very important for both security and SEO, so if your hosting provider doesn't enforce https connections, you should.
You can rewrite all insecure requests to https requests with the following snippet.
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
Cosmetic improvements
Don't require .html
Nothing screams tacky and dated like having .html hanging off the end of your url. I mean, I'm mostly kidding, but I really don't like having that file extension dangling about. Luckily, you can fix this!
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
The first condition checks that the request you're making is not for a valid directory in your project. We're checking for directories here, instead of files, because the request will be something like mysite.com/info, which looks like a directory name. If it can't find a valid directory, the second condition checks to see if there is a valid file called info.html. If there is, the rule rewrites the request to serve content from mysite.com/info.html.
You can use this same idea to remove other file extensions like .htm or .php.
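For instance, here is a sketch of the same rule adapted for .php files, where only the extension changes. The [L] flag is an addition to the original snippet. It stops any rewrite rules that follow from running once this one matches.

```apache
RewriteEngine On
# Skip requests that map to a real directory
RewriteCond %{REQUEST_FILENAME} !-d
# Only rewrite when a matching .php file exists
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php [L]
```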
Enforce (or forcibly remove) www
Force www:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301,NC]
Remove www:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The [NC] flag here stands for No Case, so you don't have to worry about case sensitivity.
The [L] flag stands for "Last", meaning that if this rule matches, Apache applies it and stops processing the rewrite rules that follow it.
The [R] flag stands for "Redirect". You can pass in an option to R to set the HTTP status code of the redirect. Example [R=302] or [R=404].
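Putting those flags together, here is a sketch that forces both https and www in a single redirect. It assumes example.com is your domain. It also uses the [OR] flag, which isn't covered above, to combine the two conditions. Without it, every condition has to match. The redirect is a 302 so you can test it before switching to a 301.

```apache
RewriteEngine On
# Match if the request is insecure...
RewriteCond %{HTTPS} off [OR]
# ...or if the host does not start with www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Redirect to the https://www address and stop processing rules
RewriteRule ^ https://www.example.com%{REQUEST_URI} [L,R=302]
```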
Security improvements
Hide .git directory
Version control is fantastic, even for static sites. It can help you revert to previous versions easily, and lets you look back over the evolution of your code. However, controlling your site updates via git push/pull can open up a security vulnerability. Visitors can publicly access your .git directory via their browser, and go through your entire repo, including any secrets or keys you might be storing there. If you're hosting something more complex than a static personal site, it also allows anyone to view your source code and copy your work, even if you meant to keep it private.
Luckily, you can redirect any attempts to request your .git directory.
RedirectMatch 404 /\.git
Simple! You can return any status you like here. I personally like using a 404 because it makes it seem like no such directory even exists on the server, and might deter further attempts to access similar files. However, a 403 also makes sense here. It's up to you.
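If you go the 403 route, the rule looks almost identical. The version below is a sketch that also tightens the pattern so it only matches a path segment that is exactly .git, rather than anything beginning with /.git (which would also catch files such as .gitignore). Whether you want those files covered too is up to you.

```apache
# Return 403 Forbidden for the .git directory and everything inside it
RedirectMatch 403 (^|/)\.git(/|$)
```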
Restrict access to sensitive files
You probably don't want random visitors to your site to have access to your dotfiles, or other sensitive information. You can disallow access to specific files for everyone, or you can allow/block certain IP addresses in .htaccess.
Note: You cannot use <Directory> blocks in .htaccess to restrict or allow access. That kind of rule needs to go in httpd.conf and requires root access. An .htaccess file only applies to the directory it lives in and the directories beneath it.
# Don't allow anyone to see my .htaccess file
<Files .htaccess>
Order allow,deny
Deny from all
</Files>
The Order keyword is a little tricky here, so let's go through it. Order accepts one of two arguments: either allow,deny or deny,allow. Make sure that there is no space between the comma and the second word in the argument. If you put a space there, it will cause all requests to your site to return a 500 error. This argument tells Apache in which order it should process allow and deny statements.
allow,deny means that it will process any Allow statements first, and Deny statements last. The most recent statement will override any statements that come before it. So, in the example above, if you wanted to allow your IP address, and only your IP address, to view .htaccess via curl or the browser, you might include a statement like Allow from 198.51.100.254. However, because your configuration instructs the server to process all Allow statements first, and then override them, you would still be denied access because Deny from all was the most recent order.
Conversely, deny,allow processes all Deny statements first, and then overrides with Allow. If you're going to do anything more complicated than denying or allowing access to everyone, make sure you pay attention to your Order argument, and make sure to test it.
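To make that concrete, here is a sketch of the "only my IP address" case described above. It uses deny,allow so that the Allow line is processed last and wins. The address 198.51.100.254 is a placeholder.

```apache
# Block everyone from viewing .htaccess except one address
<Files .htaccess>
Order deny,allow
Deny from all
Allow from 198.51.100.254
</Files>
```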
You can also use this syntax for limiting access to specific HTTP methods. For example, if you wanted, you could restrict POST and PUT operations to a single address on your Local Area Network (LAN).
<Limit POST PUT>
Order deny,allow
Deny from all
Allow from 198.51.100.254
</Limit>
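One caveat: Order, Allow, and Deny are the Apache 2.2 syntax. Apache 2.4 still accepts them through the mod_access_compat module, but its documentation recommends the Require directive instead. Assuming your host runs 2.4, a rough equivalent of the .htaccess and POST/PUT blocks above would be:

```apache
# Deny everyone access to .htaccess
<Files .htaccess>
Require all denied
</Files>

# Only allow POST and PUT from one address (placeholder IP)
<Limit POST PUT>
Require ip 198.51.100.254
</Limit>
```

Check which Apache version your host runs before picking a syntax, and test either way.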
Debugging and other general tips
Make sure you know where your logfiles are. Even if you don't have root access, you should be able to see access and error logs for your site. The error logs will be very useful in debugging any unexpected results or errors you encounter when trying to set up your configuration.
bad flag delimiter
If your error message includes the phrase bad flag delimiter, you probably included an extra space in your RewriteRule flags. Flags are comma delimited, not comma + space delimited, so make sure you format your flags like this: [L,R=301,NC], not like this: [L, R=301, NC]
Some operation not allowed here
If your error message includes the phrase not allowed here, you're probably trying to do something that needs to be done in the httpd.conf main configuration file. You'll need to either figure out some other way to accomplish your goal, or to move to a solution where you have root access and can modify httpd.conf.
Additional Resources
- The htaccess snippet awesome list on github
- The Apache HTTP Server Tutorial: .htaccess files
- Apache Configuration Files Documentation
What have you used .htaccess for on your sites? Is there something you want to do with .htaccess but haven't been able to figure out? Let me know in the comments!
Top comments (10)
And you probably should move away from the Apache web server which does not scale well - and then you won't have a .htaccess file anyway.
Definitely, if you're working on a project where web server performance is very important, rather than just a dinky link dump like my personal site, Apache is probably something you should move on from. But I'm assuming that if you're working on that kind of scaling problem, you're not looking to fix small problems here and there with an .htaccess file.
What would you recommend moving away from Apache to?
PS. Really great article. Thanks for putting this together. I found it helpful and I am sure so many more will too.
NGINX (pronounced "engine x") has a more robust feature set for dynamic content and is more performant at scale.
There are definitely pros and cons to each solution, and the amount you "feel" the difference between them will largely depend on how complex your site is and what your scaling needs are.
Here's a good side by side comparison from the beginning of this year, if you're trying to decide which solution is right for you.
Okay, I definitely learned something today, because I had no clue how NGINX was pronounced, haha
I am trying to convert dirty URLs (pages.php?id=2) to a clean URL (pages/2) and I have been able to do that but I am getting "Page not found" errors. This is what I am using now (RewriteRule ^([a-zA-Z0-9]+)/$ pages.php?page=$1). Does anyone have a better solution? I have been all over the web looking for a fix but haven't found anything that works.
It's been a bit since I've looked at this, but I can try to see if I can find your problem later today. Let me know if you arrive at a solution before I do!
I really like how you provide a view of the surroundings of any command you use - what it is about, why you would use it (or not), the specifics of parameters....