Quick and dirty .htaccess for small personal sites

jess unrein ・9 min read

I host my personal site on a shared Dreamhost instance. I like it because it's relatively inexpensive, I have low traffic, and I'm not doing anything particularly complicated. Having full root access by hosting on a Virtual Private Server (VPS) or Dedicated Server doesn't make sense for what I'm doing. However, I still want to have some server-level control over what's available from my website and how. That's where .htaccess comes in.

Update: as of early 2019 my personal site is using an nginx setup on Digital Ocean, rather than an apache setup on Dreamhost. So far I'm liking the Digital Ocean experience a lot better, for whatever that's worth. :)

What is .htaccess?

Many hosting options use Apache HTTP Servers on the back end to serve up your website. At the most basic level, your website is a collection of files sitting on a computer somewhere. You want people on other computers to be able to access those files over the internet. The web server is the software that receives requests, finds the correct files, and returns the content to the client. .htaccess is a file used by Apache to give you more control over how those requests are processed. An .htaccess file sets the configuration for all requests for files in the directory where it lives, and in that directory's subdirectories.

Why should I care?

Your website is going to be able to function without an .htaccess file to fine tune your server configuration. But taking advantage of its features can enable cosmetic improvements like forcibly adding or removing www from your url or eliminating the need for .html file extensions when linking to pages on your site.

It can also provide security benefits. If you use git to control your site deploys, you might accidentally be exposing credentials or sensitive information to the open internet. You can also force people to use https when connecting to your site, which, in addition to being the right move security-wise, can help boost your SEO ranking.

Limitations and drawbacks of .htaccess (and shared hosting)

Some people are quick to point out that using .htaccess for configuration rather than using the main Apache configuration incurs a performance hit. This performance hit gets worse when your server is under high load. In general, you'll probably be leaning on .htaccess most heavily if you're in a shared hosting environment. If you're just looking to host a small portfolio site or a toy app, diving into cloud computing, getting root access, and becoming your own Apache sysadmin is probably overkill.

If you're attracting a lot of traffic and/or having difficulties with high load on your server, you probably should move to a VPS where you have root access and use the main server configuration. If this is a problem for you, you're probably also growing beyond a one-person operation and should enlist someone to help with site reliability. But if you're just running a small, mostly static site, it's not worth worrying about performance degradation as a result of using .htaccess.

.htaccess is more limited in scope than your main Apache configuration in httpd.conf. Some directives, such as <Directory> blocks, are only allowed in the main configuration, so you can't use them in .htaccess. If you're doing a lot of work where you need that kind of server-wide, directory-level control over your site, a more robust hosting option makes sense. But unless you're working in a complex environment and are looking to do a lot of site reliability fine-tuning, you're going to be fine without root access.

Common use case cheat sheet

Here are some things that you might want to use .htaccess for. Feel free to copy/paste directly from here, as I have copy/pasted from those who came before me.

Many of these snippets require you to turn the RewriteEngine on. You only need to do this once per file, and can put all of your rewrite requests under it. For ease of copy/pasting, I'm including RewriteEngine On in all the relevant snippets. Make sure you don't copy it more than once.
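As a rough sketch of what "once per file" looks like in practice, here's a combined file built from two of the snippets later in this post. Treat it as an illustration of the layout, not a recommended configuration.

```apache
# One RewriteEngine directive covers every rule below it in this file
RewriteEngine On

# Send insecure requests to https
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Serve /info from info.html when that file exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
```

Each RewriteCond only applies to the RewriteRule that directly follows it, so the two groups don't interfere with each other.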

Protect yourself from bandwidth theft

Bandwidth theft, or hotlinking, is the practice of linking to an image on someone else's website and displaying it on your own. This uses the bandwidth of the server you're linking to instead of your own. It's pretty rude!

However, you can protect your site from having any of your images hotlinked.

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?mysite\.com [NC]
RewriteRule .*\.(jp(e)?g|gif|bmp|png)$ https://imgur.com/ZtXiCBw.gif [L,R]

(Make sure to replace mysite\.com with your actual url)

The above code checks where the request is coming from, or the referer. If the referer is set and isn't your site, the visitor gets whatever image you redirect to instead. Requests with an empty referer are let through.

The above link redirects to the following image on imgur:

Animated radioactivity symbol with text reading "WARNING DO NOT HOTLINK IMAGES AND STEAL BANDWIDTH" beneath

Anyone who runs into this enough will learn quickly to stop stealing bandwidth from you :)
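If you'd rather not point people at a replacement image, one alternative is to refuse the request outright. This is a minimal sketch using mod_rewrite's F (forbidden) flag. As before, mysite\.com is a placeholder for your own domain.

```apache
RewriteEngine On
# Let requests with no referer through (direct visits, some privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Let requests from my own pages through; mysite.com is a placeholder
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?mysite\.com [NC]
# Anything left is a hotlink: answer 403 Forbidden instead of redirecting
RewriteRule \.(jpe?g|gif|bmp|png)$ - [F,NC]
```

The - in place of a target URL means "don't rewrite the path", which is the usual companion to the F flag.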

SEO improvements

Redirect old pages to new location

One of the most basic and useful things you can do with .htaccess is redirect a page that used to exist at one location to a new URL. This is extremely important for SEO. If you wrote a popular blog post that you're moving elsewhere, you don't want to lose the link equity - or the influence a link has on a search engine - just because your content is changing locations.

In general, you'll probably want to use either a 301 or a 302 redirect. A 301 is a permanent redirect. A 302 redirect is temporary and will not preserve your link equity. However, when mucking around with redirects, you should always use a 302 first and test your changes. If everything looks good, then change it to a 301. You should do this so you don't accidentally forward the link equity from the old location to the wrong place.

When formatting a Redirect statement, the order of arguments is as follows:

  • status code (301, 302, 404, etc.)
  • old file location
  • new file location

Redirect 302 /2016/08/02/rad-old-blog-post/ https://newdomain.party/rad-old-blog-post/
Redirect 301 /menu/dessert.html https://new-restaurant-website.supply/menu/dessert/

Enforce https

Some hosting options allow you to force https when people connect to your site. Others merely allow https without strictly enforcing its use. Using https is very important for both security and SEO, so if your hosting provider doesn't enforce https connections, you should.

You can rewrite all insecure requests to https requests with the following snippet.

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

Cosmetic improvements

Don't require .html

Nothing screams tacky and dated like having .html hanging off the end of your url. I mean, I'm mostly kidding, but I really don't like having that file extension dangling about. Luckily, you can fix this!

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

This condition checks to see if the request you're making is for a valid directory in your project. We're checking for directories here, instead of files because the request will be something like mysite.com/info, which looks like a directory name. If it can't find a valid directory, then it checks to see if there is a valid file called info.html. If there is, rewrite the request to serve content from mysite.com/info.html.

You can use this same idea to remove other file extensions like .htm or .php.
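For example, a sketch of the same rule adapted for .php files might look like this. It assumes your pages actually live in files like info.php next to your .htaccess.

```apache
RewriteEngine On
# Leave requests for real directories alone
RewriteCond %{REQUEST_FILENAME} !-d
# Only rewrite when a matching .php file exists on disk
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php
```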

Enforce (or forcibly remove) www

Force www:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301,NC]

Remove www:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]

The [NC] flag here stands for No Case, so you don't have to worry about case sensitivity.

The [L] flag stands for "Last", meaning that if this rule matches, Apache applies it and stops processing the rewrite rules that follow it.

The [R] flag stands for "Redirect". You can pass in an option to R to set the HTTP status code of the redirect. For example, [R=302] or [R=404].
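To see the flags working together, here's a sketch that folds the https and www redirects into a single rule. It uses the OR flag on the first condition so that either condition triggers the redirect. www.example.com is a placeholder, and R=302 is there on the assumption that you're still testing.

```apache
RewriteEngine On
# Match if the request is insecure OR the host lacks the www prefix
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
# L: stop processing later rules. R=302: temporary redirect while testing.
RewriteRule ^ https://www.example.com%{REQUEST_URI} [L,R=302]
```

Once you've confirmed it sends people where you expect, change R=302 to R=301.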

Security improvements

Hide .git directory

Version control is fantastic, even for static sites. It can help you revert to previous versions easily, and lets you look back over the evolution of your code. However, controlling your site updates via git push/pull can open up a security vulnerability. Visitors can publicly access your .git directory via their browser, and go through your entire repo, including any secrets or keys you might be storing there. If you're hosting something more complex than a static personal site, it also allows anyone to view your source code and copy your work, even if you meant to keep it private.

Luckily, you can redirect any attempts to request your .git directory.

RedirectMatch 404 /\.git

Simple! You can return any status you like here. I personally like using a 404 because it seems like no such directory even exists on the server, and might deter further attempts to access similar files. However, a 403 also makes sense here. It's up to you.
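If you prefer the 403, the change is a single number. When the status isn't a 3xx redirect, RedirectMatch takes no target URL.

```apache
# Same pattern as the 404 version, but answer 403 Forbidden
RedirectMatch 403 /\.git
```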

Restrict access to sensitive files

You probably don't want random visitors to your site to have access to your dotfiles, or other sensitive information. You can disallow access to specific files for everyone, or you can allow/block certain IP addresses in .htaccess.

Note: You cannot use a <Directory> block in .htaccess to restrict or allow access. That directive needs to go in httpd.conf and requires root access. An .htaccess file only applies to the directory it lives in and the directories beneath it.

# Don't allow anyone to see my .htaccess file
<Files .htaccess>
  Order allow,deny
  Deny from all
</Files>

The Order keyword is a little tricky here, so let's go through it. Order accepts one of two arguments: either allow,deny or deny,allow. Make sure that there is no space between the comma and the second word in the argument. If you put a space there, it will cause all requests to your site to return a 500 error. This argument tells Apache in which order it should process Allow and Deny statements.

allow,deny means that it will process any Allow statements first, and Deny statements last. The most recent statement will override any statements that come before it. So, in the example above, if you wanted to allow your IP address, and only your IP address to view .htaccess via curl or the browser, you might include a statement like Allow from 198.51.100.254. However, because your configuration instructs the server to process all Allow statements first, and then override them, you would still be denied access because Deny from all was the most recent order.

Conversely, deny,allow processes all Deny statements first, and then overrides with Allow. If you're going to do anything more complicated than denying or allowing access to everyone, make sure you pay attention to your Order argument, and make sure to test it.
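Here's a sketch of the "only my IP address" case described above, switched to deny,allow so that the Allow statement gets the final say. The address is the same example IP used in the text. Swap in your own.

```apache
# Block .htaccess for everyone except one address
<Files .htaccess>
  # Deny is processed first, then Allow overrides it for matching clients
  Order deny,allow
  Deny from all
  Allow from 198.51.100.254
</Files>
```

With Order allow,deny and the same two statements, the Deny would be processed last and you'd be locked out along with everyone else.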

You can also use this syntax for limiting access to specific HTTP methods. For example, if you wanted, you could restrict POST and PUT operations to your Local Area Network (LAN).

<Limit POST PUT>
  Order deny,allow
  Deny from all
  Allow from 198.51.100.254
</Limit>
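One caveat: Order, Allow, and Deny are the Apache 2.2 way of doing this. Apache 2.4 still understands them through a compatibility module, but the newer syntax uses Require. Assuming your host runs 2.4 (worth checking, since this post doesn't cover which version you'll get), a rough equivalent of the two examples above would be:

```apache
# Apache 2.4 style: nobody may request .htaccess
<Files .htaccess>
  Require all denied
</Files>

# Apache 2.4 style: only one address may POST or PUT
<Limit POST PUT>
  Require ip 198.51.100.254
</Limit>
```

Require has no ordering argument to get wrong, which is a nice side effect.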

Debugging and other general tips

Make sure you know where your logfiles are. Even if you don't have root access, you should be able to see access and error logs for your site. The error logs will be very useful in debugging any unexpected results or errors you encounter when trying to set up your configuration.

bad flag delimiter

If your error message includes the phrase bad flag delimiter, you probably included an extra space in your RewriteRule flags. Flags are comma delimited, not comma + space delimited, so make sure you format your flags like this: [L,R=301,NC], not like this: [L, R=301, NC]

Some operation not allowed here

If your error message includes the phrase not allowed here, you're probably trying to do something that needs to be done in the httpd.conf main configuration file. You'll need to either figure out some other way to accomplish your goal, or to move to a solution where you have root access and can modify httpd.conf.

Additional Resources

What have you used .htaccess for on your sites? Is there something you want to do with .htaccess but haven't been able to figure out? Let me know in the comments!

Posted on Oct 29 '18 by:

jess unrein

@thejessleigh

Pronouns: they/them | | | Pythonista, cat lover, avid reader, and gamer in Chicago. Tip jar: https://ko-fi.com/thejessleigh

Discussion

If you're attracting a lot of traffic and/or having difficulties with high load on your server, you probably should move to a VPS where you have root access and use the main server configuration.

And you probably should move away from the Apache web server which does not scale well - and then you won't have a .htaccess file anyway.

 

Definitely, if you're working on a project where web server performance is very important, rather than just a dinky link dump like my personal site, Apache is probably something you should move on from. But I'm assuming that if you're working on that kind of scaling problem, you're not looking to fix small problems here and there with an .htaccess file.

 

What would you recommend moving away from Apache to?

PS. Really great article. Thanks for putting this together. I found it helpful and I am sure so many more will too.

NGINX (pronounced "engine x") has a more robust feature set for dynamic content and is more performant at scale.

There are definitely pros and cons to each solution, and the amount you "feel" the difference between them will largely depend on how complex your site is and what your scaling needs are.

Here's a good side by side comparison from the beginning of this year, if you're trying to decide which solution is right for you.

Okay, I definitely learned something today, because I had no clue how NGINX was pronounced, haha

 

I really like how you provide a view of the surroundings of any command you use - what it is about, why you would use it (or not), the specifics of parameters....

 

Thank you so much for the .html snippet, I've just deployed a Pelican website and that fixed my inner pages.

 

The article is great, but I would like to deviate and try to fix the problem by not having it.

Your website looks pretty static, you can avoid having a server at all by using an object storage like Amazon S3 or Google Datastore. The free tier covers most cases, I use them with free SSL from CloudFlare.

If you really want to play with servers you can have a full VPS at the providers I mentioned, for free.

This brings me back some nice memories, I stopped using apache around 8yrs ago, while learning web dev and thinking that VPSs are black magic

 

Definitely makes sense to put static resources on Amazon S3 or Google Datastore, and I do use those resources for other toy projects and at work. However, I like having my site on shared hosting with an Apache server because I get to play around with things like .htaccess configuration, or run little experiments in a very low stakes environment. It's worth the ~$10 a month to have a space to mess around in. If I mess up, I'll learn something from it and more than likely no one will notice. :)