Welcome - Sadly or not, that's the last part of this series
We finally come to an end - today I won't be talking about PHP itself
You know, when we were writing a code, we've always had .php
at the end. Like /login.php
But it can be better
We want it more appealing, like /login
, don't we?
That's the perfect example of .htaccess
file usefulness
Apart from that, I'll also explain usage and importance of robots.txt
file - by the way talking about it's security issues
What will we learn today?
- When can we use
.htaccess
- How to add SEO-friendly addresses to our website
- Adding error pages
- Basic
robots.txt
structures
No more talking - let's go
When can we use .htaccess
files?
.htaccess
is a config overwrite mechanism made for Apache HTTP server - and only for it
So, If you are using Nginx or Microsoft IIS - sorry to say, but you need to use native solutions.
Right now (I assume) we are working with Apache server so using .htaccess
shouldn't be a problem for us
Also
Quick disclaimer
Using .htaccess
files, even despite it's popularity, comes with some flaws.
.htaccess
slows down requests - It's server-side mechanism so It can't be cached in a browser. Every request requires "executing" .htaccess
file(s)
99% of things we'll be doing in .htaccess
can be done in default Apache config file - httpd.conf
If you have write access to it (like on VPS) - use it as much as you can. Your site will perform much better
Also, .htaccess
is very vulnerable to syntax errors - one minor typo and you get error 500 all over your website
So to sum up - use .htaccess
files only if you don't have access to httpd.conf
(for example on shared hosting) and use only 1 at a time
With this said - let's go
SEO-friendly addresses
I think, while browsing the internet you've seen website with link like this http://domain.com/home
- That's where .htaccess
comes in handy
Using RewriteEngine
we can create such addresses for pages
First - enable RewriteEngine
RewriteEngine on
Then, we can start adding our rules
RewriteRule home index.html
RewriteRule about-us aboutus.html
RewriteRule privacy-policy privacy-policy.html
To find some pattern - it looks like this
RewriteRule [new address] [file]
When creating them, pay attention to letter case - write everything in lowercase and use snake-case
to separate words
Why?
- Lowercase helps SEO - sometimes search engine might index
My-page
2 times. AsMy-page
andmy-page
which makes you loose your position in search results -
snake-case
(with dashes instead of spaces) - When Google seesmy-page
it replaces-
with space, as it can't distinguish any other meaning for it. Then it matches typical Google searches
On the other hand - using _
suggests that there is a letter, but we don't know which one. So my_page
could be read as myopage
or myjpage
and so on
Error Documents
From part 8 we know that sometimes websites don't work as we want - so we have 4xx
and 5xx
errors
Now, displaying typical 404 Apache page looks unprofessional - some websites have their own error pages, with matching theme and some funny images
We can do this in .htaccess
as well - for example with 404
page
ErrorDocument 404 https://example.com/404.php
One question? Can't we just use relative path - /404.php
Not really - sometimes .htaccess
reads /
not as website's root directory (/var/www/html
or C:/xampp/htdocs
), but as filesystem's root directory (/
or C:/
)
So, using domain and absolute path in general assures us, that no matter where 404 will happen, Apache will always find valid error document.
What is robots.txt
?
Imagine this:
You've created a website. What to do now?
Why not publish it and get it indexed (visible) at Google (and other search engines)
But we don't really want to show our style sheets or account configuration panel in Google, do we?
That's the main purpose of robots.txt files
Google defines it as
[robots.txt] tells search engine crawlers which URLs the crawler can access on your site
I highly recommend you to get familiar with robots.txt docs from Google, before publishing the website
Now, let's say we don't want to index topsecretpage.php
.
We need to specify:
- To What crawlers this rule should be applied
User-Agent: *
*
stands for all (just like in SQL)
- Where crawlers can't or can enter
Disallow: /topsecretpage.php
Or
Allow: /openpage.php
In robots.txt
we also declare sitemap - XML file containing every sub-page on your website
Sitemap: https://www.example.com/sitemap.xml
Security issues
Don't use robots.txt
to hide pages from everyone - this file is easily accessible.
For example - this is YouTube's robots.txt
And I didn't hack YT or anything - just went to https://www.youtube.com/robots.txt
So yeah, robots.txt
is cool to hide style sheets or some files that require logging in - but don't put credential or employee-only files in there
Conclusion
As you seen, this was more of site administration article - but hey!
Every knowledge is valuable - especially if it resolves somewhere around our main interest
Thanks for reading - check out my blog and rest of this series
They all will get a re-write soon - so more valuable content and better summarized knowledge for you.
Little personal afterword
As that's the last part of this "chapter" - I'd like to say something
Thank you so much for every view, comment and like - Even though it wasn't much, still I'm glad that I could help someone.
Also, big thanks to everyone who took their time and reviewed this course - from Forum Pasja Informatyki and Reddit
I'll definitely post something like this again - when my work will reach satisfying level
I've learned a lot, I hope you had as well - and what more can I say.
Thanks for everything and see you while creating projects (and in some other articles)
Top comments (0)