In the previous posts we've seen how to manipulate sockets using netcat. Moreover, we learned how to start reading and parsing an HTTP message using ShellScript.
Time to definitely improve our app.
App requirements
Since our application is a Web application, we should respond in a more appropriate way. Let's define some requirements for our application:
-
GET /login
should render an HTML form containing just a text field called "Name" to perform the login -
GET /
should render a home page containing the textHello, {{name}}
. Where does the{{name}}
come from? We'll see soon, but we should read the HTTP headerCookie
. -
POST /login
should request with a body containing the name coming from the HTML form. Afterwards, the server responds anHTTP 301
with a headerSet-Cookie
, so the client could use this cookie to perform future requests such asGET /
-
POST /logout
should request with a headerCookie
containing the name. Afterwards, the server responds anHTTP 301
with a headerSet-Cookie
using an expiration date in the past. This way, a client HTTP will permanently remove the Cookie.
Web fundamentals matter
If you need, read the requirements again.
Make no mistake, this is exactly how a simple web app used to work prior to the blast of frontend frameworks circa 2015.
Software abstractions are good and save us a lot of time, but unfortunately they also keep developers far from the basics. It's very important for us as developers, to understand how servers keep sessions across HTTP requests.
If you acknowledge that HTTP requests are stateless (the server usually closes the TCP connection at every request), a wide range of capabilities will open for debugging and troubleshooting.
Let's build a login system, and as soon as we understand that HTTP requests are stateless, we realize the logins do not exist.
After all, originally, the web mostly uses HTTP header Cookies to store user data in the HTTP client-side, sending those cookies across future requests to the same domain.
Separate HTTP responses in files
A good practice is to keep our HTTP responses in separate files, thus leading to a more readable and extensible code:
cat login.html
HTTP/1.1 200 OK
Content-Type: text/html
<form method="POST" action="/login">
<input type="text" name="name" />
<input type="submit" value="Login" />
</form>
cat 404.html
HTTP/1.1 404 NotFound
Content-Type: text/html
<h1>Sorry, not found</h1>
Okay, with HTML files in place, let's see how our function handleRequest
looks like so far:
function handleRequest() {
## Read request, parse each line and breaks until empty line
while read line; do
echo $line
trline=`echo $line | tr -d '[\r\n]'`
[ -z "$trline" ] && break
HEADLINE_REGEX='(.*?)\s(.*?)\sHTTP.*?'
[[ "$trline" =~ $HEADLINE_REGEX ]] &&
REQUEST=$(echo $trline | sed -E "s/$HEADLINE_REGEX/\1 \2/")
done
## Route to the response handler based on the REQUEST match
case "$REQUEST" in
"GET /login") RESPONSE=$(cat login.html) ;;
*) RESPONSE=$(cat 404.html) ;;
esac
echo -e "$RESPONSE" > response
}
Yes, much more shorter and readable.
Keep the server running forever
Not sure if you noticed, but the server is being closed at every client connection, which is making us to run the server over and over again, at every single time we want to perform a new HTTP request.
We can keep the server in an infinite loop. By doing this, we no longer need to run the server for every request:
while true; do
cat response | nc -lN 3000 | processRequest
done
How cool is that?
Nice, but show me some login
Okay. Open localhost:3000/login
in your web browser, type a name in the form and submit it.
I'm sure you got the response "Sorry, not found", right?
No problem, now go to the server STDOUT and check the message coming from the HTTP client:
POST /login HTTP/1.1
Host: localhost:3000
Content-Length: 12
....
....
What does it mean?
Following the HTTP standard, web browsers perform a POST
on a form submit, like we have in form action=POST
. Upon submission, every field is then added to the HTTP request body.
Please also note that the HTTP message contains a lot of HTTP headers (I'm using Google Chrome), but the HTTP body is nowhere to be found. Why's that?
We should keep reading the request body after the empty line, remember? However, we should first acknowledge the "length" of the body, otherwise, when should we stop reading the remaining bytes?
Thankfully, the HTTP standard covers that. The web browser has already sent an important header called Content-Length
.
Our work is then basically parsing this header and use it to read the remaining HTTP request message bytes, the body.
Reading the HTTP body message
Similar to the headline parsing, inside the reading loop, let's place:
CONTENT_LENGTH_REGEX='Content-Length:\s(.*?)'
[[ "$trline" =~ $CONTENT_LENGTH_REGEX ]] &&
CONTENT_LENGTH=`echo $trline | sed -E "s/$CONTENT_LENGTH_REGEX/\1/"`
Now, after the reading loop, we should read the remaining bytes in case there's a Content-Length
header (not always clients send the Content-Length
header).
## Check if Content-Length is not empty
if [ ! -z "$CONTENT_LENGTH" ]; then
BODY_REGEX='(.*?)=(.*?)'
## Read the remaining request body
while read -n$CONTENT_LENGTH -t1 body; do
echo $body
INPUT_NAME=$(echo $body | sed -E "s/$BODY_REGEX/\1/")
INPUT_VALUE=$(echo $body | sed -E "s/$BODY_REGEX/\2/")
done
fi
Nice. Go and perform a POST /login
, you should see the body being displayed in the server STDOUT!
Explaining the read:
-
-n {N}
: reads an arbitrary number of bytes from a file or STDIN -
-t {T}
: timeout to read the entire message from the STDIN - inside the body reading loop, we parse each line trying to find a match to
field=value
coming from thePOST
HTTP request
Fantastic!
Handle the message
Reading the body message is no way close to the end. We have to handle it in the switch-case
, remember?
How about refactoring to something like this?
## Route to the response handlers (functions)
case "$REQUEST" in
"GET /login") handle_GET_login ;;
"GET /") handle_GET_home ;;
"POST /login") handle_POST_login ;;
*) handle_not_found ;;
esac
All we have to do is implementing the above functions. Let's start the handle_POST_login
:
function handle_POST_login() {
RESPONSE=$(cat post-login.http | \
sed "s/{{cookie_name}}/$INPUT_NAME/" | \
sed "s/{{cookie_value}}/$INPUT_VALUE/")
}
- reads the
post-login.http
file - search and replace for a pattern using
sed
Now, let's check thepost-login.http
file:
HTTP/1.1 301
Location: http://localhost:3000/
Set-Cookie: {{cookie_name}}={{cookie_value}}; path=/; HttpOnly
Time to perform the test. Check yourself a dead-simple login written in ShellScript!
Finally, the complete implementation
Handle request
First we should:
- setup the FIFO
- put
netcat
in a loop - handle requests
#!/bin/bash
## Create the response FIFO
rm -f response
mkfifo response
function handleRequest() {
## Read request
while read line; do
echo $line
trline=$(echo $line | tr -d '[\r\n]')
[ -z "$trline" ] && break
HEADLINE_REGEX='(.*?)\s(.*?)\sHTTP.*?'
[[ "$trline" =~ $HEADLINE_REGEX ]] &&
REQUEST=$(echo $trline | sed -E "s/$HEADLINE_REGEX/\1 \2/")
CONTENT_LENGTH_REGEX='Content-Length:\s(.*?)'
[[ "$trline" =~ $CONTENT_LENGTH_REGEX ]] &&
CONTENT_LENGTH=$(echo $trline | sed -E "s/$CONTENT_LENGTH_REGEX/\1/")
COOKIE_REGEX='Cookie:\s(.*?)\=(.*?).*?'
[[ "$trline" =~ $COOKIE_REGEX ]] &&
read COOKIE_NAME COOKIE_VALUE <<< $(echo $trline | sed -E "s/$COOKIE_REGEX/\1 \2/")
done
## Read body
if [ ! -z "$CONTENT_LENGTH" ]; then
BODY_REGEX='(.*?)=(.*?)'
while read -n$CONTENT_LENGTH -t1 line; do
echo $line
trline=`echo $line | tr -d '[\r\n]'`
[ -z "$trline" ] && break
read INPUT_NAME INPUT_VALUE <<< $(echo $trline | sed -E "s/$BODY_REGEX/\1 \2/")
done
fi
## Route to the response handlers
case "$REQUEST" in
"GET /login") handle_GET_login ;;
"GET /") handle_GET_home ;;
"POST /login") handle_POST_login ;;
"POST /logout") handle_POST_logout ;;
*) handle_not_found ;;
esac
echo -e "$RESPONSE" > response
}
echo 'Listening on 3000...'
## Keep server running forever
while true; do
## 1. wait for FIFO
## 2. creates a socket and listens to the port 3000
## 3. as soon as a request message arrives to the socket, pipes it to the handleRequest function
## 4. the handleRequest function processes the request message and routes it to the response handler, which writes to the FIFO
## 5. as soon as the FIFO receives a message, it's sent to the socket
## 6. closes the connection (`-N`), closes the socket and repeat the loop
cat response | nc -lN 3000 | handleRequest
done
In short, the server:
- waits for FIFO
- creates a socket and listens to the port 3000
- as soon as a request message arrives to the socket, pipes it to the handleRequest function
- the handleRequest function processes the request message and routes it to the response handler, which writes to the FIFO
- as soon as the FIFO receives a message, it's sent to the socket
- closes the connection (
-N
), closes the socket and repeat the loop
Implement the handle response functions
Look at the switch-case
. It's pretty simple but the functions are yet to be implemented.
Place them before the handleRequest
function, because ShellScript expects functions to be declared prior to calling.
function handle_GET_home() {
RESPONSE=$(cat home.html | \
sed "s/{{$COOKIE_NAME}}/$COOKIE_VALUE/")
}
function handle_GET_login() {
RESPONSE=$(cat login.html)
}
function handle_POST_login() {
RESPONSE=$(cat post-login.http | \
sed "s/{{cookie_name}}/$INPUT_NAME/" | \
sed "s/{{cookie_value}}/$INPUT_VALUE/")
}
function handle_POST_logout() {
RESPONSE=$(cat post-logout.http | \
sed "s/{{cookie_name}}/$COOKIE_NAME/" | \
sed "s/{{cookie_value}}/$COOKIE_VALUE/")
}
function handle_not_found() {
RESPONSE=$(cat 404.html)
}
Place the files accordingly
Now that we have the handle response
functions, note that all of them expect HTTP/HTML files to exist. Let's create them.
login.html
HTTP/1.1 200 OK
Content-Type: text/html
<form method="POST" action="/login">
<input type="text" name="name" />
<input type="submit" value="Login" />
</form>
home.html
HTTP/1.1 200 OK
Content-Type: text/html
<p>Hello, {{name}}</p>
<form method="POST" action="/logout">
<input type="submit" value="Logout" />
</form>
404.html
HTTP/1.1 404 NotFound
Content-Type: text/html
<h1>Sorry, not found</h1>
post-login.http
HTTP/1.1 301
Location: http://localhost:3000/
Set-Cookie: {{cookie_name}}={{cookie_value}}; path=/; HttpOnly
post-logout.http
HTTP/1.1 301
Location: http://localhost:3000/login
Set-Cookie: {{cookie_name}}={{cookie_value}}; path=/; HttpOnly; Expires=Thu, 01 Jan 1970 00:00:00 GMT
So much YAY!
Wrapping up
Today we walked through refactoring our Web server.
We added more capabilities, making it a complete web server delivering a login, home page and logout features.
In the next steps we'll explore the triad of Web: HTML, CSS and JS.
Top comments (12)
Thanks for the great article. One thing I didn't understand is: Where does
home.html
parse{{name}}
from?Using
sed
:Did I answer your question?
Thanks for the reply. Not quite. shouldn't there be a
name=
somewhere?Informative post!
I will let you know, you define the function as
handleRequest
however in one call of it you misname it:cat response | nc -lN 3000 | processRequest
Seems Part III is broken. Page shows "Liquid syntax error: Unknown tag 'endraw' "
thanks, it might be a platform issue, I edited then just saved the post. seems working now
Just want to give big thanks for this awesome post. Seriously great stuff!
The
cat
trick does not work for me. When I try to understand what is happening in Chrome DevTools it shows me this:Apparently the server is not able to parse correctly the HTML file. Did you have this problem before?
never seen this before. maybe OS issue? netcat implementation issue?
anyway, try using \n instead of \r\n
Maybe the problem is the OS or the IDE I'm using (Emacs). I tried setting explicitly the \r\n but it did not work either. But, it worked after I set the headers in the string, and just cat the HTML content, like this:
good catch! yes, unfortunately when dealing with stuff low-level we may face OS/tools implementations issues
Hi, can you wrap it all up in a github repo please if you have not done so already ?