DEV Community: Stephen Hyde

Reverse Proxy with SSL

Stephen Hyde — Sun, 14 Aug 2022 21:08:00 +0000

Introduction

It's often useful to make an application under local development available temporarily on the public internet. At Blended Edge, for example, our work with automating integrations means we run a lot of OAuth2 flows, and using OAuth2 for an application under development (usually) requires a publicly available endpoint with an SSL certificate.

Ngrok is a reverse proxy tool that makes this very simple, but it's also fairly easy to set up a reverse proxy using an AWS EC2 instance, certbot from Let's Encrypt, and the ssh command.

This walk-through assumes some basic knowledge of DNS, ssh, and EC2 instances (or similar cloud compute resources).

Create a Remote Server

We're going to start by launching an EC2 instance on AWS to act as our remote server. This could of course be done with any cloud provider, such as Azure or DigitalOcean.

Set the instance name as "Reverse-Proxy", and under "Application and OS Images" select Ubuntu. At the time of writing, the default Ubuntu image is 22.04.

I have an existing SSH key pair that I will use for accessing this instance, but if you don't have one or would like to use a new key pair simply click "Create new key pair".

Create a new security group that allows HTTP/HTTPS traffic from any IP address, but limit the SSH traffic to just your own IP.

Click "Launch Instance", navigate to the instances overview page, and wait for the status checks on the new instance to pass. This may take a couple of minutes.

Once the instance is ready, go to the overview page for the instance and click "Connect" in the upper right hand corner. Follow the steps to connect.

# if this is a new ssh key you may need to specify the path to the key
# ssh -i /path/to/key.pem <user>@<address>
ssh ubuntu@ec2-13-59-152-133.us-east-2.compute.amazonaws.com

Install Nginx

Update the package list, upgrade the packages, and install nginx.

sudo apt update
sudo apt upgrade -y
sudo apt install nginx

If you send a GET request to your instance's IP address via a web browser or curl you should get the default nginx page.

curl http://13.59.152.133

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
...

Domain and SSL

Point a domain to the public IP address of your instance. The nameservers for my domain hydewd.com are with DigitalOcean, so in my case I updated the A record there to point the domain at my instance.

Note: If you are doing any kind of serious or long term work with this setup you should allocate an elastic IP address, associate it with your instance, and point your domain to the elastic IP.

Install certbot from Let's Encrypt to handle automatically setting up the SSL certificate.

sudo apt install certbot python3-certbot-nginx -y

Now have certbot make the request for an SSL certificate. Here I am specifying that I want to cover both the naked domain hydewd.com as well as www.hydewd.com. You will be prompted to provide an email address for renewal notices, asked to agree to the terms of service, and asked if you are willing to share your email with the Electronic Frontier Foundation (this last one is optional).

sudo certbot --nginx -d hydewd.com -d www.hydewd.com

If everything has worked you should now be able to load the default nginx page at https://<your-domain>.

Nginx Configuration

The next step is to tell nginx to forward incoming http/https requests to another port (8080) on the same server. In the following step we will tunnel that port to our local machine.

sudo vi /etc/nginx/sites-available/default

Within the default server block, we will change the contents of the location block so that it forwards traffic to http://127.0.0.1:8080. Here is what it looks like to begin with:

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
    }

Change the contents of the location block to:

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_pass http://127.0.0.1:8080/;
    }

Save the file, and ensure that the configuration is valid with the command sudo nginx -t. If there is no error, go ahead and reload nginx to apply the latest changes.

sudo nginx -s reload

SSH Remote Forwarding

The final step is to tunnel the requests from the server's port 8080 to the local machine. This can be done with the ssh remote forwarding command.

ssh -R 8080:localhost:8080 ubuntu@ec2-18-217-182-155.us-east-2.compute.amazonaws.com

Now run a server locally on port 8080. Here, for example, is a simple node server using the express framework that will reply to all GET requests with some information about what the server sees in the incoming request.

mkdir local-server && cd local-server
npm init -y
npm install express

Create a file called index.js with the following contents:

const express = require('express')
const app = express()
const port = 8080;

// Match every path with a regular expression; a bare '*' string is only accepted by Express 4.
app.get(/.*/, (req, res) => {
  const { baseUrl, headers, hostname, ip, method, originalUrl, path, protocol } = req;
  res.json({
    baseUrl,
    headers,
    hostname,
    ip,
    method,
    originalUrl,
    path,
    protocol
  })
});

app.listen(port, () => console.log(`App listening at http://localhost:${port}`));

node index.js

Sending a query to your domain will now return a response from your local express server.

curl -s 'https://www.hydewd.com/some/path?some=query&another=123'

Example response:

{
  "baseUrl": "",
  "headers": {
    "host": "www.hydewd.com",
    "x-real-ip": "104.1.2.3",
    "connection": "close",
    "user-agent": "curl/7.68.0",
    "accept": "*/*"
  },
  "hostname": "www.hydewd.com",
  "ip": "::ffff:127.0.0.1",
  "method": "GET",
  "originalUrl": "/some/path?some=query&another=123",
  "path": "/some/path",
  "protocol": "http"
}

Note that the header x-real-ip displays the IP address of the end user making the request, not the IP of the instance. This is a result of the configuration line proxy_set_header X-Real-IP $remote_addr;.
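Because every request reaches the local app through the tunnel, the connection itself always appears to come from the loopback address, and the app has to read the forwarded header to learn who is really calling. Here is a minimal sketch of that lookup, assuming the nginx configuration above. The helper name clientIp is hypothetical, and it takes a plain headers object and a socket address rather than a real request object:

```javascript
// Hypothetical helper: pick the end user's address for a request that
// arrived through the nginx proxy and SSH tunnel described above.
// `headers` uses lower-cased header names (as Node provides them);
// `socketAddress` is the address of the directly connected peer.
function clientIp(headers, socketAddress) {
  const forwarded = headers['x-real-ip'];

  // Only trust the header when the direct peer is the local end of the
  // tunnel; otherwise any caller could spoof it by sending X-Real-IP.
  const fromLocalProxy =
    socketAddress === '127.0.0.1' ||
    socketAddress === '::1' ||
    socketAddress === '::ffff:127.0.0.1';

  return fromLocalProxy && forwarded ? forwarded : socketAddress;
}

// Values taken from the example response above.
console.log(clientIp({ 'x-real-ip': '104.1.2.3' }, '::ffff:127.0.0.1'));
// A direct caller that sets the header itself is not believed.
console.log(clientIp({ 'x-real-ip': '104.1.2.3' }, '203.0.113.9'));
```

The loopback check is a design choice for this sketch: it keeps the header from being trusted if the app is ever reached without going through the proxy.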

Conclusion

This approach will work for some basic use cases, but is overall quite limiting. In a future post we will look at using Traefik to provide a more feature rich setup.

Helpful Links

Digital Ocean Guides: How to Install Nginx

Nginx Docs: Reverse Proxy Documentation

Nginx Docs: Installing Lets Encrypt

HTTP Protocol Overview

Stephen Hyde — Sun, 08 Aug 2021 21:21:13 +0000

HTTP is so fundamental to web development that it is often overlooked. Let's take a minute to explore the protocol and gain a better understanding of how the HTTP request-response cycle works.

What is HTTP?

HTTP, or Hypertext Transfer Protocol, is an application level protocol for sharing hypertext.

protocol - A set of rules for how to transmit and receive data. If two programs use the same protocol then the receiver will know how to interpret the data from the sender.

application level - A protocol for programs that an end-user interacts with, as opposed to lower level programs that a user would not generally directly interact with. Most commonly a web browser like Google Chrome in the case of HTTP.

hypertext - Text documents displayed on a computer that allow for links to other documents. The most common way we lay out the structure of a hypertext document is with HyperText Markup Language, or HTML.

Summary - HTTP is a set of rules for how two high level programs communicate with each other. HTTP follows a client-server model, where a client makes a request and the server sends back a response. The client will usually be a web browser, and the server will usually be a web server, but this is not always the case.

GET Requests

The easiest way to see the raw data in a request is with netcat, a command line utility available on most systems as either netcat or nc. We will tell netcat to listen on a port, and then any data we send to that port will be dumped into the terminal for us to see.

nc -l -p 3000

-l means "listen"
-p 3000 means "use port 3000" (some netcat variants, such as the one shipped with macOS, expect the port without -p: nc -l 3000)

Now let's make a simple GET request to this port with curl in order to see what information is actually sent. With the previous netcat command running in another terminal, run curl localhost:3000. (Curl defaults to GET if no request method is specified.)

The request sent by curl contains four lines of information, and netcat logs them to the terminal. The first line contains basic information about the request: GET / HTTP/1.1. The technical term for this line is the "start line". More specifically, the start line of a request is the "request line", and the start line of a response is the "status line".

GET the HTTP verb, specifies the type of request
/ the request target, specifies the resource we want - could be "/", "/index.html", or "/api/users"
HTTP/1.1 the protocol and the specific version - could also be HTTP/1.0; an HTTPS request uses the same line, since the encryption happens in the TLS layer beneath HTTP

After the start line we have three lines specifying different HTTP headers. They give us more information about the request curl is making. Finally, it's easy to overlook the last line, which is simply an empty line signalling that we have reached the end of the request headers.
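To make the layout concrete, here is a small sketch that takes the text curl sends and splits it back into the request line and the headers. The function name parseRequestHead is hypothetical, and the sketch assumes a well-formed HTTP/1.1 request with lines ending in a carriage return and line feed:

```javascript
// Hypothetical helper: split the raw text of an HTTP/1.1 request head
// into its request line and its headers.
function parseRequestHead(raw) {
  // The head ends at the first empty line (CRLF followed by CRLF).
  const head = raw.split('\r\n\r\n')[0];
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, target, version] = requestLine.split(' ');

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    // Header names are case-insensitive, so store them in lower case.
    headers[line.slice(0, colon).toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, target, version, headers };
}

// The request sent by `curl localhost:3000`, as described above.
const curlRequest =
  'GET / HTTP/1.1\r\n' +
  'Host: localhost:3000\r\n' +
  'User-Agent: curl/7.68.0\r\n' +
  'Accept: */*\r\n' +
  '\r\n';

console.log(parseRequestHead(curlRequest));
```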

This is a GET request, so there is no body, and we have reached the end of the request, but both the curl command and the netcat command are hanging rather than returning control to the user. Netcat is simply listening without responding, and will keep listening until the connection is closed, while curl is waiting until it either receives a response or the request it made times out. We can interrupt the curl command, which ends the connection, allowing netcat to exit as well.

Header: Host

The first header listed in the example above is Host: localhost:3000, which tells us the name (and optionally the port) of the server we are requesting. If we had multiple websites that we wanted to run on a server with only one IP address we could use the Host header to differentiate which requests went to which website. This can be configured relatively easily in Apache with virtual hosts or in Nginx with multiple server blocks.
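The idea behind virtual hosts can be sketched in a few lines. The host names and directories below are made up for illustration, and selectSite is a hypothetical helper, not how Apache or Nginx implement it:

```javascript
// Hypothetical site table: host name -> directory to serve files from.
const sites = {
  'example.com': '/var/www/example',
  'blog.example.com': '/var/www/blog',
};

// Choose a site from the Host header, ignoring any ":port" suffix.
// (Bracketed IPv6 literals such as "[::1]:3000" are not handled here.)
function selectSite(hostHeader, fallback = '/var/www/default') {
  const hostname = hostHeader.split(':')[0].toLowerCase();
  // Check own properties only, so names like "toString" never match.
  return Object.prototype.hasOwnProperty.call(sites, hostname)
    ? sites[hostname]
    : fallback;
}

console.log(selectSite('blog.example.com:8080'));
console.log(selectSite('localhost:3000'));
```

Both requests could arrive at the same IP address and port; only the Host header tells them apart.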

Header: User-Agent

The User-Agent header provides information about the client making the request, and might specify the program name and the operating system. In this case we see the name of the program making the request, "curl", and the version, "7.68.0": User-Agent: curl/7.68.0. But this field can be confusing, especially when it comes to web browsers. Here is an interesting explanation of why Google Chrome's user agent string starts with "Mozilla".

Header: Accept

A media type or MIME type (Multipurpose Internet Mail Extensions) is a string declaring the format of an HTTP body. It consists of two parts: the type and the subtype, separated by a forward slash. An asterisk is used as a wildcard.

The Accept header specifies what MIME types a client is capable of receiving for a specific request, like text/html or image/jpeg. Multiple values can be given as a comma separated list. In our example above Accept: */* means the client will accept any media type with any subtype.
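Here is a rough sketch of how a server might check a MIME type against an Accept header, including the wildcard. The helper names parseAccept and accepts are hypothetical, and the sketch deliberately ignores any parameters that follow a semicolon:

```javascript
// Hypothetical helper: split an Accept header into { type, subtype } pairs.
function parseAccept(header) {
  return header.split(',').map((item) => {
    // Drop anything after ";" to keep the sketch small.
    const [type, subtype] = item.split(';')[0].trim().split('/');
    return { type, subtype };
  });
}

// Does a concrete MIME type such as "image/jpeg" satisfy the header?
function accepts(header, mimeType) {
  const [type, subtype] = mimeType.split('/');
  return parseAccept(header).some(
    (range) =>
      (range.type === '*' || range.type === type) &&
      (range.subtype === '*' || range.subtype === subtype)
  );
}

console.log(accepts('*/*', 'image/jpeg'));
console.log(accepts('text/html, image/*', 'application/json'));
```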

We will see MIME types again when we look at the Content-Type header of a POST request.

The Connection

HTTP concerns itself with the formatting and handling of a request followed by a response, but it does not concern itself with the details of how a connection is made between the two programs, or how the data is actually transmitted over the wire. These details are handled by lower-level protocols, which for our purposes will almost always be TCP/IP.

In the below image we can see a Wireshark capture of the actual packets sent. The first three lines establish a connection in a process known as the TCP Three-Way Handshake, which consists of the client sending a packet with the SYN flag set (SYNchronize), the server responding with a SYN ACK packet (SYNchronize ACKnowledge), and finally the client sending a packet to ACKnowledge receiving the SYN ACK packet.

It is somewhat analogous to the way a phone call starts, establishing a connection but not sharing any actual information yet:

Client sends SYN - Alice dials Bob's number
Server sends SYN ACK - Bob picks up the phone and says "Hello?"
Client sends ACK - Alice says "Hello" back

The fourth line, highlighted in green, is the packet that carries the actual HTTP request from curl to netcat, and the following line is netcat's acknowledgement to curl that it received that particular packet.

The transmission of those first five packets occurs practically instantly from the human perspective - all within the same millisecond. It's not until we abort the curl request five seconds later that we see the final three lines terminating the TCP connection.

POST Requests

Let's see what the HTTP request looks like when it is a POST with some data attached. We have three pieces of information: a name, "Joe Smith", an age, 64, and some letters, "abc". There are several ways we can format this data when sending it to the server (or netcat in this case), and the first one we will examine is application/x-www-form-urlencoded.

URL-encoded form data splits the information up into key-value pairs, with the key and value separated by an equals sign =, and the pairs separated by an ampersand &. Characters other than letters, digits, and a few safe symbols (such as - _ .) are escaped as a percent sign % followed by the character's byte value in hexadecimal, so a space becomes %20. So our information becomes: name=Joe%20Smith&age=64&letters=abc.
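The same encoding can be produced in code. This sketch builds the body by hand with encodeURIComponent and then with the built-in URLSearchParams; the variable names are only for illustration:

```javascript
const fields = { name: 'Joe Smith', age: 64, letters: 'abc' };

// Join key=value pairs with "&", percent-encoding each key and value.
const body = Object.entries(fields)
  .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
  .join('&');
console.log(body); // name=Joe%20Smith&age=64&letters=abc

// URLSearchParams follows the HTML form rules, which write a space as "+"
// rather than "%20". Both forms decode back to a space.
const formBody = new URLSearchParams({
  name: 'Joe Smith',
  age: '64',
  letters: 'abc',
}).toString();
console.log(formBody); // name=Joe+Smith&age=64&letters=abc

// Decoding reverses the escaping.
console.log(new URLSearchParams(body).get('name')); // Joe Smith
```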

If we send a POST request with a body to netcat we see the headers, followed by a blank line, followed by the request body:

curl -X POST -d "name=Joe%20Smith&age=64&letters=abc" localhost:3000

-X POST this flag allows us to specify a method other than GET
-d "..." include data in the body of the request

Headers: Content-Length & Content-Type

The POST request contains two new headers. Content-Length: 35 tells the recipient that after the headers and the empty line come 35 bytes of body data. The server also needs to know how to interpret those 35 bytes. Curl adds the header Content-Type: application/x-www-form-urlencoded by default when you specify data with the -d flag, but if the data you are sending is of a different type you can override this default with another flag: -H "Content-Type: ...".
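The value 35 is simply the size of the body in bytes, which we can check with Node's Buffer. This is a small sketch rather than what curl does internally:

```javascript
const body = 'name=Joe%20Smith&age=64&letters=abc';

// Content-Length counts bytes, not characters.
const contentLength = Buffer.byteLength(body, 'utf8');
console.log(contentLength); // 35

// For plain ASCII the two counts agree, but a character such as "é"
// (written here as the escape \u00e9) takes two bytes in UTF-8, so the
// string length alone would under-report the size of the body.
const accented = '\u00e9';
console.log(accented.length, Buffer.byteLength(accented, 'utf8')); // 1 2
```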

The data itself is just ones and zeros to the computer. Below is an image showing the tail end of a Wireshark capture of the POST request. It shows both the raw ones and zeros (taking up most of the width), as well as those bits interpreted as ASCII text (narrow column to the right). The body of the POST is highlighted in blue:

We can make it more readable by viewing the bytes in hexadecimal form, where each set of two characters corresponds to eight bits, or one byte. (For example, the binary 01101110 is 6e in hex, interpreted as n in ASCII.) The body of the request is again highlighted in blue, and I've drawn a red line around the sequence of two line breaks (each a carriage return followed by a line feed, 0d 0a in hex) that signals the end of the header. Once the server recognizes that the header has ended, it knows that the next 35 bytes belong to the body of the request (the 35 pairs of hex digits highlighted in blue).
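The same view of the bytes can be reproduced without a packet capture. The sketch below uses a cut-down version of the POST request (most headers are omitted for brevity, which is an assumption of this example) and locates the body the way the paragraph above describes:

```javascript
// One ASCII character is one byte: "n" is 6e in hex, 01101110 in binary.
const n = Buffer.from('n', 'ascii');
const hex = n.toString('hex');
const bits = n[0].toString(2).padStart(8, '0');
console.log(hex, bits); // 6e 01101110

// A shortened POST request as raw bytes.
const request = Buffer.from(
  'POST / HTTP/1.1\r\n' +
    'Content-Length: 35\r\n' +
    '\r\n' +
    'name=Joe%20Smith&age=64&letters=abc',
  'ascii'
);

// The header ends at the first CRLF CRLF sequence: bytes 0d 0a 0d 0a.
const headerEnd = request.indexOf('\r\n\r\n');
const bodyStart = headerEnd + 4;

// The next 35 bytes (the Content-Length) are the body.
const bodyBytes = request.subarray(bodyStart, bodyStart + 35);
console.log(bodyBytes.toString('hex'));
console.log(bodyBytes.toString('ascii'));
```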

It's important to note that without a content-type header the server does not know (though it may be able to guess) that it should interpret the numbers as ASCII codes, much less that the ASCII characters are in the format of key-value pairs escaped with %.

We can send JSON data with curl and specify a content-type in a header with the -H flag. The server still interprets the individual bytes as ASCII codes, but those decoded characters are now treated as JSON rather than url-encoded key value pairs.

curl -X POST -d '{"data": "ABC123"}' -H 'Content-Type: application/json' localhost:3000
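On the receiving side, the Content-Type header is what selects the decoder. A minimal sketch of that dispatch, with a hypothetical parseBody helper that only knows the two formats discussed here:

```javascript
// Hypothetical helper: decode a request body according to its Content-Type.
function parseBody(contentType, body) {
  // Ignore parameters such as "; charset=utf-8".
  const mediaType = (contentType || '').split(';')[0].trim().toLowerCase();

  switch (mediaType) {
    case 'application/json':
      return JSON.parse(body);
    case 'application/x-www-form-urlencoded':
      return Object.fromEntries(new URLSearchParams(body));
    default:
      // Unknown or missing type: hand the text back untouched.
      return body;
  }
}

console.log(parseBody('application/json', '{"data": "ABC123"}'));
console.log(
  parseBody('application/x-www-form-urlencoded', 'name=Joe%20Smith&age=64&letters=abc')
);
```

Note that the url-encoded form yields the age as the string "64": that format carries no type information, whereas JSON distinguishes numbers from strings.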

Conclusion

The program you are using to send data to the server may or may not automatically determine the right content-type header for your data, and knowing how to set and check headers is an essential skill. To learn more about HTTP, check out the MDN guide or read the official standard, RFC 7230 (since superseded by RFC 9110 and RFC 9112).