Lucas Santos

Posted on Jun 29, 2022 • Edited on Aug 5, 2022 • Originally published at blog.lsantos.dev

Using HarperDB with Kubernetes

#kubernetes #harperdb #database #docker

Table of contents
HarperDB
Kubernetes and Databases
Containerizing HarperDB
Conclusion

When working with any sort of application, one of the biggest challenges is to make it reusable and maintainable. This is why we usually choose containers as the tool to get around this problem.

Containers are simple to build and easy to maintain, a while ago I wrote this article on FreeCodeCamp that explains a lot about what are containers and why should we use them.

Now, while building an application recently, I ran into another problem: How can I run my database, using HarperDB, in a container? And then how can I install it using Kubernetes?

After some brainstorming I came up with a solution, and now it's the time to share that with you!

But, first things first, what is HarperDB?

HarperDB

HarperDB is a multi-paradigm, general-purpose database that is designed to be fast and easy to use. It encapsulates SQL and NoSQL functionality, CSV data aggregation and a whole lot of helpers like Math.js for working with math, GeoJSON support for working with geographical data, and moment for working with dates.

Ok, but what's the big catch? Other databases already do this. The catch is that HarperDB stores this as a unified data structure and serves it through a unique storage algorithm that makes it really fast for all the different types of data.

On top of that, HarperDB is also projected to be a full fledged solution to build not only your database alone, but also a complete API that interacts with that through the use of Custom Functions (which I'll talk about in another article), websockets, and the cherry on top is that it doesn't require a specific driver to work with. It exposes a standard REST API that can be used by any language, so you can both query data and perform operations using simple HTTP calls.

Here's a quick diagram of how HarperDB works, taken straight from the HarperDB website:

In short, I wanted to try it out and wasn't disappointed! Now, how can we install this database in a Kubernetes environment?

Kubernetes and Databases

Usually, when dealing with databases, we tend to install it on a separate computer or VM, and connect our application to that. While this is a good option to make it easier to manage, it's not the best option to make it faster, nor distributable, highly available, or reliable.

To do that, we need to distribute the database into several machines using a cluster approach, which means we have a main control plane, which is the main connection point for a database, and several other instances of the same database on which the control plane will distribute the load. These instances are called nodes.

Some nodes can be read only, some can be write only, some can be both. This allows us to make sure we have the correct permissions for each node, and we can also make sure that we don't have any data loss or corruption. When any data is inserted, deleted, or changed in the database, these nodes publish a message to other nodes to make that change too, so that the data is always consistent. However it takes a while for this data to be replicated when we have a large infrastructure, this is what we call eventual consistency.

In this article I'll be creating a single HarperDB instance on Kubernetes, however, I plan to show you how to create a cluster of HarperDB instances in future articles.

Containerizing HarperDB

Most databases already have a container-ready approach, and Harper is not different. You can find the containerized version of HarperDB in their Docker Hub page, it's very straight-forward to install and run. Before putting the database in Kubernetes, let's make sure we can run it locally using Docker.

If you don't have Docker installed, you can install it from the Docker website.

The simple and easy command to run is:

docker run -d \
  -e HDB_ADMIN_USERNAME=user \
  -e HDB_ADMIN_PASSWORD=password \
  -p 9925:9925 \
  harperdb/harperdb

HarperDB needs port 9925 to be opened so we can connect it to the HarperDB Studio application, so we can manage it straight from the browser.

Aside from that, we need to remember that containers are ephemeral, which means they do not persist after they are stopped. This is why we need a volume to persist the data, for that we can add the -v flag in our Docker command, pointing to the /opt/harperdb/hdb directory inside the container:

docker run -d \
  -e HDB_ADMIN_USERNAME=user \
  -e HDB_ADMIN_PASSWORD=password \
  -p 9925:9925 \
  -v $PWD/.hdb:/opt/harperdb/hdb \
  harperdb/harperdb

In this command we're telling Docker to mount the /opt/harperdb/hdb directory in a local folder called /.hdb, so that we can persist the data. After the command is run, we can ls -la the /.hdb directory to see that it's actually mounted:

Now, we go to the HarperDB Studio application and login. If you don't have an account yet, go to the sign-up page and create one, it's free. Then create an organization for you and you should land on this page:

Let's create a local instance by clicking the Register User-installed Instance button:

Let's fill up the details of the instance, and then click the instance details button:

Then we select the free tier and click next:

Confirm the terms of service and Add Instance. You should end up with a local registered instance of the database:

If you click it, you should have access to all your schemas and tables. So that's what we need to do.

Migrating to Kubernetes

First, let's set up a plan of what we need to do:

Create a Kubernetes cluster
Create a namespace called database so we can organize our resources
Create a deployment for HarperDB so we can deploy it to the cluster
Create a persistent volume claim for the /opt/harperdb/hdb directory
Create a service for HarperDB so we can port-forward it locally and connect to it from the browser

Since we'll not be exposing the database to the internet, we won't create an Ingress rule for it, instead, we'll only connect to it from inside the cluster.

Create a Kubernetes cluster

There are a ton of ways to create a new Kubernetes cluster, but the easiest and fastest way is to use a managed cluster. In this tutorial we'll be using Azure's AKS, but you can use any other provider.

The first thing is to log in to your Azure account on the Azure Portal, if you don't have an Azure account, just create one in the Azure website. There are two ways to create a managed AKS cluster, one of them is through the Azure CLI, the other is through the portal itself, which is the way we'll be doing this tutorial.

Search for "Kubernetes" in the Azure Portal, and click on the Kubernetes Services result:

Then click on create dropdown and create a new cluster:

Let's create a new resource group called harperdb-aks, let's change the preset to development to save some money since we'll not be using the cluster for production, then name the cluster harperdb, choose your preferred region:

Let's select the B2s instance in the instance picker and remove the autoscale method, since we'll be using a fixed size cluster of 1 single node:

Click the blue Review+Create button to create the cluster:

The process will take a while to complete, so let's wait until it's done.

While you wait, take your time to install and login to the Azure CLI if you haven't done so yet, because we'll need it to connect to the cluster.

After the process is complete, jump to your terminal, start your az cli and install the kubectl tool with az aks install-cli. After that we'll download the credentials to our kubeconfig file, so we can connect to the cluster using az aks get-credentials -n harperdb -g harperdb-aks --admin.

If everything went well, you should be able to connect to the cluster with kubectl get nodes and get a response with a single node like this:

NAME                                STATUS   ROLES   AGE    VERSION
aks-agentpool-17395436-vmss000000   Ready    agent   2m6s   v1.22.6

Let's create our namespace with kubectl create namespace database, and now we're able to create our manifests.

Create manifests

To start the manifest creation, we'll need to create a deployment for HarperDB, so we can deploy it to the cluster. Let's create a new file called manifests.yaml where we'll add all the manifests we'll need. In this file, we'll start with the deployment:

# Deployment for HarperDB
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harperdb
  namespace: database # note the namespace we created earlier
spec:
  selector:
    matchLabels:
      app: harperdb # group all the pods under the same label
  template:
    metadata:
      labels:
        app: harperdb # create pods with the label we want to group them under
    spec:
      containers:
      - name: harperdb-container
        image: harperdb/harperdb # same hdb image from dockerhub
        env:
          - name: HDB_ADMIN_USERNAME
            value: harperdb # fixing username, in production this should be a secret
          - name: HDB_ADMIN_PASSWORD
            value: harperdb # fixing password, in production this should be a secret
        resources:
          limits:
            memory: "512Mi" # limiting the usage of memory
            cpu: "128m" # limiting the usage of CPU to 128 millicores
        ports:
        - containerPort: 9925 # exposing the port 9925 to the outside world
          name: hdb-api # naming the port
        volumeMounts: # creating a persistent volume to store the data
          - mountPath: "/opt/harperdb" # mounting the volume to the path we want
            name: hdb-data # referencing the volume we want to mount
        livenessProbe: # creating a liveness probe to check if the container is running
          tcpSocket: # we'll ping hdb's port 9925
            port: 9925
          initialDelaySeconds: 30 # wait 30 seconds before starting the probe
          periodSeconds: 10 # then check the probe every 10 seconds
        readinessProbe: # creating a readiness probe to check if the container is ready to accept connections
          tcpSocket:
            port: 9925
          initialDelaySeconds: 10 # wait 10 seconds before starting the probe
          periodSeconds: 5 # then check the probe every 5 seconds
      volumes:
        - name:  hdb-data # this is the volume we'll mount
          persistentVolumeClaim: # it references another persistent volume claim
            claimName: harperdb-data # this is the name of the persistent volume claim

Before we go further, some explanation on volumes is necessary. Since we're dealing with managed volumes on Azure, we don't need (and can't) create a volume inside the host nor inside the pod because they'll be deleted when we delete the resources. What we want to do is to delegate the creation process to Azure through StorageClasses and PersistentVolumeClaims.

The usual process is to create a PersistentVolume and then claim it with a PersistentVolumeClaim

The idea is that we declare that we want to claim a determined amount of space from the Azure pool for the volume, and then we'll mount it to the path we want. Azure will receive this request through a Kubernetes operator that manages the CSI (Container Storage Interface) plugin and create the volume in our Azure account.

By default, Azure has two types of storages: Azure File and Azure Disk. We'll be using the Azure File storage because it's easier to see the files in the Azure Portal, also, clusterization is only supported in Azure File storage.

Also note that we're mounting the volume not on /opt/harperdb/hdb but on /opt/harperdb. This is because if we mount the volume in the innermost directory, we'll have permission problems since the outermost directory is owned by the user that runs the container, and the innermost directory is created by Azure.

The other important information is that HDB takes a small while to start, so we're setting a livenessProbe and readinessProbe to check if the pod is ready, these probes will check the connection on TCP port 9925, when this port is open, the service is online.

Next, we'll add the persistent volume claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: harperdb-data
  namespace: database
spec:
  resources:
    requests:
      storage: 2Gi # We'll request 2GB of storage
  accessModes:
    - ReadWriteOnce
  storageClassName: azurefile-csi # We'll use the Azure File storage

And to finish the manifest file, we'll add a service:

apiVersion: v1
kind: Service
metadata:
  name: harperdb
  namespace: database
spec:
  selector:
    app: harperdb
  ports:
  - port: 9925
    name: api
    targetPort: hdb-api

The final file will look like this:

# manifests.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: harperdb-data
  namespace: database
spec:
  resources:
    requests:
      storage: 2Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: azurefile-csi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harperdb
  namespace: database
spec:
  selector:
    matchLabels:
      app: harperdb
  template:
    metadata:
      labels:
        app: harperdb
    spec:
      containers:
      - name: harperdb-container
        image: harperdb/harperdb
        env:
          - name: HDB_ADMIN_USERNAME
            value: harperdb
          - name: HDB_ADMIN_PASSWORD
            value: harperdb
        resources:
          limits:
            memory: "512Mi"
            cpu: "128m"
        ports:
        - containerPort: 9925
          name: hdb-api
        volumeMounts:
          - mountPath: "/opt/harperdb"
            name: hdb-data
        livenessProbe:
          tcpSocket:
            port: 9925
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          tcpSocket:
            port: 9925
          initialDelaySeconds: 10
          periodSeconds: 5
      volumes:
        - name:  hdb-data
          persistentVolumeClaim:
            claimName: harperdb-data
---
apiVersion: v1
kind: Service
metadata:
  name: harperdb
  namespace: database
spec:
  selector:
    app: harperdb
  ports:
  - port: 9925
    name: api
    targetPort: hdb-api

The last thing is to apply the manifests to the cluster using kubectl apply -f ./manifests.yml. You'll receive a message like this:

persistentvolumeclaim/harperdb-data created
deployment.apps/harperdb created
service/harperdb created

We can check if everything is fine by running kubectl get deploy harperdb -n database:

NAME       READY   UP-TO-DATE   AVAILABLE   AGE
harperdb   1/1     1            1           20m

Then we can take a look on our Azure Portal and see if the volume is there. First, select Resource Groups in the portal, you can do this by searching for it or clicking in the home page.

Find a resource group that matches MC_harperdb-aks_harperdb_<location>, click it, you should see a bunch of resources.

One of these resources is our Storage Account, click it. Then, in the left panel, go to File Shares, you should see a Azure File volume starting with pvc-, click it.

Here's our mounted volume with all HDB's files. Now let's access our HDB instance and check if everything is working.

Accessing HarperDB

To access the database from the outside world, since we didn't expose it as a good practice, let's port forward our own connection to our service connection using the following command:

kubectl port-forward svc/harperdb 9925:9925 -n database

You should see the following messages:

Forwarding from 127.0.0.1:9925 -> 9925
Forwarding from [::1]:9925 -> 9925
Handling connection for 9925
Handling connection for 9925
Handling connection for 9925
...

Then, let's go to our studio in the browser and create a new instance with the credentials that we included in our deployment manifest, the username and the password are harperdb.