DEV Community

Vladislav


Building a load-balanced ownCloud service

Prerequisites

Greetings, everyone. This article was written for my semester project. It is a detailed walkthrough of how to create a system with the following components:

  • ownCloud on worker nodes, to manage all the files
  • Apache2 web server for worker nodes and a load balancer
  • GlusterFS as a network filesystem

Planning

Now, let's examine the mentioned requirements and technologies in more detail:

Apache2: a widely used web server. In this setup it serves HTTP traffic on the worker nodes and also distributes incoming requests: the load balancer receives each request and forwards it to one of the worker nodes. For this, Apache2 on the load balancer is configured as a reverse proxy.

Reverse Proxy operation scheme:

ownCloud: a platform for synchronizing and managing files. It provides a user-friendly interface that makes it simple to work with all your files.

GlusterFS: a distributed network filesystem that offers high availability and scalability. In simple terms, it lets you build a storage system that spans multiple servers and provides redundancy. The general concept is as follows: basic storage units called bricks are combined into volumes, and the way they are combined gives the desired RAID-like layout. In our setup, we will use a Replicated Volume, which will be mounted at the main file storage folder.

Replicated Volume

With this volume configuration, we minimize the risk of data loss. All files are stored and synchronized across all worker machines, so if one worker fails, another node takes over the workload. However, replicas can diverge when nodes lose contact with each other, a situation known as split-brain. One way to reduce that risk is to keep three replicas of the data, so that a majority of replicas can still agree (this is why 3 worker nodes will be created). Here is an overview of the infrastructure.

Infrastructure overview

The Load Balancer sits in the public network and needs two network interfaces to function as a router. The other servers are placed in a private network. While this is not a DMZ (Demilitarized Zone) in the strict sense, it still provides some isolation between the Internet and the balanced nodes in the internal network. We will build the infrastructure using VCSA (vCenter Server Appliance), with Ubuntu 22.04 as the operating system.

Load balancer

To begin, it is necessary to create a server with dual network interfaces. Subsequently, we need to configure the network interfaces using netplan, a built-in network configuration tool in Ubuntu.

Netplan config
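The configuration itself was shown as a screenshot, so here is a minimal sketch of what a netplan file for a dual-interface host might look like. The interface names (ens160, ens192), all addresses, and the gateway are placeholders I made up for illustration, not values from the original setup.

```shell
# Print a netplan file for a host with one public and one private interface.
# Every argument below is a placeholder (assumption), not the real project value.
render_netplan() {
    pub_if=$1; pub_addr=$2; gateway=$3; priv_if=$4; priv_addr=$5
    cat <<EOF
network:
  version: 2
  ethernets:
    $pub_if:
      addresses: [$pub_addr]
      routes:
        - to: default
          via: $gateway
    $priv_if:
      addresses: [$priv_addr]
EOF
}

# Only prints the YAML; nothing is written to /etc/netplan here.
render_netplan ens160 203.0.113.10/24 203.0.113.1 ens192 10.0.0.1/24
```

After reviewing the output, you could redirect it into a file under /etc/netplan/ and activate it with `sudo netplan apply`.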

The next step is to configure apache2 as a reverse proxy and load balancer via apache2.conf:

apache2.conf
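Since the real apache2.conf was only shown as an image, the following is a rough sketch of a balancer section built with mod_proxy_balancer. The balancer name `owncloud` and the worker addresses are assumptions. The directives need the `proxy`, `proxy_http`, `proxy_balancer` and `lbmethod_byrequests` modules, which can be enabled with `a2enmod`.

```shell
# Print an Apache reverse-proxy/balancer snippet for the given backend URLs.
# The balancer name "owncloud" and the backend addresses are illustrative assumptions.
render_balancer_conf() {
    echo '<Proxy "balancer://owncloud">'
    for backend in "$@"; do
        echo "    BalancerMember \"$backend\""
    done
    echo '    ProxySet lbmethod=byrequests'
    echo '</Proxy>'
    echo 'ProxyPreserveHost On'
    echo 'ProxyPass        "/" "balancer://owncloud/"'
    echo 'ProxyPassReverse "/" "balancer://owncloud/"'
}

# One BalancerMember line is emitted per worker node.
render_balancer_conf http://10.0.0.11:80 http://10.0.0.12:80 http://10.0.0.13:80
```

The `byrequests` method spreads requests evenly across the members; other methods exist, and I picked this one only because it is the simplest to reason about.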

Setting up virtual machines:

To automate all operations with the servers, it is better to use IaC. VMware vSphere can be managed with Terraform, with VMware's own solution vRealize, and with Ansible (which works through pyVmomi). Due to some specifics of this project, I will skip this step and set up all the servers manually.

List of created servers:
List of created VMs

GlusterFS

The GlusterFS Quick Start Guide is very helpful here. After a straightforward installation of the GlusterFS daemon, you need to decide which disk to use for the bricks and how much storage space to allocate. You can create and mount any disk or device (an LVM volume, for example).
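As a sketch of the disk preparation, the steps below follow the pattern from the Quick Start Guide: format the device with XFS and mount it where the brick will live. The device name /dev/sdb1 is an assumption, so substitute your own disk or LVM volume. The script only prints the commands unless you set DRY_RUN=0.

```shell
# Dry-run helper: prints each command by default; set DRY_RUN=0 to execute (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Format a device with XFS and mount it as the brick filesystem.
prepare_brick() {
    device=$1      # assumed device name, e.g. /dev/sdb1
    mountpoint=$2  # brick mount point, e.g. /data/brick1
    run mkfs.xfs -i size=512 "$device"
    run mkdir -p "$mountpoint"
    run mount "$device" "$mountpoint"
}

prepare_brick /dev/sdb1 /data/brick1
```

The same steps have to be repeated on every worker node, because each node contributes one brick.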

The Gluster daemon is installed and running successfully!

The next step is creating a trusted pool of 3 machines (each machine holds a replica, which ensures data redundancy and high availability). If you run into trouble with peer probing, check your network configuration (firewall, hostnames).
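A minimal sketch of building the pool, assuming the commands are issued on server1 and that the other two workers resolve as server2 and server3 (the host names used in the volume-create command in this article). By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Dry-run helper: prints each command by default; set DRY_RUN=0 to execute (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Probe every given peer from the current node, then show the pool state.
build_pool() {
    for peer in "$@"; do
        run gluster peer probe "$peer"
    done
    run gluster peer status
}

build_pool server2 server3
```

The node you run this on does not need to probe itself; it joins the pool implicitly when it probes the first peer.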

Trusted pool is created

As the screenshots show, all the peers are connected and working. To proceed, create a folder dedicated to volume bricks on every machine.

mkdir -p /data/brick1/gv0

Then create the volume with 3 replicas (with three bricks and replica 3, the resulting volume type is Replicate):

gluster volume create gv0 replica 3 server1:/data/brick1/gv0 server2:/data/brick1/gv0 server3:/data/brick1/gv0

Obtained volume:

From now on, any file or folder created through the mounted volume is written to all three bricks, so it stays available even if some of the other bricks run into trouble. To scale storage capacity later without giving up fault tolerance, a Distributed Replicated volume would be a better choice.

After mounting the volume to a folder, the network filesystem should look like this:
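For completeness, here is a sketch of starting and mounting the volume. A newly created volume has to be started once before clients can mount it. The mount point /mnt/gluster is the one used later in this article, and using server1 as the mount server is my assumption (any pool member should work). The commands are printed only, unless DRY_RUN=0 is set.

```shell
# Dry-run helper: prints each command by default; set DRY_RUN=0 to execute (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount a Gluster volume with the native client.
mount_volume() {
    server=$1; volume=$2; target=$3
    run mkdir -p "$target"
    run mount -t glusterfs "$server:/$volume" "$target"
}

run gluster volume start gv0            # once, on any single pool member
mount_volume server1 gv0 /mnt/gluster   # on every worker node
```

To make the mount survive a reboot, a line such as `server1:/gv0 /mnt/gluster glusterfs defaults,_netdev 0 0` can be added to /etc/fstab.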

OwnCloud

ownCloud has an official Docker image with a built-in apache2 web server, which simplifies our work. The ownCloud container can be customized by overriding the default config.php. Here is an example of the ownCloud configuration:

Example of the owncloud configuration

Certain parameters can be passed to Docker as environment variables, which lets us use the official image as it is (don't forget to mount the network filesystem to the ownCloud data folder):

docker run -d --name owncloudcustom \
  --restart=always \
  -v /mnt/gluster/:/var/www/html/data \
  -p 80:80 \
  -e OWNCLOUD_ADMIN_USERNAME=admin \
  -e OWNCLOUD_ADMIN_PASSWORD=admin \
  -e OWNCLOUD_TRUSTED_DOMAINS=192.168.64.2 \
  owncloud:8.1

Amazing! Our ownCloud is up and reachable through the load balancer!

Great, now we have access to our synchronized filesystem. Let's add some files and check that they exist on all our nodes:

Added a simple txt file; the file is saved in our mounted folder

As you can see, ownCloud files have appeared in /mnt/gluster (the mounted directory for our network filesystem), and all of them are synchronized across all three nodes.

Content of the mounted directory: you can see the ownCloud data files (including the SQLite db)
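To check replication from the command line rather than by eye, one option is to compare a file's checksum on every node. The helper below is a minimal sketch: the host names are the ones from the volume-create command earlier in this article, while the file path in the usage comment is a made-up example.

```shell
# Exit 0 only if every input line carries the same checksum in its first field.
# Empty input counts as a failure.
checksums_match() {
    awk '{ seen[$1] = 1 }
         END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Hypothetical usage (the file path is a placeholder):
# for host in server1 server2 server3; do
#     ssh "$host" sha256sum /data/brick1/gv0/admin/files/test.txt
# done | checksums_match && echo "replicas are identical"
```

Reading the file from the brick directory on each node shows what that node really stores, which is a stricter check than reading through the shared mount.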

Conclusion:

I am very grateful for the time you invested in reading my first article. I hope it has given you a broad understanding of how to build a scalable and highly available file service. The suggested architecture can be expanded as needed in the future (adding PostgreSQL as the database, adding a backend server) and modified depending on your requirements.

Top comments (3)

Natalia Maiatskaia

Great work! Very clear explanation, thanks

Shegl

Do you have same but for nginx ? Would be a great bonus!

Vladislav

Sure! You can use the official Docker image and configure it:

FROM nginx
COPY nginx.conf /etc/nginx/nginx.conf

And here is a configuration for reverse proxy: docs.nginx.com/nginx/admin-guide/w...
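As a rough idea of what that nginx.conf could contain, here is a sketch that generates a minimal reverse-proxy configuration with an upstream group. The upstream name `owncloud` and the backend addresses are assumptions for illustration, not values from my setup.

```shell
# Print a minimal nginx.conf that balances requests across the given backends.
# The upstream name "owncloud" and the backend addresses are illustrative assumptions.
render_nginx_conf() {
    echo 'events {}'
    echo 'http {'
    echo '    upstream owncloud {'
    for backend in "$@"; do
        echo "        server $backend;"
    done
    echo '    }'
    echo '    server {'
    echo '        listen 80;'
    echo '        location / {'
    echo '            proxy_pass http://owncloud;'
    echo '            proxy_set_header Host $host;'
    echo '        }'
    echo '    }'
    echo '}'
}

render_nginx_conf 10.0.0.11:80 10.0.0.12:80 10.0.0.13:80
```

Redirecting the output to nginx.conf next to the Dockerfile gives the file that the COPY line expects.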