Prerequisites
Greetings everyone. This article was written for my semester project, and in it I will provide a detailed walkthrough of how to create a system with the following specifications:
- OwnCloud on the worker nodes, to manage all the files
- Apache2 as the web server on the worker nodes and as the load balancer
- GlusterFS as a network filesystem
Planning
Now, let's examine the mentioned requirements and technologies in more detail:
Apache2: It is a widely used web server that serves clients over the HTTP protocol. By using Apache2, we can distribute incoming requests within the system: the load balancer receives the requests and forwards them to the worker nodes. To facilitate this process, it is recommended to employ a reverse proxy.
Reverse Proxy operation scheme:
OwnCloud: It is a platform for synchronizing and managing files. It provides a user-friendly interface that makes it simple to interact with all your files.
GlusterFS: It is a distributed network filesystem that offers high availability and scalability. In simple terms, it allows you to create a storage system that spans multiple servers and ensures redundancy. The general concept is as follows: basic data blocks called bricks are combined into volumes, and the way they are combined determines the resulting RAID-like architecture. In our setup, we will use a Replicated Volume in GlusterFS, which will be mounted to the main file storage folder.
By employing this volume configuration, we minimize the risk of data loss. All files will be stored and synchronized across all worker machines. In the event of a problem with any particular worker machine, another node will assume the workload. However, it is important to address potential split-brain situations, where the bricks may diverge. One approach to handle this is by utilizing three replicas of the data (3 worker nodes will be created). Here is an overview of the infrastructure.
The Load Balancer is positioned in the public network and requires two network interfaces to function as a router. The other servers are situated in a private network. While it may not be classified as a DMZ (Demilitarized Zone) in the strict sense, it still provides some level of isolation and security between the Internet and the balanced nodes within the internal network. We will begin constructing the infrastructure using VCSA (vCenter Server Appliance) and Ubuntu 22.04 as the operating system.
Load balancer
To begin, it is necessary to create a server with two network interfaces. Subsequently, we need to configure the interfaces using netplan, the built-in network configuration tool in Ubuntu:
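Here is a minimal netplan sketch for the balancer; the interface names and the private address are assumptions and will differ in your environment:

# /etc/netplan/00-installer-config.yaml — a minimal sketch
network:
  version: 2
  ethernets:
    ens160:        # public-facing interface (name is an assumption)
      dhcp4: true
    ens192:        # private-network interface (name is an assumption)
      dhcp4: false
      addresses: [192.168.64.1/24]

Apply it with sudo netplan apply. Since the balancer also acts as a router between the two networks, IP forwarding has to be enabled as well (sysctl -w net.ipv4.ip_forward=1).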
The next step is to configure Apache2 as a reverse proxy and load balancer via apache2.conf:
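A sketch of such a configuration; only 192.168.64.2 appears later in this article, so the other two worker addresses are assumptions. The required modules must be enabled first with a2enmod proxy proxy_http proxy_balancer lbmethod_byrequests:

<VirtualHost *:80>
    ProxyPreserveHost On
    # pool of worker nodes; requests are spread across the members
    <Proxy "balancer://workers">
        BalancerMember http://192.168.64.2
        BalancerMember http://192.168.64.3
        BalancerMember http://192.168.64.4
    </Proxy>
    ProxyPass        "/" "balancer://workers/"
    ProxyPassReverse "/" "balancer://workers/"
</VirtualHost>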
Setting up virtual machines:
To automate all operations with the servers, it's better to use IaC. VMware vSphere supports Terraform, its own solution vRealize, and Ansible (which works through pyVmomi). Due to some particularities of this project, I will skip this step and set up all the servers manually.
GlusterFS
The Quick Start Guide is highly beneficial: after a straightforward installation of the GlusterFS daemon, we need to decide on the disk and the necessary storage space. You can create and mount any disk or device (LVM, for example):
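A sketch following the Quick Start Guide, assuming the spare disk is /dev/sdb1 (adjust the device to your setup); run it on every node:

# install and start the Gluster daemon
sudo apt install glusterfs-server
sudo systemctl enable --now glusterd
# format the brick disk and mount it on every boot
sudo mkfs.xfs -i size=512 /dev/sdb1
sudo mkdir -p /data/brick1
echo '/dev/sdb1 /data/brick1 xfs defaults 1 2' | sudo tee -a /etc/fstab
sudo mount -a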
The Gluster daemon is installed and working successfully!
The next step is creating a trusted pool with 3 replicas (each machine holds a replica, to ensure data redundancy and high availability). In case of any trouble with peer probing, pay attention to your network configuration (firewall, hostnames).
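Assuming the nodes can resolve each other as server1, server2 and server3 (the names used below for the volume), the pool is assembled from one of the nodes:

# run on server1; each node must resolve the others' hostnames
sudo gluster peer probe server2
sudo gluster peer probe server3
sudo gluster peer status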
Trusted pool is created
As evident from the provided screenshots, all the bricks on each machine are operational and functioning reliably. To proceed, create a folder dedicated to the volume bricks on every machine:
mkdir -p /data/brick1/gv0
And create the volume with 3 replicas (specifying replica 3 makes it a replicated volume):
gluster volume create gv0 replica 3 server1:/data/brick1/gv0 server2:/data/brick1/gv0 server3:/data/brick1/gv0
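A freshly created volume is not started automatically, so start it and verify its state:

sudo gluster volume start gv0
sudo gluster volume info gv0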
From now on, any file or folder created on the volume is stored on all three bricks, so trouble on any single brick does not lose the data. For better fault tolerance at larger scale, another volume type, Distributed Replicated, would be worth using in the future.
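To work with the volume, mount it as a Gluster client; a minimal sketch, assuming the /mnt/gluster mount point used later for the OwnCloud data:

sudo mkdir -p /mnt/gluster
sudo mount -t glusterfs server1:/gv0 /mnt/gluster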
After mounting the network filesystem to a folder, it should look like this:
OwnCloud
OwnCloud has an official Docker image with a built-in Apache2 web server, which simplifies our work. The OwnCloud container can be modified by overriding the default config.php. Here is an example of the OwnCloud configuration:
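A minimal sketch of an overridden config.php; the values shown are assumptions based on the setup in this article:

<?php
$CONFIG = array (
  // addresses allowed to reach this instance
  'trusted_domains' => array ('192.168.64.2'),
  // data folder backed by the GlusterFS mount
  'datadirectory' => '/var/www/html/data',
  'overwrite.cli.url' => 'http://192.168.64.2',
);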
Certain parameters can be passed to Docker as environment variables, allowing us to utilize the official image conveniently (don't forget to mount our network filesystem to the OwnCloud data folder):
docker run -d --name owncloudcustom \
  --restart=always \
  -v /mnt/gluster/:/var/www/html/data \
  -p 80:80 \
  -e OWNCLOUD_ADMIN_USERNAME=admin \
  -e OWNCLOUD_ADMIN_PASSWORD=admin \
  -e OWNCLOUD_TRUSTED_DOMAINS=192.168.64.2 \
  owncloud:8.1
Amazing! Our OwnCloud is up and reachable from the load balancer!
Great, now we have access to our synchronized filesystem. Let's try to add some files and check whether they exist on all our nodes:
Added a simple txt file; the file is saved in our mounted folder
As you can see, OwnCloud files have appeared in /mnt/gluster (the mount directory of our network filesystem), and all of these files are synchronized across all three nodes.
Content of the mounted directory; you can see the OwnCloud data files (including the SQLite database)
Conclusion:
I am very grateful for the time you invested in reading my first article. I hope it has provided you with a broad understanding of constructing a scalable and highly available file service. The suggested architecture can be expanded as needed in the future (adding PostgreSQL as the database, adding a backend server) and modified depending on your requirements.
Top comments (3)
Great work! Very clear explanation, thanks
Do you have the same but for nginx? Would be a great bonus!
Sure! You can use the official Docker image and configure it:
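For example (a sketch; the config path is an assumption):

# run the official nginx image with a custom reverse-proxy config
docker run -d --name nginx-lb -p 80:80 \
  -v /path/to/nginx.conf:/etc/nginx/nginx.conf:ro nginx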
And here is a configuration for reverse proxy: docs.nginx.com/nginx/admin-guide/w...