PostgreSQL is a powerful, open-source object-relational database system that safely stores and scales data workloads. It uses SQL and is widely adopted for its reliability, data integrity, and robustness. While the world is moving towards Kubernetes to run entire application ecosystems, migrating the stateful side of an ecosystem to Kubernetes is still not straightforward. One has to be very sure of how to migrate databases to Kubernetes without any data loss or downtime.
Being one of the most widely used relational databases, PostgreSQL can run on various platforms such as macOS, Windows, and Linux. We can set it up on virtual machines, in the cloud, on physical machines, or as a containerized database.
In this blog post, we will discuss a scenario where we migrate PostgreSQL data from a virtual machine to a Kubernetes setup. While there are various ways to install Postgres on Kubernetes, the best approach is always to look for an operator that can take away most of the operational and administration burden. One such operator is Kubegres, which we are going to use due to its simplicity and features. To know more about Kubegres, check out their official documentation.
What is Kubegres? [Postgres operator for k8s]
Kubegres is a Kubernetes operator which helps us to deploy clusters of PostgreSQL instances with data replication and failover enabled out of the box. It brings simplicity when using PostgreSQL with Kubernetes.
Features:
It provides data replication, replicating data from the primary PostgreSQL instance to the replica instances in real time
It manages failover by promoting a replica instance to primary in case of failure
It has an option to schedule database backups and dump the data to a separate persistent volume
Having learnt about the features of Kubegres, let's proceed with the installation.
Prerequisites
One needs to have a Postgres database installed and running on a virtual machine. Access to a Kubernetes cluster, to which we will migrate the database, is also required.
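Both prerequisites can be verified quickly. For example, on the VM (assuming the PostgreSQL client tools are on the PATH):
psql --version
pg_isready
And from the machine you will run kubectl on:
kubectl cluster-info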
Installation of Kubegres
Kubegres is an open source tool. We can deploy it on Kubernetes by installing the Kubegres operator.
Install Kubegres operator
Run the following command in a Kubernetes cluster:
kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.15/kubegres.yaml
After executing the above command, a kubegres-system namespace will be created, where the controller will be installed, and a deployment will be running in that namespace. We will see the below output:
namespace/kubegres-system created
customresourcedefinition.apiextensions.k8s.io/kubegres.kubegres.reactive-tech.io created
serviceaccount/kubegres-controller-manager created
role.rbac.authorization.k8s.io/kubegres-leader-election-role created
clusterrole.rbac.authorization.k8s.io/kubegres-manager-role created
clusterrole.rbac.authorization.k8s.io/kubegres-metrics-reader created
clusterrole.rbac.authorization.k8s.io/kubegres-proxy-role created
rolebinding.rbac.authorization.k8s.io/kubegres-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/kubegres-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/kubegres-proxy-rolebinding created
configmap/kubegres-manager-config created
service/kubegres-controller-manager-metrics-service created
deployment.apps/kubegres-controller-manager created
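Before proceeding, it is worth confirming that the controller is actually up:
kubectl get pods -n kubegres-system
The kubegres-controller-manager pod should be in Running state.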
Create a Secret resource
Before creating a PostgreSQL cluster, we will create a Secret holding the passwords of the Postgres superuser, the replication user, and the production database user. We will reference these later in the PostgreSQL configuration.
Create a file:
vi postgresql-secret.yaml
Add the following contents:
apiVersion: v1
kind: Secret
metadata:
  name: postgres-production-secret
  namespace: default
type: Opaque
data:
  superUserPassword: S2p4dVhpUmlLVWNBVj0=
  replicationUserPassword: cmVwbGljYXRpb24=
  password: cHJvZHVjdGlvbg==
Apply the changes:
kubectl apply -f postgresql-secret.yaml
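Note that the values under data must be base64-encoded. To use your own passwords, encode them first; the -n flag matters so that a trailing newline is not encoded along with the password (YourStrongPassword below is a placeholder):
echo -n 'YourStrongPassword' | base64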
Create a ConfigMap
Next, we will create a ConfigMap, as Kubegres allows us to override its default configurations and bash scripts based on our requirements. The base ConfigMap contains the default configs of all Kubegres resources in that namespace.
We can override the following configurations by creating our configmap:
postgres.conf - the official PostgreSQL configs, used for both primary and replica servers.
pg_hba.conf - the host-based authentication configs for both primary and replica servers.
primary_init_script.sh - a bash script which runs the first time a primary PostgreSQL container is created. Here we can add instructions to create custom databases.
backup_database.sh - a bash script which defines the actions to perform during a backup. It is executed regularly by a dedicated cronjob.
Create a file:
vi postgres-configmap.yaml
Add the following contents:
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-configmap
  namespace: default
data:
  postgres.conf: |2
    # Replication configs
    listen_addresses = '*'
    max_wal_senders = 10
    max_connections = 5000
    shared_buffers = 128MB
    wal_level = logical
    log_min_duration_statement = 5000
    # Logging
    log_destination = 'stderr,csvlog'
    logging_collector = on
    log_directory = 'pg_log'
    log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
  pg_hba.conf: |
    # TYPE  DATABASE     USER         ADDRESS       METHOD
    # Replication connections by a user with the replication privilege
    host    replication  replication  all           md5
    # As long as it is authenticated, all connections allowed except from "0.0.0.0/0"
    local   all          all                        md5
    host    all          all          all           md5
    host    all          all          0.0.0.0/0     reject
    host    production   production   127.0.0.1/32  trust
    host    production   production   ::1/128       trust
    host    production   production   0.0.0.0/0     trust
    host    production   postgres     127.0.0.1/32  trust
  primary_init_script.sh: |
    #!/bin/bash
    set -e
    # This script assumes that the env-var $POSTGRES_PRODUCTION_PASSWORD contains the password of the custom user to create.
    # You can add any env-var in your Kubegres resource config YAML.
    dt=$(date '+%d/%m/%Y %H:%M:%S');
    echo "$dt - Running init script the 1st time Primary PostgreSql container is created...";
    customDatabaseName="production"
    customUserName="production"
    echo "$dt - Running: psql -v ON_ERROR_STOP=1 --username $POSTGRES_USER --dbname $POSTGRES_DB ...";
    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
    CREATE DATABASE $customDatabaseName;
    CREATE USER $customUserName WITH PASSWORD '$POSTGRES_PRODUCTION_PASSWORD';
    GRANT ALL PRIVILEGES ON DATABASE "$customDatabaseName" to $customUserName;
    EOSQL
    echo "$dt - Init script is completed";
  backup_database.sh: |
    #!/bin/bash
    set -e
    dt=$(date '+%d/%m/%Y %H:%M:%S');
    fileDt=$(date '+%d_%m_%Y_%H_%M_%S');
    backUpFileName="$KUBEGRES_RESOURCE_NAME-backup-$fileDt.gz"
    backUpFilePath="$BACKUP_DESTINATION_FOLDER/$backUpFileName"
    echo "$dt - Starting DB backup of Kubegres resource $KUBEGRES_RESOURCE_NAME into file: $backUpFilePath";
    echo "$dt - Running: pg_dumpall -h $BACKUP_SOURCE_DB_HOST_NAME -U postgres -c | gzip > $backUpFilePath"
    pg_dumpall -h $BACKUP_SOURCE_DB_HOST_NAME -U postgres -c | gzip > $backUpFilePath
    if [ $? -ne 0 ]; then
      rm $backUpFilePath
      echo "Unable to execute a BackUp. Please check DB connection settings"
      exit 1
    fi
    echo "$dt - DB backup completed for Kubegres resource $KUBEGRES_RESOURCE_NAME into file: $backUpFilePath";
Apply the changes:
kubectl apply -f postgres-configmap.yaml
In order to create a PostgreSQL cluster, we need to create a Kubegres resource. We will create a file “kubegres.yaml” with the below content. This YAML file will create one primary PostgreSQL pod and two replica PostgreSQL pods; data will be replicated in real time from the primary pod to the two replica pods.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgresql-backup-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-lrs
  resources:
    requests:
      storage: 35Gi
---
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: production-postgresql
  namespace: default
  annotations:
    imageregistry: "https://hub.docker.com/"
    nginx.ingress.kubernetes.io/affinity: cookie
spec:
  replicas: 3
  image: postgres:13.2
  port: 5432
  database:
    size: 50G
    storageClassName: standard-lrs
    volumeMount: /var/lib/postgresql/data
  customConfig: postgres-configmap
  failover:
    isDisabled: false
  backup:
    schedule: "0 */2 * * *"
    pvcName: postgresql-backup-pvc
    volumeMount: /var/lib/backup
  env:
    - name: POSTGRES_PRODUCTION_PASSWORD
      valueFrom:
        secretKeyRef:
          name: postgres-production-secret
          key: password
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: postgres-production-secret
          key: superUserPassword
    - name: POSTGRES_REPLICATION_PASSWORD
      valueFrom:
        secretKeyRef:
          name: postgres-production-secret
          key: replicationUserPassword
In the above file, when creating the persistent volume claim, use a storage class that matches your setup (replace standard-lrs with a storage class available in your cluster). You can also update the Postgres image version to a newer one.
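If you are unsure which storage classes your cluster offers, you can list them with:
kubectl get storageclass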
Apply the changes:
kubectl apply -f kubegres.yaml
Once we run the above command, three PostgreSQL pods will spin up: one primary and two replicas. In our file, we have also scheduled a regular backup of the database, which will be stored in a separate persistent volume.
kubectl get pods
NAME READY STATUS RESTARTS AGE
production-postgresql-1-0 1/1 Running 0 2m31s
production-postgresql-2-0 1/1 Running 0 2m20s
production-postgresql-3-0 1/1 Running 0 2m6s
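Optionally, we can confirm that streaming replication is active by querying pg_stat_replication on the primary (assuming production-postgresql-1-0 is still the primary; the password is the decoded superUserPassword from our Secret):
kubectl exec -it production-postgresql-1-0 -- env PGPASSWORD='KjxuXiRiKUcAV=' psql -U postgres -c 'SELECT client_addr, state, sync_state FROM pg_stat_replication;'
Two rows in streaming state mean both replicas are receiving WAL from the primary.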
Once we have our Kubegres pods up and running, we will create a sample database with a table and some data on the PostgreSQL VM and then migrate it to the Kubegres setup on Kubernetes.
Sample Database and data on Postgres(VM) server
Assuming the PostgreSQL server is already running on a virtual machine, we will create a database “production”, insert some data into it, and then verify the same data after the migration is done.
First, we will check if the Postgres service is up and running:
postgres@richa-mishra:~$ systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Mon 2022-07-25 08:51:59 IST; 9h ago
Process: 1233 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 1233 (code=exited, status=0/SUCCESS)
Now we will create a database called “production”. Switch to the postgres user and run:
psql
We are now logged in as the postgres user; let's create a database:
postgres=# CREATE DATABASE production;
Let's connect to this production database and insert some data:
postgres=# \c production
You are now connected to database "production" as user "postgres".
Create a table and add some data to it:
production=# CREATE TABLE employee (
employee_id INT PRIMARY KEY,
first_name VARCHAR (255) NOT NULL,
last_name VARCHAR (255) NOT NULL,
manager_id INT,
FOREIGN KEY (manager_id)
REFERENCES employee (employee_id)
ON DELETE CASCADE
);
production=# INSERT INTO employee (
employee_id,
first_name,
last_name,
manager_id
)
VALUES
(1, 'Sandeep', 'Jain', NULL),
(2, 'Abhishek ', 'Kelenia', 1),
(3, 'Harsh', 'Aggarwal', 1),
(4, 'Raju', 'Kumar', 2),
(5, 'Nikhil', 'Aggarwal', 2),
(6, 'Anshul', 'Aggarwal', 2),
(7, 'Virat', 'Kohli', 3),
(8, 'Rohit', 'Sharma', 3);
Output:
production=# SELECT * FROM employee;
employee_id | first_name | last_name | manager_id
-------------+------------+-----------+------------
1 | Sandeep | Jain |
2 | Abhishek | Kelenia | 1
3 | Harsh | Aggarwal | 1
4 | Raju | Kumar | 2
5 | Nikhil | Aggarwal | 2
6 | Anshul | Aggarwal | 2
7 | Virat | Kohli | 3
8 | Rohit | Sharma | 3
(8 rows)
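Optionally, note down a simple fingerprint of the source data, such as the row count, so we can compare it after the migration:
production=# SELECT COUNT(*) FROM employee;
 count
-------
     8
(1 row)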
Now we have a database called production with a table and some data in it. After the migration, we will verify that the same data exists on Kubegres. To perform the migration, we will first establish network connectivity between PostgreSQL on the VM and Kubegres on Kubernetes. Once the connectivity is established, we will run a job which dumps the data from the VM and restores it into the database on Kubernetes. Let's perform this activity step by step, as explained below.
Migration of data from Postgres Server to Kubegres:
We will edit the pg_hba.conf file of PostgreSQL (VM) and add an entry to allow connections from Kubegres; essentially, we establish authentication between Kubegres and Postgres on the VM.
Switch to the postgres user and run the below command:
vim /etc/postgresql/13/main/pg_hba.conf
In my case, the PostgreSQL (VM) version is 13, so the folder name is 13.
We will add the following line, with the IP address from which Kubegres will connect:
host production all 172.18.0.2/32 trust
After making changes to pg_hba.conf, make sure to restart the postgresql service:
systemctl restart postgresql
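As a side note, a full restart is not strictly necessary for pg_hba.conf changes: a reload (systemctl reload postgresql, or the SQL function below) is enough and does not drop existing client connections.
sudo -u postgres psql -c "SELECT pg_reload_conf();"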
Next, we will create a Service and an Endpoints object to establish connectivity between the Postgres setup on the VM and Kubernetes.
A Service without a selector, used with a corresponding Endpoints object, lets us point the cluster at backends outside of it, in this case Postgres on the VM. So we will create the Service and the Endpoints object manually.
When you create an Endpoints object for a Service, you set its name to be the same as that of the Service.
In the Endpoints object, under addresses, we specify the IP address of the Postgres virtual machine.
vi external-service.yaml
Add the below contents:
apiVersion: v1
kind: Service
metadata:
  name: pg
spec:
  clusterIP: None
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
  selector: null
---
apiVersion: v1
kind: Endpoints
metadata:
  name: pg
subsets:
  - addresses:
      - ip: 192.168.1.41
    ports:
      - name: postgres
        port: 5432
        protocol: TCP
kubectl apply -f external-service.yaml
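To test the Service before the actual migration, we can spin up a throwaway pod and run pg_isready against it; this only checks that the server answers on the port, so it works regardless of the pg_hba.conf rules (pg-test is an arbitrary pod name):
kubectl run pg-test --rm -it --image=postgres:13.2 --restart=Never -- pg_isready -h pg.default -p 5432
Output like "pg.default:5432 - accepting connections" confirms the Service and Endpoints are wired correctly.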
Now, we will create a ConfigMap containing all the details of our setup, as follows:
DB: the name of the database to be migrated. We have already created a database with the same name, “production”, on the Kubernetes setup at installation time.
USER: a username/role having access to the DB
PGPASSWORD: the password of that user
SRC: the hostname of Postgres (VM); here we use the name and namespace of the external Service created in the step above
DST: the hostname of the Kubegres setup
vi migration-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-migration-cm
  labels:
    app: production-postgresql
data:
  DB: production
  USER: postgres
  PGPASSWORD: KjxuXiRiKUcAV=
  # svcname.namespace
  SRC: "pg.default"
  # podname.servicename[.namespace]
  DST: "production-postgresql-1-0.production-postgresql"
kubectl apply -f migration-cm.yaml
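Before running the migration job, it is worth sanity-checking that both hostnames resolve from inside the cluster, for example with a throwaway busybox pod (dns-test is an arbitrary name):
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup pg.default
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup production-postgresql-1-0.production-postgresql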
Since the ConfigMap created above holds all the details of the source and destination databases, we can now run the migration job.
Create a file:
vim job.yaml
Add the below contents:
apiVersion: batch/v1
kind: Job
metadata:
  name: populate-db
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pg
          image: crunchydata/crunchy-postgres:centos8-13.5-4.7.4
          command: ["/bin/sh"]
          args:
            - -c
            - "pg_dump -h $SRC -U $USER --format c $DB | pg_restore --verbose -h $DST -U $USER --format c --dbname $DB"
          envFrom:
            - configMapRef:
                name: postgres-migration-cm
kubectl apply -f job.yaml
Once we run the above command, a job will be created and a container will spin up. We will see the output as shown below:
kubectl get pods | grep populate
NAME READY STATUS RESTARTS AGE
populate-db-txgkt 0/1 Completed 0 2s
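Instead of polling kubectl get pods, we can also block until the job finishes (the timeout value is an arbitrary choice):
kubectl wait --for=condition=complete --timeout=300s job/populate-db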
We will check the logs of this container; if there is an issue with connectivity, or any other error, it will appear in the logs.
kubectl logs populate-db-txgkt
pg_restore: connecting to database for restore
pg_restore: creating TABLE "public.employee"
pg_restore: processing data for table "public.employee"
pg_restore: creating CONSTRAINT "public.employee employee_pkey"
pg_restore: creating FK CONSTRAINT "public.employee employee_manager_id_fkey"
The logs show that the database connection and restore were successful.
To verify that the same table and data have been created in the Kubegres pod, we will exec into the pod and inspect the data.
kubectl exec -it production-postgresql-1-0 bash
root@production-postgresql-1-0:/# su postgres
postgres@production-postgresql-1-0:/$ cd
postgres@production-postgresql-1-0:~$ psql -d production
Password for user postgres:
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.
production=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------+-------+----------
public | employee | table | postgres
(1 row)
production=# SELECT * FROM employee;
employee_id | first_name | last_name | manager_id
-------------+------------+-----------+------------
1 | Sandeep | Jain |
2 | Abhishek | Kelenia | 1
3 | Harsh | Aggarwal | 1
4 | Raju | Kumar | 2
5 | Nikhil | Aggarwal | 2
6 | Anshul | Aggarwal | 2
7 | Virat | Kohli | 3
8 | Rohit | Sharma | 3
(8 rows)
Let's walk through the above commands step by step.
First we will exec into the pod by running:
kubectl exec -it production-postgresql-1-0 bash
Then we switch to the postgres user and run psql -d production to connect to the production database. Once connected with the right credentials, we run the same query that we ran in the sample database section to verify the consistency of the data:
SELECT * FROM employee;
This query lists the same data that we inserted into the Postgres VM.
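As a final consistency check, we can compare the row count against the one captured on the VM before the migration:
kubectl exec -it production-postgresql-1-0 -- env PGPASSWORD='KjxuXiRiKUcAV=' psql -U postgres -d production -c 'SELECT COUNT(*) FROM employee;'
Both sides should report 8 rows.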
So the data has been transferred successfully. Now, depending on your setup, you can switch the application's connectivity from the Postgres VM to the Kubegres setup. This will typically require a merge request with a few changes; once the MR is merged, the application will start connecting to the Kubegres setup, without any downtime and without any data loss.
Conclusion
These were the steps to transfer a database from a virtual machine to a Kubernetes setup. Using them, we migrated a production database from a VM to Kubernetes with zero downtime and no data loss.
This blog post can be helpful even if you use a database other than PostgreSQL and are looking for a migration path from a VM to Kubernetes.
To understand more about Kubegres, please refer to their official documentation and GitHub page.
That’s all for this post. If you are working on a similar scenario and need some assistance, feel free to reach out to me via LinkedIn, you can also read other blog posts that I’ve written on InfraCloud. I’m always excited to hear thoughts and feedback from my readers!