Background
As enterprises adopt computer applications more deeply, they rely more heavily on the underlying systems. For critical applications, the backend database must deliver timely and reliable information and services, and its availability can directly affect the business.
Computer hardware and operating systems inevitably fail at some point, and such failures can cause significant losses or even a complete shutdown of services. For critical applications, where any interruption could cause severe financial and reputational damage, high availability (HA) is crucial, and measures must be taken to keep the information system running without interruption.
System availability is typically affected in two scenarios. The first is unplanned downtime caused by OS crashes, hardware failures, operator errors, or management issues. The second is planned downtime for maintenance and upgrades, such as installing new hardware or software. A high availability solution must keep services running in both scenarios.
Topology
In a GBase 8a high availability system based on Rose HA, two servers (hosts) are each directly connected to a disk array (shared storage). The operating system, the GBase 8a database software, and Rose HA are installed on the local disks of both hosts, while the database business data is stored on the disk array. The two hosts are connected to each other through a private heartbeat network.
Once the configured hosts are running, Rose HA begins monitoring the system. Using the heartbeat information sent over the private network, the Rose HA software on each host monitors the GBase 8a status on the other host. If the working host fails, its heartbeat information changes, and Rose HA detects the change over the private network. Rose HA then switches hosts: the backup host starts the same GBase 8a database service that the working host was running, takes over that service, and raises an alarm so that administrators can repair the failed host.
After the repair, the service can switch back automatically or manually, depending on the Rose HA settings. It can also stay where it is, in which case the repaired host becomes the backup and the dual-host GBase 8a database system continues to operate.
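The heartbeat-driven switch described above can be illustrated with a small model. This is only a sketch of the general technique, not Rose HA's implementation: the class name `HeartbeatMonitor`, the `timeout_s` parameter, and the alarm text are all hypothetical, and time is passed in explicitly so the example runs instantly.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy standby-side monitor: take over when the peer goes silent."""

    timeout_s: float            # hypothetical threshold for "peer has failed"
    last_seen: float = 0.0      # time of the most recent heartbeat from the peer
    role: str = "standby"       # "standby" or "active"
    alarms: list = field(default_factory=list)

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat received over the private network."""
        self.last_seen = now

    def check(self, now: float) -> str:
        """Take over the service if the peer has been silent for too long."""
        if self.role == "standby" and now - self.last_seen > self.timeout_s:
            self.role = "active"
            self.alarms.append(f"peer heartbeat lost at t={now:.1f}; service taken over")
        return self.role


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.on_heartbeat(now=0.0)
print(monitor.check(now=2.0))  # standby (peer was heard 2 s ago)
print(monitor.check(now=5.0))  # active (peer silent for more than 3 s)
```

In a real deployment the timeout and the action taken on takeover are product settings; the model only shows why a lost heartbeat is enough to trigger the switch.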
The key to fault tolerance in a Rose HA-based GBase 8a system is that the host switch is transparent to clients. When a failure triggers a switch, clients see no change in the service address, and host-based applications continue to run normally. Rose HA achieves this with virtual IP address mapping: it provides a logical virtual address that any client can use to request services, and that address always points to the working host, whether or not a switch has occurred.
In normal operation, the primary server holds the virtual address and provides the network services. If the primary server fails, Rose HA moves the virtual address to a network card on the other server, which continues to provide the services. From the client's perspective, the system appears not to have failed and the network services remain available. Besides virtual IP addresses, Rose HA can also provide virtual machine aliases for client access. For database services, when the primary server fails, the other server automatically takes over and starts the database and applications, so users can continue to work with the database.
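The virtual address idea can be pictured as a mapping that always resolves to whichever host is currently working. The sketch below is a toy model under stated assumptions: the class `VirtualIPCluster` and the address `192.0.2.10` (a documentation-only range) are hypothetical, and only the host names svr661 and svr662 come from the tests later in this article. It does not show how Rose HA actually moves an address between network cards.

```python
class VirtualIPCluster:
    """Toy model: one virtual IP owned by exactly one of the cluster hosts."""

    def __init__(self, virtual_ip, hosts, active):
        if active not in hosts:
            raise ValueError("active host must be one of the cluster hosts")
        self.virtual_ip = virtual_ip
        self.hosts = list(hosts)
        self.active = active

    def route(self, address):
        """Return the host that answers a connection made to `address`."""
        if address != self.virtual_ip:
            raise LookupError(f"{address} is not the cluster's virtual IP")
        return self.active

    def fail_over(self):
        """Move the virtual IP to the other host and return that host's name."""
        standby = [h for h in self.hosts if h != self.active]
        if not standby:
            raise RuntimeError("no standby host available")
        self.active = standby[0]
        return self.active


cluster = VirtualIPCluster("192.0.2.10", ["svr661", "svr662"], active="svr661")
print(cluster.route("192.0.2.10"))  # svr661
cluster.fail_over()
print(cluster.route("192.0.2.10"))  # svr662 (same address, different host)
```

The client-side address never changes; only the owner of that address does, which is why clients need no reconfiguration after a switch.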
System Requirements
- Two servers; their configurations do not need to be identical
- The same OS version on both servers: Linux 5.X
- A disk array system with dual host channels
- A network card for the public network
- A network card or an RS-232 serial cable for the private network
Features
- When a GBase 8a server crashes, its IP address, server name, and running jobs are transferred automatically to the other GBase 8a server. Client software needs no reconfiguration: it reconnects to the original IP address and server name and continues working.
- Reliable error detection and fault recovery mechanisms reduce system downtime, help prevent errors, and provide fault warnings.
- Automatic or manual recovery after a fault is resolved.
- Installation requires no changes to the OS kernel and no special hardware.
- The two servers exchange information over RS-232 or TCP/IP.
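The first feature in this list means a client only has to reconnect to the same address after a switch. A minimal sketch of such a reconnect loop follows. Everything in it is an assumption made for illustration: `connect_with_retry`, `make_flaky_connect`, the retry counts, and the address `192.0.2.10` are hypothetical, and the `connect` callable stands in for whatever database driver the application really uses.

```python
import time


def connect_with_retry(connect, address, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Call connect(address) until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect(address)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # give the standby host time to take over
    raise ConnectionError(
        f"could not reach {address} after {attempts} attempts"
    ) from last_error


def make_flaky_connect(failures):
    """Build a fake connect() that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def connect(address):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ConnectionError("service is switching hosts")
        return f"connected to {address}"

    return connect


# Simulate a switch that makes the first two attempts fail.
print(connect_with_retry(make_flaky_connect(2), "192.0.2.10", delay_s=0))
# connected to 192.0.2.10
```

The address passed in stays the same on every attempt, which mirrors the claim that clients do not need to be reconfigured.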
Configuration and Management Steps
Install Rose HA
Start Rose HA Control Center
Create a Cluster
Create Application Resources
Manage Resource Groups
Switch Resource Groups
Virtual IP Management: After deployment, clients can connect either to a server's actual IP address or to the virtual IP. Connections made to the virtual IP are directed to the actual IP of the host that is currently working, so applications that connect through the virtual IP keep the same address across a host switch.
High Availability Testing
Shutdown Test
For this test, svr661 is the working server. After inserting data on the working server, restart it to simulate an unexpected restart or power failure.
[linna@svr661 ~]$ ps -ef | grep gbased
linna 14556 1 0 10:13 ? 00:00:00 /home/linna/GBase/server/bin/gbased --log-queries-not-using-indexes --pid-file=/home/linna/GBase/log/gbase8a/gbased.pid
linna 14918 1841 0 10:14 pts/2 00:00:00 grep gbased
[linna@svr661 ~]$ gbase -uroot -plinna
Welcome to the GBase monitor. Commands end with ; or \g.
Your gbase connection id is 3
Server version: 8.3.1.7
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
gbase> use cmcc;
Query OK, 0 rows affected (0.00 sec)
gbase> show tables;
+----------------+
| Tables_in_cmcc |
+----------------+
| t_user         |
+----------------+
1 row in set (0.00 sec)
gbase> select * from t_user;
+----------+------------+-------------+
| f_userid | f_username | f_phone     |
+----------+------------+-------------+
|        1 | Rose       | 13821600123 |
|        2 | Jack       | 13821600001 |
|        3 | Mary       | 15920256789 |
+----------+------------+-------------+
3 rows in set (0.03 sec)
gbase> insert into t_user values(4,'Kate','18828088888');
Query OK, 1 row affected (0.13 sec)
gbase> quit
Bye
[linna@svr661 ~]$ exit
logout
[root@svr661 bin]# reboot
Broadcast message from root (pts/2) (Tue May 8 10:15:37 2012):
The system is going down for reboot NOW!
[root@svr661 bin]#
On the other server (svr662), query the data:
[linna@svr662 ~]$ ps -ef | grep gbased
linna 12360 1 0 10:16 ? 00:00:00 /home/linna/GBase/server/bin/gbased --log-queries-not-using-indexes --pid-file=/home/linna/GBase/log/gbase8a/gbased.pid
linna 12625 12517 0 10:16 pts/1 00:00:00 grep gbased
[linna@svr662 ~]$ gbase -uroot -plinna
Welcome to the GBase monitor. Commands end with ; or \g.
Your gbase connection id is 4
Server version: 8.3.1.7
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
gbase> use cmcc;
Query OK, 0 rows affected (0.00 sec)
gbase> select * from t_user;
+----------+------------+-------------+
| f_userid | f_username | f_phone     |
+----------+------------+-------------+
|        1 | Rose       | 13821600123 |
|        2 | Jack       | 13821600001 |
|        3 | Mary       | 15920256789 |
|        4 | Kate       | 18828088888 |
+----------+------------+-------------+
4 rows in set (0.05 sec)
gbase>
The test shows that when one of the two servers shuts down unexpectedly, the database service continues to operate on the other server, and changes made on the failed server before the shutdown are not lost.
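The comparison made by eye in this test can also be written as a small check. The sketch below uses the rows shown in the two sessions above. The helper name `missing_after_failover` is hypothetical, and the rows are typed in by hand here, not fetched from GBase 8a.

```python
def missing_after_failover(written, read_back):
    """Return rows written on the old host that the new host does not return."""
    seen = set(read_back)
    return [row for row in written if row not in seen]


# Rows present on svr661 after the insert, as shown in the first session.
written_on_svr661 = [
    (1, "Rose", "13821600123"),
    (2, "Jack", "13821600001"),
    (3, "Mary", "15920256789"),
    (4, "Kate", "18828088888"),
]

# Rows returned by svr662 after the reboot, as shown in the second session.
read_on_svr662 = [
    (1, "Rose", "13821600123"),
    (2, "Jack", "13821600001"),
    (3, "Mary", "15920256789"),
    (4, "Kate", "18828088888"),
]

print(missing_after_failover(written_on_svr661, read_on_svr662))  # []
```

An empty list means every row written before the shutdown is visible on the takeover host.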
Process Kill Test
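When the database process on the working server is killed unexpectedly, Rose HA tries to restart it up to a specified number of times. If a restart succeeds, the service stays on the working server and no switch occurs. If the database service cannot be restarted, the service switches automatically to the standby server, so the database remains available. In the session below, the gbased process (PID 3561) is killed on svr661, comes back under a new PID (17413) on the same server, and the update made before the kill is still present.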
[linna@svr661 ~]$ gbase -uroot -plinna
Welcome to the GBase monitor. Commands end with ; or \g.
Your gbase connection id is 3
Server version: 8.3.1.7
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
gbase> use cmcc;
Query OK, 0 rows affected (0.00 sec)
gbase> select * from t_user;
+----------+------------+-------------+
| f_userid | f_username | f_phone     |
+----------+------------+-------------+
|        1 | Rose       | 13821600123 |
|        2 | Jack       | 13821600001 |
|        3 | Mary       | 15920256789 |
|        4 | Kate       | 18828088888 |
+----------+------------+-------------+
4 rows in set (0.09 sec)
gbase> update t_user set f_phone='18920054321' where f_userid=1;
Query OK, 1 row affected (0.11 sec)
Rows matched: 1 Changed: 1 Warnings: 0
gbase> quit
Bye
[linna@svr661 ~]$ ps -ef | grep gbased
linna 3561 1 0 10:18 ? 00:00:00 /home/linna/GBase/server/bin/gbased --log-queries-not-using-indexes --pid-file=/home/linna/GBase/log/gbase8a/gbased.pid
linna 17219 16708 0 10:58 pts/1 00:00:00 grep gbased
[linna@svr661 ~]$ kill -9 3561
[linna@svr661 ~]$ ps -ef | grep gbased
linna 17413 1 0 10:59 ? 00:00:00 /home/linna/GBase/server/bin/gbased --log-queries-not-using-indexes --pid-file=/home/linna/GBase/log/gbase8a/gbased.pid
linna 17604 16708 0 10:59 pts/1 00:00:00 grep gbased
[linna@svr661 ~]$ gbase -uroot -plinna
Welcome to the GBase monitor. Commands end with ; or \g.
Your gbase connection id is 3
Server version: 8.3.1.7
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
gbase> use cmcc;
Query OK, 0 rows affected (0.00 sec)
gbase> select * from t_user;
+----------+------------+-------------+
| f_userid | f_username | f_phone     |
+----------+------------+-------------+
|        1 | Rose       | 18920054321 |
|        2 | Jack       | 13821600001 |
|        3 | Mary       | 15920256789 |
|        4 | Kate       | 18828088888 |
+----------+------------+-------------+
4 rows in set (0.03 sec)
gbase>
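The restart-then-switch behavior described for the process kill test can be sketched as a simple policy. This is an illustration under assumptions, not Rose HA's code: the function `recover`, its `max_restarts` parameter, and the returned labels are hypothetical, and `start_service` stands in for whatever command actually starts gbased.

```python
def recover(start_service, max_restarts):
    """Try local restarts first; fall back to a host switch if all of them fail.

    start_service: callable returning True when the local restart succeeded.
    Returns ("restarted", attempt_number) or ("switched", max_restarts).
    """
    for attempt in range(1, max_restarts + 1):
        if start_service():
            return ("restarted", attempt)
    return ("switched", max_restarts)


# Case seen in the session above: the first local restart succeeds.
print(recover(lambda: True, max_restarts=3))           # ('restarted', 1)

# Local restart succeeds only on the third try.
outcomes = iter([False, False, True])
print(recover(lambda: next(outcomes), max_restarts=3))  # ('restarted', 3)

# Local restart never succeeds, so the standby server takes over.
print(recover(lambda: False, max_restarts=3))           # ('switched', 3)
```

Trying a local restart first avoids a full host switch for faults that affect only the database process.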