GBASE Database

Posted on Nov 5, 2024

GBASE Database | GBase 8c Typical Problem Resolution Examples


In practice, GBase Database (GBase 数据库) deployments can run into problems caused by network failures or improper operations. This article walks through three typical issues, starting with an interrupted gs_ctl rebuild of a standby instance. For each one, it analyzes the root cause and gives the steps to resolve it.

1. Recovering Incomplete Key Files When a `gs_ctl` Standby Rebuild Is Interrupted

1.1 Problem Symptoms

A standby rebuild is interrupted partway through, and every subsequent rebuild attempt fails with errors such as:

   CRC checksum does not match value stored in file, maybe the cipher file is corrupt
   non obs cipher file or random parameter file is invalid.
   read cipher file or random parameter file failed.
   2020-06-18 20:58:12.080 5eeb64e3.1 [unknown] 140697304617088 [unknown] 0 dn_6001_6002 F0000 0 [BACKEND] FATAL:  could not load server certificate file "server.crt": no start line
   [2020-06-18 20:58:12.086][24066][dn_6001_6002][gs_ctl]:  waitpid 24446 failed, exitstatus is 256, ret is 2

1.2 Cause Analysis

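When the rebuild is interrupted, the certificate and key files in the data directory (server.crt, server.key, server.key.cipher, server.key.rand) are left incomplete; in the case shown below they are empty. Later rebuild attempts try to load these files and fail, which produces the CRC and "no start line" errors above.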

1.3 Steps for Resolution

1) In the data directory, list the files and check the sizes of the certificate and key files:

   ll

After an interrupted rebuild, the listing may show these files with a size of 0 bytes:

   -rw------- 1 omm omm       0 Jun 18 20:58 server.crt
   -rw------- 1 omm omm       0 Jun 18 20:58 server.key
   -rw------- 1 omm omm       0 Jun 18 20:58 server.key.cipher
   -rw------- 1 omm omm       0 Jun 18 20:58 server.key.rand

2) If the certificate files are 0 bytes in size, delete them:

   rm -f server.crt server.key server.key.cipher server.key.rand

3) Rebuild the standby instance:

   gs_ctl build -D data_dir

Note: If the standby instance is already stopped, first regenerate the certificate files or copy them from the $GAUSSHOME/share directory into the data directory. Then start the standby instance and run the rebuild.
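Steps 1 and 2 can be scripted. The following is a minimal POSIX shell sketch, not a tool shipped with GBase 8c: the helper names find_empty_certs and clean_empty_certs are hypothetical, and the file list is taken from the listing in step 1. It only touches files that exist and are exactly 0 bytes, and it performs a dry run unless --apply is given.

```sh
# Sketch only: helper names are hypothetical, not part of GBase 8c.
# Certificate and key files shown in the step 1 listing (split on spaces below).
CERT_FILES="server.crt server.key server.key.cipher server.key.rand"

# find_empty_certs DIR
# Prints the name of each certificate/key file in DIR that exists but is 0 bytes.
find_empty_certs() {
    for cert in $CERT_FILES; do
        # -f: regular file exists; ! -s: file size is zero
        if [ -f "$1/$cert" ] && [ ! -s "$1/$cert" ]; then
            printf '%s\n' "$cert"
        fi
    done
}

# clean_empty_certs DIR [--apply]
# Dry run by default: reports what would be removed. With --apply, deletes
# only the empty files found by find_empty_certs; non-empty files are kept.
clean_empty_certs() {
    find_empty_certs "$1" | while IFS= read -r cert; do
        if [ "${2:-}" = "--apply" ]; then
            rm -f -- "$1/$cert" && echo "removed $cert"
        else
            echo "would remove $cert"
        fi
    done
}
```

For example, run clean_empty_certs /path/to/data_dir to preview, then clean_empty_certs /path/to/data_dir --apply, and continue with gs_ctl build -D /path/to/data_dir as in step 3. Restricting deletion to empty files avoids removing an intact certificate by accident.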

2. Long Delay When Querying Cluster Status Using `gs_om -t status --all`

2.1 Problem Symptoms

The gs_om -t status --all command returns no output for an extended period.

2.2 Cause Analysis

A likely cause is that the main database process (gaussdb) has hung. The status query calls gsql or gs_ctl to check the database state. When gaussdb is hung, those calls get no response and return only after their timeout expires.
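To confirm that a command is hanging rather than merely slow, you can put your own time limit on it. The sketch below wraps a command in the GNU coreutils timeout utility, which exits with status 124 when the limit is reached. The function name run_with_deadline and the limits in the usage note are illustrative choices, not GBase defaults.

```sh
# Sketch only: run_with_deadline is a hypothetical helper. Requires the
# `timeout` utility from GNU coreutils.

# run_with_deadline SECONDS CMD [ARGS...]
# Runs CMD with a time limit. Returns CMD's own exit status, or 124 if the
# limit was reached, which is how `timeout` reports an expired deadline.
run_with_deadline() {
    secs=$1
    shift
    rc=0
    timeout "$secs" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "no response within ${secs}s: $*" >&2
    fi
    return "$rc"
}
```

For example, run_with_deadline 30 gs_om -t status --all, or run_with_deadline 10 gsql -d postgres -p 29776 -c "select 1" to probe the instance directly (this assumes gsql accepts -c the way psql does).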

2.3 Steps for Resolution

1) Check whether gsql can connect to the database. A timeout like the following indicates that the gaussdb process has hung and the database is in an abnormal state:

   gsql -d postgres -p 29776
   gsql: wait (null):29776 timeout expired, errno: Success

2) Search the postgresql-*.log files for ERROR and FATAL messages, and resolve whatever problems they report:

   cd $GAUSSLOG/pg_log/dn_6001; grep "ERROR\|FATAL" postgresql-*.log

3) If the database has hung and gs_om commands do not respond, find the gaussdb process ID on each node and kill it:

   ps -ef | grep $GAUSSHOME/bin/gaussdb | grep -v grep
   kill -9 $pid

4) Once the processes on all nodes have been killed, run the start command on any one node. In a test environment you can restart the database directly; in a production environment, contact technical support first.

   gs_om -t start
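Step 2 can be partly automated on each node. The following sketch prints how many ERROR or FATAL lines each postgresql-*.log file contains, so you know which files to read in full. summarize_errors is a hypothetical helper, and its argument is assumed to be the per-instance log directory used in step 2 (for example $GAUSSLOG/pg_log/dn_6001).

```sh
# Sketch only: summarize_errors is a hypothetical helper.

# summarize_errors LOG_DIR
# For each postgresql-*.log file in LOG_DIR, prints "<file>: <count>", where
# <count> is the number of lines containing ERROR or FATAL (the same pattern
# as the grep in step 2, written as an extended regex).
summarize_errors() {
    for log in "$1"/postgresql-*.log; do
        [ -e "$log" ] || continue   # the glob matched no files
        # grep -c prints 0 but exits 1 when nothing matches; ignore that status
        n=$(grep -cE 'ERROR|FATAL' "$log" || true)
        echo "$(basename "$log"): $n"
    done
}
```

For step 3, pgrep -f "$GAUSSHOME/bin/gaussdb" lists the same PIDs as the ps | grep pipeline, and it never matches its own process, so the grep -v filter is not needed.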

3. `gs_sshexkey` Fails for the Same User with Different Passwords

3.1 Problem Symptoms

On openEuler, gs_sshexkey supports establishing trust for the same user even when that user has different passwords on different hosts. However, authentication can fail even when the correct password is entered.

3.2 Cause Analysis

The system log (/var/log/secure) contains entries such as:

   pam_faillock(sshd:auth): Consecutive login failures for user

This means the user's failed password attempts exceeded the limit configured for pam_faillock, so the account has been temporarily locked.
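To see which accounts pam_faillock has locked, and how often, you can summarize /var/log/secure. This is a sketch: faillock_lockouts is a hypothetical helper, and it assumes the full log message continues as "... Consecutive login failures for user <name> ...", which the excerpt above cuts off. Reading /var/log/secure normally requires root.

```sh
# Sketch only: faillock_lockouts is a hypothetical helper. It assumes each
# lockout entry reads "... Consecutive login failures for user <name> ...".

# faillock_lockouts LOG_FILE
# Prints "<count> <user>" (uniq -c format) for each user with lockout entries.
faillock_lockouts() {
    grep 'pam_faillock(sshd:auth): Consecutive login failures for user' "$1" |
        awk '{
            # print the word that follows "user" on each matching line
            for (i = 1; i < NF; i++)
                if ($i == "user") { print $(i + 1); break }
        }' |
        sort | uniq -c
}
```

Running faillock_lockouts /var/log/secure helps confirm that the account used with gs_sshexkey (such as omm) is the one being locked.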

3.3 Steps for Resolution

In the /etc/pam.d directory, edit the relevant configuration files (system-auth, password-auth, password-auth-crond) and raise the deny=3 setting to a suitably larger value. Once all the trust relationships have been established, revert these changes.
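The change-and-revert procedure can be scripted so the original files are easy to restore. This minimal sketch assumes GNU sed (for -i and \b), that the files contain a literal deny=3 token, and that it runs as root. raise_faillock_deny, restore_faillock_deny, and the PAM_DIR override are hypothetical names introduced here. Review the modified files before relying on them, since a mistake in PAM configuration can block logins.

```sh
# Sketch only: function names and the PAM_DIR override are hypothetical.
# Assumes GNU sed (for -i and \b) and root privileges for /etc/pam.d.
PAM_DIR=${PAM_DIR:-/etc/pam.d}
PAM_FILES="system-auth password-auth password-auth-crond"

# raise_faillock_deny NEW_VALUE
# Saves each file as <file>.bak (once) and replaces the token deny=3 with
# deny=NEW_VALUE. \b stops it from touching values such as deny=30.
raise_faillock_deny() {
    case $1 in
        ''|*[!0-9]*) echo "usage: raise_faillock_deny NUMBER" >&2; return 1 ;;
    esac
    for pf in $PAM_FILES; do
        [ -f "$PAM_DIR/$pf" ] || continue
        # keep the first backup so a second run cannot overwrite the original
        [ -e "$PAM_DIR/$pf.bak" ] || cp -p "$PAM_DIR/$pf" "$PAM_DIR/$pf.bak" || return 1
        sed -i "s/\bdeny=3\b/deny=$1/g" "$PAM_DIR/$pf" || return 1
    done
}

# restore_faillock_deny
# Moves the .bak copies back, reverting the change once trust is established.
restore_faillock_deny() {
    for pf in $PAM_FILES; do
        [ -f "$PAM_DIR/$pf.bak" ] || continue
        mv -f "$PAM_DIR/$pf.bak" "$PAM_DIR/$pf" || return 1
    done
}
```

For example, run raise_faillock_deny 10, establish the trust relationships with gs_sshexkey, then run restore_faillock_deny. Restoring from a .bak copy, rather than editing the value back by hand, makes the revert exact.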

These are examples of solutions for three common issues. If you have any other questions about GBase Database (GBase数据库), feel free to leave a comment and we can discuss further!
