
Cong Li


GBase Database | GBase 8c Typical Problem Resolution Examples

In practical use of the GBase database, problems can arise from network failures or improper operations. This article walks through three typical GBase 8c problems: an interrupted gs_ctl rebuild of a standby instance, a gs_om status query that does not return, and a gs_sshexkey authentication failure. For each one it describes the symptoms, analyzes the root cause, and gives the steps for resolution.

1. Incomplete Key Files After the gs_ctl Process for Rebuilding a Standby Instance Is Interrupted

1.1 Problem Symptoms

During the process of rebuilding a standby instance, the operation is interrupted, and subsequent attempts to rebuild the instance fail, resulting in the following error messages:

CRC checksum does not match value stored in file, maybe the cipher file is corrupt
non obs cipher file or random parameter file is invalid.
read cipher file or random parameter file failed.
2020-06-18 20:58:12.080 5eeb64e3.1 [unknown] 140697304617088 [unknown] 0 dn_6001_6002 F0000 0 [BACKEND] FATAL:  could not load server certificate file "server.crt": no start line
[2020-06-18 20:58:12.086][24066][dn_6001_6002][gs_ctl]:  waitpid 24446 failed, exitstatus is 256, ret is 2

1.2 Cause Analysis

When the rebuild process is interrupted, the certificate and key files in the data directory are left incomplete (0 bytes in size). The server cannot load these files, so every subsequent attempt to rebuild the instance fails.

1.3 Steps for Resolution

1) Check the size of the certificate files in the data directory:

Run the following command to list the files (ll is a common shell alias for ls -l):

   ll

Check the sizes of the key files:

   -rw------- 1 omm omm       0 Jun 18 20:58 server.crt
   -rw------- 1 omm omm       0 Jun 18 20:58 server.key
   -rw------- 1 omm omm       0 Jun 18 20:58 server.key.cipher
   -rw------- 1 omm omm       0 Jun 18 20:58 server.key.rand

2) If the certificate files are 0 bytes in size, delete them:

   rm -rf server.crt server.key server.key.cipher server.key.rand

3) Rebuild the standby instance:

   gs_ctl build -D data_dir

Note: If the standby database is already stopped, you will need to regenerate the certificate files or copy them from the $GAUSSHOME/share directory into the data directory. Afterward, start the standby instance and rebuild the instance.
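
The check-and-delete steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name remove_empty_key_files is hypothetical, the four file names are the ones listed in step 1, and data_dir is the same placeholder used in the rebuild command. It removes a file only when it exists and is 0 bytes, so intact certificates are left alone.

```shell
#!/usr/bin/env bash
# Sketch: remove only the zero-byte certificate/key files left by an interrupted rebuild.
# remove_empty_key_files is a hypothetical helper name, not part of GBase 8c.

remove_empty_key_files() {
    local data_dir="$1"
    local f path
    for f in server.crt server.key server.key.cipher server.key.rand; do
        path="$data_dir/$f"
        if [ -f "$path" ] && [ ! -s "$path" ]; then
            # The file exists but is 0 bytes: a leftover of the interrupted rebuild.
            echo "removing empty file: $path"
            rm -f -- "$path"
        elif [ -f "$path" ]; then
            # A non-empty file may be a valid certificate, so it is kept.
            echo "keeping non-empty file: $path"
        fi
    done
}

# Example usage (data_dir is the placeholder from step 3):
#   remove_empty_key_files data_dir && gs_ctl build -D data_dir
```

Testing for size before deleting is a cautious design choice: it avoids removing valid certificate files on an instance that was not affected by the interruption.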

2. Long Delay When Querying Cluster Status Using gs_om -t status --all

2.1 Problem Symptoms

After running the gs_om -t status --all command, there is no response for an extended period.

2.2 Cause Analysis

A likely cause is that the database master process has hung. The status query invokes gsql or gs_ctl to check the database state, and when the database process is hung, that call receives no response and returns only after its timeout expires.
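
To avoid waiting indefinitely while diagnosing this, the status command can be wrapped in a deadline. The following is a minimal sketch, assuming GNU coreutils timeout is available on the node; run_with_deadline is a hypothetical helper name and the 30-second limit in the usage line is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: bound how long a command may run, using coreutils `timeout`.
# run_with_deadline is a hypothetical helper name.

run_with_deadline() {
    local seconds="$1"
    shift
    local rc=0
    timeout "$seconds" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout exits with status 124 when the command was still
        # running at the deadline and had to be terminated.
        echo "no response within ${seconds}s: $*" >&2
    fi
    return "$rc"
}

# Example usage:
#   run_with_deadline 30 gs_om -t status --all
```

A return status of 124 suggests that the command did not answer in time, which is consistent with the hang described above, although it does not by itself prove that the gaussdb process is the cause.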

2.3 Steps for Resolution

1) Check if gsql can access the database. If the following message appears, it indicates that the gaussdb process has hung, causing the database to be in an abnormal state:

   gsql -d postgres -p 29776
   gsql: wait (null):29776 timeout expired, errno: Success

2) Check the postgresql-*.log files for error messages and resolve the issues accordingly:

   cd $GAUSSLOG/pg_log/dn_6001; grep "ERROR\|FATAL" postgresql-*.log

3) If the database has hung and gs_om commands are unresponsive, find the gaussdb process ID on each node (the second column of the ps -ef output) and kill it, substituting that ID for $pid:

   ps -ef | grep $GAUSSHOME/bin/gaussdb | grep -v grep
   kill -9 $pid

4) After killing the processes on all nodes, execute the startup command on one of the nodes. In a test environment, you can directly restart the database. For production environments, please contact technical support engineers.

   gs_om -t start
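
Step 3 leaves reading the PID out of the ps output to the operator. The sketch below shows one way to extract it, assuming the gaussdb process was started with its full path as the command, so that the path appears as the eighth field of ps -ef output. The helper name gaussdb_pids is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pull the PID column out of `ps -ef` output for gaussdb processes.
# gaussdb_pids is a hypothetical helper; it reads `ps -ef` text on stdin.

gaussdb_pids() {
    local binary="$1"   # e.g. "$GAUSSHOME/bin/gaussdb"
    # In `ps -ef` output, field 2 is the PID and field 8 is the executable.
    # Comparing field 8 exactly skips the grep or awk processes that merely
    # mention the path in their arguments.
    awk -v bin="$binary" '$8 == bin { print $2 }'
}

# Example usage: review the PIDs first, then kill them as in step 3.
#   ps -ef | gaussdb_pids "$GAUSSHOME/bin/gaussdb"
#   ps -ef | gaussdb_pids "$GAUSSHOME/bin/gaussdb" | xargs -r kill -9
```

Matching on the executable field instead of the whole line removes the need for the grep -v grep filter, at the cost of missing processes that were launched through a relative path.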

3. gs_sshexkey Reports an Error with the Same User but Different Passwords

3.1 Problem Symptoms

In an openEuler environment, gs_sshexkey supports establishing mutual trust for the same user even when that user has a different password on each host. However, authentication fails even though the correct password is entered.

3.2 Cause Analysis

Upon inspecting the system log (/var/log/secure), you may find logs such as:

pam_faillock(sshd:auth): Consecutive login failures for user

This indicates that the user exceeded the maximum number of consecutive failed login attempts and that the account has been temporarily locked.
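
A quick way to confirm this cause is to search the system log for the message shown above. The following is a minimal sketch: faillock_lines is a hypothetical helper name, the search string is the message fragment quoted in this section, and reading /var/log/secure normally requires root privileges.

```shell
#!/usr/bin/env bash
# Sketch: list pam_faillock lockout messages in the system log.
# faillock_lines is a hypothetical helper; the default path is the log named above.

faillock_lines() {
    local log_file="${1:-/var/log/secure}"
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    grep -F 'pam_faillock(sshd:auth): Consecutive login failures' "$log_file"
}

# Example usage (as root):
#   faillock_lines | tail -n 5
```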

3.3 Steps for Resolution

Modify the relevant configuration files (system-auth, password-auth, password-auth-crond) in the /etc/pam.d directory and raise the deny value (deny=3 in this case) so that more failed attempts are allowed before the account is locked. After all trust relationships are established, revert the changes.
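
The edit-then-revert procedure can be sketched as two shell functions. This is an illustration, not a vetted procedure: the function names are hypothetical, it assumes GNU sed, and it assumes the deny=N option sits on lines that mention pam_faillock, which should be checked against the actual files first. Run it as root and only against the files named above.

```shell
#!/usr/bin/env bash
# Sketch: temporarily raise the deny=N value in a PAM file, keeping a backup.
# set_faillock_deny and restore_faillock_deny are hypothetical helper names.

set_faillock_deny() {
    local pam_file="$1"
    local new_deny="$2"
    # Accept only a plain non-negative integer as the new value.
    case "$new_deny" in
        ''|*[!0-9]*) echo "deny value must be a number" >&2; return 1 ;;
    esac
    # Keep a backup so the change can be reverted once trust is established.
    cp -p -- "$pam_file" "$pam_file.bak" || return 1
    # Touch only lines that mention pam_faillock, and only their deny=N option.
    sed -E -i "/pam_faillock/ s/(^|[[:space:]])deny=[0-9]+/\1deny=${new_deny}/" "$pam_file"
}

restore_faillock_deny() {
    local pam_file="$1"
    # Put the original file back in place of the edited one.
    mv -- "$pam_file.bak" "$pam_file"
}

# Example usage (as root):
#   set_faillock_deny /etc/pam.d/system-auth 10
#   ... establish trust with gs_sshexkey ...
#   restore_faillock_deny /etc/pam.d/system-auth
```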


These are resolution examples for three common issues. If you have any other questions about the GBase database, feel free to leave a comment and we can discuss further!
