In practical use of the GBase Database, problems can arise from network failures or improper operations. This article analyzes what happens when the gs_ctl process for rebuilding a standby instance is interrupted, identifies the root cause, and provides a solution. It then covers two other common issues.
1. Solution for Incomplete Key File Recovery When the gs_ctl Process for Rebuilding a Standby Instance Is Interrupted
1.1 Problem Symptoms
During the process of rebuilding a standby instance, the operation is interrupted, and subsequent attempts to rebuild the instance fail, resulting in the following error messages:
CRC checksum does not match value stored in file, maybe the cipher file is corrupt
non obs cipher file or random parameter file is invalid.
read cipher file or random parameter file failed.
2020-06-18 20:58:12.080 5eeb64e3.1 [unknown] 140697304617088 [unknown] 0 dn_6001_6002 F0000 0 [BACKEND] FATAL: could not load server certificate file "server.crt": no start line
[2020-06-18 20:58:12.086][24066][dn_6001_6002][gs_ctl]: waitpid 24446 failed, exitstatus is 256, ret is 2
1.2 Cause Analysis
When the rebuild process is interrupted, the certificate files in the data directory are left incomplete (0 bytes in the case shown here), so subsequent attempts to rebuild the instance fail.
1.3 Steps for Resolution
1) Check the size of the certificate files in the data directory:
Run the command to list the files (ll is a common shell alias for ls -l):
ll
Check the sizes of the key files:
-rw------- 1 omm omm 0 Jun 18 20:58 server.crt
-rw------- 1 omm omm 0 Jun 18 20:58 server.key
-rw------- 1 omm omm 0 Jun 18 20:58 server.key.cipher
-rw------- 1 omm omm 0 Jun 18 20:58 server.key.rand
2) If the certificate files are 0 bytes in size, delete them:
rm -rf server.crt server.key server.key.cipher server.key.rand
3) Rebuild the standby instance:
gs_ctl build -D data_dir
Note: If the standby database is already stopped, you will need to regenerate the certificate files or copy them from the $GAUSSHOME/share directory into the data directory. Afterward, start the standby instance and rebuild it.
2. Long Delay When Querying Cluster Status Using gs_om -t status --all
2.1 Problem Symptoms
After running the gs_om -t status --all command, there is no response for an extended period.
2.2 Cause Analysis
This could be due to the database master process hanging. The query command invokes gsql or gs_ctl to check the database status. When the database process hangs, the command receives no response and exits only after a timeout.
2.3 Steps for Resolution
1) Check whether gsql can access the database. If the following message appears, the gaussdb process has hung and the database is in an abnormal state:
gsql -d postgres -p 29776
gsql: wait (null):29776 timeout expired, errno: Success
2) Check the postgresql-*.log files for error messages and resolve the issues accordingly:
cd $GAUSSLOG/pg_log/dn_6001; grep "ERROR\|FATAL" postgresql-*.log
3) If the database has hung and gs_om commands are unresponsive, find the process PID on each node and kill it:
ps -ef | grep $GAUSSHOME/bin/gaussdb | grep -v grep
kill -9 $pid
4) After killing the processes on all nodes, execute the startup command on one of the nodes. In a test environment, you can directly restart the database. For production environments, please contact technical support engineers.
gs_om -t start
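In step 3, $pid stands for the PID read from the ps output. As a rough sketch, the PID column can be extracted with awk. This assumes the usual ps -ef layout, in which the PID is the second column and the command begins at the eighth; the function name pids_for_binary is hypothetical.

```shell
# Read `ps -ef` output on stdin and print the PID of every process whose
# executable (8th column in the usual `ps -ef` layout) equals the given path.
# The function name is a hypothetical helper, not a GBase command.
pids_for_binary() (
    binary="$1"
    awk -v bin="$binary" '$8 == bin { print $2 }'
)
```

For example, ps -ef | pids_for_binary "$GAUSSHOME/bin/gaussdb" prints one PID per line. Because the match is on the executable column rather than the whole line, the grep -v grep filter is not needed. Review the printed PIDs before passing any of them to kill -9.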
3. gs_sshexkey Reports an Error with the Same User but Different Passwords
3.1 Problem Symptoms
In an openEuler environment, gs_sshexkey supports establishing trust for the same user even when the passwords differ. However, authentication fails even when the correct password is entered.
3.2 Cause Analysis
Upon inspecting the system log (/var/log/secure), you may find entries such as:
pam_faillock(sshd:auth): Consecutive login failures for user
This indicates that the number of failed password attempts for the current user has exceeded the configured maximum and that the user has been temporarily locked.
3.3 Steps for Resolution
Modify the relevant configuration files (system-auth, password-auth, password-auth-crond) in the /etc/pam.d directory and increase the deny=3 value appropriately. After all the trust relationships are established, revert the changes.
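Before editing, it can help to see which deny value each file currently sets. The sketch below is a read-only check; it assumes the limit appears as a deny=N option on a non-comment line, as in the deny=3 example above. The function name show_deny_settings is hypothetical.

```shell
# For each readable file given as an argument, print "<file>: deny=N" for
# every distinct deny=N option found on a non-comment line.
# The function name is a hypothetical helper; nothing is modified.
show_deny_settings() (
    for f in "$@"; do
        [ -r "$f" ] || continue          # skip files that are missing or unreadable
        grep -v '^[[:space:]]*#' "$f" |  # drop comment lines
            grep -oE 'deny=[0-9]+' |     # keep only the deny=N tokens
            sort -u |
            while read -r setting; do
                printf '%s: %s\n' "$f" "$setting"
            done
    done
)
```

A possible invocation is show_deny_settings /etc/pam.d/system-auth /etc/pam.d/password-auth /etc/pam.d/password-auth-crond. Recording the output before the change makes it easier to restore the original values once the trust relationships are established.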
These are solutions to three common issues. If you have any other questions about the GBase Database, feel free to leave a comment and we can discuss further!