Cong Li


Best Practices for GBase 8a MPP Cluster Multi-Instance Management

1. Multi-Instance Management

1.1 Directory Structure for Multi-Instance

Comparing directory structures between multi-instance and single-instance setups:

(Figures omitted: directory trees of a multi-instance deployment and a single-instance deployment.)

Environment Variables for Multi-Instance Servers

Comparison of environment variables (/home/gbase/.gbase_profile):

Multi-Instance Composite Node:

  • 192.168.146.20 & 192.168.146.40:
```
export GBASE_INSTANCES_BASE=/opt
export GBASE_INSTANCES=/opt/192.168.146.40/gbase_profile
export GBASE_HOME=/opt/192.168.146.40/gnode/server
PATH=$GBASE_HOME/bin:$PATH
export GBASE_INSTANCES=/opt/192.168.146.20/gbase_profile:$GBASE_INSTANCES
export GBASE_HOME=/opt/192.168.146.20/gnode/server
PATH=$GBASE_HOME/bin:$PATH

if [ -f /opt/192.168.146.20/gbase_profile ]; then
  . /opt/192.168.146.20/gbase_profile
fi

if [ -f /opt/192.168.146.20/gcware_profile ]; then
  . /opt/192.168.146.20/gcware_profile
fi
```
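Note how the profile prepends each instance's bin directory to PATH in turn, so the instance added last (192.168.146.20 above) is the one whose binaries resolve first. A minimal sketch of that precedence rule, with throwaway /tmp directories standing in for the real /opt/&lt;ip&gt;/gnode/server/bin paths:

```shell
# Hypothetical stand-ins for the two instances' bin directories.
mkdir -p /tmp/inst40/bin /tmp/inst20/bin
printf '#!/bin/sh\necho from-instance-40\n' > /tmp/inst40/bin/gbased
printf '#!/bin/sh\necho from-instance-20\n' > /tmp/inst20/bin/gbased
chmod +x /tmp/inst40/bin/gbased /tmp/inst20/bin/gbased

# Same stacking order as the profile: .40 prepended first, then .20.
PATH=/tmp/inst40/bin:$PATH
PATH=/tmp/inst20/bin:$PATH

gbased   # resolves to the .20 copy, because its directory was prepended last
```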

Non-Multi-Instance Composite Node:

  • 192.168.146.22:
```
export GBASE_INSTANCES_BASE=/opt
export GBASE_INSTANCES=/opt/192.168.146.22/gbase_profile
export GBASE_HOME=/opt/192.168.146.22/gnode/server
PATH=$GBASE_HOME/bin:$PATH

if [ -f /opt/192.168.146.22/gbase_profile ]; then
  . /opt/192.168.146.22/gbase_profile
fi

if [ -f /opt/192.168.146.22/gcware_profile ]; then
  . /opt/192.168.146.22/gcware_profile
fi
```

Comparison of Environment Variable Files between Composite Node and Pure GNode Node:

  • 192.168.146.20 Composite Node (/opt/192.168.146.20/gbase_profile):

```
export GBASE_BASE=/opt/192.168.146.20/gnode
export GBASE_HOME=/opt/192.168.146.20/gnode/server
export GBASE_SID=gbase
export GCLUSTER_USER=gbase
export TCMALLOC_AGGRESSIVE_DECOMMIT=1
ulimit -v unlimited
unset TERMINFO
export TERMINFO_DIRS=/opt/192.168.146.20/gcluster/server/share/terminfo:/opt/192.168.146.20/gnode/server/share/terminfo
export GCLUSTER_PREFIX=/opt/192.168.146.20
export GCWARE_BASE=/opt/192.168.146.20/gcware
export PYTHONPATH=$PYTHONPATH:$GCWARE_BASE/python
export SSH_GBASE_PASSWD=6762617365
export GCLUSTER_HOME=/opt/192.168.146.20/gcluster/server
export GCLUSTER_BASE=/opt/192.168.146.20/gcluster
export GCLUSTER_SID=gcluster
```

  • 192.168.146.40 Pure GNode Node (/opt/192.168.146.40/gbase_profile):

```
export GBASE_HOME=/opt/192.168.146.40/gnode/server
export GBASE_SID=gbase
export SSH_GBASE_PASSWD=6762617365
export GCLUSTER_HOME=/opt/192.168.146.40/gcluster/server
export GCLUSTER_BASE=/opt/192.168.146.40/gcluster
export GCLUSTER_SID=gcluster
```

1.2 Service Management for Multi-Instance

Service Overview:

  • Coordinator Services: gclusterd, gcrecover
  • GCWare Services: gcware, gcware_monit, gcware_mmonit
  • GNode Services: gbased, gc_sync_server
  • Monitoring Services:
    • Coordinator: gcmonit and gcmmonit under gcluster/server/bin
    • GNode: gcmonit and gcmmonit under gnode/server/bin

Unified Start/Stop Commands for All Instances on a Multi-Instance Server:

```
gcluster_services all start
gcluster_services all stop
gcluster_services all restart
```

GCWare services are not controlled by the unified start/stop command. The services it does control are:

```
gclusterd, gcrecover, gbased, gc_sync_server, gcmonit, gcmmonit
```

Separate Start/Stop Commands for GCWare Services:

```
gcware_services all start
gcware_services all stop
gcware_services all restart
```

Services controlled by GCWare start/stop commands: gcware, gcware_monit, gcware_mmonit
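Because the unified command skips the GCWare group, a full restart of a node has to invoke both families. A dry-run sketch of that sequence (the `run` wrapper is illustrative, not part of GBase; set DRY_RUN=0 on a real cluster node to actually execute):

```shell
# Dry-run by default; DRY_RUN=0 executes the commands for real.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

# GCWare family first: gcware, gcware_monit, gcware_mmonit
run gcware_services all restart
# Then the unified family: gclusterd, gcrecover, gbased, gc_sync_server, gcmonit, gcmmonit
run gcluster_services all restart
```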

Example:

  • Multi-Instance Server (2 GNodes): 192.168.146.20 & 192.168.146.40, with 192.168.146.20 being the composite node.

Unified Start/Stop Commands for a Specific Type of Service:

```
gcluster_services gcluster start/stop
gcluster_services gcware start/stop
gcluster_services gcrecover start/stop
gcluster_services syncserver start/stop
gcluster_services gbase start/stop
gcmonit.sh start/stop/status
```

Example:

```
[gbase@rhel73-1 ~]$ gcluster_services gbase stop
Stopping gbase: [OK]
Stopping gbase: [OK]
[gbase@rhel73-1 ~]$ gcmonit.sh stop
Stopping GCMonit success!
```

Start/Stop Commands for a Single Instance in a Multi-Instance Server:

Only GNode supports multiple instances, while GCWare and Coordinator can have only one instance per physical machine. Start/stop commands for GNode and SyncServer services:

```
gcluster_services syncserver_ip start/stop
gcluster_services gbase_ip start/stop
```

Example:

```
gcluster_services gbase_192.168.146.40 stop
gcluster_services syncserver_192.168.146.40 stop
```
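On a server hosting several GNode instances, the per-instance commands can simply be looped over the instance IPs. A dry-run sketch (the IP list and `run` wrapper are hypothetical; set DRY_RUN=0 to execute on a real node):

```shell
# Dry-run by default; DRY_RUN=0 executes the commands for real.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

# Hypothetical list of GNode instance IPs on this server.
for ip in 192.168.146.20 192.168.146.40; do
  run gcluster_services "gbase_$ip" stop
  run gcluster_services "syncserver_$ip" stop
done
```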

Connecting to a GNode Instance with gncli in a Multi-Instance Server:

```
gncli -h ip
```

Example:

```
[gbase@rhel73-1 sys_tablespace]$ pwd
/opt/192.168.146.40/gnode/userdata/gbase/testdb/sys_tablespace
[gbase@rhel73-1 sys_tablespace]$ ls
t_n1  t_n5
[gbase@rhel73-1 sys_tablespace]$ gncli -h 192.168.146.40
GBase client 9.5.3.17.123187. Copyright (c) 2004-2020, GBase.  All Rights Reserved.
gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.04)
gbase> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| t_n1             |
| t_n5             |
+------------------+
2 rows in set (Elapsed: 00:00:00.06)
```

2. Features Affected by Multi-Instance Deployment

Compared to versions V8.6 and V9.5.2, version V9.5.3 supports multi-instance deployment, affecting the following features:

  • License acquisition and installation changes
  • Node directory structure changes
  • Cluster service management changes
  • Cluster upgrade and rollback changes
  • Cluster node replacement changes
  • Cluster backup and recovery changes
  • Local loading changes caused by compatibility issues between multi-instance and non-multi-instance versions

2.1 Upgrade and Rollback

Single-instance to Multi-instance Upgrade

Steps: Upgrade → Expansion → Multi-instance NUMA Binding [Optional]

Step 1: Upgrade
The upgrade syntax is unchanged. The cluster being upgraded must be version V9.5.2 or later.

```
./gcinstall.py --silent=demo.options -U
```

For cluster upgrades, gcwareHost must be specified and must be an old-version gcware node. When using IPv4, gcwareHostNodeID must not be specified; when not using IPv4, gcwareHostNodeID must be specified and set to the node ID of the existing gcware.

Note:
To find the gcwareHostNodeID in a non-IPv4 deployment, check the gcware.conf file in the gcware/config directory: the nodeid under the totem section is the gcwareHostNodeID, and the nodeid under the gcware section is the coordinateHostNodeID.
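As a sketch, both node IDs can be pulled out with awk. The gcware.conf layout below is an assumption based on the note above (a section name followed by a nodeid line), so adjust the patterns to match your actual file:

```shell
# Assumed gcware.conf shape, written to /tmp purely for this demo.
cat > /tmp/gcware.conf <<'EOF'
totem {
    nodeid: 1
}
gcware {
    nodeid: 2
}
EOF

# nodeid under totem  -> gcwareHostNodeID
# nodeid under gcware -> coordinateHostNodeID
awk '/^(totem|gcware)/ { sec = $1 } /nodeid/ { print sec, $2 }' /tmp/gcware.conf
```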

Step 2: Expansion
Execute with the database administrator user (e.g., gbase).

Install cluster software on the new multi-instance node:

```
./gcinstall.py --silent=demo.options --license_file=gbase.lic
```

Modify demo.options: add dataHost; fill in existCoordinateHost, existDataHost, and existGcwareHost with the existing nodes; and comment out coordinateHost, coordinateHostNodeID, gcwareHost, and gcwareHostNodeID.

Create distribution:

```
gcadmin distribution gcChangeInfo.xml p 1 d 1
```

Generate a new hashmap:

```
gbase> initnodedatamap;
```

Data redistribution:
Adjust the rebalance parameters as needed based on the data volume to be redistributed:

```
gbase> set global gcluster_rebalancing_concurrent_count=0;
gbase> rebalance instance;
gbase> set global gcluster_rebalancing_concurrent_count=3;
gbase> select * from gclusterdb.rebalancing_status;
```

Delete the old hashmap:

```
gbase> refreshnodedatamap drop 1;
```

Delete the old distribution:

```
gcadmin rmdistribution 1;
```

Step 3: Multi-instance NUMA Binding [Optional]
You can bind NUMA nodes as needed; refer to the NUMA binding section in the previous chapters. Modify the gnode/server/bin/gcluster_services file under each instance's installation directory and restart the cluster services.

Note:
Possible issues during the expansion step of a single-instance upgrade to a multi-instance version:

```
./gcinstall.py --silent=demo.options --license_file=20210323.lic
```

After execution, a multi-instance data node may be missing its license file. Re-run the license import; if the file is still missing, copy it to the node manually and then start the gbase service.

Multi-instance to Multi-instance Upgrade

The upgrade syntax remains unchanged.

Automatic Rollback
Errors during a single-instance to multi-instance or multi-instance to multi-instance upgrade trigger an automatic rollback; no manual intervention is needed.

Manual Rollback
The cluster must be available before rollback. Rollback syntax is as follows:
Switch to the gcinstall directory:

```
python Restore.py --silent=demo.options --backupFile= --backupGcwareFile=
```

Note:
If rolling back from multi-instance to single-instance, ensure no expansion or contraction operations were performed on the cluster after the upgrade. The upgrade program backs up important files of the original cluster before upgrading, such as system tables and environment variables; user data is not backed up. If the original cluster was multi-instance, the backup file will have a multi-instance identifier. The rollback program will automatically determine whether to rollback to single-instance or multi-instance by reading the backup file.

2.2 Node Replacement

There are three types of node replacement:

Coordinator Node Replacement
On a normally functioning management node, execute the following command:

```
replace.py --host=IP --type=coor --dbaUser=dba_user --dbaUserPwd=*** --generalDBUser=db_user --generalDBPwd='***' --overwrite --sync_coordi_metadata_timeout=minutes --retry_times=* --license_file=***
```

Gnode Node Replacement
On a normally functioning management node, execute the following command:

```
replace.py --host=IP --type=data --dbaUser=dba_user --dbaUserPwd=*** --vcname=vc_name --generalDBUser=db_user --generalDBPwd='***' --overwrite --sync_coordi_metadata_timeout=minutes --retry_times=* --license_file=***
```

Gcware Node Replacement
On a normally functioning gcware node, execute the following command:

```
cd $GCWARE_BASE/gcware_server/
./gcserver.py --host=IP --dbaUser=gbase --dbaPwd=gbase --overwrite
```

Coordinator and gnode node replacements are not affected by multi-instance; operations remain unchanged (including composite nodes of coordinator and gnode).

Gcware node replacement syntax after unbinding gcware is as follows:

(Figure omitted: gcware node replacement command syntax.)

Multi-instance gcware node replacement removes the --prefix parameter. gcserver.py reads environment variables to obtain the installation path of gcware. gcserver.py reinstalls the replaced gcware node, copies gcware data, and completes the gcware node replacement.

If gcware, gcluster, and gnode are deployed on the same server, it is recommended to replace the gcware node first, then the gcluster node, and finally the gnode node.

2.3 Backup and Recovery

For single-machine multi-instance backup and recovery, rcman adds the -s parameter to specify the IP of the instance to back up. If this parameter is not specified, the backup fails.

The current version supports backup and recovery for non-gbase database users. gcrcman.py and rcman add parameter -U to specify the database username for backup and recovery. This user needs to have access to all cluster tables. If the specified user lacks sufficient permissions, the program will error out and exit. If not specified, the default is the gbase user.

The current version supports backup and recovery of independently deployed gcware versions.

2.4 Impact and Solutions for Local Loading in Multi-instance

Multi-instance deployment changes how local file loading works, introducing a compatibility difference from non-multi-instance versions. In a multi-instance deployment, local loading paths use the following syntax:

```
file://host + abs_path
```

Note: If the cluster is deployed as multi-instance, any existing local loading path written as file://+abs_path must be rewritten to the host-qualified format.
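A small sketch of that rewrite, using a hypothetical host IP and data file path:

```shell
HOST=192.168.146.40          # hypothetical instance IP
ABS_PATH=/opt/data/t1.csv    # hypothetical data file

OLD_URL="file://$ABS_PATH"       # non-multi-instance form
NEW_URL="file://$HOST$ABS_PATH"  # multi-instance form: host + absolute path

echo "$OLD_URL -> $NEW_URL"
```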
