DEV Community: Michael

Pure Shell Script to Expand IPv6 Addresses

Michael — Tue, 07 Jul 2026 13:33:45 +0000

When deploying a gbase database cluster, you often need to handle IPv6 addresses in configuration files. Many tools require the full 8‑segment, 4‑digit hexadecimal format, while real‑world addresses often come in compressed notation (e.g., 2001:db8::1). This article presents a pure Bash script that expands any compressed IPv6 address to the complete, canonical form, with built‑in validation and prefix handling.

Features

Expands any compressed IPv6 address (including :: zero‑compression) into 8 segments of 4 lowercase hex digits
Handles addresses with prefixes (e.g., 2001:db8::1/64), with an option to preserve or strip the prefix
Built‑in validation: illegal characters, :: occurrence count, segment length, and hex validity
Pure Bash implementation; relies only on standard Unix tools (grep, wc, tr, bc) plus the printf builtin
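The script pads each segment by piping it through bc. On hosts without bc, Bash's own base-16 arithmetic can do the same job. The sketch below shows that alternative. The function name pad_ipv6_segment is hypothetical and not part of the script, and it assumes the segment has already been validated as 1 to 4 hex digits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: zero-pad one IPv6 segment to 4 lowercase hex digits
# without bc, using Bash base-16 arithmetic ($((16#...))).
# Assumes the input was already validated as 1 to 4 hex digits.
pad_ipv6_segment() {
    local seg="$1"
    # 16#$seg parses the segment as hexadecimal (case-insensitive);
    # %04x re-emits it zero-padded and in lowercase
    printf '%04x\n' "$((16#$seg))"
}

pad_ipv6_segment db8    # 0db8
pad_ipv6_segment 1      # 0001
pad_ipv6_segment ABCD   # abcd
```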

The Script

#!/bin/bash
set -euo pipefail

SCRIPT_NAME=$(basename "$0")

show_help() {
    cat << EOF
Usage: $SCRIPT_NAME [options] <IPv6 address>

Expands a compressed IPv6 address to the full 8‑segment, 4‑digit format.
Supports address validation and prefix handling.

Options:
    -h, --help      Show this help message and exit
    -p, --prefix    Preserve the prefix (e.g., 2001:db8::1/64); by default only the address part is expanded

Examples:
    $SCRIPT_NAME 2001:db8::1
    $SCRIPT_NAME -p ::1/128
    $SCRIPT_NAME fe80::1234:5678:9abc:def0
EOF
}

validate_ipv6_chars() {
    local addr="$1"
    # The /prefix has already been split off, so '/' is not allowed here
    if [[ ! "$addr" =~ ^[0-9a-fA-F:]+$ ]]; then
        echo "Error: IPv6 address contains illegal characters (only 0-9, a-f, A-F and ':' allowed)" >&2
        exit 1
    fi
}

expand_ipv6_core() {
    local pure_addr="$1"
    local expanded_segments=()

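    # 'local' intentionally masks grep's non-zero status when no "::" is present
    local colon_count=$(grep -o "::" <<< "$pure_addr" | wc -l)
    if [[ $colon_count -gt 1 ]]; then
        echo "Error: Invalid IPv6 address (:: can appear only once)" >&2
        exit 1
    fi

    # Reject ":::" and a single leading or trailing colon (e.g. ":1:2..." or "...:7:8:")
    if [[ "$pure_addr" == *:::* || "$pure_addr" =~ ^:[^:] || "$pure_addr" =~ [^:]:$ ]]; then
        echo "Error: Invalid IPv6 address (misplaced ':')" >&2
        exit 1
    fi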

    if [[ "$pure_addr" == "::" ]]; then
        echo "0000:0000:0000:0000:0000:0000:0000:0000"
        return
    fi

    local temp_addr="${pure_addr//::/:__ZERO__:}"
    local segments=()
    IFS=: read -ra segments <<< "$temp_addr"

    local filtered_segments=()
    for seg in "${segments[@]}"; do
        if [[ -n "$seg" ]]; then
            filtered_segments+=("$seg")
        fi
    done
    segments=("${filtered_segments[@]}")

    local zero_pos=-1
    local seg_count=${#segments[@]}
    for i in "${!segments[@]}"; do
        if [[ "${segments[$i]}" == "__ZERO__" ]]; then
            zero_pos=$i
            break
        fi
    done

    local new_segments=()
    if [[ $zero_pos -ne -1 ]]; then
        # "::" must stand for at least one all-zero group (RFC 4291)
        local fill_zeros=$((8 - (seg_count - 1)))
        if [[ $fill_zeros -lt 1 ]]; then
            echo "Error: IPv6 address has too many segments (:: must replace at least one group)" >&2
            exit 1
        fi
        for ((i=0; i<zero_pos; i++)); do
            new_segments+=("${segments[$i]}")
        done
        for ((i=0; i<fill_zeros; i++)); do
            new_segments+=("")
        done
        for ((i=zero_pos+1; i<seg_count; i++)); do
            new_segments+=("${segments[$i]}")
        done
    else
        new_segments=("${segments[@]}")
        if [[ ${#new_segments[@]} -ne 8 ]]; then
            echo "Error: IPv6 address segment count wrong (must be exactly 8 when no ::)" >&2
            exit 1
        fi
    fi

    for seg in "${new_segments[@]}"; do
        if [[ -z "$seg" || "$seg" == "0" ]]; then
            expanded_segments+=("0000")
            continue
        fi

        if [[ ${#seg} -gt 4 ]]; then
            echo "Error: IPv6 segment '$seg' too long (max 4 digits)" >&2
            exit 1
        fi

        if ! [[ "$seg" =~ ^[0-9a-fA-F]{1,4}$ ]]; then
            echo "Error: IPv6 segment '$seg' contains illegal characters" >&2
            exit 1
        fi

        local seg_upper=$(echo "$seg" | tr 'a-f' 'A-F')
        local seg_10=$(echo "ibase=16; $seg_upper" | bc)
        local seg_4digit=$(printf "%04x" "$seg_10" | tr 'A-F' 'a-f')
        expanded_segments+=("$seg_4digit")
    done

    local expanded_addr=$(IFS=:; echo "${expanded_segments[*]}")
    echo "$expanded_addr"
}

expand_ipv6() {
    local addr="$1"
    local keep_prefix="$2"
    local pure_addr
    local prefix=""

    if [[ "$addr" =~ / ]]; then
        pure_addr="${addr%/*}"
        prefix="${addr#*/}"
        # At most 3 digits, compared in base 10 so that "08" is not parsed as octal
        if ! [[ "$prefix" =~ ^[0-9]{1,3}$ ]] || (( 10#$prefix > 128 )); then
            echo "Error: Invalid IPv6 prefix '$prefix' (must be 0-128)" >&2
            exit 1
        fi
    else
        pure_addr="$addr"
    fi

    validate_ipv6_chars "$pure_addr"

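    # Declared separately so a failure inside expand_ipv6_core aborts the script
    # under set -e ('local var=$(...)' would mask the non-zero exit status)
    local expanded_addr
    expanded_addr=$(expand_ipv6_core "$pure_addr")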

    if [[ "$keep_prefix" -eq 1 && -n "$prefix" ]]; then
        echo "${expanded_addr}/${prefix}"
    else
        echo "$expanded_addr"
    fi
}

KEEP_PREFIX=0
IPV6_ADDR=""

while [[ $# -gt 0 ]]; do
    case "$1" in
        -h|--help)
            show_help
            exit 0
            ;;
        -p|--prefix)
            KEEP_PREFIX=1
            shift
            ;;
        *)
            if [[ -z "$IPV6_ADDR" ]]; then
                IPV6_ADDR="$1"
                shift
            else
                echo "Error: Extra argument '$1'" >&2
                show_help >&2
                exit 1
            fi
            ;;
    esac
done

if [[ -z "$IPV6_ADDR" ]]; then
    echo "Error: IPv6 address is required" >&2
    show_help >&2
    exit 1
fi

expand_ipv6 "$IPV6_ADDR" "$KEEP_PREFIX"

Running the Script

$ ./ipv6_expand.sh -p 2001:1::5:0:0:8/64
2001:0001:0000:0000:0005:0000:0000:0008/64
$ ./ipv6_expand.sh 2001:1::5:0:0:8
2001:0001:0000:0000:0005:0000:0000:0008

When configuring inter‑node communication in a gbase database cluster that involves IPv6, this script helps you normalize addresses — ensuring they are consistently written in configuration files and simplifying address comparisons during automated deployments.

Who Decides the SSH Cipher? Why the Client's Preference Matters

Michael — Mon, 06 Jul 2026 12:43:24 +0000

A common misconception is that the SSH server unilaterally chooses the encryption algorithm for a connection. In reality, the negotiation follows a strict rule: the server picks the first algorithm from the client's list that it also supports. This means the client's preference order directly determines the final cipher — and putting the best algorithm first can significantly improve performance in your gbase database environment.

What the Standards Say

The OpenSSH documentation states:

The first algorithm in the list (that the client offers to the server) that matches an offer from the server, is what will be selected.

RFC 4253, the SSH Transport Layer Protocol, is even more explicit:

The first algorithm on the client's name-list that satisfies the requirements and is also supported by the server MUST be chosen.

How It Plays Out

Imagine a client sends this cipher list (in order of preference): A, B, C, D, E
The server supports these ciphers: B, C, D, E, A

Both sides support all five ciphers. According to the rule, the first match in the client's list is A, so A becomes the session cipher. The server's own preferred order (B first) is irrelevant; the decision is client-driven.
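The selection rule fits in a few lines of Bash. This is a sketch for illustration only: negotiate_cipher is a hypothetical name, the lists are plain comma-separated strings (the same shape as SSH name-lists), and it ignores the extra requirements RFC 4253 places on key-exchange algorithms.

```shell
#!/usr/bin/env bash
# Sketch of the selection rule: walk the CLIENT's list in order and return
# the first algorithm that also appears in the SERVER's list.
# negotiate_cipher is a hypothetical name; lists are comma-separated strings.
negotiate_cipher() {
    local client_list="$1" server_list="$2"
    local -a client server
    IFS=, read -ra client <<< "$client_list"
    IFS=, read -ra server <<< "$server_list"
    local c s
    for c in "${client[@]}"; do          # client preference order drives the result
        for s in "${server[@]}"; do      # server list is only checked for membership
            if [[ "$c" == "$s" ]]; then
                echo "$c"
                return 0
            fi
        done
    done
    echo "no common cipher" >&2
    return 1
}

negotiate_cipher "A,B,C,D,E" "B,C,D,E,A"   # prints A
negotiate_cipher "B,A" "A,B"               # prints B
```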

Practical Impact for GBase 8a

In a GBase 8a cluster, SSH underpins remote management and SFTP data loading. If you want to prioritise a modern, fast cipher like chacha20-poly1305@openssh.com or aes128-ctr over slower legacy ones, adjust the Ciphers directive in your client's ~/.ssh/config or /etc/ssh/ssh_config and place the preferred algorithm first. This simple ordering can measurably improve throughput for large data transfers.

Host *
    Ciphers chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr

By controlling the cipher preference on the client side, you keep your gbase database operations both secure and fast, with no server-side changes required as long as the server already supports the cipher you list first.

Quick Diagnostic Commands for GBase 8a Cluster Glitches

Michael — Sun, 05 Jul 2026 15:43:00 +0000

When a gbase database cluster experiences sporadic slowdowns, the cause is often isolated to a single unhealthy node. These commands form a rapid triage flow — from cluster‑wide status down to a specific node's operating system resources.

1. Cluster Health Check

Run gcadmin at the OS prompt. Look for any node in CLOSE or OFFLINE state. If everything is OPEN and the cluster is ACTIVE, the problem lies deeper.

2. Find the Longest‑Running SQL on Coordinators

Identify queries that have been executing far longer than normal. This is the first suspect list.

SELECT COORDINATOR_NAME, ID, user, host, command, start_time, time, state,
       substring(info,1,100) info
FROM information_schema.COORDINATORS_TASK_INFORMATION
WHERE command='query' AND time >=0
ORDER BY time DESC LIMIT 10;

3. Pinpoint the Straggling Data Node

Cross‑reference with the data‑node task view. If a query shows 3,600 seconds at the coordinator level but 2,900 seconds on a single data node, that node is your bottleneck.

SELECT NODE_NAME, ID, user, host, command, start_time, time, state,
       substring(info,1,100) info
FROM information_schema.GNODES_TASK_INFORMATION
WHERE command='query' AND info is not null
  AND info not like '%information_schema.processlist%'
ORDER BY time DESC LIMIT 10;

The node_name value (e.g., node3) can be matched to the Nodename field in gcadmin showcluster to obtain the actual IP address.

4. Log into the Suspicious Node and Inspect

4.1 Operating System Errors

dmesg -T | grep -i error

Look for hardware faults, filesystem issues, or OOM events.

4.2 Disk I/O Saturation

iostat -xdc 1

If %util is pegged at 100% or the await column (only r_await and w_await in newer sysstat releases) exceeds 200 ms, the node is almost certainly I/O-bound.
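When a node has many disks, a small filter saves scanning columns by eye. This sketch (flag_busy_devices is a hypothetical name) reads iostat -x output on stdin, finds the %util column from the header line, because its position differs between sysstat versions, and prints devices at or above a threshold. The default of 95 is an assumed cut-off, not a GBase recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "device %util" for every device whose %util
# is at or above the threshold. Reads `iostat -x` style output on stdin and
# locates the %util column from the "Device" header, since its position
# varies between sysstat versions.
flag_busy_devices() {
    local threshold="${1:-95}"
    awk -v t="$threshold" '
        $1 ~ /^Device/ {                      # header line: remember the %util column
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && ($col + 0) >= (t + 0) { print $1, $col }
    '
}

# Live use on the suspicious node (Ctrl-C to stop):
#   iostat -xd 1 | flag_busy_devices 95
```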

4.3 Memory and SWAP Pressure

top

High SWAP usage combined with low free memory will cripple query performance instantly.

5. Check Data Loading Throughput

If import speed is the concern, query the real‑time load status view.

SELECT tb_name, IP, state, ELAPSED_TIME, avg_speed, progress,
       total_size, loaded_size
FROM information_schema.load_status
ORDER BY avg_speed;

As a rule of thumb, SFTP load speeds should stay above 8 MB/s, while FTP typically reaches 40–100 MB/s. Values far below these thresholds suggest network issues, misconfigured load parameters, or disk bottlenecks on the target node.
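To apply these thresholds in a monitoring script, a small helper can classify a measured speed. It is a hypothetical sketch: check_load_speed is an invented name, it uses the lower bound of each range above (8 MB/s for SFTP, 40 MB/s for FTP) as the cut-off, and it assumes the input is already in MB/s.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the rule of thumb above: SFTP loads should reach
# at least 8 MB/s, FTP loads typically 40-100 MB/s. Assumes the speed is
# already expressed in MB/s. Prints "ok" or "slow".
check_load_speed() {
    local protocol="$1" mbps="$2" floor
    case "$protocol" in
        sftp) floor=8 ;;
        ftp)  floor=40 ;;
        *)    echo "unknown protocol: $protocol" >&2; return 2 ;;
    esac
    # awk handles fractional speeds such as 7.5, which Bash arithmetic cannot
    if awk -v s="$mbps" -v f="$floor" 'BEGIN { exit !((s + 0) >= (f + 0)) }'; then
        echo "ok"
    else
        echo "slow"
    fi
}

check_load_speed sftp 12.3   # ok
check_load_speed ftp 15      # slow
```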

Keep these commands in your toolkit and you'll turn vague "the cluster feels slow" reports into precise, actionable diagnostics for your gbase database.

Demystifying SSH with -vvv: How GBase 8a's Underlying Connection Negotiates Encryption

Michael — Sun, 05 Jul 2026 14:39:00 +0000

SSH is the backbone of GBase 8a cluster management, powering everything from remote commands to SFTP data loading. When connection behavior seems sluggish or you need to optimize transport performance, understanding the protocol's inner workings becomes invaluable. This guide walks through the complete SSH connection lifecycle using the verbose ssh -vvv output, revealing how the server and client agree on encryption—and why that matters for your gbase database environment.

Breaking Down the Connection

1. The Handshake: TCP Connection

The client resolves the target IP, loads its configuration, and completes a standard TCP three-way handshake on port 22.

debug1: Connecting to 10.0.2.201 [10.0.2.201] port 22.
debug1: Connection established.

2. Identity File Scan

SSH looks for private keys at the default identity-file paths (such as ~/.ssh/id_rsa) so it can attempt public-key authentication later. When none are found, that method cannot succeed, and the client eventually falls back to password authentication.

3. Protocol Version Agreement

Both sides exchange their software version strings to ensure they speak the same SSH-2.0 protocol. No mismatches found here — good to go.

debug1: Local version string SSH-2.0-OpenSSH_7.4
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4

4. Cipher Suite Negotiation (The Critical Part)

This is where performance and security intersect. Each side sends its list of supported ciphers in order of preference, and both apply the same rule to pick the result: the first cipher on the client's list that the server also supports.

Client's proposal:

debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,...

Server's capabilities:

debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,...,blowfish-cbc,3des-cbc

The negotiated result for each direction:

debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none

Because the client prioritized chacha20-poly1305@openssh.com and the server supports it, this modern, high‑performance authenticated cipher is chosen. The built‑in Poly1305 MAC means no separate message authentication algorithm is needed—hence <implicit>.
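To pull the negotiated cipher out of a long -vvv log without scrolling, the kex lines can be filtered. This is a minimal sketch that assumes the "debug1: kex: <direction> cipher: <name> ..." wording shown above; other OpenSSH releases may phrase the line differently. negotiated_ciphers is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ssh -vvv` debug output on stdin and print the
# cipher negotiated for each direction, e.g.
#   client->server chacha20-poly1305@openssh.com
# Relies on the "debug1: kex: <direction> cipher: <name> ..." line format.
negotiated_ciphers() {
    awk '$2 == "kex:" && $4 == "cipher:" { print $3, $5 }'
}

# Live use: ssh writes debug output to stderr, hence 2>&1. BatchMode=yes
# avoids the password prompt; the kex lines are printed before authentication.
#   ssh -vvv -o BatchMode=yes user@10.0.2.201 exit 2>&1 | negotiated_ciphers
```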

5. Host Authentication

The client verifies the server's ECDSA host key against its known_hosts file. If the fingerprint matches, the connection is trusted; otherwise, SSH warns of a potential man‑in‑the‑middle attack.

6. Session Key Generation

Using the negotiated key‑exchange algorithm, both sides independently compute a symmetric session key. From this point forward, all traffic is encrypted with the agreed chacha20-poly1305 cipher.

7. User Authentication

The client now tries to log in, cycling through available methods:

GSSAPI (Kerberos) – failed, no credentials
Public key – failed, no private key files found
Password – prompts the user for the password

This explains why the server may still prompt for a password even though you never set up password login explicitly: it is the fallback after the preferred methods fail.

What This Means for Your GBase 8a Cluster

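When transferring large datasets via SFTP or running heavy remote commands, the negotiated cipher directly impacts throughput. The -vvv output gives you an instant view of which algorithm is in use. If you see an outdated cipher like aes128-cbc or 3des-cbc, put modern, faster algorithms such as chacha20-poly1305@openssh.com or aes128-ctr first in the Ciphers directive of the client's ssh_config, since the client's order decides the outcome. On the server, the Ciphers directive in sshd_config controls which algorithms are allowed at all, so removing legacy ciphers there prevents them from ever being chosen.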
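A rough check for the legacy ciphers mentioned above can be scripted by name pattern. This is a hypothetical helper (is_legacy_cipher), and its pattern list (CBC modes, 3DES, Blowfish, CAST, RC4/arcfour) is an assumption rather than a complete security policy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (return 0) if a cipher name looks like a
# legacy algorithm worth replacing. The pattern list is an assumption,
# not an exhaustive security policy.
is_legacy_cipher() {
    case "$1" in
        *-cbc|*-cbc@*|3des*|blowfish*|cast128*|arcfour*) return 0 ;;
        *) return 1 ;;
    esac
}

for c in chacha20-poly1305@openssh.com aes128-ctr aes128-cbc 3des-cbc; do
    if is_legacy_cipher "$c"; then echo "$c: legacy"; else echo "$c: modern"; fi
done
```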

A smooth, high‑performance SSH layer is the first step toward reliable operations in any gbase database environment. Next time a connection feels slow, don't guess—run ssh -vvv and read the negotiation line.

Database Industry Trends and Technical Observations

Michael — Sun, 05 Jul 2026 14:37:06 +0000

Overview

In a recent GBASE Tech Cloud Talk, Jiang Chunyu, an industry expert, provided an in-depth analysis of the database industry landscape and emerging technical trends. The talk highlighted the rapid evolution of database technologies and the shifting market dynamics globally and in China.

Key Industry Trends

Global Market Growth

The database market continues to expand, driven by digital transformation across industries. Cloud databases are becoming the dominant deployment model, with cloud-native architectures gaining traction.

China's Database Ecosystem

China's database industry is experiencing significant growth, fueled by government policies and increasing demand for domestic solutions. The market is witnessing a surge in new database products and startups, focusing on both relational and NoSQL databases.

Technical Trends

Multi-Model Convergence

Modern databases are increasingly supporting multiple data models (relational, document, graph, etc.) within a single system. This reduces complexity for developers and improves operational efficiency.

Cloud-Native Architecture

Cloud-native databases leverage containerization, microservices, and serverless computing to provide elasticity, scalability, and cost efficiency. They are well-suited for dynamic workloads.

AI Integration

Artificial intelligence is being integrated into database management systems for autonomous tuning, query optimization, and anomaly detection. This trend is expected to accelerate.

Conclusion

The database industry is at a pivotal moment, with rapid innovation in both technology and business models. Staying updated with these trends is crucial for developers and enterprises alike.

GBase 8s Kernel Evolution: Expressing User Understanding Through Product

Michael — Sun, 05 Jul 2026 14:29:06 +0000

GBase 8s, a database product, is undergoing a kernel-level evolution that reflects a profound understanding of its users. This article explores how the product's core advancements are driven by user needs.

The Core Philosophy

The evolution of GBase 8s is not just about technical upgrades; it's about aligning the product with real-world user scenarios. By focusing on kernel-level changes, the product aims to deliver better performance, stability, and usability.

Key Improvements

Performance Optimization: Kernel enhancements ensure faster query processing and data management.
Stability Enhancements: Core system improvements reduce downtime and increase reliability.
User-Centric Design: Every change is rooted in understanding user pain points and requirements.

Conclusion

GBase 8s's kernel evolution is a testament to how a product can evolve by truly listening to its users. This approach sets a benchmark for database technology development.

GBase 8a: A Decade of Iteration, Empowering Digital Business

Michael — Sun, 05 Jul 2026 14:13:24 +0000

In the rapidly evolving landscape of data analytics, GBase 8a from GBASE (Nanda General) stands out as a homegrown MPP (Massively Parallel Processing) analytical database that has undergone over ten years of continuous technical iteration. From its early standalone version to the current cloud-native data warehouse and lakehouse architecture, GBase 8a has consistently adapted to meet the growing demands of modern enterprises.

The Evolution Path

GBase 8a's journey began as a single-node analytical database, designed to handle structured data with high performance. Over time, it evolved into a distributed MPP system, enabling parallel processing across multiple nodes for faster query execution. The next leap was the integration with cloud environments, resulting in a cloud data warehouse that offers elasticity and scalability. Most recently, GBase 8a adopted a lakehouse architecture, bridging the gap between data lakes and warehouses by supporting both structured and unstructured data, while maintaining ACID transactions and real-time analytics.

Introducing DataAgent: The Intelligent Agent

The latest version of GBase 8a introduces DataAgent, an intelligent agent capability that automates data management tasks. DataAgent can monitor system health, optimize query performance, and even predict resource needs based on historical patterns. This feature reduces the operational burden on DBAs and enhances overall system efficiency.

Real-World Deployments

GBase 8a has been deployed at scale in critical sectors such as finance, telecommunications, and government. For instance, in financial services, it supports real-time risk analysis and fraud detection on petabytes of transaction data. In telecom, it powers customer analytics and network optimization. Government agencies use it for smart city initiatives and public data services.

Core Value in Digital Transformation

As organizations accelerate their digital transformation, GBase 8a provides a robust foundation for data-driven decision-making. Its ability to handle PB-level data with real-time analytics, combined with cost-effective scalability, makes it a key enabler for businesses seeking to leverage data as a strategic asset.

In summary, GBase 8a's continuous innovation—from standalone to cloud, lakehouse, and AI-powered agents—demonstrates its commitment to meeting the evolving needs of digital business. For developers and data engineers, it offers a reliable, high-performance platform for building next-generation analytics applications.

How GBase 8a Handles Masked Columns in WHERE, GROUP BY, and Projections

Michael — Sun, 05 Jul 2026 13:55:00 +0000

GBase 8a supports data masking to protect sensitive information. But when a masked column appears in a WHERE clause, a GROUP BY, an ORDER BY, or when its data is copied to another column — does the database operate on the original values or the masked values? This article clarifies the behaviour through a set of direct tests.

Test Setup

Create a table with a default masking policy on an integer column and insert sample data:

CREATE TABLE "testmask" (
  "id" int(11) DEFAULT NULL MASKED WITH(FUNCTION='DEFAULT()')
);

INSERT INTO testmask VALUES (1), (2), (3), (4), (99);

Key Tests and Findings

1. Projection Returns Masked Values

A user without unmask privileges sees all id values replaced by the default mask 0. This is the basic masking behaviour.

SELECT * FROM testmask;
-- All five rows show id = 0

2. WHERE Filter Uses Original Values

A query with WHERE id = 1 returns exactly one row. Although the projected id still displays the masked value 0, the filter condition operates on the original data. The user can infer that a row with id = 1 exists, but the actual value is never revealed.

SELECT * FROM testmask WHERE id = 1;
-- One row returned; projected id is 0, but the predicate matched the original value 1

3. GROUP BY Uses Original Values

When grouping on a masked column, the aggregation is performed on the original data. The modulo operation id % 3 correctly reflects the distribution of the original values 1, 2, 3, 4, and 99.

SELECT id % 3, COUNT(*) FROM testmask GROUP BY id % 3;
-- Grouping is based on the original id values

4. ORDER BY Uses Original Values

Sorting on a masked column also uses the original data. Observing the rowid alongside the result confirms that rows are ordered by the original id descending, not by the uniform masked value 0.

SELECT rowid, t.* FROM testmask t ORDER BY id DESC;
-- Ordering is based on the original id values

5. Data Migration Writes Masked Values

When a masked column's data is moved, either via UPDATE to another column or via CREATE TABLE … AS SELECT into a new table, the physical data written is the masked value. The target column or table does not inherit the masking policy; it simply stores the already-masked 0.

-- UPDATE migration
ALTER TABLE testmask ADD COLUMN id2 int;
UPDATE testmask SET id2 = id;
SELECT * FROM testmask; -- id2 is all 0, and id2 column has no mask

-- CREATE TABLE ... AS SELECT migration
CREATE TABLE testmask2 AS SELECT * FROM testmask;
SELECT * FROM testmask2; -- both id and id2 are all 0, no mask defined

Summary

The behaviour of data masking in a gbase database follows a clear principle:

Operation	Uses Masked Data	Uses Original Data
Projection (returned to client)	✅	❌
Data migration (UPDATE / CREATE TABLE AS SELECT)	✅ (writes masked value)	❌
WHERE filtering	❌	✅
GROUP BY aggregation	❌	✅
ORDER BY sorting	❌	✅

Masking is applied only when the column is projected, whether the value is returned to the application or copied to another storage location. When the column participates in filtering, grouping, or sorting, the engine operates on the original, unmasked values. This resembles Oracle Data Redaction, where predicates are likewise evaluated against the actual stored values. Understanding this behaviour is essential when designing secure yet performant queries in GBase 8a.

GBase 8a Performance Anomaly Case Study: How a Single Parameter Change Sparked a Chain Reaction

Michael — Sun, 05 Jul 2026 13:25:04 +0000

One seemingly innocent parameter adjustment — increasing group_concat_max_len to accommodate a business requirement — caused a cascade of performance degradation across a GBase 8a production cluster. A simple TOP‑N query that normally completed in seconds suddenly ran for over three hours, and multiple other queries on the same node slowed to a crawl. This article reconstructs the full investigation, from identifying the bottlenecked node to uncovering the hidden chain that turned a 200,000‑row sort into a 10 TB disk write storm.

1. Symptom: One Node’s I/O Pegged at 100%

Monitoring showed several queries exceeding 10,000 seconds of execution time. Cross‑referencing the coordinator‑level task view and the data‑node task view revealed that all slow queries were pinned to node3.

-- Coordinators
SELECT COORDINATOR_NAME, ID, user, host, command, start_time, time, state,
       substring(info,1,100) info
FROM information_schema.COORDINATORS_TASK_INFORMATION
WHERE command='query' AND time >=0
ORDER BY time DESC LIMIT 10;

-- Data nodes – all problematic queries on node3
SELECT NODE_NAME, ID, user, host, command, start_time, time, state,
       substring(info,1,100) info
FROM information_schema.GNODES_TASK_INFORMATION
WHERE command='query' AND info is not null
  AND info not like '%information_schema.processlist%'
ORDER BY time DESC LIMIT 10;

On node3, the iostat output showed disk utilisation at a flat 100%, with write rates hitting 900 MB/s. OS monitoring logs confirmed the spike started exactly when the slow queries began. Digging into the database temporary directory, we found thousands of files with a total size exceeding 10 TB.

2. Pinpointing the Culprit Query and Intermediate Results

The slowest query followed a pattern of three subqueries LEFT JOIN-ed together, with an outer ORDER BY … LIMIT 1000.

select xxxx
from (...) a
left join (...) b on ...
left join (...) c on ...
order by xxx
limit 1000

To rule out a Cartesian product, we materialised each subquery into a temporary table:

Subquery a: grouped 2 billion rows → 200,000 rows
Subquery b: distinct on a dimension → 2,000 rows
Subquery c: two‑table LEFT JOIN + group by → about 20,000 rows

Simplified SELECT COUNT(*) tests confirmed the joins produced exactly 200,000 rows, each completing in under 10 seconds. A sorted output of 200,000 rows with a LIMIT should never require terabytes of temp space — so something else was at play.

3. Root Cause: A Parameter Setting Inflated Column Width, Then Disk Usage

3.1 Find the “Heavy” Column

When we replaced the COUNT(*) with the original projection columns one by one, one column — originating from subquery c — caused the query to stall immediately. Inspecting the structure of the temporary table for subquery c revealed its data type: LONGTEXT.

3.2 Why LONGTEXT?

The original expression for that column was group_concat(xxx). The cluster‑level parameter group_concat_max_len had been changed from the default 32 KB to 1 MB to satisfy another business module.

show variables like '%group_concat_max_len%';
-- returned 1048576 (1 MB)

When GBase 8a creates an intermediate table (e.g., CREATE TABLE tmp AS SELECT …), it must determine the column width before executing the query. Because the parameter was set to 1 MB — far exceeding VARCHAR’s maximum 32 KB — the optimiser conservatively typed the intermediate column as LONGTEXT.

3.3 The Disk Sort Disaster

In version 8.6.2, a sort operation materialises all projection columns. For a LONGTEXT column, the engine pre‑allocates memory based on the maximum possible length of 64 MB per row. With 200,000 rows, that equates to 200,000 × 64 MB ≈ 12.2 TB. Memory cannot hold that, so the data is spilled to disk, producing the observed 10 TB+ of temporary sort files on node3, sustaining >900 MB/s writes for hours.
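The ≈ 12.2 figure comes from reading the sizes in binary units (64 MiB per row, total in TiB):

```latex
\[
200\,000 \times 64\ \text{MiB} = 12\,800\,000\ \text{MiB}
  = \frac{12\,800\,000}{1024^{2}}\ \text{TiB} \approx 12.2\ \text{TiB}
\]
```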

3.4 Why Only Node3?

The main table was randomly distributed. The query’s GROUP BY columns were clttime (low cardinality) and cell_id (high cardinality). During hash redistribution, the first column was chosen as the distribution key, concentrating all intermediate data on a single node. Placing cell_id first or enabling multi‑column hash redistribution would avoid such skew.
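A toy model shows why the choice of first column matters. It assumes a simple modulo hash (node = key mod number of nodes), which is not GBase 8a's real hash function, and it assumes every row in the query shares one clttime value while cell_id is unique per row. rows_per_node and the clttime value are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of hash redistribution skew (NOT GBase 8a's real hash function):
# each row goes to node = key % nodes. Reads one integer key per line on
# stdin and prints how many rows each node receives.
rows_per_node() {
    local nodes="$1"
    awk -v n="$nodes" '{ count[$1 % n]++ } END { for (k in count) print "node" k, count[k] }' | sort
}

# 1,000 rows: clttime is constant (hypothetical value), cell_id is unique.
seq 1 1000 | awk '{ print 2026070513 }' | rows_per_node 4   # every row lands on one node
seq 1 1000 | rows_per_node 4                                # 250 rows per node
```

With clttime as the key, all 1,000 rows collapse onto a single node; with cell_id, they spread evenly across the four nodes.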

4. Solutions

Option 1: Rewrite the SQL (Immediate Fix)

Wrap the group_concat call with a substr to cap the expected output width. The optimiser will then type the intermediate column as VARCHAR, eliminating the pre‑allocation problem entirely.

substr(group_concat(xxx), 1, 1000)

The customer applied this change; the same query finished within 30 seconds.

Option 2: Use a Hint to Override the Parameter Per‑Query

GBase 8a supports hints that temporarily set session‑level parameters for a single statement.

select /*+group_concat_max_len(3000)*/ ...

Option 3: Upgrade to Version 9.5

Version 9.5 improves the materialisation strategy — memory is no longer allocated based on the maximum theoretical column size, but adaptively based on actual data, preventing this entire class of problems.

5. Key Takeaways

A global parameter adjustment can trigger a hidden cascade: “parameter → column‑width estimation → materialisation pre‑allocation → massive disk spill → node‑wide I/O starvation.” When a single parameter cannot satisfy all workloads, use statement‑level hints to give critical queries their own safe configuration, rather than applying a global value that may silently cripple other operations. In a gbase database, understanding how the optimiser interprets parameters is just as important as tuning the parameters themselves.

High-Performance SFTP Loading in GBase 8a: New Parameter Configuration

Michael — Tue, 30 Jun 2026 15:57:23 +0000

In environments with strict security requirements, SFTP is the most common protocol for loading data into a gbase database cluster. The latest versions of GBase 8a introduce configurable parameters that can significantly boost SFTP loading throughput. This guide explains how to assess your current load speed and apply the new parameters for optimization.

When to Consider Tuning SFTP Load Performance

If your current loading speed already meets business requirements, keep the default settings. When speed is lower than expected, first check the per‑node load rates via the information_schema.load_status table. Compare the aggregate cluster load speed against the raw SFTP transfer speed (e.g., a simple sftp get test on a single node). If the cluster’s total load speed is well below 30% of the raw SFTP speed, the parameters described below may help.
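The 30% comparison is simple arithmetic. A hypothetical helper (sftp_tuning_advice) that prints the ratio and a suggestion might look like this; both speeds must be in the same unit, and the raw speed must be non-zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the 30% rule: compare the aggregate cluster load
# speed with the raw single-node SFTP transfer speed (same unit, e.g. MB/s).
# Prints the ratio and "consider tuning" when it is below 30%.
sftp_tuning_advice() {
    local cluster_speed="$1" raw_sftp_speed="$2"
    awk -v c="$cluster_speed" -v r="$raw_sftp_speed" 'BEGIN {
        ratio = c / r
        printf "ratio=%.0f%% ", ratio * 100
        print (ratio < 0.30 ? "consider tuning" : "keep defaults")
    }'
}

sftp_tuning_advice 20 100   # ratio=20% consider tuning
sftp_tuning_advice 45 100   # ratio=45% keep defaults
```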

Key Parameters

`gbase_use_ssh_sftp`

Default: 0 (legacy SFTP loading method)
1: Enable the new high‑performance loading mechanism
Scope: Session‑level

SET gbase_use_ssh_sftp = 1;

`gbase_use_ssh_sftp_algorithm`

Default: Empty (auto‑negotiation of encryption algorithm)
Usage: Specify the encryption algorithm(s) for SFTP transfer, separated by commas
Scope: Session‑level

SET gbase_use_ssh_sftp_algorithm = 'aes128-ctr';

Important: The chosen algorithm must be supported by the SSH daemon on the data source side, otherwise the connection will fail. Select a faster algorithm from the list of those supported by the server’s sshd.
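Because a wrong value makes the connection fail outright, it can be worth checking the value against the server before setting it. This is a minimal sketch that assumes the server's cipher list is taken from the "ciphers" line of `sshd -T` (run as root on the data source host). sftp_algorithms_supported is a hypothetical name, and it conservatively requires every listed algorithm to be supported.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check before SET gbase_use_ssh_sftp_algorithm = '...':
# every algorithm in the comma-separated value must appear in the server's
# comma-separated cipher list.
sftp_algorithms_supported() {
    local wanted="$1" server_ciphers="$2"
    local -a wanted_list
    IFS=, read -ra wanted_list <<< "$wanted"
    local alg
    for alg in "${wanted_list[@]}"; do
        # Wrap both sides in commas so "aes128-ctr" cannot match a longer name
        if [[ ",$server_ciphers," != *",$alg,"* ]]; then
            echo "not supported by sshd: $alg" >&2
            return 1
        fi
    done
    return 0
}

# Example, on the data source host:
#   server=$(sudo sshd -T | awk '$1 == "ciphers" { print $2 }')
#   sftp_algorithms_supported 'aes128-ctr' "$server" && echo "safe to set"
```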

Best Practices

Validate the performance gain in a test environment or via session‑level settings first.
Once the improvement is confirmed, write the parameters into the configuration file for global effect.
The new method will increase CPU, network, and disk consumption; ensure sufficient resources.
An incorrectly configured encryption algorithm will cause SFTP negotiation to fail immediately — choose carefully.

By applying these settings in your gbase database environment, you can overcome SFTP loading bottlenecks while maintaining secure data transfer. Always test first and monitor resource usage after rolling out to production.

Scaling the GBase 8a gcware Management Cluster: Adding and Removing Nodes

Michael — Tue, 30 Jun 2026 15:54:29 +0000

The gcware component in GBase 8a is responsible for cluster state management and consistency arbitration. It's typically deployed on an odd number of nodes (3 or 5). While the number of management nodes rarely changes during a cluster's lifetime, there are scenarios — migrating from a trial to a full production environment, significant data growth, online hardware replacement, or cluster downsizing — that require expanding or shrinking the gcware cluster. This article demonstrates the commands and procedures for scaling gcware in your gbase database environment.

Choosing the Number of Management Nodes

Management nodes do not store user data; they handle coordination. Always deploy an odd number (3 or 5) to maintain quorum. For small clusters (fewer than 20 nodes with 1+1 replica), 3 gcware nodes suffice. For larger clusters (over 50 nodes or 3‑replica setups), 5 nodes are recommended.
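The preference for odd counts follows from majority-quorum arithmetic. This assumes gcware needs a strict majority of its nodes alive, which is the usual meaning of quorum; the article does not spell out gcware's internal rule. Under that assumption, a fourth node raises the quorum size without tolerating any extra failure. The helper names below are hypothetical. "copies" means total copies per data segment (2 for 1+1). The 20 to 50 node range, which the guidance above leaves open, is reported as "3 or 5".

```shell
#!/bin/bash
# Majority-quorum arithmetic for N management nodes (hypothetical helpers).
# Assumption: gcware keeps quorum only while a strict majority is alive.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Rule of thumb from the article: 3 nodes for < 20 nodes with 1+1 replicas,
# 5 nodes for > 50 nodes or 3 copies; the middle range is a judgment call.
recommend_gcware_nodes() {
    local cluster_nodes=$1 copies=$2
    if (( copies >= 3 || cluster_nodes > 50 )); then
        echo 5
    elif (( cluster_nodes < 20 )); then
        echo 3
    else
        echo "3 or 5"
    fi
}

for n in 3 4 5; do
    printf '%d nodes: quorum=%d, tolerated failures=%d\n' \
        "$n" "$(quorum_size "$n")" "$(tolerated_failures "$n")"
done
```

The loop prints quorum=2 with 1 tolerated failure for 3 nodes, quorum=3 with 1 tolerated failure for 4 nodes, and quorum=3 with 2 tolerated failures for 5 nodes. The fourth node adds cost without adding resilience.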

When to Scale gcware

Moving from a pilot (1–2 nodes) to a full production deployment.
Significant business growth requiring more management capacity.
Replacing aging hardware: add new gcware nodes, let the cluster state synchronize to them, then decommission the old ones.
Reducing cluster footprint when business shrinks.

Version Restrictions

Not all GBase 8a versions support gcware scaling. Check with GBASE support whether your version allows it, or request a version that does. Manual workarounds are not recommended for production.

Expanding gcware (Single Node → Three Nodes)

Initial state: a single gcware node at 10.0.2.151. We'll add 10.0.2.152 and 10.0.2.153.

Steps:

Stop all management, coordinator, and data node services.
Navigate to the gcware/gcware_server directory under the installation prefix and run:

./gcserver.py -e \
    --prefix=/opt/gbase \
    --host=10.0.2.152,10.0.2.153 \
    --dbaPwd=111111

After confirmation, the script automatically expands the cluster and restarts the services.

Result: gcadmin now shows three gcware nodes, all with status OPEN.
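A small pre-flight wrapper can catch two avoidable mistakes before the expansion runs: ending up with an even number of gcware nodes, and hard-coding the DBA password in a script. This is a sketch with hypothetical function names. Only the gcserver.py flags come from the article. It assumes the services are already stopped and that you are in the gcware/gcware_server directory.

```shell
#!/bin/bash
# Pre-flight sketch for the gcware expansion (hypothetical helpers).
# Run only after all management, coordinator and data node services
# have been stopped, from the gcware/gcware_server directory.

# Count the non-empty entries in a comma-separated host list.
count_hosts() {
    local -a hosts
    local h n=0
    IFS=',' read -r -a hosts <<< "$1"
    for h in "${hosts[@]}"; do
        if [[ -n "$h" ]]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Print the gcware total after expansion, or fail if it would be even.
check_odd_total() {
    local existing=$1 new_hosts=$2 total
    total=$(( existing + $(count_hosts "$new_hosts") ))
    if (( total % 2 == 0 )); then
        echo "refusing: $total gcware nodes after expansion; keep the count odd" >&2
        return 1
    fi
    echo "$total"
}

# Build the gcserver.py -e command. The password comes from DBA_PWD;
# DRY_RUN=1 (the default) only prints the command with the password masked.
run_expand() {
    local prefix=$1 new_hosts=$2
    if [[ -z "${DBA_PWD:-}" ]]; then
        echo "set DBA_PWD first" >&2
        return 1
    fi
    local -a cmd=(./gcserver.py -e "--prefix=$prefix" "--host=$new_hosts" "--dbaPwd=$DBA_PWD")
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
        printf '%s --dbaPwd=****\n' "${cmd[*]:0:4}"
    else
        "${cmd[@]}"
    fi
}
```

For the example above, check_odd_total 1 10.0.2.152,10.0.2.153 prints 3. Once DBA_PWD is exported, DRY_RUN=0 run_expand /opt/gbase 10.0.2.152,10.0.2.153 runs the real command.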

Shrinking gcware (Three Nodes → Two Nodes)

Important: The shrink command must not be executed on the node being removed. Run it from a remaining node, using the -u flag.

python gcserver.py -u \
    --prefix=/opt/gbase \
    --host=10.0.2.151 \
    --dbaPwd=111111

After confirmation, the script removes the specified node.

Result: gcadmin shows the gcware cluster reduced to two nodes, still functional.
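The main pitfall is running -u on the node being removed, so a guard can sit in front of the command. In this sketch (hypothetical helpers), the list of local addresses is passed in as an argument, for example from `hostname -I` on Linux. Passing it in keeps the check testable.

```shell
#!/bin/bash
# Guard for the shrink step (hypothetical helpers): gcserver.py -u must not
# run on the node being removed.

# Return 0 if target appears in a whitespace-separated list of local IPs.
is_local_host() {
    local target=$1 local_ips=$2 ip
    # $local_ips is intentionally unquoted so it splits on whitespace.
    for ip in $local_ips; do
        if [[ "$ip" == "$target" ]]; then
            return 0
        fi
    done
    return 1
}

guard_shrink() {
    local target=$1 local_ips=$2
    if is_local_host "$target" "$local_ips"; then
        echo "refusing: $target is this node; run the shrink from a remaining gcware node" >&2
        return 1
    fi
    echo "ok: $target is not a local address"
}
```

Usage: guard_shrink 10.0.2.151 "$(hostname -I)" && python gcserver.py -u --prefix=/opt/gbase --host=10.0.2.151 --dbaPwd="$DBA_PWD"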

Summary

Scaling the gcware management cluster is rare but sometimes necessary. GBase 8a provides scripted, automated operations for it in select versions, eliminating the need for error‑prone manual steps. If your environment requires this capability, verify version support and plan the change window accordingly to keep your gbase database cluster stable throughout the process.

GBase 8a Cluster Installation in Practice: From Environment Setup to Health Checks

Michael — Tue, 23 Jun 2026 15:44:00 +0000

The success of a GBase 8a cluster installation often hinges not on the install commands themselves, but on the pre‑installation environment preparation and post‑installation validation. This guide focuses on the critical prerequisites (networking, SSH, system limits, and firewall settings) and walks through a tested workflow for verifying cluster state and distribution configuration.

1. Node Planning and Component Roles

A GBase 8a cluster consists of three component types: gcware (management nodes, 3 or 5 recommended), gcluster (coordinators), and gnode (data nodes). Plan the roles of each node before starting. Here is a sample 3‑node layout:

198.51.100.21  management + coordinator + data node
198.51.100.22  management + coordinator + data node
198.51.100.23  management + coordinator + data node

2. System Prerequisites to Address Before Installation

Nodes must use static IPs, have full network connectivity between them, and have hostname resolution properly configured.

Network and SSH Checks

# Node connectivity (-c stops ping after three probes instead of running forever)
ping -c 3 198.51.100.22
ping -c 3 198.51.100.23

# SSH connectivity
ssh root@198.51.100.22
ssh root@198.51.100.23
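The same checks can run non-interactively for every node in the plan. This sketch assumes the section 1 layout is saved in a text file with the IP in the first column (plan.txt is a hypothetical name). It also assumes Linux iputils `ping`, where -W is a timeout in seconds. `BatchMode=yes` makes ssh fail instead of prompting, so a node that only accepts password logins is reported as failed even if a manual login works.

```shell
#!/bin/bash
# Non-interactive connectivity checks (hypothetical helpers).

# Read the planning table (IP first, roles after) on stdin; print the IPs.
node_ips() {
    local ip rest
    while read -r ip rest; do
        if [[ $ip =~ ^[0-9]+(\.[0-9]+){3}$ ]]; then
            echo "$ip"
        fi
    done
}

# One ICMP probe and one SSH login per host. BatchMode=yes fails instead of
# prompting for a password; -n stops ssh from reading the caller's stdin.
check_node() {
    local host=$1 rc=0
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "$host ping ok"
    else
        echo "$host ping FAILED"
        rc=1
    fi
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true >/dev/null 2>&1; then
        echo "$host ssh ok"
    else
        echo "$host ssh FAILED"
        rc=1
    fi
    return "$rc"
}
```

Usage: node_ips < plan.txt | while read -r h; do check_node "$h"; done. The -n flag matters here. Without it, ssh would consume the remaining host list from the loop's stdin.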

Firewall and SELinux Checks

systemctl status firewalld
sestatus
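To turn these checks into a pass/fail result, their output can be fed to a small function. The target state is an assumption, because the article does not state it: firewalld inactive (or the cluster ports explicitly opened) and SELinux not in Enforcing mode. `prereq_warnings` is a hypothetical name. It takes the output of `systemctl is-active firewalld` and `getenforce`, which give a one-word answer and are easier to compare than `sestatus`.

```shell
#!/bin/bash
# Turn firewalld/SELinux state into warnings (hypothetical helper).
# Assumed target: firewalld not active (or ports opened), SELinux not Enforcing.
prereq_warnings() {
    local firewalld_state=$1 selinux_mode=$2 rc=0
    if [[ "$firewalld_state" == "active" ]]; then
        echo "WARN: firewalld is active; stop it or open the cluster ports"
        rc=1
    fi
    if [[ "$selinux_mode" == "Enforcing" ]]; then
        echo "WARN: SELinux is Enforcing"
        rc=1
    fi
    return "$rc"
}
```

Usage on each node: prereq_warnings "$(systemctl is-active firewalld)" "$(getenforce)"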

3. Handling Non‑Default SSH Ports

If SSH is not running on port 22, specify the custom port either through user‑level SSH configuration or the installation options file.

Option A: User‑level SSH config

# ~/.ssh/config
Host 198.51.100.22 198.51.100.23
    Port 22022

Option B: Install options file

# install.options
sshPort = 22022
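The port can be set in two places, so it is worth confirming they agree before installing. The sketch below (hypothetical helpers) uses `ssh -G`, which prints the client's resolved configuration for a host, including ~/.ssh/config, without connecting. It assumes the installer falls back to port 22 when sshPort is absent from install.options.

```shell
#!/bin/bash
# Cross-check the SSH port from ~/.ssh/config against install.options
# (hypothetical helpers).

# Port the OpenSSH client would use for a host, from `ssh -G`.
effective_ssh_port() {
    ssh -G "$1" 2>/dev/null | awk '$1 == "port" { print $2; exit }'
}

# Read sshPort from an install.options file; assume 22 when it is not set.
options_ssh_port() {
    local port
    port=$(sed -n 's/^[[:space:]]*sshPort[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    echo "${port:-22}"
}

# Compare the two ports for one host and report a mismatch.
ports_match() {
    local host=$1 client_port=$2 options_port=$3
    if [[ "$client_port" != "$options_port" ]]; then
        echo "$host: ssh config uses $client_port, install.options uses $options_port" >&2
        return 1
    fi
    echo "$host: port $client_port"
}
```

Usage: opt=$(options_ssh_port install.options); for h in 198.51.100.22 198.51.100.23; do ports_match "$h" "$(effective_ssh_port "$h")" "$opt"; done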

4. Adjust ulimit and systemd Limits Early

Insufficient file handle or process limits cause instability under concurrent and batch workloads. Raise them consistently in systemd, /etc/profile, and /etc/security/limits.conf.

# /etc/systemd/system.conf
DefaultLimitNOFILE=655350
DefaultLimitNPROC=655350

# Apply the new defaults, then restart sshd so new SSH sessions inherit them
systemctl daemon-reexec
systemctl restart sshd

# Also raise nofile (and nproc) in /etc/profile and /etc/security/limits.conf so login shells get matching limits
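After the changes, a fresh login as the database user shows whether the limits took effect. This sketch (hypothetical helpers) compares `ulimit -n` and `ulimit -u` with a target. It also prints matching soft and hard lines in the `<domain> <type> <item> <value>` format of /etc/security/limits.conf. The value 655350 simply mirrors the systemd settings above.

```shell
#!/bin/bash
# Check the limits a login session actually gets and print a limits.conf
# stanza (hypothetical helpers). Adjust 655350 to your own sizing.
limit_ok() {
    local current=$1 target=$2
    if [[ "$current" == "unlimited" ]]; then
        return 0
    fi
    (( current >= target ))
}

# Emit soft and hard nofile/nproc lines for one user.
limits_conf_lines() {
    local user=$1 value=$2 item type
    for item in nofile nproc; do
        for type in soft hard; do
            printf '%s %s %s %s\n' "$user" "$type" "$item" "$value"
        done
    done
}

target=655350
for pair in "nofile:$(ulimit -n)" "nproc:$(ulimit -u)"; do
    name=${pair%%:*}
    value=${pair#*:}
    if limit_ok "$value" "$target"; then
        echo "$name=$value ok"
    else
        echo "$name=$value below $target"
    fi
done
```

Run the check as gbaseadm over a new SSH login, because that is the kind of session the installer uses. As root, limits_conf_lines gbaseadm 655350 >> /etc/security/limits.conf appends the stanza.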

5. Key Installation Steps

Create the operating system user and directories

useradd gbaseadm
passwd gbaseadm
mkdir -p /data/gbase8a
chown -R gbaseadm:gbaseadm /data/gbase8a
chown gbaseadm:gbaseadm /tmp

Extract the package and run the environment setup script

cd /data
tar xfj GBase8a_MPP_Cluster-NoLicense-FREE-9.5.3-demo-redhat7-x86_64.tar.bz2
python SetSysEnv.py --dbaUser=gbaseadm --installPrefix=/data/gbase8a --cgroup

Write the installation configuration file install.options, specifying the install directory, coordinator hosts, data hosts, management hosts, user credentials, and SSH port.

Run the silent installation

./gcinstall.py --silent=install.options
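For reference, install.options for the three-node layout in section 1 could be generated as shown below. The key names are an assumption based on common GBase 8a 9.5 option samples; they do not come from this article. Compare them with the sample options file shipped in your package before running the installer. `write_install_options` is a hypothetical helper. It reads passwords from DBA_PWD and ROOT_PWD and writes placeholders if they are unset.

```shell
#!/bin/bash
# Sketch: write install.options for the section 1 layout (hypothetical helper).
# ASSUMPTION: key names follow common GBase 8a 9.5 samples; verify them
# against the sample options file in your installation package.
write_install_options() {
    local hosts=$1 port=${2:-22}
    cat <<EOF
installPrefix = /data/gbase8a
coordinateHost = $hosts
dataHost = $hosts
gcwareHost = $hosts
dbaUser = gbaseadm
dbaGroup = gbaseadm
dbaPwd = '${DBA_PWD:-changeme}'
rootPwd = '${ROOT_PWD:-changeme}'
sshPort = $port
EOF
}
```

Usage: DBA_PWD='...' ROOT_PWD='...' write_install_options 198.51.100.21,198.51.100.22,198.51.100.23 22022 > install.options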

6. Validate Cluster Health Immediately After Installation

A completed install script does not guarantee a healthy cluster. Always run gcadmin to verify that the cluster state is ACTIVE and that all gcware, coordinator, and data node roles show OPEN.
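That check can be scripted so it fails loudly. The sketch assumes gcadmin output contains a `CLUSTER STATE:` line and per-node status words such as OPEN, CLOSE, or OFFLINE. The exact layout varies by version, so treat the patterns as a starting point. `cluster_healthy` is a hypothetical name.

```shell
#!/bin/bash
# Pass/fail wrapper around gcadmin output (hypothetical helper).
# Assumption: a "CLUSTER STATE:" line plus OPEN/CLOSE/OFFLINE status words.
cluster_healthy() {
    local out=$1
    if ! grep -Eq 'CLUSTER STATE:[[:space:]]*ACTIVE' <<< "$out"; then
        echo "cluster state is not ACTIVE" >&2
        return 1
    fi
    if grep -Eqw 'CLOSE|OFFLINE' <<< "$out"; then
        echo "some node reports CLOSE/OFFLINE" >&2
        return 1
    fi
    echo "cluster ACTIVE, no CLOSE/OFFLINE nodes"
}
```

Usage, as the database user: cluster_healthy "$(gcadmin)"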

7. Configure and Verify Distribution Settings

Prepare a distribution XML file, apply it with gcadmin distribution, and inspect the result with gcadmin showdistribution node. This step directly determines how data is placed across nodes and how the load is balanced.
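Instead of editing the distribution XML by hand, you can generate it with a helper that emits one node entry per data-node IP. The `<servers><rack><node ip=.../></rack></servers>` layout is assumed from common gcChangeInfo.xml samples. Confirm it against your version's template before applying it. `make_distribution_xml` is a hypothetical name.

```shell
#!/bin/bash
# Build a distribution XML from data-node IPs (hypothetical helper).
# Assumption: the servers/rack/node layout used by common GBase 8a samples.
make_distribution_xml() {
    local ip
    echo '<?xml version="1.0" encoding="utf-8"?>'
    echo '<servers>'
    echo '  <rack>'
    for ip in "$@"; do
        printf '    <node ip="%s"/>\n' "$ip"
    done
    echo '  </rack>'
    echo '</servers>'
}
```

Usage: make_distribution_xml 198.51.100.21 198.51.100.22 198.51.100.23 > gcChangeInfo.xml, then apply the file with gcadmin distribution as described above. The remaining arguments depend on your version and replica plan.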

8. Parameter Tuning Recommendations

Avoid changing many parameters at once. Prioritise based on symptom category: for connection and timeout issues, check max_connections and connect_timeout first; for concurrency and thread pool pressure, look at gbase_parallel_degree; for loading bottlenecks, examine gcluster_loader_max_data_processors.
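To change one thing at a time, first inspect only the parameters tied to the symptom at hand. This sketch maps the three categories above to their parameters and prints statements to review their current values. The category labels and function names are hypothetical. The statements assume the MySQL-compatible SHOW VARIABLES syntax.

```shell
#!/bin/bash
# Map a symptom category to the parameters named above (hypothetical labels).
params_for_symptom() {
    case $1 in
        connection)  echo "max_connections connect_timeout" ;;
        concurrency) echo "gbase_parallel_degree" ;;
        loading)     echo "gcluster_loader_max_data_processors" ;;
        *) echo "unknown symptom: $1" >&2; return 1 ;;
    esac
}

# Print one SHOW VARIABLES statement per parameter for that category.
show_statements() {
    local params p
    params=$(params_for_symptom "$1") || return 1
    for p in $params; do
        printf "SHOW VARIABLES LIKE '%s';\n" "$p"
    done
}
```

Feed the output of show_statements loading to your SQL client (for example gccli), record the current values, and only then change a single parameter.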

A smooth GBase 8a installation relies on getting the basics right before running any installer. When the network, SSH, system limits, and cluster health checks are all solid, the rest of the gbase database operations become far more predictable.
