Michael

Posted on May 2 • Originally published at gbase.cn

Inside GBase 8a Data Consistency: ddlevent, dmlevent, dmlstorageevent

#gbase #database

In a GBase 8a distributed cluster, data consistency between primary and standby nodes is critical for service stability. Three event types — ddlevent, dmlevent, and dmlstorageevent — act as the core messengers for synchronization and recovery. Together with the gcrecover process, they form a layered assurance system. This article analyzes their roles, triggers, and collaborative logic.

1. Core Positioning: Essential Differences

All three events are triggered by the GBase 8a kernel to record differences between primary and standby nodes, but they focus on entirely different inconsistency scenarios and processing priorities:

Event Type	Core Purpose	Inconsistency Focus	Data Carrier
ddlevent	Metadata sync & repair	Table structure, indexes, partition differences	DDL operation logs
dmlevent	Business data sync & repair	Row data differences from INSERT/UPDATE/DELETE	DML operation records
dmlstorageevent	Storage-level emergency repair	Storage anomalies dmlevent cannot fix	Data/metadata file repair instructions

2. Trigger Mechanism: Layered and Progressive

Event triggering follows a progressive principle: handle regular data differences first, then extreme storage failures.

dmlevent: DML-Induced Data Differences

dmlevent is the most frequently triggered of the three events and addresses inconsistency in business data. Typical triggers:

- Log sync delays or network jitter cause the standby data to lag;
- A standby node comes back online after a temporary disconnection;
- Row-level data hashes do not match during primary-standby validation.

After triggering, it records the full DML operation context, which gcrecover replays on the standby.

ddlevent: DDL-Induced Metadata Differences

ddlevent handles structural differences caused by DDL. It takes priority over dmlevent, because DML cannot execute correctly against inconsistent metadata. Typical triggers:

- Standby metadata is not synced after ALTER TABLE or CREATE INDEX;
- Standby metadata files are corrupted;
- Metadata differs after cluster expansion.

dmlstorageevent: The Last Line of Defense

dmlstorageevent is triggered automatically when dmlevent repair fails because of a storage-layer fault, such as:

- Physical files of a standby table are corrupted or deleted;
- Standby metadata files are unreadable;
- Data block checksums fail during dmlevent replay.

Once triggered, the system skips regular DML replay and performs storage-level repair, such as full sync of table files from the primary.
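
To see which tables have reached this stage, one option is to list every queued event for tables that have at least one dmlstorageevent. This is only a sketch: it assumes the gbase.gcluster_event_queue table and the columns used in the monitoring queries of this article, and both should be verified on your cluster version before you rely on it.

```sql
-- Sketch, assuming gbase.gcluster_event_queue exists with these columns.
-- Shows the full event history of every table that has a dmlstorageevent,
-- so earlier dmlevent entries for the same table can be read alongside it.
SELECT target_table,
       event_type,
       event_status,
       create_time
FROM gbase.gcluster_event_queue
WHERE target_table IN (
    SELECT target_table
    FROM gbase.gcluster_event_queue
    WHERE event_type = 'dmlstorageevent'
)
ORDER BY target_table, create_time;
```

Reading the rows per table in time order makes it easier to judge whether a storage event followed an earlier DML replay problem on the same table.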

3. Collaborative Logic: How gcrecover Schedules Events

gcrecover is the executor. Its scheduling logic has three steps:

Collection and Sorting: Events are collected and sorted with ddlevent entries first, then by timestamp in ascending order;
Differentiated Repair:
- ddlevent: Replay DDL on standby, or pull complete metadata files from primary;
- dmlevent: Locate target rows by primary key and execute corresponding DML;
- dmlstorageevent: Validate storage status; if tables are missing, perform a full sync; if files are corrupted, perform file-level repair;
Result Feedback: Failed events enter a retry queue. After 3 failed retries, an alert is triggered.
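
The sort rule in the first step can be approximated with a query against the event queue. This is a sketch of the ordering only, not gcrecover's actual scheduler, and it assumes the gbase.gcluster_event_queue table and column names used in the monitoring queries of this article.

```sql
-- Sketch: approximate the order in which pending events would be handled.
-- ddlevent rows sort first (key 0), all other event types after (key 1);
-- within each group, older events come first.
SELECT event_type,
       target_table,
       create_time
FROM gbase.gcluster_event_queue
WHERE event_status = 'PENDING'
ORDER BY CASE WHEN event_type = 'ddlevent' THEN 0 ELSE 1 END,
         create_time ASC;
```

The CASE expression is what encodes "metadata first"; the timestamp is only a tie-breaker inside each group.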

4. Operational Practice: Monitoring and Diagnosis

1. Check the Event Queue

SELECT event_type, event_status, create_time, target_table 
FROM gbase.gcluster_event_queue 
WHERE event_status = 'PENDING';
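
When the queue is long, a grouped summary is often easier to read than the raw rows. The following sketch uses the same table and columns as the query above (assumed to exist as shown in this article) and adds only standard aggregate functions.

```sql
-- Sketch: count events per type and status, with the age of the oldest one.
-- A growing count or an old oldest_event for a given type suggests
-- that repair is falling behind for that kind of inconsistency.
SELECT event_type,
       event_status,
       COUNT(*)         AS event_count,
       MIN(create_time) AS oldest_event
FROM gbase.gcluster_event_queue
GROUP BY event_type, event_status
ORDER BY event_type, event_status;
```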

2. Identify Failure Causes

SELECT error_msg, event_content 
FROM gbase.gcluster_event_queue 
WHERE event_status = 'FAILED' AND event_type = 'dmlstorageevent';

Common causes of failure: a full disk on the standby, network interruption, or insufficient permissions on the table.
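
Failures that cluster on a few tables usually point to a local cause such as a damaged file or a permission problem, while failures spread across many tables are more consistent with disk or network trouble. A sketch for checking the distribution, again assuming the queue table and columns shown in this article:

```sql
-- Sketch: rank tables by number of failed events, per event type.
SELECT target_table,
       event_type,
       COUNT(*)         AS failed_events,
       MAX(create_time) AS latest_failure
FROM gbase.gcluster_event_queue
WHERE event_status = 'FAILED'
GROUP BY target_table, event_type
ORDER BY failed_events DESC, latest_failure DESC;
```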

3. Manually Trigger Event Repair

CALL gbase.gcrecover_event_process('ddlevent');
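
After a manual trigger it is worth confirming that the queue actually drained. The sketch below repeats the call from this article and then re-checks the status of ddlevent entries. Whether the procedure returns before or after processing completes is not stated here, so treat the check as something to rerun until the counts settle; passing other event type names to the procedure is likewise an untested assumption.

```sql
-- Step 1: trigger repair for DDL events (call taken from this article).
CALL gbase.gcrecover_event_process('ddlevent');

-- Step 2 (sketch): re-check how many ddlevent entries remain per status.
-- Rerun this query if processing turns out to be asynchronous.
SELECT event_status,
       COUNT(*) AS event_count
FROM gbase.gcluster_event_queue
WHERE event_type = 'ddlevent'
GROUP BY event_status;
```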

5. Key Takeaways

- Layered assurance: ddlevent protects structure, dmlevent protects data, dmlstorageevent protects storage;
- Clear prioritization: Metadata first, regular before extreme;
- Ops focus: Prioritize monitoring FAILED dmlstorageevents, which often point to hardware faults or severe metadata issues.

Understanding how these three event types work is fundamental to quickly diagnosing primary-standby inconsistencies and keeping a GBase 8a database running reliably.
