Lock conflicts and deadlocks are frequent culprits behind business disruptions in GBase 8c distributed clusters. Unlike single-node databases, GBase 8c introduces both local locks and cross-node global locks, making diagnosis more challenging. This guide provides a field-tested workflow covering lock mechanisms, common causes, diagnosis steps, and practical solutions.
1. Core Lock Mechanisms in GBase 8c
1.1 Lock Scope
- Local locks: Confined to a single CN or DN node (e.g., concurrent transactions on one DN).
- Global locks: Span multiple nodes, typical in distributed transactions and DDL operations.
1.2 Common Lock Modes and Conflicts
| Mode | Purpose | Conflict Scenarios |
|---|---|---|
| ShareLock (S) | Shared read lock | Conflicts with X locks |
| ExclusiveLock (X) | Exclusive write lock | Conflicts with all other lock modes |
| AccessExclusiveLock (AEL) | DDL operations (e.g., TRUNCATE) | Blocks all reads and writes |
| Intent Locks (IS/IX) | Table-level intent, coordinates with row-level locks | Ensures table/row lock compatibility |
1.3 Deadlock Types
- Local deadlock: Occurs within a single node, automatically detected and resolved by GBase 8c.
- Global deadlock: Cross-node circular wait; may require manual intervention if auto-resolution fails.
2. Five Common Causes of Lock Issues
- Poor transaction design: Long-running transactions, inconsistent multi-table operation order.
- Ineffective SQL: Missing WHERE clause causing full-table lock, index failure expanding lock scope, DDL during peak hours.
-
Stuck two-phase transactions: Distributed transactions stuck in
preparedstate holding locks. -
Improper cluster parameters: e.g.,
lock_wait_timeouttoo short/long, insufficient thread pool. - Node communication anomalies: Network delays or partitions triggering global deadlocks.
3. Diagnostic Workflow (4 Steps)
3.1 Step 1: Locate Blocked Sessions
-- Find all blocked lock requests
SELECT
pg_locks.pid,
pg_class.relname AS table_name,
pg_locks.mode,
pg_locks.granted,
pg_stat_activity.query,
pg_stat_activity.state
FROM pg_locks
LEFT JOIN pg_class ON pg_locks.relation = pg_class.oid
LEFT JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE pg_locks.granted = false;
Also check for stranded two-phase transactions:
SELECT gxid, status, start_time
FROM pg_distributed_transactions
WHERE status = 'prepared';
3.2 Step 2: Analyze the Lock Wait Chain
WITH lock_chain AS (
SELECT
pg_locks.pid AS wait_pid,
pg_locks.relation,
pg_locks.mode AS wait_mode,
pg_stat_activity.query AS wait_query,
(SELECT pid FROM pg_locks l2 WHERE l2.relation = pg_locks.relation AND l2.granted = true AND l2.mode = 'ExclusiveLock') AS hold_pid
FROM pg_locks
LEFT JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE pg_locks.granted = false
)
SELECT
wait_pid AS blocked_pid,
pg_class.relname AS table_name,
wait_mode,
wait_query,
hold_pid AS holding_pid,
(SELECT query FROM pg_stat_activity WHERE pid = hold_pid) AS holding_query
FROM lock_chain
LEFT JOIN pg_class ON lock_chain.relation = pg_class.oid;
3.3 Step 3: Distinguish Local vs. Global Deadlock
-
Local: Check node logs
grep -i "deadlock" /opt/gbase/data/log/postgresql-*.log. -
Global: Check all CN/DN logs, and query
pg_distributed_transactionsforpreparedtransactions.
3.4 Step 4: Verify Root Cause
- Use
EXPLAIN ANALYZEto check index usage. - Identify transactions running longer than 5 minutes.
- Test node connectivity with
pingandtelnet.
4. Solutions
4.1 Optimise Transaction Design
- Break long transactions; avoid external API calls inside transactions.
- Enforce consistent object order across all transactions (e.g., always update
userbeforeorder).
4.2 Refine SQL and Indexes
-- Add index to prevent full-table scan
CREATE INDEX idx_user_phone ON user(phone);
-- Schedule DDL off-peak, check lock status beforehand
4.3 Clean Up Stuck Prepared Transactions
SELECT pg_terminate_backend(gxid)
FROM pg_distributed_transactions
WHERE status = 'prepared' AND start_time < now() - INTERVAL '1 hour';
4.4 Tune Cluster Parameters
ALTER SYSTEM SET lock_wait_timeout = '10s';
ALTER SYSTEM SET thread_pool_attr = '512, 1, (cpubind:0-3)';
ALTER SYSTEM SET thread_pool_stream_attr = '512, 0.2, 1, (cpubind:0-3)';
-- Apply with SELECT pg_reload_conf(); or restart if necessary.
4.5 Ensure Stable Node Communication
Enable heartbeat checks and adjust inter‑node timeout settings.
5. Real‑World Case Study
Symptom: During peak hours, many queries failed with lock wait timeout, and multiple backend threads hung in waiting state.
Diagnosis: Global lock query revealed two UPDATE transactions in a circular wait — PID 1234 held an X lock on user and waited for order, while PID 5678 held an X lock on order and waited for user. Two prepared transactions also existed.
Resolution:
- Manually terminated the stuck prepared transactions.
- Standardised transaction order across all applications (user → order).
- Adjusted
lock_wait_timeoutto 10s and optimised thread pool settings. - Repaired network latency and enabled node heartbeat detection.
Result: Lock blocks cleared, API response times returned to normal, and no new deadlocks occurred.
6. Common Pitfalls and Best Practices
- Never blindly kill sessions; first identify the lock holder.
- Regularly clean up long‑running
preparedtransactions. - Global deadlocks often need manual intervention.
- Execute DDL only during maintenance windows.
- Periodically verify index effectiveness and query plans.
- Tune lock and thread pool parameters based on actual cluster load.
Effective lock troubleshooting in a gbase database demands a systematic approach — pinpoint the root cause, apply targeted fixes, and establish continuous monitoring to prevent recurrence. Use this guide to keep your GBASE clusters stable and responsive.
Top comments (0)