DEV Community

Michael
Michael

Posted on • Originally published at gbase.cn

3 General Methods to Optimize Query Performance in GBase 8a

Performance tuning for a GBase 8a MPP cluster — whether for queries, loading, or updates — falls into three categories, ranked by impact:

  1. Optimize business SQL
  2. Tune database parameters
  3. Add more hardware

This article draws on real‑world experience to break down actionable optimization points in each category for the China‑domestically developed GBase 8a from GBASE.

Optimize Business SQL

This category includes the SQL itself, table structures, and data organization.

Table Structure Tuning

Hash‑Distributed Tables

A well‑chosen hash distribution column avoids extra data redistribution across nodes during GROUP or JOIN operations. If the required data is already local, it can be processed in place. When the hash column is wrong, the engine performs a dynamic hash redistribution — creating temporary tables and moving data over the network — incurring additional disk I/O and network overhead.

Recommendations:

  • Pick columns used in GROUP or JOIN
  • Choose columns with many unique values, avoiding highly repetitive ones

If multiple business lines are equally important, consider creating separate tables with different hash columns. Starting from GBase 8a V9, multi‑column hash is supported, which helps mitigate data skew in single‑column distribution.

Replicated Tables

For tables with a small row count (say, under 1,000,000) or those that change very rarely, a replicated table is ideal. Each node keeps a full copy, so JOINs avoid any redistribution. High‑concurrency small tables also benefit from replication when combined with connection load balancing.

Hash Index

On tables with 10,000,000+ rows per data node, if you have precise (equality) queries against low‑cardinality columns, consider a Global Hash Index. Be aware it only works for exact matches, not range scans or fuzzy searches.

Row‑Store Columns

For queries that return a large number of columns (e.g., detail records), the row‑store grouped format can reduce disk I/O and improve performance.

Full‑Text Index

If you have frequent LIKE operations with patterns longer than 3 characters, a full‑text index is worth evaluating. Very short patterns like '%138%' may need real‑world testing to confirm benefits.

SQL Writing Tips

  • Avoid SELECT *; fetch only the columns you actually need.
  • Remove unnecessary ORDER BY in inner queries. On the outermost level, add LIMIT where possible.
  • Prevent Cartesian products by minding JOIN order and predicates.
  • For GROUP or JOIN with multiple columns and no suitable hash key, place the column with the most distinct values first (e.g., GROUP BY phonenumber, gender).

Workload Scheduling

  • Bulk loading: Merge small files into larger ones (ideally 1–10 GB) before loading to reduce coordination overhead.
  • Concurrency control: Manage connection pool sizes, cron schedules, and task priorities. Heavy jobs should run during off‑peak hours.

Tune Database Parameters

Concurrency Control

High concurrency does not guarantee high throughput. When resources are limited, over‑subscription leads to CPU sys spikes or disks sitting at 100% busy with low throughput.

  • Cluster‑level: Use resource management to cap the number of concurrently running SQL statements; excess queries queue.
  • Per‑query parallelism: Under high concurrency, reduce internal parallelism to relieve resource contention. Key parameters include gbase_parallel_degree and gbase_loader_parallel_degree.

Memory Control

Insufficient memory under heavy concurrency causes out‑of‑memory errors or massive swapping that tanks performance. Adjust memory parameters according to your actual hardware, referring to the common memory configuration guide for GBase 8a.

Add More Hardware

Better hardware provides the ultimate headroom. Use these techniques to allocate resources precisely:

  • Tablespaces: Place critical tables on high‑performance storage like SSDs or flash cards.
  • Virtual Clusters: Leverage physical isolation to dedicate hardware to different business priorities and SLAs.
  • Multi‑Instance Deployment: For servers with very high core counts (e.g., ARM platforms) and large memory (>512 GB), deploy multiple instances with NUMA binding to fully utilize CPU and memory resources.

Balancing these three categories — application‑level optimization, database configuration, and hardware scaling — lets you systematically remove bottlenecks and keep your gbase database performing reliably at scale.

Top comments (0)