<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael</title>
    <description>The latest articles on DEV Community by Michael (@michaelfv).</description>
    <link>https://dev.to/michaelfv</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3830930%2F34d5c2f8-f162-4df3-865b-34a96a64ac17.png</url>
      <title>DEV Community: Michael</title>
      <link>https://dev.to/michaelfv</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michaelfv"/>
    <language>en</language>
    <item>
      <title>How GBase 8a Rough Index Works: Block‑Level Pruning for 10x Faster Queries</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Wed, 20 May 2026 14:53:00 +0000</pubDate>
      <link>https://dev.to/michaelfv/how-gbase-8a-rough-index-works-block-level-pruning-for-10x-faster-queries-2aln</link>
      <guid>https://dev.to/michaelfv/how-gbase-8a-rough-index-works-block-level-pruning-for-10x-faster-queries-2aln</guid>
      <description>&lt;p&gt;The Rough Index is a lightweight, built‑in indexing mechanism of GBase 8a's columnar storage engine. Instead of tracking exact row positions, it records the minimum and maximum value of a column for each data block. When a query runs, the optimizer scans the Rough Index and skips blocks that cannot possibly contain matching rows — drastically cutting disk I/O in a &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Rough Index Works
&lt;/h2&gt;

&lt;p&gt;Imagine an &lt;code&gt;amount&lt;/code&gt; column split into 1,000 blocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Block 1: min 10, max 500&lt;/li&gt;
&lt;li&gt;Block 2: min 501, max 1,200&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a query like &lt;code&gt;WHERE amount &amp;gt; 2000&lt;/code&gt;, the engine skips every block whose max value is ≤ 2000. This &lt;strong&gt;block pruning&lt;/strong&gt; can eliminate over 90% of I/O when data is well‑ordered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verifying That Rough Index Is Active
&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;EXPLAIN&lt;/code&gt; to inspect the plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see &lt;code&gt;Rough Index Scan&lt;/code&gt; or the scanned block count is far lower than the total, the index is working. You can also query the system view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GBASE_ROUGH_INDEX&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Three Tips to Maximize Rough Index Performance
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Keep data ordered during ingestion&lt;/strong&gt;: Sorting by frequently filtered columns (e.g., time) before loading shrinks the min‑max range inside each block.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manually rebuild after heavy updates&lt;/strong&gt;: Run &lt;code&gt;ALTER TABLE orders REBUILD ROUGH INDEX;&lt;/code&gt; after large‑scale modifications or out‑of‑order inserts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefer range predicates in queries&lt;/strong&gt;: &lt;code&gt;BETWEEN&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;&lt;/code&gt; deliver the best pruning. &lt;code&gt;IN&lt;/code&gt; lists and &lt;code&gt;!=&lt;/code&gt; are less effective.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Rough Index is a core performance accelerator of GBASE's columnar engine. Building your data model and load routines around it can yield dramatic speed improvements without any additional hardware in your &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>performance</category>
    </item>
    <item>
      <title>Building a Rock‑Solid HA Strategy for GBase 8a: From Cluster Architecture and Failover to Consistency Repair</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Wed, 20 May 2026 13:45:00 +0000</pubDate>
      <link>https://dev.to/michaelfv/building-a-rock-solid-ha-strategy-for-gbase-8a-from-cluster-architecture-and-failover-to-56af</link>
      <guid>https://dev.to/michaelfv/building-a-rock-solid-ha-strategy-for-gbase-8a-from-cluster-architecture-and-failover-to-56af</guid>
      <description>&lt;p&gt;What separates a reliable GBase 8a deployment from a fragile one isn't how fast a single query runs — it's whether the system keeps serving after a failure, how quickly it recovers, and whether the data stays consistent afterward. This is where high availability (HA) design earns its keep.&lt;/p&gt;

&lt;p&gt;GBase 8a, as a distributed analytical database, structures its HA into three layers: &lt;strong&gt;cluster‑level&lt;/strong&gt;, &lt;strong&gt;node‑level&lt;/strong&gt;, and &lt;strong&gt;process‑level&lt;/strong&gt;. Cluster‑level HA relies on data sync tools and mirror clusters; node‑level HA revolves around Gcluster, Gnode, and Gcware nodes; process‑level HA depends on real‑time monitoring and auto‑recovery of core services.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Three HA Layers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Primary Goal&lt;/th&gt;
&lt;th&gt;Core Capability&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cluster‑level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Survive entire‑cluster failure&lt;/td&gt;
&lt;td&gt;Inter‑cluster sync, mirror clusters&lt;/td&gt;
&lt;td&gt;Remote DR, intra‑city active‑active, read/write split&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Node‑level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Survive single‑node failure&lt;/td&gt;
&lt;td&gt;Gcluster Failover, Gnode multi‑replica, Gcware Raft&lt;/td&gt;
&lt;td&gt;Server crashes, partial node anomalies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Process‑level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Survive service‑process crash&lt;/td&gt;
&lt;td&gt;Process monitoring, auto‑recovery&lt;/td&gt;
&lt;td&gt;Transient faults, self‑healing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This layering directly determines your strategy. If your concern is "a machine goes down but business must continue," focus on node‑level HA. If it's "a whole data center fails and we must switch to another," that's cluster‑level HA.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Cluster‑Level HA: Disaster Recovery vs. Active‑Active
&lt;/h2&gt;

&lt;p&gt;GBase 8a offers two paths at the cluster level: &lt;strong&gt;inter‑cluster sync&lt;/strong&gt; and &lt;strong&gt;mirror clusters&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Sync Mode&lt;/th&gt;
&lt;th&gt;Typical Scenario&lt;/th&gt;
&lt;th&gt;Characteristic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inter‑cluster Sync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Incremental&lt;/td&gt;
&lt;td&gt;Remote DR, T+1 reporting, cascading sync&lt;/td&gt;
&lt;td&gt;Async, DR‑oriented&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mirror Cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real‑time&lt;/td&gt;
&lt;td&gt;Intra‑city active‑active, failover, read/write split&lt;/td&gt;
&lt;td&gt;Real‑time, business‑continuity‑oriented&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The data sync tool supports incremental sync between two homogeneous GBase 8a clusters based on data blocks rather than traditional log replay — far more efficient for massive data volumes. Mirror clusters synchronize data in real time; once the primary writes, data flows instantly to the backup cluster, transparent to applications, and supports read/write splitting on top.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to choose&lt;/strong&gt;: If the primary writes, the standby mainly reads, some sync delay is acceptable, and the focus is remote DR, go with inter‑cluster sync. If the standby must be readable almost immediately after writes, you want to offload read traffic, and smooth intra‑city failover is critical, mirror clusters are the better fit.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Node‑Level HA: The Insurance That Fires Most Often
&lt;/h2&gt;

&lt;p&gt;GBase 8a has three node types — &lt;strong&gt;Gcluster&lt;/strong&gt; (scheduling), &lt;strong&gt;Gnode&lt;/strong&gt; (storage &amp;amp; compute), &lt;strong&gt;Gcware&lt;/strong&gt; (management) — and their HA logic differs.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Gcluster: Don't Let the Entry Point Become a Single Point
&lt;/h3&gt;

&lt;p&gt;Gcluster handles access, authentication, SQL parsing, and scheduling. Gcluster nodes are independent and support Failover: when one node fails, others take over its in‑flight tasks. As long as one healthy Gcluster node remains, the cluster stays online. The real risk is not Gcluster itself, but connecting applications that always point to a single address.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Gnode: Replica Count Determines Fault Tolerance
&lt;/h3&gt;

&lt;p&gt;Gnode stores data and runs computations. Its HA relies on multi‑replica mechanisms. With &lt;strong&gt;3 replicas&lt;/strong&gt;, each piece of data has three copies on different Gnode nodes; even if two nodes become unavailable, the remaining replica still provides access.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Replica Count&lt;/th&gt;
&lt;th&gt;Availability&lt;/th&gt;
&lt;th&gt;Risk Profile&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Almost no node‑level fault tolerance&lt;/td&gt;
&lt;td&gt;Node failure = data unavailable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Some redundancy&lt;/td&gt;
&lt;td&gt;Recovery and consistency pressure higher in edge cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Production‑grade&lt;/td&gt;
&lt;td&gt;Stronger node‑level fault tolerance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;"Three replicas are safer" doesn't mean "the cluster can lose any two machines casually." Actual availability also depends on replica placement, hot‑spot data, Gcware state, and node topology.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3 Gcware: The Arbitration and Consistency Core
&lt;/h3&gt;

&lt;p&gt;Gcware manages cluster metadata consistency and data consistency, using the &lt;strong&gt;Raft protocol&lt;/strong&gt;. As long as the surviving Gcware nodes satisfy Raft's minimum quorum, the Gcware cluster continues to function.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gcware Nodes&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Not for production&lt;/td&gt;
&lt;td&gt;Obvious single point&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Generally discouraged&lt;/td&gt;
&lt;td&gt;Too little quorum margin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Commonly recommended&lt;/td&gt;
&lt;td&gt;Good balance of cost and availability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;For higher availability requirements&lt;/td&gt;
&lt;td&gt;More stable, higher cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  4. Process‑Level HA: Small Faults That Shouldn't Escalate
&lt;/h2&gt;

&lt;p&gt;Core GBase 8a processes (GNode, GCluster, GCware, etc.) are continuously monitored and can auto‑recover after failure. A practical daily check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ps &lt;span class="nt"&gt;-ef&lt;/span&gt; | egrep &lt;span class="s1"&gt;'gcware|gcluster|gnode'&lt;/span&gt;
gcadmin
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-100&lt;/span&gt; /opt/gbase/gcluster/log/system.log
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-100&lt;/span&gt; /opt/gbase/gcware/log/gcware.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Primary‑Replica Inconsistency: The Hardest HA Problem
&lt;/h2&gt;

&lt;p&gt;A node crash is usually detected fast, but replica inconsistency can lurk while the cluster still appears operational — producing drifting results and subtle anomalies. Common causes: inconsistent local parameters, power loss or kernel panic, RAID controller or driver anomalies, VM abnormal exit, manual mistakes (e.g., deleting events during a node outage).&lt;/p&gt;

&lt;p&gt;GBase 8a uses direct I/O for writes; it considers a write successful only when the return confirms it. But if the underlying environment fails, a "successful" write may not have reached physical disk. The lesson: don't blame all consistency issues on database logic — hardware, virtualization, and host stability are integral parts of HA.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Key Parameter: gcluster_suffix_consistency_resolve
&lt;/h2&gt;

&lt;p&gt;GBase 8a provides the &lt;code&gt;gcluster_suffix_consistency_resolve&lt;/code&gt; parameter to handle primary‑replica inconsistency:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0 (default)&lt;/td&gt;
&lt;td&gt;Does not attempt automatic resolution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Tries to automatically resolve consistency issues&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This parameter supports both session and global scope. It can automatically detect and repair scenarios like row‑count mismatches, schema differences, and SCN inconsistencies across replicas. Before enabling in production, verify version support, confirm the cluster has at least 3 host nodes, and validate in a test environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;GLOBAL&lt;/span&gt; &lt;span class="n"&gt;gcluster_suffix_consistency_resolve&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Parameter Consistency: The Most Overlooked Foundation
&lt;/h2&gt;

&lt;p&gt;Many teams obsess over replica counts, active‑active setups, and failover while neglecting the most basic layer: parameter consistency across nodes. Community documentation explicitly lists "parameter differences" as a common cause of replica inconsistency.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter Category&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Consistency‑related&lt;/td&gt;
&lt;td&gt;Uniform across all nodes&lt;/td&gt;
&lt;td&gt;Prevent replica behavior drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource limits&lt;/td&gt;
&lt;td&gt;Uniform across all nodes&lt;/td&gt;
&lt;td&gt;Avoid weak‑link nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log levels&lt;/td&gt;
&lt;td&gt;Adjustable, but keep records&lt;/td&gt;
&lt;td&gt;Troubleshooting convenience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experimental params&lt;/td&gt;
&lt;td&gt;Test environment first&lt;/td&gt;
&lt;td&gt;Reduce production drift&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A quick baseline check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;host &lt;span class="k"&gt;in &lt;/span&gt;203.0.113.41 203.0.113.42 203.0.113.43
&lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"===== &lt;/span&gt;&lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="s2"&gt; ====="&lt;/span&gt;
  ssh &lt;span class="nv"&gt;$host&lt;/span&gt; &lt;span class="s2"&gt;"grep gcluster_suffix_consistency_resolve /opt/gbase/conf/* 2&amp;gt;/dev/null"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  8. Read/Write Split Is Also Part of HA
&lt;/h2&gt;

&lt;p&gt;GBase 8a supports multiple read/write split approaches. Their value isn't just performance — they keep the standby side actively serving reads, so the standby isn't idle, and the switchover cost is lower when needed.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Granularity&lt;/th&gt;
&lt;th&gt;Primary Orientation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Replicated Table&lt;/td&gt;
&lt;td&gt;Table‑level&lt;/td&gt;
&lt;td&gt;Write‑once‑read‑many within a node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mirror Cluster&lt;/td&gt;
&lt;td&gt;Cluster‑level&lt;/td&gt;
&lt;td&gt;Real‑time read/write split&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inter‑cluster Sync&lt;/td&gt;
&lt;td&gt;Cluster‑level&lt;/td&gt;
&lt;td&gt;Scheduled sync, DR‑style read/write split&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  9. Recommended Rollout Sequence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Solidify node‑level HA first&lt;/strong&gt;: Multi‑entry Gcluster access, no single‑replica core data on Gnode, odd‑numbered Gcware nodes for quorum.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Establish parameter and configuration baselines&lt;/strong&gt;: Track all config changes, periodically compare key parameters across nodes, ensure temporary tweaks can be rolled back.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Then choose the cluster‑level path&lt;/strong&gt;: Remote DR with acceptable sync delay → inter‑cluster sync. Intra‑city active‑active with real‑time read/write split → mirror cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finally, run failover drills&lt;/strong&gt;: At minimum, cover Gcluster single‑point failure, Gnode replica node anomaly, Gcware node loss, and primary‑replica inconsistency detection &amp;amp; repair.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  10. Daily Inspection Template
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cluster status&lt;/span&gt;
gcadmin

&lt;span class="c"&gt;# Key processes&lt;/span&gt;
ps &lt;span class="nt"&gt;-ef&lt;/span&gt; | egrep &lt;span class="s1"&gt;'gcware|gcluster|gnode'&lt;/span&gt;

&lt;span class="c"&gt;# Key logs&lt;/span&gt;
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-100&lt;/span&gt; /opt/gbase/gcluster/log/system.log
&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-100&lt;/span&gt; /opt/gbase/gcware/log/gcware.log

&lt;span class="c"&gt;# Parameter consistency across nodes&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;host &lt;span class="k"&gt;in &lt;/span&gt;203.0.113.41 203.0.113.42 203.0.113.43
&lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"===== &lt;/span&gt;&lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="s2"&gt; ====="&lt;/span&gt;
  ssh &lt;span class="nv"&gt;$host&lt;/span&gt; &lt;span class="s2"&gt;"grep gcluster_suffix_consistency_resolve /opt/gbase/conf/* 2&amp;gt;/dev/null"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;GBase 8a HA isn't a single feature — it's a layered system. Looking up: inter‑cluster sync, mirror clusters, read/write split. Looking across: Gcluster, Gnode, Gcware three‑layer node fault tolerance. Looking down: process‑level self‑healing and parameter consistency control. The most stable designs aren't the most complex ones — they're the ones that solidify the node‑level foundation first, then layer on cluster‑level capabilities as the scenario demands.&lt;/p&gt;

&lt;p&gt;A well‑architected &lt;strong&gt;gbase database&lt;/strong&gt; HA strategy keeps your data available and consistent through failures both small and large — and that's what production maturity really looks like.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>consistency</category>
    </item>
    <item>
      <title>GBase 8a Data Sync in Practice: T+1 Replication, Real‑Time Mirroring, and Write‑Once‑Read‑Many</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Wed, 20 May 2026 12:20:34 +0000</pubDate>
      <link>https://dev.to/michaelfv/gbase-8a-data-sync-in-practice-t1-replication-real-time-mirroring-and-write-once-read-many-oh7</link>
      <guid>https://dev.to/michaelfv/gbase-8a-data-sync-in-practice-t1-replication-real-time-mirroring-and-write-once-read-many-oh7</guid>
      <description>&lt;p&gt;Data synchronization in GBase 8a isn't just "primary‑standby replication." Different business requirements — real‑time, disaster recovery, read/write splitting — lead to completely different technical paths. This article organizes three core approaches — &lt;strong&gt;mirror clusters&lt;/strong&gt;, &lt;strong&gt;inter‑cluster sync&lt;/strong&gt;, and &lt;strong&gt;replicated tables&lt;/strong&gt; — into a practical decision framework for your &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Sync Routes at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Timeliness&lt;/th&gt;
&lt;th&gt;Granularity&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mirror Cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real‑time&lt;/td&gt;
&lt;td&gt;Table‑level&lt;/td&gt;
&lt;td&gt;Intra‑city active‑active, real‑time read/write split&lt;/td&gt;
&lt;td&gt;Real‑time sync between two clusters, business continuity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inter‑Cluster Sync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scheduled / Incremental&lt;/td&gt;
&lt;td&gt;Table‑level&lt;/td&gt;
&lt;td&gt;Remote DR, T+1 reporting, cascading distribution&lt;/td&gt;
&lt;td&gt;Supports 1‑to‑many and cascading; delay tied to data volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Replicated Table&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Near real‑time within cluster&lt;/td&gt;
&lt;td&gt;Table‑level&lt;/td&gt;
&lt;td&gt;Local write‑once‑read‑many, hot table read scaling&lt;/td&gt;
&lt;td&gt;Every data node holds an identical copy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key to choosing isn't memorizing names — it's clarifying what problem you're solving. Want the standby cluster to serve queries in near real‑time? Look at mirror clusters first. Need periodic sync to a reporting or DR cluster? Inter‑cluster sync fits better. Just need dimension tables readable across all nodes? Replicated tables are enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mirror Clusters: Real‑Time Active‑Standby and Read/Write Splitting
&lt;/h2&gt;

&lt;p&gt;A mirror cluster aims to get data to the other side as quickly as possible so the standby can serve read traffic continuously — not just during failures. Think of it as &lt;strong&gt;table‑level real‑time mapping&lt;/strong&gt;. It suits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standby must be queryable shortly after writes on the primary&lt;/li&gt;
&lt;li&gt;Intra‑city active‑active or near‑real‑time read/write split&lt;/li&gt;
&lt;li&gt;Tolerates only very small data lag&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the two clusters aren't in the same data center and the network is mediocre, forcing real‑time sync will cause constant instability. Cross‑region, bandwidth‑limited scenarios are often not the best fit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inter‑Cluster Sync: Incremental Distribution and Disaster Recovery
&lt;/h2&gt;

&lt;p&gt;Inter‑cluster sync moves changes on a schedule, accepting minute‑level or even hour‑level delays. It excels at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remote disaster recovery&lt;/li&gt;
&lt;li&gt;T+1 report queries&lt;/li&gt;
&lt;li&gt;One production cluster feeding multiple downstream clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A typical topology: production syncs hourly to a reporting cluster, daily to a DR cluster, and cascades to regional query clusters. It's less demanding on the network than real‑time approaches and more cost‑effective in multi‑downstream, multi‑purpose environments.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Mirror Cluster&lt;/th&gt;
&lt;th&gt;Inter‑Cluster Sync&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Real‑time&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low–Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross‑region suitability&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi‑downstream&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Stronger&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical use&lt;/td&gt;
&lt;td&gt;Active‑active, read/write split&lt;/td&gt;
&lt;td&gt;DR, reporting, distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Replicated Tables: Write‑Once‑Read‑Many Inside the Cluster
&lt;/h2&gt;

&lt;p&gt;A replicated table keeps an identical copy on every node, spreading read pressure and reducing cross‑node costs during queries. It's ideal for small tables, dimension tables, and lookup tables that are read frequently but updated infrequently.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Replicated Table&lt;/th&gt;
&lt;th&gt;Inter‑Cluster Sync&lt;/th&gt;
&lt;th&gt;Mirror Cluster&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scope&lt;/td&gt;
&lt;td&gt;Within cluster&lt;/td&gt;
&lt;td&gt;Between clusters&lt;/td&gt;
&lt;td&gt;Between clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Timeliness&lt;/td&gt;
&lt;td&gt;Intra‑cluster sync&lt;/td&gt;
&lt;td&gt;Scheduled / Incremental&lt;/td&gt;
&lt;td&gt;Real‑time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary value&lt;/td&gt;
&lt;td&gt;Write‑once‑read‑many&lt;/td&gt;
&lt;td&gt;DR / Distribution&lt;/td&gt;
&lt;td&gt;Active‑active / read/write split&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cross‑cluster sync solves "how data reaches another cluster." Replicated tables solve "how to read more easily within the same cluster." They operate at different levels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Decision Dimensions
&lt;/h2&gt;

&lt;p&gt;Before designing a sync strategy, answer these four questions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Leans Mirror Cluster&lt;/th&gt;
&lt;th&gt;Leans Inter‑Cluster Sync&lt;/th&gt;
&lt;th&gt;Leans Replicated Table&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Real‑time requirement&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low–Medium&lt;/td&gt;
&lt;td&gt;High within cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross‑region&lt;/td&gt;
&lt;td&gt;Less stable than scheduled&lt;/td&gt;
&lt;td&gt;Better fit&lt;/td&gt;
&lt;td&gt;Not applicable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Downstream count&lt;/td&gt;
&lt;td&gt;Typically 2 clusters&lt;/td&gt;
&lt;td&gt;1‑to‑many, cascading&lt;/td&gt;
&lt;td&gt;Not applicable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DR orientation&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Not for cross‑cluster DR&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In short: intra‑city near‑real‑time active‑active / read‑write split → mirror cluster; remote DR / T+1 reporting / multi‑downstream → inter‑cluster sync; hot table read optimization within a cluster → replicated tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Common Pitfalls
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Over‑idealizing real‑time requirements&lt;/strong&gt; — if clusters span data centers with average networks, real‑time sync will be fragile.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treating sync as "automatic full‑database replication"&lt;/strong&gt; — GBase 8a sync is mostly table‑level. Permissions, job chains, views, scripts, and application connections need separate planning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watching only "sync success" without verifying "sync usage"&lt;/strong&gt; — data arriving is not enough. Check whether queries actually shifted to the target, whether the DR link can really switch over, and whether downstream clusters are truly being used.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Pre‑Launch Checklist
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strategy level&lt;/strong&gt;: Confirm sync mode (real‑time/scheduled/incremental), granularity (table‑level), direction (one‑way/two‑way), downstream count, and network path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational level&lt;/strong&gt;: Continuously monitor sync lag, incremental backlog, downstream query latency, key‑table row‑count validation, and whether any critical tables are missing from the sync scope.&lt;/p&gt;

&lt;p&gt;A simple daily check: compare row counts between source and target for key tables on the same date.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Rollout Sequence
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clarify the goal first&lt;/strong&gt;: Remote DR → inter‑cluster sync. Intra‑city active‑active → mirror cluster. Report offloading → either works. Hot intra‑cluster tables → replicated tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer the objects&lt;/strong&gt;: Separate core fact tables, report summary tables, and dimension/lookup tables. Decide which must be real‑time, which can be hourly, and which only need intra‑cluster replication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confirm windows and link conditions&lt;/strong&gt;: sub‑second requirements? minute‑level tolerable? cross‑region? multiple downstream clusters?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;GBase 8a data sync is a layered, well‑defined capability set. Not every sync needs to be real‑time — the key is reserving real‑time capacity for the places that truly need it in your &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>replication</category>
    </item>
    <item>
      <title>Debugging Slow SQL in GBase 8a: How Execution Plans, Distribution Keys, and Parallelism Go Off the Rails Together</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Wed, 20 May 2026 10:35:53 +0000</pubDate>
      <link>https://dev.to/michaelfv/debugging-slow-sql-in-gbase-8a-how-execution-plans-distribution-keys-and-parallelism-go-off-the-2i7</link>
      <guid>https://dev.to/michaelfv/debugging-slow-sql-in-gbase-8a-how-execution-plans-distribution-keys-and-parallelism-go-off-the-2i7</guid>
      <description>&lt;p&gt;A slow query in a &lt;strong&gt;gbase database&lt;/strong&gt; MPP cluster is rarely just a matter of sloppy SQL. More often, it's the combined effect of data distribution, execution plan choices, parallelism settings, and resource contention — all interacting at once. If you apply single‑node OLTP tuning habits here, you'll usually make things worse.&lt;/p&gt;

&lt;p&gt;This article lays out a complete troubleshooting path, from deciding whether a query is &lt;em&gt;truly&lt;/em&gt; slow to fixing distribution‑key mistakes and reining in runaway parallelism.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. First, Distinguish "Naturally Heavy" from "Suddenly Slow"
&lt;/h2&gt;

&lt;p&gt;Analytical workloads run anywhere from seconds to hours, so there's no single threshold that flags a slow query. Instead, compare execution times against historical baselines for the same task.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Naturally heavy SQL&lt;/strong&gt; — always slow but stable. Look at the data model, distribution keys, and whether pre‑aggregation would help.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suddenly slow SQL&lt;/strong&gt; — the same task recently degraded. Check locks, execution plan changes, data skew, and parallel‑resource contention.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Collect periodic task durations via &lt;code&gt;audit_log&lt;/code&gt; and correlate them with table‑growth trends to build a daily monitoring baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Rule Out Wait States First
&lt;/h2&gt;

&lt;p&gt;Many queries appear slow because they haven't actually started running — they're stuck on a lock or queued behind other work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;PROCESSLIST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see states like &lt;code&gt;checking permission&lt;/code&gt;, suspect lock waits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcadmin showlock
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two common traps: treating a queued session as a slow execution (rewriting the SQL won't help), and mistaking a single slow node for a cluster‑wide problem. In a 16‑node cluster, if 15 nodes finish in 2 minutes and one node takes 18, the entire query feels slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Watch Data Movement in the Execution Plan
&lt;/h2&gt;

&lt;p&gt;Key motion operators in a GBase 8a plan[reference:8]:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operator&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RESULT&lt;/td&gt;
&lt;td&gt;Send result to client&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GATHER&lt;/td&gt;
&lt;td&gt;Collect data at the coordinator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REDIST(...)&lt;/td&gt;
&lt;td&gt;Hash‑redistribute on a column&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BROADCAST&lt;/td&gt;
&lt;td&gt;Replicate / broadcast result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAND REDIST&lt;/td&gt;
&lt;td&gt;Random redistribution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In an MPP context, the most important question is: &lt;strong&gt;how many times does the plan move data around?&lt;/strong&gt; A classic costly scenario: a large fact table is not distributed on its JOIN key, and the dimension table is not replicated — causing multiple &lt;code&gt;REDIST&lt;/code&gt; steps that generate extra disk I/O, network transfer, and write overhead on the receiving side.&lt;/p&gt;

&lt;p&gt;The community documentation highlights three core issues to look for in execution plans: unreasonable dynamic redistribution, extra redistribution caused by mismatched column types, and Cartesian products triggered by poor JOIN ordering[reference:9].&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Wrong Distribution Keys Set the Stage for Slow SQL
&lt;/h2&gt;

&lt;p&gt;Make the large‑table JOIN column the Hash distribution key whenever possible; the GROUP BY column is a second choice. GBase 8a V9 also supports multi‑column Hash to reduce skew[reference:10].&lt;/p&gt;

&lt;p&gt;The guiding principle: &lt;strong&gt;do the most expensive work locally, on the node where the data already sits. Avoid shredding and redistributing massive data sets just to run a JOIN or GROUP.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a fact table grows by hundreds of millions of rows a day and most core reports slice by customer, distributing on &lt;code&gt;customer_id&lt;/code&gt; is far better than the "neutral" &lt;code&gt;order_id&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;dws_order_day&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;stat_date&lt;/span&gt;   &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;    &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt;  &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pay_amount&lt;/span&gt;  &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;city_id&lt;/span&gt;     &lt;span class="nb"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DISTRIBUTED&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;HASH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For small, slowly‑changing dimension tables, use replication so that JOINs with large tables don't trigger redistribution[reference:11].&lt;/p&gt;

&lt;p&gt;Three easily overlooked issues: pick a hash column with many distinct values to avoid node skew[reference:12]; ensure JOIN columns have matching types on both sides; don't pin all hope on single‑table indexes — in analytical workloads, how data is distributed and moved usually dominates performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Rewrite SQL to Work &lt;em&gt;With&lt;/em&gt; Columnar Storage and MPP Execution
&lt;/h2&gt;

&lt;p&gt;Avoid &lt;code&gt;SELECT *&lt;/code&gt;. Remove unnecessary inner &lt;code&gt;ORDER BY&lt;/code&gt;. Prevent Cartesian JOINs. For multi‑column GROUP BY / JOIN, place the column with the most distinct values first.&lt;/p&gt;

&lt;p&gt;A common anti‑pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ❌ SELECT * defeats columnar benefits, inner ORDER BY is pointless, JOIN types mismatch&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dwd_order_detail&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;stat_date&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-07'&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;dim_customer&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A safer rewrite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ✅ Slim columns first, aggregate locally on the fact table, then JOIN with matching types&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_cnt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_level&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_cnt&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dwd_order_detail&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;stat_date&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-07'&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;dim_customer&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Use Hash Indexes Judiciously
&lt;/h2&gt;

&lt;p&gt;Hash indexes excel at single‑table equality lookups with small result sets. They don't help with range scans or fuzzy queries[reference:13]. For columns with few distinct values and frequent exact queries, a Global Hash index is a common choice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_user_id&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;dwd_user_behavior&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;HASH&lt;/span&gt; &lt;span class="k"&gt;GLOBAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But indexes have a maintenance cost: they can slow down data loading and DML. For real‑time loading scenarios, stage data into an index‑free temporary table first, then bulk‑move into the indexed target or build the index in one shot[reference:14]. On large fact tables, always tackle distribution keys and SQL patterns before reaching for Hash indexes.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. In High‑Concurrency Scenarios, &lt;em&gt;Dialing Back&lt;/em&gt; Parallelism Often Stabilizes Performance
&lt;/h2&gt;

&lt;p&gt;Key parameters: &lt;code&gt;gbase_parallel_execution&lt;/code&gt; (parallel enable), &lt;code&gt;gbase_parallel_degree&lt;/code&gt; (max parallelism per SQL), &lt;code&gt;gbase_parallel_max_thread_in_pool&lt;/code&gt; (thread pool size)[reference:15].&lt;/p&gt;

&lt;p&gt;Higher concurrency does not automatically mean higher performance. When CPU, disk, and memory are finite, too much parallelism results in elevated CPU sys time, 100% disk busy with low throughput, and swapping.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low concurrency, few heavy queries&lt;/strong&gt;: raise &lt;code&gt;gbase_parallel_degree&lt;/code&gt; (e.g., 16).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High peak concurrency, many batch tasks&lt;/strong&gt;: lower single‑SQL parallelism (e.g., 4) and cap the thread pool (e.g., 64) so that more tasks can make progress and the pool isn't drained by a few large queries[reference:16].&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. A Complete Troubleshooting Flow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Confirm it's truly a regression&lt;/strong&gt; — compare historical runtimes and data‑volume trends.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rule out waiting&lt;/strong&gt; — &lt;code&gt;SHOW PROCESSLIST&lt;/code&gt;, &lt;code&gt;gcadmin showlock&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inspect the execution plan&lt;/strong&gt; — watch for REDIST, GATHER, BROADCAST and mismatched JOIN column types.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Revisit table design&lt;/strong&gt; — do the distribution keys align with the core JOIN / GROUP paths? Should small tables be replicated?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tune parameters&lt;/strong&gt; — &lt;code&gt;gbase_parallel_degree&lt;/code&gt;, thread pool, memory relative to concurrency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluate indexes last&lt;/strong&gt; — is this genuinely a single‑table exact‑match problem? Can you afford the maintenance cost?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At its heart, performance tuning in a &lt;strong&gt;gbase database&lt;/strong&gt; is about minimizing unnecessary data movement. Keep data on the node where the work happens, aggregate early, and don't let parallelism cannibalize itself. Build these habits into your model design and job scheduling, and you'll stop a large share of slow queries before they ever reach production.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>parallelism</category>
    </item>
    <item>
      <title>GBase 8a Partitioning Best Practices: From Design to Operations</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Tue, 19 May 2026 11:45:00 +0000</pubDate>
      <link>https://dev.to/michaelfv/gbase-8a-partitioning-best-practices-from-design-to-operations-ma6</link>
      <guid>https://dev.to/michaelfv/gbase-8a-partitioning-best-practices-from-design-to-operations-ma6</guid>
      <description>&lt;p&gt;In a &lt;strong&gt;gbase database&lt;/strong&gt; handling tens of billions of rows, smart partitioning is no longer optional — it's mandatory. This guide covers partition type selection, pruning validation, daily maintenance, and common pitfalls to help you build a solid partitioning strategy for your GBASE cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported Partitioning Types
&lt;/h2&gt;

&lt;p&gt;GBase 8a offers four partitioning methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RANGE&lt;/strong&gt; – split by column value intervals (ideal for time‑series data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LIST&lt;/strong&gt; – split by discrete enumerated values (e.g., region, status)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HASH&lt;/strong&gt; – distribute evenly via hash modulus&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KEY&lt;/strong&gt; – similar to HASH, but the database chooses the hash function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, &lt;strong&gt;RANGE partitioning&lt;/strong&gt; on a date column is the most common choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  RANGE Partitioning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Monthly Partitioning Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;     &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;  &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_date&lt;/span&gt;   &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;       &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;-- partition key must be part of PK&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202401&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-02-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202402&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-03-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202403&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-04-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202404&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-05-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202405&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-06-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202406&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-07-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_future&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="k"&gt;MAXVALUE&lt;/span&gt;  &lt;span class="c1"&gt;-- catch-all&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The partition key column &lt;strong&gt;must be included in the primary key or unique key&lt;/strong&gt;, otherwise the DDL will fail.&lt;/li&gt;
&lt;li&gt;For daily partitioning, keep the total partition count &lt;strong&gt;under 1024&lt;/strong&gt; to avoid metadata overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  LIST Partitioning Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sales_regional&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;sale_id&lt;/span&gt;   &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;region&lt;/span&gt;    &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;    &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;LIST&lt;/span&gt; &lt;span class="n"&gt;COLUMNS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_north&lt;/span&gt;   &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'北京'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'天津'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'河北'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'山西'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'内蒙古'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_east&lt;/span&gt;    &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'上海'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'江苏'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'浙江'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'安徽'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'山东'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_south&lt;/span&gt;   &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'广东'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'广西'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'海南'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'福建'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_central&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'湖北'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'湖南'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'河南'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'江西'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_west&lt;/span&gt;    &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'四川'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'重庆'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'云南'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'贵州'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'西藏'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_other&lt;/span&gt;   &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'其他'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Partition Pruning
&lt;/h2&gt;

&lt;p&gt;Pruning kicks in &lt;strong&gt;only when the &lt;code&gt;WHERE&lt;/code&gt; clause references the partition key&lt;/strong&gt;. Verify it with &lt;code&gt;EXPLAIN PARTITIONS&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;PARTITIONS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-31'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;✅ If only &lt;code&gt;p202403&lt;/code&gt; appears in the &lt;code&gt;partitions&lt;/code&gt; column, pruning works.&lt;/li&gt;
&lt;li&gt;❌ If all partitions are listed, pruning is disabled — check your query.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Common pruning killers&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Applying a function: &lt;code&gt;WHERE MONTH(order_date) = 3&lt;/code&gt; ❌&lt;/li&gt;
&lt;li&gt;Implicit type conversion: &lt;code&gt;WHERE order_date = 20240301&lt;/code&gt; ❌&lt;/li&gt;
&lt;li&gt;Mixing partition-key and non‑key conditions with &lt;code&gt;OR&lt;/code&gt; ❌ — consider &lt;code&gt;UNION ALL&lt;/code&gt; instead.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Day‑to‑Day Partition Maintenance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Add a New Partition
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
    &lt;span class="n"&gt;REORGANIZE&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_future&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202407&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-08-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_future&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="k"&gt;MAXVALUE&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Always update statistics immediately!&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Drop Historical Partitions (orders of magnitude faster than DELETE)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202101&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p202102&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p202103&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Merge or Split Partitions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Merge two small partitions&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
    &lt;span class="n"&gt;REORGANIZE&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p202402&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2024_q1&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-03-01'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Split a large partition&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
    &lt;span class="n"&gt;REORGANIZE&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_future&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202407&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-08-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202408&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_DAYS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-09-01'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p_future&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="k"&gt;MAXVALUE&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Too many partitions&lt;/strong&gt;: Over &lt;strong&gt;500&lt;/strong&gt; partitions cause metadata bloat and slower plan generation. Aim for &lt;strong&gt;50–200&lt;/strong&gt; partitions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low‑cardinality partition key&lt;/strong&gt;: Using a column like &lt;code&gt;status&lt;/code&gt; (3 values) causes severe data skew.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No catch‑all partition&lt;/strong&gt;: Without &lt;code&gt;MAXVALUE&lt;/code&gt;, out‑of‑range inserts fail immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skipping &lt;code&gt;ANALYZE TABLE&lt;/code&gt; after partition changes&lt;/strong&gt;: The optimizer sees empty partitions and may choose poor plans.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Automate Future Partition Creation
&lt;/h2&gt;

&lt;p&gt;A simple script added to crontab ensures you never forget to create next month's partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;NEXT_MONTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"+1 month"&lt;/span&gt; +%Y%m&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;NEXT_MONTH_START&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"+1 month"&lt;/span&gt; +%Y-%m-01&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;NEXT_NEXT_MONTH_START&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"+2 month"&lt;/span&gt; +%Y-%m-01&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;PARTITION_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"p&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NEXT_MONTH&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

mysql &lt;span class="nt"&gt;-h&lt;/span&gt; 192.168.1.10 &lt;span class="nt"&gt;-P&lt;/span&gt; 5258 &lt;span class="nt"&gt;-u&lt;/span&gt; gbase &lt;span class="nt"&gt;-pyour_password&lt;/span&gt; sales_db &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
ALTER TABLE orders
    REORGANIZE PARTITION p_future INTO (
        PARTITION &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PARTITION_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt; VALUES LESS THAN (TO_DAYS('&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NEXT_NEXT_MONTH_START&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;')),
        PARTITION p_future VALUES LESS THAN MAXVALUE
    );
ANALYZE TABLE orders;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Crontab: &lt;code&gt;0 1 1 * * /bin/bash /usr/local/scripts/auto_add_partition.sh&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Decision Flow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Estimate data size: below &lt;strong&gt;10 million&lt;/strong&gt; rows → no partition needed; above &lt;strong&gt;10 million&lt;/strong&gt; → consider partitioning.&lt;/li&gt;
&lt;li&gt;Date column? → RANGE. Discrete categories? → LIST. No obvious pattern? → HASH.&lt;/li&gt;
&lt;li&gt;Granularity: daily inserts &amp;gt; &lt;strong&gt;5 million&lt;/strong&gt; → partition by day; &lt;strong&gt;1–5 million&lt;/strong&gt; → by month; &amp;lt; &lt;strong&gt;1 million&lt;/strong&gt; → by quarter or year.&lt;/li&gt;
&lt;li&gt;Ensure the partition key is in the primary key, then validate pruning with &lt;code&gt;EXPLAIN&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A well‑designed partitioning layout is the foundation of query performance in a &lt;strong&gt;gbase database&lt;/strong&gt;. Spend the time to get it right at design time, and you'll save yourself a world of trouble later.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>performance</category>
    </item>
    <item>
      <title>GBase 8a Backup and Restore Guide: Full and Incremental Backups with gbackup</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Tue, 19 May 2026 10:36:00 +0000</pubDate>
      <link>https://dev.to/michaelfv/gbase-8a-backup-and-restore-guide-full-and-incremental-backups-with-gbackup-37n7</link>
      <guid>https://dev.to/michaelfv/gbase-8a-backup-and-restore-guide-full-and-incremental-backups-with-gbackup-37n7</guid>
      <description>&lt;p&gt;GBase 8a stores massive analytical datasets across many nodes. If hardware fails, someone drops a table, or an upgrade goes wrong, a solid backup strategy is your only lifeline. This guide covers the official &lt;code&gt;gbackup&lt;/code&gt; tool for full and incremental backups, plus &lt;code&gt;mysqldump&lt;/code&gt; for finer-grained tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  gbackup vs. mysqldump
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;gbackup&lt;/th&gt;
&lt;th&gt;mysqldump&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Physical backup&lt;/td&gt;
&lt;td&gt;Logical backup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast (copies data files)&lt;/td&gt;
&lt;td&gt;Slow (generates SQL statements)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Restore speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Slow (re‑executes every statement)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Granularity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Database / cluster&lt;/td&gt;
&lt;td&gt;Database, table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross‑platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Weaker&lt;/td&gt;
&lt;td&gt;Strong (standard SQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full disaster recovery, large volumes&lt;/td&gt;
&lt;td&gt;Table migration, cross‑platform exports&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For production, &lt;code&gt;gbackup&lt;/code&gt; is the primary tool. &lt;code&gt;mysqldump&lt;/code&gt; fills the gaps when you need per‑table exports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Backup with gbackup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before You Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify all nodes are healthy&lt;/span&gt;
gcadmin showall

&lt;span class="c"&gt;# Ensure the backup directory has enough space (1.5× data size recommended)&lt;/span&gt;
&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt; /backup/gbase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run a Full Backup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic full backup&lt;/span&gt;
gbackup &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/full_20240601 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backup-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;full &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gbase &lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_password &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coordinator_ip &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5258

&lt;span class="c"&gt;# Compressed full backup (saves 30%–50% storage)&lt;/span&gt;
gbackup &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/full_20240601 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backup-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;full &lt;span class="nt"&gt;--compress&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gbase &lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_password &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coordinator_ip &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5258
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verify the Backup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-lh&lt;/span&gt; /backup/gbase/full_20240601/
&lt;span class="nb"&gt;cat&lt;/span&gt; /backup/gbase/full_20240601/backup.log | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;
&lt;span class="c"&gt;# Look for "backup completed successfully"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Incremental Backups with gbackup
&lt;/h2&gt;

&lt;p&gt;Incremental backups capture only changes since the last backup, so they're fast and compact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gbackup &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/incr_20240602 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backup-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;incremental &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--incremental-basedir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/full_20240601 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gbase &lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_password &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coordinator_ip &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5258
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Recommended cadence&lt;/strong&gt;: Full on Sunday, incrementals Monday–Saturday, new full next Sunday. At worst, you lose one day of data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Restoring from Backups
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Full Restore
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stop the database first&lt;/span&gt;
gcadmin stop

grestore &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/full_20240601 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gbase &lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_password &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coordinator_ip &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5258

gcadmin start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Full + Incremental Restore
&lt;/h3&gt;

&lt;p&gt;Restore the full backup, then apply each incremental in sequence. The last incremental omits &lt;code&gt;--apply-log-only&lt;/code&gt; to finalize the consistent state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Restore full backup (don't commit yet)&lt;/span&gt;
grestore &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/full_20240601 &lt;span class="nt"&gt;--apply-log-only&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gbase &lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_password

&lt;span class="c"&gt;# Apply first incremental&lt;/span&gt;
grestore &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/incr_20240602 &lt;span class="nt"&gt;--apply-log-only&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--incremental-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/incr_20240602

&lt;span class="c"&gt;# Apply last incremental (no --apply-log-only)&lt;/span&gt;
grestore &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/full_20240601 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--incremental-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase/incr_20240603

gcadmin start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  mysqldump for Fine-Grained Tasks
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Export a single database&lt;/span&gt;
mysqldump &lt;span class="nt"&gt;-h&lt;/span&gt; coordinator_ip &lt;span class="nt"&gt;-P&lt;/span&gt; 5258 &lt;span class="nt"&gt;-u&lt;/span&gt; gbase &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--databases&lt;/span&gt; sales_db &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /backup/sales_db_20240601.sql

&lt;span class="c"&gt;# Export a single table&lt;/span&gt;
mysqldump &lt;span class="nt"&gt;-h&lt;/span&gt; coordinator_ip &lt;span class="nt"&gt;-P&lt;/span&gt; 5258 &lt;span class="nt"&gt;-u&lt;/span&gt; gbase &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  sales_db orders &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /backup/orders_20240601.sql

&lt;span class="c"&gt;# Import back&lt;/span&gt;
mysql &lt;span class="nt"&gt;-h&lt;/span&gt; coordinator_ip &lt;span class="nt"&gt;-P&lt;/span&gt; 5258 &lt;span class="nt"&gt;-u&lt;/span&gt; gbase &lt;span class="nt"&gt;-p&lt;/span&gt; sales_db &amp;lt; /backup/orders_20240601.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Automating with Cron
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;DATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;BACKUP_BASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/backup/gbase
&lt;span class="nv"&gt;COORD_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;192.168.1.10

&lt;span class="nv"&gt;DAY_OF_WEEK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%u&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DAY_OF_WEEK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 7 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nv"&gt;BACKUP_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_BASE&lt;/span&gt;/full_&lt;span class="nv"&gt;$DATE&lt;/span&gt;
  gbackup &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_DIR&lt;/span&gt; &lt;span class="nt"&gt;--backup-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;full &lt;span class="nt"&gt;--compress&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gbase &lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_password &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$COORD_IP&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5258
&lt;span class="k"&gt;else
  &lt;/span&gt;&lt;span class="nv"&gt;LAST_FULL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-dt&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_BASE&lt;/span&gt;/full_&lt;span class="k"&gt;*&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;BACKUP_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_BASE&lt;/span&gt;/incr_&lt;span class="nv"&gt;$DATE&lt;/span&gt;
  gbackup &lt;span class="nt"&gt;--backup-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_DIR&lt;/span&gt; &lt;span class="nt"&gt;--backup-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;incremental &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--incremental-basedir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$LAST_FULL&lt;/span&gt; &lt;span class="nt"&gt;--compress&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gbase &lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_password &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$COORD_IP&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5258
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Remove backups older than 30 days&lt;/span&gt;
find &lt;span class="nv"&gt;$BACKUP_BASE&lt;/span&gt; &lt;span class="nt"&gt;-maxdepth&lt;/span&gt; 1 &lt;span class="nt"&gt;-mtime&lt;/span&gt; +30 &lt;span class="nt"&gt;-exec&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt; &lt;span class="se"&gt;\;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to crontab: &lt;code&gt;0 2 * * * /bin/bash /usr/local/scripts/gbase_backup.sh&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Backup Readiness Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Does the backup log end with "completed successfully"?&lt;/li&gt;
&lt;li&gt;Is the backup file size similar to the previous run?&lt;/li&gt;
&lt;li&gt;Perform a restore drill at least once a month.&lt;/li&gt;
&lt;li&gt;Are backups stored off‑site?&lt;/li&gt;
&lt;li&gt;Does the retention policy meet compliance requirements?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A backup is only as good as your ability to restore it. Run a full restore drill every quarter — when a crisis hits, you'll be glad you did. Protecting your &lt;strong&gt;gbase database&lt;/strong&gt; with a solid backup routine is one of the most important investments a DBA can make.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>backup</category>
    </item>
    <item>
      <title>When Execution Plans Go Wrong in GBase 8a: A Tuning Guide from Statistics to Hints</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Tue, 19 May 2026 09:27:00 +0000</pubDate>
      <link>https://dev.to/michaelfv/when-execution-plans-go-wrong-in-gbase-8a-a-tuning-guide-from-statistics-to-hints-44ia</link>
      <guid>https://dev.to/michaelfv/when-execution-plans-go-wrong-in-gbase-8a-a-tuning-guide-from-statistics-to-hints-44ia</guid>
      <description>&lt;p&gt;In day-to-day operations of a &lt;strong&gt;gbase database&lt;/strong&gt;, a query that ran in seconds yesterday might suddenly take minutes today — even with indexes in place. The root cause is often stale statistics that mislead the cost‑based optimizer. This guide covers the essential tools: understanding cardinality, using &lt;code&gt;ANALYZE TABLE&lt;/code&gt;, reading &lt;code&gt;EXPLAIN&lt;/code&gt; output, and applying Hints when necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Metric: Cardinality
&lt;/h2&gt;

&lt;p&gt;The optimizer relies on statistics like row counts and cardinality (the number of distinct values in a column) to estimate how effective an index will be. High cardinality encourages index use; low cardinality may lead to a full table scan. If statistics are out of date, the optimizer makes poor decisions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Check column statistics&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Alternatively&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;TABLE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CARDINALITY&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;STATISTICS&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;TABLE_SCHEMA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'your_db'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;TABLE_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Refreshing Statistics with ANALYZE TABLE
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;ANALYZE TABLE&lt;/code&gt; updates statistics immediately for new query plans. Existing running queries are not affected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Single table&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Multiple tables&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Crucially, after adding a new partition, you must run &lt;code&gt;ANALYZE TABLE&lt;/code&gt;&lt;/strong&gt;. Without it, the optimizer underestimates the partition's size, potentially disabling partition pruning and selecting wrong parallelism.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p202406&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2024-07-01'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Diagnosing with EXPLAIN
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12345&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;type&lt;/code&gt; shows &lt;code&gt;ALL&lt;/code&gt;, it's a full table scan. Common causes: missing index, outdated statistics (cardinality is zero or too low), implicit type conversion, or using a function on the indexed column.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check if the index exists (&lt;code&gt;SHOW INDEX&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;ANALYZE TABLE&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Verify with &lt;code&gt;EXPLAIN&lt;/code&gt; again.&lt;/li&gt;
&lt;li&gt;If still &lt;code&gt;ALL&lt;/code&gt;, inspect the SQL for type mismatches or functions on the index column.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Hint: Emergency Override
&lt;/h2&gt;

&lt;p&gt;When statistics are correct but the optimizer still picks a bad plan, Hints provide a temporary override. Use them sparingly and document why.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JOIN order&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;STRAIGHT_JOIN&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;large_table&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;small_table&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Index hints&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;USE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx_customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12345&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;FORCE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx_customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12345&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;IGNORE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx_create_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12345&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parallelism hint (GBase 8a‑specific)&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="cm"&gt;/*+ PARALLEL(4) */&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;large_table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always verify with &lt;code&gt;EXPLAIN&lt;/code&gt; that the hint is being respected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Complete Tuning Flow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;EXPLAIN&lt;/code&gt; the slow query.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;type = ALL&lt;/code&gt;: check indexes → if missing, create; if present, run &lt;code&gt;ANALYZE TABLE&lt;/code&gt; → re‑evaluate.&lt;/li&gt;
&lt;li&gt;If JOIN order is wrong: update statistics → if still bad, apply &lt;code&gt;STRAIGHT_JOIN&lt;/code&gt; hint.&lt;/li&gt;
&lt;li&gt;If statistics are fine but plan remains poor: use targeted hints, record the reason, and schedule a follow‑up review.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Operational Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;ANALYZE TABLE&lt;/code&gt; immediately after bulk data loads.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;ANALYZE TABLE&lt;/code&gt; immediately after adding partitions.&lt;/li&gt;
&lt;li&gt;Schedule a weekly &lt;code&gt;ANALYZE&lt;/code&gt; for core business tables.&lt;/li&gt;
&lt;li&gt;When troubleshooting slow queries, always start with &lt;code&gt;ANALYZE&lt;/code&gt;, then &lt;code&gt;EXPLAIN&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;After version upgrades, perform a full &lt;code&gt;ANALYZE&lt;/code&gt; and re‑validate critical execution plans.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keeping statistics fresh is the single most reliable way to keep your &lt;strong&gt;gbase database&lt;/strong&gt; performing predictably. Make "ANALYZE after change" a habit, and you will avoid the vast majority of unexpected plan regressions.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>performance</category>
    </item>
    <item>
      <title>When GBase 8a Execution Plans Go Wrong: Mastering Statistics Management</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Tue, 19 May 2026 08:20:23 +0000</pubDate>
      <link>https://dev.to/michaelfv/when-gbase-8a-execution-plans-go-wrong-mastering-statistics-management-46db</link>
      <guid>https://dev.to/michaelfv/when-gbase-8a-execution-plans-go-wrong-mastering-statistics-management-46db</guid>
      <description>&lt;p&gt;A simple SQL suddenly triggers a full table scan, or a JOIN order looks completely off. Most of the time, the culprit is stale statistics misleading the optimizer. This guide walks through how GBase 8a's statistics work, how to read execution plans, and what to do when the optimizer gets it wrong in a &lt;strong&gt;gbase database&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What Are Statistics?
&lt;/h2&gt;

&lt;p&gt;The GBase 8a optimizer relies on three layers of statistics to estimate the cost of each execution path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Table‑level&lt;/strong&gt;: row count, data pages, average row length.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Column‑level&lt;/strong&gt;: min/max values, NULL ratio, cardinality (number of distinct values).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Block‑level&lt;/strong&gt;: rough index bounds (min/max) for each data block.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When these statistics become stale or inaccurate, the optimizer's cost estimates drift — and the execution plan can go from a quick range scan to a painfully slow full table scan.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Inspecting Statistics
&lt;/h2&gt;

&lt;p&gt;Start by checking the current state of statistics for your tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Basic table statistics&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;STATUS&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;analytics_db&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Column-level statistics&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COLUMN_STATISTICS&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Index statistics (Cardinality is key)&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Last update time&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;table_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg_row_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;update_time&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TABLES&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;table_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'analytics_db'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the &lt;code&gt;Cardinality&lt;/code&gt; value is far from reality, an immediate &lt;code&gt;ANALYZE&lt;/code&gt; is needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Reading EXPLAIN Plans
&lt;/h2&gt;

&lt;p&gt;Before updating statistics, use &lt;code&gt;EXPLAIN&lt;/code&gt; to see what the optimizer is currently doing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'East'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;PARTITIONS&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;EXTENDED&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key fields to watch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;type&lt;/strong&gt;: &lt;code&gt;ALL&lt;/code&gt; signals a full table scan — the most urgent red flag.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;rows&lt;/strong&gt;: If the estimate is far larger than the actual result set, filtering is inefficient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extra&lt;/strong&gt;: &lt;code&gt;Using filesort&lt;/code&gt; or &lt;code&gt;Using temporary&lt;/code&gt; deserve attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;key&lt;/strong&gt;: &lt;code&gt;NULL&lt;/code&gt; means no index is being used.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Manually Updating Statistics
&lt;/h2&gt;

&lt;p&gt;Trigger &lt;code&gt;ANALYZE&lt;/code&gt; manually in these situations: after bulk loads or deletes, when performance suddenly drops, before the first query on a new table, after adding or dropping partitions, or when &lt;code&gt;Cardinality&lt;/code&gt; deviates by over 50% from reality.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Single table&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Multiple tables&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Specific partition (faster than full‑table)&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2024&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Automate with a cron script that runs during off‑peak hours:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;TABLES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"orders users products regions"&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;TABLE &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$TABLES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;gccli &lt;span class="nt"&gt;-u&lt;/span&gt; gbase &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s1"&gt;'password'&lt;/span&gt; analytics_db &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"ANALYZE TABLE &lt;/span&gt;&lt;span class="nv"&gt;$TABLE&lt;/span&gt;&lt;span class="s2"&gt;;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /var/log/gbase_analyze.log 2&amp;gt;&amp;amp;1
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# crontab: 0 3 * * * /opt/scripts/gbase_analyze.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Common Problems and Fixes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Partition Pruning Disabled
&lt;/h3&gt;

&lt;p&gt;Applying a function to the partition key prevents the optimizer from inferring partition boundaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Bad: YEAR() forces a scan of all partitions&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;YEAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;create_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Good: direct range condition&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.2 Wrong JOIN Order
&lt;/h3&gt;

&lt;p&gt;Run &lt;code&gt;ANALYZE&lt;/code&gt; first, then verify with &lt;code&gt;EXPLAIN&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.3 Large Table JOIN Causing Full Shuffle
&lt;/h3&gt;

&lt;p&gt;Update statistics first. If the plan still looks bad, use a Hint to broadcast the small table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="cm"&gt;/*+ BROADCAST(u) */&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Using Hints to Override Plans
&lt;/h2&gt;

&lt;p&gt;Hints are a temporary measure, not a permanent fix. Always try statistics and SQL rewriting first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Force JOIN driving order&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="cm"&gt;/*+ STRAIGHT_JOIN */&lt;/span&gt; 
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Broadcast the small table&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="cm"&gt;/*+ BROADCAST(u) */&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Set parallelism&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="cm"&gt;/*+ PARALLEL(4) */&lt;/span&gt;
    &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Combine hints&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="cm"&gt;/*+ BROADCAST(u) PARALLEL(8) */&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Building a Statistics Management Routine
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Always ANALYZE after data imports&lt;/strong&gt; — make it the final step of every load script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schedule daily ANALYZE for write‑heavy tables&lt;/strong&gt; during off‑peak hours (2–4 AM).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Cardinality drift&lt;/strong&gt; — alert if &lt;code&gt;table_rows&lt;/code&gt; deviates from &lt;code&gt;COUNT(*)&lt;/code&gt; by more than 30%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Periodically compare EXPLAIN plans&lt;/strong&gt; for critical queries against a stored baseline.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. Important Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ANALYZE TABLE&lt;/code&gt; consumes resources; run it on large tables during maintenance windows.&lt;/li&gt;
&lt;li&gt;Statistics have a natural lag; don't rely solely on automatic collection for frequently written tables.&lt;/li&gt;
&lt;li&gt;Hints are a last resort. Overusing them makes SQL harder to maintain and can become a bottleneck as data evolves.&lt;/li&gt;
&lt;li&gt;When execution plans become unstable, &lt;strong&gt;always check statistics first&lt;/strong&gt; — not the hardware or the SQL.&lt;/li&gt;
&lt;li&gt;For partitioned tables, &lt;code&gt;ANALYZE&lt;/code&gt; new partitions immediately; otherwise the optimizer sees them as empty.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keeping statistics fresh is the single most impactful maintenance task for query performance in a &lt;strong&gt;gbase database&lt;/strong&gt;. A disciplined &lt;code&gt;ANALYZE&lt;/code&gt; routine prevents most optimizer misjudgments before they ever reach production.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>performance</category>
    </item>
    <item>
      <title>How Audit Logs Flow from audit_log to audit_log_express in GBase 8a</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Mon, 18 May 2026 15:52:00 +0000</pubDate>
      <link>https://dev.to/michaelfv/how-audit-logs-flow-from-auditlog-to-auditlogexpress-in-gbase-8a-171k</link>
      <guid>https://dev.to/michaelfv/how-audit-logs-flow-from-auditlog-to-auditlogexpress-in-gbase-8a-171k</guid>
      <description>&lt;p&gt;GBase 8a's audit architecture relies on two key system tables: &lt;code&gt;audit_log&lt;/code&gt; and &lt;code&gt;audit_log_express&lt;/code&gt;. They form a &lt;strong&gt;"local source → centralized archive"&lt;/strong&gt; relationship. A built‑in scheduled job automatically aggregates audit records from every node into a single cluster‑wide view, enabling both global querying and automatic data aging.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Tables at a Glance
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;audit_log&lt;/code&gt; is where raw logs first land, one copy per node. &lt;code&gt;audit_log_express&lt;/code&gt; is the consolidated warehouse for all those records.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;gbase.audit_log&lt;/th&gt;
&lt;th&gt;gclusterdb.audit_log_express&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local to each node; records only that node's events&lt;/td&gt;
&lt;td&gt;Centralized, randomly distributed across data nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Content&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GCluster nodes: full original SQL. GNode nodes: dispatched sub‑task fragments&lt;/td&gt;
&lt;td&gt;Aggregated logs from all nodes, with complete context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retention&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No automatic cleanup; manual &lt;code&gt;TRUNCATE&lt;/code&gt; required&lt;/td&gt;
&lt;td&gt;Auto‑aged: keeps only the last &lt;strong&gt;31 days&lt;/strong&gt; by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node‑level audit queries and HA data source&lt;/td&gt;
&lt;td&gt;Cluster‑wide auditing, long‑term archival, compliance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How Data Flows Between Them
&lt;/h2&gt;

&lt;p&gt;The entire flow is driven by a system‑created event called &lt;code&gt;import_audit_log&lt;/code&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Automatic setup&lt;/strong&gt;: During cluster installation or upgrade, the system creates the &lt;code&gt;audit_log_express&lt;/code&gt; table and the &lt;code&gt;import_audit_log&lt;/code&gt; event.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Daily schedule&lt;/strong&gt;: The event runs once per day, exporting new records from &lt;code&gt;gbase.audit_log&lt;/code&gt; on all nodes and merging them into &lt;code&gt;audit_log_express&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Built‑in aging&lt;/strong&gt;: &lt;code&gt;audit_log_express&lt;/code&gt; has a lifecycle policy that automatically deletes records older than 31 days.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Multi‑VC Configuration
&lt;/h2&gt;

&lt;p&gt;In a multi‑Virtual Cluster environment, you must manually bind the &lt;code&gt;import_audit_log&lt;/code&gt; event to the correct VC. Without this step, the VC's audit logs won't be collected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Find the VC ID&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;VCS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Update the event's VC (run in the gbase database)&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;gbase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;vc_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'vc00001'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'import_audit_log'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Operational Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Query &lt;code&gt;audit_log_express&lt;/code&gt; for daily audits&lt;/strong&gt; — it's the complete, query‑friendly view without touching production nodes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Use &lt;code&gt;TRUNCATE SELF audit_log&lt;/code&gt;&lt;/strong&gt; to safely clear local logs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;In multi‑VC setups, always verify the &lt;code&gt;vc_id&lt;/code&gt;&lt;/strong&gt; of &lt;code&gt;import_audit_log&lt;/code&gt; right after deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design gives your &lt;strong&gt;gbase database&lt;/strong&gt; a lightweight yet powerful audit pipeline: write locally, query globally, and age automatically. It's a solid foundation for compliance and operational visibility in any GBASE environment.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>operations</category>
    </item>
    <item>
      <title>Creating SM4-Encrypted Tables in GBase 8a and Managing Encryption Keys</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Mon, 18 May 2026 15:12:00 +0000</pubDate>
      <link>https://dev.to/michaelfv/creating-sm4-encrypted-tables-in-gbase-8a-and-managing-encryption-keys-1mdf</link>
      <guid>https://dev.to/michaelfv/creating-sm4-encrypted-tables-in-gbase-8a-and-managing-encryption-keys-1mdf</guid>
      <description>&lt;p&gt;GBase 8a MPP Cluster supports the SM4 national cipher for transparent data encryption at the storage level. To create an SM4-encrypted table, you follow a three-step process: pick the algorithm, create a certificate, and define the table. The encryption key is managed through a centralized certificate and distributed as files across all cluster nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Choose the SM4 Algorithm
&lt;/h2&gt;

&lt;p&gt;Set the cluster-wide encryption algorithm to SM4. Once set, it cannot be changed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;GLOBAL&lt;/span&gt; &lt;span class="n"&gt;gbase_encrypt_new_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'SM4'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Create an Encryption Certificate
&lt;/h2&gt;

&lt;p&gt;Certificates can be &lt;strong&gt;plain&lt;/strong&gt; or &lt;strong&gt;ciphered&lt;/strong&gt;. Ciphered certificates require a password and offer stronger security.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;ENCRYPTION&lt;/span&gt; &lt;span class="n"&gt;CERTIFICATE&lt;/span&gt; &lt;span class="n"&gt;my_sm4_cert&lt;/span&gt;
    &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CIPHERED&lt;/span&gt;
    &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'YourStrongPassword123!'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The certificate file is automatically distributed to every node's &lt;code&gt;$GBASE_HOME/config/&lt;/code&gt; directory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Create the Encrypted Table
&lt;/h2&gt;

&lt;p&gt;Use the &lt;code&gt;ENCRYPT&lt;/code&gt; keyword for table-level or column-level encryption.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Table-level: encrypts all columns&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;encrypted_whole_table&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sensitive_data&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;create_time&lt;/span&gt; &lt;span class="nb"&gt;DATETIME&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENCRYPT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Column-level: encrypts only specific columns&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customer_info&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;id_card_no&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENCRYPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;phone&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENCRYPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Encryption must be defined at table creation time; it cannot be added or removed via &lt;code&gt;ALTER TABLE&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Ciphered Certificates
&lt;/h2&gt;

&lt;p&gt;Before running DML on encrypted columns, you must open the certificate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;OPEN&lt;/span&gt; &lt;span class="n"&gt;ENCRYPTION&lt;/span&gt; &lt;span class="n"&gt;CERTIFICATE&lt;/span&gt; &lt;span class="n"&gt;my_sm4_cert&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'YourStrongPassword123!'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Closing the certificate will cause all subsequent DML on encrypted columns to fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Management and Storage
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key generation &amp;amp; storage&lt;/strong&gt;: The key is generated when the certificate is created and stored inside the certificate file. A copy of that file lives on every cluster node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certificate types&lt;/strong&gt;: Plain (no password) or Ciphered (password-protected). You can convert between them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Management tasks&lt;/strong&gt;: Backup the certificate file, change the password, query certificate status via &lt;code&gt;information_schema.encryption_certificate_status&lt;/code&gt;, and open/close the certificate as needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security model&lt;/strong&gt;: One cluster‑wide certificate. For ciphered certificates, the password is the user's responsibility — the system does not store or recover it. Protect the certificate file with OS permissions (e.g., &lt;code&gt;chmod 600&lt;/code&gt;) and ensure the password meets the cluster's password policy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Important Restrictions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Once the encryption algorithm is set, it cannot be changed.&lt;/li&gt;
&lt;li&gt;Encrypted tables do not support DBLink, and encrypted columns cannot be used as distribution keys.&lt;/li&gt;
&lt;li&gt;The performance overhead is minimal — typically less than 5%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design gives you strong, transparent encryption for your &lt;strong&gt;gbase database&lt;/strong&gt; while keeping key management clean and compliant with enterprise security requirements. If you're storing sensitive data in GBASE's MPP platform, SM4 encryption is a powerful tool to have in your security toolkit.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>security</category>
    </item>
    <item>
      <title>Dynamic Data Masking in GBase 8a: How It Works and How to Use It</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Mon, 18 May 2026 14:55:00 +0000</pubDate>
      <link>https://dev.to/michaelfv/dynamic-data-masking-in-gbase-8a-how-it-works-and-how-to-use-it-4mnj</link>
      <guid>https://dev.to/michaelfv/dynamic-data-masking-in-gbase-8a-how-it-works-and-how-to-use-it-4mnj</guid>
      <description>&lt;p&gt;GBase 8a MPP Cluster implements &lt;strong&gt;dynamic data masking&lt;/strong&gt; — the original data never changes on disk. Instead, the database applies masking rules in real time during queries, based on the user's permissions. This post explains the three‑part mechanism: column attributes, built‑in masking functions, and the &lt;code&gt;UNMASK&lt;/code&gt; privilege.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Dynamic Masking Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Define Masked Columns
&lt;/h3&gt;

&lt;p&gt;Use the &lt;code&gt;MASKED WITH&lt;/code&gt; clause in DDL to attach a masking function to a column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;MASKED&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'default()'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;phone&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;MASKED&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'partial(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s1"&gt;***&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s1"&gt;, 3, 4)'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;MASKED&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'keymask(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s1"&gt;@&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s1"&gt;****&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s1"&gt;, 0)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also add masking to an existing column with &lt;code&gt;ALTER TABLE ... MODIFY COLUMN&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Five Built‑in Masking Functions
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Data Type&lt;/th&gt;
&lt;th&gt;Example (Original → Masked)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;default()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;'Brad Stevens'&lt;/code&gt; → &lt;code&gt;'XXXX'&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;random(start, end)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Numeric&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;42&lt;/code&gt; → random value in range&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;partial(prefix, padding, suffix)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;'Hello'&lt;/code&gt; (keeps first and last char, fills rest)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sha()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;'Hello'&lt;/code&gt; → SHA hash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;keymask(substr, padding, pos)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;'gbase@gbase.cn'&lt;/code&gt; → &lt;code&gt;'****@gbase.cn'&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3. The UNMASK Privilege — Who Sees What
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without &lt;code&gt;UNMASK&lt;/code&gt;&lt;/strong&gt;: the user sees the masked result.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With &lt;code&gt;UNMASK&lt;/code&gt;&lt;/strong&gt;: the user sees the original value.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;UNMASK&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;user_name&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'host'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;REVOKE&lt;/span&gt; &lt;span class="n"&gt;UNMASK&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_name&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="s1"&gt;'host'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Dynamic vs. Static Masking
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;GBase 8a Dynamic&lt;/th&gt;
&lt;th&gt;Static Masking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Original data untouched&lt;/td&gt;
&lt;td&gt;Data permanently replaced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;When it happens&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query time&lt;/td&gt;
&lt;td&gt;ETL / offline batch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High — different views per user&lt;/td&gt;
&lt;td&gt;Low — same masked view for all&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Production real‑time compliance&lt;/td&gt;
&lt;td&gt;Test / dev data provisioning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Built‑in support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes, via DDL and privileges&lt;/td&gt;
&lt;td&gt;Requires external ETL tools&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Because the underlying columnar storage never changes, dynamic masking in a &lt;strong&gt;gbase database&lt;/strong&gt; keeps your analytical workloads fast while meeting security requirements. It's a native, low‑overhead way to protect sensitive data in GBASE's MPP platform.&lt;/p&gt;

&lt;p&gt;If you're working with a &lt;strong&gt;gbase database&lt;/strong&gt; in production, consider enabling dynamic masking on PII columns — your compliance team will thank you.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>datamasking</category>
    </item>
    <item>
      <title>Monitoring and Safely Killing Sessions in GBase 8a</title>
      <dc:creator>Michael</dc:creator>
      <pubDate>Mon, 18 May 2026 14:27:28 +0000</pubDate>
      <link>https://dev.to/michaelfv/monitoring-and-safely-killing-sessions-in-gbase-8a-3ppe</link>
      <guid>https://dev.to/michaelfv/monitoring-and-safely-killing-sessions-in-gbase-8a-3ppe</guid>
      <description>&lt;p&gt;Managing user sessions is a core DBA skill in any &lt;strong&gt;gbase database&lt;/strong&gt;. When performance dips or resources are tight, knowing how to list connections and terminate problematic ones safely keeps your cluster running smoothly.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Viewing All Connected Sessions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option A: SHOW PROCESSLIST
&lt;/h3&gt;

&lt;p&gt;A quick overview of threads on the current coordinator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;PROCESSLIST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Id&lt;/strong&gt; – the unique session identifier, used later with &lt;code&gt;KILL&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User&lt;/strong&gt;, &lt;strong&gt;Host&lt;/strong&gt; – who is connected and from where.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time&lt;/strong&gt; – seconds the session or current query has been running.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State&lt;/strong&gt; – e.g., &lt;code&gt;Sleep&lt;/code&gt; (idle), &lt;code&gt;Query&lt;/code&gt; (executing).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Info&lt;/strong&gt; – the running SQL; empty means the session is idle.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Option B: information_schema.processlist (Recommended)
&lt;/h3&gt;

&lt;p&gt;This view supports filtering and sorting, making it the go‑to for analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;processlist&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'Sleep'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use it to quickly spot long‑running, resource‑heavy sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Safely Terminating a Problem Session
&lt;/h2&gt;

&lt;p&gt;Before running any kill command, identify the right session and choose the least disruptive method.&lt;/p&gt;

&lt;h3&gt;
  
  
  Termination Options
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;KILL CONNECTION &amp;lt;id&amp;gt;&lt;/code&gt; (or &lt;code&gt;KILL &amp;lt;id&amp;gt;&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Disconnects the session, rolls back uncommitted work&lt;/td&gt;
&lt;td&gt;Frozen sessions, unresponsive connections&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;KILL QUERY &amp;lt;id&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cancels only the current SQL statement, keeps the connection alive&lt;/td&gt;
&lt;td&gt;A runaway query that doesn't need a full disconnect&lt;/td&gt;
&lt;td&gt;Lower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;PAUSE &amp;lt;id&amp;gt;&lt;/code&gt; / &lt;code&gt;CONTINUE &amp;lt;id&amp;gt;&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Temporarily pauses / resumes execution&lt;/td&gt;
&lt;td&gt;Quick resource relief for diagnostics&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Terminate the entire session&lt;/span&gt;
&lt;span class="n"&gt;KILL&lt;/span&gt; &lt;span class="mi"&gt;5593&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Cancel only the running query&lt;/span&gt;
&lt;span class="n"&gt;KILL&lt;/span&gt; &lt;span class="n"&gt;QUERY&lt;/span&gt; &lt;span class="mi"&gt;5593&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Safe Procedure
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identify&lt;/strong&gt;: find queries running longer than 5 minutes.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;   &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;
   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;processlist&lt;/span&gt;
   &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Query'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
   &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Evaluate&lt;/strong&gt;: check &lt;code&gt;State&lt;/code&gt; (&lt;code&gt;Sending data&lt;/code&gt;, &lt;code&gt;Sorting result&lt;/code&gt;) and review the SQL in &lt;code&gt;Info&lt;/code&gt; for anomalies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kill&lt;/strong&gt;: prefer &lt;code&gt;KILL QUERY&lt;/code&gt; to preserve the connection; escalate to &lt;code&gt;KILL CONNECTION&lt;/code&gt; only if needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt;: run &lt;code&gt;SHOW PROCESSLIST&lt;/code&gt; again to confirm the session is gone or its state has changed.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  3. Important Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Privileges&lt;/strong&gt;: &lt;code&gt;PROCESS&lt;/code&gt; is required to view all sessions; &lt;code&gt;SUPER&lt;/code&gt; (or equivalent) is needed for &lt;code&gt;KILL&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi‑VC clusters&lt;/strong&gt;: switch to the target VC first:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="n"&gt;USE&lt;/span&gt; &lt;span class="n"&gt;VC&lt;/span&gt; &lt;span class="n"&gt;vc_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;PROCESSLIST&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prevention&lt;/strong&gt;: use resource pools to cap per‑query resources, set &lt;code&gt;task_running_timeout&lt;/code&gt; to auto‑cancel long queries, and leverage GDOM for visual monitoring.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The golden rule for terminating sessions: &lt;strong&gt;diagnose first, then act; use &lt;code&gt;KILL QUERY&lt;/code&gt; whenever possible&lt;/strong&gt;. This keeps your &lt;strong&gt;gbase database&lt;/strong&gt; healthy while minimizing business disruption.&lt;/p&gt;

</description>
      <category>gbase</category>
      <category>database</category>
      <category>数据库</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
