Charles Wu for OceanBase User Group

Posted on Jun 28

How Does an Enterprise-Grade Query Optimizer Keep HTAP Workloads Accurate, Fast, and Stable Over Time?

#database #sql #dataengineering #distributedsystems

Today, many enterprises treat “running real-time analytics directly on transactional data” as a baseline capability: the same dataset must support high-concurrency point lookups and short transactions, while also delivering reports, risk-control rule evaluation, and operational dashboards within minutes or even seconds.

This is not merely a question of whether something is “technically possible.” Enterprises want to minimize the servers, storage, synchronization, and operational overhead that come with building and maintaining a separate lakehouse pipeline just for analytics. At the same time, they want to shorten the path from when transactional data is generated to when analytical results become available, so that reporting, risk control, and business decisions can stay as close to real time as possible.

In many enterprise deployments, transactional and analytical systems are built separately. But as analytics increasingly emphasizes the timeliness of results while overall system cost must be kept in check, this model introduces new problems: data has to be moved and synchronized across multiple systems, pipelines grow longer, consistency governance becomes more complex, and analytical results are more likely to lag behind the true state of the business.

For exactly this reason, users rarely think of the word “optimizer” first — until a familiar yet unsettling pattern starts showing up in production: the same kind of SQL, on the same hardware, sometimes runs fast and sometimes runs so slowly it feels like a completely different system.

When troubleshooting, people tend to first suspect network jitter, disk bottlenecks, or a sudden traffic spike during a particular window. But in HTAP scenarios, a significant share of performance fluctuations can ultimately be traced back to the same root cause: the execution plan no longer fits the current data distribution, statistics, and workload state.

What an enterprise-grade query optimizer must solve is precisely this: continuously generating more reasonable execution plans in such a mixed environment — letting analytical queries complete at an acceptable resource cost, while avoiding situations where a wrong access path, a poor distributed execution strategy, or an unreasonable degree of parallelism drags every other workload on the same cluster into jitter.

Why HTAP Scenarios Are More Prone to Execution Plan Problems

Why SQL in HTAP Scenarios Runs “Sometimes Fast, Sometimes Slow”

Most slow SQL in production is not caused by “having no optimizer,” but rather by “the optimizer choosing the wrong plan under current conditions.”

The most typical cases include:

After data volume grows, the system still uses a join order or access path that was only suitable for a smaller data scale;
After the data distribution becomes skewed, the cost model still estimates as if the distribution were uniform;
After statistics are refreshed, the plan switches to a path that is theoretically better but performs worse in practice.

In HTAP scenarios, these deviations get amplified: response-time fluctuations of short queries turn into long tails, while analytical queries may — due to a poor distributed strategy or degree of parallelism — consume extra shared resources, ultimately stripping “real-time analytics” of its real-time nature.

Therefore, the first step in performance optimization is not “making one particular SQL run a bit faster,” but making plan selection as explainable and reproducible as possible: why columnar storage was chosen this time, why a redistribution was needed this time, and why the degree of parallelism is N rather than M.

Why HTAP Makes “Plan Selection” Harder Than Pure TP or AP

In transaction-oriented (TP) workloads, each query touches little data, has tight response-time requirements, and follows highly repetitive execution patterns; the goal is typically low latency and path stability. In analytics-oriented (AP) workloads, queries more often involve large-range scans, multi-table joins, aggregation, sorting, and window functions, and rely more heavily on columnar storage, distributed execution, and parallelism; the goal is typically throughput and execution efficiency for complex queries.

HTAP stacks both on top of the same engine, the same database objects (indexes, partitions, and so on), and the same set of statistics. This means the optimizer cannot simply answer “is this operator fast or not”; within a unified cost framework, it must simultaneously handle two completely different plan-selection problems:

For short queries: row store or index path;
For large queries: columnar scan or another access path;
Under distributed execution: stay local or perform data redistribution;
How to choose the join algorithm;
Serial execution or parallel execution.

What makes it even trickier is that the runtime environment is always changing: data growth, distribution skew, statistics refreshes, and version upgrades all change the answer to “which plan is cheaper.” Once row-count estimation or cost evaluation goes wrong, every subsequent choice compounds in the wrong direction, eventually manifesting as query latency jitter, wasteful amplification of CPU and I/O, and even mutual interference with other workloads during peak hours.

One boundary needs to be made clear here: the fundamental competition between transactions and analytics over physical resources is governed by system-level capabilities such as resource isolation, quotas, and scheduling. The optimizer’s value is to reduce the extra amplification caused by plan errors — not to replace system-level governance.

How OceanBase Selects, Runs, and Stabilizes Plans Well

Unified CBO: How OceanBase Handles TP and AP Within a Single Framework

OceanBase does not build two separate, unrelated optimizers for TP and AP. On top of the same CBO (Cost-Based Optimizer) framework and the same set of database objects — statistics, indexes, partitions, and so on — OceanBase selects different but cost-comparable execution plans for different SQL shapes.

For TP-leaning queries, the optimizer focuses more on path stability, index and table-lookback costs, and short-query response time. For AP-leaning queries, the optimizer brings columnar scans, pruning effectiveness, distributed data movement, join algorithms, and parallel costs all into consideration.

From an implementation standpoint, the OceanBase query optimizer follows the cost-based System-R approach: it enumerates and evaluates the cost of base-table access paths, join orders, join algorithms, and other operator combinations to generate the final execution plan. As a distributed HTAP engine, OceanBase’s main difference lies here: distributed properties and parallel properties must be considered alongside local operator costs from the plan-generation stage onward. Distributed and parallel properties are not add-on information tacked on at execution time — they are part of the plan’s cost.

While fully inheriting the unified CBO framework, OceanBase has made targeted enhancements for the parts of analytical queries that are more common and more prone to amplifying cost deviations. Viewed against the main line of optimizer work, these capabilities essentially solve three things: first select the plan accurately, then make the plan run well, and finally keep the plan stable.

Selecting the Plan Accurately: Row Estimation, Statistics, and Path Selection

For the optimizer, the prerequisite for “selecting accurately” is to estimate data scale and selectivity as accurately as possible first. Whether it’s the access path, join order, or subsequent query rewriting and parallel decisions, everything is ultimately built on cost evaluation; and whether cost evaluation can be trusted depends first on the quality of row estimation and statistics.

In HTAP scenarios where data changes continuously, relying solely on static statistics is often not enough. To address this, OceanBase combines online statistics collection, dynamic sampling, and storage-layer row estimation into its cost model, so that judgments of “row count and selectivity” stay closer to the real data state.

The core role of dynamic sampling is to improve estimation quality when statistics are insufficient or inaccurate, providing more reliable input to the CBO;
Storage-layer row estimation stays closer to the real data distribution, helping improve the accuracy of access-path selection;
Online statistics collection is used to reduce how far statistics lag behind the real data.

In particular, OceanBase’s row-estimation method — based on logical rows and physical rows — can account for both incremental and baseline data, unlike traditional approaches that rely more heavily on static statistics. This brings the statistical profile closer to a real-time state, and it also handles predicate dependencies more naturally in composite-index scenarios. For example, when an index (a, b) encounters a condition like a = 1 and b = 1, it can obtain an estimate close to the true selectivity without over-relying on extra multi-column histograms or complex compensation logic.

Only once row estimation becomes more trustworthy can the optimizer truly do a good job of “selecting the path accurately.” This matters especially in HTAP scenarios, where the same semantics can often be served either by a row-store index path or by a columnar scan path, and the optimizer must make an explainable comparison within a unified cost framework. For columnar scans, if the benefit of a Skip Index cannot be correctly evaluated, it’s easy to fall into a “looks cheap, actually expensive” path mis-selection; for row-store and index paths, row-estimation errors can likewise inflate an access that should have been local into a far more costly execution method.

The same logic applies to complex SQL. The value of query rewriting is not “the more rules the better,” but in turning SQL into a structure that more easily yields a globally optimal plan — providing a better starting point for subsequent path selection, join ordering, and parallel decisions. In other words, accurate estimation is not the finish line; it is the prerequisite that lets the entire chain of downstream decisions start from the right place.

Running the Plan Well: Distributed Global Cost and Parallel Decisions

In a Shared-Nothing architecture, the plan that is “cheapest on a single node” is often not the one that is “cheapest for the cluster.” Once data is spread across multiple nodes, joins and aggregations may introduce broadcasts, redistributions, and multiple rounds of data movement. These costs are often insignificant in a single-node model, but at the cluster level they can become the dominant expense.

Therefore, the real problem a distributed database optimizer must solve is not “can it generate a distributed plan,” but can it bring distribution properties, data movement, and operator implementations into a global comparison at the plan-generation stage. This determines whether it falls into the trap of “locally optimal, globally inefficient.”

From an optimization-framework perspective, distributed plan optimization is harder than local optimization in three main ways:

The operator implementation space is larger
Take Hash Join as an example: in a distributed environment, different data distributions correspond to different distributed implementation algorithms, giving a far larger choice space than single-node algorithms.
Physical properties are more complex
Beyond sort properties, a distributed plan must also maintain properties such as partition information and data location, which directly determine whether a given operator can adopt a particular distributed execution method.
Parallelism and partitioning further expand the search space
Partition pruning, intra-partition parallelism, inter-partition parallelism, and degree-of-parallelism selection all add to optimization complexity.

To address these issues, one important direction for OceanBase is to bring distributed properties into plan selection as early as possible, rather than simply “generating a locally optimal plan first and then bolting on a distributed execution method.”

Closely related to distributed planning is parallel decision-making. Parallelism can indeed shorten the wall time of complex queries, but a higher degree of parallelism is not always better. Too much parallelism amplifies CPU, memory, and system jitter; too little leaves long queries unable to finish for a long time. So letting the optimizer balance “whether to parallelize and how much” between response time and resource cost is more aligned with engineering reality than relying on fixed hints across large numbers of SQL statements.

This is exactly the point of OceanBase’s optimizer Auto DOP (Automatic Degree of Parallelism) capability: the optimizer automatically decides “whether to parallelize and what degree to use” based on query cost and resource state. It does not simply chase a shorter single-query response time; it brings parallelism itself into the plan cost, aiming to reduce “bad parallelism” within the given resource constraints.

Columnar storage and Skip Index mainly solve “read less, read the right things.”
Distributed planning and parallel decisions mainly solve “how to move less data and how to expand operators.”

Together, the two form a more complete real-time analytical execution-plan capability for HTAP scenarios.

Keeping the Plan Stable: Caching and Evolution Mechanisms for High-Frequency SQL

If row estimation, distribution, and parallelism mainly solve “selecting and running the plan well,” then for an enterprise-grade system there is a third thing to consider: can the plan stay stable over the long term.

For complex analytical queries, the core problems usually remain row estimation, distribution, and parallel selection themselves; plan caching and SPM are not the default main path. But for highly reusable SQL — especially high-concurrency, low-latency business requests — plan stability becomes a very practical concern. Changes in statistics, growth in data scale, and version upgrades all lead the optimizer to generate new plans, and a “theoretically better” new plan may not actually perform better under real workloads.

For scenarios like these, OceanBase has built a fairly complete plan-caching and plan-evolution mechanism.

First is parameterized plan caching.
For high-concurrency workloads, generating and caching a separate plan for every specific parameter is both costly and impractical. The value of parameterized caching is to let a large number of SQL statements that share the same shape but differ in parameters reuse the same plan, keeping the execution overhead low when a cached plan is hit.

But parameterized caching does not mean “one shared plan is always optimal for all parameters.” In real business, the data scale corresponding to different parameters can vary enormously. For example, even for the same task — “compute a merchant’s sales over the past year” — a large merchant may be better served by a main-table scan, while a small merchant is better served by an index. Forcing the same plan to be reused in both cases leads to obvious parameter-sensitivity problems. In other words, plan caching solves “can plans be reused efficiently,” but it does not automatically solve “whether this plan will remain suitable for all parameters and the current data state over the long term.”

At this point, the question evolves from “whether to cache the plan” into “when should the plan change, and can that change happen safely.” Building on this, the plan-evolution mechanism (SPM) further addresses “can a new plan be switched in safely.”
For highly reusable SQL, the core value of SPM is not to boost analytical performance, but to turn plan stability from a matter of human experience into mechanized governance: a new plan is not immediately and fully adopted the moment it is generated; instead, it is validated against real business traffic. Only when validation shows it is genuinely better than the baseline plan will subsequent SQL continue to use the new plan; otherwise, execution falls back to the baseline plan.

So SPM is more accurately positioned as: a means of governing plan stability for highly reusable SQL. It is part of the optimization loop, but it is not the main path for improving complex analytical performance.

From “Selecting the Right Plan” to “Keeping the Plan Under Control”

A query optimizer generally faces two structural constraints. First, statistics, dynamic sampling, and cost parameters describe “the data from some earlier period,” while the query happens “right now.” Second, search and decision time is limited — even within a mature framework like System-R, it is impossible to enumerate all equivalent plans and find the global optimum within an acceptable time.

Given this, “sometimes choosing the wrong plan” is not a weakness unique to any one product, but the normal state of the CBO problem itself. OceanBase weighs row-store versus columnar paths, distributed data, and the degree of parallelism within the same “estimate — compare — select” logic: row-estimation errors, join order, shuffle, and parallel decisions can all drift off course in a chain reaction; HTAP stacking (TP and AP sharing statistics and database objects on the same engine) only makes it easier for “sensitive SQL” to expose plan jitter during peaks or statistics-refresh windows.

The maturity of an enterprise-grade optimizer should not be measured by a single benchmark. What matters more is whether, when a plan no longer matches the real workload, the execution plan can be turned into a process that is observable, intervenable, evolvable, and reversible. Beyond “selecting plans,” OceanBase also shares the responsibility of “governing plans” together with the execution side, database objects (indexes, partitions, and so on), and the operations toolchain: letting plans be continuously corrected as data, versions, and business evolve, rather than treating planning as a one-time result.

The common sources of “choosing the wrong plan” fall into three categories: row estimation deviating from the data state (changes in table size, data skew, or correlated predicates causing inaccurate estimation; the first response should be refreshing statistics and adjusting the collection strategy, not jumping to hints); parameter sensitivity of high-frequency SQL (parameterized caching does not guarantee all parameters are equally well served, requiring coordination with business data traffic and the timing of baseline establishment); and database objects (indexes, partitions, and so on) or version upgrades (SPM compares new and old plans under real traffic, rather than switching the entire cluster instantaneously after a single statistics task).

OceanBase handles execution-plan risk through a combination of runtime reuse, switch-time governance, and manual intervention during troubleshooting.

At runtime, parameterized plan caching is used first to reduce hard-parsing overhead, letting SQL with the same structure but different parameters reuse existing plans as much as possible. If parameter distributions vary widely and a single plan cannot cover all scenarios, further mechanisms such as SPM, adaptive plan matching, or targeted hints are needed to constrain plan behavior.

Among these, SPM’s role is “validate the new plan first, then promote it.” A candidate new plan does not immediately replace the existing one; instead, it enters the baseline-and-evolution process and is validated under real traffic before deciding whether to switch. Through sql_plan_management_mode, the system can also make policy trade-offs between "online evolution" and "baseline plans."

For problems that have already been clearly diagnosed, Outline / Hint are better suited as local correction tools, used to pin down access paths, join orders, or parallel strategies. During the diagnosis phase, tools such as EXPLAIN, SQL Trace, and DBMS_XPLAN can help locate execution-plan problems.

Therefore, the OceanBase optimizer cannot guarantee that every SQL statement uses the optimal plan on every execution. Instead, it places plans within a continuously governable process: at runtime it relies on plan caching to reduce repeated optimization cost; when plans change, it controls switching risk through SPM; and after problems surface, it uses hints, outlines, and diagnostic tools to locate and correct them.

From Capabilities to Scenarios: Where the Optimizer Lands in Typical HTAP / Real-Time Analytics Workloads

The “select accurately, run well, stay stable” described above is not an abstract optimizer slogan; it translates into different benefit priorities across different business workloads. For enterprise systems, the value of the optimizer is often demonstrated not through a single benchmark, but through whether it can reliably support several typical query scenarios.

Complex queries and financial core scenarios: the focus is whether global cost is trustworthy

In scenarios like financial cores and real-time risk control, queries are often not simple point lookups; they need to combine transaction data, account information, rule conditions, market data, or credit-limit information to make high-frequency judgments. Such queries are characterized by many joined tables, strong real-time requirements, and sensitivity to result latency, while also being easily affected by data distribution and cross-node access.

In these scenarios, the most critical capability of the OceanBase optimizer is not making a particular operator locally optimal, but whether it can form a more trustworthy global cost judgment at the plan-generation stage. Whether statistics and row estimation are accurate determines whether the access path and join order are reasonable; whether the distributed plan considers data location and network cost early enough determines whether cross-node latency keeps getting amplified in high-frequency queries. For these mixed TP-and-real-time-risk-control queries, the value of the OceanBase optimizer shows up first in keeping complex joined queries stably within an acceptable latency, rather than letting plan fluctuations become a source of tail latency.

HTAP / real-time analytics scenarios: the focus is whether row/columnar paths, distribution, and parallel decisions can be handled together within a unified framework

In more analytics-leaning workloads such as reporting, operational analytics, and real-time data warehouses, the problems the optimizer faces become even more complex, because the same dataset may serve both high-concurrency short queries and large-range scans, multi-table joins, and aggregation analytics.

The OceanBase optimizer solves these problems within the same CBO decision framework: row/columnar path selection, analytical cost rewriting, distributed plan generation, Auto DOP, and — when necessary — plan-stability governance can all happen in coordination within a unified optimization process.

High-concurrency write scenarios: the optimizer’s focus lies more in stability governance for highly reusable SQL

For more OLTP-leaning high-concurrency write scenarios, the priorities differ from analytical workloads. Here, plan caching, parameterized matching, execution-plan stability, and the commit efficiency of the write path itself usually deserve more attention. Capabilities like Plan Cache are more directly relevant to typical transactional write paths; the performance of complex analytical SQL, on the other hand, often still needs to be considered together with sharding design, index design, and specific query patterns.

This precisely illustrates that the value of the OceanBase optimizer does not lie in “solving every scenario with the same single point,” but in whether it can, for different workloads, combine capabilities like row estimation, path selection, distributed execution, parallelism, and stability governance into plan-selection logic suited to that scenario.

The Key to the OceanBase Optimizer Is a Closed Optimization Loop Within a Unified Framework

Let’s return to the question from the beginning of this article: why does the same kind of SQL run “sometimes fast, sometimes slow” within the same system?
The real answer often lies not in whether some single feature exists, but in whether the system has a mechanism that can continuously generate reasonable plans under continuously changing data and workload conditions.

The OceanBase optimizer, with distributed-native + HTAP-integrated as its through-line, completes a full closed loop within a unified CBO framework:

Through statistics, dynamic sampling, storage-layer row estimation, and other capabilities, it makes cost judgments as trustworthy as possible;
Through row/columnar path selection, complex SQL rewriting, distributed plan generation, and parallel decisions, it selects and runs plans well;
Through parameterized plan caching, adaptive plan matching, and SPM, it brings stability governance for highly reusable SQL into a systematic mechanism.

This is the true value of an enterprise-grade query optimizer in HTAP and real-time analytics scenarios: it does not just make a single SQL run faster, but keeps the system able to maintain performance, resource efficiency, and stability within an explainable, governable, and dependable range — even under complex queries, distributed execution, and continuously changing data conditions.

DEV Community