Charles Wu

Posted on Jun 28

Bridging TP and AP: How OceanBase Materialized Views Power HTAP Workloads

#analytics #ai #database

In real-time analytics, the challenge is no longer just making a query fast enough. Enterprises must satisfy three requirements simultaneously: stable transactional writes, efficient analytical queries, and a data pipeline that is neither too long nor too heavy. The traditional approach — TP systems handle online transactions, then CDC, message queues, ETL, or stream/batch jobs move data into a separate analytics system for dashboards and ad-hoc queries — has worked for years, but its costs are increasingly obvious: long pipelines, operational complexity, and poor real-time consistency.

This is especially true in e-commerce, retail, and ERP scenarios. Common analytical needs — city-level sales, category profit analysis, user profiling, operational reports — typically join orders, products, customers, stores, and campaigns across many tables, layered with aggregation, sorting, and filtering. As data volumes grow, repeatedly running these queries against base tables amplifies compute pressure and forces analytical workloads to compete with online transactions for CPU, I/O, and memory.

Once HTAP became the mainstream architecture direction, the key question shifted from “make a query fast once” to: how do you move high-frequency, complex, reusable analytical computation out of query time — without extending the data pipeline — and provide AP with stable, reusable, maintainable data artifacts.

In this context, materialized views (MV) in OceanBase are not merely query accelerators. They serve as a bridge between TP and AP. They transform logic like multi-table joins, aggregation rollups, and layered processing into physically stored result sets that the database continuously maintains. This creates a shorter, more stable path between “TP continuously producing changes” and “AP steadily consuming results.”

1. The Real HTAP Tension: It’s Not Just Slow Queries

HTAP workloads are characterized by two coexisting classes: short transactions with row-level updates, high concurrency, and low-latency sensitivity (TP); and wide scans, multi-table joins, complex aggregations with higher throughput and stability requirements (AP). Enterprises typically choose one of two paths: offload analytics entirely to external systems, or pull analytical capability closer to the online database. Either way, new architectural and operational burdens emerge.

In practice, this tension manifests in three typical problems.

Cross-system inconsistency: The same business fact flows through CDC, messaging, task platforms, data warehouses, and reporting systems — easily producing multiple “versions” of the data. Dashboards disagree, troubleshooting requires cross-system coordination. This isn’t a SQL problem; it’s an architectural complexity problem.

Timeliness vs. processing cost: Moving from T+1 to hourly or minute-level freshness means more frequent incremental syncs, tighter scheduling, and larger resource budgets. Traditional batch processing simply cannot support today’s real-time operational and decision-making demands.

Analytics stealing online resources: Heavy analytical SQL characteristically causes CPU, I/O, memory, and temporary result bloat from scans, joins, and aggregations. When these computations hit during transaction peaks or run on the primary database, the system doesn’t just get “consistently slow” — it jitters: tail latency on critical transactions spikes, and analytical queries themselves become unstable.

The real HTAP challenge, then, is not tuning one more SQL statement. It is simultaneously achieving three things: stable TP writes, complex AP analytics, and a simple overall pipeline. This requires the database not just to execute queries, but to proactively move high-frequency complex computation into the database engine itself.

2. Why OceanBase MVs Bridge TP and AP

Conceptually, materialized views are straightforward: persist the result of a query definition and have the system maintain it according to some refresh strategy. The critical question is not whether a copy is stored, but whether it can shift “analytical computation” from query time to post-write/background maintenance, making the online path and the analytics path engineeringly separable.

This is especially important in HTAP scenarios. The value of an MV is not just making one query faster — it reorganizes where and when data processing occurs. OceanBase materialized views bridge four layers simultaneously:

TP’s continuous changes and AP’s query analytics. Inserts, updates, and deletes on base tables represent business facts entering the system. AP queries prefer to consume pre-organized wide tables, summary tables, or metric tables. MVs internalize the transformation from “continuous changes” to “stable results” — downstream consumers no longer recompute from detail tables every time.
Detail data and analytical results. Transaction tables are optimized for writes, updates, and point lookups. Analytical queries depend on denormalized topic tables, pre-aggregated metric tables, or filtered result tables. MVs pre-organize results that would otherwise be generated dynamically at query time into forms better suited for AP consumption.
Query-time recomputation and background maintenance. If join denormalization, aggregation rollups, and layered processing execute at every query, costs are high, variance is large, and resource contention with transactional traffic is inevitable. MVs shift these high-frequency, complex, reusable computations to the background — queries become “read results” instead of “compute on the fly.”
External pipelines and in-database processing. When part of the ETL/pre-computation capability consolidates inside the database, system components decrease, pipelines shorten, and data consistency and troubleshooting become more centralized.

The proper positioning of OceanBase MVs is therefore not query accelerator but data processing infrastructure: converting frequent, complex, reusable computations into refreshable, reusable materialized tables.

3. Why OceanBase MVs Can Enter the Critical HTAP Path

In traditional databases, materialized views mostly serve reporting or offline analytics optimization. In HTAP and near-real-time processing scenarios, merely supporting MVs is insufficient. What determines whether MVs can enter the critical business path is whether the underlying architecture can sustain them at scale, reliably, over time.

OceanBase’s advantage starts with its distributed HTAP architecture:

Distributed storage and compute: OceanBase MV container tables use distributed sharded storage; refresh operations support distributed parallel execution, fully leveraging cluster resources. Traditional single-node database MVs are limited by single-node performance.
Elastic scalability: As data volumes grow, OceanBase scales MV storage and compute by adding nodes. Traditional single-node databases are constrained by hardware limits.
High availability: OceanBase uses Paxos-based multi-replica storage. MV container tables inherit high availability — if a node fails, the system quickly switches to another replica, ensuring query continuity. A single-node database MV becomes inaccessible on node failure.

These are the foundation. More importantly, OceanBase goes further to make MVs genuinely suitable for high-frequency complex queries in HTAP scenarios.

4. Why OceanBase MVs Fit HTAP Workloads

If distributed architecture, HA, and elastic scalability answer “why OceanBase is suited to host MVs,” the next question is: how do OceanBase MVs actually solve the computations that are truly expensive, truly frequent, and truly worth shifting out of query time in HTAP scenarios?

Not all analytical computation deserves materialization. The best candidates are high-frequency, expensive, logically stable, reusable computations that can tolerate some refresh latency. OceanBase MVs deliver value in the following areas.

4.1 MLOG-Based Incremental Refresh: Bounding Maintenance Cost to Change Volume

In real-time or near-real-time scenarios, the expensive part is not defining an MV — it’s maintaining one. If every data change triggers a full recomputation from base tables, the MV quickly loses engineering value.

OceanBase uses materialized view logs (MLOG) to track base table changes, enabling refresh to operate on incremental data rather than scanning the full table. This keeps MV maintenance cost proportional to “change volume” rather than “table volume.” This is the foundation for OceanBase MVs operating in continuously changing data scenarios: they don’t recompute everything on each refresh — they process only the delta.

4.2 Multi-Table Join and Aggregation Pre-Computation: Converting Expensive Queries into Results

Many slow queries in HTAP scenarios are not one-off incidents — they are stable, frequent, expensive large queries. The two most typical patterns are multi-table join denormalization and aggregation rollups.

In product analytics, user profiling, or order-topic analysis, queries typically join multiple fact and dimension tables. Rather than letting every query do the join at runtime, materializing the denormalization logic into an MV transforms execution from dynamic joins into stable table reads.

Similarly, metrics like GMV, gross margin, DAU, retention, and conversion funnels are essentially repeated group-by aggregations over the same detail data. Materializing these into summary or metric tables via MVs significantly reduces redundant computation while converging metric definitions.

For a few critical but expensive SQL paths that execute repeatedly, MVs can also serve as targeted pre-computation — shifting query-time uncertainty to the background refresh phase, trading controllable maintenance cost for stable query performance.

4.3 Nested MVs and Cascading Refresh: Layered Processing and Near-Real-Time Data Warehousing

In real-time or near-real-time data warehouse scenarios, data naturally layers: detail → topic → report. OceanBase supports nested MVs with cascading refresh, enabling MVs to power not just single-query optimization but entire layered processing pipelines.

This means MVs can express not just a single result table, but an organized, maintainable data processing workflow. Under continuous writes and updates, downstream analytics don’t necessarily require absolute real-time — they typically need results that are “fresh enough, stable enough, and queryable.” MVs serve as the in-database analytics and result layers.

4.4 Query Rewrite and Result-Layer Optimization

For enterprise systems, ease of adoption matters. OceanBase supports query rewrite: when conditions are met, the optimizer automatically redirects base-table queries to materialized views. Applications don’t need to rewrite SQL to benefit from MVs — they gain performance with minimal code changes.

Beyond pre-computation, how results are efficiently consumed also matters. OceanBase treats MVs as manageable objects that can be further combined with columnar storage, indexes, and partitioning to optimize analytical access paths. In other words, MVs are not just results written to disk — they can be further organized around consumption patterns for optimal access.

5. Typical Scenarios: How MVs Land in HTAP Practice

5.1 E-Commerce Flash Sales: Turning Multi-Table Joins into Maintainable Wide Tables

During promotions, operational analytics typically require integrating product master data, sales attributes, campaign product pools, and store information into a unified wide table for pricing analysis, campaign evaluation, and operational decisions.

Once upstream data lands in OceanBase, if downstream analytics perform ad-hoc multi-table joins every time, two instabilities emerge:

Join cost and query latency fluctuate sharply with peak traffic
Analytical queries and other online workloads amplify each other’s jitter

The optimization goal here is not “make a single join faster” but pre-materialize stable, reusable multi-table relationships into a consumable wide table — so AP queries read a processed table rather than repeatedly resolving the same join logic.

The difficulty isn’t just table size — it’s extremely uneven change patterns:

Product master table: ~400K rows total, expected to grow to ~1.8M; peak incremental batch ~100–150K
Product sales attributes table: ~440K rows; peak incremental batch ~2–3K
Campaign product pool table: daily total only ~1K, but campaign switches can produce ~4–5M concentrated changes in a single batch
Final wide table: ~500K–1M rows

Under this pattern, “join at query time” hits three practical problems:

High join cost with large latency variance
Forced fallback to full recomputation quickly escalates cost
Downstream analytics and processing compete for resources during peaks

This is a textbook HTAP bridge requirement: move the most expensive join computation from the shared query window into a controllable background maintenance window.

OceanBase materializes the four-table join (campaign pool, product master, sales attributes, store) into a join-type materialized view, producing a processed wide table for analytical consumption. Downstream queries hit the MV directly — no repeated multi-table joins.

Maintenance strategy:

Daily operation uses incremental refresh
Based on base table change logs, only affected ranges undergo incremental recomputation
“Query-time join” becomes “post-write incremental maintenance”
Full refresh is reserved as a catch-up or rebuild mechanism, not the default path

Result: Join uncertainty removed from query time

With incremental refresh set to ~5-minute intervals, benefits are clear:

Off-peak refresh typically completes in under 1 minute
Full refresh takes ~20 minutes, used primarily for rebuilds or alignment
Even when the campaign product pool produces massive batch changes, incremental maintenance keeps the wide table near-real-time available

For the analytics side, the key benefit is usually not a single peak metric but the query path collapsing from multi-table join to single wide-table access. Execution cost becomes more controllable and latency stability improves markedly. The “bridge” meaning is clear: join uncertainty is transferred from the query side into a schedulable background maintenance window.

5.2 SaaS ERP / Reporting: Consolidating In-Database Processing with Nested MVs

If the e-commerce scenario’s core tension is join computation cost and variance, SaaS ERP / reporting faces a different challenge: processing pipeline stability, metric definition consistency, and long-term maintainability.

In ERP systems, business details are continuously written while reports and analytical queries run continuously. Compared to single-query speed, these scenarios emphasize two things:

Report metric definitions must be stable and explainable
Processing pipelines must run reliably over time

Traditional ETL flows span multiple systems and stages — long pipelines with scattered processing logic. Once orchestration becomes complex, operational costs rise and metric governance and troubleshooting become significantly harder.

For ERP / reporting systems, the core problem is usually not a slow report SQL but three structural challenges:

How TP real-time ingestion and near-real-time processing coexist in the same cluster
How the two workload types avoid mutual interference
How report queries consume stable processed results instead of recomputing from raw details

What truly needs optimization is not just “query execution” but the entire processing and consumption pipeline.

OceanBase uses MVs to consolidate processing layers previously scattered across external ETL pipelines into the database.

Specifically, different data warehouse layers — detail, topic/processing, and report — can each be defined as materialized views, organized via nested MVs and cascading refresh. Each processing layer is continuously maintained internally; final results land in their respective MVs, and reports consume these processed results directly.

Stable operation depends on two key capabilities:

First, cascading refresh of nested MVs.
Bottom-up refresh ordering ensures detail, topic, and report layers maintain consistent data progression — well suited for reporting scenarios with strict metric consistency requirements.

Second, workload isolation between TP writes and MV processing.
In practice, TP data write leaders and MV container table leaders can be placed on different nodes, separating processing from write workloads. Incremental updates read change data from the TP side via MLOG, then complete processing on the MV side — reducing resource conflicts between real-time writes and background processing.

The resulting benefits go beyond “faster reports.” More importantly, the previously long and scattered ETL pipeline consolidates inside OceanBase:

Report queries read pre-computed MV data instead of recomputing in real time — more stable responses
Nested MVs with cascading refresh make each processing layer easier to trace, reproduce, and govern
Workload isolation lets TP ingestion and MV refresh run stably in parallel

For ERP / reporting systems, MVs don’t replace all ETL. They internalize the most critical, most stable, most worth-materializing processing logic within the database. The previously externally-orchestrated pipeline becomes a shorter, more stable, easier-to-operate in-database path. The net benefit is not just faster query responses but also: higher processing efficiency, more stable reports, and reduced overall architectural complexity and operational cost.

6. Conclusion

OceanBase materialized views sit between TP’s continuous changes and AP’s stable consumption. Their value is not in replacing all queries, but in materializing high-frequency, expensive, stable, reusable analytical computations into database-maintained data objects. Only then can real-time analytics evolve from ad-hoc SQL tuning into a truly operable, governable, and continuously evolving data processing path.

Explore OceanBase: Learn, Connect, Try

Thank you for reading. If this article sparked your interest, here’s how to go deeper:

Learn More: Visit oceanbase.com to explore OceanBase’s architecture, features, and real-world customer success stories.
Contact Us: Have specific questions or want to explore how OceanBase fits your needs? Reach out via Contact Us Page.
Follow Our Journey: Share your thoughts in the comments below. Stay updated with OceanBase news and best practices on LinkedIn and X.

Let’s build better data architectures together.

DEV Community