Charles Wu for seekdb

Posted on May 21

From TP to AP: How OceanBase Materialized Views Support HTAP

#database #distributedsystems #softwareengineering #cloudcomputing

Incremental refresh, join MVs, and nested pipelines for hybrid workloads.

In real-time analytics, the question is no longer “Why is this SQL slow?” It’s “Why is our data pipeline fighting itself?” Teams need three things at once: stable transactional writes, efficient analytical queries, and a data path that stays short and light.

The classic split still works on paper: an OLTP stack handles online transactions; CDC, message queues, ETL, or batch/stream jobs copy data into an analytical system for dashboards and ad hoc queries. That pattern has carried enterprises for years — but the bill is getting steeper: long pipelines, heavy operations, and a persistent gap between freshness and reliability.

Retail, e-commerce, and ERP are the usual stress tests. City-level sales rollups, category margin analysis, customer profiles, and executive dashboards routinely join orders, products, customers, stores, and campaigns — then filter, sort, and aggregate at scale. As volume grows, running those queries directly on base tables (or hammering base tables during peak traffic) keeps compute and resource costs climbing. Worse, analytics starts fighting OLTP for CPU, I/O, and memory.

Once HTAP (hybrid transactional/analytical processing) moved from slide decks to production, the goal shifted. It is not only “make this query fast once.” It is moving frequent, complex, reusable analytical work out of the query window — without stretching the business pipeline — and give the analytical side a stable, reusable, maintainable data shape.

That is where materialized views (MVs) in OceanBase matter. They are not just a cache for slow SELECTs. They are a bridge between TP and AP: joins that widen detail data, rollups, and layered transformations become physically stored result sets maintained inside the database, so “TP keeps changing” and “AP keeps reading” connect through a shorter, steadier path.

1. The HTAP tension: it is not only “queries are slow”

HTAP means two workloads share one platform for the long haul:

TP (transaction processing): short transactions, row-level updates, high concurrency, latency-sensitive paths.
AP (analytical processing): wide scans, multi-table joins, heavy aggregation, higher appetite for throughput and predictable runtime.

Enterprises usually pick one of two directions: push analytics entirely to external systems, or pull analytics closer to the operational database. Either way, architecture and ops load can balloon.

In practice, three pain points show up again and again.

Cross-system inconsistency

The same business facts flow through CDC, messaging, job platforms, warehouses, and BI tools. Soon you have several “versions” of truth. Dashboards disagree; incidents span teams; root cause is rarely “bad SQL” alone — it is pipeline complexity.

Freshness vs. processing cost

Moving from T+1 to hourly or minute-level insight means more frequent incremental sync, tighter scheduling, and higher spend. Pure batch alone struggles to keep up with how operations and product teams want to decide now.

Analytics stealing online capacity

Heavy AP queries mean scans, joins, aggregations, and bloated intermediates. When that work lands on the primary cluster — especially during transactional peaks — you do not get “consistently slow.” You get jitter: OLTP tail latency rises while analytics itself stays unstable.

So the HTAP problem is not tuning one statement. It is balancing stable TP ingest, rich AP analysis, and a simple overall chain. The database must do more than execute queries; it must shift repeatable heavy compute into the platform and maintain it there.

2. Why OceanBase MVs bridge TP and AP

A materialized view is easy to define: persist the result of a query and refresh it on a policy. The hard part is not “we stored a copy.” It is whether analytical work moves from query time to post-write / background maintenance, so online and analytical paths separate in engineering terms.

In HTAP, that repositioning is the point. OceanBase MVs sit across four bridges:

Continuous TP change → stable AP reads
Base tables absorb inserts, updates, and deletes — the live record of the business. AP wants wide subject tables, rollups, and metric tables ready to query. MVs turn “always changing detail” into “maintained results” inside the database, so consumers are not recomputing from scratch every time.
Detail data → analytical shape
Transactional tables are built for write and point lookup. Analytics wants denormalized subjects, pre-aggregated KPIs, and shapes that prune filters efficiently. MVs pre-organize what would otherwise be built dynamically at query time.
Query-time recompute → background maintenance
Join widening, aggregation, and layered transforms are expensive and volatile when run on demand — and they compete with traffic. MVs front-load stable, reusable logic so reads trend toward fetching results, not recomputing them live.
External pipelines → in-database processing
When part of ETL and pre-compute moves into OceanBase, you drop components, shorten the path, and centralize consistency and troubleshooting.

The right label is not “query accelerator.” It is infrastructure for a data processing layer: frequent, complex, reusable logic becomes refreshable, reusable materialized tables.

3. Why OceanBase MVs can sit on the critical path

Classic MVs often serve reports or offline tuning. Near-real-time HTAP needs more than “MV support.” The question is whether the architecture can run MVs at scale, continuously, and reliably.

OceanBase starts from a distributed HTAP foundation:

Distributed storage and compute — MV container tables are sharded; refresh can run in parallel across the cluster, unlike single-node ceilings.
Elastic scale — add nodes as data grows; storage and refresh capacity expand with the cluster.
High availability — Paxos-replicated storage; MV data stays available when a node fails, instead of vanishing with one machine.

That is the floor. On top of it, OceanBase tunes MVs for the expensive, high-frequency queries HTAP actually runs.

4. How OceanBase MVs earn their place in HTAP workloads

Distributed scale answers “can we host MVs?” The next question is “which work should MVs own?”

Not every analytical job belongs on an MV. Good candidates are frequent, costly, stable, reusable, and tolerant of bounded refresh lag. OceanBase MVs deliver value in four areas.

4.1 Incremental refresh via MLOG

In real-time or near-real-time systems, the tax is not creating an MV — it is keeping it current. Full recompute on every change destroys the economics.

OceanBase uses a materialized view log (MLOG) on base tables so refresh can target deltas instead of full scans. Maintenance cost tracks change volume, not table size. That is what makes MVs viable on continuously mutating data: each refresh processes what changed, not everything.

4.2 Join and aggregation pre-compute

Many “slow” HTAP queries are not one-offs. They are repeatable heavy queries: multi-table joins and rollups.

Join widening — product analytics, user profiles, order subjects — often stitches facts and dimensions. Materializing the join turns dynamic association into stable table reads. The win is not only shorter SQL; execution shifts from ad hoc join to scan a maintained wide table.

Metrics — GMV, margin, DAU, retention, funnel steps — are the same detail regrouped again and again. MVs pin one definition to a rollup or metric table, cut duplicate work, and help align metrics across teams.

For a few critical, costly paths that run constantly, MVs are targeted pre-compute: uncertainty moves from the query window to a controlled refresh window.

4.3 Nested MVs and cascaded refresh

Near-real-time warehouses are layered: detail → subject → reporting. OceanBase supports nested MVs and cascaded refresh, so MVs express pipelines, not single tables.

Downstream often does not need perfect instant freshness — but it does need results that are fresh enough, stable enough, and always queryable. MVs act as the in-database analytical and serving layers under that bar.

4.4 Query rewrite and consumption tuning

Adoption matters. OceanBase can rewrite eligible queries from base tables to MVs automatically, so applications are not forced to rewrite every statement by hand.

MVs are managed objects. They can combine with columnar storage, indexes, and partitioning so the read path matches how AP actually accesses data. Materialization is not only “precompute and store”. It is also shaping the object for the next thousand reads.

5. Typical landing patterns

5.1 E-commerce peak season: multi-table join → maintained wide table

During peak retail (e.g. Black Friday), ops needs one wide table blending product master, sales attributes, campaign SKUs, and store data — for pricing, promo review, and intraday decisions.

If every dashboard re-joins at query time, two problems compound:

Join cost and latency swing with traffic
Analytics amplifies jitter on the shared operational cluster

The goal is not “tune one join.” It is freeze stable multi-table relationships into a consumable wide result so AP mostly reads one processed table.

Scale and churn (representative):

“Join at query time” hurts in three practical ways:

High, volatile join cost
Painful fallback to full rebuilds
Peak-hour contention between analytics and online traffic

That is classic bridge work: move the costliest association out of the shared query window into a scheduled maintenance window.

OceanBase materializes a join MV across campaign pool, product master, sales attributes, and merchant store — producing an analytical wide table. Downstream queries the MV instead of repeating the join.

Maintenance model:

Incremental refresh by default
MLOG-driven partial recompute on affected ranges
Shift from “join on read” to “maintain on write / schedule”
Full refresh reserved for catch-up or rebuild, not daily ops

With ~5-minute incremental refresh (illustrative):

Off-peak refresh often under 1 minute
Full refresh ~20 minutes for rebuild or alignment
Even huge campaign-pool spikes can stay near-real-time via incremental paths

For analysts, the structural win is path convergence: multi-table join → single wide-table read. Cost becomes more predictable; latency stabilizes. The bridge is explicit: join uncertainty leaves the query path and lands in an orchestrable background window.

5.2 SaaS ERP and reporting: nested MVs shorten in-database processing

If e-commerce peaks stress join cost and volatility, SaaS ERP / reporting stresses pipeline stability, metric definitions, and long-run maintainability.

ERP keeps ingesting detail while reports keep firing. Success is less about one fast SQL and more about:

Stable, explainable report definitions
Pipelines that survive years of operation

Traditional ETL spans many systems and stages. Orchestration grows; ops cost rises; metric ownership and incident triage get harder.

Structural challenges:

TP ingest and near-real-time transforms on one cluster
Keeping workloads from stepping on each other
Reports reading maintained results, not re-aggregating raw detail every time

OceanBase pulls layers that used to live in external ETL into the database as MVs: detail, subject, and reporting tiers as nested materialized views, linked by cascaded refresh, with reports querying the top layers directly.

Two capabilities matter:

Cascaded refresh across nested MVs
Bottom-up order keeps detail → subject → report aligned — important when metric consistency is audited.

Workload isolation between TP writes and MV maintenance
Place TP and MV partition leaders on different nodes where possible; incremental work reads MLOG changes on the TP side and applies transforms on the MV side — reducing collision between real-time ingest and background refresh.

Outcomes go beyond “faster reports”:

Reports read precomputed MV data — more stable response
Nested MVs make each layer easier to trace, reproduce, and govern
Isolation helps ingest and refresh run more independently

MVs do not replace every ETL job. They absorb the core, stable, worth-persisting slice inside OceanBase — shortening the path that used to depend on external stitching. Gains include better processing efficiency, steadier reporting, and lower architecture and ops complexity.

6. Closing

OceanBase materialized views carry continuous TP change and feed stable AP consumption. They do not replace every query. They materialize analytical work that is frequent, expensive, stable, and reusable — as objects the database maintains over time.

That is how real-time analytics graduates from one-off SQL tuning to a path you can operate, govern, and evolve.

Building HTAP systems? What’s your biggest challenge with TP/AP
workload isolation? Drop a comment below.

👏 Clap if this helped · 🔔 Follow for more database engineering deep dives

References

OceanBase Materialized Views Documentation: https://en.oceanbase.com/docs/common-oceanbase-database-10000000003683480
OceanBase AP Overview: https://en.oceanbase.com/docs/common-oceanbase-database-10000000003678687

DEV Community