Charles Wu

Posted on Jun 28

Why One System Beats Two: From Separate TP and AP Stacks to One Unified Engine

#analytics #architecture #database #dataengineering

How to Consolidate TP and AP Workloads into One System Without Sacrificing Performance

Data freshness is now a competitive edge. Enterprise expectations have shifted from “can we store it” to “can we use it now”: flash-sale dashboards must track conversions in real time, financial risk engines must flag anomalies within milliseconds, and retail chains rely on live inventory to schedule replenishment. Whoever gets to accurate, fresh data first wins.

Yet most enterprises still run transactions and analytics as two separate stacks with two data copies. Data must traverse a long ETL pipeline before reaching the analytical platform. This raises a persistent industry question: can a single system, with a single copy of data, simultaneously support large-scale transactions and real-time analytics?

Five Costs of the TP/AP Split

Before diving into the technical solution, consider the real costs of the TP/AP split:

Data silos that block cross-domain collaboration. Data is scattered across multiple online databases and real-time warehouses. A marketing team combining transaction records with user behavior for a real-time profile may find the data spread across three or four systems, grinding collaboration to a halt.

Stale data that kills critical decisions. When the platform supports only scheduled sync and batch analytics, enterprises miss their best decision windows. Flash-sale systems must track every order and click in real time; financial risk-control systems must detect and intercept anomalous transactions within milliseconds — or the timing advantage is gone.

Storage redundancy that wastes resources. Multiple downstream systems store identical copies of the same dataset. This duplication multiplies storage costs and introduces stability risks as data is repeatedly pulled from production systems.

Pipeline complexity that drives up operational costs. When data flows through Kafka, Flink, Hudi, Hive, Spark, ClickHouse, and more, every layer adds another data copy and another consistency check. The longer the pipeline, the higher the latency and the greater the consistency risk.

High labor costs and weak governance. Data processing, cleansing, assembly, and stream-batch job development consume substantial engineering effort. Fragmented governance forces teams to spend energy on data movement and exception handling rather than on business logic. Redundant storage, duplicated compute, and the senior engineers needed to maintain a complex pipeline form a significant hidden cost.

Why Convergence Is Inevitable

A fragmented stack accumulates serious technical debt: data is shuffled repeatedly between systems, engineering time goes to pipeline maintenance instead of innovation, and every sync step adds latency. Under this model, “real-time analytics” rarely lives up to its name.

The demand for millisecond-level response is forcing architectures to simplify. Replacing a patchwork of components with a single, elastic analytics platform is becoming industry consensus. The goal: keep data on one platform from generation through analysis, eliminate cross-system movement latency, and make genuine real-time decisions possible.

OceanBase’s evolution mirrors this trend. It began in 2010 as a distributed engine and moved to a natively distributed architecture in v2.0. In v3.0, HTAP became a core objective, and OceanBase became the only system to top both TPC-C and TPC-H. v4.0 unified the single-node and distributed architectures, and v4.3 added an in-house columnar engine. The convergence throughline has held for over a decade, and every upgrade answers the same question: how can one system and one copy of data serve both transactions and analytics well?

OceanBase’s Answer: One System, One Copy of Data

HTAP (Hybrid Transactional and Analytical Processing) addresses an urgent industry need. Built on natively distributed technology, OceanBase handles both transactional and analytical workloads — OLTP and OLAP — in a single engine, delivering the functionality of two separate stacks at substantially lower cost. In the traditional approach, OLAP must wait for data to be asynchronously shipped to a separate system; with OceanBase’s HTAP engine, mixed workloads complete inside the same cluster, and data is queryable as soon as it is written — no synchronization lag.

AP Grows on a TP Foundation

OceanBase’s premise: genuine HTAP requires high-performance OLTP first, then real-time analytics built on top. Natively distributed technology enables “one system” for both transactions and analytics, and “one copy of data” for heterogeneous workloads — guaranteeing consistency, minimizing redundancy, and lowering total cost. The AP capabilities are heavily optimized for real-time analytics but retain three core traits from the TP foundation:

High availability: Built on the Paxos protocol. When a minority of nodes fails, automatic lossless failover delivers RPO = 0 and RTO < 8 s. Strong-validation mechanisms detect inter-replica inconsistencies, network data corruption, and silent disk errors. Supported deployment topologies range from two regions with three IDCs to three regions with five IDCs.

Strong transactions: Intelligent routing dynamically picks the optimal execution path based on transaction characteristics. Full ACID is enforced for both single-log-stream and distributed transactions. Native support for large transactions and mixed workloads — high-concurrency short transactions alongside long-running bulk operations — covers scenarios such as bulk imports and batch data processing.

High-concurrency writes and queries: LSM-Tree turns random writes into sequential writes, dramatically increasing write throughput. PDML provides parallel large-transaction writes that shorten batch operations. MVCC delivers non-blocking reads and writes under high concurrency. The built-in MPP parallel execution framework scales performance linearly with node count.
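To make the "non-blocking reads and writes" claim concrete, here is a minimal multi-version sketch in Python. It is a toy model of the general MVCC idea, not OceanBase's implementation; the `VersionedStore` class and its methods are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy multi-version store (hypothetical): key -> list of (commit_ts, value)."""
    clock: int = 0
    versions: dict = field(default_factory=dict)

    def write(self, key, value):
        # A write appends a new version instead of overwriting in place,
        # so readers holding an older snapshot are never blocked.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A reader's snapshot is the commit timestamp at which it started.
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("order:1", "created")
analytics_snapshot = store.snapshot()   # a long-running analytical read begins
store.write("order:1", "paid")          # a concurrent transactional write lands

old_view = store.read("order:1", analytics_snapshot)
new_view = store.read("order:1", store.snapshot())
print(old_view, new_view)
```

The analytical reader keeps seeing the state as of its snapshot while the writer proceeds, which is the property that lets long scans and short transactions share one dataset.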

By preserving these TP traits, OceanBase ensures that gaining real-time analytics does not mean compromising availability, consistency, or write performance — the foundation that makes “one system, two jobs” viable in production.

The Core Challenge: One Dataset Serving Two Workloads

The central engineering challenge: how can the same dataset satisfy the point-lookup and random read/write demands of transactions while also meeting the large-scale scan and high-compression demands of analytics? OceanBase’s answer is a multi-layered design.

A clever split inside LSM-Tree: OceanBase’s LSM-Tree engine divides data into baseline and incremental layers. The baseline can be configured at table creation to use row store, column store, or row-and-column hybrid. In column-store mode, each column is an independent SSTable; together they form a virtual SSTable serving as the columnar baseline — ideal for analytical scans. Incremental data stays row-oriented, so all DML operations and upstream/downstream synchronization work unchanged; columnar tables support every transactional operation that row-store tables do. Row-oriented increments preserve real-time write performance, and periodic or adaptive compactions produce a fresh columnar baseline.
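The baseline/incremental split can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not OceanBase's storage code: `HybridTable` is a hypothetical class, plain lists stand in for per-column SSTables, and a dict stands in for the row-oriented incremental layer.

```python
class HybridTable:
    """Toy table (hypothetical): columnar baseline plus row-oriented delta."""

    def __init__(self, columns):
        self.columns = columns
        # Baseline: one independent list per column (stand-in for one SSTable per column).
        self.baseline = {"pk": [], **{c: [] for c in columns}}
        # Incremental layer: whole rows keyed by primary key; None marks a delete.
        self.delta = {}

    def upsert(self, pk, row):
        self.delta[pk] = dict(row)

    def delete(self, pk):
        self.delta[pk] = None

    def get(self, pk):
        # Point lookup: the newest data wins, so check the delta first.
        if pk in self.delta:
            return self.delta[pk]
        if pk in self.baseline["pk"]:
            i = self.baseline["pk"].index(pk)
            return {c: self.baseline[c][i] for c in self.columns}
        return None

    def scan_column(self, column):
        # Analytical scan: read one baseline column, then overlay delta rows.
        merged = dict(zip(self.baseline["pk"], self.baseline[column]))
        for pk, row in self.delta.items():
            if row is None:
                merged.pop(pk, None)
            else:
                merged[pk] = row[column]
        return merged

    def compact(self):
        # Compaction: fold the delta into a fresh columnar baseline.
        keys = set(self.baseline["pk"]) | set(self.delta)
        rows = {pk: self.get(pk) for pk in keys}
        live = sorted(
            ((pk, r) for pk, r in rows.items() if r is not None),
            key=lambda item: item[0],
        )
        self.baseline = {"pk": [pk for pk, _ in live]}
        for c in self.columns:
            self.baseline[c] = [r[c] for _, r in live]
        self.delta = {}


t = HybridTable(["amount", "status"])
t.upsert(1, {"amount": 30, "status": "paid"})
t.upsert(2, {"amount": 50, "status": "paid"})
t.compact()                                      # rows 1 and 2 become columnar baseline
t.upsert(3, {"amount": 20, "status": "new"})     # lands in the row-oriented delta
t.delete(1)

total_before = sum(t.scan_column("amount").values())
t.compact()
total_after = sum(t.scan_column("amount").values())
```

The scan returns the same answer before and after compaction: writes are visible immediately through the delta, and compaction only changes where the data lives.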

Three storage modes for different workloads: the baseline can be stored as row, column, or row-column hybrid:

Columnar tables: columnar baseline plus row-oriented incremental layer. All DML and replication behave identically to row-store tables, so the “column store can’t do transactions” limitation does not apply. Suited to scan-heavy analytical workloads.
Row-column hybrid tables: redundant row-and-column baseline plus row-oriented incremental layer. The optimizer automatically picks the best read path per query. The best performance balance for HTAP workloads that mix point queries with wide scans.
Row-store tables: fully row-oriented layout for pure TP, high-concurrency transactional workloads. Fully compatible with existing workloads.

A unified SQL engine: For row-and-column coexistence to pay off, the SQL engine must cooperate. OceanBase’s optimizer ships a dedicated cost model for column store and automatically chooses row-store index reads or columnar scans based on the table’s layout and the query’s characteristics. Point queries go to row store, analytics go to column store — fully transparent. One SQL statement, optimal path.
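A rough intuition for why the optimizer routes point queries to row store and analytics to column store can be expressed as a toy cost comparison. The formula and constants below are invented for illustration and are not OceanBase's actual cost model; `Query`, `choose_path`, and `column_open_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Query:
    rows_touched: int     # estimated number of rows the query must read
    columns_needed: int   # number of columns the query references


def choose_path(query, total_columns, column_open_cost=50.0):
    """Pick 'row' or 'column' by comparing two rough, made-up cost estimates."""
    # Row path: every touched row is read in full, unused columns included.
    row_cost = query.rows_touched * total_columns
    # Column path: only the needed columns are read, but each column
    # carries a fixed cost to open and stitch back together.
    column_cost = (
        query.rows_touched * query.columns_needed
        + query.columns_needed * column_open_cost
    )
    return "row" if row_cost <= column_cost else "column"


point_lookup = Query(rows_touched=1, columns_needed=20)
wide_scan = Query(rows_touched=1_000_000, columns_needed=2)

point_path = choose_path(point_lookup, total_columns=20)
scan_path = choose_path(wide_scan, total_columns=20)
print(point_path, scan_path)
```

A one-row lookup that needs every column is cheapest on the row path, while a million-row scan over two of twenty columns is cheapest on the column path. A real optimizer weighs far more (statistics, indexes, filters), but the trade-off has this shape.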

A unified transaction layer: All incremental data is row-oriented; transaction modifications, logging, and multi-version control share logic entirely with the row store. No “column store can’t do transactions” problem. Data is queryable the instant it is written — TP write efficiency and AP data freshness coexist.

AP and AI retrieval on the same engine and data plane: In the GenAI era, a single query may combine structured filtering, JSON tag matching, full-text search, and vector similarity ranking. These capabilities used to be split among relational databases, search engines, and vector databases — creating “one model, one database” silos. OceanBase takes a different path: AP analytics and multi-model retrieval run within the same engine and data plane. Beyond relational data, OceanBase natively supports JSON (multi-value indexing), Vector (in-house HNSW index), full-text indexes (BM25), Array, Map, and Roaring Bitmap. One engine, one data plane — no second product needed for predicate filtering, multi-model retrieval, transactions, access control, or high availability.
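The shape of such a combined query, a structured filter plus tag match followed by vector ranking, can be sketched in plain Python. This is a brute-force illustration over made-up data, not OceanBase's HNSW or BM25 implementation; `products` and `hybrid_search` are hypothetical.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up rows mixing relational fields, JSON-style tags, and an embedding.
products = [
    {"id": 1, "price": 25, "tags": ["spicy", "beef"], "embedding": [0.9, 0.1]},
    {"id": 2, "price": 80, "tags": ["spicy"], "embedding": [0.95, 0.05]},
    {"id": 3, "price": 30, "tags": ["mild"], "embedding": [0.1, 0.9]},
    {"id": 4, "price": 20, "tags": ["spicy", "veg"], "embedding": [0.6, 0.4]},
]


def hybrid_search(rows, max_price, required_tag, query_vec, k):
    # Step 1: the structured filter and tag match narrow the candidate set.
    candidates = [
        r for r in rows
        if r["price"] <= max_price and required_tag in r["tags"]
    ]
    # Step 2: rank the survivors by vector similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r["embedding"], query_vec),
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]


top_ids = hybrid_search(
    products, max_price=50, required_tag="spicy", query_vec=[1.0, 0.0], k=2
)
print(top_ids)
```

When the filter, the tag match, and the ranking all run against the same rows, there is no cross-system join or sync step between them, which is the point of keeping them on one data plane.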

Resource Isolation

For a single system to be truly viable in production, supporting mixed workloads is not enough — it must also ensure transactions and analytics do not interfere with each other. A single complex analytical query dragging down the transactional system is unacceptable in finance, retail, and similar domains. OceanBase provides multi-level resource isolation, coarse to fine:

Option 1: Tenant-level isolation. Each tenant gets dedicated CPU, memory, and IO quotas plus data access control. A single cluster serves multiple business systems without interference, and a tenant can scale elastically from 2 CPU cores and 8 GB of memory (2C8G) to 48 cores and 160 GB (48C160G) via in-place resize or node addition.

Option 2: Physical isolation via read-only columnar replicas. Row-store replicas handle read/write TP traffic through one ODP gateway; read-only columnar replicas in a dedicated zone handle AP traffic through a separate ODP gateway. Two gateways, strong physical isolation — and the total replica count can be lower than with row-and-column redundancy.

Option 3: Fine-grained resource-group isolation within a tenant. For workloads that share data, resource groups can be configured at the user, SQL, or background-task level. CPU is isolated via Cgroup, IOPS via IO scheduling, and network bandwidth limits are supported. Weighted allocation lets high-priority tasks claim more resources.
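Weighted allocation, as mentioned in Option 3, amounts to proportional sharing. The sketch below shows only the arithmetic; the group names, weights, and the `allocate_by_weight` helper are hypothetical and do not correspond to OceanBase configuration parameters.

```python
def allocate_by_weight(total_cpu_cores, weights):
    """Split CPU cores across resource groups in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one group needs a positive weight")
    return {
        group: total_cpu_cores * weight / total_weight
        for group, weight in weights.items()
    }


# Hypothetical tenant with 16 cores and three resource groups.
shares = allocate_by_weight(16, {"oltp": 6, "reporting": 1, "background": 1})
print(shares)
```

With these weights the transactional group is entitled to three quarters of the CPU when every group is busy, so a heavy report cannot starve order processing.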

Together, these mechanisms turn “one system running two workloads” from a theoretical possibility into production-grade reality.

In Production: Who’s Already Running It

OceanBase’s real-time analytics, flexible resource management, and high compression and query performance serve finance, telecom, and retail customers in production today. Two examples:

Haidilao Hot Pot: Unified HTAP for Membership and Inventory

Mixed HTAP workloads are the most direct expression of “one system doing two jobs.” Online transactional requests and online analytics — dashboards, BI panels, ad-hoc queries — are served by the same OceanBase cluster. Vectorized execution, row-column hybrid storage, and resource isolation let transactions and analytics coexist efficiently.

Haidilao faced three challenges: database stability during holiday traffic peaks, real-time recommendations for hundreds of millions of members, and wasted capacity that was scaled up but never scaled back. The legacy sharding solution could not support high-concurrency transactions and complex analytics simultaneously.

After migrating to OceanBase, both the membership and inventory systems benefit from unified HTAP. TP and AP tenants coexist in the same cluster, with a mix of isolation strategies — resource group, replica, and tenant — balancing isolation, freshness, and utilization. Paxos-based 24/7 availability, stronger AP analytics, compressed storage, and second-level dynamic scaling together drove a significant reduction in TCO.

A Global Retail Brand: Replacing ADB + ClickHouse + Delta Lake

The brand’s marketing-analytics system originally combined three products: ADB for ETL, ClickHouse for serving queries, and Delta Lake for storage. Multiple products meant extra licensing costs; the ETL pipeline was complex and slow; ClickHouse ran on self-managed single nodes with no HA.

With OceanBase v4.3’s columnar engine and new-generation compute engine, a single cluster now delivers ETL faster than ADB and serving-query performance comparable to ClickHouse — simultaneously. One replica handles ETL bulk writes; another handles concurrent serving reads. The result: a unified solution for both analytical processing and serving queries, dramatically simpler architecture, the same concurrency handled with smaller CPU specs, significantly compressed storage versus ADB, zero ClickHouse maintenance overhead, and substantial cost savings overall.

Summary

Back to the opening question: can a single system, with a single copy of data, support large-scale transactions and real-time analytics simultaneously?

From Haidilao’s membership and inventory systems to a global retail brand consolidating three analytics components into one, more enterprises are answering yes in production. Behind these cases is OceanBase’s convergence at three layers:

The outcome: enterprises no longer need a parallel analytics infrastructure just to deliver “real time.” Time spent shuttling data, headcount maintaining sync pipelines, and dollars paid for redundant storage can all be redirected to work that creates business value.

HTAP mixed workloads are only the starting point of OceanBase’s AP story. The vectorized execution engine, enterprise-grade optimizer, and materialized views continue raising the analytical performance ceiling, while storage-compute separation and high-compression storage redefine price-performance for data analytics.

Subsequent articles in the OceanBase AP Core Strengths series will unpack these capabilities one by one. If replacing a multi-component architecture with a single system sounds compelling, the next installments will show that it not only runs — it runs faster and cheaper.

See It in Your Own Stack

Docs and deployment guides are at https://en.oceanbase.com/ if you want to test HTAP in your environment. For architecture questions, reach out — or compare notes in the comments.

Still running separate TP and AP stacks? What’s the one cost that actually hurts — pipeline ops, stale dashboards, or storage duplication — and what would it take for you to collapse them into one engine?

DEV Community