Charles Wu for OceanBase User Group

Posted on Jun 28

Technical Deep Dive: How OceanBase’s Native Column Store Powers HTAP


This series covers how OceanBase Analytic Processing (AP) delivers strong transactional guarantees and high-concurrency support for real-time analytics. This article focuses on the native column-store engine’s architecture — tracing the full technical path from LSM-Tree’s baseline-delta separation, through adaptive compaction, columnar encoding, and Skip Index optimizations, to the vectorized execution engine 2.0, cost-model-driven row/column path selection, and system-wide adaptations across DDL, backup/restore, and transaction consistency. Together, these mechanisms show how OceanBase serves both TP and AP workloads within a single architecture.

1. From TP to HTAP: Why a Native Column Store

When building real-time analytics, enterprises face a classic architectural trade-off: deploy a separate OLAP database, or run analytical queries directly on the OLTP system. The first approach introduces data synchronization latency and operational complexity. The second hits a performance wall — row-store engines are not designed for analytical workloads.

Starting with V4.3.0, OceanBase offers a third path: a native column-store engine that delivers high-concurrency transaction processing and complex analytical queries within the same database instance — no additional data sync pipeline required. The key breakthrough is a deep integration of row store and column store within a single codebase and a single OBServer process, built on the LSM-Tree architecture.

Before V4.3.0, OceanBase’s AP capability relied on a lightweight row-store-plus-index approach. It handled simple analytical queries but bottlenecked on typical AP workloads involving multi-table joins, wide-range scans, and complex aggregations. The native column store closes this gap, enabling use cases like real-time reporting, real-time data warehousing, and user profiling.

This article walks through the technical details, starting with the column-store engine’s architecture.

2. Architecture of the Native Column-Store Engine

OceanBase’s native column-store engine is not a bolt-on layer over a row-store architecture. It redesigns data organization from the ground up. The core challenge: support both high-concurrency writes and efficient analytical queries in a unified architecture — balancing storage format, data flow, and query execution. This section explains the baseline-delta separation mechanism built on LSM-Tree.

Traditional column-store engines offer weak transactional support and struggle with complex transaction scenarios. OceanBase’s LSM-Tree architecture handles this by separating baseline and delta data:

Baseline data: Stored in columnar format, optimized for analytical query performance. When delta data accumulates to a threshold, it merges with the baseline to produce a new columnar baseline.
Delta data: Stored in row format for high-concurrency transactional updates. DML operations first write to an in-memory MemTable, then flush to disk as SSTables.
Merge mechanism: A background process periodically merges delta data into the baseline, avoiding performance degradation from heavy random updates.
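The read path that results from this split can be illustrated with a minimal Python sketch. It is a toy single-node model, not OceanBase's implementation. The HybridTable class, its method names, and the rule that each delta record carries a full row are assumptions made for illustration. A read overlays the row-format delta on the columnar baseline, and major_compact folds the delta into a fresh columnar baseline.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HybridTable:
    """Toy model (hypothetical names): columnar baseline plus row-format delta."""
    columns: list[str]                                        # first column is the primary key
    baseline: dict[str, list] = field(default_factory=dict)   # column -> values, sorted by key
    delta: list[tuple[str, int, dict | None]] = field(default_factory=list)

    def write(self, op: str, key: int, row: dict | None = None) -> None:
        # DML only touches the row-format delta (stands in for MemTable + flushed SSTables)
        self.delta.append((op, key, row))

    def read_all(self) -> dict[int, dict]:
        # Reconstruct baseline rows from the per-column arrays
        key_col = self.columns[0]
        rows = {
            k: {c: self.baseline[c][i] for c in self.columns}
            for i, k in enumerate(self.baseline.get(key_col, []))
        }
        # Overlay delta records in write order, so later writes win
        for op, key, row in self.delta:
            if op == "delete":
                rows.pop(key, None)
            else:  # "insert" or "update" carries the full new row
                rows[key] = row
        return dict(sorted(rows.items()))

    def major_compact(self) -> None:
        # Merge the delta into a new columnar baseline, then drop the delta
        merged = self.read_all()
        self.baseline = {c: [r[c] for r in merged.values()] for c in self.columns}
        self.delta.clear()


t = HybridTable(columns=["id", "amount"], baseline={"id": [1, 2], "amount": [10, 20]})
t.write("update", 2, {"id": 2, "amount": 25})
t.write("insert", 3, {"id": 3, "amount": 30})
t.write("delete", 1)
print(t.read_all())   # delta changes are visible before any compaction
t.major_compact()
print(t.baseline)     # {'id': [2, 3], 'amount': [25, 30]}
```

Keeping writes out of the columnar baseline is what lets it stay densely encoded; the cost is that every read must merge the delta until the next compaction.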

This architecture unifies row store and column store in one codebase, one architecture, and one OBServer process, delivering good query performance for both TP and AP workloads.

3. Key Technical Mechanisms

With the baseline-delta separation established at the architecture level, the next challenge is engineering efficiency: how to compact columnar data efficiently, reduce storage overhead, and minimize I/O and compute costs at query time. This section covers four core mechanisms in OceanBase’s column-store engine.

Adaptive Compaction

Column-store compaction is more complex than row-store compaction. Columnar data is organized by column, so merges involve more files and data reorganization, consuming significantly more resources. OceanBase addresses this with an adaptive compaction mechanism.

The system selects which partitions to compact based on each partition's volume of newly written delta data and its query performance metrics, avoiding the resource overload that a full-scale compaction of every partition would cause. It borrows parallelization techniques from row-store compaction, splitting a columnar merge task horizontally into sub-tasks that run in parallel. It also supports vertical splitting: merge tasks are scheduled at column granularity, so hot or critical columns can be prioritized and resources allocated where they matter most.
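The scheduling ideas above (pick partitions by need, then split work by row range and by column) can be sketched as follows. The scoring formula, thresholds, and every name here (PartitionStats, compaction_score, pick_partitions, split_merge_task) are hypothetical; OceanBase's actual heuristics are not described in this article.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    name: str
    delta_rows: int        # rows written since the last major compaction
    baseline_rows: int
    scan_slowdown: float   # current scan latency / latency right after compaction


def compaction_score(p: PartitionStats) -> float:
    # Hypothetical heuristic: delta ratio plus any query-latency regression
    return p.delta_rows / max(p.baseline_rows, 1) + max(p.scan_slowdown - 1.0, 0.0)


def pick_partitions(stats, max_partitions, min_score=0.1):
    # Compact only the partitions that need it most, not all of them at once
    ranked = sorted(stats, key=compaction_score, reverse=True)
    return [p.name for p in ranked if compaction_score(p) >= min_score][:max_partitions]


def split_merge_task(num_rows, columns, hot_columns, rows_per_task):
    # Vertical split: tasks per column, hot columns scheduled first (stable sort).
    # Horizontal split: each column's rows are cut into ranges that can run in parallel.
    ordered = sorted(columns, key=lambda c: c not in hot_columns)
    return [
        (col, start, min(start + rows_per_task, num_rows))
        for col in ordered
        for start in range(0, num_rows, rows_per_task)
    ]


stats = [
    PartitionStats("p0", delta_rows=10, baseline_rows=1000, scan_slowdown=1.0),
    PartitionStats("p1", delta_rows=500, baseline_rows=1000, scan_slowdown=1.2),
    PartitionStats("p2", delta_rows=50, baseline_rows=1000, scan_slowdown=1.5),
]
print(pick_partitions(stats, max_partitions=2))            # ['p1', 'p2']
print(split_merge_task(250, ["a", "b", "c"], {"c"}, 100)[:3])
# [('c', 0, 100), ('c', 100, 200), ('c', 200, 250)]
```

Partition p2 has little new data but a large query slowdown, so it still outranks p0; that is the point of combining write volume with query metrics.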

V4.3.0 also introduces tablet-level compaction, supporting partition-level merges triggered from the system tenant. When users observe query performance degradation, they can manually trigger a partition-level merge for quick resolution — providing greater operational flexibility.

Columnar Encoding

OceanBase V4.3.0 introduces a new columnar encoding algorithm, enabled via row_format=compressed. This encoding is deeply optimized for column-store access patterns, with full-stack optimization from low-level encoding to upper-level execution.

The new algorithm uses CPU SIMD instructions to process numerical computations in parallel, substantially improving throughput. It applies efficient compression schemes such as delta encoding and run-length encoding to numeric columns, significantly reducing the storage footprint. Queries can evaluate filters directly on the encoded data, avoiding decompression overhead and further boosting query performance.
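To show what "filtering directly on compressed data" means, here is a generic Python sketch of run-length and delta encoding. These are the textbook forms of the two techniques, not OceanBase's encoder, and the SIMD aspect is not modeled; the function names are hypothetical. filter_rle evaluates the predicate once per run and returns matching row ranges without ever expanding the runs.

```python
from itertools import accumulate


def rle_encode(values):
    # Run-length encoding: consecutive equal values become (value, run_length)
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def delta_encode(values):
    # Keep the first value, then neighbor differences (small numbers for sorted data)
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    # Prefix sums restore the original values
    return list(accumulate(encoded))


def filter_rle(runs, predicate):
    # Evaluate the predicate once per run, not once per row, and return
    # matching row-id ranges as [start, end) without decompressing
    matches, row = [], 0
    for value, length in runs:
        if predicate(value):
            matches.append((row, row + length))
        row += length
    return matches


region = ["CN", "CN", "CN", "US", "US", "CN"]
runs = rle_encode(region)
print(runs)                                   # [('CN', 3), ('US', 2), ('CN', 1)]
print(filter_rle(runs, lambda v: v == "CN"))  # [(0, 3), (5, 6)]

ts = [100, 101, 103, 103, 110]
print(delta_encode(ts))                       # [100, 1, 2, 0, 7]
```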

Skip Index

Skip Index is one of the column-store engine’s core optimization features. In analytical queries, most of the time is spent on I/O and full data scans. Skip Index stores pre-aggregated statistics at the storage layer and uses them to skip data blocks that cannot contain matching rows, drastically reducing unnecessary disk access.

At the implementation level, Skip Index computes statistics at the smallest storage unit (micro-block) — including min, max, and null count — then aggregates upward layer by layer: micro-block to macro-block to SSTable, building a multi-level index structure. At query time, the system uses pre-aggregated min/max values to quickly determine whether a data block contains data within the query’s filter range, skipping large volumes of irrelevant blocks.
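The multi-level pruning described above can be sketched in Python as follows. In a real engine the statistics are persisted when blocks are written; here build_index computes them up front for simplicity. The Stats class and the functions micro_stats, merge_stats, build_index, and blocks_to_read are hypothetical names, and only range predicates on a single column are handled.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    min: object
    max: object
    null_count: int


def micro_stats(values):
    # Per-micro-block statistics (computed once, at write time, in a real engine)
    non_null = [v for v in values if v is not None]
    return Stats(min(non_null, default=None), max(non_null, default=None),
                 len(values) - len(non_null))


def merge_stats(children):
    # Aggregate child statistics one level up (micro -> macro -> SSTable)
    mins = [s.min for s in children if s.min is not None]
    maxs = [s.max for s in children if s.max is not None]
    return Stats(min(mins, default=None), max(maxs, default=None),
                 sum(s.null_count for s in children))


def build_index(macro_blocks):
    """macro_blocks: list of macro-blocks, each a list of micro-blocks (lists of values)."""
    micro = [[micro_stats(m) for m in macro] for macro in macro_blocks]
    macro = [merge_stats(ms) for ms in micro]
    return merge_stats(macro), macro, micro


def may_match(s, lo, hi):
    # A block can hold values in [lo, hi] only if its [min, max] range overlaps it
    return s.min is not None and s.max >= lo and s.min <= hi


def blocks_to_read(index, lo, hi):
    sstable, macro, micro = index
    if not may_match(sstable, lo, hi):
        return []                      # skip the whole SSTable
    hits = []
    for i, macro_stats in enumerate(macro):
        if not may_match(macro_stats, lo, hi):
            continue                   # skip every micro-block in this macro-block
        hits += [(i, j) for j, s in enumerate(micro[i]) if may_match(s, lo, hi)]
    return hits


data = [[[1, 2, 3], [4, 5, None]], [[10, 11], [12, 13]], [[20, None], [21, 25]]]
index = build_index(data)
print(blocks_to_read(index, 11, 12))   # [(1, 0), (1, 1)]
print(blocks_to_read(index, 6, 9))     # []
print(index[0])                        # Stats(min=1, max=25, null_count=2)
```

For the range 6 to 9, every macro-block is rejected from its aggregated min/max alone, so no micro-block statistics, let alone data, need to be read.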

For DDL, OceanBase provides flexible Skip Index management. Users can create different types of Skip Index on specified columns at table creation, or modify them later via ALTER TABLE. For column-store tables, the system automatically creates MIN_MAX and SUM Skip Indexes on all columns — delivering performance gains with zero additional configuration.

Enhanced Pushdown

OceanBase further enhances query pushdown, moving more computation into the storage layer. All filter conditions can now be pushed down to storage, where they combine with Skip Index pre-aggregation to perform rapid data filtering at the storage level.

Aggregate function pushdown is also strengthened: COUNT, MAX, MIN, SUM, and AVG can execute directly at the storage layer. For aggregation queries without a GROUP BY clause, the final result is computed entirely in the storage layer, eliminating the overhead of passing data up to the execution layer.
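One way storage-layer aggregation can avoid touching rows is to reuse per-block pre-aggregates such as the MIN_MAX and SUM statistics mentioned above. The Python sketch below assumes that approach; it is an illustration, not a description of OceanBase internals, and BlockAgg, block_agg, and pushdown_aggregate are hypothetical names. Blocks entirely inside the filter range are answered from metadata, and only blocks that straddle a boundary are scanned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockAgg:
    count: int              # non-null values in the block
    total: int
    min: Optional[int]
    max: Optional[int]


def block_agg(values):
    # Pre-aggregates kept alongside each block
    vals = [v for v in values if v is not None]
    return BlockAgg(len(vals), sum(vals), min(vals, default=None), max(vals, default=None))


def pushdown_aggregate(blocks, lo=float("-inf"), hi=float("inf")):
    """COUNT/SUM/MIN/MAX/AVG over values in [lo, hi], with no GROUP BY."""
    count, total, min_v, max_v, scanned = 0, 0, None, None, 0
    for values, agg in blocks:
        if agg.count == 0 or agg.max < lo or agg.min > hi:
            continue                        # block cannot contribute
        if lo <= agg.min and agg.max <= hi:
            part = agg                      # answered from metadata, no row access
        else:
            scanned += 1                    # boundary block: decode and filter rows
            part = block_agg([v for v in values if v is not None and lo <= v <= hi])
            if part.count == 0:
                continue
        count += part.count
        total += part.total
        min_v = part.min if min_v is None else min(min_v, part.min)
        max_v = part.max if max_v is None else max(max_v, part.max)
    result = {"count": count, "sum": total, "min": min_v, "max": max_v,
              "avg": total / count if count else None}
    return result, scanned


blocks = [(vals, block_agg(vals)) for vals in ([1, 2, 3], [4, 5, 6], [7, 8, 9])]
print(pushdown_aggregate(blocks))         # no filter: all three blocks answered from metadata
print(pushdown_aggregate(blocks, 2, 9))   # only the first block straddles the range and is scanned
```

AVG is derived as SUM / COUNT rather than stored, because averages of blocks cannot be combined directly while sums and counts can.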

GROUP BY pushdown goes a step further. For low-cardinality columns, the system uses the dictionary information within each micro-block to compute GROUP BY results locally, significantly reducing the volume of data transferred to the execution layer. This optimization is especially effective for typical analytical scenarios such as user profiling and behavioral analysis.
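The benefit of using micro-block dictionaries for GROUP BY can be shown with a small Python sketch of GROUP BY ... COUNT(*). It assumes a simple sorted dictionary plus one integer code per row, which is a generic dictionary encoding rather than OceanBase's exact format. The names dict_encode, local_group_count, and group_by_count are hypothetical. Each micro-block counts small integer codes and emits at most one partial row per dictionary entry, which is what cuts the data passed upward.

```python
from collections import Counter


def dict_encode(values):
    # A micro-block's dictionary: distinct values plus one integer code per row
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def local_group_count(dictionary, codes):
    # GROUP BY inside one micro-block: count codes in a dense array, then
    # translate each distinct code back to its value only once
    counts = [0] * len(dictionary)
    for c in codes:
        counts[c] += 1
    return {dictionary[i]: n for i, n in enumerate(counts) if n}


def group_by_count(micro_blocks):
    # Merge per-micro-block partial groups (at most len(dictionary) each)
    total = Counter()
    for dictionary, codes in micro_blocks:
        total.update(local_group_count(dictionary, codes))
    return dict(total)


blocks = [dict_encode(["ios", "android", "ios"]), dict_encode(["web", "ios"])]
print(group_by_count(blocks))   # {'android': 1, 'ios': 3, 'web': 1}
```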

4. System-Wide Adaptation

The column-store engine is not an isolated feature — its value depends on tight integration with every database module. From SQL parsing to execution plan generation, from DDL operations to backup/restore, OceanBase has systematically adapted multiple core modules since V4.3.0 to ensure column-store capabilities integrate seamlessly into existing workflows.

DDL Support and Table Types

OceanBase V4.3.0 provides flexible column-store DDL support. Users can create different table types based on workload requirements:

Column-store table: Creates a pure column-store table where all data is stored in columnar format, suited for analytics-heavy workloads.

create table t1 (c1 int, c2 int) with column group (each column);

Row-column redundant table: Maintains both row-store and column-store copies of the data. Supports both high-concurrency transactions and efficient analytical queries, at the cost of additional storage.

create table t2 (c1 int, c2 int) with column group(all columns, each column);

Column-store index: Creates a column-store index on a row-store table to accelerate specific query patterns.

-- Create a pure column-store index on columns c1, c2 of table t1
create index idx1 on t1(c1, c2) with column group(each column);

-- Create a row-column redundant index on column c1 of table t1
create index idx2 on t1(c1) with column group(all columns, each column);

Column-store tables support the full range of DDL operations: adding columns, dropping columns, modifying column types, and more. Skip Index DDL syntax was further refined in V4.3.5, supporting online DDL for maintaining pre-aggregated data.

Cost-Model Enhancement in the Optimizer

Cost-based row/column path selection is a key optimization in V4.3.0. OceanBase implements a unified optimizer codebase that estimates costs differently for row-store and column-store paths, enabling automatic path selection for user queries.

Cost Estimation

Storage-layer cost evaluation: The optimizer estimates I/O cost, CPU cost, and memory cost for scanning both row-store and column-store data.
Data characteristics: It considers the number of columns accessed, data distribution, and filter selectivity to dynamically select the optimal storage path.
Hybrid path support: For complex queries, the optimizer may use row-store and column-store paths within the same plan, choosing for each access whichever path it estimates to be cheaper.

Path Selection Strategy

When the optimizer identifies an analytical operation (full table scan, multi-column aggregation, complex filtering), it prefers the column-store path. For point lookups and high-concurrency updates (TP operations), it selects the row-store path. This intelligent routing allows OceanBase to efficiently handle both TP and AP workloads within a single database.
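Why costs push analytical scans toward the column store and point lookups toward the row store can be seen in a deliberately simplified cost model. The unit costs, the formulas, and the names (SEEK_COST, BYTE_COST, Query, choose_path) are assumptions for illustration only; OceanBase's optimizer estimates I/O, CPU, and memory costs separately and far more precisely.

```python
from dataclasses import dataclass

# Hypothetical unit costs, not OceanBase's calibrated values
SEEK_COST = 100.0   # locating one row (or one column value) by key
BYTE_COST = 1.0     # reading one byte sequentially


@dataclass
class Query:
    rows_matched: int     # estimated from filter selectivity (used for key lookups)
    columns: list         # columns the query accesses
    point_lookup: bool    # True if rows are located by key rather than scanned


def row_store_cost(q, table_rows, widths):
    row_width = sum(widths.values())              # a row store reads whole rows
    if q.point_lookup:
        return q.rows_matched * (SEEK_COST + row_width * BYTE_COST)
    return table_rows * row_width * BYTE_COST     # full scan reads every row


def column_store_cost(q, table_rows, widths):
    accessed = sum(widths[c] for c in q.columns)  # only accessed columns are read
    if q.point_lookup:
        # each accessed column is stored separately, so each needs its own seek
        return q.rows_matched * (SEEK_COST * len(q.columns) + accessed * BYTE_COST)
    return table_rows * accessed * BYTE_COST


def choose_path(q, table_rows, widths):
    row = row_store_cost(q, table_rows, widths)
    col = column_store_cost(q, table_rows, widths)
    return "column" if col < row else "row"


widths = {f"c{i}": 8 for i in range(20)}     # 20 eight-byte columns
scan = Query(rows_matched=1_000_000, columns=["c0", "c1"], point_lookup=False)
lookup = Query(rows_matched=1, columns=["c0", "c1", "c2", "c3", "c4"], point_lookup=True)
print(choose_path(scan, 1_000_000, widths))    # column
print(choose_path(lookup, 1_000_000, widths))  # row
```

The scan touches 2 of 20 columns, so the column path reads a tenth of the bytes; the lookup needs five separate column seeks for one row, so the row path wins.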

Vectorized Engine 2.0

V4.3.0 introduces a new vectorized execution engine based on the Column data format. Compared to the earlier Uniform format, the new engine offers:

Native columnar data support: Optimized for column-store access patterns, eliminating data format conversion overhead.
Full SIMD utilization: More efficient data layout enables better use of modern CPU SIMD instructions for numerical computation.
Memory access optimization: Improved in-memory data arrangement increases cache hit rates and memory access efficiency.
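The batch-at-a-time structure behind a vectorized engine can be sketched in Python, with the caveat that Python cannot demonstrate SIMD or cache effects and this does not model OceanBase's Column format. The sketch only shows the shape of the technique: typed, contiguous column buffers, operators that process a whole batch per call, and a selection vector that passes surviving positions between operators. BATCH_SIZE and all function names are hypothetical.

```python
from array import array

BATCH_SIZE = 4  # hypothetical; real engines use much larger batches


def filter_gt(col, start, end, threshold, sel):
    # Filter operator: one tight loop over a contiguous column slice, writing
    # passing positions into a reusable selection vector
    sel.clear()
    sel.extend(i for i in range(start, end) if col[i] > threshold)


def vectorized_revenue(price, qty, threshold, batch_size=BATCH_SIZE):
    """SUM(price * qty) WHERE price > threshold, processed one batch at a time."""
    total, sel = 0, []
    for start in range(0, len(price), batch_size):
        end = min(start + batch_size, len(price))
        filter_gt(price, start, end, threshold, sel)
        # Project + aggregate consumes the whole batch through the selection vector
        total += sum(price[i] * qty[i] for i in sel)
    return total


def row_at_a_time_revenue(rows, threshold):
    # Reference version: every row passes through the operators individually
    total = 0
    for price, qty in rows:
        if price > threshold:
            total += price * qty
    return total


price = array("q", [5, 20, 7, 30, 1, 50])   # typed, contiguous column buffers
qty = array("q", [1, 2, 3, 4, 5, 6])
print(vectorized_revenue(price, qty, 10))          # 460
print(row_at_a_time_revenue(zip(price, qty), 10))  # 460
```

In a compiled engine, the inner loop of filter_gt over a contiguous buffer is the part that a compiler or hand-written kernel can turn into SIMD instructions.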

Backup/Restore and Transaction Adaptation

OceanBase has adapted multiple modules around the column-store engine — from optimizer to executor, from DDL to backup/restore and transaction processing:

Backup/restore support: Column-store backup and restore is fully compatible with row-store mechanisms, supporting both full and incremental backups.
Transaction consistency: The column-store engine natively supports distributed strong-consistency transactions via MVCC, guaranteeing consistent data views.
High-concurrency processing: The LSM-Tree-based architecture supports high-concurrency transactional and query operations.
Mixed-workload capability: High-concurrency transaction processing and complex analytical queries coexist, providing a unified data processing platform.
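The MVCC guarantee that lets a long analytical scan coexist with concurrent transactions can be illustrated with a single-node Python sketch of snapshot reads. It is not OceanBase's implementation: distributed commit-version assignment is out of scope, and MVCCStore and its methods are hypothetical names. Each transaction's writes share one commit version, and a reader sees only versions committed at or before its snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class MVCCStore:
    # key -> [(commit_version, value), ...] in ascending order; value None = deleted
    versions: dict = field(default_factory=dict)
    last_committed: int = 0

    def commit(self, writes):
        # All writes of one transaction share a single commit version (atomic visibility)
        self.last_committed += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.last_committed, value))
        return self.last_committed

    def read(self, key, snapshot):
        # Newest version committed at or before the reader's snapshot
        for version, value in reversed(self.versions.get(key, [])):
            if version <= snapshot:
                return value
        return None

    def scan(self, snapshot):
        # A long analytical scan sees one consistent state, whatever commits meanwhile
        rows = {k: self.read(k, snapshot) for k in sorted(self.versions)}
        return {k: v for k, v in rows.items() if v is not None}


store = MVCCStore()
store.commit({"acct_a": 100, "acct_b": 50})
report_snapshot = store.last_committed          # analytical query starts here
store.commit({"acct_a": 70, "acct_b": 80})      # concurrent transfer commits
print(store.scan(report_snapshot))              # {'acct_a': 100, 'acct_b': 50}
print(store.scan(store.last_committed))         # {'acct_a': 70, 'acct_b': 80}
```

Both snapshots preserve the invariant that the balances sum to 150; a reader without snapshot isolation could observe one account before the transfer and the other after it.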

Through these adaptations, OceanBase delivers a new technical option for modern enterprise applications requiring real-time analytics, strong transactional guarantees, and high concurrency — particularly in finance, e-commerce, and IoT where data freshness and consistency requirements are stringent.

5. Core Value

OceanBase’s native column-store engine leverages LSM-Tree architectural innovation to solve the traditional column-store bottleneck around strong transactions and high concurrency — a significant advance in HTAP database technology.

Key technical breakthroughs:

Architectural unification: Seamless row-column fusion under one architecture eliminates the data synchronization complexity and latency inherent in traditional HTAP systems.
Native transactional support: The column-store engine natively supports distributed strong-consistency transactions — rare in the industry.
Concurrency scalability: MVCC combined with LSM-Tree enables large-scale concurrent read/write operations.
Real-time analytics: Newly written delta data is available to analytical queries immediately, giving second-level data freshness.

From an industry perspective, the column-store engine redefines the direction of HTAP databases. It demonstrates that a unified architecture outperforms separated architectures in both performance and cost, that strong transactions and real-time analytics are achievable simultaneously, and that high-concurrency OLTP and complex OLAP can coexist.

6. Use Cases

OceanBase’s column-store engine applies broadly across multiple scenarios.

OLAP workloads: In data warehouse applications, the columnar format excels at large-scale data import and transformation, significantly improving ETL throughput. Complex report generation benefits from vectorized execution and pre-aggregation optimizations.

Real-time analytics: For user behavior analysis, business monitoring dashboards, and similar use cases, OceanBase delivers sub-second query latency. Anomaly detection systems perform rapid identification on real-time data for timely alerting.

HTAP mixed workloads: E-commerce platforms process high-concurrency transactions and complex sales analytics on the same platform, eliminating synchronization delays. Financial institutions achieve unified trading and risk control — real-time transactions and risk monitoring on one platform.

IoT and monitoring: High-volume device data collection and analysis, efficient time-series storage and querying, and predictive maintenance based on real-time device data — all demand strong real-time analytics capabilities.

7. Summary and Roadmap

OceanBase’s native column-store engine provides a new technical option for real-time analytics. From architectural unification to native transactions, from storage-layer pushdown to vectorized execution, the column-store engine forms a comprehensive technical system. Looking ahead, OceanBase will continue evolving across three dimensions: functionality, performance, and deployment flexibility.

Richer Functionality

Flexible column groups: Currently, a column-store table stores each column as its own column group (optionally alongside a full row-store copy); future releases will enable custom column group partitioning for diverse analytical needs.
Enhanced direct load: Further improvements to incremental direct load capabilities will shorten data preparation time for analytics.

Stronger Performance

Skip Index enhancement: Expand the statistical dimensions supported by Skip Index, covering more query patterns.
Unified storage format: Current storage formats are diverse; future releases will deeply integrate storage formats with the SQL vectorized engine, automatically recognizing different formats during SQL execution to reduce conversion overhead.

Flexible Deployment

Heterogeneous replicas: Support OLAP-specific heterogeneous replica types for specialized deployment requirements.
Storage-compute separation: Future support for storage-compute separation, enabling independent scaling of storage and compute for AP workloads at lower cost.

Continued evolution of OceanBase’s column-store engine will further strengthen its position as an enterprise-grade unified HTAP data platform.

DEV Community