For years, the lakehouse unified storage and analytics.
It did not unify serving.
The architecture typically looked like this:
Lakehouse → analytics & ETL
Operational database → low-latency applications
Reverse ETL → copy curated subsets between them
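The reverse ETL step above can be sketched as a periodic copy job. A minimal illustration only, with both stores modeled as in-memory dicts and all names hypothetical:

```python
# Minimal reverse-ETL sketch: copy a curated subset of rows from the
# lakehouse (analytics store) into an operational serving store.
# Stores are plain dicts here; names and shapes are hypothetical.

def reverse_etl(lakehouse_table, operational_table, predicate):
    """Copy rows matching `predicate` into the serving store (upsert by key)."""
    for key, row in lakehouse_table.items():
        if predicate(row):
            operational_table[key] = row  # upsert: last write wins
    return operational_table

# Example: sync only "active" customer rows to the serving tier.
lakehouse = {
    1: {"name": "Ada", "status": "active"},
    2: {"name": "Grace", "status": "churned"},
    3: {"name": "Alan", "status": "active"},
}
serving = {}
reverse_etl(lakehouse, serving, lambda r: r["status"] == "active")
```

Every such pipeline adds a copy of the data, a scheduling dependency, and a freshness lag — the brittleness both vendors are trying to remove.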
That split worked when humans drove queries.
AI agents changed the load profile. They issue iterative point lookups, selective filters, repeated joins, and parallel queries inside tight reasoning loops. That workload stresses both scan-optimized engines and traditional OLTP systems in different ways.
Two architectural responses have emerged from Onehouse and Databricks.
Onehouse LakeBase: Database Primitives on Open Tables
LakeBase is positioned as a low-latency serving layer built directly on open lakehouse tables, specifically:
Apache Hudi
Apache Iceberg
Storage remains object-store based. LakeBase introduces:
Record-level and secondary indexing
Index joins that shift cost toward O(K) for selective workloads
Transaction-aware distributed caching tied to table commits
Autoscaled serving engines (Quanton-based execution)
A Postgres-compatible endpoint for standard connectivity
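The commit-tied caching idea can be illustrated with a toy sketch (hypothetical class and field names, not LakeBase's actual implementation): cache entries are keyed by the table commit they were read at, so a new commit naturally makes stale entries unreachable rather than requiring explicit invalidation.

```python
class CommitAwareCache:
    """Toy transaction-aware cache: the cache key includes the table's
    commit id, so entries are only visible to readers pinned to that commit.
    (Hypothetical sketch, not the actual LakeBase design.)"""

    def __init__(self):
        self._entries = {}

    def get(self, table, commit_id, key):
        return self._entries.get((table, commit_id, key))

    def put(self, table, commit_id, key, value):
        self._entries[(table, commit_id, key)] = value

cache = CommitAwareCache()
cache.put("orders", commit_id=41, key="order-7", value={"total": 99})

# A reader pinned to commit 41 hits the cache...
hit = cache.get("orders", 41, "order-7")
# ...but after commit 42 lands, the same key misses and is re-read from storage.
miss = cache.get("orders", 42, "order-7")
```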
The core bet: instead of maintaining a separate serving tier via reverse ETL, extend the lakehouse itself with database-style mechanics.
Traditional distributed engines (Spark/Trino class) typically execute joins with O(N + M) work because of their scan-and-shuffle patterns. LakeBase’s index joins aim to reduce that cost toward O(K), the size of the filtered working set.
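The cost contrast can be made concrete with a toy join (a sketch only, not either vendor's execution engine): a scan-based hash join touches every row of both inputs, while an index join probes only the K keys the query actually selects.

```python
# Toy contrast: scan-based hash join (work ~ O(N + M)) vs.
# index join (work ~ O(K) for K selected keys).
# A sketch only, not either vendor's execution engine.

def hash_join(left, right, key):
    """Scan both inputs: build a hash table on `right`, probe with `left`."""
    built = {}
    for r in right:                          # O(M) build scan
        built.setdefault(r[key], []).append(r)
    out = []
    for l in left:                           # O(N) probe scan
        for r in built.get(l[key], []):
            out.append({**l, **r})
    return out

def index_join(keys, left_index, right_index):
    """Probe per-key indexes for only the K keys the query selects."""
    out = []
    for k in keys:                           # O(K) index probes
        for l in left_index.get(k, []):
            for r in right_index.get(k, []):
                out.append({**l, **r})
    return out

left = [{"id": i, "l": i * 2} for i in range(1000)]
right = [{"id": i, "r": i * 3} for i in range(1000)]
left_index = {row["id"]: [row] for row in left}
right_index = {row["id"]: [row] for row in right}

# A selective query touching 2 keys probes 2 index entries,
# instead of scanning 2,000 rows.
selective = index_join([5, 7], left_index, right_index)
```

For a selective query, the index path does work proportional to the answer, not to the table sizes — which is the whole design intent.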
For narrow, high-selectivity queries, Onehouse reports:
~95% latency reduction on 1TB TPC-DS selective workloads
~6x performance vs Databricks SQL Serverless (on the tested narrow queries)
5–10x improvement vs AWS Athena in customer trace replays
These are vendor-reported benchmarks and workload-specific, but they illustrate the design intent: make the lakehouse viable for high-concurrency serving without duplicating data.
Databricks Lakebase Postgres: Dedicated OLTP Integrated with the Lakehouse
Databricks takes a different approach.
Lakebase is a fully managed PostgreSQL-compatible OLTP engine integrated into the Databricks platform.
Architecturally:
Transactional workloads run on a dedicated Postgres engine
Strong OLTP semantics and isolation guarantees
Tight integration with Unity Catalog
Federated access between OLTP and lakehouse analytics
Databricks is natively optimized around Delta Lake, with growing Iceberg interoperability.
Lakebase Postgres does not modify the lakehouse storage layer. It complements it.
The philosophy here is specialization:
OLTP engine → optimized for transactional latency
Lakehouse (Delta / Iceberg) → optimized for distributed analytics
Unified control plane → separate execution semantics
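One way to picture the specialization split is a thin router that sends selective point lookups to the OLTP engine and scan-heavy analytics to the lakehouse engine. A toy sketch with hypothetical heuristics — in Databricks the routing happens at the platform level, not in application code:

```python
# Toy router illustrating the specialization split:
# point lookups -> OLTP engine, scans/aggregates -> lakehouse engine.
# The classification heuristic here is deliberately simplistic and hypothetical.

def route(query: str) -> str:
    """Classify a SQL query by shape and pick an execution tier."""
    q = query.lower()
    if "where" in q and "=" in q and "group by" not in q:
        return "oltp"        # selective point/range lookup
    return "lakehouse"       # scan-heavy analytics

lookup = route("SELECT * FROM orders WHERE id = 7")
report = route("SELECT region, SUM(total) FROM orders GROUP BY region")
```

Each tier keeps its own execution semantics; only the control plane (catalog, governance, federation) is shared.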
Architectural Contrast
Both approaches aim to reduce brittle reverse ETL pipelines.
The difference lies in where database behavior lives:
Onehouse → Extend open lakehouse tables (Hudi/Iceberg) with indexing, caching, and serving semantics.
Databricks → Introduce a dedicated PostgreSQL engine alongside a Delta-native lakehouse.
One converges inward.
The other composes specialized systems under one platform.
Final Take
If your workload is read-heavy, selective, and lake-centric, the indexing-first model is compelling.
If you require mature transactional guarantees and explicit workload isolation, a managed PostgreSQL engine integrated with the lakehouse may be structurally cleaner.
The real shift is not about formats.
It is about whether serving becomes a native property of the lakehouse — or remains a specialized companion to it.