By Rick Houlihan & Patrick Meredith
Databricks named the right problem. Their answer is a credible execution of an idea Oracle Multitenant solved a decade earlier — and as it turns out, the gap they think they've found in Oracle was only one PL/SQL package away from closing.
The Pitch That Started This
A colleague forwarded me the Databricks blog post the other day. Opening line:
"In our previous blog, we introduced Lakebase, the third-generation database architecture that fundamentally separates storage and compute."
— Databricks, "How agentic software development will change databases"
So, like what Oracle did 12 years ago.
I'm being a little snide. Bear with me — there's a real article underneath. The blog is a thoughtful read about how AI agents are changing database workloads, and most of the diagnosis is right. Their telemetry is interesting:
- "In Databricks's Lakebase service, AI agents now create roughly 4x more databases than human users."
- "[O]n average, each database project has ~10 branches and some databases with nested branches reaching depths of over 500 iterations…"
- "[F]or about half of these agentic applications, the database compute lifetime is less than 10 seconds."
That last number is real. Agents don't behave like humans. They generate variants by the dozen, run them in parallel, evaluate against an eval set, keep the winner, throw away the losers. Evolutionary development. The economics break down completely on a database that costs $200/month per instance with a five-minute provisioning cycle.
So Databricks is right about the problem. They're right that databases need a branching primitive. They're right that storage and compute need to scale independently. They're right that the always-on cost floor doesn't survive contact with agents.
This article is not about whether they're wrong on the diagnosis.
It's about whether their answer is novel — and what the architecture-correct version looks like. Because Oracle has been shipping the same primitive in the engine since July 2013, and a small Python + PL/SQL wrapper is all that separates it from the developer experience Databricks just announced.
Patrick and I thought it was worth writing this down.
What Lakebase Actually Is
Spoiler: it's Neon.
Databricks announced its agreement to acquire Neon on May 14, 2025. The press release didn't disclose a price (industry reporting put it at roughly $1 billion), but it did volunteer a useful telemetry data point: "over 80 percent of the databases provisioned on Neon were created automatically by AI agents rather than by humans." That number is also the reason this acquisition happened — Neon, founded in 2021 by Postgres committers, had built a serverless Postgres architecture that AI agents could actually afford to use: stateless compute nodes, a Paxos-based safekeeper quorum holding WAL, and a pageserver materializing pages on demand from object storage. Branches were stamped as metadata pointers at a moment in WAL history; copy-on-write at the storage layer made divergence cheap.
That architecture is good engineering. It's also exactly what Databricks now ships as Lakebase. Their own architecture deep-dive opens with:
"In the lakebase architecture, your compute is stateless. It does not rely on a local data directory. Instead, it streams WAL to a Paxos-based quorum of safekeepers."
— Databricks, "How lakebase architecture delivers 5x faster Postgres writes"
The same post describes how, when Postgres compute requests a page from storage, the pageserver "reconstructs it by finding the most recent materialized image of that page and replaying any WAL deltas on top." If you've read Neon's published architecture overview, this is familiar vocabulary — stateless compute → safekeepers → pageserver → object storage — because it is Neon's architecture. Lakebase is Neon with a Databricks brand on top.
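If the mechanism sounds abstract, a toy sketch helps. This is our illustration of the reconstruction step the quote describes, not Neon code; the data structures are simplified to the point of caricature:

```python
# Toy sketch of page reconstruction over WAL: our illustration of the idea
# the quoted passage describes, not actual Neon (or Lakebase) code.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class PageHistory:
    # (lsn, full page image) pairs, ascending by LSN
    images: List[Tuple[int, bytes]] = field(default_factory=list)
    # (lsn, delta) WAL records touching this page, ascending by LSN
    deltas: List[Tuple[int, Callable[[bytes], bytes]]] = field(default_factory=list)

    def materialize(self, at_lsn: int) -> bytes:
        # "finding the most recent materialized image of that page..."
        base_lsn, page = max(
            (iv for iv in self.images if iv[0] <= at_lsn),
            key=lambda iv: iv[0],
        )
        # "...and replaying any WAL deltas on top."
        for lsn, delta in self.deltas:
            if base_lsn < lsn <= at_lsn:
                page = delta(page)
        return page
```

A branch is then just metadata (a parent pointer plus an LSN): reads below the branch point resolve against the parent's history, so creating one copies nothing, and divergence is paid for page by page on first write. That is the same copy-on-write trick a storage snapshot plays underneath a PDB.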
To be clear: that's not a problem. Neon is good engineering. Acquiring it and integrating it with the lakehouse is a perfectly defensible product move — buying a four-year-old startup whose technology already solves the agent-economics problem is faster than building one yourself. Nobody should be mad about an acquisition.
The problem is the next thing Databricks did, which was call a four-year-old Postgres-branching architecture "the third-generation database architecture that fundamentally separates storage and compute." That's a marketing claim, not an architectural one, and it has two specific issues. First, "third generation" implies a chronology — first generation was monolithic, second was something, this is the third — and Databricks has never been particularly clear about what the second generation was, which is convenient because any honest answer would include systems older than Lakebase that already do what Lakebase does. Second, the "fundamentally separates storage and compute" phrasing treats compute/storage separation as a 2025 innovation, which is awkward when Snowflake shipped that architecture commercially in 2014 and Oracle shipped a multitenant variant of it in July 2013.
"Third generation" sells better than "we acquired a 2021 startup six months ago, here's what they built." It also doesn't survive a history check.
That's the next section.
The "Third-Generation" Sleight of Hand
Same Databricks blog post — "A New Era of Databases: Lakebase," June 12, 2025 — one "Database Architecture Evolution" section, three generations laid out in sequence.
Generation 1 — the monoliths:
"Examples: MySQL, Postgres, classic Oracle"
"Database systems started as absolute monoliths."
Generation 2 — proprietary loose coupling:
"Examples: Aurora, Oracle Exadata"
"As cloud infrastructure improved, vendors physically separated storage from compute, moving storage into proprietary backend tiers."
Same Oracle. Two generations. One page apart. Pick one.
I'll be charitable and assume the intended argument was "early Oracle was a monolith, modern Oracle isn't." Fine. Then "modern" deserves a timeline.
| Year | System | What was separated |
|---|---|---|
| 2001 | Oracle Real Application Clusters (RAC) | Multiple compute nodes against a single shared SAN/NAS storage substrate (Oracle 9i) |
| 2008 | Oracle Exadata v1 | Database servers vs. intelligent storage cells with predicate offload (Smart Scan), GA September 2008 |
| 2010 | Google Dremel / BigQuery | Disaggregated storage and compute, columnar — VLDB 2010 paper |
| July 1, 2013 | Oracle Database 12c / Multitenant | CREATE PLUGGABLE DATABASE … FROM … SNAPSHOT COPY ships in the engine |
| 2014 | Snowflake (GA) | Three-layer cloud-native: storage / virtual warehouses / cloud services |
| Nov 2014 / Jul 2015 | Amazon Aurora | Compute decoupled from a 6-way replicated storage layer across 3 AZs (preview Nov 2014, GA July 2015) |
| 2021 | Neon (founded) | Postgres-specific WAL-level disaggregation with branching |
| May 14, 2025 | Lakebase = Databricks acquires Neon | Neon's architecture wrapped around open lake storage |
Storage and compute have been separated in production databases for 25 years. Across two paradigms, four vendors, and at minimum seven shipping systems before Lakebase showed up. "Third generation" isn't an architectural claim. It's a marketing label that requires the reader to forget about Oracle RAC, Exadata, Dremel, Multitenant, Snowflake, Aurora, and Neon in roughly that order.
So what's actually new in Lakebase? The same blog is honest about this if you read past the generation label:
"Like Gen 2, it separates compute from storage, but with a critical difference: both the storage infrastructure and the data formats are completely open."
Translation: Gen 2 already separated storage from compute. Their own text concedes the point. The Gen 3 differentiator they're actually claiming is open data formats. We'll dismantle that claim near the end of this article — short version, "open formats" turns out to do less work than the marketing suggests once you ask which formats, governed by whom, queryable how. But file the claim for now.
The other thing the launch blog flags as Gen 3 distinctive is branching:
"Databases can be branched and cloned the way developers branch code."
Branching as a developer-experience primitive is a fair thing to call out — it genuinely changes how AI agents and dev workflows interact with databases, and we conceded that point at the top of this article. Branching as a database-engine primitive, though, has shipped in Oracle Multitenant since July 1, 2013, with documented syntax, multiple supported storage substrates, and a ceiling roughly eight times higher than Lakebase's. Which is the next section.
"Third-generation database architecture? We're on our fifth." - Patrick Meredith
PDB Snapshot Copy: The Branching Primitive Oracle Has Shipped Since 2013
The syntax is one statement:
CREATE PLUGGABLE DATABASE my_experiment_branch
FROM base_experiment_pdb
SNAPSHOT COPY;
The Oracle 19c SQL Reference describes what happens underneath:
"The `SNAPSHOT COPY` clause instructs the database to clone the source PDB using storage snapshots. This reduces the time required to create the clone because the database does not need to make a complete copy of the source data files."
— Oracle Database 19c SQL Language Reference: CREATE PLUGGABLE DATABASE
What "storage snapshots" means depends on the substrate. The same reference is explicit: with CLONEDB=FALSE, "the underlying file system for the source PDB's files must support storage snapshots. Such file systems include Oracle Automatic Storage Management Cluster File System (Oracle ACFS) and Direct NFS Client storage." With CLONEDB=TRUE, "the underlying file system for the source PDB's files can be any local file system, network file system (NFS), or clustered file system that has Direct NFS enabled. However, the source PDB must remain in open read-only mode as long as any clones exist."
So:
| Storage substrate | Snapshot mechanism | Notes |
|---|---|---|
| Oracle ACFS | Copy-on-write storage snapshots | `CLONEDB=FALSE` path |
| Direct NFS Client (dNFS) | Copy-on-write storage snapshots on snapshot-capable NFS array | `CLONEDB=FALSE` path |
| Exadata sparse disk groups | Copy-on-write | Source PDB must be read-only |
| Standard FS + `CLONEDB=TRUE` | dNFS sparse files over NFS | Source PDB must remain open read-only while clones exist |
| Exascale (23ai+) | Redirect-on-write | "created quickly, consume little storage space upon initial creation, and can be created in practically unlimited numbers" |
Note the precision on "redirect-on-write" — that's Oracle's official term only for Exascale snapshots in 23ai+. Older substrates use copy-on-write semantics. Per the Exadata Database Service on Exascale Infrastructure documentation: "These PDB snapshots leverage Exascale redirect-on-write technology so that they are created quickly, consume little storage space upon initial creation, and can be created in practically unlimited numbers." The distinction matters if you're going to argue with someone about it.
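If you want to know which clone path a given CDB will take before an agent fleet finds out for you, the setting is queryable. A minimal sketch with python-oracledb, assuming a SYSDBA connection to CDB$ROOT; the DSN and credentials are placeholders:

```python
import oracledb

# Check which clone path this CDB is configured for. Assumes SYSDBA into
# CDB$ROOT; DSN and credentials below are illustrative placeholders.
conn = oracledb.connect(
    user="sys", password="...", dsn="localhost:1521/FREE",
    mode=oracledb.AUTH_MODE_SYSDBA,
)
with conn.cursor() as cur:
    cur.execute("SELECT value FROM v$parameter WHERE name = 'clonedb'")
    (clonedb,) = cur.fetchone()
    # TRUE  -> sparse dNFS files; source PDB stays read-only while clones exist
    # FALSE -> the filesystem itself must support snapshots (ACFS, dNFS array)
    print("CLONEDB =", clonedb)
```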
Sibling features in the Multitenant family:
- PDB Snapshot Carousel (introduced in 18c, not 19c — a common citation error). Per oracle-base.com: "Oracle 18c introduced the concept of a snapshot carousel, which is a series of point-in-time copies, or snapshots, of a PDB." Default of 8 snapshots, hard cap at 8 via `MAX_PDB_SNAPSHOTS`; the oldest is overwritten when full. Useful for short-horizon point-in-time recovery without the overhead of full backups.
- Refreshable Clones. Physically full copies with incremental redo apply. A different beast from snapshot copies (full storage cost, but ongoing sync from source). Convertible one-way to a regular PDB.
- PDB density. Up to 4098 PDBs per CDB on Enterprise Edition with Multitenant licensing — the `MAX_PDBS` reference lists possible values of 5, 254, or 4098 by edition (Standard/Express, Standard Edition 2, Enterprise Edition respectively).
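That density ceiling is a parameter you can read and raise, not a quota ticket. A sketch reusing the SYSDBA connection from the check above; 4098 assumes Enterprise Edition with the Multitenant option:

```python
# Inspect and raise the branch-density ceiling. MAX_PDBS is modifiable via
# ALTER SYSTEM; 4098 is only valid on Enterprise Edition with Multitenant.
with conn.cursor() as cur:
    cur.execute("SELECT value FROM v$parameter WHERE name = 'max_pdbs'")
    print("MAX_PDBS =", cur.fetchone()[0])
    cur.execute("ALTER SYSTEM SET MAX_PDBS = 4098")
```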
Now compare ceilings:
| Platform | Branch limit | Branch depth | Cross-region |
|---|---|---|---|
| AWS Aurora | 15 copy-on-write clones per source; 16th becomes a full copy | No explicit depth ceiling, but each level re-consumes the 15 budget | "You can't create a clone in a different AWS Region from the source Aurora DB cluster" |
| Lakebase (Databricks doc) | 500 per project; only 10 unarchived (active) at once | Hundreds nested (per their telemetry) | Per region |
| Oracle Multitenant | Up to 4098 PDBs per CDB | No documented depth limit | RAC + Data Guard, cross-region via Active Data Guard |
Lakebase's 500-per-project ceiling is generous compared to Aurora's 15. Oracle's 4098 is nearly an order of magnitude more generous than Lakebase's 500. And Lakebase has another hard cap that doesn't appear in the cloning side of the comparison: only 10 unarchived (active) branches at once. Oracle has no equivalent active cap; you tune branch density via Resource Manager based on your workload, which is the next section.
This primitive shipped on July 1, 2013, in Oracle Database 12c. Twelve years before Lakebase. In the database engine, not in a wrapper. With a single SQL statement, documented in the official SQL Language Reference. There is no Postgres extension here. There is no separate page server, no Paxos quorum, no $1B acquisition. It's just CREATE PLUGGABLE DATABASE … SNAPSHOT COPY, and it has been since the series finale of Breaking Bad.
The Compute Story Most People Get Wrong
A note on this section: the structural argument here came from Patrick during a Slack thread when he challenged me on the scale-to-zero comparison. I had it wrong initially. Here's the correct read, in his voice.
The naive comparison says Lakebase wins on scale-to-zero because branches scale individually to zero compute when idle. Oracle, the story goes, is "always on" — fixed ECPUs allocated to the ADB instance, multiple PDBs sharing the pool, no per-branch zero-cost dormancy.
That comparison gets the shape right and the conclusion wrong.
Yes, in Autonomous Database Serverless, ECPUs are allocated at the instance level, not per PDB. Yes, Snapshot Copy PDB branches inside an ADB share that pool. The naive read says: "uh-oh, no isolation, abandoned branches will eat compute." The correct read is: abandoned branches in a shared pool consume nothing by construction — because they aren't reserving anything.
Walk through the mechanics:
- Closed PDBs consume zero CPU and zero shadow processes. `ALTER PLUGGABLE DATABASE foo CLOSE IMMEDIATE;` and the branch is dormant. The 26c SQL Reference describes the semantic: "the PDB equivalent of the SQL*Plus `SHUTDOWN` command with the immediate mode." Metadata stays in the dictionary; nothing else stays resident.
- Idle open PDBs consume near-zero. Just metadata pages.
- Active PDBs draw from the shared pool. That pool auto-scales: per the Oracle docs, "with compute auto scaling enabled the database can use up to three times more CPU and IO resources than specified by the number of ECPUs." You pay for the burst when it happens, not when it doesn't.
- Resource Manager governs the priority. CPU shares, `MAX_IOPS`, `MAX_MBPS`, sessions, parallel servers, per-PDB `SGA_TARGET` and `PGA_AGGREGATE_LIMIT`. You decide which branches get more pool when contended.
- `V$PDBS` and `V$RESOURCE_LIMIT` expose per-branch consumption so a supervisor process can watch and auto-suspend; a sketch of that watcher follows this list.
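A minimal version of that watcher, assuming a python-oracledb SYSDBA connection to CDB$ROOT. The AGENT naming convention and the keep-set policy are ours; the V$PDBS query and the DDL are stock Oracle:

```python
# Supervisor-side watcher: put losing branch PDBs to sleep.
# Policy (AGENT% naming, keep-set) is ours; the mechanics are stock Oracle.
def suspend_losing_branches(conn, keep):
    with conn.cursor() as cur:
        cur.execute(
            "SELECT name FROM v$pdbs "
            "WHERE open_mode LIKE 'READ%' AND name LIKE 'AGENT%'"
        )
        for (pdb_name,) in cur.fetchall():
            if pdb_name not in keep:
                # A closed PDB holds no CPU and no shadow processes; it can
                # reopen later with no cold start because the pool stays warm.
                cur.execute(
                    f'ALTER PLUGGABLE DATABASE "{pdb_name}" CLOSE IMMEDIATE'
                )
```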
So what's the real difference? Lakebase: per-DB scale-to-zero, with cold-start latency on resume. Oracle: a shared elastic pool, with no cold start.
For an agentic workflow, where the supervisor might wake an "abandoned" branch tomorrow to revisit a hypothesis it shelved today, the no-cold-start property matters. The branch has been consuming nothing; the moment it gets a connection, it's responsive within milliseconds because the compute pool is already warm. Lakebase, by design, has to spin compute back up.
Which means the elasticity scoreboard most people read off the spec sheet — "Lakebase: scale-to-zero ✅ / Oracle: shared pool ❌" — is solving the same problem two different ways and pretending one wins. Different shape. Same economics for abandoned experiments. Faster wakeup on Oracle when the agent comes back.
Sharing compute between PDBs isn't a bug. It means abandoned branches aren't wasting compute, period.
Or as I put it in Slack when this came up: "What we want is exactly what we already have. The compute is scaled. Abandoned branches contribute nothing." That's the architecture.
— Patrick
The Hard Limits
Side-by-side, with citations on every claim:
| Capability | Lakebase | Oracle Multitenant + ADB |
|---|---|---|
| Total branches | 500 / project (Databricks doc) | Up to 4098 / CDB (MAX_PDBS) |
| Active branches | 10 (hard cap) | No hard cap; tuned via Resource Manager |
| Branch creation speed | Instant (metadata + COW) | Near-instant on snapshot-capable storage |
| Cold-start on resume | Sub-second to multi-second | None — shared pool |
| ACID | Postgres MVCC | Full ACID, RAC, Active Data Guard |
| Failover behavior | Postgres-standard (kills in-flight) | Transparent Application Continuity — in-flight transaction replay |
| Vector search | Postgres extension | In-engine, optimized by 40-year-old CBO |
| JSON | jsonb (sequential traversal) | OSON binary, hash-indexed O(1) field access |
| Graph | Postgres extension | SQL/PGQ, in-engine |
| Cross-modal queries (vector + JSON + graph + relational) | Limited by extension boundaries | Single transaction, single query plan |
| Open data format | "Postgres page on S3" (Postgres-only readable) | OSON + Iceberg + Parquet + Mongo wire + native SQL |
| Mongo wire compatibility | None | Yes (Oracle MongoDB API) |
Lakebase wins on developer-experience polish today. The branching UX is wired into the product, the CLI is published, the dashboard renders branch trees. Credit where due — that's a real product investment.
Oracle wins on every limit that matters once you stop counting GitHub stars. Density (4098 vs 500). Active concurrency (no cap vs 10). ACID. Failover that doesn't kill your transactions. Vector + JSON + graph + spatial + relational in one query plan optimized by 40 years of CBO development. Mongo wire compatibility, for the developers who already wrote against MongoDB and don't want to rewrite their app to evaluate a database.
The DX gap is real. It's also the easiest gap to close, which is the next section.
The DX Gap, And Why It's Trivial to Close
Patrick said it best in the original Slack thread: "We probably should develop a lightweight external API too. That should be extremely simple — it's all external to the database."
He was right and he's already shipped it.
The DX gap is real. There is no pdb branch my-experiment command in stock Oracle. Lakebase has a polished branching UX with a published CLI, a dashboard, and git-shaped semantics. We're not going to pretend otherwise.
But this is a wrapper-shaped problem, not a kernel-shaped problem. Patrick built the wrapper: `pmeredit/pdb-branch` — "a small multi-language library over a shared PL/SQL package for making Oracle PDB snapshot copies feel like cheap database branches for agentic workflow experiments." Python, Node.js, Rust, and Java bindings, plus a Rust-built `pdb` CLI, released alongside this article.
The architecture is small enough to fit on a napkin:
- `PDB_BRANCH` PL/SQL package — installed and upgraded automatically by the language binding at startup. Wraps `CREATE PLUGGABLE DATABASE … SNAPSHOT COPY` with idempotent lifecycle DDL.
- Three control tables in `CDB$ROOT`:
  - `PDB_BRANCH_BRANCHES` — branch registry (name, parent, state, expiration, score)
  - `PDB_BRANCH_EVENTS` — audit log of branch lifecycle events
  - `PDB_BRANCH_PROFILES` — branch-to-Resource-Manager-profile mapping
- `BranchClient` wrappers in four languages — Python over `python-oracledb`, Node.js over `oracledb`, Rust over the ODPI-C-based `oracle` crate (with a pure-Rust `oracle-rs` path for non-SYSDBA work), and Java. One PL/SQL contract, four idiomatic surfaces.
- A `pdb` Rust CLI — `bin/pdb` wraps the Rust binding so callers don't need to know Cargo's `target/` layout. git-branch-shaped commands, `.pdbprofile` TOML config, and per-flag environment-variable overrides.
- Optional Resource Manager profiles: `PDB_BRANCH_ACTIVE`, `PDB_BRANCH_IDLE`, `PDB_BRANCH_BACKGROUND`.
Two ways to drive it. The library surface (Python shown; the Node, Rust, and Java bindings are equivalent):
from pdb_branch import BranchClient
client = BranchClient(connection) # auto-installs/upgrades PL/SQL package
client.create_branch(
    "AGENT_RAG_042",
    from_pdb="GOLDEN_MASTER",
    notes="try smaller chunk size and rerank before answer synthesis",
)
client.record_score("AGENT_RAG_042", 0.91, notes="eval: qa_regression_v3")
client.promote("AGENT_RAG_042", notes="winner for current retrieval policy")
client.cleanup(close_idle_after_minutes=60, drop_expired=True)
Or, at the shell, the same workflow via the pdb CLI:
bin/pdb init --dsn localhost:1521/FREE --user sys --password ... --from FREEPDB1
bin/pdb branch AGENT_RAG_042 --notes "try smaller chunk size and rerank"
bin/pdb score AGENT_RAG_042 0.91 --notes "eval: qa_regression_v3"
bin/pdb promote AGENT_RAG_042
bin/pdb branch -d AGENT_RAG_042
bin/pdb init writes a .pdbprofile so the daily commands stay short. The CLI also accepts environment-variable overrides and flag overrides — flags beat env vars beat .pdbprofile beat local defaults.
That's the entire developer experience. Branch, score, promote, reap. The argument that Oracle "doesn't have git branch for databases" was true a week ago. Today there's a CLI in the repo, an integration test that runs it against an Oracle Free container in CI, and a Rust binary you can drop in your $PATH.
One architectural point worth elevating: the two-connection security model. The agent never gets SYSDBA. There are two distinct connections:
- Control-plane connection — trusted orchestration code → `CDB$ROOT` as `SYSDBA` → uses `BranchClient` to create, open, close, and drop PDB branches.
- Workload connection — the agent → branch PDB → normal application user → ordinary SQL against branch-local data.
The agent receives only a DSN to its assigned branch and standard application credentials. It cannot create branches, drop branches, or escape its sandbox. Lakebase has nothing analogous in its branching API today; the agent-vs-supervisor security boundary is enforced at the cloud-IAM layer rather than in the database itself, and that's a category weaker than separation of concerns enforced inside the engine.
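In code, the split is just two connections with very different blast radii. A sketch; DSNs and credentials are placeholders, and `BranchClient` is the pdb-branch surface shown earlier:

```python
import oracledb
from pdb_branch import BranchClient

# Control plane: the trusted supervisor only. SYSDBA into CDB$ROOT.
# DSNs and credentials below are illustrative placeholders.
root = oracledb.connect(
    user="sys", password="...", dsn="db-host:1521/CDB1",
    mode=oracledb.AUTH_MODE_SYSDBA,
)
branches = BranchClient(root)
branches.create_branch("AGENT_RAG_042", from_pdb="GOLDEN_MASTER")

# Workload plane: all the agent ever receives. A branch-local service name
# and an ordinary application user. No branch DDL, no CDB$ROOT, no escape.
agent_conn = oracledb.connect(
    user="app_user", password="...", dsn="db-host:1521/AGENT_RAG_042",
)
```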
Snapshot-copy fallback is engineered, not aspirational. When the library requests SNAPSHOT COPY and the underlying storage rejects it — Oracle Free's container filesystem returns ORA-17525 / ORA-65169, for instance — the library transparently retries as a full clone, records a SNAPSHOT_COPY_FALLBACK row in PDB_BRANCH_EVENTS, and (in the Python binding) emits a SnapshotCopyFallbackWarning. Correctness is preserved on substrates that can't sparse-clone; the events table makes it visible when that happened so capacity planning isn't a guessing game.
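Callers who care whether a branch is actually sparse can surface the fallback at create time rather than discovering it in the events table later. A sketch for the Python binding, reusing the control-plane client from above; the warning's import path is our assumption:

```python
import warnings
from pdb_branch import SnapshotCopyFallbackWarning  # import path assumed

# Catch the moment a "snapshot copy" silently becomes a full clone,
# e.g. on a substrate that can't sparse-clone.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", SnapshotCopyFallbackWarning)
    branches.create_branch("AGENT_RAG_043", from_pdb="GOLDEN_MASTER")

if any(issubclass(w.category, SnapshotCopyFallbackWarning) for w in caught):
    print("full clone, not sparse: budget storage accordingly")
```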
Free deployment path:
- Oracle Database 23ai/26ai Free Docker image — `container-registry.oracle.com/database/free`. CDB service `FREE`, default PDB `FREEPDB1`. Multiple branch PDBs supported. The Free image's container filesystem doesn't support storage snapshots, so `snapshot_copy=True` is silently treated as a full clone via the fallback path above — which means 10–30 branches are realistic on a laptop, not hundreds. $0 cost forever, and the Oracle Free integration tests in the repo run the Python, Node.js, Rust, Java, and CLI surfaces against this image in CI.
- Self-managed CDB on 19c+ with snapshot-capable storage — the production target. ACFS, dNFS, Exadata sparse, or Exascale. Branch DDL uses Oracle Managed Files via `CREATE_FILE_DEST`, preferring `DB_CREATE_FILE_DEST` when set and otherwise deriving a destination from the parent PDB's datafile directory.
- ADB Serverless / Always Free is explicitly NOT a v1 target. ADB application connections land in an existing PDB, not in `CDB$ROOT`, so they cannot run PDB branch DDL. That's a real architectural constraint of ADB's tenancy model, not a `pdb-branch` limitation.
The README is upfront about v1 boundaries: the idempotent installer doesn't migrate destructive schema changes yet; PL/SQL identifiers are restricted to simple unquoted Oracle names; promotion is metadata-only, with scaling and export workflows left to deployment-specific adapters. That's an honest v1 scope.
The article is the "why." The repo is the "how." They land together, today.
The Agentic Workflow on Oracle
The lifecycle Patrick described in our Slack thread, mapped to the actual pdb-branch API:
Phase 1 — heavy experimentation. The supervisor holds the SYSDBA control-plane connection and spins up branches:
for hypothesis in hypotheses:
    branches.create_branch(
        f"AGENT_{hypothesis.id}",
        from_pdb="GOLDEN_MASTER",
        notes=hypothesis.description,
    )
    branches.set_profile(f"AGENT_{hypothesis.id}", "PDB_BRANCH_ACTIVE")
Each agent receives a DSN to its assigned branch plus an app-user credential. Agents do not see CDB$ROOT. They run their experiments — vector queries, JSON queries, SQL, whatever the eval needs — against ordinary Oracle PDBs. Once the branch PDB is open there is no special "branch query mode": the branch is just an isolated Oracle PDB service.
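One agent's eval step might look like this. The `doc_chunks` schema and the embedding stand-in are hypothetical; the `VECTOR_DISTANCE` syntax is stock Oracle 23ai+, and python-oracledb binds float arrays as vectors:

```python
import array
import oracledb

# Workload plane only: an app user against a branch-local service name.
conn = oracledb.connect(
    user="app_user", password="...", dsn="db-host:1521/AGENT_RAG_042",
)
# Stand-in for the query embedding your model would produce (hypothetical).
qv = array.array("f", [0.0] * 768)
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT doc_id, chunk
          FROM doc_chunks
         ORDER BY VECTOR_DISTANCE(embedding, :qv, COSINE)
         FETCH FIRST 5 ROWS ONLY
        """,
        qv=qv,
    )
    top5 = cur.fetchall()  # feed these into the agent's eval harness
```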
Phase 2 — evaluate. Supervisor logs scores back to PDB_BRANCH_BRANCHES as agents finish:
branches.record_score("AGENT_RAG_042", 0.91, notes="eval: qa_regression_v3")
The supervisor process can watch V$PDBS (open mode, last open time, total size) and V$RESOURCE_LIMIT (per-PDB CPU and I/O draw) for liveness and resource consumption.
Phase 3 — promote and reap. Winners stay active. Losers get downgraded or closed:
branches.promote("AGENT_RAG_042", notes="winner for current retrieval policy")
branches.cleanup(close_idle_after_minutes=60, drop_expired=True)
cleanup is the auto-suspend / auto-drop primitive. In production you don't run this from the supervisor; you schedule PDB_BRANCH.CLEANUP from DBMS_SCHEDULER so the orchestration code doesn't need to babysit branch lifecycle.
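A sketch of that scheduling, using the control-plane connection from earlier. `DBMS_SCHEDULER.CREATE_JOB` is stock Oracle; the `CLEANUP` parameter names mirror the Python surface and are our assumption about the PL/SQL signature:

```python
# Schedule PDB_BRANCH.CLEANUP in-database so reaping survives supervisor
# restarts. CLEANUP's parameter names are assumed from the Python API.
with root.cursor() as cur:
    cur.execute("""
        BEGIN
          DBMS_SCHEDULER.CREATE_JOB(
            job_name        => 'PDB_BRANCH_REAPER',
            job_type        => 'PLSQL_BLOCK',
            job_action      => 'BEGIN PDB_BRANCH.CLEANUP(
                                  close_idle_after_minutes => 60,
                                  drop_expired             => TRUE); END;',
            repeat_interval => 'FREQ=MINUTELY;INTERVAL=15',
            enabled         => TRUE);
        END;""")
```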
Behind those four method calls, the SQL is exactly what you'd expect:
CREATE PLUGGABLE DATABASE AGENT_RAG_042
FROM GOLDEN_MASTER SNAPSHOT COPY;
ALTER PLUGGABLE DATABASE AGENT_RAG_042 OPEN;
ALTER PLUGGABLE DATABASE AGENT_RAG_042
SET DB_PERFORMANCE_PROFILE='PDB_BRANCH_ACTIVE';
INSERT INTO PDB_BRANCH_BRANCHES (NAME, PARENT, STATE, NOTES, CREATED)
VALUES ('AGENT_RAG_042', 'GOLDEN_MASTER', 'ACTIVE',
'try smaller chunk size...', SYSTIMESTAMP);
INSERT INTO PDB_BRANCH_EVENTS (BRANCH_NAME, EVENT_TYPE, DETAILS, EVENT_TIME)
VALUES ('AGENT_RAG_042', 'CREATED', '{"from":"GOLDEN_MASTER"}', SYSTIMESTAMP);
Five statements, one wrapper call (Oracle DDL commits implicitly, so it isn't literally one transaction). The branch is live. An agent connects to AGENT_RAG_042 as app_user and runs its experiment.
This is what Databricks calls evolutionary algorithms in the database. It's the right framing. The substrate has been Oracle for a decade; what was missing was the wrapper that makes it feel like git. Each language binding is roughly one module long, the Rust pdb CLI is one binary, and they all sit on top of one shared PL/SQL package. The whole DX gap was about that much code.
Cost Reality
Both platforms have real costs and real free entry points. Skipping the marketing-deck pricing slide and going straight to what an engineer would actually pay:
| Workload pattern | Lakebase | Oracle ADB Serverless 2 ECPU |
|---|---|---|
| 50 mostly-idle branches, occasional bursts | $80–$150/mo | $190–$290/mo |
| 100+ branches, high density | Hits the 10-active wall | Scales naturally to thousands |
| Sustained 8+ hr/day activity | Capacity-unit cost climbs | Cheaper at sustained load |
| Storage at scale | $0.345 / GB-month | ~$0.024 / GB-month (≈15× cheaper) |
| Free for prototyping | Always Free tier (limited) | Free Docker image: $0 forever |
These are public list prices as of mid-2026, picked from each vendor's published rates. Run the numbers for your workload.
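For the storage row alone, the arithmetic is short enough to sanity-check in a few lines (rates from the table above; at these exact numbers the ratio lands around 14x, which the table rounds to roughly 15x):

```python
# Back-of-envelope storage delta at the table's list prices. Illustrative.
lakebase_rate = 0.345  # $/GB-month
oracle_rate = 0.024    # $/GB-month (approximate)

for gb in (100, 1_000, 10_000):
    lb, ora = gb * lakebase_rate, gb * oracle_rate
    print(f"{gb:>6} GB: Lakebase ${lb:>8,.2f}/mo vs Oracle ${ora:>7,.2f}/mo "
          f"({lb / ora:.1f}x)")
```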
The honest read:
- Lakebase wins on bursty, mostly-idle floors with light data. That's the optimization point of per-DB scale-to-zero, and they do it well.
- Oracle wins on density, sustained activity, and storage at scale. When agents are actually doing work, the shared-pool model delivers more compute per dollar. When experiment data grows, the storage cost differential alone (~15×) can dominate the total.
-
Oracle Free Docker is genuinely free. No cloud signup, no credit card, no quotas. Patrick's
pdb-branchREADME documents this as the recommended local prototyping path.
This is the compute story restated as economics. Per-DB scale-to-zero looks cheap when nothing is running. Shared elastic pool is cheaper when anything is running. Pick the model that matches your workload, not the marketing scoreboard.
What's Actually New About Lakebase
Worth giving Databricks an honest hearing. The "third-generation" framing collapses the moment you check the dates. What about their other claim — that in Lakebase "both the storage infrastructure and the data formats are completely open"?
That one survives partway and dies in the details.
The operational store in Lakebase is Postgres page format on cloud object storage. That's what they mean by "open storage infrastructure." But Postgres' on-disk page layout is a physical storage format, not a portable interchange format. The only thing that can read a Postgres page file is the Postgres engine. Calling that "open" because the Postgres source code is open is a category error. By that logic, MongoDB's BSON is "open" because the spec is published.
The other openness claim — that the same data is queryable as Iceberg by external analytical engines — is true. But the Iceberg view isn't the operational store. It's a separate projection layer (the "Mooncake" bridge — Databricks' OLTP-to-lakehouse export pipeline). Iceberg files are derived from the operational Postgres pages, not the same bytes.
Which means Lakebase's actual architecture is:
- A canonical store in Postgres-only page format. Closed to anything that isn't Postgres.
- A projected shape in Iceberg, exported to make the data analytically accessible.
That's exactly canonical form + projected shape. It's the architecture pattern I've been calling Unified Model Theory for the last two years. Databricks reinvented UMT, called the closed canonical store "open," and called the projection layer "openness."
Oracle's answer to "open data" is the converged engine itself: same canonical store, multiple shapes natively in the engine — SQL, JSON Duality Views, Property Graph, Vector, Spatial, Full-Text Search, Mongo wire protocol, OSON serialization out, Iceberg/Parquet for analytics. No bridge layer required. The cost-based optimizer sees all the modalities in a single query plan.
The architecture-correct way to expose canonical data through multiple shapes is to do it in the engine. That is what Oracle has been shipping for 40 years and what UMT formalizes. Databricks' Lakebase + Mooncake architecture is one valid implementation pattern of the same idea, with two extra hops and a new vocabulary.
What's actually new in Lakebase isn't the architecture. It's the packaging — a polished branching UX wired into a data lake brand and a billion dollars of marketing oxygen. That's a real product investment and a credible push into a market segment Oracle has under-marketed. Credit where due.
It's just not "third-generation database architecture." It's first-generation Postgres branching with a second-generation marketing department.
The Real Take
Three things to land:
1. Agents do need branching. Databricks' diagnosis is correct, and the agentic future they describe is real. Database branching is the missing primitive for evolutionary development. Cost floors do break the economics. Storage and compute do need to scale independently. Credit where due.
2. Lakebase is competent execution of an idea Oracle Multitenant solved in 2013. Neon is good engineering. Lakebase is Neon plus a brand and a UX layer. That's fine — but it isn't "third generation." It's a four-year-old Postgres-branching architecture, recently acquired and rebranded.
3. The architecture-correct version exists today. Full ACID. Up to 4098 branches per CDB. Vector, graph, JSON, spatial, full-text — single engine, single transaction, single query plan optimized by 40 years of cost-based optimizer development. Transparent Application Continuity replays in-flight transactions across failover. The two-connection security model keeps agents out of CDB$ROOT by construction.
The only real gap was developer experience. Patrick's pdb-branch closes it. Today. Four language bindings, a CLI, a PL/SQL package, three control tables, and a sane API. Branch, score, promote, reap.
Stop reinventing 2013. Build the wrapper. Ship.
Third-generation database architecture? We're on our fifth.
— Rick & Patrick
Citations
Databricks (primary subject):
- "How agentic software development will change databases" — https://www.databricks.com/blog/how-agentic-software-development-will-change-databases
- "A New Era of Databases: Lakebase" (June 12, 2025) — https://www.databricks.com/blog/what-is-a-lakebase
- "How lakebase architecture delivers 5x faster Postgres writes" — https://www.databricks.com/blog/how-lakebase-architecture-delivers-5x-faster-postgres-writes
- "Database Branching in Postgres: Git-Style Workflows" — https://www.databricks.com/blog/database-branching-postgres-git-style-workflows-databricks-lakebase
- "Databricks Agrees to Acquire Neon" press release (May 14, 2025) — https://www.databricks.com/company/newsroom/press-releases/databricks-agrees-acquire-neon-help-developers-deliver-ai-systems
Oracle Database documentation:
- 12c Multitenant Concepts — https://docs.oracle.com/database/121/CNCPT/cdbovrvw.htm
- 19c CREATE PLUGGABLE DATABASE — https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-PLUGGABLE-DATABASE.html
- 19c Cloning a PDB — https://docs.oracle.com/en/database/oracle/oracle-database/19/multi/cloning-a-pdb.html
- 19c Administering a PDB Snapshot Carousel — https://docs.oracle.com/en/database/oracle/oracle-database/19/multi/administering-pdb-snapshots.html
- 19c MAX_PDBS reference — https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/MAX_PDBS.html
- 21c V$PDBS reference — https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/V-PDBS.html
- 26c ALTER PLUGGABLE DATABASE — https://docs.oracle.com/en/database/oracle/oracle-database/26/sqlrf/ALTER-PLUGGABLE-DATABASE.html
- Resource Manager for PDBs (19c) — https://docs.oracle.com/en/database/oracle/oracle-database/19/multi/using-oracle-resource-manager-for-pdbs-with-sql-plus.html
- ADB Compute Models (ECPU/OCPU) — https://docs.oracle.com/en/cloud/paas/autonomous-database/serverless/adbsb/autonomous-compute-models.html
- ADB Auto-Scale 3× — https://docs.oracle.com/en-us/iaas/autonomous-database-serverless/doc/autonomous-auto-scale.html
- PDB Snapshots on Exadata Exascale (23ai+) — https://docs.oracle.com/en/learn/exadb-xs-pdb-snapshot/index.html
Historical context:
- Dremel 2020 retrospective (VLDB) — https://www.vldb.org/pvldb/vol13/p3461-melnik.pdf
- Aurora 10-year retrospective — https://aws.amazon.com/blogs/aws/celebrating-10-years-of-amazon-aurora-innovation/
- Aurora cloning hard limits — https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Managing.Clone.html
- Snowflake architecture — https://docs.snowflake.com/en/user-guide/intro-key-concepts
- Oracle 18c PDB Snapshot Carousel introduction — https://oracle-base.com/articles/18c/multitenant-pdb-snapshot-carousel-18c
Neon / Postgres branching:
- Neon architecture overview — https://neon.com/docs/introduction/architecture-overview
- Neon branching docs — https://neon.com/docs/introduction/branching
- TechTarget on Databricks/Neon acquisition — https://www.techtarget.com/searchdatamanagement/news/366623864/Databricks-adds-Postgres-database-with-1B-Neon-acquisition
Companion repository:
- pmeredit/pdb-branch — https://github.com/pmeredit/pdb-branch


