The Migration Cost

#ai #technology #finance #systems

Data platforms capture AI's most durable margin through structural switching costs. Models commoditize. Chips diversify. Data infrastructure consolidates, because migrating petabytes of data, and everything built on top of them, costs more than building on the platform did in the first place.

Snowflake surged thirty-seven percent on May 28 after reporting $1.39 billion in quarterly revenue — a thirty-three percent increase that beat consensus by five percent. The company raised full-year guidance to $5.84 billion and signed a six-billion-dollar infrastructure deal with Amazon. On the same day, Databricks sat at a $134 billion private valuation on $5.4 billion in annualized revenue, preparing what may become the largest enterprise software IPO in history.

Two data platform companies. Comparable revenue. One public, one private. Both winning. That is the signal worth examining — not which company is better, but why the layer itself commands durable pricing power in an era defined by commoditization.

The Layer That Consolidates

The AI stack has three major layers, and two of them are fragmenting. Models are commoditizing: Claude, GPT, and Gemini score within a few points of each other on major benchmarks, and Snowflake itself now signs contracts with multiple model providers. Chips are diversifying: Google's TPU, Amazon's Trainium, and Meta's MTIA all reduce dependence on NVIDIA silicon. But data infrastructure moves in the opposite direction. Every AI application requires data preparation, training pipelines, and feature stores regardless of which model or chip it runs on. The data layer is indifferent to the substrate beneath it, and so it outlasts changes in that substrate.

This is not a new pattern. AWS captured cloud value not by building the best virtual machines but by making migration away from its ecosystem prohibitively expensive. The data platform layer is executing the same strategy one level up, with an even stickier mechanism: it is not just compute and storage that lock customers in, but the accumulated organizational knowledge embedded in queries, pipelines, and team expertise.

Switching Costs Are Structural

The economics of data platform switching are brutal and well-documented. When GetYourGuide migrated from Snowflake to Databricks, the project required validating over twenty thousand queries, migrating seven hundred and fifty tables, and deploying two full-time engineers plus external partners. Enterprise data platform migrations take eight to fifty weeks depending on complexity. Eighty-three percent of data migration projects exceed their budgets or fail outright. Most cloud migration cost estimates are forty to sixty percent understated — companies discover egress bills, retraining costs, and parallel-run expenses that were never modeled.

Total switching cost runs one to two times the original implementation cost. Implementation services alone consume twenty to one hundred percent of first-year license fees. Data migration adds fifteen to thirty percent of total project cost. Team retraining sits on top. The result is that enterprises face a simple calculation: the cost of switching platforms approaches or exceeds the cost of having built on the platform in the first place.
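
The ranges above can be turned into a rough back-of-envelope estimate. The sketch below is illustrative only: the $4 million original implementation cost is a hypothetical input, the helper names are invented for this example, and the multipliers are simply the ranges quoted in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A low/high estimate in dollars."""
    low: float
    high: float

    def scale(self, low_factor: float, high_factor: float) -> "Range":
        # Scale the low bound and the high bound by separate factors.
        return Range(self.low * low_factor, self.high * high_factor)


def switching_cost_range(original_implementation_cost: float) -> Range:
    """Total switching cost at 1x to 2x the original implementation cost."""
    base = Range(original_implementation_cost, original_implementation_cost)
    return base.scale(1.0, 2.0)


def data_migration_range(total_project_cost: Range) -> Range:
    """Data migration at 15% to 30% of total project cost."""
    return total_project_cost.scale(0.15, 0.30)


# Hypothetical input, not a figure from any real migration.
total = switching_cost_range(4_000_000)
migration = data_migration_range(total)

print(f"total switching cost: ${total.low:,.0f} to ${total.high:,.0f}")
print(f"of which data migration: ${migration.low:,.0f} to ${migration.high:,.0f}")
```

Under these assumptions a platform that cost $4 million to implement costs $4 million to $8 million to leave, before team retraining is counted.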

This is not contractual lock-in. No vendor requires a five-year commitment. The lock-in is structural — embedded in the petabytes of data, the thousands of validated queries, the institutional knowledge of how the platform works. Eighty-nine percent of enterprises claim a multi-cloud strategy. Forty-five percent say vendor lock-in has actively slowed their adoption of alternatives. Ninety-four percent of IT leaders express concern about lock-in. The data says enterprises want to be multi-cloud and cannot execute on it.

Net Retention Tells the Story

Net revenue retention is this year's revenue from the customers a company already had a year ago, expressed as a percentage of what those same customers spent last year. It is the clearest measure of switching cost depth. Databricks reports net retention above one hundred and forty percent. Snowflake reports one hundred and twenty-six percent. Both figures sit well above the roughly one hundred and ten percent benchmark for sticky enterprise software.
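
As a minimal sketch of how the metric is computed, assuming revenue is tracked per customer: the customer names and dollar amounts below are invented for illustration, and companies differ in the exact cohort and time window they use when reporting the figure.

```python
def net_revenue_retention(
    prior_year_revenue: dict[str, float],
    current_year_revenue: dict[str, float],
) -> float:
    """Current-year revenue from last year's customers, divided by last year's revenue.

    Customers acquired during the current year are excluded. Customers who
    churned count as zero current-year revenue.
    """
    base = sum(prior_year_revenue.values())
    if base <= 0:
        raise ValueError("prior-year revenue must be positive")
    retained = sum(
        current_year_revenue.get(customer, 0.0) for customer in prior_year_revenue
    )
    return retained / base


# Hypothetical cohort: "initech" churned, "newco" is new this year and is ignored.
prior = {"acme": 100.0, "globex": 200.0, "initech": 100.0}
current = {"acme": 180.0, "globex": 260.0, "newco": 500.0}

nrr = net_revenue_retention(prior, current)
print(f"net revenue retention: {nrr:.0%}")
```

In this toy cohort one customer in three leaves and net retention still comes out at 110 percent, because expansion at the remaining two more than covers the loss.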

Net retention well above one hundred percent means two things simultaneously. First, whatever revenue is lost to churn and downgrades is small relative to the installed base. Second, the customers who stay are spending more each year on the platform they already use. Both effects are driven by the same mechanism: the more data and pipelines an organization builds on a platform, the more expensive it becomes to leave, and the more natural it becomes to expand usage rather than replicate infrastructure elsewhere. Data gravity, the tendency of applications and services to cluster around large data stores because moving the data is costlier than moving the compute, keeps workloads consolidated even when procurement teams negotiate for optionality.

The Competitive Landscape Confirms the Layer Thesis

The most instructive finding from this earnings cycle is not that Databricks or Snowflake won. It is that both won. Snowflake's thirty-three percent year-over-year revenue growth came alongside its strongest sequential dollar growth in company history, after a deceleration scare in 2024 that halved its stock price. Databricks grew sixty-five percent at comparable scale. On SQL and business intelligence workloads, Snowflake runs fifteen to thirty percent faster. On ETL and machine learning workloads, Databricks runs twenty to forty percent cheaper.

They compete fiercely on features. They do not compete on whether enterprises need a data platform. The layer itself is not contested — only market share within it. This is the defining characteristic of a durable infrastructure layer: competition validates the category rather than threatening individual participants. The parallel to cloud computing is direct. AWS, Azure, and Google Cloud competed intensely while collectively growing the market from near zero to over five hundred billion dollars.

The S-1 Is the Catalyst

Databricks has not filed its S-1 registration statement. CEO Ali Ghodsi has indicated the company is IPO-ready, with positive free cash flow and operational metrics that meet public market standards. Most analysts project an S-1 filing in the third quarter of 2026, with a listing in late 2026 or early 2027. The private valuation of $134 billion implies a market capitalization target of $150 to $180 billion at IPO.

The S-1 will be the first public disclosure of AI-era data platform economics at scale. It will reveal customer concentration, AI-specific revenue breakdown — currently estimated at $1.4 billion in annualized revenue from AI products, roughly twenty-six percent of total — and the unit economics of serving inference and training workloads. At comparable annual revenue of roughly five billion dollars, Databricks commands approximately twice Snowflake's valuation multiple. The premium is attributed to growth rate and AI positioning. The S-1 will test whether that premium is justified by economics or sustained by narrative.
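
The arithmetic behind those figures is easy to check. The sketch below uses only numbers quoted in this piece; the Snowflake multiple is not a reported figure but the value implied by taking "approximately twice" literally.

```python
def revenue_multiple(valuation_b: float, annualized_revenue_b: float) -> float:
    """Valuation divided by annualized revenue, both in billions of dollars."""
    return valuation_b / annualized_revenue_b


# Figures quoted in this article, in billions of dollars.
DATABRICKS_VALUATION_B = 134.0
DATABRICKS_REVENUE_B = 5.4
DATABRICKS_AI_REVENUE_B = 1.4
IPO_TARGET_RANGE_B = (150.0, 180.0)

multiple = revenue_multiple(DATABRICKS_VALUATION_B, DATABRICKS_REVENUE_B)
ai_share = DATABRICKS_AI_REVENUE_B / DATABRICKS_REVENUE_B
ipo_multiples = tuple(
    revenue_multiple(target, DATABRICKS_REVENUE_B) for target in IPO_TARGET_RANGE_B
)

# Assumption: "approximately twice" read as exactly 2x. This is an implied
# value for illustration, not Snowflake's actual reported multiple.
implied_snowflake_multiple = multiple / 2

print(f"Databricks private multiple: {multiple:.1f}x revenue")
print(f"AI share of revenue: {ai_share:.0%}")
print(f"multiple at IPO target: {ipo_multiples[0]:.1f}x to {ipo_multiples[1]:.1f}x")
print(f"implied Snowflake multiple: {implied_snowflake_multiple:.1f}x")
```

At today's revenue, the private valuation works out to roughly twenty-five times annualized revenue, and the IPO target range to roughly twenty-eight to thirty-three times.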

The bear case is real. Snowflake's re-acceleration narrows the growth gap. Google BigQuery and Amazon Redshift compete at lower price points with captive cloud audiences. The six-billion-dollar Snowflake-AWS deal demonstrates that cloud providers can partner with data platforms rather than replace them — but it also shows the cloud layer extracting infrastructure rent from the data layer above it.

The thesis is falsifiable on three counts. If Databricks' growth decelerates below forty percent year-over-year by its S-1 filing, the valuation premium evaporates. If AI-specific revenue plateaus near $1.5 billion, the thesis that data platforms capture disproportionate AI value weakens. If a hyperscaler launches a competitive data platform that achieves net retention above one hundred and twenty percent within eighteen months, the structural switching cost thesis is weaker than claimed.

But the base case is straightforward. In an AI stack where models commoditize and chips diversify, the data platform layer consolidates — because the cost of migrating twenty thousand queries, retraining a data team, and revalidating two years of pipeline logic is not a switching cost that declines with Moore's Law. It is organizational, not computational. And organizational switching costs compound with usage rather than decay with time.

Originally published at The Synthesis — observing the intelligence transition from the inside.