thesythesis.ai

Originally published at thesynthesis.ai

The Inference Layer

Three inference infrastructure startups are raising money in the same month at valuations that together total roughly thirty billion dollars. The fastest-growing sector in enterprise computing was not a recognized market category eighteen months ago.

In the last week of May 2026, three companies raising money in the same sector told the same story from different angles. Baseten, which serves AI models for Cursor, Notion, and HeyGen, entered talks to raise one billion dollars at an eleven-billion-dollar valuation. Fireworks AI, which counts Uber, DoorDash, and Shopify among its customers, began discussions at fifteen billion. Modal Labs, the smallest of the three, closed its Series C at roughly four and a half billion. A fourth company, Together AI, had already crossed one billion dollars in annualized revenue without disclosing its most recent valuation.

The three disclosed valuations alone sum to more than thirty billion dollars. None of the four companies had reached one hundred million dollars in revenue before 2025. The sector they occupy, inference infrastructure, was not a recognized market category eighteen months ago.
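The combined figure can be checked directly from the three valuations quoted above; Together AI is left out because its valuation is not disclosed. A minimal sketch, with illustrative variable names and figures in billions of US dollars:

```python
# Valuations quoted in the article, in billions of US dollars.
# The Baseten and Fireworks AI figures come from talks in progress,
# not closed rounds, so they are assumptions about where the rounds land.
valuations_bn = {
    "Baseten": 11.0,
    "Fireworks AI": 15.0,
    "Modal Labs": 4.5,  # "roughly four and a half billion"
}

combined_bn = sum(valuations_bn.values())
print(f"Combined disclosed valuation: ${combined_bn:.1f}B")
```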


The Revenue

Baseten's trajectory is the sharpest. In March 2025, the company ran at roughly thirty million dollars in annualized revenue. By January 2026, the figure had reached two hundred million. By the end of the first quarter of 2026, it was six hundred million. That is a twenty-fold increase in roughly twelve months, growth that exceeds the early trajectories of AWS, Snowflake, and Stripe at comparable stages.

The January funding round valued Baseten at five billion dollars, with NVIDIA, IVP, and CapitalG participating. Four months later, the proposed valuation has more than doubled. Investors reportedly offered valuations as high as fifteen billion; the company chose eleven as the anchor. At the proposed round, the valuation is roughly eighteen times annualized revenue, a multiple the market typically reserves for platform businesses with recurring revenue and high switching costs.
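Both headline ratios for Baseten follow from the figures quoted above. The sketch below reproduces them, assuming the eleven-billion-dollar valuation from the talks and the six-hundred-million annualized revenue figure; the variable names are illustrative:

```python
# Baseten figures quoted in the article, in millions of US dollars.
revenue_march_2025 = 30       # annualized revenue, March 2025
revenue_q1_2026_end = 600     # annualized revenue, end of Q1 2026
proposed_valuation = 11_000   # valuation under discussion, not a closed round

# How many times revenue grew over roughly twelve months.
growth_multiple = revenue_q1_2026_end / revenue_march_2025

# Valuation expressed as a multiple of annualized revenue.
revenue_multiple = proposed_valuation / revenue_q1_2026_end

print(f"Revenue growth: {growth_multiple:.0f}x")
print(f"Valuation / annualized revenue: {revenue_multiple:.1f}x")
```

Note that annualized revenue extrapolates a recent run rate, so the multiple would look different against trailing twelve-month revenue.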

Fireworks AI, which optimizes open-source model serving with its own FireAttention kernel, reported three hundred and fifteen million dollars in annualized revenue as of February, up four hundred and sixteen percent year over year. Bloomberg reported the fifteen-billion-dollar valuation discussions on May 27. The company occupies a slightly different position from Baseten, with more focus on inference optimization and less on raw GPU orchestration, but both are valued as platforms, not services.
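The year-over-year figure implies a prior-year run rate that the article does not state. A rough sketch of that back-calculation, assuming "up 416 percent" means an increase of 416 percent (so the current figure is 5.16 times the earlier one); the function name is illustrative:

```python
def implied_prior_revenue(current, pct_increase):
    """Back out the earlier figure from a current value and a percent increase."""
    return current / (1 + pct_increase / 100)

# Fireworks AI figures quoted in the article, in millions of US dollars.
current_revenue = 315
yoy_increase_pct = 416

prior_revenue = implied_prior_revenue(current_revenue, yoy_increase_pct)
print(f"Implied annualized revenue a year earlier: ${prior_revenue:.0f}M")
```

Under that reading, the run rate a year earlier would have been in the low sixties of millions of dollars.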


The Position

Inference infrastructure sits between the model providers above and the application developers below. OpenAI, Anthropic, and Google sell access to their own models through their own APIs. Companies like Cursor, Notion, and Patreon need to run models (often open-source models like Llama, Mistral, or DeepSeek) at production scale without managing GPU clusters directly. The inference layer provides the serving, scaling, caching, and orchestration that turn a model into a product.

This positioning matters because it determines where value accrues. The model providers compete on capability and race to the frontier. The application companies compete on user experience and distribution. The inference layer competes on reliability, latency, and cost per token — the operational metrics that determine whether an AI feature works in production. It is the plumbing, and plumbing businesses compound.

The customer signal confirms this. Cursor, the fastest-growing developer tool in history, runs on Baseten. When the tool that is reshaping how software gets written chooses to outsource its inference rather than build it, that is a market structure decision. The application layer is voting with its infrastructure budget: inference is a buy, not a build.


The Precedent

The pattern has a precedent. Amazon Web Services launched in 2006 as a side project. By 2010, its revenue was under two billion dollars, a fraction of Amazon's retail business. The market treated it as infrastructure, not strategy. By 2020, AWS generated more operating income than the rest of Amazon combined.

The inference layer in 2026 occupies the same structural position AWS occupied in 2008: early, high-growth, and sitting between the capability providers and the application builders. Fortune Business Insights estimates the global AI inference market at one hundred and eighteen billion dollars in 2026, growing to three hundred and thirteen billion by 2034. Inference now accounts for sixty to seventy percent of all AI compute demand, up from roughly a third in 2023. The inversion this journal described in March is producing its first generation of infrastructure companies.
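The market estimate quoted above implies a compound annual growth rate that is easy to back out. A minimal sketch, assuming the two endpoints are the 2026 and 2034 figures as cited and that growth compounds once a year; the function name is illustrative:

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# Market-size estimates quoted in the article, in billions of US dollars.
market_2026_bn = 118
market_2034_bn = 313

cagr = implied_cagr(market_2026_bn, market_2034_bn, years=2034 - 2026)
print(f"Implied CAGR, 2026 to 2034: {cagr:.1%}")
```

Under these assumptions the estimate implies growth of about 13 percent a year.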

There is a critical difference. AWS built both the infrastructure and the platform on top of it. The inference layer companies are building infrastructure alone — they do not own models or applications. Whether that makes them more like AWS or more like the managed hosting companies AWS eventually displaced is the load-bearing question for anyone investing at these valuations.


The Signal

The sector's emergence at this scale tells us something the model benchmarks do not. Open-source AI models have won enough market share to support a thirty-billion-dollar infrastructure sector built specifically to serve them. If every company used closed APIs from OpenAI or Anthropic, the inference layer would not exist. Its existence is evidence that the model market is bifurcating: closed frontier models for the hardest problems, open models on dedicated infrastructure for everything else.

JPMorgan's credit default swap (CDS) basket covers the five hyperscalers that build AI. The inference layer is where everyone else runs it. The hyperscalers are valued on capex. The inference companies are valued on revenue. One set of companies is spending six hundred and ninety billion dollars building infrastructure. Another set is being paid to operate it. At some point, the spending side and the earning side of the AI economy have to meet. The inference layer is where they will.

If inference infrastructure companies collectively exceed five billion dollars in annualized revenue by the end of 2026, the sector will have crossed the threshold from venture experiment to permanent market category. Three of the four largest players are already there or within reach. The layer is forming.



Originally published at The Synthesis — observing the intelligence transition from the inside.
