Steriani Karamanlis

Posted on Jun 2 • Originally published at a7omintelligence.substack.com

Platforms Cross Below Neoclouds in 2026 First as Cached Pricing Diverges 59 Points Between Frontier and Platform Channels YTD

#ai #llm #finops #mcp

Last week we wrote that twenty weeks of clean data had surfaced a two-track market, with text compressing 9.4 percent year to date while reasoning, mid tier, and budget output ran double-digit gains. Week 21 layers a second structural dimension on top of that map. The channel stack reshuffles, and for the first time in 2026 third-party inference platforms now price close to or even below GPU-native neoclouds for the average text model.

Platform discount widened from 48.0 percent in Week 20 to 50.1 percent in Week 21. Neocloud discount tightened from 50.0 percent to 48.6 percent. The two channels have crossed, reversing a discount ranking that held for the entire first quarter of the year. Cloud marketplaces still sit furthest from direct developer pricing, but the relationship between the two specialty channels has changed shape.

Size spread compressed from 7.2x to 6.2x. The premium that larger models commanded over smaller alternatives in Week 20 receded toward the 5.8x reading of Week 19. The structural KPIs continue to do the visible work while the price benchmarks themselves hold steady.

The week itself was quiet, and the channel reshuffle did the talking

Week 21 was again quiet in the matched-set. Most of the fifteen AIPI benchmark indexes posted moves under 40 basis points on every pricing direction. AIPI PLT GLB cached input recorded the largest single move on the week at minus 0.39 percent, reflecting continued aggressive cached repricing across third-party platforms. AIPI CLD GLB output was down 0.14 percent and input down 0.09 percent. AIPI TXT GLB output was down 0.07 percent and input down 0.05 percent. AIPI FTR GLB cached input was down 0.08 percent, and the rest of the frontier flagship benchmark held within two basis points across input and output. The chained matched-model methodology absorbed 5,296 priced SKUs from 51 vendors with the year-to-date picture intact and the weekly variance contained.

The story of Week 21 is not in the indexes. It is in the channel discounts, the cached pricing divergence, and the size spread retracement that all moved together while individual SKU prices stayed close to their Week 20 levels.

Why the channel stack crossed

The platform discount widened because third-party inference platforms continued to capture cheaper open weight models and to expand cached pricing tiers. Across the 696 SKUs in the platform channel, average input pricing now sits at $0.000680 per 1,000 tokens. Direct model developer pricing sits at $0.004702 per 1,000 tokens across 936 SKUs. The numerical advantage was already in platform’s favor on input pricing earlier in the year. What changed in Week 21 is that the structural KPI caught up to the underlying SKU pricing reality and crossed the neocloud line.

The neocloud discount tightened from a different direction. Neocloud pricing held remarkably stable, with AIPI NCL GLB at $0.000301 per 1,000 tokens on input and minimal weekly movement across 122 SKUs. What shifted is the model developer baseline against which the discount is measured. As AIPI DEV GLB drifted modestly to minus 0.7 percent YTD on input, the gap between neocloud and direct developer narrowed. Neocloud did not get more expensive. Direct developer pricing got cheaper while neocloud held.

The cached pricing split

AIPI FTR GLB cached input sits 39.2 percent above January. AIPI PLT GLB cached input sits 19.5 percent below. The 59-point spread between frontier and platform channels is the widest YTD divergence in any cached pricing direction since tracking began in December.

Frontier flagship models entered 2026 with cached input pricing structured as a 70 to 90 percent discount off non-cached input. Through Q1 and into Q2, several vendors restructured cached tiers, in some cases tightening the discount and in others bundling cached input with new prompt processing minimums. The result is that the chained matched-model frontier cached benchmark has climbed in steps over twenty-one weeks, not in a single move.

AIPI PLT GLB cached input moved in the opposite direction. Inference platforms expanded cached access aggressively through Q2, with caching availability across all text models crossing 23.4 percent of the population in Week 21, up from 22.8 percent in Week 19 and 16.9 percent at the start of Q2. The pricing tiers offered through platforms tend to be discounted below direct developer cached pricing as a category, and the cumulative effect is a 19.5 percent YTD decline in the platform cached benchmark.

The 59-point divergence is not a single-week event but rather the cumulative footprint of three months of structural moves that pulled frontier cached pricing up and platform cached pricing down at the same time.

The two tracks hold

The two-track YTD picture from last week holds intact. AIPI TXT GLB sits 9.4 percent below January on input. AIPI MID GLB carries 13.0 percent on input and 14.5 percent on output. AIPI RSN GLB runs 11.4 percent on input and 10.1 percent on output. AIPI BDG GLB output is up 12.4 percent year to date. AIPI FTR GLB stayed close to flat at minus 0.2 percent on input and minus 0.1 percent on output, completing the picture from last week that frontier flagship is the only tier holding its January reference within a basis point band.

Open source advantage widened from 68.8 percent in Week 20 to 69.7 percent in Week 21. Context cost curve ticked up from 3.4x to 3.6x as long context model pricing repositioned at the upper end. Reasoning premium compressed from 1.8x to 1.7x, continuing the gradual tightening from 2.2x in Week 17. Caching availability crossed 23.4 percent of text models.

Coverage and structure

Week 21 coverage settled at 5,296 SKUs across 51 vendors and 3,249 models, modestly below Week 20 as nine vendors completed planned delistings of legacy models flagged through their April change-detection logs. The delistings touched the budget and mid tier segments most heavily, but the matched-model methodology absorbed them without disturbing the chained signal.

The channel population physics clarifies how the crossover took shape. Platform covers 696 SKUs, neocloud covers 122, and direct model developers cover 936. Platform is roughly six times the size of neocloud by SKU count, and that breadth is what enabled platform pricing to find new equilibria more quickly than the smaller neocloud pool. Neocloud, with a thinner SKU base, exhibits less monthly variance and tends to hold pricing for longer windows. The compression in the platform channel had room to run, and it ran into the neocloud line in Week 21.

The cached pricing split traces back to a similar population dynamic. Of the 800 text models tracked in Week 21, 187 offer cached input pricing, or 23.4 percent of the universe. Those 187 SKUs split unevenly across channels, with platforms holding the deepest cached pricing pool relative to their direct developer counterparts.

Forward look

The first Q2 quarterly read becomes possible next week. If the channel crossover holds through Week 22, it becomes the leading framework for that analysis. If it reverts, Q2 stays anchored on the two-track tier narrative the index has carried since April.

DeepSeek V4 propagation through third-party platforms continued through Week 21. If the series lands at lower channel pricing than the V3 models it is replacing, AIPI PLT GLB cached could extend its 19.5 percent YTD decline further and widen the channel cached spread beyond 60 points.

The Grok 4.2 alignment on Azure Foundry that landed in Week 21 brought frontier flagship channel pricing closer to a unified rate across providers. Week 22 will test whether other Azure-hosted frontier model series follow with similar price alignments, particularly in cached input where the channel spread remains wide.

Methodology and Resources

The Inference Price Benchmark publishes every Monday at 9am Eastern Time.

Full index methodology is available at a7om.com/methodology.

Live pricing for AI agents and dev workflows: a7om.com/mcp

Full inference market intelligence for analysts and FinOps: a7om.com/terminal

Structured data feeds for enterprise licensing: a7om.com/feed

Get AIPI Weekly direct to your inbox every Monday: https://a7omintelligence.substack.com/subscribe

DEV Community