Alex Merced

Posted on May 27

AI Weekly: Cheaper Coding Models, Custom Chips, and a Stateless MCP

#ai #coding #mcp #news

The past week pushed three quiet shifts into the open. A coding model matched the frontier at a tenth of the cost. Custom chips started outgrowing Nvidia. And the protocol behind most AI agents got its biggest rewrite yet.

AI Coding Tools: Cursor Ships Composer 2.5 and the Price of Frontier Coding Drops

Cursor released Composer 2.5 on May 18. The headline is not the benchmark score. It is the price next to that score.

Composer 2.5 scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1. Those numbers sit right next to Claude Opus 4.7 and GPT-5.5 on the same tests. The standard tier costs $0.50 per million input tokens and $2.50 per million output tokens. That works out to roughly one-tenth the cost per task of the frontier models it matches.
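The per-task arithmetic behind that claim is easy to rerun against your own workload. The sketch below uses the Composer 2.5 prices quoted above; the comparison prices and the task size are hypothetical placeholders chosen only to illustrate the calculation, so substitute the real rates and token counts you see.

```python
def cost_per_task(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one task, given prices per million tokens."""
    return ((input_tokens / 1_000_000) * price_in_per_m
            + (output_tokens / 1_000_000) * price_out_per_m)


# Composer 2.5 standard-tier prices, as quoted in the article.
COMPOSER_IN, COMPOSER_OUT = 0.50, 2.50

# HYPOTHETICAL comparison prices: placeholders, not any vendor's published rates.
FRONTIER_IN, FRONTIER_OUT = 5.00, 25.00

# HYPOTHETICAL task size: 200k tokens read, 20k tokens written.
task = {"input_tokens": 200_000, "output_tokens": 20_000}

composer_cost = cost_per_task(**task, price_in_per_m=COMPOSER_IN,
                              price_out_per_m=COMPOSER_OUT)
frontier_cost = cost_per_task(**task, price_in_per_m=FRONTIER_IN,
                              price_out_per_m=FRONTIER_OUT)

print(f"Composer 2.5:         ${composer_cost:.2f} per task")
print(f"Placeholder frontier: ${frontier_cost:.2f} per task")
print(f"Cost ratio:           {composer_cost / frontier_cost:.2f}")
```

Agentic sessions tend to read far more tokens than they write, so the input price usually dominates. It is worth measuring your real input-to-output ratio before trusting any headline multiple.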

The model runs on Moonshot AI's open-source Kimi K2.5 checkpoint. Cursor spent about 85% of its compute budget on its own post-training, including reinforcement learning on 25 times more synthetic coding tasks than Composer 2 used. The base model came from a Beijing lab. The behavior that developers actually feel came from Cursor's training pipeline. That split tells you something about where value lives in 2026. The base weights are increasingly a commodity. The post-training is the product.

Composer 2.5 is built for long, tool-heavy sessions. It reads files, runs terminal commands, edits across many files, runs tests, and iterates on its own. Cursor tuned it for sustained work and better instruction following, not just raw puzzle-solving. The jump from Composer 2 was real: SWE-Bench Multilingual went from 73.7% to 79.8%, and Terminal-Bench 2.0 went from 61.7% to 69.3% in two months.

The model is not the whole story for Cursor this month. The editor is turning into something closer to a control panel for a whole team's agents. Cursor 3.3 landed on May 7 with Build in Parallel, a feature that dispatches async subagents across independent steps of a plan at the same time. Cursor 3.5 followed on May 20 with multi-repo automations and shared canvases for team artifact access. The pattern is clear. Cursor wants to be the place where you manage many agents, not just the place where you autocomplete one line.

Google answered fast. On May 19, one day after Composer 2.5 shipped, Google launched Antigravity 2.0 with Gemini 3.5 Flash at I/O 2026. Antigravity 2.0 targets the same agentic IDE seat as Cursor. It pairs multi-agent orchestration with a built-in Chromium browser, dynamic subagents, and scheduled background tasks. Two of the biggest names in the space shipped competing agentic IDEs within 24 hours of each other. That cadence is the real signal.

Here is the part worth sitting with. Composer 2.5 does not beat Opus 4.7 or GPT-5.5 outright. On CursorBench v3.1, a test built around real Cursor workflows, it edges ahead. On Terminal-Bench 2.0 it ties Opus 4.7 at 69.3% but trails GPT-5.5, which scores 82.7%. Opus keeps an edge on deep architectural reasoning and long single-shot generation. So the frontier still leads on the hardest work. What changed is that "good enough for most tasks" now costs a fraction of what it did, and it runs inside the editor where the work already happens.

For teams, this resets the math. A year ago the question was which single coding tool to standardize on. That question is gone. JetBrains research from April found that 90% of developers used at least one AI tool at work as of January 2026, and 74% used specialized development tools beyond plain chat. GitHub Copilot stayed the most adopted at 29% workplace usage, with Cursor and Claude Code tied at 18%. Most teams now run two or three tools in different roles. A common setup pairs Claude Code in the terminal for agentic work, Cursor or Copilot in the IDE for inline edits, and a chat window for thinking through design.

The terminal-native agents keep gaining ground at the high end. The JetBrains 2026 survey recorded Claude Code jumping from 3% adoption in April 2025 to 18% in January 2026, a sixfold rise in nine months. The starker number is senior developer preference. When JetBrains asked developers with more than ten years of experience which tool they would pick for daily work, 46% chose Claude Code and 9% chose Copilot. The anthropics/claude-code repository now counts more than 126,000 GitHub stars. Codex passed 3 million weekly active users in March, up from 2 million a month earlier. None of these tools is winning outright. Each owns a slice.

The interesting question for teams is which tool acts as the controller and which ones do the subtasks. Most teams now run a terminal-native agent like Claude Code as the controller and hand specific jobs to Codex or Cursor. That arrangement is not stable. It has shifted twice already this year and will likely shift again before December. Picking a permanent stack right now is a bet against your own future workflow.

One more change is reshaping the field, and it is about money, not models. GitHub paused new sign-ups for Copilot Pro and Pro+ in April. Copilot moves to AI Credits-based flex billing on June 1, keeping the $10 and $39 prices but swapping in credit pools. A new Copilot Max tier targets heavy individual users. Windsurf raised Pro from $15 to $20 a month and added a $200 Max plan bundling Devin. Cursor included double usage for the first week after Composer 2.5 to pull developers in for evaluation. The tools are competing on cost structure now, not only capability.

The plain advice for a working developer in 2026 holds up well. A solo developer or hobbyist gets the best entry value from Copilot at $10 a month. For a full-time developer, Cursor Pro at $20 a month tends to pay for itself within the first week. A senior developer or technical founder who lives in the terminal gets the most from Claude Code on a higher tier, where the agentic depth justifies the price. Many people use more than one of these at once, and that is the sane default rather than a sign of indecision.

Microsoft Build runs June 2 and 3 in San Francisco, so its announcements land just after this issue. The smaller two-day format and the agenda point at agents as the throughline. The seven session tracks include Agents and Apps, Azure AI Foundry, and a track on working with models. Microsoft framed 2026 as the year agentic tooling moves from announced to production-ready. Expect multi-agent orchestration, new APIs for deploying autonomous agents, and updates to Microsoft's MCP integrations. We will cover the actual announcements next week.

AI Processing: Custom Chips Start Outgrowing Nvidia

A shipment forecast this week marked a turning point in AI hardware. For the first time, custom AI chips are projected to outgrow Nvidia's GPUs.

TrendForce projects 44.6% growth in ASIC shipments for 2026 against 16.1% growth for merchant GPUs. Alchip Chairman Johnny Shen confirmed the shift in comments reported by Digitimes on May 26. ASIC stands for application-specific integrated circuit, a chip designed for one job rather than general flexibility. The growth gap says buyers are moving toward purpose-built silicon faster than they are buying more general-purpose GPUs.
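A growth-rate gap is not the same thing as a share flip, and a quick calculation shows how far one year of these rates moves the mix. The sketch below applies the two TrendForce growth figures quoted above to a hypothetical starting split of 30 ASIC units to 70 GPU units. That split is an assumption for illustration only, not a figure from TrendForce or this article.

```python
ASIC_GROWTH = 0.446  # projected 2026 ASIC shipment growth, as quoted above
GPU_GROWTH = 0.161   # projected 2026 merchant GPU shipment growth, as quoted above


def asic_share_shift(asic_units: float, gpu_units: float) -> tuple[float, float]:
    """Return the ASIC unit share before and after one year of growth."""
    before = asic_units / (asic_units + gpu_units)
    asic_next = asic_units * (1 + ASIC_GROWTH)
    gpu_next = gpu_units * (1 + GPU_GROWTH)
    after = asic_next / (asic_next + gpu_next)
    return before, after


# HYPOTHETICAL starting mix; replace with real shipment data if you have it.
before, after = asic_share_shift(asic_units=30.0, gpu_units=70.0)
print(f"ASIC unit share: {before:.1%} -> {after:.1%}")
```

Under that assumed starting point, ASICs gain a few points of unit share in a year while GPUs still ship the majority of units. The calculation covers units only and says nothing about revenue.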

The reason is straightforward. Nvidia's GPUs are general-purpose processors. They are powerful and flexible, built to run nearly any AI workload, but not tuned for any single one. AI inference, the ongoing job of running trained models against live queries, has overtaken training as the dominant compute load. For inference at scale, that flexibility carries a cost you no longer need to pay. A chip built for one model architecture can run it cheaper and cooler than a GPU that can run anything.

This does not mean Nvidia is in trouble. Nvidia still holds roughly 70% to 80% of the AI accelerator market by revenue. Total Nvidia AI accelerator revenue could pass $150 billion in 2026. Losing share in percentage terms is not the same as losing money when the whole market is growing this fast. The real pressure on Nvidia is margin, not market exit. As hyperscalers diversify suppliers, they gain leverage to push pricing on next-generation parts.

The custom-chip wave has clear backers. Broadcom co-designs Google's Tensor Processing Units and chips for Meta and others. It reported $8.4 billion in AI semiconductor revenue in a recent quarter. Alchip is forecasting a return to growth in 2026 as new 3-nanometer accelerator programs hit volume in the second quarter, with about 80% of its revenue landing in the second half of the year. Every major hyperscaler now ships in-house silicon: Google with TPU, AWS with Trainium, Microsoft with Maia, and Meta with its own designs.

Nvidia is not standing still. Its Vera Rubin architecture, the successor to Blackwell, is in full production, with partner products arriving in the second half of 2026. Rubin is built on TSMC's 3-nanometer process with HBM4 memory and 336 billion transistors. Nvidia reports that, compared to Blackwell, it cuts the cost per inference token to one-tenth and needs one-quarter as many GPUs to train mixture-of-experts models. AWS, Google Cloud, Microsoft, and Oracle are among the first cloud providers set to deploy Vera Rubin instances. The architecture is tuned for mixture-of-experts models, the same design trend showing up across the field.

AMD is running its own play. The Instinct MI400 launches in the second half of 2026 with 432GB of HBM4 memory and 40 petaflops of FP4 compute. S&P Global projects the MI400 will generate $7.2 billion in its first year, and AMD's data center GPU revenue is forecast to grow 114% year over year to $15 billion. AMD also locked in a multi-generational deal with Meta covering a 6-gigawatt deployment, the first tranche using MI450-based custom GPUs.

What does this mean if you build with AI rather than sell chips? Inference is getting cheaper, and the savings will reach your bills. The same shift that lets Cursor sell near-frontier coding at a tenth of the cost is happening one layer down in silicon. Purpose-built inference chips, mixture-of-experts models that activate only the parameters they need, and architectures tuned for serving instead of training all point the same direction. Running models in production is on a steady path to costing less, which changes what is worth building.

There is a catch that keeps Nvidia ahead even as custom chips grow faster. Its CUDA software ecosystem has more than a decade of tools, libraries, and developer habits built on top of it. Moving a workload off Nvidia means rewriting or re-tuning the code that runs on it, and that switching cost is real. Custom ASICs win where the workload is fixed and the volume is huge enough to justify the engineering, which describes a hyperscaler running one model at massive scale. It does not yet describe most teams, who still benefit from the flexibility of a GPU that runs whatever they throw at it.

The other trend to watch is the move to the edge. NPUs in laptop-class chips from Intel, AMD, and Apple now deliver 40 to 50 TOPS of on-device inference. That is enough to run capable local models without a round trip to the cloud. The hybrid pattern, cloud for the hard reasoning and the device for latency-sensitive work, is becoming the default shape for AI apps rather than a niche choice.

Standards and Protocols: MCP Gets Its Biggest Rewrite, and the NSA Weighs In

Two things happened to the Model Context Protocol this week, and they pull in opposite directions. The protocol got its largest revision since launch, and the NSA published its first formal security guidance for it. Both matter if you build agents.

MCP is the open standard, created by Anthropic in late 2024, that lets AI models connect to external tools, databases, and services through one common interface. Instead of writing custom glue code for every new integration, you connect through MCP once. It has become a core building block of the agentic AI stack.

On May 21, the maintainers locked the release candidate for MCP 2026-07-28. The final spec publishes on July 28. The roughly ten-week gap gives SDK maintainers and client builders time to validate the changes against real workloads.

The centerpiece is a stateless protocol core. The new version removes the session ID, the initialize handshake, and resumable streams. In plain terms, the protocol now runs on ordinary HTTP infrastructure without holding a connection open or tracking session state on the server. That is a major change for anyone running MCP in production. Stateless services scale across commodity servers far more easily than stateful ones, since any server can handle any request. Load balancing gets simpler, and a crashed node no longer takes session state down with it, because the client can retry the same request against another server.
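To make "any server can handle any request" concrete, here is a minimal sketch of the stateless pattern in general. It is not the MCP wire format and does not use any MCP SDK. The request shape, the `TOOLS` registry, and the `Worker` class are all hypothetical names invented for illustration.

```python
import json
from itertools import cycle

# HYPOTHETICAL tool registry; names and shapes are illustrative only.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}


class Worker:
    """One server process. It keeps no per-client or per-session data."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, raw: str) -> str:
        # Everything needed to answer is carried in the request itself.
        request = json.loads(raw)
        tool = TOOLS.get(request.get("tool"))
        if tool is None:
            return json.dumps({"id": request.get("id"), "error": "unknown tool"})
        try:
            result = tool(request.get("arguments", {}))
        except (KeyError, TypeError, AttributeError) as exc:
            return json.dumps({"id": request.get("id"),
                               "error": f"bad arguments: {exc!r}"})
        return json.dumps({"id": request.get("id"), "result": result})


incoming = [
    json.dumps({"id": 1, "tool": "add", "arguments": {"a": 2, "b": 3}}),
    json.dumps({"id": 2, "tool": "upper", "arguments": {"text": "mcp"}}),
    json.dumps({"id": 3, "tool": "missing", "arguments": {}}),
]

# Round-robin across two workers, as a load balancer might.
pool = cycle([Worker("a"), Worker("b")])
responses = [next(pool).handle(raw) for raw in incoming]
for line in responses:
    print(line)
```

Because `handle` reads nothing except its argument and the shared tool registry, it does not matter which worker receives a given request. A stateful design would need sticky routing or a shared session store to get the same guarantee.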

The release brings more than the stateless core. An Extensions framework lets capabilities ship on their own timeline instead of waiting for a full spec release. Two extensions arrive with it. MCP Apps allow server-rendered user interfaces, so a tool can return a real UI rather than plain text. The Tasks extension handles long-running work, the kind of job that takes minutes or hours instead of returning instantly. Authorization now aligns more closely with OAuth and OpenID Connect, which matters for enterprise deployments. A formal deprecation policy means the protocol can change without breaking what teams already built.

The timing of the NSA guidance is no accident. On May 20, the NSA's Artificial Intelligence Security Center published a Cybersecurity Information Sheet titled "Model Context Protocol: Security Design Considerations for AI-Driven Automation." The document runs 17 pages, carries identifier U/OO/6030316-26, and is the most careful security treatment of MCP to date.

The core warning is structural. MCP flips the usual security model. In a normal API, clients query servers for data. With MCP, servers query data and execute actions on behalf of clients. That inversion means the mental model most engineers use to reason about API security points the wrong way. Access control is optional at the protocol level, which is exactly the gap the NSA flags.

The guidance names specific risks. Serialization flaws can let bad input trigger unstable behavior. Trust boundary failures let one component over-reach into another. Unverified task propagation lets tasks pass between MCP servers without checking their origin, scope, or intent, which can leak sensitive context or fire unrelated tools. Session weaknesses can allow message replay or session hijacking. The NSA's central point is that these problems cannot be patched at isolated endpoints. They have to be addressed across the whole MCP environment.

The practical advice is concrete. Validate every tool invocation against defined schemas, expected ranges, and the intended execution context. Log all tool and model invocations with their exact parameters and the identities involved. Use a filtering outgoing proxy or enterprise data-loss prevention for external MCP connections, with resource URLs pinned tightly enough to limit leakage. Prefer a local MCP server instance when processing private data. Align tools and models with data classification zones, so public tools handle public data and sensitive tools stay segregated and explicitly controlled.
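The first two recommendations, schema validation and full invocation logging, can be sketched with nothing but the standard library. This is one possible shape under stated assumptions, not the NSA's reference design and not part of any MCP SDK. The `query_orders` tool, its schema, and the `invoke` gate are hypothetical.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("tool_audit")

# HYPOTHETICAL schema: allowed parameters, their types, and expected ranges.
SCHEMAS = {
    "query_orders": {
        "customer_id": {"type": str},
        "limit": {"type": int, "min": 1, "max": 100},
    }
}


def validate_invocation(tool: str, params: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is acceptable."""
    schema = SCHEMAS.get(tool)
    if schema is None:
        return [f"unknown tool: {tool}"]
    errors = [f"unexpected parameter: {name}" for name in params if name not in schema]
    for name, rule in schema.items():
        if name not in params:
            errors.append(f"missing parameter: {name}")
            continue
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule["type"]):
            errors.append(f"wrong type for {name}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name} below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name} above maximum {rule['max']}")
    return errors


def invoke(tool: str, params: dict, identity: str) -> bool:
    """Validate, then log the exact parameters and caller, allowed or not."""
    errors = validate_invocation(tool, params)
    audit_log.info(json.dumps({
        "tool": tool, "params": params, "identity": identity,
        "allowed": not errors, "errors": errors,
    }, default=str))
    return not errors


ok = invoke("query_orders", {"customer_id": "c-42", "limit": 20}, "agent:billing")
too_many = invoke("query_orders", {"customer_id": "c-42", "limit": 5000}, "agent:billing")
unknown = invoke("drop_tables", {}, "agent:unknown")
```

Rejected calls are logged as well as accepted ones, since a blocked attempt is often the more interesting record during an investigation. Logging exact parameters means the audit log itself can hold sensitive data, so it needs the same classification and access controls as the tools it records.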

Read the two documents together and a picture forms. MCP adoption has outrun its governance. The protocol is now embedded in production workflows across finance, legal, and software, which means the NSA is describing live exposure in regulated industries, not a hypothetical. MCP stacks built in 2024 and early 2025 likely lack the authentication and privilege isolation now considered baseline. The 2026-07-28 spec hardens authorization and brings the stateless core, and the NSA guidance gives teams a checklist while they wait for it. If you ship anything touching MCP, both belong on your reading list this week.

If you run agents in production, this week gives you a short to-do list. Audit your MCP servers for the access controls the NSA flags, since the protocol will not enforce them for you. Log every tool call with its parameters and the identity behind it, because you cannot investigate what you did not record. Plan for the stateless migration ahead of the July 28 final spec, especially if your current setup leans on session IDs or resumable streams. And test at least one of the cheaper coding models against your real workload before your next billing cycle, since the cost gap is now large enough to matter at team scale.

The thread tying all three categories together is maturation. Coding models are competing on cost because capability has spread. Chips are specializing because workloads have settled into clear shapes. And the protocol layer is being rewritten for scale and locked down for security because it is running real production systems now. The experimental phase is closing. The infrastructure phase is here.

Resources to Go Further

The AI landscape changes fast. Here are tools and resources to help you keep pace.

Try Dremio Free: Experience agentic analytics and an Apache Iceberg-powered lakehouse. Start your free trial

Learn Agentic AI with Data: Dremio's agentic analytics features let your AI agents query and act on live data. Explore Dremio Agentic AI

Join the Community: Connect with data engineers and AI practitioners building on open standards. Join the Dremio Developer Community

Book: The 2026 Guide to AI-Assisted Development: Covers prompt engineering, agent workflows, MCP, evaluation, security, and career paths. Get it on Amazon

Book: Using AI Agents for Data Engineering and Data Analysis: A practical guide to Claude Code, Google Antigravity, OpenAI Codex, and more. Get it on Amazon
