Alex Merced

Posted on Jun 4

AI Weekly: New PC Chips, Credit Pricing, Stateless MCP

#agents #ai #mcp #news

Week of May 28 to June 4, 2026

This week the AI stack moved on three fronts at once. Coding tools reset their pricing and shipped new models, NVIDIA pushed its silicon into the Windows PC, and the Model Context Protocol started its run toward a stateless core. Here is what changed and why it matters for the people who build with these tools every day.

The common thread is maturity. None of this week's news was a flashy demo. It was the work of making agents cheap to run, fast to serve, and safe to scale. Pricing models, silicon, and protocols are the plumbing of the agent era, and the plumbing got most of the attention this week. That is a sign the technology is moving from the lab into the budget line. The teams that win the next year will not be the ones with the flashiest model. They will be the ones who set sane cost caps, pick durable standards, and keep a human in the loop while everyone else chases the leaderboard.

AI Coding Tools: A Pricing and Model Reset

The coding tool market spent the week sorting out two questions at once: who has the best model, and who has pricing that developers will accept. Both answers shifted.

Anthropic released Claude Opus 4.8 on May 28, 2026, and paired it with a feature called Dynamic Workflows for Claude Code, according to a June 3 roundup of coding agents. Opus 4.8 is the newest model in the Claude line. Dynamic Workflows lets Claude Code plan and run multi-step tasks with less hand-holding from the developer. The terminal agent reads the repo, drafts a plan, and works through it step by step. That design suits long tasks like a refactor across many files or a migration that touches a dozen modules.

The model release matters because of where Claude Code now sits in developer preference. A JetBrains survey of engineers with more than ten years of experience found that 46% picked Claude Code as their daily tool, against 9% for GitHub Copilot, as reported in a June 2 market analysis. That same analysis traced Cursor's annual recurring revenue from 100 million dollars in January 2025 to 1 billion in the second half of 2025. The tools that started agentic, rather than bolting an agent onto an editor, are growing fastest.

The pricing story was louder. GitHub Copilot moved to usage-based billing on June 1, 2026, a change tracked across several 2026 buyer guides. The old flat plans gave way to AI Credits. Copilot Pro at 10 dollars a month now buys a 1,500-credit monthly allowance instead of unlimited agent use, and heavy agent sessions burn through that pool fast. GitHub also added a 100 dollar Max plan for the developers who run agents all day. The shift drew a public backlash from users who liked the old flat rate.

The backlash points at a real problem with agentic coding. Cost is hard to predict. A single long agent run can spend a large share of a monthly budget in one sitting. A Q1 2026 survey cited in one buyer guide found that 42% of developers ranked cost volatility as their top pain point, ahead of model reliability. Teams that adopt agents without a usage cap can get a surprise bill. The fix is simple to state and easy to skip. Set a team-level cap before you turn agents loose.

Cursor answered the model question with its own engine. The company shipped Composer 2.5 in May 2026, an in-house long-horizon model that a Lushbinary roundup says matches Opus 4.7 and GPT-5.5 on benchmarks at 0.50 dollars per million input tokens and 2.50 dollars per million output tokens. Cursor also added Build in Parallel, which runs several agent tasks at once, and a built-in pull request review step. Owning the model gives Cursor control over both price and latency, and it lets the company tune the model for the editing tasks its users run most.

The rest of the field kept moving too. Windsurf rebranded as Devin Desktop, folding the editor into Cognition's agent brand. Google shipped Antigravity 2.0 with Gemini 3.5 Flash as a fast default model. Kiro switched to a credit model, with Kiro Pro at 20 dollars a month for 1,000 credits, which lines up with Cursor Pro on price. The pattern across all of them is the same. Flat unlimited pricing is fading, and credits or usage tiers are taking its place.

Security tooling caught up to this shift. Salt Security launched Salt Code on June 2, 2026, per the company's announcement. Salt Code enforces security policy inside the coding assistant itself. It works across Claude, Cursor, GitHub Copilot, Windsurf, Codex, and Gemini CLI, and it aims to make those tools generate policy-compliant code by default, from the first prompt to production. The pitch lands because of a number that keeps showing up in 2026 surveys. Roughly 48% of AI-generated code carries a security flaw, and about 75% of senior developers still review every snippet before merging. Agents shift where engineers spend time. They do not remove the need to spend it.

There is a practical lesson in this week's coding news. Pick the tool that fits the work, not the leaderboard. A terminal agent like Claude Code suits a senior engineer running a big refactor. An inline tool like Copilot suits a junior developer who wants visual diffs and quick completions. Match the format to the task, set a cost cap, and keep a human in the review loop. Those three habits hold up no matter which logo wins the quarter.

The conference calendar reinforced the direction. Anthropic opened its Code with Claude event series in San Francisco on May 6, 2026, followed by London on May 19, with a Tokyo stop scheduled for June 10, as noted in the June 2 market analysis. The company also pushed Cowork, described as Claude Code for general computing. Cowork extends the agent model past code into spreadsheets, file work, and report drafting for people who do not write software. The business is shifting from a license for an assistant to the sale of work that gets done.

The Competitive Picture

The model race and the pricing race are reshaping who leads. Cursor's revenue tells the clearest story. The company reached 100 million dollars in annual recurring revenue in January 2025, 500 million by mid-year, and 1 billion by the second half of 2025, with revenue doubling about every two months through early 2026. That growth rate is rare even for fast business software, and it came from a tool that started agentic rather than adding an agent later.

Claude Code grew on a different axis. The JetBrains survey that put Claude Code at 46% among senior engineers, against 9% for Copilot, reflects a network effect more than a marketing push. Anthropic's enterprise guide for scaling Claude Code became one of the most-read documents in the field, and the skills ecosystem around the tool gave teams a shared library of workflows. Each new skill makes the tool more useful, which draws more users, which produces more skills.

Microsoft has not stood still. Through 2025 it shipped Copilot's agent mode, added Bring Your Own Key to plug in third-party models, and opened VS Code Insiders to Anthropic's protocols. The work reads as extension rather than redesign. Copilot stays the safe enterprise choice, with deep VS Code integration and reliable suggestions, but it rarely surprises a senior engineer with an idea they had not considered. That reliability suits large teams that value predictability over reach.

The June 1 billing change put that position under more pressure. Moving longtime Pro users to credits, then adding a 100 dollar Max plan, asked the most loyal users to pay more or ration their agent runs. Some accepted it. Others looked at Cursor and Claude Code, where the agent model came first. Microsoft Build 2026 in San Francisco arrives within weeks and is expected to reset the board again, so this snapshot will age fast.

The lesson for buyers is not to chase the leader. The leaderboard turns over every quarter. The durable move is to build on the standards all these tools share, MCP and A2A, so a switch between assistants costs a configuration change rather than a rewrite. Tools come and go. The integration layer is what you keep.

How the Coding Tools Now Differ

The category split into four shapes in 2026, and the shape matters more than the brand. The first shape is the IDE plugin. GitHub Copilot lives inside VS Code and JetBrains as an extension, offers inline completions and an agent mode, and fits teams that want AI without changing their editor. The second shape is the VS Code fork. Cursor replaces the editor and makes AI a first-class part of the layout, and its Composer feature proposes edits across many files in one pass. The third shape is the terminal agent. Claude Code runs in the shell and treats the codebase as a conversation, not a stream of completions. The fourth shape is the bring-your-own-key open tool, like Cline and Cody, which lets a team plug in its own model and keys.

These shapes handle different work. Inline completion saves keystrokes on small, local edits. A multi-file composer handles a feature that touches several files at once. A terminal agent handles a long task with many steps, like a framework upgrade or a test-suite rewrite. A team that standardizes on one shape for every task pays for it in mismatched fit.

The credit math is the part teams keep getting wrong. Copilot Pro at 10 dollars now gives 1,500 credits a month. Inline completions stay light on credits. Agent runs do not. A single long agent session that reads a large repo, drafts a plan, and edits a dozen files can spend hundreds of credits in one sitting. Run three of those in a day, and the monthly pool can run dry within the first week. The 100 dollar Max plan exists for exactly this reason, and so does the new pressure to cap usage per developer and per team.
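The credit math is easy to check with a few lines. This is a rough sketch, not GitHub's billing logic: the 1,500-credit pool is the Copilot Pro figure quoted above, while the per-session costs (200 and 50 credits) and the helper name `days_until_empty` are assumptions made for illustration.

```python
import math

def days_until_empty(monthly_credits: int, credits_per_session: int,
                     sessions_per_day: int) -> int:
    """Return the working day on which the monthly credit pool runs out."""
    daily_burn = credits_per_session * sessions_per_day
    if daily_burn <= 0:
        raise ValueError("daily burn must be positive")
    return math.ceil(monthly_credits / daily_burn)

# Assumed: a heavy agent session costs 200 credits (hypothetical figure).
print(days_until_empty(1500, 200, 3))  # → 3

# Assumed: a lighter session costs 50 credits (hypothetical figure).
print(days_until_empty(1500, 50, 3))   # → 10
```

The answer swings from three working days to ten on the session cost alone, which is why a measured per-session figure matters more than the headline allowance.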

Cursor's move to its own Composer 2.5 model is a hedge against that volatility. When a tool rents its model from a frontier lab, its margin and its latency follow that lab's pricing. When it owns the model, it sets both. Composer 2.5 runs at 0.50 dollars per million input tokens and 2.50 dollars per million output tokens, which undercuts frontier pricing for the editing tasks Cursor users run most. Build in Parallel then spends that cheaper inference on several agent tasks at once, which turns a price advantage into a speed advantage.
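As a rough illustration of what those rates mean per task, the sketch below prices one agent run. The token counts and the function name `task_cost_usd` are hypothetical; only the two per-million-token rates come from the figures above.

```python
def task_cost_usd(input_tokens: int, output_tokens: int,
                  input_price_per_m: float = 0.50,
                  output_price_per_m: float = 2.50) -> float:
    """Cost of one run, given token counts and prices per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# Assumed workload: the agent reads 400k tokens of repo context
# and writes 40k tokens of edits (hypothetical numbers).
cost = task_cost_usd(400_000, 40_000)
print(f"{cost:.2f} dollars per run")               # → 0.30 dollars per run
print(f"{cost * 4:.2f} dollars for four parallel runs")  # → 1.20 dollars for four parallel runs
```

Input tokens dominate the volume in this example, but output tokens cost five times as much each, so a task that writes a lot of code shifts the bill toward the output side.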

The review burden is the quiet cost in all of this. One 2026 measurement found senior engineers now spend about 11.4 hours a week reviewing code against 9.8 hours writing it. AI shifted the bottleneck from writing to reading. That is why the security layer grew this week. Salt Code sits in the assistant and blocks policy violations before they reach a pull request, which moves some of the review load earlier, where it is cheaper to fix.

Cowork points at where this goes next. Anthropic built it as Claude Code for general computing, aimed at people who do not write software, and it runs agent work across spreadsheets, files, and reports. The same agent pattern that refactors code can reconcile a budget or draft a status report. For data teams, that matters because the analyst down the hall now has an agent that can touch the same files and systems the engineers use. Governance has to cover both.

AI Processing: NVIDIA Pushes Into the PC

The biggest hardware story of the week did not come from a data center. It came from a laptop chip. NVIDIA CEO Jensen Huang presented the RTX Spark Superchip at Computex 2026 in Taipei on June 1, as covered by CNBC. The chip is a system-on-chip for Windows PCs, and Huang said NVIDIA and Microsoft plan to reinvent the PC.

The market read the move as a threat. Shares of AMD, Intel, and Qualcomm fell after the announcement. NVIDIA has owned the data center for years. Now it is going after the PC, and Wall Street took notice. The RTX Spark targets the edge, where phones and laptops run AI models on the device instead of calling the cloud for every request.

The chip also marks a bet on Arm. Most PC CPUs have run on the x86 instruction set that Intel introduced in the late 1970s and AMD later extended. Arm's lower-power design went mainstream when Apple put it in the first iPhone in 2007, and Amazon brought it to the data center with Graviton in 2018. NVIDIA building an Arm-based SoC for Windows machines extends that arc into the consumer PC. Low power draw, not raw clock speed, is the selling point for on-device AI.

On-device inference solves real problems. It cuts latency because the model runs next to the user. It improves privacy because data does not leave the device. It lowers cost because each query does not hit a cloud GPU. For developers, an Arm laptop with a strong NPU changes what a local model can do. A coding agent that runs partly on the device responds faster and keeps source code off third-party servers.

The PC push sits inside a larger contest over inference. AI moved past the training phase, where NVIDIA has dominated, into deployment, where the field is wider. AMD submitted its MI355X system to the MLPerf Inference v6.0 benchmark on April 1, 2026, showing the reach of its ROCm software stack, per AMD's ROCm blog. Google's TPUs, AWS Inferentia, and Intel Gaudi all compete for inference workloads. Each wins on a different axis, whether that is latency, throughput, memory, or cost.

NVIDIA's roadmap stretches past the PC. At its GTC conference earlier in 2026, the company set plans to ship its next-generation Vera Rubin system in the second half of the year, with a successor named Feynman after the physicist. NVIDIA also bought assets from inference startup Groq in a 17 billion dollar deal and signaled an inference-focused chip strategy. The RTX Spark is the consumer face of a plan that spans the whole stack, from the phone to the rack.

For teams choosing inference hardware, the guidance has not changed. Define what speed means for your workload first. Time-to-first-token matters for chat. Tokens per second matters for batch serving. A chip that wins on throughput can still feel slow if its tail latency spikes under load. Benchmark two candidates with your real prompt lengths and output sizes before you commit. Hardware comparisons that ignore prompt length and tail latency mislead more than they help.

The edge trend has a second-order effect worth watching. As more inference moves to the device, the cloud bill shifts from per-query GPU time toward model distribution and sync. Smaller models that fit on a laptop NPU get more attention. Distillation, quantization, and mixture-of-experts routing all help a model run on less silicon. The RTX Spark gives those techniques a large new install base to target.

Picking Inference Hardware in 2026

The RTX Spark sharpened a choice that data and ML teams face all year: which inference hardware fits the workload. The field now has five families in regular production use. NVIDIA GPUs like the H200 and the Blackwell line lead on raw performance and on software maturity through CUDA. AMD's Instinct line, including the MI300X and the newer MI355X, wins on memory capacity, which cuts the need to shard a large model across many chips. Google Cloud TPUs, including the Trillium generation, fit teams already inside Google Cloud. AWS Inferentia fits teams on AWS. Intel Gaudi 3 and the Xeon 6 line make a case for CPU-plus-accelerator inference at the edge.

Speed has more than one meaning, and that trips up buyers. Time-to-first-token measures how fast a chat reply starts. Tokens per second measures how much a system serves at scale. Tail latency measures the slowest responses under load, usually reported as a high percentile such as p99. A chip that posts a great tokens-per-second number can still feel slow in a product if its first token lags or its tail latency spikes when traffic climbs. The MLPerf Inference suite tracks these across vendors, and the v6.0 round on April 1, 2026, drew submissions from NVIDIA, AMD, Google, and several startups.
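All three measures can be computed from the same request log. The sketch below assumes a hypothetical log format (`ttft_s`, `total_s`, and `output_tokens` per request) and uses a nearest-rank percentile. It is one reasonable way to summarize a benchmark run of your own, not the MLPerf methodology.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(requests, wall_clock_s):
    """Summarize one benchmark run that took wall_clock_s seconds end to end."""
    return {
        "ttft_p50_s": percentile([r["ttft_s"] for r in requests], 50),
        "latency_p99_s": percentile([r["total_s"] for r in requests], 99),
        "tokens_per_s": sum(r["output_tokens"] for r in requests) / wall_clock_s,
    }

# Hypothetical samples: one request stalls badly under load.
run = [
    {"ttft_s": 0.20, "total_s": 2.0, "output_tokens": 200},
    {"ttft_s": 0.30, "total_s": 2.5, "output_tokens": 250},
    {"ttft_s": 0.25, "total_s": 9.0, "output_tokens": 150},
    {"ttft_s": 0.40, "total_s": 3.0, "output_tokens": 400},
]
print(summarize(run, wall_clock_s=10.0))
# → {'ttft_p50_s': 0.25, 'latency_p99_s': 9.0, 'tokens_per_s': 100.0}
```

The throughput and median first-token numbers look healthy here, while the p99 exposes the nine-second straggler. That gap is the reason to report all three together.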

The memory angle decides many real cases. A model that fits in one chip's memory runs faster than the same model split across four, because sharding adds communication overhead. AMD's large-memory parts win here. A 70-billion-parameter model that needs sharding on a smaller GPU can run on a single large-memory accelerator, which removes the cross-chip traffic. For teams that serve big models, memory capacity often beats peak throughput on the spec sheet.
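A back-of-the-envelope version of that estimate, counting weights only, looks like this. The two chip capacities (80 GB and 192 GB), the 10% headroom, and the function names are assumptions for illustration. A real deployment also needs room for the KV cache and activations, so treat the result as a lower bound.

```python
import math

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

def chips_needed(model_gb: float, chip_memory_gb: float,
                 usable_fraction: float = 0.9) -> int:
    """Minimum chip count if only usable_fraction of each chip holds weights."""
    return math.ceil(model_gb / (chip_memory_gb * usable_fraction))

# A 70-billion-parameter model stored at 16 bits (2 bytes) per parameter.
model_gb = weight_memory_gb(70, 2)
print(model_gb)                      # → 140
print(chips_needed(model_gb, 80))    # → 2  (must be sharded)
print(chips_needed(model_gb, 192))   # → 1  (fits on one accelerator)
```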

The RTX Spark changes the math at the small end. On-device inference moves the cheapest, most private workloads off the cloud entirely. A laptop with a strong NPU runs a small model for code completion, document search, or local chat without a network round trip. That pushes interest toward models built to run small. Quantization shrinks a model's weights to fewer bits. Distillation trains a small model to mimic a large one. Mixture-of-experts routing turns on only part of a model per query. All three let a capable model fit on consumer silicon, and the RTX Spark gives those techniques a large install base.
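To make the first of those techniques concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. Production toolchains use per-channel scales, calibration data, and often lower bit widths. This only shows the core idea of trading precision for size, and the function names are illustrative.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)  # small integers, one byte each instead of four bytes per float32
print(max(abs(a - b) for a, b in zip(weights, restored)))  # rounding error
```

Each weight now takes one byte instead of four, and the rounding error per weight stays within half of the scale factor.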

The market reaction shows the stakes. NVIDIA moving into the PC sent AMD, Intel, and Qualcomm shares down on June 1, because each of those companies counts on the PC chip market that NVIDIA just entered. The Arm angle adds to the pressure. NVIDIA, Apple, Amazon, and Qualcomm all build on Arm's power-saving design, while Intel and AMD built their businesses on x86. A strong Arm PC chip from NVIDIA pulls more of the market toward Arm, which is a slow but real shift in who sets the platform.

The buying advice stays steady through all of it. Estimate the memory your model needs first. Benchmark two candidate platforms with your real prompt and output lengths, not a vendor's demo numbers. Watch tail latency under the load you expect, not the average. Then pick the platform you can scale and staff, because a chip you cannot get capacity for is not fast at any spec.

AI Standards and Protocols: MCP Goes Stateless

The Model Context Protocol reached its biggest turning point since launch. The MCP maintainers published the 2026-07-28 specification release candidate on May 21, 2026, and the ten-week validation window for that candidate continued through this week, per the Model Context Protocol blog. The final specification is scheduled to ship on July 28, 2026. This is the largest revision of the protocol since it appeared in late 2024.

The headline change is that MCP is now stateless at the protocol layer. Six Specification Enhancement Proposals work together to get there. The practical effect is large. A remote MCP server that used to need sticky sessions, a shared session store, and deep packet inspection at the gateway can now run behind a plain round-robin load balancer. Requests carry an Mcp-Method header that a gateway can route on without parsing the body, and clients cache the list of tools for as long as the server allows. That makes MCP servers far easier to scale on ordinary HTTP infrastructure.

The release candidate carries more than the stateless core. It adds an Extensions framework, a Tasks extension for long-running work, and MCP Apps for server-rendered user interfaces. It hardens authorization to line up with OAuth and OpenID Connect deployments. It also adds a formal deprecation policy so the protocol can change without breaking the integrations teams already shipped. The ten-week window gives SDK maintainers and client builders time to test the changes against real systems before the spec locks.

A short bit of history explains why stateless matters. The June 2025 MCP update classified MCP servers as OAuth Resource Servers and required clients to implement RFC 8707 Resource Indicators to block token misuse. Those security gains left the scaling problem untouched. Stateful sessions still tied each client to a specific server instance, which made horizontal scaling hard. The new spec removes that binding at the protocol layer, so a server restart or a scale-out event stays invisible to connected clients. That is the missing piece for running MCP at enterprise scale.

MCP's reach is wide enough to make this revision a big deal. As of March 2026, MCP passed 97 million monthly SDK downloads and 81,000 GitHub stars, and every major AI vendor supports it, including Anthropic, OpenAI, Google, Microsoft, and AWS. Official SDKs exist for TypeScript, Python, C#, Java, and Swift. The community has published hundreds of public MCP servers. People call it the USB-C of AI for a reason. It gives any model one way to talk to any tool.

Governance moved in step with the spec. In December 2025, Anthropic donated MCP to the Agentic AI Foundation, a directed fund inside the Linux Foundation co-founded by Anthropic, Block, and OpenAI. The maintainer team also grew. Clare Liguori joined the core maintainer group, and Den Delimarsky joined as a lead maintainer. Vendor-neutral governance gives enterprise buyers the confidence to build on a standard that no single company controls.

MCP does not stand alone. The Agent2Agent protocol covers the other half of the agent stack. A2A marked its one-year milestone with more than 150 supporting organizations and deep integration across Google, Microsoft, and AWS platforms, per the Linux Foundation. The two standards split the work cleanly. MCP defines how an agent connects to tools and data. A2A defines how agents talk to each other across vendor and org boundaries. Both live under the Linux Foundation, which keeps them complementary rather than competing.

For builders, the stateless shift changes deployment math today. A team that wanted to run an MCP server for its data platform used to plan around session affinity. Now it plans around plain HTTP scaling. That lowers the operational cost of exposing a tool to agents. Expect more SaaS platforms and data systems to ship MCP servers once the final spec lands, because the cost of running one just dropped.

What Stateless MCP Changes at the Wire

The stateless core sounds abstract until you run a server. Before, an MCP client opened a session and stayed bound to one server instance for the life of that session. The server held state in memory, so a load balancer had to send every request from that client back to the same box. That meant sticky sessions, a shared session store for failover, and a gateway smart enough to inspect traffic. All of that adds cost and risk.

The new spec removes the binding. Each request carries an Mcp-Method header and depends on no earlier session, so a plain round-robin load balancer can send any request to any server instance. The server tells the client how long it can cache the list of tools through a time-to-live value, so the client does not re-fetch the tool list on every call. A server restart no longer drops the client, because there is no session to lose. This is the change that lets a company run an MCP server the same way it runs any other stateless web service.
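A minimal sketch of both ideas, round-robin dispatch to interchangeable instances and a client-side tool list cache with a time-to-live, might look like the following. The `tools/list` method name is part of MCP. The `ttl_s` field, the `handle` function, and the `ToolListCache` class are illustrative assumptions, not the wire format of the 2026-07-28 spec.

```python
import itertools
import time

def handle(instance_name, method):
    """Stateless handler: the reply depends only on the request itself."""
    if method == "tools/list":
        # "ttl_s" is a hypothetical field name for the cache lifetime.
        return {"served_by": instance_name, "tools": ["run_query"], "ttl_s": 60}
    raise ValueError(f"unsupported method: {method}")

class ToolListCache:
    """Client-side cache that re-fetches the tool list only after the TTL expires."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch        # callable returning a reply dict
        self._clock = clock
        self._tools = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._tools is None or now >= self._expires_at:
            reply = self._fetch()
            self._tools = reply["tools"]
            self._expires_at = now + reply["ttl_s"]
        return self._tools

# Round-robin: any instance can serve any request because no session is pinned.
instances = itertools.cycle(["server-a", "server-b", "server-c"])
cache = ToolListCache(lambda: handle(next(instances), "tools/list"))

print(cache.get())  # first call goes to the next server in the rotation
print(cache.get())  # second call is answered from the cache, no request sent
```

Because `handle` keeps nothing between calls, it does not matter which instance answers after the cache expires. That property is what removes the need for sticky sessions.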

The Extensions framework is the second big piece. It lets the protocol grow without bloating the core. The Tasks extension handles long-running work, so an agent starts a job, walks away, and checks back, instead of holding a connection open for minutes. MCP Apps lets a server render a user interface, so a tool shows a form or a chart inside the client rather than returning raw text. These extensions ship as optional add-ons, which keeps the base protocol small while giving advanced servers room to do more.
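The start-and-check-back pattern behind long-running work looks roughly like this. The `TaskStore` class, its method names, and the status strings are hypothetical stand-ins, not the Tasks extension's actual API. In a real deployment the store would live outside any single server instance so that any instance can answer a poll.

```python
import uuid

class TaskStore:
    """In-memory stand-in for a shared store of long-running jobs."""

    def __init__(self):
        self._tasks = {}

    def start(self, work):
        """Register a job and return an ID immediately, without running it."""
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = {"status": "working", "work": work, "result": None}
        return task_id

    def run_pending(self):
        """Stand-in for a background worker that finishes queued jobs."""
        for task in self._tasks.values():
            if task["status"] == "working":
                task["result"] = task["work"]()
                task["status"] = "completed"

    def poll(self, task_id):
        """Report the current status without blocking on the job."""
        task = self._tasks[task_id]
        return {"status": task["status"], "result": task["result"]}

store = TaskStore()
task_id = store.start(lambda: sum(range(1_000_000)))
print(store.poll(task_id)["status"])  # the job is registered but not finished
store.run_pending()
print(store.poll(task_id))            # status is now completed, result attached
```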

The deprecation policy is the piece enterprises asked for. A formal policy means the protocol retires an old behavior on a known schedule, with warning, instead of breaking integrations without notice. That predictability is what lets a large company commit to building on MCP. It turns the protocol from a fast-moving open source project into something a platform team plans around for years.

The split between MCP and A2A is worth keeping straight. MCP connects one agent to tools and data. A2A connects agents to each other across vendors and orgs. A concrete workflow shows the split. A planning agent uses A2A to hand a subtask to a specialist agent at another company. That specialist agent uses MCP to query a database and run a tool. The first protocol carries the handoff. The second carries the data access. Both live under the Linux Foundation, which keeps them aligned rather than competing.

For data platforms, the timing is good. A team that ships an MCP server for its catalog or query engine after July 28 gets the stateless core from day one. That server scales on ordinary infrastructure, vends governed access, and shows up to every MCP-aware agent without custom glue. The cost of making a data system agent-ready dropped this week, and the standard to target is now clear.

Agentic AI Moves From Demo to Default

A theme ran under all three sections this week. Agents are no longer a demo. They are the default unit of work. The pricing changes, the on-device chip, and the stateless protocol all serve the same goal, which is running more agent work, more reliably, at lower cost.

The numbers back the shift. The AI coding tools market sits near 12.8 billion dollars in 2026, up from 5.1 billion in 2024, and about 90% of professional developers use a coding tool daily, per a 2026 market review. The 2025 Stack Overflow Developer Survey found that 84% of developers use or plan to use AI tools, and 51% of professionals use them daily. Adoption is not the question anymore. Control is.

Open source maintainers feel the change in a new way. A DataFusion blog post by Tim Saucer on May 28, 2026, made the point that a growing share of a library's users are not typing code at all. They ask an agent to write it. The agent leans on whatever style it picked up during training, which rarely matches what a project wants. That pushes maintainers to publish machine-readable guidance, like skills files and idiomatic examples, so agents generate code the project will accept. The audience for documentation now includes the model, not just the human.

Security stays the gap between demo and default. The 48% flaw rate in AI-generated code is the reason tools like Salt Code exist. The right pattern pairs an agent with a policy gate and a human reviewer. The agent drafts fast. The gate blocks the obvious mistakes. The human signs off on the merge. Teams that skip the middle two steps trade speed today for incidents later.
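The three-step pattern can be sketched in a few lines. This is a toy gate with two regex rules chosen for illustration, not how Salt Code or any real policy engine works. The rule set and the function names are assumptions.

```python
import re

# Hypothetical policy rules: each maps a label to a pattern that flags a violation.
POLICY_RULES = {
    "hardcoded secret": re.compile(r"(password|api_key|secret)\s*=\s*['\"]", re.I),
    "use of eval": re.compile(r"\beval\s*\("),
}

def policy_gate(code):
    """Return the labels of every rule the draft violates."""
    return [label for label, pattern in POLICY_RULES.items() if pattern.search(code)]

def can_merge(code, human_approved):
    """A draft merges only if the gate is clean AND a human signed off."""
    violations = policy_gate(code)
    return (not violations) and human_approved, violations

draft = 'api_key = "abc123"\nresult = eval(user_input)'
print(can_merge(draft, human_approved=True))
# → (False, ['hardcoded secret', 'use of eval'])

print(can_merge("total = price * quantity", human_approved=True))
# → (True, [])
```

The gate and the reviewer are both required. A clean scan without sign-off does not merge, and neither does an approved draft that trips a rule.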

Enterprise buyers should treat 2026 as the year to set standards, not just pick tools. Decide which protocols you will support, which is now mostly MCP and A2A. Decide how you will cap agent cost. Decide who reviews agent output before it ships. Those three decisions matter more than the brand of any single assistant, because the assistants will keep changing every quarter while the standards settle.

Documentation Now Has Two Audiences

The open source world is adjusting to a reader it did not design for. When a developer asks an agent to use a library, the agent writes the code, not the human. The agent draws on patterns it learned during training, and those patterns lag behind a project's current style. A library that shipped a cleaner API last quarter still gets old-style code from agents that learned the old way.

Maintainers are responding with machine-readable guidance. Skills files, idiomatic examples, and clear API docs now serve the model as much as the person. A project that publishes a good skill teaches every agent that reads it how to write code the project will accept. That lowers the review burden on maintainers and cuts the number of pull requests that miss the house style.

This changes how teams should write internal docs too. An engineering org that wants its agents to follow internal conventions has to write those conventions down in a form an agent reads. A style guide that lives in someone's head does not reach the agent. A style guide checked into the repo, with examples, does. The cost of undocumented conventions just went up, because now both the new hire and the agent pay it.

What This Means for Data Teams

Data teams sit at the center of the agent shift, because agents are only as good as the data they can reach. An agent that writes SQL is useful. An agent that queries live, governed data and acts on the result is far more useful. That is the work that turns an agent from a chat box into a system of record.

Three of this week's threads land directly on data work. MCP going stateless lowers the cost of exposing a data platform to agents through a standard interface. The on-device chip trend means analysts will run small models next to their notebooks for fast, private exploration. The pricing reset means data teams need a cost model for agent queries the same way they have one for warehouse compute.

The hard part is governance. An agent that can read and act on data needs the same controls a human user gets. That means catalog-level permissions, audit logs, and a clear record of which agent ran which query. The lakehouse pattern fits this well, because an open catalog can vend credentials and enforce access in one place. As agents become the default query writer, the catalog becomes the control point for the whole system.

A concrete example shows the pattern. Suppose an analyst asks an agent to find which regions missed their sales target last quarter and draft a summary. The agent connects to the data platform through an MCP server. The catalog checks the agent's permissions and vends a short-lived credential scoped to the tables the analyst can see. The agent runs the query, reads the result, and writes the summary. Every step lands in an audit log, tagged with the agent's identity and the analyst who triggered it. If the analyst lacks access to a table, the agent gets the same denial a person gets.
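The permission check, scoped credential, and audit entry in that flow can be sketched as follows. Everything here is hypothetical: the `GRANTS` table, the table names, the credential shape, and the five-minute lifetime stand in for whatever a real catalog provides.

```python
import time

# Hypothetical grants: which tables each human user may read.
GRANTS = {"ana": {"sales.targets", "sales.actuals"}}
AUDIT_LOG = []

def vend_credential(agent_id, user_id, tables, ttl_s=300, now=None):
    """Return a short-lived credential scoped to tables, or raise PermissionError."""
    now = time.time() if now is None else now
    requested = set(tables)
    denied = sorted(requested - GRANTS.get(user_id, set()))
    # Log every decision, tagged with both the agent and the triggering user.
    AUDIT_LOG.append({
        "agent": agent_id, "user": user_id, "tables": sorted(requested),
        "decision": "deny" if denied else "allow", "at": now,
    })
    if denied:
        raise PermissionError(f"{user_id} lacks access to: {', '.join(denied)}")
    return {"agent": agent_id, "user": user_id,
            "tables": frozenset(requested), "expires_at": now + ttl_s}

cred = vend_credential("summary-agent", "ana", ["sales.targets", "sales.actuals"])
print(sorted(cred["tables"]))  # → ['sales.actuals', 'sales.targets']

try:
    vend_credential("summary-agent", "ana", ["hr.salaries"])
except PermissionError as err:
    print("denied:", err)      # the agent gets the same denial a person would

print(len(AUDIT_LOG))          # → 2  (both the allow and the deny were recorded)
```

The agent never holds standing access. It inherits the analyst's grants for one request, and the denial is logged just like the approval.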

That flow turns an open question into a governed action. The catalog is the control point, because it sits between the agent and the data and enforces access in one place. An open catalog that vends credentials and logs every request gives the security team a single seam to watch. As agents become the default query writer, that seam carries more traffic, which makes the catalog one of the busiest and most important parts of the stack.

The takeaway for data engineers is direct. Treat agents as a new class of user. Give them a standard way in through MCP. Give them governed access through a catalog. Give them a cost budget and an audit trail. Do that, and the agent shift turns into a productivity gain instead of a governance headache.

A Playbook for Adopting Agents This Quarter

The week's news points at a short list of decisions every team should make now. None of them depend on which assistant wins. They hold up across tools, because they govern how you run agents, not which agent you run.

Start with cost. Pick a per-developer credit cap and a per-team cap before you roll out an agent. Usage-based pricing means one long session can spend a big share of a budget in an afternoon. A cap turns a surprise bill into a known ceiling. Review the caps monthly, because the work shifts and the right number shifts with it.
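A two-level cap is simple to express. The sketch below is a hypothetical in-process budget tracker, not a feature of any vendor's billing system. The class name, the cap values, and the developer names are assumptions for illustration.

```python
from collections import defaultdict

class CreditBudget:
    """Tracks spend against a per-developer cap and a shared team cap."""

    def __init__(self, per_developer_cap: int, team_cap: int):
        self.per_developer_cap = per_developer_cap
        self.team_cap = team_cap
        self.spent = defaultdict(int)

    def try_spend(self, developer: str, credits: int) -> bool:
        """Record the spend and return True only if both caps still hold."""
        if credits < 0:
            raise ValueError("credits must not be negative")
        if self.spent[developer] + credits > self.per_developer_cap:
            return False
        if sum(self.spent.values()) + credits > self.team_cap:
            return False
        self.spent[developer] += credits
        return True

budget = CreditBudget(per_developer_cap=500, team_cap=800)
print(budget.try_spend("dana", 400))  # → True
print(budget.try_spend("dana", 200))  # → False (would pass dana's 500 cap)
print(budget.try_spend("eli", 400))   # → True
print(budget.try_spend("fay", 100))   # → False (team is already at 800)
```

A rejected request leaves the totals unchanged, so a blocked agent run costs nothing and the caps can be raised at the monthly review without reconciling partial spends.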

Set a protocol standard. For tool and data access, that means MCP. For agent-to-agent work, that means A2A. Both sit under the Linux Foundation, both have wide vendor support, and both will keep their shape as the spec settles. Building on these two now avoids the custom glue that becomes a maintenance burden later.

Define the review path. Decide who signs off on agent output before it ships, and where the policy gate sits. The 48% flaw rate in AI-generated code is the reason this step is not optional. A policy gate inside the assistant catches the obvious problems early. A human reviewer catches the rest. Both stay in the loop until the data says otherwise.

Treat agents as a class of user. An agent that reads and writes data needs the same controls a person gets. That means scoped permissions, an audit log, and a record of which agent ran which action. Give agents a standard way in, give them governed access, and give them a budget. Skip any of the three, and the productivity gain turns into a cleanup project.

Plan for the edge. The RTX Spark and the broader on-device trend mean some inference moves to the laptop this year. Decide which workloads belong on the device, where latency and privacy matter most, and which belong in the cloud, where scale and large models matter most. A small local model for code completion and a large cloud model for hard reasoning is a sensible default split.

Write your conventions down. Agents follow the guidance you publish, not the rules in your head. A style guide and example set checked into the repo reaches both your new hires and your agents. This is the cheapest item on the list and the one teams skip most.

The Week in Brief

Claude Opus 4.8 shipped on May 28 with Dynamic Workflows for Claude Code. GitHub Copilot moved to usage-based AI Credits on June 1 and added a 100 dollar Max plan, which drew a backlash over cost predictability. Cursor shipped its in-house Composer 2.5 model with parallel builds and pull request review. Windsurf became Devin Desktop, Google shipped Antigravity 2.0 with Gemini 3.5 Flash, and Kiro switched to credits. Salt Security launched Salt Code on June 2 to enforce policy inside coding assistants.

NVIDIA presented the RTX Spark Superchip at Computex on June 1, an Arm-based system-on-chip for Windows PCs, and AMD, Intel, and Qualcomm shares fell on the news. The Model Context Protocol published its 2026-07-28 release candidate, moving the protocol to a stateless core with Tasks, MCP Apps, and an Extensions framework, with the final spec set for July 28. The A2A protocol passed 150 supporting organizations at its one-year mark.

The through-line is control. The models keep getting better and cheaper to run. The open question for every team is how to govern the agents those models power, how to cap their cost, and how to keep a human in the loop. Those are the decisions that separate teams that ship from teams that clean up.

Resources to Go Further

The AI landscape changes fast. Here are tools and resources to help you keep pace.

Try Dremio Free. Experience agentic analytics and an Apache Iceberg-powered lakehouse. Start your free trial

Learn Agentic AI with Data. Dremio's agentic analytics features let your AI agents query and act on live data. Explore Dremio Agentic AI

Join the Community. Connect with data engineers and AI practitioners building on open standards. Join the Dremio Developer Community

Book: The 2026 Guide to AI-Assisted Development. Covers prompt engineering, agent workflows, MCP, evaluation, security, and career paths. Get it on Amazon

Book: Using AI Agents for Data Engineering and Data Analysis. A practical guide to Claude Code, Google Antigravity, OpenAI Codex, and more. Get it on Amazon
