This week, OpenAI shipped three voice models in the API and a security variant of GPT-5.5. Anthropic doubled Claude Code rate limits using SpaceX Colossus compute. Thinking Machines released its first model, a 276B-parameter system built for 200-millisecond real-time interaction. Google launched Googlebooks and Gemini Intelligence at its Android Show. Anthropic released Claude for Legal with 20+ MCP connectors. Cursor landed in Microsoft Teams. The connective tissue across all of these stories: AI coding tools, processing hardware, and standards work are all maturing in parallel, and each maturation is reshaping the others.
AI Coding Tools: Cursor in Teams, Copilot CLI Iterates Fast
Cursor expanded its surface area this week. The Cursor team announced on May 11 that Cursor is now available in Microsoft Teams. Users can mention @cursor in any Teams channel to delegate tasks to an agent or pull information from Cursor directly into Teams. The integration matters because it moves Cursor out of the editor and into the place where engineering teams actually coordinate work. Async code review, ticket triage, and PR follow-ups now happen in the same surface as the conversation about them.
GitHub Copilot CLI shipped five releases between May 5 and May 11, 2026. Version 1.0.41 (May 5) reduced startup time by rendering the UI immediately while authentication ran in the background. Version 1.0.42 (May 6) improved MCP server error messages. Version 1.0.43 (May 6) added server-side model routing for Auto mode. Version 1.0.44 (May 8) fixed path completion flickering and enabled mid-input slash commands. Version 1.0.45 (May 11) added an /autopilot slash command to toggle between interactive and autopilot modes. Five releases in seven days reflects the cadence the Copilot CLI team has settled into since the tool went GA earlier this year.
Anthropic doubled Claude Code rate limits overnight on May 8, 2026. The increase came from a SpaceX partnership that added 300 megawatts of new compute, the equivalent of more than 220,000 Nvidia GPUs, in under a month. The Colossus One facility was originally built for xAI's Grok training workloads. Anthropic now uses it for burst compute on developer-facing products. Claude Code users had reported hitting output limits during peak hours, and the doubled limits resolve that pressure point without raising prices.
The competition between Claude Code, Cursor, and GitHub Copilot continues to harden. Claude Code runs on Opus 4.7 with SWE-bench Verified at 87.6%, SWE-bench Pro at 64.3%, and CursorBench at 70%, according to Anthropic's April 16 release. Cursor pushed Composer 2 and parallel agents in April. GitHub paused new sign-ups for Copilot Pro and Pro+ ahead of the June 1 transition to usage-based billing. Each tool now has a distinct positioning. Claude Code is the surface-agnostic agent for senior developers. Cursor is the daily-driver IDE. Copilot is the GitHub-integrated extension for organizations with existing GitHub investment.
Cursor 3.3 shipped in May 2026 with /multitask for spawning parallel subagents instead of running them in sequence, a context usage breakdown that shows engineers exactly what their agent is consuming, and a fix for MCP connection stability. Cursor 3, which shipped in April 2026, had already changed the architecture by putting all local and cloud agents in a single sidebar. Agents kicked off from mobile, Slack, GitHub, and Linear all appear in one workspace view.
Cursor also launched its Security Review beta in May for Teams and Enterprise plans. Two always-on agents anchor the offering. Security Reviewer checks every PR for vulnerabilities, auth regressions, privacy risks, agent tool auto-approvals, and prompt injection attacks. It leaves inline comments at the exact diff location with severity and remediation guidance. Vulnerability Scanner runs scheduled scans for known vulnerabilities, outdated dependencies, and configuration issues with optional Slack notifications. Cursor introduced canvases in the Agents Window in the same release. Canvases let agents build interactive visual interfaces for PR reviews, eval analysis, and data dashboards rather than walls of text. They use React-based components and live alongside the terminal, browser, and source control as durable artifacts.
Cursor's Bugbot is moving from $40 per seat per month to usage-based billing on June 8, 2026. Teams will draw from on-demand spend, while individuals will draw from included usage. The average Bugbot run costs $1.00 to $1.50 depending on PR size and complexity. The pricing change reflects the wider industry shift toward per-action billing for AI coding tools, the same direction GitHub Copilot is heading with its June 1 transition.
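For teams trying to budget the transition, the breakeven math is simple. This sketch uses the per-run cost band cited above; the monthly run counts are hypothetical inputs, not Cursor data.

```python
# Back-of-envelope comparison of Bugbot's outgoing $40/seat plan against the
# new usage-based model. Per-run costs come from the article; run volumes are
# illustrative assumptions.

SEAT_PRICE = 40.00            # USD per seat per month (outgoing plan)
COST_PER_RUN = (1.00, 1.50)   # USD per run, low/high band

def usage_cost(runs_per_month: int) -> tuple[float, float]:
    """Monthly cost band under usage-based billing."""
    low, high = COST_PER_RUN
    return runs_per_month * low, runs_per_month * high

def breakeven_runs(cost_per_run: float) -> float:
    """Runs per month at which usage billing matches the old seat price."""
    return SEAT_PRICE / cost_per_run

for runs in (10, 27, 40):
    low, high = usage_cost(runs)
    print(f"{runs:>3} runs/month: ${low:.2f}-${high:.2f} vs ${SEAT_PRICE:.2f} seat")

# Breakeven falls between ~27 runs (at $1.50/run) and 40 runs (at $1.00/run).
print(f"breakeven: {breakeven_runs(1.50):.1f}-{breakeven_runs(1.00):.1f} runs/month")
```

Under these assumptions, a seat running fewer than roughly 27 Bugbot reviews a month comes out ahead on usage billing; a seat running more than 40 comes out behind.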
GitHub's billing change is the bigger story for many teams. The Opus premium-request multiplier on Copilot jumped from 15x to 27x in the same transition window. Teams running Opus-heavy workloads on Copilot pay considerably more than Claude Max plan subscribers using the same model through Anthropic's direct channel. The pricing gap is the reason a growing number of engineering teams are evaluating whether to use Claude Code as a primary surface and keep Copilot for GitHub-native workflows.
Claude Code itself stopped being just a CLI in 2026. It runs in the shell, as VS Code and JetBrains extensions, as a GitHub Action that opens PRs, and inside claude.ai on web and mobile. Subagents, skills, hooks, and plan mode turn it into a per-repo configuration rather than a per-session tool. The agent runs anywhere the engineer works. That surface-agnostic posture is part of why the SpaceX compute deal mattered. Capacity constraints on Claude Code show up in five surfaces at once, and the rate limit increase helped all five.
A security note worth flagging. Cursor patched a vulnerability in version 2.5 that let a malicious Git repository trigger arbitrary code execution through the agent. The patch is in place. No public reports of in-the-wild abuse have surfaced. Teams running older Cursor versions need to update. The broader question the bug surfaced is how engineering teams handle repo trust when AI agents have shell access. The combination of agent capability and unverified inputs is a category of risk that did not exist a year ago, and the security tooling industry is starting to respond.
AI Processing: Tesla AI5 Tape-Out, Google TPU Customer Roster Expands
Tesla taped out its AI5 chip on April 15, 2026, with details continuing to land this week. The chip is dual-sourced from TSMC Arizona and Samsung Texas. According to Tesla's stated specs, AI5 delivers roughly 8x the compute and 5x the bandwidth of the current AI4 hardware. A single AI5 chip approximates an Nvidia H100 for Tesla's specific inference workloads. A dual AI5 setup approximates an Nvidia Blackwell at a fraction of the cost and power. Tesla claims AI5 uses roughly one-third the power of Blackwell and runs at under 10% of the cost.
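Taking those multipliers at face value, the implied efficiency ratios are easy to work out. These are Tesla's own claims, not independent benchmarks, so treat the outputs as an upper bound on what the vendor is asserting.

```python
# Efficiency ratios implied by Tesla's stated AI5 claims, computed for a
# dual-AI5 board relative to an Nvidia Blackwell. Vendor claims only.

PERF_VS_BLACKWELL = 1.0     # dual AI5 "approximates" a Blackwell
POWER_VS_BLACKWELL = 1 / 3  # "roughly one-third the power"
COST_VS_BLACKWELL = 0.10    # "under 10% of the cost" (upper bound)

perf_per_watt = PERF_VS_BLACKWELL / POWER_VS_BLACKWELL    # ~3x Blackwell
perf_per_dollar = PERF_VS_BLACKWELL / COST_VS_BLACKWELL   # >=10x Blackwell

print(f"perf/watt vs Blackwell:   ~{perf_per_watt:.1f}x")
print(f"perf/dollar vs Blackwell: >= {perf_per_dollar:.0f}x")
```

If the claims hold, the headline is not raw performance parity but roughly 3x the performance per watt and at least 10x the performance per dollar on Tesla's own workloads.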
Tesla's strategic positioning is the more significant part of the announcement. Musk confirmed that AI4 is sufficient for Full Self-Driving safety levels, so existing vehicles do not need hardware retrofits. AI5 is built for Optimus humanoid robots and supercomputer clusters. Configurations of 5 to 12 AI5 chips per board will form the backbone of Tesla's training infrastructure for FSD v15 and future Optimus models. Engineering samples are expected in late 2026, with volume production targeted for 2027. AI6 is already in development, with tape-out targeted for December 2026.
Google's TPU customer roster reshaped the AI chip market the week before. At Google Cloud Next 2026 on April 22, Google unveiled the TPU 8t and TPU 8i, claiming 2.8x better price-performance than the prior Ironwood generation. Anthropic expanded to multiple gigawatts of next-generation TPU capacity for Claude training and serving. Meta signed a multibillion-dollar multiyear deal in February 2026. OpenAI now takes TPU capacity, which is the most significant signal, because OpenAI has historically trained on Microsoft-procured Nvidia clusters. A confirmed OpenAI booking on Google silicon is the first visible crack in the assumption that Nvidia GPUs are the only serious substrate for frontier AI.
The TPU 8i has 384MB of SRAM, triple the amount in the prior Ironwood generation. It pairs with Google's custom Arm-based Axiom CPU. The 8i is built for inference and AI agents. The 8t targets training workloads, with what Google calls a development-cycle compression "from months to weeks." Citadel Securities built quantitative research software on TPUs. All 17 U.S. Energy Department national laboratories use AI co-scientist software built on the chips. Broadcom co-designed TPU 8t under the codename "Sunfish." MediaTek handles TPU 8i under the codename "Zebrafish." The dual-vendor co-design pattern matters because it gives Google supply chain options. It also gives both Broadcom and MediaTek concrete reference designs they can extend to other hyperscaler customers.
Anthropic also announced a SpaceX compute deal on May 8 alongside the Claude Code rate limit increase. The deal covers 220,000+ GPUs at the Colossus One data center. The deal does not reduce Anthropic's TPU commitment, since both arrangements run in parallel. Compute supply is the bottleneck for every frontier lab in 2026, and Anthropic, OpenAI, and Meta are all running multi-vendor strategies across Nvidia, Google TPUs, AMD Instinct, and now SpaceX-operated capacity.
Anthropic's revenue context puts the SpaceX deal in perspective. Dario Amodei confirmed in early May that Anthropic grew 80x in Q1 2026, blowing past the internal plan that called for 10x growth. The annualized revenue run rate crossed $30 billion, up from $9 billion at the end of 2025. The number of customers spending $1 million per year doubled from 500 to over 1,000 in two months. That kind of revenue growth would push any AI lab into emergency compute procurement mode. The SpaceX deal, the multi-gigawatt Amazon and Google partnerships, the $30 billion Azure capacity arrangement through Microsoft and Nvidia, and the $50 billion Fluidstack US AI infrastructure investment are the practical response.
The orbital compute angle in the SpaceX announcement deserves a footnote. Anthropic and SpaceX expressed shared interest in developing multi-gigawatt orbital data center capacity over the coming years. The engineering timeline for that is measured in years, not quarters. The signal is what matters. Frontier AI labs and the compute providers that serve them are now planning for a future where terrestrial power, land, and cooling cannot keep pace with model training demands. Whether orbital capacity becomes real or stays speculative, the fact that two serious companies put it in a press release tells you where the industry thinks the compute ceiling is heading.
The environmental angle around Colossus One also bears mention. The Memphis facility runs on gas turbines that were initially installed without Clean Air Act permits or pollution control devices. Memphis residents have raised concerns about air quality and documented increases in hospital admissions tied to poor air quality near the site. Protests against the data center's environmental footprint have continued through 2026. Anthropic committed to cover any consumer electricity price increases caused by its US data centers and is considering local investment in communities that host its facilities. Whether those commitments are sufficient remediation for the Memphis-specific concerns is an open question that the broader AI infrastructure conversation will continue to surface.
The Anthropic compute strategy is now multi-vendor, multi-region, and multi-substrate. Claude runs on AWS Trainium, Google TPUs, and Nvidia GPUs. The recent collaboration with Amazon includes inference capacity in Asia and Europe to satisfy data residency requirements for regulated industries. Location decisions focus on democratic countries with stable legal frameworks and secure supply chains. The single-vendor compute strategy that dominated 2023 and 2024 is gone. Portfolio diversification across the entire AI infrastructure stack has replaced it.
The wider trend is clear. Custom silicon is no longer a hyperscaler-only story. Tesla has joined the small group of companies that design AI chips from the ground up and manufacture at scale. Apple, Google, Amazon, and Meta are all building their own. Anthropic has multi-gigawatt commitments on both Nvidia and TPU substrates. The "single-vendor AI substrate" narrative that justified Nvidia's valuation premium has its first real counter-example.
AI Standards & Protocols: Anthropic Claude for Legal Ships 20+ MCP Connectors
Anthropic released Claude for Legal on May 12, 2026. The launch includes 12 practice-area plugins and more than 20 MCP connectors. The plugins cover practice areas including Commercial Legal, Corporate Legal (including M&A diligence), Employment Legal, Privacy Legal, Product Legal, Regulatory Legal, AI Governance Legal, IP Legal, and Litigation Legal. Each plugin starts with a setup interview that learns the team's playbooks, escalation chains, risk calibration, and house style.
The MCP connector list reads like the operational stack of a modern law firm. Anthropic connected Claude to DocuSign, Box, Thomson Reuters (CoCounsel Legal), Harvey, Relativity, Everlaw, and Microsoft 365. The Thomson Reuters integration is bidirectional. CoCounsel Legal is rebuilt on Anthropic's technology, and Claude can now call CoCounsel as a tool. The foundation model is both the underlying layer and a caller of the application built on top of it. Anthropic also confirmed that legal became the number one power-user job function in Claude Cowork, with over 3x the usage of any other function.
The launch is a stress test for MCP as a production protocol. MCP shipped in November 2024 as Anthropic's open standard for connecting AI models to tools and data. By February 2026, MCP crossed 97 million monthly SDK downloads across Python and TypeScript. The protocol is now under the Linux Foundation's Agentic AI Foundation (AAIF), with Anthropic, OpenAI, and Block as co-founders. Microsoft embedded MCP into Windows 11 and Copilot. Google DeepMind confirmed support in Gemini. AWS, Cloudflare, and Bloomberg all sit on the AAIF.
The MCP 2026 roadmap published in March 2026 has four priority areas. First, transport evolution to make Streamable HTTP work statelessly at scale. Second, agent communication primitives, closing lifecycle gaps in the Tasks primitive. Third, governance maturation with a formal contributor ladder. Fourth, enterprise readiness with audit trails, SSO-integrated auth, and gateway patterns. The Tasks primitive shipped as experimental and now needs production iteration. Retry semantics on transient failures and expiry policies for completed tasks are the two concrete gaps to close.
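The two gaps the roadmap names for the Tasks primitive map to well-known engineering patterns: retry with backoff on transient failures, and a TTL on completed results. The sketch below illustrates both generically; it is not the MCP SDK's actual API, and every name in it is illustrative.

```python
# Generic sketch of the two Tasks-primitive gaps named above: retry semantics
# for transient failures and expiry of completed tasks. Illustrative only --
# not the real MCP SDK surface.
import time

class TransientError(Exception):
    """Failure class that is safe to retry (timeouts, 5xx, dropped streams)."""

def call_with_retry(fn, max_attempts=3, base_delay=0.5):
    """Retry `fn` on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

class TaskStore:
    """Completed task results are kept only until their expiry window lapses."""
    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._done = {}  # task_id -> (result, completed_at)

    def complete(self, task_id, result):
        self._done[task_id] = (result, self.clock())

    def get(self, task_id):
        entry = self._done.get(task_id)
        if entry is None:
            return None
        result, completed_at = entry
        if self.clock() - completed_at > self.ttl:
            del self._done[task_id]  # expired: reclaim and report absence
            return None
        return result
```

The open protocol questions are which side owns each policy: whether the client or the server decides what counts as transient, and whether expiry windows are negotiated per task or fixed per server.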
Google's Agent-to-Agent (A2A) protocol sits alongside MCP, not in competition. MCP connects agents to tools and data. A2A connects agents to other agents. Microsoft, AWS, and Google all support both. The two-layer stack uses MCP for tool access and A2A for agent coordination. It has become the architectural default for enterprise multi-agent deployments. Industry data shows enterprise MCP adoption crossed 78% in production AI teams, and the public registry surpassed 9,400 servers. Enterprise MCP gateways from Kong, Docker, and others now centralize authentication, audit trails, and tool-level access control.
The Anthropic Claude for Legal launch is significant precisely because it deploys MCP at production scale for a regulated industry. Legal aid organizations and public defenders also get access through Claude for Nonprofits at discounted pricing. The launch rattled legal tech stocks. RELX, Thomson Reuters, and Wolters Kluwer shares fell on the February plugin announcement, and the May 12 release is considerably larger in scope.
Anthropic followed Claude for Legal with Claude for Small Business on May 13, 2026. The bundle covers connectors for QuickBooks, PayPal, HubSpot, Canva, DocuSign, Google Workspace, and Microsoft 365. It includes 15 ready-to-run skills covering finance, operations, sales, marketing, HR, and customer service. Workflows handle payroll planning, month-end close, business performance monitoring, campaign management, invoice chasing, and cash-flow forecasting. The package runs on top of Claude Cowork, the multi-step automation platform Anthropic launched in January 2026, and ships as a toggle install in the desktop app.
The Small Business release is the fifth market-specific Claude product Anthropic has shipped since the start of 2025. The other four target life science researchers, schools, attorneys, and financial professionals. Each bundle uses the same MCP-based connector architecture and the same Claude Cowork automation layer. The pattern is clear. Anthropic uses MCP and Claude Cowork as a platform substrate, with vertical packaging on top. The vertical packaging is what closes deals. The substrate is what makes the verticals shippable in quick succession.
Claude for Small Business comes with a 10-city tour starting May 14 in Chicago. Tulsa, Dallas, Hamilton Township, Baton Rouge, Birmingham, Salt Lake City, Baltimore, San Jose, and Indianapolis follow. Each stop is a free half-day training workshop for 100 local small business leaders. Attendees get a one-month Claude Max subscription. Anthropic also partnered with Workday and the Local Initiatives Support Corporation on a Solopreneurship Accelerator Program funding 15 aspiring entrepreneurs in 2026 with seed capital, Claude credits, and an AI-first curriculum. A separate partnership with three Community Development Financial Institutions targets small business access to capital.
The competitive pressure on traditional SaaS vendors keeps building. Salesforce, ServiceNow, Intuit, DocuSign, and Box have all seen their stocks decline year to date and over the last 12 months as Anthropic's offerings expand into territory those companies have historically owned. Dario Amodei warned at the Briefing: Financial Services event in early May that some SaaS vendors will go bankrupt if they cannot keep pace with the AI shift. That framing is harder to dismiss when Anthropic's own annualized revenue grew from $9 billion to $30 billion in roughly a year. The market is repricing on the assumption that AI-native delivery will collapse a meaningful share of seat-based SaaS revenue.
MCP makes that competitive shift feasible at the protocol level. The same connector that lets Claude pull data from QuickBooks for a small business owner lets Claude pull data from Thomson Reuters for a corporate legal team. Anthropic ships one protocol stack and one automation layer, then packages it for a dozen verticals. Traditional SaaS vendors cannot easily respond, because workflow integration was the core of those tools' value proposition. When a model can build the integration through MCP, the workflow lock-in weakens fast.
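The "one protocol stack, many verticals" claim is concrete at the wire level. MCP's tool layer is plain JSON-RPC 2.0: a client discovers tools via the spec's `tools/list` method and invokes one via `tools/call`. The toy dispatcher below mimics that request shape with the stdlib only; it is a sketch of the protocol's structure, not the official MCP SDK (which also handles transports, schemas, and auth), and the `lookup_invoice` tool is a made-up example.

```python
# Toy dispatcher mimicking MCP's JSON-RPC tool surface. A vertical bundle is
# just a different registry of tools behind the same two methods.
import json

TOOLS = {
    "lookup_invoice": {  # hypothetical example tool
        "description": "Fetch an invoice total by id (illustrative).",
        "handler": lambda args: {"id": args["id"], "total": 125.00},
    },
}

def handle(request_json: str) -> str:
    req = json.loads(request_json)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"]}
                            for n, t in TOOLS.items()]}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        result = tool["handler"](req["params"].get("arguments", {}))
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "unknown method"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

print(handle(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
```

Swapping QuickBooks tools for Thomson Reuters tools changes only the registry; the client, the model, and the automation layer on top stay identical, which is exactly why the verticals ship in quick succession.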
Also Worth Noting
Thinking Machines released TML-Interaction-Small on May 11, 2026, the first model from Mira Murati's lab. The 276 billion-parameter mixture-of-experts system uses 12 billion active parameters. The architecture processes audio, video, and text in 200-millisecond micro-turns rather than waiting for users to finish speaking. The model achieves 0.40-second turn-taking latency, roughly the speed of natural human conversation. Soumith Chintala, PyTorch co-creator, became CTO after co-founders Barret Zoph and Luke Metz left for OpenAI in January. The research preview is available to a limited group of researchers, with broader access planned for later in 2026. The model's interaction-first design reflects a hypothesis the team has been advancing publicly for months. Real-time multimodal interaction is a different category of capability than the request-response pattern that dominates most current AI products.
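The sparse-activation arithmetic is what makes the latency target plausible: a mixture-of-experts model routes each token through a small subset of its weights, so per-token compute tracks active parameters, not total size. A quick unpacking of the stated numbers:

```python
# Arithmetic on TML-Interaction-Small's stated figures. The parameter counts
# and latency targets come from the announcement; the compute inference is
# the standard MoE reasoning, not a disclosed design detail.

TOTAL_PARAMS = 276e9   # stated total parameter count
ACTIVE_PARAMS = 12e9   # stated active parameters per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction per token: {active_fraction:.1%}")   # ~4.3%

# Latency budget: 200 ms micro-turns fit twice inside the 0.40 s
# turn-taking target, leaving one window for listening and one for response.
MICRO_TURN_S = 0.200
TURN_TAKING_S = 0.400
micro_turns_per_window = TURN_TAKING_S / MICRO_TURN_S
print(f"micro-turns per response window: {micro_turns_per_window:.0f}")
```

Roughly 4% of the weights are active per token, which is how a 276B model can plausibly hit conversational latency on inference hardware sized for a much smaller dense model.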
Google streamed its Android Show on May 12, 2026, one week before Google I/O 2026. Two announcements anchored the event. Googlebooks are premium Gemini-first laptops from Acer, Asus, Dell, HP, and Lenovo, shipping this fall. Every Googlebook features a signature "Glowbar" light bar on the keyboard. Gemini Intelligence is the new agentic AI layer running underneath Android. It takes data from one app and completes multistep tasks across other apps without the user switching between them. Gemini Intelligence rolls out to the latest Samsung Galaxy and Google Pixel phones starting this summer, then to Wear OS, Android Auto, Android XR, and Googlebooks. The Glowbar is a physical signal that an AI agent is acting on the user's behalf. That kind of hardware-level affordance for AI activity is the same direction Apple is exploring with iOS 26 visual indicators for active Siri agents.
OpenAI shipped three voice models on May 8. GPT-Realtime-2 brings GPT-5-class reasoning to real-time voice. GPT-Realtime-Whisper handles transcription workloads. GPT-Realtime-Translate handles speech-to-speech translation. The same day, OpenAI released GPT-5.5-Cyber to vetted security teams in limited preview. The release is a direct response to Anthropic's Claude Mythos Preview, which has been used under Project Glasswing to identify zero-day vulnerabilities. ElevenLabs reported crossing $500 million in annual recurring revenue after a Series D round on the same day, with cuts to voice and agentic API pricing. The voice infrastructure category is one of the most competitive surfaces in AI right now. OpenAI, ElevenLabs, Cartesia, Deepgram, and Thinking Machines are all pushing on the same set of latency and quality benchmarks from different starting points.
White House National Economic Council Director Kevin Hassett confirmed on May 7 that the White House is drafting an executive order requiring AI models to be vetted before public release. The Commerce Department has already expanded its voluntary pre-release testing program to include Google, Microsoft, xAI, OpenAI, and Anthropic. Pennsylvania filed suit against Character.AI on May 8 over AI personas misrepresenting themselves as qualified medical professionals. Connecticut's comprehensive AI bill and Iowa's chatbot safety law both advanced in the same week. The state-level regulatory activity is moving faster than the federal pace, and the patchwork compliance question is becoming a real operational concern for AI companies serving US users.
Unsloth joined the PyTorch Ecosystem on May 11, bringing its open-source AI training and inference acceleration tools into the official ecosystem. The move expands PyTorch's official tooling footprint and gives Unsloth's acceleration layer more institutional weight. The Fivetran 2026 Agentic AI Readiness Index, released May 8, found that only 15% of organizations have a data foundation capable of safely running AI agents at production scale. Nearly 60% have already invested millions in the technology. The gap between agent ambition and data foundation readiness is the practical reality every enterprise AI program is now confronting. The lakehouse stack work happening in parallel inside the Apache Iceberg, Polaris, Arrow, and Parquet communities is what closes that gap, and the timing of the Fivetran report against the AI infrastructure investment surge is not a coincidence.
What to Watch Next Week
Google I/O 2026 runs the week of May 19 and is the biggest single event on the calendar. Gemini 3 production rollout, Android XR partner announcements, and additional TPU 8 customer disclosures are all on the likely agenda. Anthropic's 10-city Small Business Tour kicks off May 14 in Chicago, and the early-week press coverage will set expectations for SMB AI adoption. The state-level AI regulatory activity in Pennsylvania, Connecticut, and Iowa should continue moving, and the federal executive order drafting work bears close watching. On the coding tools side, GitHub Copilot's June 1 usage-based billing transition is the next major pricing event, and Cursor's continued shipping cadence on the 3.x release line should bring more agent surface changes. The MCP and A2A protocol communities are running ongoing working group sessions on the Tasks primitive iteration and the Streamable HTTP stateless transport, with concrete proposals expected through the spring.
Resources to Go Further
The AI landscape changes fast. Here are tools and resources to help you keep pace.
Try Dremio Free. Experience agentic analytics and an Apache Iceberg-powered lakehouse. Start your free trial
Learn Agentic AI with Data. Dremio's agentic analytics features let your AI agents query and act on live data. Explore Dremio Agentic AI
Join the Community. Connect with data engineers and AI practitioners building on open standards. Join the Dremio Developer Community
Book: The 2026 Guide to AI-Assisted Development. Covers prompt engineering, agent workflows, MCP, evaluation, security, and career paths. Get it on Amazon
Book: Using AI Agents for Data Engineering and Data Analysis. A practical guide to Claude Code, Google Antigravity, OpenAI Codex, and more. Get it on Amazon