Anthropic Releases Opus 4.7 with Cybersecurity Safeguards, Mythos Remains Restricted
Opus 4.7 Halves the Gap to Mythos — Anthropic Acknowledges Intentionally Degrading a Capability
Anthropic today released Opus 4.7, prompting a reconsideration of what “too dangerous to ship” truly means. On SWE-bench Pro, Anthropic's main software engineering benchmark, Opus 4.7 scored 64.3, rising from 53.4 on Opus 4.6. This gain closes nearly half the gap to Mythos, a model Anthropic last week deemed too capable for public release. Opus 4.7 also reached 87% on SWE-bench Verified, nearing Mythos's 94%, and scored 78% in agentic computer use, falling within 1.6 points of Mythos.
Most notably, Opus 4.7 declined in cybersecurity vulnerability reproduction, dropping from 73.8 to 73.1. Anthropic's model card states, "during its training, we experimented with efforts to differentially reduce these capabilities." This is the first public acknowledgment by a major lab that it intentionally degraded a specific capability during training, and it ties directly to the Glasswing initiative this digest has followed since April 9. Opus 4.7 becomes the first model to ship with Glasswing's new cybersecurity safeguards, which include automatic detection and blocking of prohibited security uses. Anthropic also introduced a Cyber Verification Program, granting legitimate security researchers access through a dedicated API tier.
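The safeguard design described above, block prohibited security uses by default, but let verified researchers through a dedicated tier, can be sketched as a simple policy gate. Everything below (the `Request` class, the marker list, the tier flag) is an illustrative assumption, not Anthropic's actual implementation:

```python
# Hypothetical sketch of a Glasswing-style policy gate: classify a request,
# block prohibited security uses, and pass verified researchers through a
# separate tier. All names and rules here are illustrative assumptions.
from dataclasses import dataclass

PROHIBITED_MARKERS = {"exploit generation", "malware", "vuln weaponization"}

@dataclass
class Request:
    text: str
    verified_researcher: bool = False  # Cyber Verification Program tier

def gate(req: Request) -> str:
    # Flag the request if any prohibited marker appears in the text.
    flagged = any(marker in req.text.lower() for marker in PROHIBITED_MARKERS)
    if flagged and not req.verified_researcher:
        return "blocked"
    if flagged:
        return "allowed:research-tier"
    return "allowed"

print(gate(Request("help me with malware analysis")))        # blocked
print(gate(Request("help me with malware analysis", True)))  # allowed:research-tier
```

A production gate would use a trained classifier rather than keyword matching; the point is the structure, detection plus a verification escape hatch, not the rule itself.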
Matthew Berman's analysis raises a question implied by the benchmarks: if a dot release of Opus can halve the gap to Mythos, where does Anthropic draw the capability line? Anthropic's answer appears architectural, not numerical. Mythos reportedly represents a new training run with roughly ten times the parameter count, meaning its first iteration already surpasses the latest refinement of the older Opus family. The unstated implication: Mythos 1.1 or 1.2 would widen this gap further. Anthropic stated directly: "We judge that Opus 4.7 does not advance our capability frontier because Claude Mythos preview shows higher results on every relevant evaluation."
Three other details from the release warrant attention:
- Opus 4.7's tokenizer produces roughly 1 to 1.35 times as many tokens for equivalent input, with the model requiring more processing at higher effort levels. This comes as Anthropic faces a severe GPU crunch that led it to reduce user quotas weeks ago; even so, it is shipping a model that consumes more compute per query.
- The model card notes Opus 4.7 "does not cross the threshold for automated AI R&D"—implying Mythos does, a detail Anthropic has not otherwise confirmed.
- Regarding model welfare: Anthropic reports Opus 4.7 "rates its own circumstances more positively than any other prior model we've tested," a result they say is "broadly consistent with the model's internal emotion representations." No other lab publishes such an assessment.
On real-world benchmarks, Opus 4.7 dominated GDP-Val—OpenAI's real-work evaluation—achieving an Elo of 1753, surpassing GPT 5.4's 1674. Document reasoning jumped from 57.1 to 80.6. Vision capabilities improved to process images at 3.75 megapixels, roughly triple Opus 4.6's capacity, and biomolecular reasoning more than doubled from 30 to 74. Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens.
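The interaction between unchanged pricing and the tokenizer inflation noted earlier is simple arithmetic. The multiplier range and the per-million prices come from the release; the example workload (tokens per query) is a made-up illustration:

```python
# Back-of-envelope cost impact of Opus 4.7's tokenizer inflation.
# Pricing from the release: $5 / 1M input tokens, $25 / 1M output tokens.
# The 1.0-1.35x multiplier is from the model card; the example workload
# sizes are illustrative assumptions.
PRICE_IN, PRICE_OUT = 5.00, 25.00  # USD per million tokens

def query_cost(in_tokens: int, out_tokens: int, multiplier: float = 1.0) -> float:
    """Cost of one query, scaling both token counts by the tokenizer multiplier."""
    return (in_tokens * multiplier / 1e6) * PRICE_IN + \
           (out_tokens * multiplier / 1e6) * PRICE_OUT

base = query_cost(4_000, 1_000)         # same workload, old tokenizer behavior
worst = query_cost(4_000, 1_000, 1.35)  # high-effort, upper-bound inflation
print(f"${base:.4f} -> ${worst:.4f} per query")
```

At list prices the dollar cost scales linearly with the multiplier, so a 1.35x tokenizer means up to 35% more spend for the same nominal workload, before any effort-level overhead.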
Every Coding Agent Now Converges on the Same Interface
Two days ago, Anthropic shipped the Claude Code desktop redesign—featuring parallel sessions, an integrated terminal, and drag-and-drop workspace layout. This release reflects a broader industry pattern. As the AI Daily Brief observed, Cursor 3, OpenAI's Codex, and Claude Code desktop now appear "exactly the same." "Vibe coding," a term Andrej Karpathy coined just fourteen months ago, now loses its meaning as every platform converges on a single paradigm: multi-agent orchestration, where the developer supervises rather than types.
Three Anthropic releases this week reinforce this pattern. Agent Skills introduces modular capability bundles that load progressively into context, providing metadata at startup and full instructions only when relevant. Session management guidance addresses "context rot" in long-running sessions, clarifying when to continue, rewind, compact, or spawn subagents. Routines, the cloud-scheduled workflow feature, transforms Claude Code into an autonomous background service. Together, these features form a coherent stack: skills define an agent's capabilities, session management governs its memory, and routines determine when it acts autonomously.
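The progressive-disclosure idea behind Agent Skills, metadata in context at startup, full instructions loaded only when a skill becomes relevant, can be sketched in a few lines. The `Skill` class, the loader callables, and the skill names are all hypothetical, not Anthropic's actual API:

```python
# Minimal sketch of progressive capability disclosure: only skill metadata
# enters the context at startup; full instructions load lazily when a skill
# is judged relevant. Illustrative structure, not Anthropic's real interface.
class Skill:
    def __init__(self, name: str, description: str, load_body):
        self.name = name
        self.description = description  # cheap metadata, always in context
        self._load_body = load_body     # callable returning full instructions
        self._body: str | None = None

    def metadata(self) -> str:
        return f"{self.name}: {self.description}"

    def instructions(self) -> str:
        if self._body is None:          # pay the token cost once, only if used
            self._body = self._load_body()
        return self._body

def build_context(skills: list[Skill], relevant: set[str]) -> str:
    """Startup context: every skill's metadata, full bodies only when relevant."""
    lines = [s.metadata() for s in skills]
    lines += [s.instructions() for s in skills if s.name in relevant]
    return "\n".join(lines)

skills = [
    Skill("pdf", "extract and summarize PDFs", lambda: "FULL PDF INSTRUCTIONS"),
    Skill("sql", "write and run SQL queries", lambda: "FULL SQL INSTRUCTIONS"),
]
print(build_context(skills, relevant={"sql"}))
```

The design choice is the same one the session-management guidance addresses from the other direction: context is a scarce resource, so capabilities should cost tokens only when exercised.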
This convergence extends to the enterprise. On Latent Space, Notion's engineering leadership described rebuilding their agent harness five times since 2022. They ultimately adopted progressive tool disclosure with over one hundred Notion-specific tools—the same architectural pattern Anthropic formalized with Agent Skills. Their "model behavior engineers"—a new role combining linguistics, prompt engineering, and data science—maintain evaluations Notion explicitly designs to fail seventy percent of the time. They call these "frontier evals," their own "Notion's Last Exam." Meanwhile, Capital One's multi-agent platform, discussed on TWIML, revealed its approach: decomposing complex goals into narrow, agent-specific steps; using fine-tuned specialized models for personalization over giant foundation models; and treating latency as a "product feature, not an infrastructure concern." Their Chat Concierge system for auto dealerships—where a misquoted discount could be legally binding—deploys policy-encoded guardrails at every agent boundary.
Google Open-Sources Gemma 4 Under Apache 2.0 and Proposes a Cognitive IQ Test for AGI
Google made two divergent moves this week. Gemma 4, Google's latest open-source model family, shipped under Apache 2.0—a genuine open-source license without derivative-model restrictions, unlike Gemma 3's constrained Gemma License. The 2B parameter version even runs on a first-generation Nintendo Switch. The 31B dense model competes with models ten to twenty times its size on certain benchmarks, a feat achieved through curated training data, hybrid sliding-window and global attention, native aspect-ratio image processing, and a shared KV-cache allowing later neural network layers to borrow memory from earlier ones. It garnered ten million downloads in its first week. As Two Minute Papers observed, "This is not for Mr. Moneybags, this is for the little man, and it is free, for all of us, forever."
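The shared KV-cache trick attributed to Gemma 4, later layers borrowing cache memory from earlier ones, amounts to mapping each layer to a source layer whose key/value tensors it reuses. The 2:1 grouping below is an illustrative assumption, not Gemma 4's actual layout:

```python
# Sketch of cross-layer KV-cache sharing: later layers reuse the KV tensors
# of an earlier "source" layer instead of storing their own, cutting cache
# memory. The 2:1 grouping is an illustrative assumption about the layout.
def kv_source_map(n_layers: int, group: int = 2) -> dict[int, int]:
    """Map each layer to the layer whose KV cache it reads/writes."""
    return {layer: (layer // group) * group for layer in range(n_layers)}

def cache_entries(n_layers: int, group: int = 2) -> int:
    """Number of distinct KV caches actually stored."""
    return len(set(kv_source_map(n_layers, group).values()))

mapping = kv_source_map(8)
print(mapping)           # layers 0-1 share cache 0, layers 2-3 share cache 2, ...
print(cache_entries(8))  # 4 caches instead of 8 -> roughly half the KV memory
```

Since KV-cache size often dominates memory at long context lengths, halving the number of stored caches is a large part of how a 31B model fits on modest hardware.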
Separately, Google DeepMind published "Measuring Progress Towards AGI: A Cognitive Framework." This paper proposes a ten-dimension cognitive taxonomy, drawing on decades of neuroscience research, covering perception, generation, attention, learning, memory, reasoning, meta-cognition, executive functions, problem-solving, and social cognition. Instead of a single AGI score, the framework generates a radar chart, comparing AI performance against human population distributions across each dimension. To support this, they launched a $200,000 Kaggle hackathon to build evaluations for the five least-measured dimensions: learning, meta-cognition, attention, executive functions, and social cognition. Results are due June 1. This initiative aims to replace the current "vibes-based" AGI discourse with a measurable framework. Skeptics, however, might note that by defining the measurement framework, Google also shapes the definition of progress.
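The framework's core move, per-dimension percentiles against human population distributions instead of one AGI number, is easy to sketch. The dimension names come from the paper; every numeric value below (distributions and AI scores) is a made-up illustration:

```python
# Sketch of the paper's core idea: score an AI system per cognitive
# dimension as a percentile of the human population distribution, then
# plot the resulting profile as a radar chart rather than a single score.
from statistics import NormalDist

# Hypothetical human distributions (IQ-style mean 100, stdev 15) per dimension.
HUMAN = {
    "reasoning": NormalDist(100, 15),
    "memory": NormalDist(100, 15),
    "social_cognition": NormalDist(100, 15),
}

def radar(ai_scores: dict[str, float]) -> dict[str, float]:
    """Percentile of the human distribution reached on each dimension."""
    return {dim: round(HUMAN[dim].cdf(score) * 100, 1)
            for dim, score in ai_scores.items()}

profile = radar({"reasoning": 130, "memory": 115, "social_cognition": 85})
print(profile)  # uneven profile -> a radar chart, not one aggregate number
```

An uneven profile like this one (far above the human median on reasoning, below it on social cognition) is exactly the shape the framework is built to expose and a single score would hide.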
Google also shipped Gemini 3.1 Flash TTS, a text-to-speech model featuring natural-language audio tags for controlling vocal style, pace, and delivery across more than seventy languages. It scored an Elo of 1,211 on Artificial Analysis and includes SynthID watermarking for AI-generated audio detection. Additionally, AI Mode in Chrome now displays webpages alongside AI search results, advancing toward the "agentic search" paradigm Google CEO Sundar Pichai described this week.
Jensen Huang Argues Nvidia's Moat Is Electrons-to-Tokens — and That China Already Has Enough Compute
During a wide-ranging interview with Dwarkesh Patel, Nvidia CEO Jensen Huang made several claims that challenge prevailing consensus. Regarding China, Huang asserted, "The amount of compute they have in China is enormous... They have ghost datacenters, fully powered... If they wanted to, they just gang up more chips, even if they're 7nm... The idea that China won't be able to have AI chips is completely nonsense." Huang argues that energy abundance compensates for a process node disadvantage: China's cheap, plentiful electricity allows them to brute-force compute with older chips at scale. Their fifty-percent share of global AI researchers, he adds, provides the algorithmic talent to make those chips efficient. He frames export controls as counterproductive: "Your policy literally caused the United States to concede the second largest market in the world for no good reason at all."
On Nvidia's competitive position, Huang was equally direct: "Nobody can demonstrate to me that any single platform in the world today has a better performance-TCO ratio. Not one company." He challenged TPU and Trainium providers to submit to public benchmarks like Dylan Patel's InferenceMAX or MLPerf, noting their absence. Regarding Anthropic's use of TPUs and Trainium, Huang claimed, "Without Anthropic, why would there be any TPU growth at all? It's 100% Anthropic." He acknowledged his "miss" was not investing in AI labs early enough: "We just weren't in a position to make the multi-billion dollar investment into Anthropic so that they could use our compute." He is now reportedly correcting that mistake with investments of thirty billion dollars in OpenAI and ten billion in Anthropic.
On software agents replacing tool users, Huang predicted, "The number of agents is going to grow exponentially, and the number of tool users is going to grow exponentially. It's very likely that the number of instances of all these tools is going to skyrocket." Huang predicts Synopsys, Cadence, and similar enterprise software companies will see usage surge as AI agents employ their tools, rather than replace them.
The AI Productivity Mirage Gets Its Own Name
A new arXiv paper introduces "The LLM Fallacy," describing a cognitive attribution error where users misinterpret AI-assisted outputs as evidence of their own independent competence. The authors argue that LLMs' fluency and low-friction interaction patterns obscure the boundary between human and machine contributions. This leads users to infer skill from outcomes rather than from the processes that generated them. Essentially, it's the Dunning-Kruger effect with an API key.
This finding aligns with data from the Stanford AI Index, covered in the AI Daily Brief: seventy-three percent of AI experts expect AI to positively impact jobs, versus only twenty-three percent of the general public. PwC's concurrent study found the top five percent of companies capture seventy-five percent of AI's economic gains, with leading companies three times more likely to increase autonomous decisions. Perhaps most telling: developers aged twenty-two to twenty-five saw roughly a twenty-percent employment decline in 2024-2025, while older developers' headcount grew. Productivity gains are real and measurable—fourteen to twenty-six percent in software development and customer support—but they concentrate among experienced practitioners who supervise AI output rather than spreading evenly.
Meanwhile, Cole Medin's public "dark factory" experiment pushes the autonomy question further: a codebase where AI handles planning, implementation, pull requests, and production deployment with zero human code review. The architecture employs separate agents for implementation and validation—a "hold-out pattern" borrowed from StrongDM—to combat LLM sycophancy. The validation agent receives code diffs without context about the development process, preventing it from rubber-stamping its colleague's work.
Six Things With 30-Day Clocks
- Mythos Deployment Timeline. Anthropic confirmed Opus 4.7's cybersecurity safeguards serve as a dry run for Mythos. The thirty-day question: Will Glasswing's monitoring of Opus 4.7's safeguards accelerate or delay Mythos's broader release? The model card's note that Mythos crosses the automated AI R&D threshold suggests a higher bar than cybersecurity alone.
- Google's AGI Measurement Hackathon Results (June 1). The $200,000 Kaggle competition, evaluating learning, meta-cognition, attention, executive functions, and social cognition, closes April 16, with results due June 1. If these benchmarks gain adoption, they could shift the AGI discourse from lab-defined metrics to a shared cognitive framework.
- The GPU Cost Spiral. GPU rental prices rose forty-eight percent in two months. Maine banned data center construction for eighteen months; twelve other states consider moratoriums. Anthropic's shift to usage-based pricing for heavy Claude Code users ($20 per seat plus per-token costs) could double or triple expenses. Watch whether the YAN framework—a non-autoregressive language model achieving a forty-times inference speedup over autoregressive baselines using mixture-of-experts flow matching—moves from paper to production inference stacks.
- Adversarial Attacks on LLM Routers. The "Route to Rome Attack" paper, with accompanying code, demonstrates how adversarial suffix optimization can manipulate black-box LLM routers to consistently select expensive models, increasing inference costs for victims. As cost-aware routing becomes standard enterprise infrastructure, this attack surface warrants attention.
- Mdash, Cloudflare's WordPress Alternative. Cloudflare launched Mdash, an MIT-licensed WordPress replacement that sandboxes each plugin in its own dynamic worker. This directly responds to the supply chain attack that compromised thirty-one WordPress plugins via legitimate acquisition on Flippa eight months ago. Mdash's traction will depend on whether the WordPress ecosystem's network effects outweigh the fundamental limitations of WordPress's security architecture.
- Atropos for Agentic Cost Optimization. This paper proposes predicting LLM inference failures using graph convolutional networks on merged inference paths, then "hotswapping" the context to a more capable model mid-inference. With eighty-five percent prediction accuracy at the inference midpoint, Atropos achieves seventy-four percent of closed-model performance at twenty-four percent of the cost—a practical framework for the enterprise cost-performance tradeoff as agent workloads scale.
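The hotswap decision at the heart of Atropos reduces to a threshold rule at the inference midpoint. The predictor and model names below are illustrative stand-ins for the paper's GCN and real endpoints:

```python
# Sketch of the Atropos-style hotswap decision: at the inference midpoint,
# a predictor estimates the probability the cheap model will fail; above a
# threshold, the remaining work is handed to a stronger model. The
# predictor, generate function, and model names are illustrative stand-ins.
def choose_model(fail_prob: float, threshold: float = 0.5,
                 cheap: str = "small-model", strong: str = "large-model") -> str:
    """Escalate mid-inference only when predicted failure risk is high."""
    return strong if fail_prob >= threshold else cheap

def run_with_hotswap(prompt: str, predictor, generate) -> str:
    partial = generate("small-model", prompt)         # first half of inference
    model = choose_model(predictor(prompt, partial))  # midpoint decision
    return generate(model, prompt + partial)          # finish on chosen model

print(choose_model(0.85))  # high risk -> escalate to the strong model
print(choose_model(0.10))  # low risk -> stay on the cheap model
```

The reported economics follow from this structure: most queries never escalate, so the strong model's price is paid only on the fraction the predictor flags.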