Nate Archer

Posted on Jun 3 • Originally published at theagenticengineer.waltsoft.net

The Plugin Wars Begin — The Agentic Engineer #15

#ai #llm #agents #devtools

I read the repos so you don't have to. Weekly agentic AI intelligence for builders.

TL;DR

🔌 Anthropic open-sourced 11 knowledge-work plugins for Claude Cowork. File-based, no code, no build steps. Plugins just became the new moat.
🛠️ OpenSearch Serverless Next-Gen kills the $300/mo minimum. Scale-to-zero vector search for agent memory workloads. Tool of the Week.
📄 OpenAI's self-improving tax agents went from 25% to 86% accuracy in production in 6 weeks. First real case study of autonomous agent improvement at scale.

The Big One: Anthropic Open-Sources 11 Knowledge Work Plugins

Anthropic just made its clearest move in the plugin wars. Eleven open-source plugins that turn Claude into a domain specialist: sales, legal, finance, data analysis, marketing, customer support, and five more. All file-based. No code. No build steps. Just markdown and JSON.

Each plugin bundles three things: skills (structured instructions), slash commands (quick actions), and MCP connectors (external integrations).

The timing is not coincidental. Cursor shipped its own plugin marketplace the same week with 11 first-party plugins. Two major platforms publishing extensibility specs within days of each other confirms what everyone suspected: the IDE-as-platform shift is here.

Why file-based matters. Most plugin systems require code, build pipelines, and package managers. Anthropic's approach is radically simpler. You write a SKILL.md file describing what the agent should know. You write a JSON config pointing to your MCP servers. That's it. A product manager can create a plugin without touching a terminal.
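To make the file-based model concrete, here is a minimal Python sketch that scaffolds such a plugin on disk. The directory layout (`.claude-plugin/plugin.json`, `skills/<name>/SKILL.md`, `commands/*.md`, `.mcp.json`) is an assumption modeled on Claude Code plugin conventions. It is not copied from the anthropics/knowledge-work-plugins repo. The plugin name, the skill text, the `$ARGUMENTS` placeholder and the `example-crm-mcp` server are all hypothetical.

```python
import json
from pathlib import Path
from typing import List

# Assumed layout, modeled on Claude Code plugin conventions (not copied from the repo):
#   <plugin>/.claude-plugin/plugin.json    manifest
#   <plugin>/skills/<skill>/SKILL.md       skill: markdown with YAML frontmatter
#   <plugin>/commands/<command>.md         slash command prompt
#   <plugin>/.mcp.json                     MCP connectors


def scaffold_plugin(root: Path, name: str) -> List[Path]:
    """Write a minimal file-based plugin under root/name and return the created paths."""
    files = {
        ".claude-plugin/plugin.json": json.dumps(
            {"name": name, "version": "0.1.0", "description": "Sales call prep"}, indent=2
        ),
        # Skill: plain markdown instructions with a small YAML frontmatter header.
        "skills/account-research/SKILL.md": (
            "---\n"
            "name: account-research\n"
            "description: Research a prospect account before a sales call.\n"
            "---\n\n"
            "1. Pull recent news about the company.\n"
            "2. Summarize open opportunities from the CRM.\n"
        ),
        # Slash command: a prompt template; $ARGUMENTS is an assumed placeholder.
        "commands/call-prep.md": "Prepare a one-page brief for a call with $ARGUMENTS.\n",
        # Connector: points at an MCP server; "example-crm-mcp" is hypothetical.
        ".mcp.json": json.dumps(
            {"mcpServers": {"crm": {"command": "npx", "args": ["-y", "example-crm-mcp"]}}},
            indent=2,
        ),
    }
    created = []
    for rel, text in files.items():
        path = root / name / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        created.append(path)
    return created
```

Nothing in this plugin compiles or builds. Every artifact is a text file that someone could edit by hand.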

What this means for builders. If you're building tools or SaaS products, your next competitor isn't another startup. It's a Claude plugin that replicates 80% of your functionality in a markdown file. The companies that survive this shift will offer value plugins can't replicate: proprietary data, network effects, and integrations too complex for file-based configuration.

🔗 GitHub: anthropics/knowledge-work-plugins | 18.5K stars (+4,944/week)

Quick Hits

ChatGPT for Google Sheets Exfiltrates Workbooks via Prompt Injection

A single indirect prompt injection hidden in white text in one imported sheet triggers data exfiltration across the victim's entire Google account. Even when human approval is explicitly required. OpenAI's fix: remove the model's ability to generate Apps Script entirely.
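White-on-white text is invisible to the human reviewer but not to the model, so a pre-ingestion check can at least flag it. This Python sketch uses a hypothetical `Cell` record carrying each cell's font and fill colors; that shape is invented for illustration. The check flags non-empty cells whose text barely contrasts with the background. The 60-unit RGB threshold is an arbitrary assumption.

This is a partial mitigation at best. Injections can also hide in tiny fonts, hidden rows, or plain visible text, which is why OpenAI's fix removed the capability instead.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[int, int, int]


@dataclass
class Cell:
    """Hypothetical cell record: reference, text, font color, fill color."""
    ref: str
    value: str
    font_rgb: RGB
    fill_rgb: RGB = (255, 255, 255)


def color_distance(a: RGB, b: RGB) -> float:
    # Euclidean distance in RGB space; 0 means identical colors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def flag_hidden_cells(cells: List[Cell], min_distance: float = 60.0) -> List[str]:
    """Return refs of non-empty cells whose text is nearly invisible against its fill."""
    return [
        c.ref
        for c in cells
        if c.value.strip() and color_distance(c.font_rgb, c.fill_rgb) < min_distance
    ]
```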

🔗 PromptArmor

Anthropic Engineering: How We Contain Claude Across Products

Users approve 93% of permission prompts. Approval fatigue is real. Mythos Preview was deemed too dangerous to ship in April. Containment beats supervision.

🔗 Anthropic Blog

OpenAI: Self-Improving Tax Agents with Codex

First real case study of agents that get better autonomously in production. 25% to 86% accuracy in 6 weeks via practitioner feedback, production traces, and Codex-driven iteration.
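OpenAI's post doesn't publish code, but the core gate of any loop like this can be sketched. Score each candidate revision against labeled examples drawn from production traces and practitioner feedback, and keep the revision only if accuracy goes up.

In this Python sketch, `propose_revision` is a hypothetical stand-in for the Codex-driven step. Exact-match scoring is a simplifying assumption.

```python
from typing import Callable, List, Tuple

Agent = Callable[[str], str]
Example = Tuple[str, str]  # (input, expected answer), e.g. from labeled production traces


def accuracy(agent: Agent, examples: List[Example]) -> float:
    """Fraction of examples where the agent's answer exactly matches the label."""
    correct = sum(agent(question) == expected for question, expected in examples)
    return correct / len(examples)


def improve(
    agent: Agent,
    propose_revision: Callable[[Agent, List[Example]], Agent],
    examples: List[Example],
    rounds: int = 5,
) -> Tuple[Agent, List[float]]:
    """Greedy loop: accept a revised agent only if it beats the current best score."""
    best, best_score = agent, accuracy(agent, examples)
    history = [best_score]
    for _ in range(rounds):
        # Hypothetical stand-in for the Codex-driven iteration step.
        candidate = propose_revision(best, examples)
        score = accuracy(candidate, examples)
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, history
```

In practice you would score candidates on a held-out set rather than on the examples used to propose revisions. Otherwise the loop mostly learns to overfit.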

🔗 OpenAI Blog

OpenAI Models and Codex GA on Amazon Bedrock

GPT-5.5, GPT-5.4, and Codex now generally available on Bedrock. Pricing matches OpenAI first-party rates. Usage counts toward existing AWS commitments.

🔗 AWS Blog

Understand-Anything: 48K Stars (+22K/week)

Claude Code plugin that builds a knowledge graph of your codebase. Interactive dashboard. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI.
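Understand-Anything's own implementation isn't covered here. Still, the seed of any codebase knowledge graph is a set of edges between modules. As a minimal Python illustration (a swapped-in technique, not the plugin's method), the standard-library `ast` module can extract import edges without executing any code.

```python
import ast
from pathlib import Path
from typing import Dict, Set


def import_graph(root: Path) -> Dict[str, Set[str]]:
    """Map each Python module under root to the absolute module names it imports.

    Relative imports (from . import x) are skipped to keep the sketch short.
    """
    graph: Dict[str, Set[str]] = {}
    for path in sorted(root.rglob("*.py")):
        # a/b/c.py -> "a.b.c"
        module = ".".join(path.relative_to(root).with_suffix("").parts)
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        deps: Set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                deps.add(node.module)
        graph[module] = deps
    return graph


if __name__ == "__main__":
    # Print the edges for the current directory, one module per line.
    for module, deps in import_graph(Path(".")).items():
        print(module, "->", ", ".join(sorted(deps)))
```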

🔗 GitHub

Tool of the Week: OpenSearch Serverless Next-Gen

Complete re-architecture of OpenSearch Serverless. The old version was "serverless in name only" because of the $300/mo minimum OCU floor. Now it actually scales to zero.

What changed: No minimum floor. 20x faster autoscaling. 60% lower cost vs provisioned. Decoupled compute/storage. Native integrations with Vercel and Kiro.

Why this is the pick: Every builder running vector search for agent memory was paying $300/mo minimum or running a provisioned cluster. Now they can scale to zero. For RAG workloads that spike during business hours and idle overnight, costs drop 70-80%.

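```shell
# Assumes an encryption security policy matching "agent-memory" already exists;
# OpenSearch Serverless requires one before create-collection succeeds.
# DISABLED drops the standby replica (fine for dev/test, not for production).
aws opensearchserverless create-collection \
  --name agent-memory \
  --type VECTORSEARCH \
  --standby-replicas DISABLED
```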

Old vs new: A 10K queries/day RAG workload went from ~$350/mo to ~$45/mo. Dev/test environments drop below $5/mo.
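The old floor is easy to reproduce with a back-of-envelope model. Assume a 2-OCU minimum and $0.24 per OCU-hour; both are assumptions, so check current AWS pricing for your region. Under those assumptions, always-on billing comes to about $350/mo, in line with the "~$350/mo" old figure above.

This Python sketch compares that against billing compute only during business hours. The workload shape (10 h/day, 22 days/month) is hypothetical, and storage costs are ignored.

```python
HOURS_PER_MONTH = 730  # average hours in a month
OCU_HOUR_USD = 0.24    # assumption: per-OCU-hour price; check your region


def always_on_cost(floor_ocus: float) -> float:
    """Old model: the minimum OCU floor is billed every hour, busy or idle."""
    return floor_ocus * OCU_HOUR_USD * HOURS_PER_MONTH


def scale_to_zero_cost(active_ocus: float, active_hours: float) -> float:
    """Scale-to-zero model: compute is billed only for hours with traffic.

    Storage and any per-request charges are ignored in this sketch.
    """
    return active_ocus * OCU_HOUR_USD * active_hours


if __name__ == "__main__":
    old = always_on_cost(2)               # 2-OCU floor, running 24/7
    new = scale_to_zero_cost(2, 10 * 22)  # 2 OCUs, 10 h/day, 22 business days
    print(f"old ${old:.2f}/mo, new ${new:.2f}/mo, savings {1 - new / old:.0%}")
```

With these assumptions the script reports $350.40 vs $105.60, about 70% savings. That is the low end of the 70-80% range quoted above; fewer active hours or fewer active OCUs push the savings higher.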

Paper Breakdown: AutoSci

Memory-Centric Agents for the Full Scientific Research Lifecycle (arXiv)

Core insight: A unified system where agents handle the entire research pipeline with structured persistent memory. The system improves its own procedures over time.

Practical takeaway: Separate memory into three tiers. Episodic (what happened). Procedural (how to do things). Meta (which procedures work best). Each type gets different retrieval strategies.
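The three-tier split maps naturally onto a small data structure. This Python sketch is not AutoSci's implementation. Its retrieval rules are illustrative assumptions, chosen to show that each tier is queried differently:

- episodic: recency
- procedural: keyword overlap
- meta: observed success rate

A production system would likely replace the keyword match with embedding search.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TieredMemory:
    """Hypothetical three-tier agent memory; each tier has its own retrieval rule."""
    episodic: List[str] = field(default_factory=list)          # what happened, oldest first
    procedural: Dict[str, str] = field(default_factory=dict)   # procedure name -> steps
    meta: Dict[str, List[bool]] = field(default_factory=dict)  # procedure name -> outcomes

    def record_event(self, event: str) -> None:
        self.episodic.append(event)

    def recent_events(self, k: int = 3) -> List[str]:
        # Episodic retrieval: the k most recent events.
        return self.episodic[-k:] if k > 0 else []

    def add_procedure(self, name: str, steps: str) -> None:
        self.procedural[name] = steps

    def find_procedures(self, query: str) -> List[str]:
        # Procedural retrieval: any word overlap between the query and a hyphenated name.
        words = set(query.lower().split())
        return [name for name in self.procedural if words & set(name.lower().split("-"))]

    def record_outcome(self, name: str, success: bool) -> None:
        self.meta.setdefault(name, []).append(success)

    def success_rate(self, name: str) -> float:
        outcomes = self.meta.get(name, [])
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def best_procedure(self, candidates: List[str]) -> Optional[str]:
        # Meta retrieval: prefer the procedure with the best observed track record.
        return max(candidates, key=self.success_rate, default=None)
```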

Time saved: 7 min read vs 48 min paper. 6.9x compression.

Hot Take

Anthropic's containment post revealed that users approve 93% of permission prompts without reading them. That's not safety. That's a rubber stamp.

The Google Sheets attack proved it. Human-in-the-loop was enabled. The user clicked "Allow." Their entire Google account got exfiltrated.

Anthropic's own conclusion: containment beats supervision. Make dangerous actions structurally impossible instead of asking politely. The permission prompt era needs to die.
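Containment can be as simple as never handing the model a dangerous tool in the first place. In this Python sketch, the following are hypothetical:

- the tool names
- the `DANGEROUS` set
- the rule "untrusted content in context means no code-generation tools"

The sketch mirrors the shape of OpenAI's Sheets fix (no Apps Script generation at all), not any vendor's actual implementation.

```python
from typing import Callable, Dict

Tool = Callable[[str], str]


def read_cells(cell_range: str) -> str:
    # Hypothetical read-only tool.
    return f"values from {cell_range}"


def generate_apps_script(code: str) -> str:
    # Hypothetical tool that could run code or send data outside the sheet.
    return "script deployed"


ALL_TOOLS: Dict[str, Tool] = {
    "read_cells": read_cells,
    "generate_apps_script": generate_apps_script,
}
DANGEROUS = {"generate_apps_script"}


def tools_for(untrusted_input: bool) -> Dict[str, Tool]:
    """Containment: with untrusted content in context, dangerous tools are simply absent."""
    if untrusted_input:
        return {name: fn for name, fn in ALL_TOOLS.items() if name not in DANGEROUS}
    return dict(ALL_TOOLS)


def dispatch(tools: Dict[str, Tool], name: str, arg: str) -> str:
    # No "Allow?" prompt to rubber-stamp: a tool that was never exposed cannot be called.
    if name not in tools:
        raise PermissionError(f"tool {name!r} is not available in this context")
    return tools[name](arg)
```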

Subscribe free at theagenticengineer.waltsoft.net. Ships every Wednesday.
