I read the repos so you don't have to. Weekly agentic AI intelligence for builders.
TL;DR
- 🔌 Anthropic open-sourced 11 knowledge-work plugins for Claude Cowork. File-based, no code, no build steps. Plugins just became the new moat.
- 🛠️ OpenSearch Serverless Next-Gen kills the $300/mo minimum. Scale-to-zero vector search for agent memory workloads. Tool of the Week.
- 📄 OpenAI case study shows self-improving tax agents went from 25% to 86% accuracy in production in 6 weeks. First real case study of autonomous agent improvement at scale.
The Big One: Anthropic Open-Sources 11 Knowledge Work Plugins
Anthropic just made its clearest move in the plugin wars. Eleven open-source plugins that turn Claude into a domain specialist: sales, legal, finance, data analysis, marketing, customer support, and five more. All file-based. No code. No build steps. Just markdown and JSON.
Each plugin bundles three things: skills (structured instructions), slash commands (quick actions), and MCP connectors (external integrations).
The timing is not coincidental. Cursor shipped its own plugin marketplace the same week with 11 first-party plugins. Two major platforms publishing extensibility specs within days of each other confirms what everyone suspected: the IDE-as-platform shift is here.
Why file-based matters. Most plugin systems require code, build pipelines, package managers. Anthropic's approach is radically simpler. You write a SKILL.md file describing what the agent should know. You write a JSON config pointing to your MCP servers. That's it. A product manager can create a plugin without touching a terminal.
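To make the "markdown plus JSON" idea concrete, here is a minimal Python sketch that writes such a plugin to a temporary directory. Only the SKILL.md file name comes from the description above. The directory layout, the `mcp.json` file name, the `mcpServers` key, the skill name, and the example URL are all assumptions for illustration, so check the repo for the real schema before copying anything.

```python
import json
import tempfile
from pathlib import Path


def scaffold_plugin(root: Path, name: str, mcp_url: str) -> Path:
    """Write a minimal file-based plugin: one skill file plus one JSON config."""
    plugin_dir = root / name
    skill_dir = plugin_dir / "skills" / "qualify-lead"  # hypothetical skill name
    skill_dir.mkdir(parents=True, exist_ok=True)

    # Skill: plain markdown instructions describing what the agent should know.
    (skill_dir / "SKILL.md").write_text(
        "# Qualify a lead\n\n"
        "Ask for budget, timeline, and decision maker before proposing a demo.\n",
        encoding="utf-8",
    )

    # Connector config: JSON pointing at an MCP server.
    # File name and key names are assumed, not taken from the repo.
    config = {"mcpServers": {"crm": {"type": "http", "url": mcp_url}}}
    (plugin_dir / "mcp.json").write_text(
        json.dumps(config, indent=2), encoding="utf-8"
    )
    return plugin_dir


with tempfile.TemporaryDirectory() as tmp:
    plugin = scaffold_plugin(Path(tmp), "sales-demo", "https://example.com/mcp")
    created = sorted(
        p.relative_to(plugin).as_posix() for p in plugin.rglob("*") if p.is_file()
    )
    print(created)
```

The whole "plugin" is two text files. Nothing is compiled or packaged, which is the point the section is making.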
What this means for builders. If you're building tools or SaaS products, your next competitor isn't another startup. It's a Claude plugin that replicates 80% of your functionality in a markdown file. The companies that survive this shift will offer value plugins can't replicate: proprietary data, network effects, and integrations too complex for file-based configuration.
🔗 GitHub: anthropics/knowledge-work-plugins | 18.5K stars (+4,944/week)
Quick Hits
ChatGPT for Google Sheets Exfiltrates Workbooks via Prompt Injection
A single indirect prompt injection hidden in white text in one imported sheet triggers data exfiltration across the victim's entire Google account, even when human approval is explicitly required. OpenAI's fix: remove the model's ability to generate Apps Script entirely.
Anthropic Engineering: How We Contain Claude Across Products
Users approve 93% of permission prompts. Approval fatigue is real. Mythos Preview was deemed too dangerous to ship in April. Containment beats supervision.
OpenAI: Self-Improving Tax Agents with Codex
First real case study of agents that get better autonomously in production. 25% to 86% accuracy in 6 weeks via practitioner feedback, production traces, and Codex-driven iteration.
OpenAI Models and Codex GA on Amazon Bedrock
GPT-5.5, GPT-5.4, and Codex now generally available on Bedrock. Pricing matches OpenAI first-party rates. Usage counts toward existing AWS commitments.
🔗 AWS Blog
Understand-Anything: 48K Stars (+22K/week)
Claude Code plugin that builds a knowledge graph of your codebase. Interactive dashboard. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI.
🔗 GitHub
Tool of the Week: OpenSearch Serverless Next-Gen
Complete re-architecture of OpenSearch Serverless. The old version was "serverless in name only" because of the $300/mo minimum OCU floor. Now it actually scales to zero.
What changed: No minimum floor. 20x faster autoscaling. 60% lower cost vs provisioned. Decoupled compute/storage. Native integrations with Vercel and Kiro.
Why this is the pick: Every builder running vector search for agent memory was paying $300/mo minimum or running a provisioned cluster. Now they can scale to zero. For RAG workloads that spike during business hours and idle overnight, costs drop 70-80%.
# The AWS CLI service namespace is "opensearchserverless" (no hyphen).
aws opensearchserverless create-collection \
  --name agent-memory \
  --type VECTORSEARCH \
  --standby-replicas DISABLED
Old vs new: A 10K queries/day RAG workload went from ~$350/mo to ~$45/mo. Dev/test environments drop below $5/mo.
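A quick sanity check on those numbers, as a small Python sketch. It uses only the two monthly figures quoted above. The function name is made up for illustration.

```python
def percent_reduction(old_cost: float, new_cost: float) -> float:
    """Return the percentage drop from old_cost to new_cost."""
    if old_cost <= 0:
        raise ValueError("old_cost must be positive")
    return (old_cost - new_cost) / old_cost * 100


old_monthly = 350  # approximate figure quoted above, USD per month
new_monthly = 45   # approximate figure quoted above, USD per month

monthly_saving = old_monthly - new_monthly
rag_drop = percent_reduction(old_monthly, new_monthly)

print(f"Saves ${monthly_saving}/mo, ${monthly_saving * 12}/yr ({rag_drop:.1f}% lower)")
```

That works out to roughly an 87% reduction, a bit above the 70-80% range quoted for spiky RAG workloads, so treat both figures as ballpark estimates and run the numbers against your own traffic.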
Paper Breakdown: AutoSci
Memory-Centric Agents for the Full Scientific Research Lifecycle | arXiv
Core insight: A unified system where agents handle the entire research pipeline with structured persistent memory. The system improves its own procedures over time.
Practical takeaway: Separate memory into three tiers. Episodic (what happened). Procedural (how to do things). Meta (which procedures work best). Each type gets different retrieval strategies.
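Here is a minimal Python sketch of that three-tier split. It is not the paper's implementation. The class name, the field layout, and the three retrieval strategies (recency for episodic, exact lookup for procedural, success rate for meta) are assumptions chosen to show how each tier can be queried differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentMemory:
    # Episodic: what happened, as (task, procedure, succeeded) in time order.
    episodic: list = field(default_factory=list)
    # Procedural: how to do things, keyed by procedure name.
    procedural: dict = field(default_factory=dict)
    # Meta: which procedures work best, as [successes, attempts] per procedure.
    meta: dict = field(default_factory=dict)

    def record_run(self, task: str, procedure: str, succeeded: bool) -> None:
        """Log one run in episodic memory and update the meta statistics."""
        self.episodic.append((task, procedure, succeeded))
        stats = self.meta.setdefault(procedure, [0, 0])
        stats[0] += int(succeeded)
        stats[1] += 1

    def recent_episodes(self, n: int = 3) -> list:
        """Episodic retrieval (assumed strategy): most recent first."""
        if n <= 0:
            return []
        return self.episodic[-n:][::-1]

    def steps_for(self, procedure: str) -> list:
        """Procedural retrieval (assumed strategy): exact lookup by name."""
        return self.procedural.get(procedure, [])

    def best_procedure(self) -> Optional[str]:
        """Meta retrieval (assumed strategy): highest observed success rate."""
        if not self.meta:
            return None
        return max(self.meta, key=lambda k: self.meta[k][0] / self.meta[k][1])


memory = AgentMemory()
memory.procedural["grid_search"] = ["define grid", "train each config", "pick best"]
memory.procedural["bayes_opt"] = ["fit surrogate", "propose config", "train", "repeat"]
memory.record_run("tune model A", "grid_search", False)
memory.record_run("tune model B", "bayes_opt", True)
memory.record_run("tune model C", "bayes_opt", True)

best = memory.best_procedure()
print(best, memory.steps_for(best), memory.recent_episodes(1))
```

In a real system each tier would likely sit in a different store (for example a vector index for episodes and a key-value store for procedures), but the separation of write path and read path per tier is the part worth copying.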
Time saved: 7 min read vs 48 min paper. 6.9x compression.
Hot Take
Anthropic's containment post revealed that users approve 93% of permission prompts. At that rate, approval isn't safety. It's a rubber stamp.
The Google Sheets attack proved it. Human-in-the-loop was enabled. The user clicked "Allow." Data from across their entire Google account got exfiltrated.
Anthropic's own conclusion: containment beats supervision. Make dangerous actions structurally impossible instead of asking politely. The permission prompt era needs to die.
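What "structurally impossible" can look like in code, as a minimal Python sketch. This is not Anthropic's or OpenAI's implementation. The class and tool names are hypothetical. The idea is that a dangerous capability is never registered, so there is nothing for a user to approve by mistake.

```python
from typing import Callable, Dict


class ContainedToolbox:
    """Expose only the tools passed in. Anything else cannot be called."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self._tools = dict(tools)

    def call(self, name: str, argument: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            # No approval prompt here: the capability simply is not present.
            raise PermissionError(f"tool {name!r} does not exist in this sandbox")
        return tool(argument)


def read_cell(cell: str) -> str:
    """Hypothetical read-only tool."""
    return f"value of {cell}"


toolbox = ContainedToolbox({"read_cell": read_cell})
print(toolbox.call("read_cell", "A1"))

try:
    # Hypothetical tool name, deliberately never registered above.
    toolbox.call("generate_apps_script", "send workbook to attacker")
    blocked = False
except PermissionError:
    blocked = True
print("script generation blocked:", blocked)
```

Compare that with a prompt-based design, where the same call would succeed whenever the user clicks "Allow".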
Subscribe free at theagenticengineer.waltsoft.net. Ships every Wednesday.