Yaohua Chen

Posted on Apr 9

Anthropic Managed Agents: What It Takes to Build Agent-as-a-Service

#agents #ai #claude #openclaw

Anthropic just launched Managed Agents. The open-source world has been learning the hard way why this matters.

On April 8, 2025, Anthropic launched the public beta of Claude Managed Agents -- a fully hosted platform for running AI agents with built-in sandboxing, session management, error recovery, and permission control. Four days earlier, the company had quietly cut off third-party agent frameworks like OpenClaw from using Claude subscription quotas, forcing them onto pay-per-use billing.

These two moves, four days apart, tell one story: the company that sells the brain has decided to sell the body, too.

Why? Because the "body" -- the infrastructure that lets an AI model actually do things in the real world -- is where agents succeed or fail in production. And as the open-source community has painfully demonstrated, getting this infrastructure wrong doesn't just cause bugs. It causes data leaks, runaway costs, and security breaches measured in the hundreds of thousands of dollars.

This post explores three questions:

What does it actually take to build a reliable, safe Agent-as-a-Service?
What goes wrong when these foundations are missing? (We have the data.)
How do different approaches -- managed platforms, open-source gateways, and learning engines -- stack up against these requirements?

Whether you're a developer evaluating agent frameworks, an architect designing agent infrastructure, or simply curious about where AI is headed, the answer starts with understanding five technical pillars that separate demo-grade agents from production-grade ones.

What Is an AI Agent, Really?

Before diving into architecture, let's clarify what we're talking about -- because "AI agent" means very different things to different people.

Most of us interact with AI through chat interfaces: you type a question, the model answers. That's a model -- a brain in a jar. Extremely intelligent, but it can't do anything. It can't browse your files, run code, send emails, or check your calendar. It just thinks and talks.

An agent is what happens when you give that brain a body.

Anthropic's engineering team describes this with a vivid metaphor: the model is the brain; the harness is the limbs plus the nervous system. The brain decides what to do. The harness actually does it -- calling tools, managing context, handling errors, keeping things running.

In practice, an agent system has three core components:

Think of it this way:

The Session is the agent's notebook -- the log of everything that's happened. If the agent crashes, this is how it remembers where it left off.
The Harness is the nervous system -- the loop that calls the AI model, routes tool calls, handles errors, and decides what to do next.
The Sandbox is the workshop -- the isolated environment where the agent actually runs code and performs actions, separated from your sensitive data and credentials.

When you use ChatGPT or Claude in a chat window, you're talking to the brain. When companies deploy agents that write code, manage workflows, or process documents autonomously, they need all three components working in concert.

And that's where things get interesting -- and dangerous.

When Agents Go Wrong: Lessons from OpenClaw

OpenClaw is one of the most popular open-source agent frameworks -- the fastest-growing repo in GitHub history, surpassing 350,000 stars in under three months -- with a thriving community of over 1,000 contributors. It's powerful, flexible, and genuinely useful. It's also a case study in what happens when agent infrastructure doesn't get the fundamentals right.

A security audit conducted by researchers at Shanghai University of Science and Technology and the Shanghai AI Lab put OpenClaw through 34 standardized test cases. The results should give anyone building agent services pause.

Metric	Result
Overall safety pass rate	58.9%
Intent misunderstanding & unsafe assumptions	0% pass rate
Prompt injection robustness	57%
Unexpected results under open-ended objectives	50%

(The audit used MiniMax M2.1 as the default model. Results may vary with other models, but the failure patterns -- particularly around architecture and permission design -- are model-agnostic.)

That 0% pass rate on intent misunderstanding is worth lingering on. In every single test with an ambiguous instruction, the agent filled in the blanks on its own and executed immediately. It never once asked the user for confirmation.

Industry-wide monitoring data paints an even more alarming picture:

230,000+ OpenClaw instances detected exposed on the public internet
Approximately 87,800 instances with data leaks
Approximately 43,000 instances with personal identity information exposed
36.8% of skills on the ClawHub marketplace contained security flaws
Over 1,000 skills contained malicious payloads
A CVSS 8.8 high-severity vulnerability enabling remote computer takeover

Cisco's assessment was blunt: "OpenClaw's security issues aren't configuration problems -- they're architecture problems."

OpenClaw's own documentation concedes the point: There is no "perfectly secure" setup.

Why Do These Failures Happen?

These aren't random bugs. They trace back to four systemic root causes -- each one a missing piece of agent infrastructure:

1. Context Compression Drops Safety Rails. When the information volume gets too large, the agent compresses its memory. During compression, it can squeeze out critical safety instructions -- the very guardrails meant to keep it in check. Imagine an air traffic controller under extreme stress who starts skipping safety checklists. That's context compression in action.

2. Execute First, Ask Never. The default behavior strategy leans toward "do it first, explain later" rather than "ask clearly first." For every ambiguous instruction in the security audit, the agent guessed the user's intent and acted immediately. Zero confirmation. Zero pause.

3. Prompt Injection Walks Through the Front Door. Malicious content embedded in inputs can trick the agent into bypassing safety mechanisms entirely. With a 57% robustness rate, nearly half of all injection attempts succeed. That's not a bug in one feature -- it's a gap in the security boundary.

4. The Agent Has the Keys to the Kingdom. OpenClaw runs with the same system permissions as the user who launched it. It can read, write, and delete anything the user can. Combine this with the injection vulnerability above, and an attacker doesn't need to hack your system -- they just need to convince the agent to do it for them.

These aren't problems unique to OpenClaw. They're the universal challenges of Agent-as-a-Service. Any framework, any platform, any team building agents will face these same four failure modes -- unless they're addressed at the architectural level.

Which brings us to the technologies that actually matter.

The 5 Pillars of Effective and Safe Agent Services

Anthropic has published 15 engineering blog posts over the past two years, documenting their approach to building production-grade agents. Distilled into a learning path, they form a capability pyramid -- a stack of technologies and practices that builds from foundation to production readiness:

Each pillar directly addresses one of the failure modes we saw with OpenClaw:

Let's walk through them.

Pillar 1: Foundation Architecture -- Know When NOT to Use an Agent

The OpenClaw failure it addresses: Execute first, ask never.

The most important architectural decision is also the most counterintuitive: start simple, and don't use an autonomous agent when a well-defined workflow will do.

Anthropic's foundational guidance, laid out in "Building effective agents," distinguishes between workflows and agents. A workflow is a predefined sequence of steps with clear decision points. An agent is an autonomous system that decides its own next steps. The difference matters enormously.

The execute-first problem in OpenClaw stems from a fundamental architectural choice: giving the agent full autonomy over ambiguous tasks without building in confirmation gates. In workflow-based architectures, ambiguous steps trigger explicit checkpoints -- the system asks the user before proceeding. In purely autonomous architectures, the agent fills in blanks and acts.

For practitioners, the key patterns here are:

ReAct (Reasoning + Acting): The agent reasons about what to do, takes an action, observes the result, and then reasons again before the next step.
Planning: The agent creates a plan before execution, allowing for human review of the intended steps.
Human-in-the-loop gates: Critical actions require explicit approval before execution.

The rule of thumb: if a task has clear inputs and outputs, use a workflow. If it requires judgment under uncertainty, use an agent -- but with confirmation gates for high-risk actions.

For Practitioners: Read "Building effective agents" and "Building agents with the Claude Agent SDK" on Anthropic's Engineering Blog.

Pillar 2: Tool Capabilities -- Think Before You Act

The OpenClaw failure it addresses: Reckless execution without reasoning.

An agent is only as good as its tools -- and more importantly, how it decides to use them. Tool description design directly affects how well an agent selects and invokes the right tool at the right time. A vague tool description leads to misuse; a precise one guides the agent toward correct behavior.

But the real breakthrough in this space is Anthropic's Think Tool -- a technique that lets agents perform chain-of-thought reasoning before taking any action. Instead of immediately executing, the agent pauses, reasons through its options, considers edge cases, and only then acts.

This is the direct antidote to "execute first, ask later." The Think Tool essentially gives the agent an internal monologue: "Wait -- is this instruction ambiguous? What are the possible interpretations? Which one is most likely? Should I ask for clarification?"

In practice, the Think Tool significantly improves performance on complex reasoning tasks, especially those involving:

Ambiguous instructions with multiple valid interpretations
Multi-step tasks where an early mistake compounds
Tasks requiring judgment about when to ask for help

Beyond the Think Tool, production-grade tool systems need Agent Skills -- reusable, encapsulated capabilities that an agent can invoke like a professional using standardized procedures. Skills turn one-off problem-solving into repeatable expertise.

For Practitioners: Read "The 'think' tool," "Writing effective tools for agents -- with agents," and "Equipping agents for the real world with Agent Skills" on Anthropic's Engineering Blog.

Pillar 3: Context Engineering -- Memory That Doesn't Lose the Plot

The OpenClaw failure it addresses: Context compression dropping safety instructions.

Even as AI model context windows expand to hundreds of thousands of tokens, context engineering remains critical. A larger window doesn't solve the fundamental problem: the model's attention is a scarce resource, and what you put into the context window -- and how you structure it -- determines whether the agent remembers its safety instructions or forgets them under load.

Context compression losing safety rails is not a theoretical risk. It's a documented failure mode. See the Analyzing the Incident of OpenClaw Deleting Emails: A Technical Deep Dive for more details. When the information volume exceeds what the system can handle, something gets squeezed out. In OpenClaw's case, that "something" was often the safety guardrails themselves.

The solution isn't just "bigger context windows." It's context engineering -- the deliberate management of what goes into the agent's working memory, when, and in what form.

Key techniques include:

Memory management: Explicitly structuring what the agent remembers across turns and sessions, rather than relying on raw conversation history.
RAG (Retrieval-Augmented Generation): Instead of cramming everything into the context window, retrieve only the information relevant to the current task. This keeps the context focused and prevents safety instructions from being crowded out.
Contextual Retrieval: An innovation from Anthropic where the model generates explanatory context before retrieval, solving the classic RAG problem of chunk-level information loss.

An emerging open-source approach tackles this from a different angle. MemPalace (33K+ GitHub stars) takes the position that the problem isn't what the AI remembers -- it's what it forgets when memory gets compressed. Instead of having the AI decide what's worth keeping (and risk discarding safety instructions), MemPalace stores everything verbatim and uses a structured navigation system -- inspired by the ancient Greek memory palace technique -- to make it findable without loading it all into context.

The architecture is a layered memory stack that directly addresses context pressure:

Layer	What it holds	Size
L0	Identity -- who is this AI?	~50 tokens
L1	Critical facts -- team, projects, preferences	~120 tokens
L2	Room recall -- recent sessions, current topic	On demand
L3	Deep search -- semantic query across all stored memories	On demand

The agent wakes up with only ~170 tokens (L0 + L1) and searches deeper layers only when needed. This keeps the context window lean and focused. Memories are organized into "wings" (projects/people), "rooms" (topics), and "halls" (memory types like decisions, events, discoveries), with "tunnels" cross-referencing the same topic across domains. This structured retrieval scored 96.6% recall on the LongMemEval benchmark -- the highest published result for a free, local-only system with zero API calls.

Critically for the context compression problem, MemPalace includes a PreCompact hook that fires before the context window is compressed, performing an emergency save of the current session. This is a direct architectural response to the failure mode that caused the Meta email deletion incident: if the agent's safety instructions live only in the context window, they can be summarized away. MemPalace externalizes memory so that compression never touches what matters.

The principle: treat the context window like a surgeon's tray, not a junk drawer. Every token should earn its place. Safety instructions should be architecturally pinned, not left to compete with task data for the model's attention.

For Practitioners: Read "Effective context engineering for AI agents" and "Introducing Contextual Retrieval" on Anthropic's Engineering Blog. For an open-source, local-first approach to structured memory, see MemPalace.

Pillar 4: Long Tasks & Collaboration -- Surviving the Marathon

The OpenClaw failure it addresses: No state recovery, runaway execution.

Demo agents handle single-turn tasks. Production agents run for minutes, hours, or days. The difference is enormous.

A long-running agent needs what Anthropic calls a harness -- an execution framework designed for durability. The harness handles what happens when things go wrong: network interruptions, model errors, infinite loops, context window exhaustion. Without a harness, a long-running agent is a ticking time bomb -- one crash and all progress is lost.

The core capabilities a harness must provide:

State persistence: If the agent crashes, it can resume from where it left off, not from scratch.
Interruption recovery: External disruptions (network outages, API rate limits, user cancellation) are handled gracefully.
Loop detection: The agent recognizes when it's stuck in a cycle and breaks out, rather than burning tokens endlessly.
Resource budgets: Hard limits on tokens, time, and API calls prevent runaway costs.

For complex tasks that exceed what a single agent can handle, the Orchestrator-Workers pattern distributes work across multiple agents coordinated by a central orchestrator. This is how Anthropic built their own multi-agent research system -- one agent plans, others execute specialized subtasks, and the orchestrator synthesizes results.

The practical implication: if your agent can run for more than a few minutes, you need a harness. If it can run unsupervised, you need budgets and kill switches. The users who discovered their OpenClaw instances burning money wildly learned this lesson the hard way.

But a harness alone isn't enough. A long-running agent can stay alive, recover from crashes, and stay within budget -- and still silently degrade in quality over time. This is where continuous evaluation becomes essential. Anthropic's guide on defining success criteria and building evaluations lays out a disciplined framework that applies directly to long-running agent services.

The key insight: success criteria for agents must be specific, measurable, achievable, and relevant -- not vague goals like "performs well." For a long-running agent, this means defining quantitative thresholds upfront: What is the acceptable error rate per 10,000 actions? What is the maximum response latency? What percentage of edge cases must be handled without human intervention?

The framework distinguishes three grading methods, ranked by preference:

Code-based grading -- fastest, most reliable. Exact match, string match, programmatic checks. Use this wherever possible.
LLM-based grading -- fast and flexible, suitable for complex judgments like tone, coherence, and context utilization. Requires clear rubrics and validated reliability before scaling.
Human grading -- most flexible but slowest. Avoid for ongoing monitoring; reserve for calibrating automated methods.

For long-running agents specifically, the context utilization evaluation is critical: it measures whether the agent is still coherently using information from earlier in the conversation, which is exactly the capability that degrades under context pressure. The consistency evaluation catches drift -- if the agent starts giving different answers to semantically similar questions over time, something has gone wrong. And privacy preservation evaluations can detect when an agent starts leaking sensitive information that it should be filtering, a risk that compounds the longer an agent runs with accumulated context.

The principle that ties this back to the harness: a harness keeps the agent running; evaluations tell you whether it's still running correctly. Loop detection catches infinite cycles. Evals catch silent quality degradation. You need both.

For Practitioners: Read "Effective harnesses for long-running agents," "How we built our multi-agent research system," and "Code execution with MCP" on Anthropic's Engineering Blog. For evaluation methodology, see Anthropic's Define success criteria and build evaluations guide.

Pillar 5: Safety, Evaluation & Monitoring -- The Last Mile

The OpenClaw failure it addresses: Excessive permissions, prompt injection, no production safeguards.

This is the pillar where most teams skip steps -- and where the consequences are most severe. The numbers from OpenClaw tell the story: 230,000 exposed instances, 87,800 data leaks, a CVSS 8.8 remote code execution vulnerability.

Three practices are non-negotiable for production agents:

Sandboxing. When an agent can execute code, it must do so in an isolated environment that cannot access credentials, sensitive files, or system-level permissions. OpenClaw runs with the user's full system permissions. Anthropic's Managed Agents architecture puts the sandbox in a separate container that can never touch credentials -- authentication goes through a vault proxy, and the harness itself has zero awareness of any credentials.

Least privilege. The agent should have exactly the permissions it needs for the current task, and no more. Permissions should be granted per-task and revoked when the task completes. Standing permissions are standing risks.

Evaluations (Evals). Anthropic's guidance is unambiguous: without evals, don't go live. An automated evaluation system that tests agent behavior against known scenarios -- including adversarial ones like prompt injection -- is the only way to know whether your agent is safe before it touches production data. Relying on manual testing or intuition is not engineering; it's hope.

The difference between OpenClaw's 57% prompt injection robustness and a production-grade system isn't just better prompting -- it's architectural. Security must be designed into the boundary between components, not bolted on as a configuration option.

For Practitioners: Read "Demystifying evals for AI agents," "Beyond permission prompts: Claude Code sandboxing," and "A postmortem of three recent issues" on Anthropic's Engineering Blog.

Anthropic's Answer: The Operating System Approach

With the 5 pillars as context, Anthropic's Managed Agents architecture comes into sharper focus. It's not just a hosting service -- it's a deliberate embodiment of these principles.

Separating Session, Harness, and Sandbox

The core design decision is to thoroughly separate three components that most agent frameworks cram into a single container:

Component	Role	Analogy
Session	The log of what happened	The agent's notebook
Harness	The loop of calling Claude and routing tool calls	The nervous system
Sandbox	The execution environment where code runs	The workshop

Previously, all three lived in one container. If it crashed, the session was lost. Engineers had to babysit. Anthropic calls this the "pets" model -- each container is precious, irreplaceable, and needs constant attention.

After separation, containers become "cattle." If one dies, spin up a new one. The session is stored externally. The harness resumes via wake(sessionId), reads the event log, and continues running. Any component can crash or be replaced independently.

Think of it like a restaurant kitchen. The "pets" model is a restaurant with one chef who does everything -- if that chef gets sick, the restaurant closes. The "cattle" model is a kitchen brigade: prep cooks, line cooks, and a head chef, each replaceable. The recipes (session) are written down. The process (harness) is standardized. The cooking stations (sandbox) are interchangeable.

Security by Architecture

The security redesign directly addresses the "keys to the kingdom" problem:

Old design: Agent-generated code and system credentials ran in the same container. A prompt injection only needed to convince the model to read its own environment variables to steal tokens.
New design: The sandbox can never touch credentials. Authentication goes through a vault proxy. The harness has zero awareness of any credentials.

This isn't a configuration toggle. It's a boundary enforced by the architecture itself.

Performance Results

The performance impact of this separation is dramatic:

p50 (median) time-to-first-token latency dropped 60%
p95 (tail) time-to-first-token latency dropped over 90%

Separating concerns doesn't just improve reliability -- it improves speed. When the harness doesn't have to manage the sandbox's lifecycle, it can focus on what it does best: routing model calls.

The OS Analogy

Anthropic draws a comparison to operating systems: an OS virtualizes hardware into stable abstractions -- "processes," "files," "sockets" -- that outlast any generation of hardware. The read() system call worked on 1970s disk drives and works on today's SSDs.

Managed Agents does the same thing for agents: virtualizing core components into stable interfaces, so upper-level logic doesn't break when the model gets smarter or the framework evolves. Every model generation makes some harness code obsolete -- Anthropic calls this the "structural dilemma of the harness industry." Their solution is to own the interface and let the implementation evolve underneath.

Early Adoption

The approach is already in production:

Notion integrated agents into its workspaces, supporting dozens of concurrent tasks.
Rakuten deployed department-specific agents (product, sales, finance, HR) within a week, connected to Slack and Teams.
Sentry has agents automatically writing bug-fix patches and opening PRs -- an integration originally estimated at months that went live in weeks.

Open Source Still Matters: Two Paths Forward

Managed Agents is Anthropic's answer. But the open-source world offers two genuinely different alternatives -- and understanding the contrast reveals what "agent value" actually means.

OpenClaw: The Platform Path

OpenClaw's core logic is that of a platform or gateway. Think of it as a dispatch center. It unifies chat entry points -- Telegram, Slack, Discord, WhatsApp -- connects different models, different tools, and different workflows. It's a multi-channel personal assistant operating system.

This direction has real value. People's information entry points are inherently scattered. Whoever can unify those entry points gets closer to being a truly usable personal AI hub.

OpenClaw's strength: Integration, distribution, ecosystem, platform coverage.

OpenClaw's weakness: The security model relies on trust and configuration auditing. As Cisco noted, the issues are architectural, not configurational. The ClawHub skill marketplace -- with 36.8% of skills containing security flaws -- demonstrates what happens when a platform grows faster than its safety infrastructure.

Hermes Agent: The Growth Path

Hermes Agent starts from a fundamentally different premise. It doesn't deny the importance of integration, but what it truly emphasizes is: will this agent accumulate capability over long-term use?

Where OpenClaw cares about how an agent connects to the world, Hermes cares about how an agent continuously evolves within the world.

Hermes's most distinctive capability is its learning loop. After completing a task, the agent doesn't just finish -- it distills the process into a structured Skill, a reusable method template. The next time it encounters a similar problem, it invokes that crystallized experience instead of starting from scratch.

Its memory architecture goes beyond storing chat history:

Layer	What It Stores
Layer 1	Who you are -- persistent background context
Layer 2	What you've done -- full history, recalled on demand
Layer 3	How to do similar things better -- skills extracted from experience

This is "user model + task model + method library" -- the architecture of a long-term partner, not a one-shot tool.

On security, Hermes takes a markedly different approach from OpenClaw, implementing five-layer defense-in-depth:

User authorization
Dangerous command review
Container isolation
Credential filtering
Context injection scanning with auto-reject on timeout

Compare this to OpenClaw's trust-plus-configuration model, and the architectural gap is clear.

Three Philosophies, One Set of Challenges

	OpenClaw	Hermes Agent	Anthropic Managed Agents
Philosophy	Gateway / Platform	Growth Engine	Operating System
Core Value	Connection	Accumulation	Abstraction
Security Model	Trust + config	Defense-in-depth	Architecture-level isolation
Best For	Multi-channel hubs	Long-term projects	Enterprise production
Trade-off	Breadth over safety depth	Newer, smaller ecosystem	Vendor lock-in

The choice isn't "managed vs. open-source." It's which design philosophy matches your use case -- and whether the 5 pillars are addressed regardless of which path you take.

Principles Over Frameworks

Tools change. Frameworks rise and fall. Model capabilities leap forward every few months, turning yesterday's clever harness code into tomorrow's technical debt.

But the engineering principles endure:

Start with workflows, graduate to agents. Don't give autonomy before you've built confirmation gates.
Make the agent think before it acts. Chain-of-thought reasoning is not optional for production systems.
Treat context like a scarce resource. Pin safety instructions architecturally; don't let them compete with task data for attention.
Design for crashes, not just success. State persistence, interruption recovery, and resource budgets are production requirements, not nice-to-haves.
Security is architecture, not configuration. If your agent and your credentials share a container, you don't have a security model -- you have a vulnerability.

These five pillars matter whether you use Anthropic's Managed Agents, OpenClaw, Hermes Agent, or build your own infrastructure from scratch.

Anthropic's engineering blog ends with a statement that reads like technical humility:

"We have opinions about the form of the interface, but we don't have opinions about what specific harness Claude will need in the future."

But the precondition for saying this is that they've already taken control of the interface itself. The interface -- the 5 pillars, the stable abstractions -- is what endures. The implementation is what evolves.

For those of us building with agents, the lesson is the same one software engineering has taught for decades: invest in the interfaces, not the implementations. The frameworks will change. The principles won't.

References

Sources referenced in this post, organized by topic. Anthropic's 15 engineering blog posts are listed by module; reading them in order provides a structured path from agent fundamentals to production readiness.

Security Research

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw) -- Tianyu Chen et al., ShanghaiTech University & Shanghai AI Lab. The trajectory-centric security evaluation referenced in this post, covering six risk dimensions of OpenClaw's agentic behavior (arXiv:2602.14364).

Context Compression & Safety Instruction Loss

Analyzing the Incident of OpenClaw Deleting Emails: A Technical Deep Dive -- John Ding. How Meta AI Safety Director Summer Yue's "don't action until I tell you" instruction was lost during context compaction, causing 200+ email deletions.
Why AI Agents Fail: Context Compaction Explained -- Let's Data Science. Covers the Meta incident, CVE-2026-25253, and the broader context compaction failure pattern.
Why AI Agents Bypass Human Approval: Lessons from Meta's Rogue Agent Incidents -- Waxell. Architectural analysis of why prompt-based human-in-the-loop fails under context pressure and why infrastructure-layer enforcement is needed.
safeguard compaction fails to recover when context significantly exceeds model limit -- OpenClaw GitHub Issue #5357. Documents compaction failure when context exceeds token limits by more than 20%.
Default compaction mode silently fails on large contexts -- OpenClaw GitHub Issue #7477. Documents silent summarization failure producing "Summary unavailable" instead of preserving conversation history.

Open-Source Memory Systems

MemPalace -- Milla Jovovich & Ben Sigman. Local-first, structured AI memory system using a palace metaphor (wings, rooms, halls, tunnels) with verbatim storage and semantic search. 96.6% recall on LongMemEval with zero API calls. Includes PreCompact hooks to save memory before context compression.

Evaluation & Testing

Define success criteria and build evaluations -- Anthropic. Official guide on designing measurable success criteria and automated evaluation systems for LLM-based applications, with code examples for exact match, cosine similarity, ROUGE-L, LLM-based Likert scale, and binary classification grading.

Managed Agents Announcement

Managed Agents -- Anthropic's engineering deep-dive on the architecture behind Claude Managed Agents.

Module 1: Foundation Architecture

Building effective agents -- Agent architecture introduction: workflows vs. autonomous agents, ReAct, Tool Use, Planning.
Building agents with the Claude Agent SDK -- Practical getting started with the Agent SDK.

Module 2: Tools & Capability Extension

Introducing advanced tool use -- Advanced tool usage: parallelism, barriers, and error handling.
Writing effective tools for agents -- with agents -- Tool design principles and best practices.
The "think" tool -- Teaching agents to stop and reason before acting.
Equipping agents for the real world with Agent Skills -- Skill encapsulation and reuse.

Module 3: Context & Memory Management

Effective context engineering for AI agents -- Managing the agent's memory and attention across long conversations.
Introducing Contextual Retrieval -- Making RAG more context-aware to reduce chunk-level information loss.

Module 4: Long Tasks & Multi-Agent Collaboration

Effective harnesses for long-running agents -- Designing reliable execution frameworks with interruption recovery and state persistence.
How we built our multi-agent research system -- Anthropic's practical experience with multi-agent architecture.
Code execution with MCP -- Agent execution environment design using the Model Context Protocol.

Module 5: Safety, Evaluation & Engineering

Demystifying evals for AI agents -- Evaluation system design for agent behavior.
Beyond permission prompts: Claude Code sandboxing -- From permission prompts to sandbox isolation.
Claude Code: Best practices for agentic coding -- Engineering best practices for coding agents.
A postmortem of three recent issues -- Real-world agent incident case studies.

DEV Community