Nicolas Fränkel

Posted on May 7 • Originally published at blog.frankel.ch

Designing a team of agents

#ai #agents #agentskills

The 8 levels of agentic engineering maturity

I continue to experiment with AI in the context of software engineering. I'm fortunate that my team supports me in exploring different ways to improve our daily work. This week, I designed a team of autonomous agents to implement features, from design to implementation.

Why autonomous agents?

A long time ago, we were delighted when the IDE offered auto-completion. In the past two years, things have changed. A lot.

Coding assistants have become our primary interfaces for coding. We still use IDEs, at least I do. Yet, I had an IDE licensing issue two weeks ago, and I continued to code even without it. The assistant automatically compiles and tests after every change. While it was forced on me, I believe it could be a valuable test for seasoned programmers: can you replace your IDE with your coding assistant, or are they complementary?

That being said, chatting with your assistant is but one step on the AI maturity ladder. In The 8 Levels of Agentic Engineering, the author lists the following levels:

Levels 1 & 2: Tab Complete and Agent IDE
Level 3: Context Engineering
Level 4: Compounding Engineering
Level 5: MCP and Skills
Level 6: Harness Engineering & Automated Feedback Loops
Level 7: Background Agents
Level 8: Autonomous Agent Teams

Claude Code's experimental Agent Teams feature is an early implementation: multiple instances work in parallel on a shared codebase, where teammates operate in their own context windows and communicate directly with each other. Anthropic used 16 parallel agents to build a C compiler from scratch that can compile Linux. Cursor ran hundreds of concurrent agents for weeks to build a web browser from scratch and migrate their own codebase from Solid to React.

Obviously, I have neither the resources nor the know-how to tackle such a huge undertaking. However, I wanted to design a team to handle smaller tasks.

Subagents

I recently wrote about subagents.

The Claude Code documentation lists several benefits of using subagents:

Preserve context by keeping exploration and implementation out of your main conversation

Enforce constraints by limiting which tools a subagent can use

Reuse configurations across projects with user-level subagents

Specialize behavior with focused system prompts for specific domains

Control costs by routing tasks to faster, cheaper models like Haiku

— Create custom subagents

Claude Code provides several built-in subagents: explore, plan, general purpose, status line, and Claude Code guidelines. You can read more about each of them in the documentation.

However, and this is where it gets interesting, you can define a specialized subagent through a dedicated Markdown file with specific front matter in a .claude/agents folder. The front matter defines a name, a description, a model, and a list of available tools. The body describes the subagent's purpose, i.e., its instructions. Here's a sample:

---
name: code-reviewer
description: "Reviews code for quality and best practices"
tools: Read, Glob, Grep
model: sonnet
---

You are a code reviewer. When invoked, analyze the code and provide
specific, actionable feedback on quality, security, and best practices.

— Write subagent files

The team design

I asked Claude Code to come up with the team design. It created a plan that included five subagents: planner, challenger, coder, tester, and documenter. Their names are pretty self-descriptive, but I'll come back to them later. In the meantime, I read that the optimal number of agents in a team is between three and five: I removed the documenter.

"Regular" subagents do their tasks autonomously, but then come back to the main agent. Subagents in teams communicate with each other directly.

After defining agents, you need to specify how subagents communicate with each other to accomplish a task. You describe such interactions in a skill, which Claude also created for me. Here's a very simplified model.

States represent agent responsibilities, each defined in its respective agent file, while interactions represent communication between agents, defined in the skill. Note that I added extra communication constraints within agent descriptions. I work as usual with Claude Code. The only difference comes when it's time to implement: instead of telling it to proceed, I call the /implement skill from the command line.
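To make the idea of a skill that wires agents together more concrete, here is a minimal sketch. It assumes the skill lives in a file such as .claude/skills/implement/SKILL.md and uses the same front matter convention as the agent file above. The numbered steps are hypothetical: they only mirror the four agents and the PLAN.md hand-off, and are not the actual skill Claude generated.

```markdown
---
name: implement
description: "Hands the agreed feature to the planner, challenger, coder, and tester agents"
---

<!-- Hypothetical workflow for illustration; the real skill is not shown in this post. -->

1. Ask the planner to write the implementation plan to PLAN.md.
2. Ask the challenger to review PLAN.md and send objections directly to the planner.
3. Wait until the plan is approved, then ask the coder to implement it.
4. Ask the tester to write and run tests, reporting failures directly to the coder.
```

Keeping the hand-offs in the skill and the responsibilities in the agent files means either side can change without rewriting the other.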

Here's how it looks (at the moment) in the console:

4 tasks (0 done, 4 open)
  ◻ Approve plan › blocked by #3
  ◻ Implement merged CSV hierarchy changes › blocked by #1
  ◻ Plan: materialize merged CSVs with new hierarchy across both repos
  ◻ Write tests for merged CSV changes › blocked by #2

Team agents beyond marketing

Agent teams are amazing, but they come with issues.

The biggest one is the tension between autonomy and security. In regular Claude Code sessions, it's easy to grant permission when it asks to run a command. With regular subagents, I noticed it gets a bit more tedious: requests for permissions are much more frequent. Agent teams reach a peak in that regard.

To cope with that, you can add permissions in your settings.json and hope they cover the commands Claude Code runs in the session. Alternatively, you can use the aptly-named --dangerously-skip-permissions flag, or wait for the slightly safer auto mode. In all cases, you must arbitrate between autonomy and security. Too much security slows you down; too much autonomy is risky.
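As a rough sketch of the first option, a settings.json file can pre-approve some commands and block others through allow and deny rules. The specific rules below are illustrative placeholders, not a recommended policy; swap in the commands your own build and test tooling needs.

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Read(./src/**)"
    ],
    "deny": [
      "Bash(curl:*)",
      "Read(./.env)"
    ]
  }
}
```

The narrower the allow list, the more permission prompts you will still see, which is exactly the autonomy vs. security arbitration described above.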

Also, agent teams are experimental at the moment. They might be better in the future, be wildly different, or not exist at all. To enable them now, set CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS to 1 in settings.json.
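For reference, here is a minimal fragment, assuming the variable goes under the env key that settings.json uses to pass environment variables to a session. Since the feature is experimental, the variable name may change.

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```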

On the plus side, it pays to invest time in agents. Whether or not they belong to a team, you can call agents explicitly from the command line.

Conclusion

Autonomous agent teams sit at the top of the agentic engineering ladder. Getting there requires designing agent interactions upfront and solving the autonomy vs. security tradeoff.

The feature is experimental, and I'd treat it as such, but the direction is clear. We are already spending less time coding directly and more time managing agents. Agent teams are the next logical step.

Originally published at A Java Geek on May 3rd, 2026.

Top comments (23)

Oljas Shaiken • May 7

I’ve tried a similar multi-agent setup, but it gets expensive quickly. Adding a shared memory layer helped a lot. Leveraging GitNexus (for codebase knowledge graphs) + Serena (semantic IDE-style tools via MCP) cuts TPM.

Nicolas Fränkel • May 8

Oh yeah!

It's my professional environment setup, and we get quite a bit of leeway regarding token usage.

Mykola Kondratiuk • May 8

the hard part is usually keeping context coherent between design and implementation agents. without a shared state layer they diverge fast. what's your coordination mechanism look like?

Nicolas Fränkel • May 8

Designers write the PLAN.md, while implementors code it.

To be honest, I didn't get any divergence, but perhaps my scope is more limited than yours?

Mykola Kondratiuk • May 8

PLAN.md as a shared contract is smart — effectively the same idea. my divergence was runtime, not design-time: impl agents querying a state that design had already moved past. scope probably is the difference — once pipelines get interdependent it compounds fast.

Theo Valmis • May 11

The team-of-agents framing exposes a planning problem most blog posts skip: who holds the shared mental model? With humans, the senior on the team carries it in their head and corrects course informally. With agents you need that mental model to be explicit, queryable, and durable across runs, otherwise each agent rediscovers the same constraints in isolation. Coordination overhead climbs fast once you have more than two agents touching shared state.

Nicolas Fränkel • May 12

Good point. I think the README.md, a custom PLAN.md, or the Wiki itself if you use GitHub can serve as the shared mental model.

Theo Valmis • May 12

Agreed. The shared mental model usually starts as README.md, ADRs, PLAN.md, or internal docs. The problem is that coding agents don’t reliably operationalize those constraints during generation. That’s the gap we’re exploring with Mneme: turning architectural intent into enforceable workflow constraints.

Mininglamp • May 12

The maturity levels framework is useful. One pattern that works well in practice: specialized small agents (vision agent for GUI, code agent for scripts, data agent for analysis) coordinated by a lightweight orchestrator. Each agent runs its own model optimized for that modality. This "scaling out" approach avoids the single-point-of-failure problem of routing everything through one monolithic LLM. The challenge shifts from "making one model do everything" to "making agents communicate efficiently."

Kyle Carriedo • May 19

The autonomy/security tradeoff framing is the right one — and the bit that I think is under-discussed is that who carries the shared mental model (your Theo Valmis quote) is a load-bearing design choice, not a nice-to-have.

A few observations from running similar agent-team setups in practice:

--dangerously-skip-permissions is a permission-system smell, not a fix. When the choice is "approve every command" or "approve nothing", the actual problem is the system has no way to learn the operator's risk policy. The interesting design is permission policies that are queryable from a CLAUDE.md / config layer: "agents in domain X may write within paths Y, never run network commands, must escalate any DELETE." That's the shape that scales past 3 agents. Anthropic's settings.json permission rules are most of the way there but don't yet propagate to subagents reliably (cf. issue #59309 in claude-code).
Context coherence across agents collapses to "where does the shared state live." If it lives in the coordinator's context window, it dies on every compaction. If it lives in a file in the project root, every agent can read/write but no one knows when it changed. If it lives in a key-value store, you've reinvented a distributed system. There's no good answer yet — but the "agents independently rediscover the same constraints" pattern Mykola flagged is the symptom of choosing none of the above.
The cost scalability point is correct and also underrated. Running 6 Opus agents in parallel isn't 6x a single Opus session — it's 6x plus the cost of the coordinator's increasingly-bloated context as it tries to merge outputs. The economically rational architecture is "minority of expensive agents doing decomposition + judgment, majority of cheap agents doing fan-out work." Anyone running all-Opus all-the-time on agent teams is going to bounce off the bill fast.

The "may change significantly or not exist at all" disclaimer is doing real work in this post — it's worth taking literally. The protocols around agent teams are still settling, and anything built on the current shape will probably need to be re-plumbed in 6 months. Build in the lessons, don't build on the API.

Matías Denda • May 8

I love using agents... I have experts in the fields I develop in, and I also created expert agents for a framework I created, so I can build apps based on that framework very easily...

Vic Chen • May 8

Great breakdown of autonomous agent teams! The planner/challenger/coder/tester structure makes a lot of sense — having a dedicated challenger agent to push back on plans before implementation is something I hadn't considered but seems really valuable for catching issues early.

The autonomy vs. security tradeoff you mention is real. In my own experiments with multi-agent setups, I've found that being explicit about what each agent can and can't touch (rather than blanket permissions) helps a lot. Looking forward to seeing where the experimental Agent Teams feature goes!

Postelix • May 11

Great. I liked the idea, thanks for sharing.

Xidao • May 12

The subagent specialization pattern you describe is really powerful. I have been experimenting with something similar where different subagents handle different phases of a feature — one for architecture decisions, one for implementation, one for test generation — and the orchestrator coordinates them.

The key insight I found is that the orchestrator prompt matters enormously. If it is too vague about handoff criteria, subagents either overlap in scope or leave gaps. Defining explicit "done" signals for each subagent — like "return a PR-ready diff" or "return a passing test suite" — makes the coordination much more reliable.

One challenge I keep running into is debugging when a subagent produces subtly wrong output. The context isolation that makes subagents efficient also makes it harder to trace why a particular decision was made. Have you found good patterns for observability across subagent boundaries? I have been toying with structured logging from each subagent (JSON traces of decisions and tool calls) but it adds overhead.

MapleBridge.io • May 12

One thing I’d add from product-side workflows: the tricky part is not only splitting roles, but deciding what each agent is allowed to decide.

In sourcing/matching workflows I’ve been testing, one agent can extract intent, another can check evidence, another can draft the reply. But none of them should silently decide that missing data is “probably fine.” That boundary matters more than the agent names.

I’ve found it useful to make each handoff include three things: what is known, what is assumed, and what still needs a human or tool check. Otherwise the team can sound coordinated while passing uncertainty downstream.
