4 agents > 1: Why Argus uses a PM/SE/AP team instead of a single agent
Every AI coding agent I've tried has the same problem: it trusts itself too much.
It writes code, runs it, sees that it compiles, and says "done!", even when the code has unhandled edge cases, is missing error handling, or doesn't actually solve the problem the user described. There's no second opinion, no review process, no quality gate.
Argus takes a different approach. Instead of one agent doing everything, it uses four specialized agents that work together like a small engineering team.
The four roles
┌─────────────────────────────────────────────────────┐
│                     Argus Core                      │
│                                                     │
│       Shared Memory ← full context visibility       │
│                                                     │
│    ┌──────┐    ┌──────┐    ┌──────┐    ┌──────┐     │
│    │  PM  │    │  SE  │    │  AP  │    │  C   │     │
│    │ plan │    │ code │    │review│    │watch │     │
│    └──────┘    └──────┘    └──────┘    └──────┘     │
│                                                     │
│    Executor (tools)         MessageBus (events)     │
└─────────────────────────────────────────────────────┘
PM (Project Manager) — the planner
The PM receives your request and does three things:
- Decides the task weight — is this a quick task (featherweight) or complex work (lightweight+)?
- For featherweight tasks: executes directly using tools (write_file, exec, etc.)
- For complex tasks: breaks down the requirements, assigns the work to SE, and reviews SE's output
The PM doesn't write code itself for complex tasks — it plans and reviews. This separation of concerns is deliberate: the PM evaluates SE's work with fresh eyes.
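A minimal sketch of that routing decision, assuming the PM has already decided the weight. The `Weight` type, the `LightweightPlus` name, and the step labels are hypothetical illustrations rather than Argus's actual API; only the featherweight/lightweight+ split and the order of the steps come from this post.

```go
package main

import "fmt"

// Weight is a hypothetical stand-in for the PM's task-weight decision.
type Weight int

const (
	Featherweight   Weight = iota // quick task: PM runs tools itself
	LightweightPlus               // lightweight or heavier: full pipeline
)

// pipeline returns the ordered steps a task goes through for a given weight.
func pipeline(w Weight) []string {
	if w == Featherweight {
		// PM acts directly with tools such as write_file or exec; no handoff.
		return []string{"PM: execute with tools"}
	}
	// PM plans and reviews but never writes the code for these tasks.
	return []string{
		"PM: break down requirements",
		"SE: implement and self-test",
		"PM: review SE's output",
		"AP: verify independently and approve",
	}
}

func main() {
	for _, w := range []Weight{Featherweight, LightweightPlus} {
		fmt.Println(pipeline(w))
	}
}
```

The point of returning an explicit step list is that the featherweight shortcut is visible in one place: every other weight gets the full chain.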
SE (Software Engineer) — the builder
SE takes the PM's task description and executes it. It writes files, runs commands, installs dependencies, and verifies its own work with exec commands.
SE is the workhorse, but it's not trusted alone. Every task SE completes goes through review.
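SE's self-check boils down to running commands and trusting the exit status. A minimal Go sketch of that idea, assuming checks are plain commands such as `go build ./...` and `go vet ./...`; the `Check` type and `runChecks` helper are hypothetical, not part of Argus.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Check is a hypothetical verification step: a command plus its arguments.
type Check struct {
	Name string
	Args []string
}

// runChecks runs each check in dir and stops at the first failure.
// A check passes only if the command starts and exits with status 0.
func runChecks(dir string, checks []Check) error {
	for _, c := range checks {
		cmd := exec.Command(c.Name, c.Args...)
		cmd.Dir = dir
		out, err := cmd.CombinedOutput()
		if err != nil {
			return fmt.Errorf("%s %v failed: %w\n%s", c.Name, c.Args, err, out)
		}
	}
	return nil
}

func main() {
	checks := []Check{
		{Name: "go", Args: []string{"build", "./..."}},
		{Name: "go", Args: []string{"vet", "./..."}},
	}
	if err := runChecks(".", checks); err != nil {
		fmt.Println("verification failed:", err)
		return
	}
	fmt.Println("verification passed")
}
```

Because the verdict comes from exit codes rather than from the agent's own description of what happened, a reviewer can run the same list again and get an independent answer.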
AP (Approver) — the quality gate
AP is the most important role in the system. It's an independent reviewer that:
- Reads the code SE wrote
- Builds and tests the code itself (it doesn't trust SE's verification)
- Approves or rejects based on actual results
AP has veto power. If AP rejects, the task goes back to SE — PM can't override it. This three-layer check (SE self-test → PM review → AP approval) catches bugs that single-agent systems miss.
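The veto rule is easiest to see as a small state machine. This is a sketch under assumptions: the `Status` values and the `submit` and `review` functions are hypothetical, since the post doesn't show Argus's real types. What it encodes is the rule described above: a rejection at either review stage sends the task back to SE, and PM has no transition that moves a task past AP.

```go
package main

import "fmt"

// Status is a hypothetical task state.
type Status string

const (
	InProgress Status = "in_progress" // SE is building or fixing the task
	PMReview   Status = "pm_review"   // SE reported the task complete
	APReview   Status = "ap_review"   // PM passed it on for approval
	Approved   Status = "approved"    // AP signed off; the task can ship
)

// Role identifies which reviewer is acting on the task.
type Role string

const (
	PM Role = "PM"
	AP Role = "AP"
)

// submit is SE handing a finished task to PM for review.
func submit(s Status) (Status, error) {
	if s != InProgress {
		return s, fmt.Errorf("SE cannot submit a task in state %s", s)
	}
	return PMReview, nil
}

// review applies one reviewer's verdict. Each stage accepts a verdict only
// from its own reviewer, so PM cannot skip or reverse AP's decision.
func review(s Status, who Role, pass bool) (Status, error) {
	switch {
	case s == PMReview && who == PM:
		if pass {
			return APReview, nil
		}
		return InProgress, nil // PM rejection: back to SE
	case s == APReview && who == AP:
		if pass {
			return Approved, nil
		}
		return InProgress, nil // AP veto: back to SE
	default:
		return s, fmt.Errorf("%s cannot review a task in state %s", who, s)
	}
}

func main() {
	s, _ := submit(InProgress)
	s, _ = review(s, PM, true)
	if _, err := review(s, PM, true); err != nil {
		fmt.Println("override blocked:", err)
	}
	s, _ = review(s, AP, false)
	fmt.Println("after AP rejection:", s)
}
```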
C (Monitor) — the watchdog
C doesn't use an LLM. It's a background process that:
- Detects when PM or SE hangs (no response for 30+ seconds)
- Auto-recovers failed tasks
- Detects file changes and suggests commits
- Provides a safety net so the system doesn't get stuck
How it works in practice
User: "Create a Go REST API with /health endpoint"
    ↓
PM: Analyzes requirements
    └─ Breaks into tasks: 1) main.go scaffold, 2) health endpoint, 3) run & verify
    └─ @SE task 1: create main.go
    ↓
SE: [writes main.go] → [runs go build] → passes
    └─ @PM task 1 complete
    ↓
PM: [reads main.go with read_file] → [runs go build independently] → passes
    └─ @SE task 2: add /health endpoint
    ↓
SE: [edits main.go] → [runs go run, tests /health with curl] → passes
    └─ @PM task 2 complete
    ↓
PM: [reviews changes] → passes
    └─ @AP task verified, please approve
    ↓
AP: [reads final main.go] → [runs go build] → [runs go vet] → [tests /health]
    └─ ✅ Approved
The process takes more steps than a single agent would, but each step is another chance to catch a mistake. In practice, PM skips the SE/AP chain entirely for simple tasks and executes them directly (featherweight mode). For complex tasks, the full pipeline ensures nothing ships without review.
The shared memory problem
The hardest part of this architecture was keeping all roles on the same page. Early versions gave each role its own memory — PM stored its plan, SE stored its code, AP stored its review — and synced them via messages. This caused constant context loss.
The current V2 architecture uses shared memory: all four roles read from and write to the same context window. PM can see exactly what SE wrote. AP can see PM's review notes. There's no message passing to keep separate memories in sync, and therefore no sync issues.
V1 (old):
┌──────┐  ┌──────┐
│  PM  │  │  SE  │
│ mem  │  │ mem  │
└──┬───┘  └──┬───┘
   └──msg───→┘
  (sync issues)
V2 (current):
   ┌──────────────────┐
   │  Shared Memory   │
   │  ┌── PM notes    │
   │  ├── SE code     │
   │  └── AP verdict  │
   └──────────────────┘
            ↓ all read/write
┌──────┐ ┌──────┐ ┌──────┐
│  PM  │ │  SE  │ │  AP  │
└──────┘ └──────┘ └──────┘
     (no sync needed)
The switch to shared memory cut the code by 80% relative to V1 and eliminated an entire class of bugs where one role didn't know what another had done.
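A minimal sketch of the V2 idea: one append-only log that every role writes to and reads from, instead of per-role stores kept in sync by messages. `SharedMemory`, `Entry`, and the method names are hypothetical; the post doesn't show Argus's actual data structure, and a real implementation would also need to fit the log into each model's context window.

```go
package main

import (
	"fmt"
	"sync"
)

// Entry is one note in the shared log, tagged with the role that wrote it.
type Entry struct {
	Role string // "PM", "SE", "AP", or "C"
	Kind string // e.g. "plan", "code", "review", "verdict"
	Text string
}

// SharedMemory is a single log visible to every role. There is only one
// copy of the state, so there is nothing to synchronize between roles.
type SharedMemory struct {
	mu      sync.RWMutex
	entries []Entry
}

// Write appends an entry; safe for concurrent use by several roles.
func (m *SharedMemory) Write(e Entry) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.entries = append(m.entries, e)
}

// Snapshot returns a copy of the full log: readers see everything written
// so far but cannot mutate the shared slice.
func (m *SharedMemory) Snapshot() []Entry {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make([]Entry, len(m.entries))
	copy(out, m.entries)
	return out
}

func main() {
	var mem SharedMemory
	mem.Write(Entry{Role: "PM", Kind: "plan", Text: "1) scaffold 2) /health 3) verify"})
	mem.Write(Entry{Role: "SE", Kind: "code", Text: "wrote main.go"})
	// AP reads the same log: it sees PM's plan and SE's work with no sync step.
	for _, e := range mem.Snapshot() {
		fmt.Printf("%s/%s: %s\n", e.Role, e.Kind, e.Text)
	}
	mem.Write(Entry{Role: "AP", Kind: "verdict", Text: "approved"})
}
```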
When the multi-agent overhead is worth it
The extra review steps add latency. For "fix this typo" or "run this command", the single-agent path would be faster. Argus uses a featherweight classification system to detect these cases and bypass the review chain.
But for complex work — multi-file changes, new features, bug fixes that need verification — the review process catches real issues. In our testing, AP catches about 15% of SE tasks that PM had already approved. Those are bugs that would have shipped in a single-agent system.
What this means for users
- If you're tired of AI agents that say "done" when they're not, Argus's review pipeline gives you more confidence that "done" actually means done
- If you want to see how an AI arrives at its code, the role separation makes each step visible
- If you're building multi-agent systems yourself, the shared-memory pattern is worth studying
The tradeoff is speed for quality. Simple tasks are fast (featherweight mode skips overhead). Complex tasks take longer but produce more reliable results.
Argus is open source (MIT) at github.com/ArgusTek/Argus. Download the desktop app from the Releases page, or build it from source.
Questions or feedback? GitHub Issues: https://github.com/ArgusTek/Argus/issues
Discussions: https://github.com/ArgusTek/Argus/discussions