Multi-Agent Architecture: Specialist Routing in an Autonomous Task System
When you're building an autonomous agent system that handles hundreds of tasks daily, routing every request through a single powerful model is both expensive and suboptimal. A database schema question needs different context than a React component bug. This article walks through a specialist routing architecture we deployed in production, covering task classification, agent configuration, shared memory, and the hard lessons learned along the way.
Why Specialist Routing Matters
The naive approach to multi-agent systems is to throw your most capable model at every problem. It works, but it's wasteful in two directions simultaneously: you're paying frontier model prices for tasks that don't need frontier model reasoning, and you're using a generalist prompt when a specialist prompt would produce cleaner output.
Specialist routing solves both problems. A database agent with a system prompt full of SQL patterns, schema conventions, and query optimization heuristics will outperform a general agent on database tasks — not because the underlying model is different, but because the context is tighter and more relevant. Meanwhile, simpler classification and fallback tasks can run on lighter models like Haiku at a fraction of the cost.
In our system, we saw a 40% cost reduction after implementing routing, with measurably better task completion quality on database and UI categories specifically.
Task Classification
Everything starts with accurate classification. We use a keyword-matching approach that's deliberately simple — a fast, cheap first pass that doesn't require an LLM call to route the work.
```typescript
type TaskCategory = 'db' | 'ui' | 'infra' | 'analysis' | 'other';

interface ClassificationResult {
  category: TaskCategory;
  confidence: number;
  matchedKeywords: string[];
}

const CATEGORY_KEYWORDS: Record<TaskCategory, string[]> = {
  db: ['sql', 'query', 'database', 'schema', 'migration', 'index',
       'postgres', 'mysql', 'transaction', 'join', 'table', 'orm'],
  ui: ['react', 'component', 'css', 'layout', 'render', 'hook',
       'props', 'state', 'dom', 'styling', 'animation', 'tailwind'],
  infra: ['docker', 'kubernetes', 'deploy', 'ci/cd', 'pipeline',
          'nginx', 'ssl', 'scaling', 'load balancer', 'terraform'],
  analysis: ['analyze', 'report', 'metrics', 'performance', 'benchmark',
             'compare', 'evaluate', 'statistics', 'trend', 'insight'],
  other: [],
};

function classifyTask(taskDescription: string): ClassificationResult {
  const normalized = taskDescription.toLowerCase();
  const scores: Record<string, number> = {};
  const allMatches: Record<string, string[]> = {};

  for (const [category, keywords] of Object.entries(CATEGORY_KEYWORDS)) {
    if (category === 'other') continue;
    const matched = keywords.filter(kw => normalized.includes(kw));
    scores[category] = matched.length;
    allMatches[category] = matched;
  }

  const topCategory = Object.entries(scores)
    .sort(([, a], [, b]) => b - a)[0];

  if (topCategory[1] === 0) {
    return { category: 'other', confidence: 1.0, matchedKeywords: [] };
  }

  const totalMatches = Object.values(scores).reduce((a, b) => a + b, 0);
  const confidence = topCategory[1] / totalMatches;

  return {
    category: topCategory[0] as TaskCategory,
    confidence,
    matchedKeywords: allMatches[topCategory[0]],
  };
}
```
The confidence score matters. When it's below 0.6 — meaning multiple categories have significant keyword overlap — we escalate to a higher-capability fallback rather than trusting the classification.
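To see where the threshold bites, here's a condensed, self-contained sketch of the scoring step (two categories only, hypothetical task text) showing how overlapping keywords drag confidence down:

```typescript
// Condensed sketch of the scoring logic above, trimmed to two categories.
const KEYWORDS: Record<string, string[]> = {
  db: ['sql', 'query', 'database', 'schema', 'index'],
  ui: ['react', 'component', 'render', 'hook', 'state'],
};

function scoreTask(text: string): { top: string; confidence: number } {
  const normalized = text.toLowerCase();
  const scores = Object.entries(KEYWORDS).map(([cat, kws]) => ({
    cat,
    hits: kws.filter(kw => normalized.includes(kw)).length,
  }));
  scores.sort((a, b) => b.hits - a.hits);
  const total = scores.reduce((sum, s) => sum + s.hits, 0);
  return {
    top: scores[0].cat,
    confidence: total === 0 ? 0 : scores[0].hits / total,
  };
}

// Ambiguous task: 'sql' + 'query' match db, 'react' + 'component' match ui.
// Confidence lands at 2/4 = 0.5, below the 0.6 threshold, so this escalates.
const result = scoreTask('Fix the sql query behind the react component');
```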
Agent Pool Configuration
Each specialist agent is defined with its model, system prompt, and routing criteria. The system prompts are where the real specialist behavior lives.
```typescript
interface AgentConfig {
  id: string;
  model: string;
  systemPrompt: string;
  maxTokens: number;
  temperature: number;
  categories: TaskCategory[];
}

const AGENT_POOL: AgentConfig[] = [
  {
    id: 'db-specialist',
    model: 'claude-sonnet-4-5',
    systemPrompt: `You are a database specialist. Always structure SQL queries
      with explicit column names. Prefer CTEs over nested subqueries.
      Flag any query missing an index on filtered columns. Return migration
      scripts in up/down pairs. When schema is ambiguous, ask before assuming.`,
    maxTokens: 4096,
    temperature: 0.1,
    categories: ['db'],
  },
  {
    id: 'ui-specialist',
    model: 'claude-sonnet-4-5',
    systemPrompt: `You are a React/TypeScript UI specialist. Default to
      functional components with hooks. Use Tailwind for styling unless
      project config indicates otherwise. Always handle loading and error
      states. Prefer composition over prop drilling beyond two levels.`,
    maxTokens: 4096,
    temperature: 0.2,
    categories: ['ui'],
  },
  {
    id: 'ruflo-high',
    model: 'claude-haiku-4-5',
    systemPrompt: `You are a general-purpose technical analyst. Break complex
      problems into structured steps. Cite your reasoning explicitly.
      When comparing options, use a consistent evaluation framework.`,
    maxTokens: 2048,
    temperature: 0.3,
    categories: ['analysis'],
  },
  {
    id: 'ruflo-medium',
    model: 'claude-haiku-4-5',
    systemPrompt: `You are a general-purpose assistant for development tasks.
      Be concise and direct. If a task seems misclassified, say so and
      explain what specialist might handle it better.`,
    maxTokens: 1024,
    temperature: 0.4,
    categories: ['other'],
  },
];

function selectAgent(classification: ClassificationResult): AgentConfig {
  if (classification.confidence < 0.6) {
    return AGENT_POOL.find(a => a.id === 'ruflo-high')!;
  }
  return AGENT_POOL.find(
    a => a.categories.includes(classification.category)
  ) ?? AGENT_POOL.find(a => a.id === 'ruflo-medium')!;
}
```
Notice the temperature gradient — specialists run cooler because their tasks reward precision, while general agents run warmer to handle the wider variance in what lands in the `other` bucket.
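Putting classification and selection together, dispatch reduces to building a request from the chosen config. A minimal sketch of that glue — `buildRequest` and `ModelRequest` are illustrative names, not from our codebase, and the payload shape mirrors an Anthropic-style messages API; `AgentConfig` is redeclared here so the snippet stands alone:

```typescript
// Mirrors the AgentConfig interface above (routing fields omitted for brevity).
interface AgentConfig {
  id: string;
  model: string;
  systemPrompt: string;
  maxTokens: number;
  temperature: number;
}

// Hypothetical request shape, modeled on a messages-style chat API.
interface ModelRequest {
  model: string;
  system: string;
  max_tokens: number;
  temperature: number;
  messages: { role: 'user'; content: string }[];
}

// Turn a selected agent config plus the raw task into one API payload.
function buildRequest(agent: AgentConfig, task: string): ModelRequest {
  return {
    model: agent.model,
    system: agent.systemPrompt,
    max_tokens: agent.maxTokens,
    temperature: agent.temperature,
    messages: [{ role: 'user', content: task }],
  };
}

const dbAgent: AgentConfig = {
  id: 'db-specialist',
  model: 'claude-sonnet-4-5',
  systemPrompt: 'You are a database specialist.',
  maxTokens: 4096,
  temperature: 0.1,
};

const request = buildRequest(dbAgent, 'Add an index to the orders table');
```

The point of the indirection is that routing decisions and API mechanics stay separate: swapping a specialist's model or prompt never touches the dispatch code.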
Shared Memory via global-lessons.json
Every agent writes back to a shared lessons file after task completion. This creates a lightweight institutional memory across the pool.
```typescript
import { promises as fs } from 'node:fs';

interface Lesson {
  agentId: string;
  category: TaskCategory;
  taskPattern: string;
  lesson: string;
  timestamp: string;
  successRate: number;
}

async function loadLessons(category: TaskCategory): Promise<Lesson[]> {
  const all: Lesson[] = JSON.parse(
    await fs.readFile('global-lessons.json', 'utf-8')
  );
  return all
    .filter(l => l.category === category || l.agentId === 'ruflo-high')
    .sort((a, b) => b.successRate - a.successRate)
    .slice(0, 5); // Top 5 most relevant lessons
}

async function recordLesson(lesson: Omit<Lesson, 'timestamp'>): Promise<void> {
  const existing: Lesson[] = JSON.parse(
    await fs.readFile('global-lessons.json', 'utf-8')
  );
  existing.push({ ...lesson, timestamp: new Date().toISOString() });
  await fs.writeFile('global-lessons.json', JSON.stringify(existing, null, 2));
}
```
Relevant lessons get injected into the system prompt at runtime.
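The injection itself is plain string assembly. A minimal sketch — `buildSystemPrompt` is a hypothetical helper, and the lesson formatting shown is illustrative rather than our exact template:

```typescript
// Trimmed view of the Lesson shape above, just the fields the prompt uses.
interface Lesson {
  taskPattern: string;
  lesson: string;
  successRate: number;
}

// Append the top lessons for this category to the specialist's base prompt.
function buildSystemPrompt(basePrompt: string, lessons: Lesson[]): string {
  if (lessons.length === 0) return basePrompt;
  const lines = lessons.map(
    l => `- [${l.taskPattern}] ${l.lesson} (success rate: ${l.successRate})`
  );
  return `${basePrompt}\n\nLessons from previous tasks:\n${lines.join('\n')}`;
}

const prompt = buildSystemPrompt('You are a database specialist.', [
  { taskPattern: 'migration', lesson: 'Always include a down script.', successRate: 0.92 },
]);
```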