The Pragmatic Architect

Agents assemble. One agent is a hire. Many agents are a workforce.

The shift from monolithic prompts to coordinated agents is the most consequential architectural change in applied AI since RAG. This issue maps the canonical patterns, picks one production-ready use case, and builds it end-to-end in Semantic Kernel.

Single-agent systems hit a wall the moment a task needs separation of concerns. You feel it in the prompt: the giant system message that tries to be analyst, writer, fact-checker, and policy enforcer in one breath. It works until it doesn't, and the failure modes are exactly what you'd expect when one mind plays five roles: tone drift, forgotten constraints, hallucinated tool calls, and reasoning that quietly skips steps.

Multi-agent systems decompose the problem. Each agent has a narrow remit, a smaller toolset, a tighter system prompt, and clearer evaluation criteria. The orchestration layer becomes the actual product surface, and that's where the patterns matter.

Multi-Agent System Use Case: Autonomous Incident Response

When PagerDuty fires at 3 a.m., the on-call doesn't need a chatbot; they need a team. A triage agent that reads the alert, a diagnostic agent that pulls logs and metrics, a knowledge agent that searches runbooks and past incidents, a remediation agent that proposes (and, after approval, executes) fixes, and a communications agent that drafts the status-page update. This is the canonical multi-agent shape: short-lived, high-stakes, tool-heavy, with a human-in-the-loop gate before anything writes to production.

Behind Semantic Kernel, LangGraph, AutoGen, and CrewAI: The Same 6 Patterns

Most production agentic systems are built as a composition of these six patterns. Memorize the shapes. Frameworks such as Semantic Kernel, AutoGen, LangGraph, and CrewAI may each call them something different, but the core topology is the same.

1. Sequential / Pipeline: Each agent's output is the next agent's input. Used when stages are strictly ordered and outputs accumulate. Incident analogy: Triage → Diagnose → Remediate → Communicate.

A → B → C · Linear chain

2. Concurrent / Parallel: A dispatcher fans the same task to N specialist agents in parallel; an aggregator merges results. Incident analogy: simultaneously query logs, metrics, traces, and runbooks then merge findings.

Fan-out → Fan-in · Map-reduce

3. Group Chat / Debate: All agents see one shared message thread; a selector picks who speaks next. Useful when answers benefit from disagreement and critique. Incident analogy: root-cause debate between Diagnostic and Knowledge agents, refereed by a Lead.

N agents, shared transcript

4. Handoff / Routing: An agent decides, based on the conversation, to transfer control (and context) to another, more specialized agent. Incident analogy: Triage hands off to a Database specialist when the alert indicates a slow query, or to a Network or Application specialist for those alert classes.

Conditional, context-passing

5. Magentic / Orchestrator-Worker: A central orchestrator maintains a task ledger, assigns the next step to the best-suited worker, and re-plans on failure. The most flexible pattern, and the most expensive. Incident analogy: the on-call's "brain" coordinating specialists across an evolving incident timeline.

Planner re-plans dynamically

6. Hierarchical / Team-of-teams: A tree of agents. Each manager owns a sub-team and exposes only its summarized result upward. Scales to large problems by hiding complexity. Incident analogy: SRE Manager → (DB Lead → DB workers) + (App Lead → App workers).

Manager → sub-managers → workers
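
Each of these shapes maps onto a first-class orchestration type in Semantic Kernel's preview agents packages. As a minimal sketch of pattern 1, assuming the same agent-factory convention used later in this issue (the factory names and `alertPayloadJson` are illustrative):

```csharp
using Microsoft.SemanticKernel.Agents.Orchestration.Sequential;
using Microsoft.SemanticKernel.Agents.Runtime.InProcess;

// Pattern 1: a linear chain; each agent's output feeds the next.
// Factory names are illustrative, following the convention used below.
var pipeline = new SequentialOrchestration(
    TriageAgentFactory.Create(kernel),
    DiagnosticAgentFactory.Create(kernel),
    CommunicationsAgentFactory.Create(kernel));

await using var runtime = new InProcessRuntime();
await runtime.StartAsync();

var result = await pipeline.InvokeAsync(alertPayloadJson, runtime);
var statusUpdate = await result.GetValueAsync(TimeSpan.FromSeconds(120));
await runtime.RunUntilIdleAsync();
```

The other five shapes follow the same invoke-and-await contract, which is what makes them composable.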

The orchestration layer is the product. Models are commodities; coordination is the moat.

Reference architecture

SEMANTIC KERNEL · C#

Here's how the patterns compose for the incident-response use case. Triage runs first (sequential), then a concurrent fan-out across investigators, followed by a group-chat root-cause debate. The orchestrator chooses a remediation specialist via handoff, gates on a human approval, and finally hands off to a communications agent.

[Diagram: Autonomous Incident Response powered by a multi-agent system]

Building it in Semantic Kernel

.NET 8 · SK 1.x

The full repository is on GitHub. Below: the three pieces that matter most.

a. Defining a specialist agent
Each agent is a ChatCompletionAgent with its own kernel, tool plugins, and instructions.

Agents/TriageAgent.cs C#

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;

public static class TriageAgentFactory
{
    public static ChatCompletionAgent Create(Kernel kernel) => new()
    {
        Name = "TriageAgent",
        Instructions = """
            You are the on-call triage agent. Given an alert payload, output JSON:
              { severity: 'P1'|'P2'|'P3', system: string, suspected_cause: string }
            Do not speculate beyond the evidence. Do not call remediation tools.
            """,
        Kernel = kernel.Clone(),
        Arguments = new(new PromptExecutionSettings
        {
            FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
        })
    };
}

b. Concurrent fan-out across investigators
SK ships ConcurrentOrchestration for exactly this. Three investigators run in parallel, each with its own tool plugin (Loki, Prometheus, vector store).

Orchestration/InvestigationFanOut.cs C#

using Microsoft.SemanticKernel.Agents.Orchestration.Concurrent;
using Microsoft.SemanticKernel.Agents.Runtime.InProcess;

var orchestration = new ConcurrentOrchestration(
    LogAgentFactory.Create(kernel),
    MetricsAgentFactory.Create(kernel),
    KbSearchAgentFactory.Create(kernel));

await using var runtime = new InProcessRuntime();
await runtime.StartAsync();

var result = await orchestration.InvokeAsync(triageResult.AsJson(), runtime);
var findings = await result.GetValueAsync(TimeSpan.FromSeconds(90));

// findings is string[] — one per agent. Hand to the debate stage next.
await runtime.RunUntilIdleAsync();

c. Group-chat root-cause debate, with a human gate
Two specialists debate the root cause; a "Lead" agent terminates the chat once a confident conclusion is reached. The remediation step then asks a real human for approval before any tool with side effects fires.

Orchestration/RootCauseDebate.cs C#

using Microsoft.SemanticKernel.Agents.Orchestration.GroupChat;

var debate = new GroupChatOrchestration(
    new RoundRobinGroupChatManager { MaximumInvocationCount = 6 },
    DiagnosticAgentFactory.Create(kernel),
    KnowledgeAgentFactory.Create(kernel),
    LeadAgentFactory.Create(kernel))   // a custom GroupChatManager can end the chat early once the Lead is confident
{
    ResponseCallback = msg => { Console.WriteLine($"[{msg.AuthorName}] {msg.Content}"); return ValueTask.CompletedTask; }
};

var hypothesis = await (await debate
    .InvokeAsync($"Findings:\n{string.Join("\n---\n", findings)}", runtime))
    .GetValueAsync(TimeSpan.FromSeconds(120));

// Human-in-the-loop gate — block until SRE approves remediation plan.
if (!await ApprovalGate.RequestAsync(hypothesis))
    throw new OperationCanceledException("SRE rejected remediation plan.");
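
The `ApprovalGate` helper referenced above lives in the repo; a minimal sketch, assuming a console-driven approval (a production gate would page the SRE via Slack or PagerDuty and await a webhook callback):

```csharp
// Hypothetical ApprovalGate, illustrative only. Blocks until a human
// answers; a real implementation would await an out-of-band approval.
public static class ApprovalGate
{
    public static async Task<bool> RequestAsync(string plan)
    {
        Console.WriteLine("Proposed remediation plan:");
        Console.WriteLine(plan);
        Console.Write("Approve execution? [y/N]: ");

        // Read off-thread so the async call stays non-blocking.
        var answer = await Task.Run(Console.ReadLine);
        return string.Equals(answer?.Trim(), "y", StringComparison.OrdinalIgnoreCase);
    }
}
```

The important property is the default: anything other than an explicit "y" is a rejection.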

Production checklist

Before you put any of this on an on-call rotation:

  • Token budget per stage. Each agent gets a hard token cap. Long debates drain wallets.
  • Tool allow-lists. The triage agent should have read-only access. Only the remediation agent gets write tools, and only after approval.
  • Tracing. Every agent invocation, every tool call, every handoff instrumented (OpenTelemetry → Honeycomb / Datadog).
  • Eval harness. Replay the last 90 days of incidents nightly and grade the agents' output against the human resolution.
  • Kill switch. A feature flag that puts the system back into "advise-only" mode in one click.
  • Cost guardrails. Per-incident max spend; per-day max spend; alarm at 80%.
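
The token-budget and cost-guardrail items don't need a framework. A hypothetical per-stage guard (every name here is illustrative; nothing in it is part of Semantic Kernel) can be as small as:

```csharp
// Hypothetical per-stage budget guard: hard caps on tokens and spend.
public sealed class StageBudget(int maxTokens, decimal maxSpendUsd)
{
    private int _tokens;
    private decimal _spendUsd;

    // Call after every agent invocation with that call's usage numbers.
    public void Record(int tokens, decimal costUsd)
    {
        _tokens += tokens;
        _spendUsd += costUsd;
        if (_tokens > maxTokens || _spendUsd > maxSpendUsd)
            throw new BudgetExceededException(
                $"Stage over budget: {_tokens} tokens, ${_spendUsd:F2} spent.");
    }
}

public sealed class BudgetExceededException(string message) : Exception(message);
```

Wire `Record` into the same callback you already use for logging; the kill switch then reduces to catching `BudgetExceededException` and flipping the advise-only flag.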

Final Thoughts

We need to stop viewing agentic workflows as a creative exercise and start viewing them as an engineering one. The winners in this space treat agents as decoupled microservices:

  • Independently Deployable: Update one component without risking the entire swarm.
  • Individually Testable: Identify exactly where logic fails before it hits production.
  • Ruthlessly Observable: Trace every decision and handoff in real time.

Operational discipline is the entire game. Everything else is just noise.


Satish Gopinathan is an AI Strategist, Enterprise Architect, and the voice behind The Pragmatic Architect. Read more at eagleeyethinker.com or Subscribe on LinkedIn.

agenticai aiops incidentmanagement autonomousagents multiagentsystems enterpriseai itoperations aiarchitecture generativeai aiengineering semantickernel dotnet csharp azureai azureopenai llmops agentops observability intelligentautomation sre cloudoperations platformengineering enterprisearchitecture workflowautomation devops digitaltransformation aiforitops incidentresponse futureofoperations
