Eelco Los

Posted on May 29 • Edited on Jun 14

How I productionized my multi-agent AI support copilot in Teams and Azure

#ai #agents #teams #azure

TL;DR

Built a .NET A2A demo to validate the triage pattern before deploying the Python system. If the shape only works in one stack, it is not a portable pattern.
Teams is the ingest channel for this deployment, not a hard requirement. The bot posts to a channel-agnostic /ingress endpoint; any other ingest can do the same.
Teams timeout budgets forced a full async reply architecture.
Adaptive card size limits forced progressive disclosure: compact badge up front, everything else behind toggles.
RSC permissions only activate on manifest install, not Entra consent alone. Getting that order wrong costs you a 403 and a bug you cannot reproduce.
Containerization was table stakes. The real work was auth, telemetry, storage, and making every platform permission explicit.

Part 2 covered the runtime failures and the hardening work that followed. This post is about the next step: productionization.

Once the system was capable of producing credible triage results repeatedly, the question changed again. It was no longer "can this architecture work?" It was "can this behave like a deployed product?"

That question turned out to be broader than "put it in Docker":

channels had timeout budgets
Teams cards had presentation limits
attachments arrived in platform-specific shapes
storage and audit needed durable homes
deployment needed images, identities, secrets, probes, and update flow
admin approval and manifest install were part of the runtime story, not just setup trivia

In other words: the hard part was not just getting the system to reason. It was getting the system to operate in the real environment it was supposed to serve. For this deployment, that environment is Microsoft Teams as the ingest channel and Azure as the runtime.

Validating the shape before leaning on it

Before going further into the productionization lessons, I want to point to something concrete.

When agents run inside an LLM session, it is hard to tell whether a failure is a routing problem or a model problem. Before deploying, I wanted proof that the triage pattern held up in a more traditional distributed system: one where the "agents" are plain HTTP services, not LLM sub-sessions, and the failures are just HTTP failures.

The main system uses Claude Code as the multi-agent runtime, so the orchestration is intertwined with model behavior. A standalone implementation in a different stack could isolate the protocol from the reasoning and give a cleaner signal: does the triage shape work, or does it only work because the LLM is papering over gaps?

That's why I built a2a-docker-demo: a standalone FastEndpoints/.NET implementation of the same triage pattern, using the A2A protocol spec. The main system is Python, so using a different stack for the demo was deliberate. If the shape only works in one language or one framework, it's not really a portable pattern.

The demo runs a full triage workflow across five services: Classifier, Assessor, Router, Handler, and an API backend that orchestrates them. Each service does one narrow job and knows nothing about the others. The API backend sequences them. In plain terms: a request comes in, gets classified, gets assessed for priority, gets routed to a queue, and gets handled. The A2A protocol is what connects them: each specialist advertises what it can do via a machine-readable card, and callers use that card to invoke it over a standard JSON-RPC endpoint. Authentication uses short-lived tokens tied to the calling agent's identity, not shared secrets, so every service-to-service call is independently verifiable. Grafana and Tempo make the call graph visible.

What it confirmed:

The linear triage shape works: each specialist receives a request, does its narrow job, and returns a structured result. No specialist needs to know about the others.
Identity at the boundary is not optional. Mixing user tokens and agent tokens causes 401s in ways that are easy to miss if you're not thinking about it.
Discovery is its own concern. The demo includes a discovery service, but the active triage flow ended up not depending on dynamic discovery. The API backend knows its specialists and fetches their agent cards directly. That turned out to be the right tradeoff for a known-topology system.

What the LLM layer adds is the reasoning that rule-based routing can't replicate. The demo classifies, routes, and handles, but everything it knows is hardcoded. The main system's value is that the evidence agents read real telemetry, CRM data, provisioning logs, and identity policy, and the synthesis layer reasons across them. That's not something you can validate in a protocol demo. But the protocol demo proved the container shape, the auth boundaries, and the A2A communication pattern before I had to debug all three at once inside a production deployment.

Why Teams, and why it matters that it's just an ingest

New support tickets arrive in Teams via an automated workflow from the ticketing system. Support agents were already working in that channel. Rather than build a new interface and ask people to change their flow, we injected the triage bot into the channel where the work was already happening. The bot intercepts the incoming workflow messages and runs triage in parallel, without requiring any change to the existing process.

The architecture reflects that the Teams ingest is not load-bearing. A thin bot adapter receives the message and forwards it to a channel-agnostic /ingress endpoint on the orchestration container. Any other ingest (a Zendesk webhook, an email parser, a direct API call) can POST to that same endpoint without touching the orchestration layer. The bot is pluggable. The core system does not care where the message came from.

What we did not fully anticipate was how much the properties of that specific ingest channel would shape the surrounding architecture. The next three lessons are all consequences of that choice.

Lesson 8: async boundaries became part of the product

The original mental model was synchronous: a Teams message comes in, the bot forwards it, the management agent does its work, and the reply comes back in the thread.

That flow is elegant and wrong.

A real triage takes minutes. Teams does not care that your orchestration is elegant. The Bot Framework wants a quick response. The channel wants acknowledgment fast. If you wait for full triage before responding, you have already lost. The request times out before the result arrives.

That forced an architectural change:

Teams message
  -> immediate acknowledgment ("🔍 Triage in progress...")
  -> POST /ingress  →  HTTP 202 Accepted
  -> background triage task
  -> POST callback_url  (from orchestrator back to bot)
  -> adapter.continue_conversation()
  -> threaded Teams reply with result

The repo reflects that shift:

app.py accepts ingress and returns 202 Accepted, then fires asyncio.create_task(run_triage_background(...))
triage runs in the background against the LLM provider
the result is POSTed to the bot's /api/proactive endpoint, secured by a shared key in Key Vault
the Teams bot posts back into the original thread using the stored ConversationReference

This is one of those moments where "plumbing" turns into product behavior. The async boundary is not an implementation detail. It determines whether the user experiences the system as responsive or broken.

It also pushed the architecture toward a clearer separation of concerns:

ingress: receive, acknowledge, store the conversation reference
orchestration: run triage in the background
presentation: post back via proactive callback

Once a channel has timeout budgets, async is not an optimization. It is table stakes. A different ingest channel would impose different constraints, but the same shape holds regardless: acknowledge fast, process in the background, deliver the result asynchronously.

Lesson 9: presentation constraints changed the architecture too

Another production surprise: the output surface is part of the system design.

In local markdown reports, it is easy to think "just show all the evidence." In Teams Adaptive Cards, that becomes nonsense very quickly.

The card payload has practical size limits. Evidence can be huge. Customer drafts can be long. Raw APM output can explode in size. A triage system that preserves everything internally still has to decide what a human should see at a glance.

That's why the card formatter ended up with explicit progressive disclosure:

collapsible sections
capped evidence claims
capped claim lengths
inline draft only when short enough
details hidden by default

But even that wasn't enough once real users saw it. The initial cards were simply too large: multiple agents each rendering up to 20 claims at 300 characters apiece added up to cards that were difficult to scan. The fix was a compact outer container (a title, a two-field confidence badge, and a single "Show analysis" toggle) with everything else collapsed until the support agent asks for it.

That was not just UI cleanup. It was a change in how the system expressed itself. The best pattern I found:

Keep the top-level card short and scannable. One glance should answer "is this worth expanding?"
Preserve deeper detail behind toggles
Store the raw material in blob storage, not in the card

That matches the broader architecture: compact context for humans, raw evidence for audit, and only the right subset flowing into the model.

Lesson 10: your ingest channel owns its own permission surface

Attachments and platform permissions turned into their own evidence-delivery problem, and a reminder that the ingest layer is not passive.

The plans already recognized that screenshots, logs, exports, and shared-file URLs were part of the incident surface. But the runtime taught a more specific lesson: the platform decides how those inputs arrive, and that shape is often inconvenient.

Teams inline images are one good example. They are not always delivered as clean attachments in the way you might expect. Pasted or inline images can show up only as hosted-content URLs embedded in the HTML body. If your system only looks at the attachment list, you miss them. That is why the Teams bot needed logic to inspect HTML body content, extract hosted-content image URLs, use the bot token to fetch Teams-hosted files, and normalize those results into the incident attachment pipeline.

The same pattern repeated for file access. The bot could parse file URLs from the message body, but actually fetching team-shared files required Files.Read.Group, a separate RSC permission declared in the manifest. Another capability, another permission, another manifest version, another install cycle with the Teams admin.

But the deeper platform lesson came from a different direction: permissions that look granted are not always active.

Getting the bot working in a real Teams tenant required two things in combination: Entra admin consent and installing the Teams app manifest. That sounds obvious, but the order of operations matters. Entra consent registers the app and grants the declared permissions in principle. RSC permissions, like ChannelMessage.Read.Group (which the bot needs to read thread history), are only activated when the manifest is installed. Entra approval alone does not trigger them. The manifest install is what causes the Teams platform to enforce and expose the RSC surface.

We discovered this the hard way. The bot could post replies. Entra showed the app as approved. But every attempt to fetch a thread's root message via the Graph API returned 403. The permission looked granted. The permission was not active.

Once RSC was properly activated, a second issue surfaced: the bot had been falling back to stale context when it couldn't read the thread root. A user mentioning the bot in a reply to a workflow card would get triage results for the wrong incident, because the bot resolved incident identity from prior stored state instead of the thread's origin message. The 403 had masked the bug. The code had been correct for some time, but it could only prove it once the permission was actually live.

That is exactly the kind of thing you only learn by running the system for real. The code can be correct. The bot can be deployed. The Azure side can be healthy. And the feature can still fail because the host platform has not activated the permission surface your runtime depends on.

So yes: the Teams IT-admin path is part of the architecture. Entra approval, app manifest, RSC permission activation, installation flow, and actual tenant behavior are operational dependencies, not external trivia.

Lesson 11: containerization is only one slice of productionization

By this phase, the repo had clearly crossed the line from "research notes plus local prompts" into "things that get packaged and deployed."

There are now two main runtime packages:

the orchestrator/ingress app
the Teams bot

They have Dockerfiles. They are built and pushed to ACR. They are rolled out to Azure Container Apps. They expose /health. They carry shared secrets and callback URLs. They log. They emit telemetry. They get updated as separate containers.

That is containerization.

But the bigger lesson is that containerization was only the beginning. Productionization also meant:

provider abstraction: the runtime could not stay hardwired to one LLM access path, so the plans moved toward a BYOK provider layer (azure, openai, anthropic, github)
identity model: local CLI auth had to give way to managed identity and workload identity planning
deployment ergonomics: separate images, separate updates, named container updates, cross-subscription ACR login
telemetry: OpenTelemetry spans and structured logs had to exist so failures were diagnosable
secrets and hooks: webhook secrets and proactive callback tokens had to be validated explicitly
storage: incident state and raw evidence needed durable homes

Even the Docker image contents taught something. Installing Azure CLI and GitHub CLI inside the root image made the local and hosted behavior more consistent, but it also made the image heavier and startup slower. That is not a reason not to do it. It is a reminder that packaging decisions become runtime tradeoffs.

So when I say "productionization," I do include containerization. I just do not mean only containerization.

The Azure object model had to become explicit too

Another thing I underestimated was how much clarity you get from naming the Azure objects explicitly.

When a system is still mostly a notebook idea or a local Docker flow, it is easy to say "we'll deploy this to Azure" and leave the rest fuzzy. But productionization forces a sharper question: what are the actual Azure resource types this system needs, and what job does each one do?

Here is what is currently deployed:

Azure object	Why it exists
Resource Group	The boundary that holds the deployed environment together
Container Registry	Stores the bot image and the orchestrator/ingress image
Container Apps Environment	Shared runtime boundary for the deployed containers
Container App: orchestrator/ingress	Receives requests, runs triage, handles callbacks and orchestration
Container App: Teams bot	Handles Teams messages, threaded replies, proactive updates, and attachment fetch
User-assigned Managed Identity	Gives the runtime a stable Azure identity without interactive login
Key Vault	Holds secrets that still need to exist: bot credentials, shared callback keys
Storage Account + Blob containers	Incident state, attachments, and raw evidence
Application Insights + Log Analytics	Telemetry, traces, and operational debugging
Azure Bot / Entra app registration	The identity and control-plane pieces that let the Teams bot exist in the tenant
RBAC role assignments	Read/write boundaries: which identity can access telemetry, storage, and queues

Naming the objects helped in two ways. First, it made the platform shape legible: instead of "an AI agent deployed to Azure," you get a concrete service map. Second, it surfaced hidden dependencies early. If you want auditability, you need durable storage beyond the model context. If you want a Teams bot to work in a real tenant, app registration and manifest install are not optional side notes.

What the plans and sessions converged on

Looking back across the design notes, the implementation plans, and the build sessions, the lessons converge pretty cleanly.

The architecture itself was mostly right:

orchestrator-worker
blackboard-style shared incident context
no agent-to-agent communication
deterministic policy gate
human approval before external action

What needed the most work was everything around that architecture: auth, storage, async handoff, presentation, deployment, and the platform admin paths that only matter once you leave the notebook.

That is why the project stopped feeling like prompt engineering and started feeling like systems engineering.

The multi-agent part did not go away. It just became one layer inside a broader operational stack.

What productionization actually meant here

If I had to compress the whole phase into one sentence, it would be this:

Productionization was not "put the prototype in Docker." It was making every important boundary explicit.

Local auth versus hosted identity. Compact model context versus full raw evidence. Synchronous intake versus async completion. Code correctness versus platform permission activation. Each of those had to become an explicit seam in the system, not an implicit assumption.

That is the part I underestimated most.

The architecture survived. The work was in making it durable, inspectable, resumable, deployable, and permissioned enough to live outside the notebook.

And that, more than any single prompt or agent definition, is what made it start to feel real.

Part 4 will cover what actually running this in production taught us: the quiet failures, the behaviors we didn't predict, and the feedback loops that changed the second iteration.

Top comments (4)

Harjot Singh • May 31

Grounding answers in our actual docs so it stops inventing policy is the line that matters most for a support copilot specifically, because in support a confident hallucination isn't a bad demo, it's a liability: the bot tells a customer something false about your refund or security policy, they act on it, and you're on the hook for what it said. So the trust mechanism has to be grounding-or-escalate: answer only from retrieved docs, and when the docs don't cover it, say I don't know and hand to a human rather than smoothing over the gap with a plausible invention. Abstaining is a feature here, not a failure. The other three you listed are the real productionization tax that prototypes never show: routing between agents (picking the right specialist without a brittle if-else), conversation state across Teams threads (the same user, the same issue, fragmented across messages and channels, is genuinely hard state to keep coherent), and cost control (route the easy FAQ-shaped questions to a cheap model, reserve the strong one for the genuinely ambiguous tickets). The throughline: a support agent's job is to be right or to defer, never to be confidently wrong, and everything else is the plumbing that makes that reliable at scale. Ground it, and let it escalate instead of invent. That answer-from-the-docs-or-abstain instinct is core to how I think about Moonshift. When retrieval comes up empty, does the copilot hard-escalate to a human, or attempt a best-effort answer with a caveat?

Eelco Los • Jun 1

Great point, and I agree with your core principle: for customer-facing support, it should be ground-or-escalate, never confidently invent.

In my current setup, though, this first productionized version is inward-facing (internal team copilot), not outward-facing customer support. So the immediate objective is reliable internal information retrieval and handoff support, not autonomous customer action.

Also aligned on your grounding note: in this flow, evidence-backed responses are foundational, and at this stage the agent is intentionally read-only. Starting read-only feels like the safest path in early productionization before introducing any action-taking capabilities.

Gilder Miller • May 29

Good catch! Thanks.😺

Eelco Los • Jun 1

thank you 🙏