DEV Community: Eelco Los

How I hardened my multi-agent AI support copilot

Eelco Los — Fri, 24 Apr 2026 14:56:25 +0000

The first post in this series was about the design. This one is about what happened when the first real tickets hit the wiring, and about the hardening work that followed once those runs exposed the weak spots.

The good news is that the architecture mostly held up. The orchestrator-worker model was still the right shape. Parallel evidence gathering still made sense. Shared incident context still made sense. Human approval still made sense.

What broke was the boundary layer: what actually executes a skill, how incomplete configuration should fail, what a spawned sub-agent can really invoke, which tools are scoped to the parent session instead of the child, how incident state should persist, and how much of the system could be tested like normal software.

The examples here are specific to Claude Code, but the class of problem is broader than one tool. In multi-agent systems, runtime semantics matter more than diagrams.

Lesson 1: Skills are documents. Agents are executors.

The first live test was against a real Zendesk ticket from a customer tenant: "The request is blocked" during provisioning admin consent. The Management Agent ran. Sub-agents were dispatched. Evidence came back.

But something was wrong: no az commands had actually run. No APIs were called. The "evidence" was the agents describing what the skills would do if they were run: accurate, thoughtful, and completely useless.

The root cause took a minute to understand. I had treated the skill files as if they were runnable workers. They weren't.

SKILL.md files in Claude Code are execution instructions for an agent that already has the right tools. The frontmatter declares allowed-tools: Bash(az *), but that only scopes permissions when the skill is directly invoked as a user command. In this repo, I call the task-spawned executors evidence agents. The .claude/agents/*.md files define which one runs, what tools it gets, and that it should execute the skill's commands rather than just read the skill file. When the Management Agent does TaskCreate -> "run the SCIM skill", it spawns one of those evidence agents. That evidence agent needs its own .md definition that:

Declares it has Bash(az *) in its tools: frontmatter
Has a system prompt that says execute the skill's commands, not just read the skill file

We had written the instructions, but not the workers. The .claude/agents/ directory was mostly empty. There was nothing for TaskCreate to dispatch to by name.

The fix was straightforward once the problem was clear: create .claude/agents/apm.md, crm.md, scim.md, b2c.md, and alm.md, each with frontmatter and a system prompt along these lines:

---
name: APM Evidence Agent
description: >
  Evidence-gathering sub-agent for application telemetry. Reads the APM skill
  instructions, executes the query steps, and returns structured claim[] entries.
  Dispatched by the Management Agent via TaskCreate; not user-invocable directly.
model: claude-sonnet-4.5
tools: Bash(az *)
maxTurns: 20
---

You are the APM Evidence Agent. Given an IncidentContext with resolved_identity,
execute the APM skill bootstrap and query steps. Write structured claim[] entries
to IncidentContext.evidence[]. Return a compact summary to the Management Agent.

The takeaway: SKILL.md is documentation. An agent definition is what actually runs. Don't mistake the map for the territory.

Lesson 2: Fail fast on configuration. Silent dry-run is an anti-pattern.

The second live test exposed a different problem. All four evidence agents returned:

{
  "agent": "apm",
  "claim_type": "gap",
  "confidence": 0.00,
  "dry_run": true,
  "claim": "Required config fields empty: key_resources[].name, resource_group, subscription_id"
}

The CRM, provisioning, and identity agents returned the same thing. The Management Agent continued anyway, and the Synthesis Agent synthesised from nothing. We got a hypothesis based purely on domain pattern-matching, with zero real telemetry, and we'd burned a full round of context tokens to get there.

The worst part was not that evidence was missing. It was that the output still looked plausible. A clean failure would have been better than a confident-looking answer built on gaps.

The design intent was different. The --dry-run flag existed for demos and training: an opt-in mode. But in practice, any time config was incomplete, the system silently fell back to dry-run instead of reporting what was missing.

The fix: a validate-expertise skill that runs at session start, checks all required fields across the five domain YAML files, and reports what's missing:

Domain   | Field                     | Status  | Auto-discovery
---------|---------------------------|---------|------------------------------------------------
apm      | key_resources[0].name     | ❌ EMPTY | az monitor app-insights component list -o table
crm      | base_url.prod             | ❌ EMPTY | (manual; check deployment docs)
scim     | scim_enterprise_app_sp_id | ❌ EMPTY | az rest ... /servicePrincipals?$filter=...
identity | identity_tenant_id        | ❌ EMPTY | az ad tenant list
alm      | key_repos[0].owner        | ❌ EMPTY | gh repo list --json owner,name

If any required field is empty → print the fill table, log to audit, stop. No evidence agents dispatch.
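A minimal sketch of that fail-fast gate, assuming the YAML files have already been loaded into plain dictionaries. The REQUIRED_FIELDS map, the function names, and the exception type are hypothetical and only illustrate the rule: dry-run is an explicit opt-in, and missing config stops the run instead of degrading it.

```python
from typing import Any

# Hypothetical required-field map; the real skill would derive this from the repo.
REQUIRED_FIELDS = {
    "apm": [
        "key_resources[0].name",
        "key_resources[0].resource_group",
        "key_resources[0].subscription_id",
    ],
    "crm": ["base_url.prod"],
}


class IncompleteConfigError(RuntimeError):
    """Raised before any evidence agent is dispatched."""


def lookup(config: Any, path: str) -> Any:
    """Resolve a dotted path with optional [index] parts; return None if absent."""
    current = config
    for part in path.replace("]", "").replace("[", ".").split("."):
        if isinstance(current, list):
            index = int(part)
            current = current[index] if index < len(current) else None
        elif isinstance(current, dict):
            current = current.get(part)
        else:
            return None
    return current


def missing_fields(configs: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (domain, path) pairs for every required field that is empty."""
    missing = []
    for domain, paths in REQUIRED_FIELDS.items():
        for path in paths:
            value = lookup(configs.get(domain, {}), path)
            if value is None or value == "":
                missing.append((domain, path))
    return missing


def validate_or_stop(configs: dict[str, Any], dry_run: bool = False) -> str:
    """Return the run mode, or raise when config is incomplete."""
    if dry_run:
        return "dry-run"  # only reachable when the caller asked for it
    missing = missing_fields(configs)
    if missing:
        report = "\n".join(f"{domain}: {path} is EMPTY" for domain, path in missing)
        raise IncompleteConfigError(report)
    return "live"
```

The point of the design is that there is no code path from "config incomplete" to "dry-run": the only way into dry-run is the flag.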

We also annotated the YAML files directly:

# apm.yaml
key_resources:
  - name: ""              # REQUIRED: az monitor app-insights component list -o table
    resource_group: ""    # REQUIRED: same command
    subscription_id: ""   # REQUIRED: az account show --query id -o tsv

The meta-lesson: dry-run mode should be opt-in and explicit, not the default fallback when setup is incomplete. If the system silently degrades, you don't know it's wrong until you compare the output to reality.

Lesson 3: Slash commands don't exist in sub-agents

This one was embarrassing.

The Management Agent's Step 0 contained:

/validate-expertise

The Onboarding flow contained:

/validate-expertise --dev

Both were failing silently. The agent treated them as skill tool calls, got "skill not found" errors, and then read the YAML files manually anyway, which produced plausible output that masked the actual failure.

Skills in Claude Code are not registered slash commands. They are markdown files. The correct invocation is:

Read .claude/skills/validate-expertise/SKILL.md and execute the validation steps.

Every place in agent instructions that used /skill-name syntax needed updating to explicit file-read instructions. The prime skill had the right pattern all along: "Load current context: execute the prime skill by reading .claude/skills/prime/SKILL.md". We just hadn't applied it consistently.
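One way to keep that consistent is a small lint over the agent markdown. This is a sketch, not the repo's actual check: the regular expression and the function name are assumptions, and it only flags lines that consist of nothing but a slash command and optional flags.

```python
import re

# Matches a whole line such as "/validate-expertise" or "/validate-expertise --dev".
# File paths like "/usr/bin" do not match because of the second slash.
SLASH_COMMAND = re.compile(r"^\s*/[a-z][a-z0-9-]*(\s+--?[\w-]+)*\s*$")


def find_slash_invocations(markdown_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that looks like a slash command."""
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if SLASH_COMMAND.match(line):
            hits.append((number, line.strip()))
    return hits
```

Run over every file in .claude/agents/, an empty result means no instruction still relies on slash-command syntax.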

The fix (pr-13) also introduced an Onboarding Agent, a dedicated agent that runs az/gh auto-discovery commands for each domain, presents the discovered values interactively, and writes confirmed non-secret values into the YAML files. After it runs, it re-validates. The support engineer goes from "11 empty required fields" to "ready to triage" in one guided session.

Lesson 4: Constrain the interface, not just the credentials

This one came in two layers. The CRM skill was the first fully working skill. It uses an MCP server (CRM-* tools) that wraps an internal customer-data API. The API key behind it had write access across the CRM surface, so I intentionally exposed only the read endpoints through the MCP layer. That kept the agent's direct permissions narrow.

The deeper boundary showed up when we wired that same capability into a TaskCreate -> "CRM Evidence Agent" dispatch. The sub-agent started and then hit AADSTS65001:

AADSTS65001: The user or administrator has not consented to use the application
with ID '{ResourceId}'...

The MCP tools (CRM-get_company, CRM-lookup_user_by_email, etc.) are session-scoped to the Management Agent's Claude Code session. When a sub-agent is spawned via task, it starts with a clean tool set; MCP tools don't carry over. The lesson was not just "use read-only endpoints" but also "don't assume the child session has the parent's tools." The sub-agent fell back to az account get-access-token --resource {ResourceId}, which fails because that service principal hasn't been granted consent for interactive CLI flows.

The workaround (and current design): CRM runs as a direct call in the Management Agent session, not as a TaskCreate dispatch. The Management Agent calls CRM-lookup_user_by_email directly before dispatching other evidence agents. Identity resolution happens first, synchronously, in the parent session.

Worth separating two things here that are easy to conflate. The direct-call pattern is a workaround for the MCP tool-scope limitation. But it may also be the right shape regardless. Resolving identity synchronously in the parent session, before dispatching parallel workers, is a reasonable sequencing decision on its own terms: it keeps identity resolution out of the sub-agents, surfaces auth failures early, and means workers can assume a resolved identity rather than having to negotiate it. When MCP tool forwarding lands, the question isn't automatically "switch back to the dispatch model." It's "is synchronous identity resolution still the better shape?" The workaround and the correct architecture might happen to be the same thing.
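The sequencing can be sketched in a few lines, assuming workers are plain callables that take a resolved identity. The names triage, resolve_identity, and workers are hypothetical, and a thread pool stands in for whatever the real dispatch mechanism is.

```python
from concurrent.futures import ThreadPoolExecutor


def triage(ticket, resolve_identity, workers):
    """Resolve identity in the parent first, then fan out to evidence workers."""
    # Synchronous step: auth or lookup failures surface here, before any dispatch.
    identity = resolve_identity(ticket)
    if identity is None:
        raise RuntimeError("identity resolution failed; no workers dispatched")

    # Parallel step: every worker receives the already-resolved identity.
    with ThreadPoolExecutor(max_workers=max(1, len(workers))) as pool:
        futures = {name: pool.submit(fn, identity) for name, fn in workers.items()}
        return {name: future.result() for name, future in futures.items()}
```

Workers never negotiate identity themselves, which is the property the direct-call pattern buys regardless of whether MCP tools can be forwarded.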

The decisions.md entry captures the distinction:

Current state: MCP tool forwarding in task: Not supported.
When to re-evaluate: New Claude Code version / MCP spec update.
When available: CRM can be dispatched as TaskCreate → "CRM Evidence Agent" with CRM-* tools. Remove direct-call pattern from management-agent.md.

Document the current workaround, document the future path, document the trigger for re-evaluation. That's the pattern for any runtime constraint you're working around rather than solving.

Lesson 5: Local-dev auth and production auth are different systems

One of the first hardening lessons was that I had been mentally treating authentication as one thing. It wasn't.

In the research phase, the working assumption was simple: if az login works in the parent session, the sub-agents should be able to use the same credentials. Real runs made that assumption look shaky. We saw cases where the APM evidence path returned a credential gap even though the parent session was already authenticated.

That does not automatically mean the agent logic was wrong. It means the runtime boundary was more important than the prompt. A child session is not the same thing as the parent. A local CLI flow is not the same thing as a hosted workload identity. "Authenticated on my machine" is not a production auth strategy.

For local development, we found one concrete pattern that was reliable: user-level environment variables were visible to new PowerShell sessions, while process-level $env: values were not.

# Set once for local dev:
[Environment]::SetEnvironmentVariable('AZURE_DEVOPS_EXT_PAT', $yourToken, 'User')

# Read inside a spawned session:
[Environment]::GetEnvironmentVariable('AZURE_DEVOPS_EXT_PAT', 'User')

That solved a local dev problem. It did not solve the production problem.

The broader lesson was that auth should be treated as part of the architecture, not as bootstrap glue. If the system depends on interactive CLI auth in production, it is not productionized yet.

The production direction is a workload identity model: managed identity on the host, workload identity federation for cross-tenant scenarios, service principals with explicit RBAC rather than user delegation. How that maps to this system's deployment shape is Part 3 territory. The design implication was already clear in Part 2: auth is an architectural decision, not something you patch in at the prompt layer.

Lesson 6: IncidentContext needed to be durable and layered

The IncidentContext pattern held up extremely well. It was still the right abstraction: every evidence agent writes into shared incident state, the synthesis layer reads from it, and agents do not talk directly to one another.

But the storage story behind that abstraction had to evolve.

At first, IncidentContext was mostly conceptual. It lived in prompts, in in-memory state, and in repeated task payloads. That was enough to prove the architecture, but it broke down as soon as the system had to resume work, survive interruptions, or preserve a real audit trail.

The first hardening step was to make it a real file. That solved two immediate problems:

Token overhead: we stopped re-injecting the full incident state into every sub-agent prompt
Resume: the incident could survive beyond one chat turn or one session

But even at that stage, the more general lesson was clear: not all state belongs in the model context, and not all state should stay implicit.

Once incidents become long-lived objects rather than prompt-shaped blobs, you can start talking seriously about revisions, late evidence, replay, and audit continuity.

That's what "layered" means here. An incident has distinct phases: intake state, live evidence under collection, synthesized hypothesis, reviewer annotations, and final audit record. Those phases have different lifecycles and different consumers. Flattening them into a single blob works fine when an incident is simple and completes in one run. It breaks when evidence arrives late, when the system is interrupted mid-investigation, or when you need to replay synthesis without re-running the full evidence pass. The file-backed IncidentContext was the first step toward treating those phases as first-class state, not just named sections of a big prompt.
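A minimal sketch of a file-backed, layered context, assuming one JSON file per incident. The layer keys, the function names, and the audit entry shape are illustrative assumptions rather than the repo's schema. Writes go through a temporary file so an interrupted run cannot leave a half-written incident behind.

```python
import json
import os
import tempfile


def empty_context():
    """One key per layer; each layer can be read or replayed independently."""
    return {
        "intake": {},
        "evidence": [],
        "hypotheses": [],
        "reviewer_annotations": [],
        "audit_log": [],
    }


def load_context(path):
    if not os.path.exists(path):
        return empty_context()
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_context(path, context):
    """Write to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(context, handle, indent=2)
    os.replace(temp_path, path)


def append_evidence(path, claim):
    """Add late or fresh evidence without touching the other layers."""
    context = load_context(path)
    context["evidence"].append(claim)
    context["audit_log"].append({"step": "evidence_added", "agent": claim["agent"]})
    save_context(path, context)
    return context
```

Because synthesis only reads the evidence layer, it can be re-run from the file after late evidence arrives without repeating the evidence pass.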

Lesson 7: You can test more of this than you think

One of the most useful hardening lessons in the repo was how much of a multi-agent system is actually testable with normal engineering techniques.

At first, it is tempting to think of an agent system as mostly prompt behavior, and therefore mostly manual validation. That does not scale for long.

The breakthrough was to stop thinking about the whole thing as one fuzzy AI system and start testing its surfaces separately.

1. Static validation tests

These do not touch an LLM at all. They just assert that the repo is internally coherent:

expertise YAML files contain the required fields
agent and skill markdown files have valid frontmatter
incident context JSON files match schema
claim objects have the required fields and valid confidence ranges

That catches a surprising amount of breakage before you ever run a live incident.
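As an illustration, a static check on claim objects could look like the sketch below. The required field names are taken from the gap claim shown in Lesson 2; treating 0.0 to 1.0 as the valid confidence range is an assumption, as is the function name.

```python
# Field names follow the sample claim earlier in the post.
REQUIRED_CLAIM_FIELDS = ("agent", "claim_type", "confidence", "claim")


def claim_errors(claim: dict) -> list[str]:
    """Return a list of problems; an empty list means the claim is well-formed."""
    errors = [
        f"missing field: {name}"
        for name in REQUIRED_CLAIM_FIELDS
        if name not in claim
    ]
    confidence = claim.get("confidence")
    if confidence is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            errors.append("confidence must be a number")
        elif not 0.0 <= confidence <= 1.0:
            errors.append("confidence must be within [0.0, 1.0]")
    return errors
```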

2. Deterministic logic tests

Some of the system is not AI behavior. It is just logic.

We wrote tests for things like:

synthesis confidence scoring logic
policy gate thresholds
intake regexes such as regression_from_version
recalibration boundaries like [0.45, 0.65)

Those are product decisions encoded in code. They deserve ordinary tests.
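Boundary tests are where this pays off most. A sketch for the half-open interval above, with a hypothetical function name; what the system does inside the band is not shown here:

```python
def in_recalibration_band(confidence: float, low: float = 0.45, high: float = 0.65) -> bool:
    """Half-open interval [low, high): low is included, high is excluded."""
    return low <= confidence < high


# The edges are the interesting cases.
assert in_recalibration_band(0.45)
assert not in_recalibration_band(0.65)
```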

3. Contract tests for agents

Because so much of the runtime lives in markdown, agent files and skill files need to be treated as contracts, not prose.

Examples:

every documented agent must have a corresponding .md file
every evidence agent must allow the claim types it actually emits
dry-run gap claims must have confidence = 0.0
routing rules stay consistent with the management agent documentation

If markdown is part of the runtime, markdown deserves tests.
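A contract test for agent files can be as small as the sketch below. It uses a deliberately minimal frontmatter reader instead of a YAML library, and the set of required keys is an assumption based on the agent definition shown in Lesson 1.

```python
def parse_frontmatter(markdown_text: str) -> dict[str, str]:
    """Read top-level `key: value` pairs between the leading --- fences.

    Continuation lines (for example the body of a folded `description: >`)
    are skipped rather than joined, which is enough to check that keys exist.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a frontmatter fence")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line[:1].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    raise ValueError("frontmatter fence is never closed")


def missing_agent_keys(markdown_text, required=("name", "description", "tools")):
    """Return the required frontmatter keys that the agent file does not declare."""
    fields = parse_frontmatter(markdown_text)
    return [key for key in required if key not in fields]
```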

4. Golden cases from real incidents

Past incidents became fixtures. We asserted things like:

at least four evidence agents contributed
the top hypothesis stays within a target confidence range
no PII leaks into hypothesis text
audit steps appear in the right order
known incidents still produce the same class of answer after refactors

That turns solved incidents into reusable engineering assets.
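A sketch of what a golden-case check might assert against a stored incident result. The result shape, the function names, and the email regex as a stand-in for PII detection are all assumptions; a real check would use a proper PII detector.

```python
import re

# Crude stand-in for PII detection: only catches email-like strings.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def is_subsequence(expected, actual):
    """True if every expected step appears in actual, in the same relative order."""
    remaining = iter(actual)
    return all(step in remaining for step in expected)


def golden_case_failures(result, expected_steps, confidence_range, min_agents=4):
    """Compare a replayed incident against the expectations stored with the fixture."""
    failures = []
    agents = {claim["agent"] for claim in result["evidence"]}
    if len(agents) < min_agents:
        failures.append("too few evidence agents contributed")
    top = result["hypotheses"][0]
    low, high = confidence_range
    if not low <= top["confidence"] <= high:
        failures.append("top hypothesis confidence out of range")
    if EMAIL.search(top["text"]):
        failures.append("email address found in hypothesis text")
    steps = [entry["step"] for entry in result["audit_log"]]
    if not is_subsequence(expected_steps, steps):
        failures.append("audit steps missing or out of order")
    return failures
```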

5. LLM evals where model behavior really matters

There is still a model-shaped layer, especially around synthesis quality, reviewer behavior, and policy gate quality. That is where evals belong.

The repo now includes evals for:

synthesis quality
reviewer PII and promise detection
policy gate correctness
incident epistemic safety

And the tests are not just local rituals anymore. They are wired into CI, so pushes and pull requests get automated results instead of "I tried a few prompts and it seemed okay."

The broader lesson is simple: multi-agent systems are still software. Test the deterministic parts deterministically. Test contracts as contracts. Use golden cases for regressions. Save LLM evals for the surfaces that actually require model judgment.

What these failures had in common

None of these were "the model reasoned badly" problems. They were places where I had relied on an implicit contract that wasn't actually real:

I treated instructions as executors
I treated missing config as something the system could gracefully work around
I treated slash-command syntax as if it were a portable invocation layer
I treated parent-session tools as if they would be inherited by child sessions
I treated local auth as if it would behave the same across runtime contexts
I treated prompt memory as if it were durable system state
I treated too much of the system as if it had to be tested manually

The fix in each case was the same: make the boundary explicit, codify the workaround, and stop relying on implied behavior.

The debugging order that actually worked

Once I noticed the pattern, I stopped starting with the prompt and started with the execution chain. This ordered checklist is the most transferable thing in this post, and it applies to any multi-agent system, not just Claude Code.

In practice, I kept asking the same questions in the same order:

1. Did the right worker actually run? If the wrong agent was dispatched, or no agent existed for that role, nothing downstream mattered.
2. Did it have the tools I thought it had? A skill that mentions az is not the same thing as an agent that can execute az.
3. Was configuration complete? If required fields were empty, the system needed to stop, not improvise.
4. Was I using a real invocation mechanism? Slash-command looking syntax felt convenient, but convenience is not the same thing as runtime support.
5. Was the tool available in this session or only in the parent? MCP scope turned out to matter more than I expected.
6. Only then: was the model reasoning actually wrong?

That order saved a lot of time. If step 1 is false, better prompting will not help. If step 3 is false, the best model in the world can only produce a sophisticated answer around missing inputs. If step 5 is false, you do not have an agent-quality problem. You have a system-boundary problem.
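The ordering matters enough that it can be written down as code. This is only a sketch of the idea: the observation keys are hypothetical, and how each one gets answered (logs, audit entries, manual inspection) is left open.

```python
# Checks run in this order; the first failing one is where debugging starts.
CHECKS = [
    ("worker_ran", "Did the right worker actually run?"),
    ("tools_available", "Did it have the tools I thought it had?"),
    ("config_complete", "Was configuration complete?"),
    ("real_invocation", "Was I using a real invocation mechanism?"),
    ("tool_in_session", "Was the tool available in this session or only in the parent?"),
]


def first_failure(observations: dict[str, bool]) -> str:
    """Return the earliest unanswered or failing question in the chain."""
    for key, question in CHECKS:
        if not observations.get(key, False):
            return question
    return "Only then: was the model reasoning actually wrong?"
```

An observation that is missing counts as a failure, so the model's reasoning is only questioned once every boundary has been positively confirmed.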

That was one of the most useful mindset shifts of the whole project. Multi-agent debugging often looks like prompt debugging at first, but a lot of it is closer to distributed systems debugging: execution path, capability scope, state, and contracts.

What changed in the repo because of these runs

The first live incidents changed the codebase in very concrete ways.

We added explicit evidence-agent definitions instead of assuming skills were executable on their own. We added validate-expertise so incomplete setup blocked the run instead of quietly degrading into dry-run mode. We introduced an Onboarding Agent so filling the YAML files became a guided flow instead of a scavenger hunt. We rewrote skill invocations in agent instructions to explicit file-read-and-execute patterns. And we documented the CRM direct-call workaround in decisions.md with a re-evaluation trigger instead of pretending the sub-agent path already worked.

That sounds mundane, but that's the point. The system got better not because we discovered a magic prompt, but because we tightened the contracts around how it actually runs.

That was the point where the support copilot stopped being a design exercise and started feeling like software.

Two threads from Part 1 don't resurface here: the confidence formula (FinalConf = sigmoid(Support - Conflict) × AgreementMultiplier) and the self-improvement loop through expertise YAML updates. The formula survived and is tested (Lesson 7).

The next post is about productionization: Teams timeouts, async replies, attachments, blob storage, containerization, Azure deployment, and the IT-admin realities that only show up once you leave the notebook.

How I designed a multi-agent AI support copilot

Eelco Los — Fri, 27 Mar 2026 14:21:37 +0000

Every support ticket starts the same way at our SaaS company: open the ticket, scan the description, then spend the next 15 minutes manually gathering context across five different systems. Check application telemetry for exceptions. Look up the customer in the CRM. Grep provisioning logs for failed sync events. Search work item systems and source control for related bugs. Cross-reference identity policy config if authentication is involved.

That context is all out there. It's just scattered. By the time you've assembled it, you've already spent most of your time budget on information retrieval, not on reasoning.

That's what I set out to fix. This is the first post in a series about the multi-agent AI support copilot.

The idea: a side-by-side AI copilot, not a replacement

The typical AI support story starts with a ticket routing chatbot or an automatic responder. That's not this. We're not changing the helpdesk agent's job or asking the model to speak to the customer. Instead, we start a background process alongside the human support agent. The moment a ticket arrives, the copilot gathers context from the surrounding systems and checks whether that evidence corroborates, weakens, or contradicts what the customer is reporting.

The first milestone is deliberately modest from the product side: show the ticket and the supporting evidence side by side so the support agent can reason faster with better context. Under the hood, the implementation already goes further and can synthesize ranked hypotheses with confidence scores, but I still want the first user-visible win to be evidence the human can inspect. For now, the human stays in charge.

Everything that touches the customer still requires a human to approve it.

The design had three hard requirements:

Parallel evidence gathering: all domains queried at the same time, not sequentially
Structured outputs: every agent returns validated JSON, not prose that needs re-parsing
Human-in-the-loop for every action: the copilot informs judgment, humans approve, deterministic code acts

Where the design came from

The initial architecture sketch came from a long ChatGPT conversation, the kind where you brain-dump a problem and the model helps you think through the components. That session produced the five-agent skeleton: identity resolution, observability telemetry, provisioning logs, work items, and a synthesis layer. It also produced the confidence arbitration formula:

FinalConf = sigmoid(Support - Conflict) × AgreementMultiplier

Where Support and Conflict are weighted sums of evidence claims, each claim weighted by the originating agent's reliability (R_agent) and the claim's local confidence.
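Read literally, the formula can be sketched as below. The claim shape (a boolean supports flag) and the way AgreementMultiplier is passed in are assumptions; the post does not define how the multiplier is computed, so it is left as a parameter.

```python
import math


def final_confidence(claims, reliability, agreement_multiplier=1.0):
    """FinalConf = sigmoid(Support - Conflict) * AgreementMultiplier.

    claims: dicts with "agent", "confidence", and a boolean "supports".
    reliability: maps agent name to its R_agent weight.
    """
    support = 0.0
    conflict = 0.0
    for claim in claims:
        # Each claim is weighted by agent reliability and its local confidence.
        weight = reliability[claim["agent"]] * claim["confidence"]
        if claim["supports"]:
            support += weight
        else:
            conflict += weight
    sigmoid = 1.0 / (1.0 + math.exp(-(support - conflict)))
    return sigmoid * agreement_multiplier
```

With no evidence at all, Support and Conflict are both zero and the sigmoid sits at 0.5, so any score away from that midpoint has to be earned by weighted claims.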

The second source was internal: a reusable agentic scaffolding template we use for experiments. It had a graded architecture philosophy that mapped almost exactly to what we needed:

Grade | Template capability                      | What we needed
------|------------------------------------------|------------------------------------------------------------
1     | `CLAUDE.md` session memory               | SupportAgent persona + IncidentContext schema
2     | Domain expertise YAML files              | Mental models per support domain (APM, CRM, SCIM, B2C, ALM)
3     | Skills (`SKILL.md` with frontmatter)     | Evidence agent skill implementations
4     | Closed-loop validation                   | Re-evidence loop + Reviewer gate
5     | Orchestration (parallel worker dispatch) | Management Agent with parallel evidence dispatch

The template's confidence_score field in expertise YAML files turned out to be exactly our R_agent reliability weight. Agents that update their own expertise files after each incident self-improve over time.

The two-layer repo structure

The template introduced a two-layer structure we adopted verbatim:

support-agent-research/
├── .agentic/              ← domain knowledge layer
│   ├── CLAUDE.md          - SupportAgent persona, session bootstrap
│   ├── memory/            - architecture decisions, ADR log
│   ├── plans/             - living docs for major system design changes
│   ├── expertise/         - per-domain YAML mental models
│   └── specs/             - per-incident IncidentContext files
│
└── .claude/               ← execution layer
    ├── agents/            - agent definitions (.md with frontmatter)
    ├── skills/            - evidence skill implementations (SKILL.md)
    └── settings.json      - hook wiring (Reviewer gate, Policy Gate)

The important part is that .agentic/ is the knowledge layer. It keeps the system's memory between incidents: persona, architecture notes, domain expertise, plans, and incident files. In this article, plans/ just shows that the system's design and build history live in the same place. .claude/ is the execution layer. It turns that knowledge into agents, skills, and hooks. Keeping them separate means you can update domain knowledge without touching execution logic, and vice versa.

How this matches broader agentic patterns

I didn't invent this split in a vacuum. The Techorama 2025 sessions I captured in my notebook pointed at the same shape: a central orchestrator, specialist workers, shared context, and a clean split between knowledge and execution. I then turned that shape into a reusable agentic template, basically a starter kit that bootstraps those layers into a repo and keeps the supporting plans, memory, and delivery notes in one place.

That also matches the plan modes now showing up in current CLIs: they separate thinking from doing. The template goes further by preserving state, specialist workers, and feedback loops, so the agent can plan, execute, and improve without starting from zero each time.

These public repos corroborate the same building blocks. microsoft/skills covers skills, custom agents, AGENTS.md templates, and MCP configs. dotagents and source-agents focus on keeping one canonical instruction set synchronized across tools. claude-reflect turns corrections into durable memory and reusable skills. agnix adds validation gates by linting agent configs before they break workflows. For the orchestrator-worker shape itself, I point to the architecture docs from Anthropic and OpenAI below rather than forcing a weak repo comparison.

Anthropic draws a line between workflows and agents and recommends starting simple. The composable patterns they call out, like routing, parallelization, orchestrator-workers, and evaluator-optimizer loops, are the same kinds of building blocks I ended up using here. OpenAI's Agents SDK makes a similar point by keeping the primitive set small: instructions, tools, handoffs, guardrails, sessions, and tracing. It also separates orchestration done by the LLM from orchestration done in code. That distinction matters to me because I want the model to reason, but I want deterministic code to route work and enforce boundaries. See Anthropic's "Building effective agents", OpenAI Agents SDK docs, and OpenAI Agents orchestration docs.

AGENTS.md fits this pattern too. It's basically a repo-local README for agents, a human-readable place for durable instructions that complements the normal README.md. That's exactly what .agentic/CLAUDE.md is doing for me here. See AGENTS.md.

`.agentic` piece        | What it stores                                         | Broader pattern
------------------------|--------------------------------------------------------|-----------------------------------------------
`.agentic/CLAUDE.md`    | Persona, operating rules, bootstrap guidance           | Repo-local agent instructions, AGENTS.md
`.agentic/memory/`      | Architecture decisions and shared context              | Durable instructions and session memory
`.agentic/plans/`       | Living record of the system's design and build history | Bootstrap paths, planning, and rollout history
`.agentic/expertise/`   | Domain-specific mental models                          | Specialist workers and routing inputs
`.agentic/specs/`       | Incident-scoped state and evidence                     | Persistent session state, blackboard
`.claude/agents/`       | Worker definitions                                     | Handoffs, agent-as-tool
`.claude/skills/`       | Narrow deterministic actions                           | Tools and guardrails
`.claude/settings.json` | Hook wiring and policy gates                           | Deterministic routing and execution control

That's why I ended up with a central Management Agent, specialized workers, a shared IncidentContext, and deterministic gates. The manager handles routing, the workers stay narrow, and the shared context gives every worker the same incident memory without letting them talk to each other directly. Then the reviewer and policy hooks decide what can actually happen. I want the model to gather evidence and propose answers, but I want code to decide when work is allowed to move forward. That same split shows up in Anthropic's guidance, OpenAI's Agents SDK docs, and AGENTS.md.

+------------------------+      +------------------------+
| .agentic               | ---> | .claude                |
| knowledge layer        |      | execution layer        |
|                        |      |                        |
| - CLAUDE.md            |      | - agents/              |
| - memory/              |      | - skills/              |
| - plans/               |      | - settings.json        |
| - expertise/           |      |                        |
| - specs/               |      |                        |
+------------------------+      +------------------------+

Core design decisions

Before writing a line of code, we laid out the key architectural bets in decisions.md:

Orchestrator-Worker over Decentralised: agents never pass control to each other. This prevents circular reasoning and cascading hallucinations. The Orchestrator is the only router.
IncidentContext as Blackboard: the shared knowledge base all agents write to and the Synthesis Agent reads from. Enables cross-domain correlation without agent-to-agent communication. "Identity policy changed last week" + "provisioning returned null" + "telemetry shows NullReferenceException" all point to the same root cause.
CLI-first auth: every skill uses the vendor's own CLI (az, gh, acli) for authentication. Skills are credential-free; the bootstrap section validates auth before any query runs.
No LLM both concludes and acts: reasoning and execution are separate. The Policy Gate is deterministic code, not a model.
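To make the last bet concrete, here is a sketch of a gate as plain code. The action shape, the 0.65 threshold, and the returned strings are hypothetical; the only parts taken from the design are that customer-facing actions require human approval and that the gate itself involves no model call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    touches_customer: bool
    confidence: float


def policy_gate(action: ProposedAction, human_approved: bool, min_confidence: float = 0.65) -> str:
    """Deterministic decision: same inputs always produce the same verdict."""
    if action.touches_customer and not human_approved:
        return "blocked: human approval required"
    if action.confidence < min_confidence:
        return "blocked: confidence below threshold"
    return "allowed"
```

The model proposes a ProposedAction; whether it executes is decided entirely by this function and the human's approval.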

Orchestrator-Worker at a glance

      Support ticket
            |
            v
     Management Agent
   /   |    |    |   \
  /    |    |    |    \
  v    v    v    v     v
 APM  CRM  SCIM  B2C  ALM
  \    |    |    |    /
   \   |    |    |   /
    v  v    v    v  v
     IncidentContext
            |
            v
     Synthesis Agent

IncidentContext at a glance

IncidentContext
  - intake
  - resolved_identity
  - evidence
  - hypotheses
  - audit_log

That's the design story. In part 2, I'll show what happened when the first live tickets hit the wiring.


Cross-boundary communication between desktop and web

Eelco Los — Mon, 23 Feb 2026 10:59:32 +0000

We have a desktop product that customers actively use, and we want to migrate toward a SaaS offering. In practice, that means we need backwards compatibility while we ship new features.

During that transition you often end up in a "hybrid" state: a desktop shell still exists, but more and more UI and logic moves into web technology (for example hosted inside WebView2).

That hybrid state introduces a core challenge: how do you preserve everyday interactions across boundaries? Drag & drop, keyboard copy/paste, focus, selection, and other "it just works" behaviors tend to break the moment parts of the UI live in different browsing contexts (iframes/windows) or even in a host process.

Demo repo (reference implementation): https://github.com/EelcoLos/iframe-dnd-demo
Live demo: https://eelcolos.github.io/iframe-dnd-demo/

A pragmatic way to do incremental delivery is to introduce new modules behind explicit boundaries:

embed legacy/new pieces via iframe
split experiences into separate windows when the host is desktop (or when multi-monitor workflows help)
use Web Components to build reusable UI pieces without betting on one framework

The next question becomes: how do those pieces communicate so that interactions (drag & drop, keyboard copy/paste) still work across boundaries, and how does that bridge to the desktop host?

This post outlines a simple architecture: message passing all the way down, but with clear layers:

DOM messaging for iframe ↔ parent (window.postMessage)
Cross-window transport when needed (BroadcastChannel with fallbacks)
WebView2 web messaging for web ↔ host (window.chrome.webview.postMessage)

+------------------------ Desktop host (WebView2) -------------------------+
| WebMessageReceived  <---  chrome.webview.postMessage(...)                |
|        ^                                                                 |
|        |  CoreWebView2.PostWebMessageAsJson/String(...)                  |
|        |                                                                 |
|  +---------------- Parent web shell (coordinator) --------------------+  |
|  | iframe <-> parent: window.postMessage                              |  |
|  | cross-window (optional): BroadcastChannel                          |  |
|  |   fallback: postMessage relay via coordinator (Firefox ETP)        |  |
|  +--------------------------------------------------------------------+  |
+--------------------------------------------------------------------------+

In the web layer, the parent shell acts as a coordinator:

routes messages between iframes
handles cross-iframe pointer interactions (coordinate conversion, hit-testing)
holds shared state (e.g., clipboard-like state for keyboard copy/paste)

Then the parent shell optionally bridges certain messages to the desktop host.
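
The coordinate conversion and hit-testing the coordinator does can be sketched with a few pure functions. This is a minimal sketch, not code from the demo repo: the function names and the two-frame layout are assumptions, and it ignores iframe borders, scrolling, and CSS transforms.

```javascript
// Rectangles use the same fields as getBoundingClientRect(): left, top, width, height.
// Convert a point from iframe-local coordinates to parent (coordinator) coordinates.
function toParentCoords(frameRect, point) {
  return { x: frameRect.left + point.x, y: frameRect.top + point.y };
}

// Convert a parent-level point back into the coordinate space of a target iframe.
function toFrameCoords(frameRect, point) {
  return { x: point.x - frameRect.left, y: point.y - frameRect.top };
}

// Hit-test: find the first frame whose rectangle contains the parent-level point.
function hitTest(frames, point) {
  return frames.find(({ rect }) =>
    point.x >= rect.left && point.x < rect.left + rect.width &&
    point.y >= rect.top && point.y < rect.top + rect.height
  ) ?? null;
}

// Hypothetical layout: two iframes side by side.
const frames = [
  { id: 'frame-a', rect: { left: 0, top: 50, width: 400, height: 300 } },
  { id: 'frame-b', rect: { left: 420, top: 50, width: 400, height: 300 } },
];

// A pointer position reported by frame-a in its own coordinates,
// already dragged past frame-a's right edge.
const parentPoint = toParentCoords(frames[0].rect, { x: 430, y: 100 });
const target = hitTest(frames, parentPoint);
const dropPoint = toFrameCoords(target.rect, parentPoint);
console.log(parentPoint, target.id, dropPoint);
```

The parent is the only context that knows where every iframe sits, which is why the conversion lives there and not in the frames themselves.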

Layer 1: iframe ↔ parent (DOM `postMessage`)

This is regular browser messaging:

set an explicit targetOrigin (avoid '*')
validate the sender origin on receive

In the demo, this routing enables:

pointer-based drag & drop across iframes
keyboard copy/paste across iframes
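
The two rules for this layer (explicit targetOrigin on send, origin validation on receive) could look roughly like this. It is a sketch under assumptions: the allowlisted origin and the `createMessageHandler` helper are hypothetical, and the handler is exercised with plain objects shaped like a MessageEvent so it also runs outside a browser.

```javascript
// Assumed allowlist; replace with the origins your shell actually embeds.
const ALLOWED_ORIGINS = new Set(['https://app.example.com']);

// Build a 'message' handler that drops anything from an unknown origin,
// or without a known string `type`, before routing it.
function createMessageHandler(routes, allowedOrigins = ALLOWED_ORIGINS) {
  return function onMessage(event) {
    if (!allowedOrigins.has(event.origin)) return false;
    const data = event.data;
    if (!data || typeof data.type !== 'string') return false;
    // Only own properties count as routes (so 'toString' is not a route).
    if (!Object.prototype.hasOwnProperty.call(routes, data.type)) return false;
    routes[data.type](data, event);
    return true;
  };
}

const received = [];
const onMessage = createMessageHandler({
  dragStart: (msg) => received.push(msg.id),
});

// In a browser you would register it with:
//   window.addEventListener('message', onMessage);
// and send with an explicit target origin instead of '*':
//   iframe.contentWindow.postMessage(msg, 'https://app.example.com');
onMessage({ origin: 'https://app.example.com', data: { type: 'dragStart', id: '1' } });
onMessage({ origin: 'https://evil.example', data: { type: 'dragStart', id: '2' } });
console.log(received); // [ '1' ]
```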

Layer 2: parent ↔ desktop host (WebView2 WebMessage)

WebView2 provides a separate channel:

JS → host: window.chrome.webview.postMessage(...)
host → JS: CoreWebView2.PostWebMessageAsJson/String(...)
JS receive: window.chrome.webview.addEventListener('message', ...)

Microsoft’s docs emphasize treating web content as untrusted and validating origins / message payloads.

Message contracts: be explicit (the “action/type” field)

The implementation choice I like most in this demo: every message carries a clear discriminator so the receiver can route behavior.

Web ↔ web (iframes/windows): `type`

In the browser-to-browser layer, the repo uses type values like dragStart, parentDrop, itemCopied, requestPaste, etc.

Example shapes from the repo’s API.md:

{ "type": "dragStart", "text": "Item 1", "id": "1", "source": "frame-a" }

{ "type": "parentDrop", "x": 123, "y": 456, "dragData": { "id": "1" } }

If you want versioning too, you can wrap that idea:

{
  "v": 1,
  "type": "itemCopied",
  "source": "frame-a",
  "payload": { "itemData": { "id": "1" } }
}
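
If you go the versioned route, the receiver has to cope with both shapes for a while. A minimal sketch of that, assuming the envelope above; `wrapMessage`, `unwrapMessage`, and the idea of normalizing legacy flat messages are my own illustration, not something the repo does:

```javascript
const SUPPORTED_VERSIONS = new Set([1]);

// Wrap a payload in the versioned envelope shown above.
function wrapMessage(type, source, payload) {
  return { v: 1, type, source, payload };
}

// Accept both the versioned envelope and the older flat shape (no `v` field),
// and normalize them to { type, source, payload }.
function unwrapMessage(message) {
  if (message === null || typeof message !== 'object' || typeof message.type !== 'string') {
    throw new TypeError('message must be an object with a string "type"');
  }
  if (message.v === undefined) {
    const { type, source, ...payload } = message; // legacy flat message
    return { type, source, payload };
  }
  if (!SUPPORTED_VERSIONS.has(message.v)) {
    throw new RangeError(`unsupported message version: ${message.v}`);
  }
  return { type: message.type, source: message.source, payload: message.payload };
}

const legacy = unwrapMessage({ type: 'dragStart', text: 'Item 1', id: '1', source: 'frame-a' });
const current = unwrapMessage(wrapMessage('itemCopied', 'frame-a', { itemData: { id: '1' } }));
console.log(legacy.payload, current.payload);
```

Rejecting unknown versions loudly is a deliberate choice here: a module that silently guesses at a newer contract is harder to debug than one that refuses it.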

Web ↔ host (WebView2): `action`

For the WebView2 host bridge, the demo uses an action field for the same purpose (route on action in WebMessageReceived).

Example shape:

{ "action": "copy", "description": "Item", "quantity": "12", "unitPrice": "450" }

Contract in action (from the demo repo)

JS side (from public/webcomponent-table-source-html5.html) posts JSON strings to the WebView2 host:

// If running in WebView2 (C# host), also send copy to native code
if (window.chrome && window.chrome.webview) {
  window.chrome.webview.postMessage(JSON.stringify({
    action: 'copy',
    description: selectedRow.dataset.desc,
    quantity: selectedRow.dataset.qty,
    unitPrice: selectedRow.dataset.price
  }));
}

The same page also posts other actions like dragstart, dragend, and (on double-click) drop.

C# side (from WebView2App/HybridModeWindow.xaml.cs) receives and routes based on action:

WebViewSource.CoreWebView2.WebMessageReceived += CoreWebView2_WebMessageReceived;

private void CoreWebView2_WebMessageReceived(object? sender, CoreWebView2WebMessageReceivedEventArgs e)
{
    string? message = null;

    try
    {
        message = e.TryGetWebMessageAsString();
    }
    catch (ArgumentException)
    {
        // Message is not a string, try getting it as JSON
        message = e.WebMessageAsJson;
    }

    // Message format examples:
    // {"action":"drop","description":"...","quantity":12,"unitPrice":450}
    // {"action":"copy","description":"...","quantity":"12","unitPrice":"450"}
    if (string.IsNullOrEmpty(message)) return;

    using var json = System.Text.Json.JsonDocument.Parse(message);
    var root = json.RootElement;

    if (!root.TryGetProperty("action", out var actionProp)) return;
    var action = actionProp.GetString();

    if (action == "drop")
    {
        // parse description/quantity/unitPrice and add to the target DataGridView
    }
    else if (action == "copy")
    {
        // store the copied data so Ctrl+V in the target window can paste it
    }
}

Why it matters:

you can be backwards compatible across modules
you can implement request/response via correlationId
you can validate shape (and reject unknown/untrusted messages)
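
To make the correlationId point concrete, here is a small sketch of request/response on top of fire-and-forget messages. The `createRequester` helper and the `pasteResponse` type are hypothetical; `send` stands for whatever transport you use (for example a function that calls postMessage on the target window).

```javascript
// Hypothetical helper: `send` is whatever transport you use.
function createRequester(send) {
  const pending = new Map(); // correlationId -> resolve callback
  let nextId = 0;

  function request(type, payload) {
    const correlationId = `req-${++nextId}`;
    return new Promise((resolve) => {
      pending.set(correlationId, resolve);
      send({ type, correlationId, payload });
    });
  }

  // Call this from the message handler; returns false for unknown or repeated ids.
  function handleResponse(message) {
    const resolve = pending.get(message.correlationId);
    if (!resolve) return false;
    pending.delete(message.correlationId);
    resolve(message.payload);
    return true;
  }

  return { request, handleResponse, pendingCount: () => pending.size };
}

const sent = [];
const requester = createRequester((msg) => sent.push(msg));

const reply = requester.request('requestPaste', { target: 'frame-b' });
reply.then((payload) => console.log('pasted:', payload));

// Simulate the other side answering with the same correlationId.
const handled = requester.handleResponse({
  type: 'pasteResponse',
  correlationId: sent[0].correlationId,
  payload: { id: '1' },
});
```

A production version would also reject the promise after a timeout, so a frame that never answers does not leave requests pending forever.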

Testing strategy

In-web behavior: Playwright is a good fit, treating iframes and windows as first-class.
- Testing iframe drag and drop with Playwright
Cross-window behavior: add tests that open child pages from a coordinator and assert keyboard interactions.
Host bridge behavior: test separately at the desktop integration layer (WebMessageReceived handlers, navigation/origin checks).

Security checklist (practical)

Validate origins for DOM postMessage: set a strict targetOrigin on send, and on receive validate event.origin (allowlist) and optionally event.source before trusting event.data.
In WebView2, always check the origin of the document that sent a message (available as the Source property of the WebMessageReceived event args) before trusting it.
Prefer JSON messages and validate schema.
Disable features you don’t need (host objects, web messaging, scripts).
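
For the "validate schema" item, a hand-rolled check is often enough for a small contract. The sketch below assumes the action names and the copy/drop fields from the examples earlier in this post; `parseHostMessage` and `toFiniteNumber` are hypothetical helpers, and the same checks could equally live in the host's WebMessageReceived handler.

```javascript
// Assumed contract, based on the example shapes shown earlier:
// an `action` from a fixed allowlist, plus three fields for copy/drop.
const ALLOWED_ACTIONS = new Set(['copy', 'drop', 'dragstart', 'dragend']);

// Accept numbers, or numeric strings such as "12"; anything else is rejected.
function toFiniteNumber(value) {
  if (typeof value === 'number') return Number.isFinite(value) ? value : null;
  if (typeof value === 'string' && value.trim() !== '') {
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : null;
  }
  return null;
}

function parseHostMessage(raw) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch {
    return { ok: false, reason: 'not valid JSON' };
  }
  if (message === null || typeof message !== 'object' || Array.isArray(message)) {
    return { ok: false, reason: 'not an object' };
  }
  if (!ALLOWED_ACTIONS.has(message.action)) {
    return { ok: false, reason: 'unknown action' };
  }
  if (message.action === 'copy' || message.action === 'drop') {
    const quantity = toFiniteNumber(message.quantity);
    const unitPrice = toFiniteNumber(message.unitPrice);
    if (typeof message.description !== 'string' || quantity === null || unitPrice === null) {
      return { ok: false, reason: 'invalid fields' };
    }
    // Normalize: the examples show numbers arriving both as strings and as numbers.
    return { ok: true, value: { action: message.action, description: message.description, quantity, unitPrice } };
  }
  return { ok: true, value: { action: message.action } };
}

const good = parseHostMessage('{"action":"copy","description":"Item","quantity":"12","unitPrice":"450"}');
const bad = parseHostMessage('{"action":"format-disk"}');
console.log(good, bad);
```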

Sources

Web/native interop: https://learn.microsoft.com/en-us/microsoft-edge/webview2/how-to/communicate-btwn-web-native
WebView2 security: https://learn.microsoft.com/en-us/microsoft-edge/webview2/concepts/security
Frames in WebView2: https://learn.microsoft.com/en-us/microsoft-edge/webview2/concepts/frames
Demo repo: https://github.com/EelcoLos/iframe-dnd-demo

.NET File‑Based Apps for API Prototyping: What Bit Me on First Run

Eelco Los — Thu, 29 Jan 2026 12:03:21 +0000

Using .NET file-based apps (via dotnet run app.cs) enables rapid prototyping and simplified project structure by eliminating project scaffolding.
This feature was announced for .NET 10, ahead of .NET Conf 2025: https://devblogs.microsoft.com/dotnet/announcing-dotnet-run-app/

It’s a neat way to try ideas quickly.
I wanted to find out whether this works with FastEndpoints, so I could demo an API that way.

Example (FastEndpoints + file-based app)

#:sdk Microsoft.NET.Sdk.Web
#:package FastEndpoints@7.*-*

using FastEndpoints;

var builder = WebApplication.CreateBuilder();

builder.Services.AddFastEndpoints();

var app = builder.Build();
app.UseFastEndpoints();
app.Run();

public record MyRequest(string FirstName, string LastName, int Age);
public record MyResponse(string FullName, bool IsOver18);

public class MyEndpoint : Endpoint<MyRequest, MyResponse>
{
    public override void Configure()
    {
        Post("/api/user/create");
        AllowAnonymous();
    }

    public override Task HandleAsync(MyRequest req, CancellationToken ct) =>
        Send.OkAsync(new($"{req.FirstName} {req.LastName}", req.Age >= 18),
           cancellation: ct);
}

Gotcha: JSON/AOT defaults can break “works with web”

When running with the web SDK in file-based apps, you can hit a runtime startup error like:

System.NotSupportedException: JsonTypeInfo metadata for type
'System.Collections.Generic.IEnumerable`1[System.String]' was not provided by TypeInfoResolver of type
'[AppJsonSerializerContext]'.

This is effectively a System.Text.Json source-generation/AOT mismatch: the app is configured to require generated metadata, but the required root types were not generated.
The dev blog does not point out this gotcha, but one of the first bullet points in the Learn article does hint at it:

Key benefits include:

Reduced boilerplate for simple applications.
Self-contained source files with embedded configuration.
Native AOT publishing enabled by default.
Automatic packaging as .NET tools.

Why this happens (NativeAOT + JSON)

With NativeAOT / trimming, reflection is restricted, so System.Text.Json can't reliably do reflection-based (de)serialization.
In practice, you need to provide compile-time metadata via source generation (JsonSerializerContext / JsonSerializable), i.e. custom serialization setup for the types your endpoints bind/return.

Because this is a lot of "serialization bookkeeping" for a single-file prototype, I currently prefer disabling NativeAOT for this scenario.

Fix (per gist): disable AOT

To opt out of the default NativeAOT behavior (and avoid the JSON metadata requirement while prototyping), add this:

#:property PublishAot=false

This runs the app as regular JIT again, so System.Text.Json can use reflection and the error goes away.

You can find the entire gist at:
https://gist.github.com/EelcoLos/1c5f3c6be9ac765719ee880f3dcfec71

Fantastic Knowledge & How to Retain It

Eelco Los — Tue, 09 Dec 2025 15:29:44 +0000

In the spirit of giving, consider this a guide to retaining knowledge using a personal knowledge base system. The goal is simple: turn 'aha' moments into captured insights you can reliably reuse. Then, turn those captured insights into reliable recall using everyday practices.

From Thoughts to Notes

When I talk about retaining knowledge, I think back to my days of note-taking in school.

Back then, before I had note-taking apps and wikis, I realized that remembering starts with externalizing. I used to fill the margins of books with notes, underlines, symbols, and even loose inlays. Small, personal anchors to help me retain what mattered. In this guide, we'll look at how to recreate that same kind of durable knowledge retention with modern tools and habits.

From that, I learned a few techniques that still work today.

Techniques That Work

Capture the 'aha' immediately: write a note (for example a marginal note or a sticky inlay) while you interact with the information, especially for the things that make you understand what is being written: the 'aha'.
Title + one-sentence thesis: force clarity early so future-you retrieves faster.
Context first: add why it matters.

Example: the note I created for Copilot Spaces

As an example, this is the note I wrote when I first learned about Copilot Spaces (https://github.com/copilot/spaces):

Within GitHub, Copilot Spaces can serve as focused, persistent workspaces for specific knowledge domains (e.g. "knowledge retainment", "copilot agent"). Each Space aggregates code examples, ADRs, runbooks, exploratory prototypes, and curated prompts, reducing siloed tribal knowledge and enabling asynchronous onboarding.
it does help for [[zettelkasten]] type of knowledge, as it scopes to the topic itself

It captures the 'aha': Spaces are a shareable, topic-scoped option within GitHub for retaining knowledge, especially because they aggregate code examples, ADRs, and so on.
The 'why it matters' is addressed as well: it helps Zettelkasten-style knowledge by scoping it to a topic.
As the title, I chose 'copilot-spaces-can-be-topic-specific-knowledge-base'.

These techniques leave you with a pile of notes, but on their own those notes are a ball of mud. So the next step is to map these thoughts.

Why Mind Mapping Still Matters

Mind mapping is a brainstorming technique that organizes information hierarchically around a central topic.

Brainstorming on a topic can happen separately from the moment you take in the information, such as a conference session, although you can also brainstorm while you are attending the session.
In that moment, you can bind the information from the session to an idea.
To give this information a general structure, we can use a system called 'Zettelkasten'.

Zettelkasten: Core Workflow Anatomy

The Zettelkasten method (German for 'slip box' or 'note box') is a system for personal knowledge management and note-taking, developed by the German sociologist Niklas Luhmann. Its purpose is to capture and connect ideas in a dynamic network that fosters creative thinking and the generation of new insights, rather than simply collecting information.

The 'Zettel' (slip) is a note in a 'Kasten' (box) used to organize (parts of) a topic. As you build up more and more notes, an organized box can look like this:

A Zettelkasten note can hold a lot of information. What it should primarily do is help you structure the note you created earlier, so that it can be found and finds its way to other information you captured before.

The body is still there and is still the most important part. It is just supplemented with metadata in the header, such as the unique identifying title and tags for the topics it is close to.
A footer can be used as well; it mainly records the source of the knowledge.

Creating a web of information using links

The title from the earlier example comes back here as the link target. As shown in the example above, you can use double brackets to reference another note. These connections are, in a way, your 'oh, this thing refers to that one' moments. For me, a title like 'copilot-spaces-can-be-topic-specific-knowledge-base' makes such notes easier to search for, because it can start or complete (part of) a sentence.
Once you have connected a decent number of these cards, you can probably also say something about the higher-level topic. For the title above, that topic could be this very blog post: writing about how Zettelkasten works for me. Such a note would not be an entire blog post, just some keywords describing what things are about and how they connect at a larger scale.
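
If your notes are plain Markdown, the double-bracket links are easy to turn into a small web of backlinks. A minimal sketch, assuming notes are kept as title/body pairs; the two-note vault and the function names are made up for illustration:

```javascript
// Extract [[double-bracket]] links from a note body.
function extractLinks(body) {
  return [...body.matchAll(/\[\[([^\[\]]+)\]\]/g)].map((match) => match[1].trim());
}

// Build a reverse index: for each title, which notes link to it?
function buildBacklinks(notes) {
  const backlinks = new Map();
  for (const [title, body] of Object.entries(notes)) {
    for (const target of extractLinks(body)) {
      if (!backlinks.has(target)) backlinks.set(target, []);
      backlinks.get(target).push(title);
    }
  }
  return backlinks;
}

// Hypothetical two-note vault keyed by title.
const notes = {
  'copilot-spaces-can-be-topic-specific-knowledge-base':
    'Spaces scope knowledge to a topic, which helps [[zettelkasten]] style notes.',
  'zettelkasten': 'A slip box of atomic, linked notes.',
};

const backlinks = buildBacklinks(notes);
console.log(backlinks.get('zettelkasten'));
// [ 'copilot-spaces-can-be-topic-specific-knowledge-base' ]
```

Tools like Obsidian maintain this index for you; the point is only that the link syntax is simple enough to process yourself.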

When would you use it then?

I have also been asked: that's nice and all, but storing a lot without ever using it is just hoarding.
All my blog posts so far are mostly made of those 'aha' moments in one way or another. Beyond that, think of the observability log queries you need from time to time: those are 'aha' or 'don't google this over and over again' notes too. Lately I have been saving Application Insights KQL queries. So recalling occasionally used information is a good use case.

Should you carry a notebook with you at all times now that you have learned about this? Maybe so, maybe not. It doesn't have to be your main store of notes, though. We have tools to help store them digitally.

Storing Notes Digitally (Obsidian, Notion)

For Zettelkasten-style notes, Obsidian and Notion both work well: Obsidian excels with local Markdown and fast link creation; Notion adds flexible databases and views. My setup leans toward Markdown for speed, with occasional Obsidian views for structure notes that index neighborhoods of related ideas. Whatever you choose, keep notes atomic, use clear titles, and maintain lightweight structure notes rather than sprawling folders.

In my experience, turning a few keywords or a single sentence into a knowledge base note that already has its links and metadata is really hard. Too hard to maintain by hand, I would argue.

Using AI to help your knowledge base

Then I noticed a YouTube video by NetworkChuck titled 'You've Been Using AI the Hard Way'. In it, he describes the use of AI as an accelerator. The CLI tools like Gemini CLI, Copilot and Claude might focus on code by default, but it is never stated that they should solely be used for it.
In fact, if you pass a small instruction file to those CLI tools, you can explain that you'd like the output in a Zettelkasten form. This way, you can leverage the 'aha' moment even more: state the topic and 'aha' to the LLM and it'll create the note for you.

AI can also help as a research tool. For this, you could also check out NotebookLM from Google. Technically, you should be able to get the same result via a CLI. However, I tried to put together a training on musical world building from a dozen YouTube, PDF, and website sources, and there NotebookLM does provide value over a CLI, in my humble opinion. That could be because it uses the entire transcripts of the YouTube videos, whereas the CLI only uses your Markdown notes.

References

Niklas Luhmann - Slip-box Method (original practice descriptions and archival analyses); see also https://zettelkasten.de/introduction/ for more on this topic.
NetworkChuck - "You've Been Using AI the Hard Way (Use This Instead)" (video): https://www.youtube.com/watch?v=MsQACpcuTkU
Copilot Spaces - Topic-scoped knowledge workspaces: https://docs.github.com/en/copilot/concepts/context/spaces
NotebookLM - Using external sources (e.g., videos) to augment a personal knowledge base: https://notebooklm.google/

Personal Experience Highlights

So, all in all, my highlights on retaining knowledge are:

Topic-scoped workspaces (e.g., Copilot Spaces) function as focused knowledge bases; scoping artifacts and prompts per topic improves retrieval and supports a Zettelkasten-like flow. In Obsidian vaults these can be folders.
CLI-driven AI (e.g. Gemini CLI) is faster than browser UIs and works directly with local Markdown notes, keeping ideation and distillation tightly integrated.
Experimenting with NotebookLM and AI CLI to incorporate external sources (like videos) into the graph, followed by manual distillation into atomic notes and vetted links.
Treat the system as an active thinking partner: keep notes atomic, rewrite often, and periodically refresh structure notes that organize neighborhoods.
Tooling matters: VS Code with lightweight mind-mapping add-ons accelerates the map-to-note funnel without turning maps into storage.
Converting quotes into single-idea notes and linking back to an overview improves reuse and prevents annotation bloat.

Closing

A healthy Zettelkasten is less a knowledge base 'warehouse' and more a conversational partner. Mind maps and AI copilots amplify its adaptability when used as catalysts, not crutches. The goal is compound insight: small, precise notes that keep paying dividends across projects and time.

Lessons learned implementing SCIM with Microsoft Entra and the SCIM Validator

Eelco Los — Fri, 28 Nov 2025 15:00:54 +0000

I had to redo the entire SCIM validator journey after we migrated to a new company. This article shares the practical lessons from that rework: tightening concurrency, adding hybrid caching, clarifying when /Schemas actually matters, and structuring validation runs so progress is repeatable instead of guesswork. SCIM still promises automated provisioning from an IdP, but the path to a production-grade, Entra-compatible implementation is about disciplined iteration, not just “passing the tests.”

Why SCIM

Once integrated, lifecycle events (create, update, deactivate) flow from your IdP without manual admin work, improving security posture and reducing bespoke connector logic. Interoperability hinges on spec fidelity: consistent status codes, predictable resource representation, correct filtering, and pagination behavior.

Train of thought

The Microsoft SCIM Validator applies stricter rules for connectors intended for publication in the Microsoft Entra app store. If targeting store publication, aim for full compliance across correctness, concurrency, and performance; if not, be mostly compliant with RFC 7643/7644 and Entra guidance while prioritizing reliability and pragmatic trade-offs.

Validation strategy

Start with the synchronous (“preview”) validator suite to establish deterministic behavior. This path exposed spec gaps cleanly (status codes, PATCH semantics, schema responses) without concurrency noise.
Tackle the parallel/asynchronous suite only after idempotency and locking are solid. Rapid sequences (like PATCH immediately after CREATE) surfaced race conditions when write visibility lagged behind reads.
Use slight pacing as a diagnostic tool (about 300ms between requests) to reduce false negatives from artificial races; remove pacing once your server proves truly concurrent-safe.

Caching

Essential for Entra-compatible performance. The validator repeatedly fetches stable resources (e.g., /Schemas, frequent user/group reads), so hitting your datastore for every request becomes a bottleneck and tempts reimplementation of local validator logic.

Internal/dev: a simple in-memory cache (e.g., IMemoryCache) suffices for single-instance development.
Production/multi-instance: a hybrid model with in-memory plus a distributed cache (e.g., Redis) improves throughput and keeps repeated fetches fast across instances.
Cache stable data (schemas, group memberships that change infrequently) aggressively; respect cache invalidation for writes and reflect updated etags/versioning correctly.
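
The hybrid model can be sketched as a two-tier read-through cache. This is an illustration under assumptions, not my production code: `local` stands in for an in-process cache such as IMemoryCache and `shared` for a distributed cache such as Redis, and both are plain Maps here so the sketch stays synchronous (a real distributed cache would be async).

```javascript
// Two-tier read-through cache with explicit invalidation on writes.
function createHybridCache({ local = new Map(), shared = new Map(), ttlMs = 60_000, now = Date.now } = {}) {
  function fresh(entry) {
    return entry !== undefined && entry.expiresAt > now();
  }

  function getOrLoad(key, load) {
    const l1 = local.get(key);
    if (fresh(l1)) return l1.value;

    const l2 = shared.get(key);
    if (fresh(l2)) {
      local.set(key, l2); // promote to the local tier
      return l2.value;
    }

    // Miss in both tiers: hit the datastore once and fill both.
    const entry = { value: load(), expiresAt: now() + ttlMs };
    shared.set(key, entry);
    local.set(key, entry);
    return entry.value;
  }

  // Call on every successful write so readers do not see stale resources.
  function invalidate(key) {
    local.delete(key);
    shared.delete(key);
  }

  return { getOrLoad, invalidate };
}

let loads = 0;
const cache = createHybridCache();
const loadSchemas = () => { loads += 1; return ['User', 'Group']; };

cache.getOrLoad('/Schemas', loadSchemas);
cache.getOrLoad('/Schemas', loadSchemas); // served from cache
cache.invalidate('/Schemas');
cache.getOrLoad('/Schemas', loadSchemas); // reloaded after invalidation
console.log(loads); // 2
```

Note that `invalidate` only clears the local tier of the instance that handled the write. Other instances keep their local copy until the TTL expires, so keep the local TTL short for anything that changes.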

Concurrency and idempotency

Make PUT/PATCH idempotent — retries from the validator should not corrupt state or produce inconsistent responses.
Adopt optimistic concurrency (versioning or etags) and enforce If-Match semantics where applicable, so conflicting updates are clear and recoverable.
Normalize case for attributes (userName, emails) to avoid equality-test failures, and ensure multi-valued attributes return deterministic ordering or include explicit primary markers.
Throttle intentionally with 429 Too Many Requests and include Retry-After; avoid generic 500s under load which obscure recoverable capacity issues.
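
The If-Match and throttling rules boil down to a decision you can make before touching the datastore. A minimal sketch, assuming integer versions and weak ETags in the style of the RFC 7644 examples; `evaluateUpdate` and its parameters are hypothetical:

```javascript
// Versions are modeled as integers and ETags as weak validators (W/"<version>");
// both choices are assumptions for this sketch.
function etagFor(version) {
  return `W/"${version}"`;
}

// Decide the outcome of a PUT/PATCH before touching the datastore.
function evaluateUpdate({ ifMatch, currentVersion, overCapacity = false, retryAfterSeconds = 5 }) {
  if (overCapacity) {
    // Recoverable capacity problem: tell the client when to retry instead of returning 500.
    return { status: 429, headers: { 'Retry-After': String(retryAfterSeconds) } };
  }
  if (ifMatch !== undefined && ifMatch !== '*' && ifMatch !== etagFor(currentVersion)) {
    // The client tried to update a stale representation.
    return { status: 412, headers: {} };
  }
  // Accepted: the response carries the ETag of the new version.
  return { status: 200, headers: { ETag: etagFor(currentVersion + 1) } };
}

console.log(evaluateUpdate({ ifMatch: 'W/"3"', currentVersion: 3 }).status); // 200
console.log(evaluateUpdate({ ifMatch: 'W/"2"', currentVersion: 3 }).status); // 412
console.log(evaluateUpdate({ currentVersion: 3, overCapacity: true }).headers); // { 'Retry-After': '5' }
```

In a real server the version check and the write have to happen atomically (for example a conditional update in the datastore); otherwise two requests can both pass the check.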

Spec coverage priorities

Implement /Schemas early if targeting Microsoft Entra store publication (the validator uses it for capability discovery). For non-store implementations, /Schemas is not required; focus on RFC 7643/7644 compliance and core provisioning endpoints.
Return precise status codes: 200/201/204 for success paths, 400 for bad requests, 404 for unknown resources, 409 for conflicts, 412 for precondition failures, and 429 for throttling.
Gradually flesh out filtering and pagination. Support the basics first (startIndex, count, filter eq/and), then expand to more complex filters once stability is proven.
Ensure consistent resource representations and stable IDs; avoid transient fields that vary run-to-run.
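
For pagination, the details that tend to trip up the validator are the 1-based startIndex and the ListResponse envelope. A sketch of the basics, assuming the filtering already happened; `paginate` and the `MAX_PAGE_SIZE` cap are hypothetical:

```javascript
const LIST_RESPONSE_SCHEMA = 'urn:ietf:params:scim:api:messages:2.0:ListResponse';

// `resources` is the already-filtered result set; MAX_PAGE_SIZE is an assumed server cap.
const MAX_PAGE_SIZE = 100;

function paginate(resources, { startIndex = 1, count = MAX_PAGE_SIZE } = {}) {
  // RFC 7644: startIndex is 1-based and values below 1 are treated as 1;
  // a negative count is treated as 0.
  const start = Math.max(1, Math.trunc(Number(startIndex)) || 1);
  const parsedCount = Math.trunc(Number(count));
  const size = Math.min(MAX_PAGE_SIZE, Math.max(0, Number.isNaN(parsedCount) ? MAX_PAGE_SIZE : parsedCount));
  const page = resources.slice(start - 1, start - 1 + size);
  return {
    schemas: [LIST_RESPONSE_SCHEMA],
    totalResults: resources.length, // total matches, not the page size
    startIndex: start,
    itemsPerPage: page.length,
    Resources: page,
  };
}

const users = ['a', 'b', 'c', 'd', 'e'].map((id) => ({ id, userName: `user-${id}` }));
const page = paginate(users, { startIndex: 4, count: 3 });
console.log(page.itemsPerPage, page.Resources.map((u) => u.id)); // 2 [ 'd', 'e' ]
```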

Local exposure & tunneling

High-fidelity SCIM validation requires the external validator to reach your local endpoint reliably. Lightweight tunnels (e.g., ngrok) are fast to start but their free/low tiers impose connection/session limits that parallel SCIM runs can exhaust. Azure Dev Tunnels (devtunnel) offer longer-lived, higher-volume sessions with configurable access, reducing flakiness during heavy validator cycles.

Example commands:

# Expose the local app running on http://localhost:5010 (HTTP)
ngrok http 5010

# Azure Dev Tunnel for HTTPS endpoint on port 5011 (longer-lived, higher volume)
devtunnel host -p 5011 -a --protocol https

Notes:

Use a consistent port strategy (5010 HTTP dev, 5011 HTTPS) to simplify validator configuration.
Regenerate tunnels per session; do not reuse stale URLs in cached validator runs.

Tunneling checklist:

Choose ngrok for quick, short sessions; switch to devtunnel for sustained or parallel test runs.
Prefer HTTPS tunnel endpoints to match production security and avoid mixed content.
Capture and inject the tunnel URL into validator configuration (SCIM base), then tear down afterward.
Monitor for throttling/timeouts; change provider if limits introduce test noise.
Avoid exposing non-SCIM debug endpoints; scope the tunnel to required routes only.

Operational tips

Validate your SCIM base URL and TLS setup; the Microsoft SCIM Validator needs a reachable, secure endpoint with correct schema and auth.
Log correlation/request IDs to tie parallel validation threads back to server decisions, especially when diagnosing races or caching effects.
Document supported attributes and mapping expectations for IdP admins; clarity reduces misconfiguration and brittle assumptions.

Recommended path to success

Decide early whether your connector targets Microsoft Entra store publication; choose the validator implementation accordingly (store-level: stricter, pass all; general: mostly compliant, pass most).
Pass the synchronous validator suite to establish a clean baseline and identify spec gaps without concurrency noise.
Add hybrid caching and optimistic concurrency to harden performance and correctness (idempotent PUT/PATCH, etags/If-Match, 429 with Retry-After).
Run the parallel suite with slight pacing (~300ms) to isolate true concurrency defects, then remove pacing once stable.
Re-run under realistic concurrent load and verify determinism in responses, versions, and resource representations.

Closing thoughts

SCIM success is less about “just pass the tests” and more about disciplined concurrency, idempotency, and caching that make those tests repeatably pass at scale. Treat the synchronous suite as the baseline for correctness, and the parallel suite as the proving ground for production readiness.

References

SCIM Validator: https://scimvalidator.microsoft.com/
Microsoft Entra SCIM guide: https://learn.microsoft.com/en-us/entra/identity/app-provisioning/use-scim-to-provision-users-and-groups
SCIM RFC 7643 (Schema): https://www.rfc-editor.org/rfc/rfc7643
SCIM RFC 7644 (Protocol): https://www.rfc-editor.org/rfc/rfc7644

Where coding agents excel (and where they don't)

Eelco Los — Thu, 06 Nov 2025 07:04:52 +0000

This is a first-person, hands-on writeup of how I experienced using Copilot coding agents today. I include examples, a gotcha I hit while using it, and a short checklist so you can try it in your repos.

Why I started using Copilot coding agents

I started experimenting with Copilot coding agents because I wanted to scaffold things that IDE-based agents do not handle, and to see what an assistant could do for me (refactoring, scaffolding, creating test harnesses, running quick migrations) when it actually executes code in a prepared environment instead of just suggesting edits. Over the last few months I tested agents on various projects and refined a small set of rules that helped me explore what can currently be delegated to an agent.

Beyond IDE Agents: What Copilot Coding Agents Can Do

Unlike traditional IDE-based agents, Copilot coding agents can perform a broader range of actions, including:

Branch Creation: Automatically create new branches for development tasks.
Environment Setup and WIP PRs: Set up development environments and initiate Work-In-Progress (WIP) Pull Requests on GitHub.
Task/Issue Deduction: Understand and deduce tasks from issues or descriptions. Those tasks will be put in the WIP PR description.
Building and Running Tasks: Execute build processes and run various development tasks defined in earlier steps.
WIP PR to Draft PR: Transition a WIP Pull Request to a draft Pull Request with a pre-filled body.

Where coding agents excel (and where they don't)

Based on my experience, here's where coding agents currently shine and where they struggle:

✅ Good at:

Little tasks: Small, well-defined tasks are handled well.
Dependency updates: Updating dependencies is a task that agents can perform reliably.

⚠️ Okay at:

Tasks with a lot of moving parts: For example, major refactors can be challenging for agents to handle on their own.

❌ Terrible at:

Major UI things: User interface work is not a strong suit for coding agents at the moment.

How agents run in practice: the `copilot-setup-steps.yml` convention

The typical flow I use now is:

Create .github/workflows/copilot-setup-steps.yml on the repository's default branch.
In that workflow, prepare everything the agent needs: runtimes, credentials (via environment secrets), databases, caches.
Let the agent run against that prepared environment: the agent then performs code changes, runs tests, and can open PRs as usual.

Treat copilot-setup-steps.yml as a minimal pre-flight script: it should be fast, idempotent, and only prepare what the agent actually needs.

Minimal example I use as a starting point

name: GitHub Copilot Setup

on: workflow_dispatch

jobs:
  copilot-setup-steps:
    name: Setup Environment for GitHub Copilot
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    environment: copilot
    steps:
      - name: Setup dotnet
        uses: actions/setup-dotnet@v5
        with:
          dotnet-version: 9.0.x
      - name: checkout
        uses: actions/checkout@v5
      - name: Configure NuGet sources
        run: |
          if [[ -n "${{ secrets.PACKAGES_PAT }}" ]]; then
            dotnet nuget update source "github" --username this-is-irrelevant --password ${{ secrets.PACKAGES_PAT }} --store-password-in-clear-text
          else
            echo "Could not find Packages PAT"
            exit 1
          fi
      - name: Azure login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

Notes from my experience:

The job must be named copilot-setup-steps and the file copilot-setup-steps.yml, exactly; anything else is ignored.
Keep this workflow quick — long boot times mean more waiting (and, on paid plans, more compute cost).
Remember to grant the permissions that checkout and your login need: contents: read and id-token: write, respectively.

Secrets, environments, and the things that bit me

I use a repository Environment named copilot and put environment variables and secrets there instead of raw repository secrets. In practice, put the credentials the agent legitimately needs into the copilot environment in the repo settings, and limit who can modify that environment. You might need to put organization credentials here as well. This requirement seems to change from time to time: earlier it wasn't necessary, now it is.

A gotcha I ran into and how I handled it:

Prerequisites not always picked up: Even when I put prerequisite steps in a GitHub issue, the agent wouldn't always pick them up. I found I needed to refer to the prerequisites in between every step. Luckily, with the new steering capabilities, I could guide the agent mid-session to ensure it followed the necessary steps.

A practical checklist (so you can try this quickly)

Add .github/workflows/copilot-setup-steps.yml to your default branch.
Add a repo environment named copilot and add environment variables / environment secrets used by the setup workflow.
Keep the setup job fast: install only what is necessary and cache dependencies.
Consider self-hosted runners if you need internal network access or want to keep secrets entirely on-prem.
Add a short AGENTS.md describing rules (what an agent may change, code style, testing expectations).

Defining Agent Instructions: The `AGENTS.md` Approach

To ensure your Copilot coding agent operates effectively and adheres to project standards, it's crucial to provide clear and explicit instructions. A common practice is to create an AGENTS.md file (or copilot-instructions.md as I initially used) in your repository to house these guidelines. These instructions act as guardrails, directing the agent's behavior and ensuring consistency.

Here are some examples of instructions I've used to guide my Copilot agent (rename 'Test' and 'Production' to your own purposes):

*   **Azure Best Practices:** When working with Azure, always invoke the `azure_development-get_best_practices` tool if available to ensure adherence to best practices.
*   **E2E Test Rules:** Ensure End-to-End (E2E) tests do not contain `.only` in `.describe` or `.it` blocks. Remove any instances found.
*   **Coding Standards:** Apply specific coding standards for languages like C#, Bicep, Dockerfile, and TypeScript by following guidelines in their respective instruction files (e.g., `./.github/instructions/csharp.instructions.md`).
*   **Azure Environment Rules:** Adhere to strict rules for Azure deployments:
    *   Use 'Test' for Test deployments and 'Production' for Production deployments.
    *   Never deploy to test and production simultaneously.
    *   Production deployments require PIM (Privileged Identity Management) role activation.
    *   Prefer updating Bicep templates over direct Azure CLI modifications for infrastructure changes.
*   **ADR Adherence:** Before implementing changes, review accepted Architectural Decision Records (ADRs) in the `docs/design-decisions` directory, ensuring their status is "Accepted" and considering their consequences.
*   **PR Creation:** When creating a Pull Request, include a reference to an Azure Boards work item using the format `[AB#12345]` above any headings in the PR description.

Pitfalls & advice from real runs

Explore the capabilities of coding agents, with boundaries: know what they're good at (the small things first). Then, as you build up your repo with instructions, gradually increase what you hand off to the coding agent.
Write clear tasks: the agent will be much more useful if you give it a ticket with clear acceptance criteria and a test it must pass. I started using a PR template specifically for agent-generated PRs so reviewers know what to focus on.
Review everything: agents will make suggestions that look plausible but can introduce subtle mistakes. Always review and run the test suite locally.

Closing thoughts

I like how the combination of mission control (Agent HQ) + explicit pre-flight workflows made agents useful rather than noisy. The plan mode and steering affordances gave me the confidence to run medium-length tasks without babysitting everything.

At the same time, good repository hygiene is now essential: lock permissions on the copilot environment, and maintain a short AGENTS.md documenting what agents should and should not do. If you need absolute control over secrets or internal network access, consider self-hosted runners.

Please try it 3 times before judging. Share your thoughts in the comments below.

Why Your ASP.NET Core LogLevel 'Warning' Still Sends Information Logs to Application Insights

Eelco Los — Mon, 29 Sep 2025 05:32:39 +0000

I first noticed this pattern while helping a teammate trim noisy telemetry costs. We had set the global logging level in an ASP.NET Core app to Warning. We redeployed. Yet the Azure Portal continued to show a steady stream of Information traces arriving from the same service. It felt like the platform was ignoring us. It wasn’t. We were ignoring a subtle layering rule.

The puzzle

Configuration (simplified):

"Logging": {
  "LogLevel": {
    "Default": "Warning"
  },
  "ApplicationInsights": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}

Everyone expected Warning to be the minimum log level, yet Application Insights kept receiving Information entries. The instinct is to blame the SDK. The real cause is that provider-specific configuration lowers the threshold just for that provider. The Application Insights logger never re-applies a minimum. It faithfully forwards what the Microsoft.Extensions.Logging infrastructure lets through. That infrastructure had already been told: for the provider whose alias is ApplicationInsights (abbreviated to AI below), allow Information.

How the layers actually line up

Think of the journey of a log:

Your code calls logger.LogInformation("User logged in").
The generic logging infrastructure consults all configured filters. A global default says Warning. A provider override for ApplicationInsights says Information. Because the target is that provider, Information is allowed.
The Application Insights logger receives the entry. It calls its own IsEnabled. That only checks two things: the level is not None and telemetry has not been globally disabled.
The logger maps LogLevel.Information to SeverityLevel.Information and sends a TraceTelemetry.
Optional sampling or processors may still drop it downstream, but cost has already been incurred in your process and often in ingestion.

No secret minimum. No hidden widening. Just configuration precedence.
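To see the precedence rule in isolation, here is a minimal console sketch. It does not use Application Insights at all: RecordingProvider and its Demo alias are hypothetical stand-ins I made up for illustration, and the sketch assumes the Microsoft.Extensions.Logging, Microsoft.Extensions.Logging.Configuration and Microsoft.Extensions.Configuration packages are referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;

// Same shape as the puzzle: a global Warning default plus a provider override.
var config = new ConfigurationBuilder()
    .AddInMemoryCollection(new Dictionary<string, string?>
    {
        ["Logging:LogLevel:Default"] = "Warning",
        ["Logging:Demo:LogLevel:Default"] = "Information", // provider-specific rule
    })
    .Build();

var provider = new RecordingProvider();
using var factory = LoggerFactory.Create(b => b
    .AddConfiguration(config.GetSection("Logging"))
    .AddProvider(provider));

var logger = factory.CreateLogger("MyApp.Login");
logger.LogDebug("Below both thresholds");
logger.LogInformation("User logged in");
logger.LogWarning("Disk almost full");

// Print whatever the filtering infrastructure let through to the provider.
Console.WriteLine(string.Join(Environment.NewLine, provider.Seen));

// Hypothetical provider; the alias is what configuration sections match on.
[ProviderAlias("Demo")]
sealed class RecordingProvider : ILoggerProvider
{
    public List<string> Seen { get; } = new();

    public ILogger CreateLogger(string categoryName) => new RecordingLogger(Seen);

    public void Dispose() { }

    private sealed class RecordingLogger : ILogger
    {
        private readonly List<string> _seen;
        public RecordingLogger(List<string> seen) => _seen = seen;

        public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

        // Like the AI logger, this applies no minimum of its own.
        public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;

        public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
            Exception? exception, Func<TState, Exception?, string> formatter)
            => _seen.Add($"{logLevel}: {formatter(state, exception)}");
    }
}
```

With this configuration the Information entry should reach the provider even though the global default is Warning, while the Debug entry is filtered out before the provider ever sees it.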

The confirmation in code

IsEnabled inside the AI logger:

public bool IsEnabled(LogLevel logLevel)
{
    return logLevel != LogLevel.None && this.telemetryClient.IsEnabled();
}

Level mapping (translation only):

private static SeverityLevel GetSeverityLevel(LogLevel logLevel)
{
    switch (logLevel)
    {
        case LogLevel.Critical: return SeverityLevel.Critical;
        case LogLevel.Error: return SeverityLevel.Error;
        case LogLevel.Warning: return SeverityLevel.Warning;
        case LogLevel.Information: return SeverityLevel.Information;
        case LogLevel.Debug:
        case LogLevel.Trace:
        default: return SeverityLevel.Verbose;
    }
}

TelemetryClient.IsEnabled():

public bool IsEnabled()
{
    return !this.configuration.DisableTelemetry;
}

So if telemetry is not disabled, everything that passed filtering is shipped.

Where teams stumble

A provider override left behind by a template or copy paste. A category override intended for console logs but applied broadly. Or sampling giving the illusion that filtering is working because only some Information entries survive. All of these hide the fact that the AI provider was fed the lower level in the first place.

Making your minimum level actually stick

Remove the provider override if you do not want it:

"Logging": {
  "LogLevel": {
    "Default": "Warning"
  }
}

Or explicitly align it so future contributors see intent:

"Logging": {
  "LogLevel": {
    "Default": "Warning"
  },
  "ApplicationInsights": {
    "LogLevel": {
      "Default": "Warning"
    }
  }
}

If you prefer a programmatic assertion:

builder.Logging.AddFilter<
    Microsoft.Extensions.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider>(
    category: string.Empty,
    level: LogLevel.Warning);

Fine grained category tuning still works:

builder.Logging
    .AddFilter("Microsoft", LogLevel.Warning)
    .AddFilter("MyApp.NoisyComponent", LogLevel.Error)
    .AddFilter<Microsoft.Extensions.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider>(
        "MyApp.Important", LogLevel.Information);

If you must drop lower severity traces after they enter the pipeline (for example, during a temporary diagnostic burst), a telemetry processor can discard them, but remember you are paying the cost of generating them:

public class MinimumSeverityProcessor : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;
    public MinimumSeverityProcessor(ITelemetryProcessor next) => _next = next;

    public void Process(ITelemetry item)
    {
        if (item is TraceTelemetry tt &&
            tt.SeverityLevel.HasValue &&
            tt.SeverityLevel < SeverityLevel.Warning)
        {
            return;
        }
        _next.Process(item);
    }
}

Registering a small factory (pattern varies by version) wires it in. Use this sparingly.
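As a rough sketch of that wiring, assuming the classic Microsoft.ApplicationInsights.AspNetCore 2.x package and the MinimumSeverityProcessor class shown above, registration can look like this (the pattern may differ in other SDK versions):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddApplicationInsightsTelemetry();

// Adds MinimumSeverityProcessor (the class from this post) to the telemetry
// processor chain; the SDK supplies the "next" processor to its constructor.
builder.Services.AddApplicationInsightsTelemetryProcessor<MinimumSeverityProcessor>();

var app = builder.Build();
app.Run();
```

Because the processor runs after the log entry has already been turned into telemetry, this only trims what is sent, not what is generated.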

How to verify instead of assuming

Add console logging side by side. Emit one log at every level:

logger.LogTrace("T");
logger.LogDebug("D");
logger.LogInformation("I");
logger.LogWarning("W");
logger.LogError("E");
logger.LogCritical("C");

In the Azure Logs query editor:

traces
| where message in ("T","D","I","W","E","C")
| project timestamp, message, severityLevel
| order by timestamp desc

If you see Information rows and you did not intend to, search your configuration for a provider or category override before blaming sampling or the SDK.

A minimal repro and its repair

Broken:

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddApplicationInsightsTelemetry();

builder.Logging.ClearProviders();
builder.Logging.AddApplicationInsights();
builder.Logging.AddConsole();

builder.Configuration.AddInMemoryCollection(new Dictionary<string,string?>
{
    ["Logging:LogLevel:Default"] = "Warning",
    ["Logging:ApplicationInsights:LogLevel:Default"] = "Information" // silent lowering
});

var app = builder.Build();

app.MapGet("/", (ILogger<Program> log) =>
{
    log.LogInformation("This still goes to AI");
    log.LogWarning("This also goes to AI");
    return "Hi";
});

app.Run();

Repaired:

builder.Configuration.AddInMemoryCollection(new Dictionary<string,string?>
{
    ["Logging:LogLevel:Default"] = "Warning"
});

Or declare the intended log level again via AddFilter.

Useful quick snippets

An appsettings template:

{
  "ApplicationInsights": {
    "ConnectionString": "InstrumentationKey=00000000-0000-0000-0000-000000000000"
  },
  "Logging": {
    "LogLevel": {
      "Default": "Warning",
      "MyApp.ImportantArea": "Information"
    }
  }
}

A runtime toggle (handy for experiments, not production best practice):

app.MapGet("/toggle-ai", (TelemetryConfiguration cfg) =>
{
    cfg.DisableTelemetry = !cfg.DisableTelemetry;
    return $"Telemetry enabled: {!cfg.DisableTelemetry}";
});

When a provider override is the right choice

You might intentionally gather richer telemetry centrally while keeping local console lean. You might temporarily elevate verbosity during an incident. You might selectively keep a high value category at Information in AI while leaving everything else at Warning. All valid, provided the difference is deliberate and documented.

Final pre deploy pass

Read through your combined logging configuration. Search for ApplicationInsights under Logging. Confirm any category lines that lower severity truly need to. Confirm sampling configuration matches what you expect. Issue a burst of test logs and verify what surfaces in the portal.

Closing

Nothing magical forced those Information traces through. Configuration precedence invited them. Once you internalize the path, debugging level mismatches becomes quick and boring. That is exactly what you want in an observability foundation.

What are your experiences? Let me know in the comments below.

Using FakeLoggerProvider (and ILoggerFactory) in FastEndpoints

Eelco Los — Fri, 05 Sep 2025 12:31:12 +0000

In the first post we focused on FakeLogger<T> for simple, targeted logger assertions.

This follow-up goes a layer deeper: capturing all logging via FakeLoggerProvider, optionally wiring it into an ILoggerFactory, and asserting over snapshots (not just the latest record). This is especially useful when:

Multiple categories log during a request
Ordering matters (e.g. warning → error path)
You want to assert absence or presence of specific patterns
You're testing pipeline-style frameworks (here: FastEndpoints)

All examples use .NET 8+ and Microsoft.Extensions.Diagnostics.Testing.

When to use FakeLoggerProvider instead of FakeLogger

| Need | Use |
| --- | --- |
| Just assert one logger category | `FakeLogger<T>` |
| Observe everything (multiple categories) | `FakeLoggerProvider` |
| Need a factory (framework constructs loggers) | `ILoggerFactory` + provider |
| Query all log records with LINQ | `provider.Collector.GetSnapshot()` |

FakeLoggerProvider acts like any other logging provider. It captures every log entry as a FakeLogRecord. You can get:

Collector.LatestRecord (single)
Collector.GetSnapshot() (immutable list)
Structured state (record.StructuredState)
Exceptions (record.Exception)
Scopes (record.Scopes)

Minimal example (provider + factory)

// FastEndpoints test sample
// NuGet: Microsoft.Extensions.Diagnostics.Testing

using FastEndpoints;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Testing;
using Xunit;

namespace Demo.LoggingTests;

public class GetThingEndpoint : EndpointWithoutRequest<string>
{
    private readonly ILogger<GetThingEndpoint> _log;
    public GetThingEndpoint(ILogger<GetThingEndpoint> log) => _log = log;

    public override void Configure() => Get("/api/things/{id}");

    public override Task HandleAsync(CancellationToken ct)
    {
        var id = Route<string>("id");
        if (string.IsNullOrWhiteSpace(id))
        {
            _log.LogWarning("Empty id provided");
            ThrowError("Invalid id");
        }

        _log.LogInformation("Fetching thing {ThingId}", id);
        _log.LogDebug("Repository call starting");
        _log.LogInformation("Returning thing {ThingId}", id);
        return SendAsync(id!, cancellation: ct);
    }
}

public class GetThingEndpointTests
{
    [Fact]
    public async Task Logs_All_Expected_Messages_In_Order()
    {
        var provider = new FakeLoggerProvider();

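        // Factory.Create hands us the test HttpContext: set the route value and register services on it.
        var ep = Factory.Create<GetThingEndpoint>(ctx =>
        {
            ctx.Request.RouteValues["id"] = "ABC123";
            ctx.AddTestServices(services =>
            {
                services.AddSingleton<ILoggerProvider>(provider);

                // If the framework builds loggers via ILoggerFactory, supply one:
                services.AddSingleton<ILoggerFactory>(sp =>
                    new LoggerFactory([sp.GetRequiredService<ILoggerProvider>()]));
            });
        });

        // EndpointWithoutRequest exposes HandleAsync(CancellationToken); the id comes from the route.
        await ep.HandleAsync(CancellationToken.None);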

        var records = provider.Collector.GetSnapshot()
            .Where(r => r.Category == typeof(GetThingEndpoint).FullName)
            .OrderBy(r => r.Timestamp)
            .ToList();

        Assert.Collection(records,
            r => Assert.Equal(LogLevel.Information, r.Level),
            r => Assert.Equal(LogLevel.Debug, r.Level),
            r => Assert.Equal(LogLevel.Information, r.Level));

        Assert.Contains(records, r => r.Message.Contains("Returning thing"));
        var structured = records.First(r => r.Message.StartsWith("Fetching thing"));
        // StructuredState holds the template values as strings.
        Assert.Contains(structured.StructuredState!, kv => kv.Key == "ThingId" && kv.Value == "ABC123");
    }
}

Key points:

We register FakeLoggerProvider as ILoggerProvider.
We create a concrete LoggerFactory so anything resolving ILogger<T> via factory still flows through our provider.
Collector.GetSnapshot() gives an immutable list—safe to query multiple times.

Using LatestRecord vs GetSnapshot

var last = provider.Collector.LatestRecord;
Assert.NotNull(last);
Assert.Equal(LogLevel.Information, last.Level);

// Full list (ordered by arrival)
var all = provider.Collector.GetSnapshot();
var warnings = all.Where(r => r.Level == LogLevel.Warning).ToList();

Use LatestRecord when you just expect one or want a quick “something logged” assertion. Prefer snapshots plus LINQ for clarity when order or filtering matters.
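To make the snapshot-plus-LINQ style concrete without FastEndpoints in the way, here is a small standalone sketch. The "Orders" and "Billing" category names and the messages are made up for illustration; the test assumes xUnit and the Microsoft.Extensions.Diagnostics.Testing package.

```csharp
using System.Linq;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Testing;
using Xunit;

public class MultiCategorySnapshotTests
{
    [Fact]
    public void Captures_Records_From_Several_Categories_In_Arrival_Order()
    {
        var provider = new FakeLoggerProvider();

        // One factory, one provider: every category flows into the same collector.
        using var factory = LoggerFactory.Create(b => b
            .SetMinimumLevel(LogLevel.Trace)
            .AddProvider(provider));

        // Hypothetical categories standing in for two components of an app.
        factory.CreateLogger("Orders").LogWarning("Stock low for {Sku}", "A-1");
        factory.CreateLogger("Billing").LogError("Charge failed for {Sku}", "A-1");

        var records = provider.Collector.GetSnapshot();

        // Order and category are both visible on the snapshot.
        Assert.Equal(new[] { "Orders", "Billing" }, records.Select(r => r.Category).ToArray());
        Assert.Equal(new[] { LogLevel.Warning, LogLevel.Error }, records.Select(r => r.Level).ToArray());

        // Structured values are captured as strings.
        Assert.Contains(records[0].StructuredState!, kv => kv.Key == "Sku" && kv.Value == "A-1");
    }
}
```

The same assertions would be awkward with LatestRecord alone, since it only ever shows the Billing error.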

Handling no logs (expected or defensive)

FakeLoggerProvider.Collector.LatestRecord throws InvalidOperationException if nothing was logged yet. Catch/Assert when verifying absence:

var ex = Assert.Throws<InvalidOperationException>(() =>
{
    var _ = provider.Collector.LatestRecord;
});
Assert.Equal("No records logged.", ex.Message);

Or simply:

Assert.Empty(provider.Collector.GetSnapshot());

Asserting warnings for not-found cases (mirroring the example)

From the original test style:

var warning = provider.Collector
    .GetSnapshot()
    .FirstOrDefault(r => r.Level == LogLevel.Warning &&
                         r.Message == $"Unable to find user by name {username}");

Assert.NotNull(warning);

Structured state and scopes

var rec = provider.Collector.GetSnapshot()
    .Single(r => r.Message.StartsWith("Fetching thing"));

var thingId = rec.StructuredState!.Single(kv => kv.Key == "ThingId").Value;
Assert.Equal("ABC123", thingId);

// Scopes (if code used BeginScope)
foreach (var scope in rec.Scopes)
{
    // scope is a formatted scope string or key/value pair sequence
}

When you must register ILoggerFactory

Some frameworks (or helper libs) explicitly request ILoggerFactory. If you only register ILoggerProvider, they create their own factory and your provider may be skipped. Supplying:

services.AddSingleton<ILoggerFactory>(sp =>
    new LoggerFactory([sp.GetRequiredService<ILoggerProvider>()]));

ensures a single pipeline. (If you add multiple providers, pass them all in the array.)

I often start with FakeLogger<T> and switch to provider + factory the moment I need to assert more than one message or category.

Common pitfalls

| Pitfall | Fix |
| --- | --- |
| No logs captured | Ensure the provider is registered before the endpoint is created |
| LatestRecord throws | Use GetSnapshot() when the collector may be empty |
| Wrong category filtered | Compare `record.Category` to `typeof(TheType).FullName` |
| Structured value missing | Ensure you used a named template: LogInformation("Id {ThingId}", id) |

Quick FastEndpoints factory helper (optional)

public static class LoggingTestSetup
{
    public static (TEndpoint endpoint, FakeLoggerProvider provider) CreateWithLogging<TEndpoint>()
        where TEndpoint : class, IEndpoint
    {
        var provider = new FakeLoggerProvider();
        var endpoint = Factory.Create<TEndpoint>(svc =>
            svc.AddTestServices(s =>
            {
                s.AddSingleton<ILoggerProvider>(provider);
                s.AddSingleton<ILoggerFactory>(sp =>
                    new LoggerFactory([sp.GetRequiredService<ILoggerProvider>()]));
            }));
        return (endpoint, provider);
    }
}

Wrap-up

FakeLoggerProvider elevates log testing from “one-off assert” to “full pipeline visibility.” Pairing it with an explicit ILoggerFactory gives deterministic, framework-friendly logging in test hosts (including FastEndpoints). Use it when you outgrow the single-category simplicity of FakeLogger<T>.

Happy testing.

Better ILogger testing in .NET

Eelco Los — Fri, 22 Aug 2025 11:48:18 +0000

Logging in .NET is solid in production, but when it comes to unit and integration tests… things get messy.
I’ve often found myself skipping over logger checks because mocking ILogger<T> felt brittle or just not worth the effort.

But here’s the thing: logs aren’t just background noise. They’re often the first place you look when something goes wrong in production. If the logging isn’t there, or if it’s wrong, you’ll notice it the hard way.

In this post, I’ll walk through:

Why it’s actually worth checking logs in tests
Why mocking loggers is frustrating
How FakeLogger<T> makes this easy, with examples you can drop straight into your tests (available on NuGet in Microsoft.Extensions.Diagnostics.Testing)

Why check logging?

Logs are your safety net. They give you visibility into what your app was doing when something went sideways. Without them, debugging turns into guesswork.

A few reasons to test your logs:

Catch silly mistakes early, like logging the wrong variable, or writing unreadable object dumps, like [Object object].
Make sure log levels make sense. LogError vs LogDebug matters when monitoring and alerting are wired up.
Keep operational contracts intact. Logs are consumed by dashboards, pipelines, and alerting systems. Breaking them silently can hurt in production.

So while it’s tempting to skip logging checks in tests, a little validation here goes a long way.

Mocking `ILogger<T>` feels awkward

If you’ve ever tried mocking ILogger with Moq, NSubstitute, or FakeItEasy, you probably know the pain.

For example, my reading of FakeItEasy's promise is that calling A.Fake on any interface gives you something usable. However, when I create a fake logger like this:

var logger = A.Fake<ILogger<DemoClass>>();

and then run the test, it fails with:

FakeItEasy error output

Failed to create fake of type Microsoft.Extensions.Logging.ILogger`1:
    No usable default constructor was found on the type Microsoft.Extensions.Logging.ILogger`1.
    An exception of type System.ArgumentException was caught during this call. Its message was:
    Can not create proxy for type Microsoft.Extensions.Logging.ILogger`1 because the target type is not accessible. 
    Make it public, or internal and mark your assembly with 
    [assembly: InternalsVisibleTo("DynamicProxyGenAssembly2, PublicKey=...")] 
    attribute, because assembly Microsoft.Extensions.Logging.Abstractions is strong-named. 
    (Parameter 'additionalInterfacesToProxy')

Another approach uses Moq (example from https://www.freecodecamp.org/news/how-to-use-fakelogger-to-make-testing-easier-in-net/ ):

Moq NotSupported example

// Arrange
var mockLogger = new Mock<ILogger>();

// pass the mockedLogger to our service
var orderService = new OrderService(
    mockLogger.Object,
    new Mock<IInvoiceService>().Object
);

var customerId = Guid.NewGuid();
var order = new Order
{
    ID = Guid.NewGuid(),
    CustomerId = customerId,
    Products = [new Product { ID = Guid.NewGuid(), Name = "Ping pong balls", Price = 1.00M }],
    OrderDate = default,
};

// Act
orderService.ProcessOrder(order);

// Assert
mockLogger.Verify(x => x.LogInformation("Processing order..."), Times.Once);
mockLogger.Verify(x => x.LogInformation("Order processed successfully."), Times.Once);

which shows

 System.NotSupportedException: 
Unsupported expression: x => x.LogInformation("Processing order...", new[] {  })

Introducing `FakeLogger<T>`

Microsoft recognized how painful logging tests could be in real-world services. Mocking ILogger<T> often led to brittle tests or convoluted setups. To address this, they introduced FakeLogger<T> in .NET 8, as part of the testing-friendly tooling described in “Fake It Til You Make It…To Production”.

FakeLogger<T> is tiny, test-oriented, and captures logs in memory. You can assert log messages, levels, exceptions, and even structured state without relying on Moq, FakeItEasy, or reflection hacks.

Here’s a minimal example:

using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Testing;
using Xunit;

public class MyServiceTests
{
    [Fact]
    public void DoWork_LogsExpectedMessage()
    {
        // Arrange: create a FakeLogger for MyService
        var fakeLogger = new FakeLogger<MyService>();

        var service = new MyService(fakeLogger); // inject logger into service

        // Act
        service.DoWork();

        // Assert: verify log message
        var record = fakeLogger.LatestRecord;
        Assert.NotNull(record);
        Assert.Equal(LogLevel.Information, record.Level);
        Assert.Contains("Work done", record.Message);
    }
}

With FakeLogger<T>, you can quickly verify your logs without wrestling with generic delegates or fragile mock setups. For more complex scenarios, it also exposes all captured LogRecords, which is handy for asserting multiple messages or structured logging.
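For the multi-record case, a minimal sketch could look like the following. The MyService class here is a hypothetical stand-in (the post does not show the real one), and the log messages are invented for the example.

```csharp
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Testing;
using Xunit;

// Hypothetical service, only here so the test compiles on its own.
public class MyService
{
    private readonly ILogger<MyService> _logger;
    public MyService(ILogger<MyService> logger) => _logger = logger;

    public void DoWork()
    {
        _logger.LogDebug("Starting work");
        _logger.LogInformation("Work done in {ElapsedMs} ms", 42);
    }
}

public class MyServiceSnapshotTests
{
    [Fact]
    public void DoWork_LogsStartAndCompletion()
    {
        var fakeLogger = new FakeLogger<MyService>();

        new MyService(fakeLogger).DoWork();

        // The collector keeps every record, not just the latest one.
        var records = fakeLogger.Collector.GetSnapshot();

        Assert.Equal(2, records.Count);
        Assert.Equal(LogLevel.Debug, records[0].Level);
        Assert.Equal("Work done in 42 ms", records[1].Message);

        // Template values are available as string key/value pairs.
        Assert.Contains(records[1].StructuredState!, kv => kv.Key == "ElapsedMs" && kv.Value == "42");
    }
}
```

Asserting on the snapshot keeps the test honest about ordering: if someone swaps the two log calls, the index-based checks fail.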

Full integration-test example (xUnit + WebApplicationFactory)

Here’s a complete integration test that:

Creates a FakeLogger<T> for the category T you care about (replace MyService with the concrete type used as ILogger<T> in your app).
Injects that instance into the test host via WithWebHostBuilder.
Calls an endpoint.
Retrieves the registered logger from the test host and asserts using LatestRecord.

using System.Linq;
using System.Threading.Tasks;
using FluentAssertions;
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Testing;
using Xunit;

// Replace `Program` with the class that defines your app's host entry (top-level Program in minimal APIs)
public class BudgetIntegrationTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly WebApplicationFactory<Program> _factory;

    public BudgetIntegrationTests(WebApplicationFactory<Program> factory)
    {
        _factory = factory;
    }

    [Fact]
    public async Task GetBudget_ReturnsFile_AndLogsMessage()
    {
        // Replace `MyService` with the concrete type T used in ILogger<T>
        var fakeLogger = new FakeLogger<MyService>();

        // Create a factory that injects our fake logger into DI
        var factory = _factory.WithWebHostBuilder(builder =>
        {
            builder.ConfigureServices(services =>
            {
                // Remove any existing ILogger<MyService> registration (if present)
                var existing = services.SingleOrDefault(d => d.ServiceType == typeof(ILogger<MyService>));
                if (existing != null) services.Remove(existing);

                // Register our fake logger as the ILogger<MyService> implementation
                services.AddSingleton<ILogger<MyService>>((ILogger<MyService>)fakeLogger);
            });
        });

        // Create client and call the endpoint that triggers the log
        using var client = factory.CreateClient();
        var response = await client.GetAsync("/api/budget/download"); // adjust path to your endpoint
        response.EnsureSuccessStatusCode();

        // Retrieve the registered logger from the test host and assert
        var logger = (FakeLogger<MyService>)factory.Services.GetRequiredService<ILogger<MyService>>();
        logger.LatestRecord.Should().NotBeNull();
        logger.LatestRecord.Message.Should().Contain("Returning budget file in");
    }
}

Notes:

MyService should be replaced with the type used when your code calls logger.LogInformation(...) or similar (often the service or controller class).
If the app registers loggers differently (or uses factory-style providers), removing the existing descriptor ensures your fake gets picked up.
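LatestRecord throws an InvalidOperationException if nothing was logged, so the NotBeNull check mainly documents intent. Use Collector.GetSnapshot() when a test needs to assert that no logs were written.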

Wrap-up

ILogger<T> is fantastic for production observability, but not designed to be easily asserted in tests. Instead of wrestling with Log<TState> signatures or brittle mock setups, FakeLogger<T> gives you a simple, inspectable test surface and keeps tests readable.

I added a small FakeLogger demo that shows how to capture and assert ILogger output
from a .NET service in unit tests. See the code and run the tests locally:

Repo: https://github.com/EelcoLos/nx-tinkering
Implementation & review (merged PR): https://github.com/EelcoLos/nx-tinkering/pull/726
Exact file (on main): https://github.com/EelcoLos/nx-tinkering/blob/main/apps/fakelogger-demo.Test/MyServiceTests.cs

Run locally (PowerShell)

dotnet test .\apps\fakelogger-demo.Test\fakelogger-demo.Test.csproj

Happy testing! 🚀

Lessons Learned Shipping .NET Apps with Docker, Alpine, and Kubernetes

Eelco Los — Fri, 27 Jun 2025 09:36:18 +0000

🛠️ Update (4th of July, 2025):

Microsoft’s official .NET 8+ base images now define a secure numeric user via APP_UID.

If you're using those, prefer USER $APP_UID over manually creating appuser.

I’ve updated the Docker Security section to reflect this recommended approach while keeping the broader appuser pattern for non-Microsoft base images.

I love containerization.

From personal projects running at home to production-grade services, containers have transformed the way I build and ship software. They're lightweight, consistent, and (when used correctly) secure. For local development, I usually prefer to work with full SDKs. But for deployments, I lean heavily on containers, DevContainers, and GitHub Actions.

This post will walk you through a solid workflow for building and running .NET apps in Docker using Alpine, preparing images with CI, and tuning for Kubernetes deployments with realistic resource limits.

🧊 Running .NET in Alpine Containers

Alpine is a super minimal Linux distro that makes for compact Docker images. Microsoft ships Alpine-based variants of .NET like this (at the time of writing this is dotnet 9):

FROM mcr.microsoft.com/dotnet/aspnet:9.0-alpine

A minimal container is, to me, what containerization is really about: an OS that is minimal in scope and focuses only on running the app.
But there's a gotcha: culture (ICU) and time zone data aren't included by default, and the Alpine images switch on globalization-invariant mode. To make your app work correctly across locales and time zones, add:

RUN apk add --no-cache icu-libs tzdata
ENV DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=false

➡️ See Andrew Lock’s excellent guide for deeper insights on this issue.

🛡️ Docker Security: Running as Non-Root (And Doing It Right)

One of the most common but overlooked Docker security pitfalls is that containers run as root by default. If someone breaks out of your app process, they’re root inside the container. And that's bad news.

Defining the Non-Root User

Start by creating a lightweight user and group in the image:

RUN addgroup -S appgroup && adduser -S appuser -G appgroup

-S creates system users/groups (no home directory, no password).
This keeps the image small and secure.

🔎 Note on $APP_UID:
The $APP_UID pattern is a Microsoft-documented convention introduced in the .NET 8+ base images. These images define a numeric non-root user internally and expose its UID via the APP_UID environment variable. This makes it easy to write:
USER $APP_UID
If you're using a non-Microsoft base image (like plain Alpine or Debian), this variable won't exist unless you define it yourself:
ENV APP_UID=10001
USER $APP_UID
So while the pattern is technically portable, only Microsoft's .NET base images provide it by default. For broader compatibility, the traditional adduser appuser && USER appuser pattern is still widely used and understood. Read more about Microsoft's recommendation at https://devblogs.microsoft.com/dotnet/securing-containers-with-rootless/#using-app

Secure File Ownership: Use `COPY --chown`

I used to rely on fixing permissions like this:

COPY ./build/api .
RUN chown -R appuser:appgroup .

But this isn’t ideal:

Adds an extra layer.
Slower on large file sets.
Messy.

Then I learned to assign ownership directly at copy time:

COPY --chown=appuser:appgroup ./build/api .

This:

Instantly assigns correct ownership.
Avoids extra RUN chown.
Makes your Dockerfile cleaner and more declarative.

Putting It Together

FROM mcr.microsoft.com/dotnet/aspnet:9.0-alpine

RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app

COPY --chown=appuser:appgroup ./build/api .
COPY --chown=appuser:appgroup entrypoint.sh .

USER appuser

ENTRYPOINT ["./entrypoint.sh"]

The app runs with the least privilege necessary.
Files are owned properly the moment they're brought into the image.
Clean. Predictable. Secure.

🛠️ CI Builds Artifacts, Docker Just Packages

One of the best things you can do is keep your Dockerfile lean. Compiling inside Docker bloats your image and slows builds, but there is also a more basic reason: Docker is for containerizing your application, so the app should already be built by the time Docker sees it. That is how I primarily experience Docker: as the 'containerizer'. So use your CI pipeline to build and publish the app, then use Docker to package the output. As a bonus, you get build artifacts you can inspect on their own.

🔧 GitHub Actions: Build and Upload Artifacts

- name: Build and Publish
  run: |
    dotnet publish -o ${{ env.PUBLISH_FOLDER_NAME }} ${{ inputs.publish-args }}

- name: Upload Build Artifact
  uses: actions/upload-artifact@v4
  with:
    name: ${{ inputs.artifact-name }}
    path: ${{ inputs.project-folder }}/${{ env.PUBLISH_FOLDER_NAME }}

Then in your Docker build step, pull the artifacts back down:

- name: Download artifacts
  run: |
    IFS=',' read -ra artifacts <<< "${{ inputs.download-artifact }}"
    for artifact in "${artifacts[@]}"; do
      mkdir -p "${{ inputs.working-directory }}/build/$artifact"
      gh run download --name "$artifact" --dir "${{ inputs.working-directory }}/build/$artifact"
    done

Finally, build and push the image:

- name: Build and push
  uses: docker/build-push-action@v6
  with:
    file: ${{ env.DOCKERFILE }}
    context: ${{ inputs.working-directory }}
    push: true
    tags: ${{ inputs.container-tags }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

This artifact-first approach gives you:

Reproducibility
Cleaner build caching
Easy debugging (you can inspect the build output separately)

☸️ Kubernetes + Helm: Resource Limits That Actually Work

Let’s be real: .NET isn’t the smallest kid on the block. You can’t slap a tiny resource limit on it without consequences.

🔍 What Microsoft Recommends for AKS

Microsoft’s official guidance for AKS firmly states:

“Set pod requests and limits on all pods in your YAML manifests. If the AKS cluster uses resource quotas and you don't define these values, your deployment may be rejected.”
— AKS Best Practices (resource requests & limits)

They further caution:

“Pod CPU and memory limits define the maximum amount of CPU and memory a pod can use… avoid setting a pod limit higher than your nodes can support.”
— AKS Best Practices (resource guidelines)

Microsoft also provides a default starting configuration in their examples:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

This isn’t a strict minimum. But it is a realistic baseline that balances scheduling, performance, and cost.

My .NET-Focused Configuration

Here’s the setup that consistently works for .NET workloads I test:

resources:
  requests:
    cpu: 10m
    memory: 20Mi
  limits:
    cpu: 100m
    memory: 175Mi

⚠️ While .NET can technically run with ~125 Mi memory, in practice this leads to:

Sluggish cold starts
Failing health probes
Garbage collector thrash

Pushing memory to 175 Mi ensures decent startup times and runtime stability.

⚖️ TL;DR Recommendations

Always define both requests and limits
Memory: Set the limit above the request for stability, and start around 175 Mi for .NET
CPU: A reasonable request (~100m) with a higher limit helps performance without causing throttling
These aren’t arbitrary: they reflect Microsoft’s AKS baseline examples (Deployment and cluster reliability best practices for Azure, Resource management best practices for Azure Kubernetes Service, What is the best practice to have request and limit values to a pod in)

🎯 Final Thoughts

Use small base images, like Alpine, but patch them to fit your needs (e.g. icu-libs and tzdata)
Run as a non-root user inside your Docker containers
Use CI to build the app, and let Docker just package it
Tune your K8s Helm charts to keep .NET's footprint small, but still responsive under the pressure of your required workload

Containers are amazing, but they're even better when treated with care. With these practices, you’ll ship faster, safer, and smarter—whether it's production, staging, or even your home lab.

Got questions or tweaks to share? Drop them in the comments—I'd love to hear your workflow!

Practical Versioning Considerations for .NET APIs and Packages

Eelco Los — Mon, 02 Jun 2025 11:09:39 +0000

Versioning in .NET development goes beyond simply incrementing numbers. While maintaining a business-critical service and its NuGet package, I had to move both the API and the package to a higher major version, and I ran into subtle compatibility issues along the way.
In this post, we'll explore those versioning challenges and solutions across these key areas:

The Hidden Complexity of Package Versioning - NuGet vs assembly version discrepancies and pre-release strategies
API Versioning and Compatibility Challenges - URL versioning approaches and the subtle breaking change of JSON casing
Breaking Changes You Might Miss - Default behavior changes, dependency bumps, and configuration modifications
Practical Implementation Guidelines - Concrete strategies for package authors and API designers
Testing Your Versioning Strategy - Package compatibility and API contract testing approaches
Version Communication Strategy - How to effectively communicate changes to internal teams and public consumers
Common Pitfalls to Avoid - Key mistakes to watch out for when versioning
Conclusion

The Hidden Complexity of Package Versioning

NuGet vs Assembly Version Discrepancy

One often overlooked issue is the disconnect between NuGet package versions and assembly versions. When you create a pre-release like 4.0.3-beta, the assembly version remains 4.0.3.0; the pre-release label only survives in the informational version. Developers who check the assembly version in a decompiler see no indication it's a pre-release, leading to confusion.

The Solution: Use the build number strategically:

<!-- Instead of this -->
<Version>4.0.3-beta</Version>

<!-- Do this -->
<Version>4.0.3.1-beta</Version>

This ensures both NuGet and decompilers show version progression correctly.
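
To see the discrepancy for yourself, a small console sketch like the one below prints both values. It uses typeof(object).Assembly only so that it runs anywhere (an assumption for illustration); point it at a type from your own package instead. In SDK-style projects the informational version is normally derived from the Version property, so that is where a -beta suffix ends up.

```csharp
using System;
using System.Reflection;

// typeof(object).Assembly is a stand-in so the snippet runs anywhere;
// swap in a type from the package you actually want to inspect.
var assembly = typeof(object).Assembly;

// The four-part numeric version that decompilers usually show first.
var assemblyVersion = assembly.GetName().Version;

// The free-form version string, which can carry a pre-release label.
var informationalVersion = assembly
    .GetCustomAttribute<AssemblyInformationalVersionAttribute>()
    ?.InformationalVersion;

Console.WriteLine($"AssemblyVersion:      {assemblyVersion}");
Console.WriteLine($"InformationalVersion: {informationalVersion}");
```

If the two lines disagree about whether a build is a pre-release, that is exactly the confusion described above.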

Pre-Release Versioning Strategies

Building on this approach, you can choose different strategies depending on your workflow:

Semantic Pre-Release (for major versions):

6.0.0-beta.1 → 6.0.0-beta.2 → 6.0.0-rc.1 → 6.0.0

Rolling Minor Versions (for continuous development):

5.35.0 → 5.36.0-beta.1 → 5.36.0-beta.2 → 5.36.0

Build Number Strategy (when assembly version clarity matters):

4.0.3.1-beta → 4.0.3.2-beta → 4.0.4.0

Key insight: Use beta.1 not beta1 for proper sorting on NuGet.org, and consider build numbers when assembly version visibility is important for your consumers.
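
The reason beta.1 sorts better than beta1 comes from the SemVer 2.0.0 precedence rules: dot-separated identifiers that are purely numeric compare as numbers, while everything else compares as text. The sketch below is a simplified illustration of that rule, not NuGet's actual implementation, and ComparePreRelease is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

// Compares two pre-release labels following SemVer 2.0.0 precedence:
// numeric identifiers compare numerically, alphanumeric ones in ordinal
// order, and a numeric identifier sorts below an alphanumeric one.
static int ComparePreRelease(string left, string right)
{
    string[] a = left.Split('.');
    string[] b = right.Split('.');

    for (int i = 0; i < Math.Min(a.Length, b.Length); i++)
    {
        // NumberStyles.None accepts digits only (no sign, no whitespace).
        bool aIsNumber = int.TryParse(a[i], NumberStyles.None, CultureInfo.InvariantCulture, out int aNumber);
        bool bIsNumber = int.TryParse(b[i], NumberStyles.None, CultureInfo.InvariantCulture, out int bNumber);

        int result = (aIsNumber, bIsNumber) switch
        {
            (true, true) => aNumber.CompareTo(bNumber),
            (true, false) => -1,
            (false, true) => 1,
            _ => string.CompareOrdinal(a[i], b[i]),
        };

        if (result != 0) return Math.Sign(result);
    }

    // All shared identifiers are equal: the longer label has higher precedence.
    return a.Length.CompareTo(b.Length);
}

Console.WriteLine(ComparePreRelease("beta.2", "beta.10")); // -1: beta.2 sorts before beta.10
Console.WriteLine(ComparePreRelease("beta2", "beta10"));   // 1: beta2 sorts after beta10
```

With the dot, the tenth beta sorts after the second one as expected. Without it, the whole label is compared as text and beta10 lands before beta2.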

API Versioning and Compatibility Challenges

When evolving REST APIs, you'll encounter several versioning strategies and subtle breaking changes that can catch you off guard.

URL Versioning Strategies

// Path-based versioning
[Route("api/v1/products")]
[Route("api/v2/products")]

// Header-based versioning
[HttpGet]
[ApiVersion("1.0")]
[ApiVersion("2.0")]

// Query parameter versioning
/api/products?version=2.0

The Subtle Breaking Change: JSON Casing

Here's a compatibility issue many developers overlook: property casing changes between API frameworks. I ran into this one while migrating my business-critical service, when I changed one of the properties from boolean to a nullable DateTime.

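Traditional .NET Controllers (PascalCase, the default in classic ASP.NET Web API; ASP.NET Core controllers emit it when the naming policy is set to null):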

{
  "ProductId": 123,
  "ProductName": "Widget",
  "CreatedDate": "2024-01-15T10:30:00Z"
}

Minimal APIs (default camelCase):

{
  "productId": 123,
  "productName": "Widget",
  "createdDate": "2024-01-15T10:30:00Z"
}

This seemingly minor change can break client applications that expect specific casing. Consider this when:

Migrating from Controllers to Minimal APIs
Moving between different API frameworks
Updating serialization configurations

Solutions:

// Explicit casing control
services.ConfigureHttpJsonOptions(options =>
{
    options.SerializerOptions.PropertyNamingPolicy = 
        JsonNamingPolicy.CamelCase;
});

// Or maintain compatibility
services.ConfigureHttpJsonOptions(options =>
{
    options.SerializerOptions.PropertyNamingPolicy = null; // PascalCase
});
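
To make the effect of the naming policy concrete outside of ASP.NET, here is a minimal console sketch using System.Text.Json directly. The anonymous product object is made up for illustration. It also shows why clients break: property lookups are case-sensitive by default.

```csharp
using System;
using System.Text.Json;

// Illustrative payload; property names are declared in PascalCase.
var product = new { ProductId = 123, ProductName = "Widget" };

var camelCase = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
var asDeclared = new JsonSerializerOptions { PropertyNamingPolicy = null };

string camelJson = JsonSerializer.Serialize(product, camelCase);
string pascalJson = JsonSerializer.Serialize(product, asDeclared);

Console.WriteLine(camelJson);  // {"productId":123,"productName":"Widget"}
Console.WriteLine(pascalJson); // {"ProductId":123,"ProductName":"Widget"}

// A client that looks for the PascalCase name in the camelCase payload finds nothing.
using JsonDocument document = JsonDocument.Parse(camelJson);
bool found = document.RootElement.TryGetProperty("ProductId", out _);
Console.WriteLine(found); // False
```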

Breaking Changes You Might Miss

These are the changes that seem harmless but can completely break your consumers' applications.

Default Behavior Changes

// Version 1.0 - throws exceptions
public User GetUser(int id) => userRepository.GetById(id);

// Version 2.0 - returns null (breaking!)
public User GetUser(int id) => 
    userRepository.FirstOrDefault(u => u.Id == id);
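
One way to avoid this kind of silent change is to leave the existing method's behavior alone and add the new behavior under a new name. The sketch below uses a hypothetical in-memory dictionary instead of the repository above, and FindUser is an illustrative name, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store standing in for userRepository.
var users = new Dictionary<int, string> { [1] = "Ada" };

// v1 contract, left untouched: a missing user still throws.
string GetUser(int id) =>
    users.TryGetValue(id, out var name)
        ? name
        : throw new KeyNotFoundException($"User {id} was not found.");

// v2 addition: null-returning lookup under a new name, so no caller is surprised.
string? FindUser(int id) =>
    users.TryGetValue(id, out var name) ? name : null;

Console.WriteLine(GetUser(1));          // Ada
Console.WriteLine(FindUser(2) is null); // True
```

Existing callers keep their exception-based flow, and new callers opt in to the null-returning variant explicitly.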

Dependency Version Bumps

<!-- Your library targets .NET 6 -->
<TargetFramework>net6.0</TargetFramework>

<!-- Upgrading to .NET 8 might break consumers still on .NET 6 -->
<TargetFramework>net8.0</TargetFramework>

Configuration Changes

// Version 1.x
services.AddMyService(options => {
    options.EnableFeature = true; // Default was false
});

// Version 2.x - default changed to true
// Consumers expecting false behavior will break

Practical Implementation Guidelines

Success in versioning comes down to clear strategies for both package authors and API designers.

For Package Authors

Package maintainers need strategies that balance innovation with stability.

Version Your Interfaces Explicitly

public interface IMyService
{
    Task<string> ProcessAsync(string input);
}

// When adding parameters, create a new interface
public interface IMyServiceV2 : IMyService
{
    Task<string> ProcessAsync(string input, 
        CancellationToken cancellationToken);
}

Use Obsolete Attributes Effectively

// Phase 1: Warning only (default behavior)
[Obsolete("Use ProcessV2Async instead. " +
          "This method will be removed in v3.0.0")]
public async Task<string> ProcessAsync(string input)
{
    return await ProcessV2Async(input, CancellationToken.None);
}

// Phase 2: Force compilation error in next major version
[Obsolete("Use ProcessV2Async instead. " +
          "This method has been removed.", error: true)]
public async Task<string> ProcessAsync(string input)
{
    throw new NotSupportedException(
        "This method has been removed. Use ProcessV2Async instead.");
}

// Planned obsolescence with clear timeline
[Obsolete("Use ProcessV2Async instead. " +
          "This method will become a compilation error in v3.0.0 (June 2024)")]
public async Task<string> ProcessAsync(string input)
{
    return await ProcessV2Async(input, CancellationToken.None);
}

Obsolete Progression Strategy:

v2.0.0: Introduce new method
v2.1.0: Mark old method [Obsolete] with warning
v2.2.0: Update message with removal timeline  
v3.0.0: Set [Obsolete(error: true)] - forces compilation errors 
        but method still exists
v3.0.1: Actually remove the obsolete method entirely

Strategic Migration Approach:

// v2.1.0 - Soft warning
[Obsolete("Use ProcessV2Async instead. " +
          "This method will cause compilation errors in v3.0.0")]
public async Task<string> ProcessAsync(string input)
{
    return await ProcessV2Async(input, CancellationToken.None);
}

// v3.0.0 - Hard error (CS0619), but method still exists for binary compatibility
[Obsolete("Use ProcessV2Async instead. " +
          "This method has been removed.", error: true)]
public async Task<string> ProcessAsync(string input)
{
    // Still functional for callers already compiled against v2.x.
    // New compilations fail: CS0619 cannot be suppressed with #pragma.
    return await ProcessV2Async(input, CancellationToken.None);
}

// v3.0.1 or v3.1.0 - Complete removal
// Method is entirely deleted from codebase

Benefits of this approach:

v3.0.0 upgrade path: Consumers can upgrade to the major version and see exactly what needs to be fixed
Runtime fallback: Assemblies already compiled against v2.x generally keep working, because the method still exists. Note that error: true raises CS0619, which cannot be suppressed with #pragma warning disable; only the warning-phase CS0618 can
Clear timeline: Everyone knows the method will be completely gone in a follow-up release (v3.0.1 or v3.1.0)
Safer major version adoption: Teams aren't afraid to upgrade to v3.0.0 knowing that dependencies compiled against v2.x still run

Advanced Migration Pattern:

// v3.0.0 - Compilation error (CS0619), with usage logging for remaining callers
[Obsolete("Use ProcessV2Async instead. " +
          "Method will be removed in v3.1.0. " +
          "Stay on v2.x if you need temporary compatibility.",
          error: true)]
public async Task<string> ProcessAsync(string input)
{
    // Log usage for monitoring migration progress
    logger?.LogWarning(
        "Obsolete method ProcessAsync called. Migrate to ProcessV2Async.");
    return await ProcessV2Async(input, CancellationToken.None);
}

This approach gives consumers a much better migration experience and reduces the fear of upgrading major versions!

Document Breaking Changes

## Breaking Changes in v2.0.0
- JSON response casing changed from PascalCase to camelCase
- `GetUser()` now returns null instead of throwing for missing users
- Minimum .NET version requirement increased to .NET 8

For API Designers

API versioning requires balancing backward compatibility with the need to evolve and improve.

Plan for Backward Compatibility

// Good: Additive changes
public class ProductResponse
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime CreatedDate { get; set; }

    // New in v1.1 - doesn't break existing clients
    public string? Description { get; set; }
}

Use Content Negotiation

[HttpGet]
public IActionResult GetProduct(int id, 
    [FromHeader] string? apiVersion = "1.0")
{
    return apiVersion switch
    {
        "2.0" => Ok(productService.GetProductV2(id)),
        _ => Ok(productService.GetProduct(id))
    };
}

Graceful Deprecation

[HttpGet("api/v1/products")]
[Obsolete("Use /api/v2/products instead")]
public IActionResult GetProductsV1()
{
    // Indexer assignment avoids the exception Add throws on a duplicate header
    Response.Headers["X-API-Deprecated"] = "true";
    Response.Headers["X-API-Sunset"] = "2024-12-31";
    return GetProductsV2();
}

Testing Your Versioning Strategy

Testing isn't just about functionality—it's about ensuring your versioning strategy actually works in practice.

Package Compatibility Testing

# Test your package with different framework versions
dotnet pack
dotnet test --framework net6.0
dotnet test --framework net8.0

API Contract Testing

[Test]
public async Task ApiShouldMaintainJsonCasingAsync()
{
    var response = await client.GetAsync("/api/v1/products/1");
    var json = await response.Content.ReadAsStringAsync();

    // Ensure backward compatibility
    Assert.That(json, Contains.Substring("ProductId"));
    Assert.That(json, Does.Not.Contain("productId"));
}
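
Substring checks can be fooled by values that happen to contain a property name. A slightly stricter variant, sketched here with System.Text.Json, parses the body and checks the actual property names. The payload is a hard-coded, hypothetical v1 response, and TopLevelPropertyNames is an illustrative helper rather than part of any test framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Returns the top-level property names of a JSON object as a case-sensitive set.
static HashSet<string> TopLevelPropertyNames(string json)
{
    using JsonDocument document = JsonDocument.Parse(json);
    return document.RootElement
        .EnumerateObject()
        .Select(property => property.Name)
        .ToHashSet(StringComparer.Ordinal);
}

// Hypothetical v1 payload; a real test would read this from the HTTP response.
string body = "{\"ProductId\":123,\"ProductName\":\"Widget\",\"Description\":null}";

// The v1 contract: these names must be present with exactly this casing.
string[] contract = { "ProductId", "ProductName" };

HashSet<string> actual = TopLevelPropertyNames(body);
Console.WriteLine(actual.IsSupersetOf(contract)); // True: extra properties are fine
Console.WriteLine(actual.Contains("productId"));  // False: casing matters
```

Checking for a superset rather than an exact match keeps the test green when a later version adds optional properties, which matches the additive-change guidance earlier in this post.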

Version Communication Strategy

Clear communication is just as important as technical implementation when it comes to versioning.

For Internal Teams

Changelog: Document every change, no matter how small
Migration guides: Provide step-by-step upgrade instructions
Deprecation timeline: Give consumers time to adapt

For Public APIs

API documentation: Version-specific endpoint documentation
SDKs: Maintain SDK versions aligned with API versions
Support policy: Clear support timelines for each version

Common Pitfalls to Avoid

Assuming minor changes aren't breaking - JSON casing, default values, and behavior changes can break consumers
Not testing with real consumer scenarios - Your breaking change detector might miss runtime issues
Inadequate deprecation periods - Give consumers time to migrate
Inconsistent versioning across related packages - Keep related packages in sync

Conclusion

Effective versioning requires thinking beyond semantic version numbers. Consider the entire ecosystem: package consumers, API clients, deployment scenarios, and migration paths. The goal isn't just to communicate changes—it's to enable smooth evolution while maintaining trust with your consumers.

Remember: every change is potentially breaking to someone. The key is understanding its impact and communicating clearly.

What versioning challenges have you encountered in your .NET projects? Share your experiences in the comments below!
