DEV Community: Johnny Z

Agent with Vercel's Eve Framework

Johnny Z — Mon, 22 Jun 2026 04:51:54 +0000

Vercel recently open-sourced Eve, a framework for building durable AI agents. It takes an opinionated, filesystem-first approach: instead of wiring up model loops, tool dispatch, and session persistence yourself, you author a directory of files and Eve handles the rest.

I took it for a spin by building a shopping assistant — an agent that can search a product catalog, check inventory, compare prices, read reviews, and place orders. Here's what I found.

The Core Separation: Agent vs Channel

Eve draws a hard line between what the agent is and how it communicates. The agent is the reasoning core — model, tools, instructions. It doesn't know or care how users reach it. A channel is just comms — it handles inbound transport, auth, message format, and delivery for a specific platform.

This means the same agent can simultaneously serve a browser chat widget, a Slack bot, a CLI, and a custom webhook — without any conditional logic in the agent itself. You add surfaces by adding channel files, not by changing agent code.

Channels: How Users Reach the Agent

In Eve, a channel is the edge adapter between a platform and your agent. It normalizes inbound messages, owns the conversation resume handle (continuationToken), and decides how responses get delivered back.

The default channel is eve.ts — the HTTP session API that the dev TUI, browser clients, and curl all talk to. But channels aren't limited to HTTP. Eve ships integrations for Slack, Discord, Teams, Telegram, Twilio (SMS/voice), GitHub, and Linear. You can also write custom channels for any surface (webhooks, WebSockets, internal systems).

Each channel is a file under agent/channels/:

agent/channels/
├── eve.ts      # HTTP API (always present, even without a file)
├── slack.ts    # Slack DMs, mentions, buttons
└── intake.ts   # Your custom webhook channel

The key insight: your agent logic (instructions + tools) stays the same regardless of channel. You write the agent once, then expose it through multiple surfaces by adding channel files. The channel handles platform-specific concerns (auth, message format, delivery) while the agent handles reasoning.

The Default Eve Channel

The eve.ts channel is always present — it provides a session-based HTTP API that the dev TUI, browser clients (useEveAgent), and curl all use. The key concept is durable sessions: you POST to create a session, stream events from it via NDJSON, and continue it with a continuationToken. Sessions survive server restarts and support reconnection at any event index (?startIndex=N). This is fundamentally different from stateless request/response — Eve owns the conversation state server-side.
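To make the resume-at-index idea concrete, here's a minimal client-side sketch. It assumes only what's described above: events arrive as NDJSON over a streaming HTTP response, and a reconnect passes `?startIndex=N`. `NdjsonDecoder`, `resumeUrl`, and the example URL are hypothetical and not part of Eve's API. Events are treated as opaque JSON.

```typescript
// Minimal NDJSON decoder sketch (hypothetical, not Eve's client API).
// Feed it raw text chunks from a streaming response. It returns every
// complete JSON line and buffers a trailing partial line for the next chunk.
class NdjsonDecoder {
  private buffer = "";
  // Count of events seen so far. Assuming the server numbers events from 0,
  // this is the value to pass as ?startIndex=N when reconnecting.
  eventsSeen: number;

  constructor(startIndex = 0) {
    this.eventsSeen = startIndex;
  }

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // Last element is "" if the chunk ended on a newline, else a partial line.
    this.buffer = lines.pop() ?? "";
    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // tolerate blank lines and CRLF endings
      events.push(JSON.parse(trimmed));
      this.eventsSeen++;
    }
    return events;
  }
}

// Hypothetical helper: only the ?startIndex=N parameter comes from the post.
// The stream URL itself is whatever your deployment exposes.
function resumeUrl(streamUrl: string, startIndex: number): string {
  const url = new URL(streamUrl);
  url.searchParams.set("startIndex", String(startIndex));
  return url.toString();
}
```

On reconnect you would build the URL from `eventsSeen` and start a fresh decoder at that index, so no event is processed twice.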

How Sessions Are Durable by Default

Eve sessions aren't just "kept in memory" — they're backed by a workflow engine. Under the hood, every turn runs as a durable workflow built on the open-source Workflow SDK. Eve checkpoints progress at each step boundary (one model call + its tool calls = one step). If the process crashes mid-turn, it resumes from the last completed step rather than replaying everything.
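The step-boundary checkpointing can be illustrated with a toy model. This is not the Workflow SDK's API. A `Map` stands in for `.workflow-data/steps/`, each step function stands in for "one model call + its tool calls", and a thrown error simulates a crash. On a re-run with the same run id, completed steps are read back instead of executed again.

```typescript
// Hypothetical in-memory checkpoint store standing in for .workflow-data/steps/.
type StepStore = Map<string, unknown>;

// Runs steps in order and persists each result under "<runId>:<stepIndex>".
// A re-run with the same runId replays completed steps from the store and
// only executes steps that never finished.
function runWithCheckpoints(
  runId: string,
  steps: Array<() => unknown>,
  store: StepStore,
): unknown[] {
  const results: unknown[] = [];
  steps.forEach((step, i) => {
    const key = `${runId}:${i}`;
    if (store.has(key)) {
      results.push(store.get(key)); // replay from checkpoint, no re-execution
      return;
    }
    const value = step(); // may throw: a simulated crash mid-turn
    store.set(key, value); // checkpoint only after the step completes
    results.push(value);
  });
  return results;
}
```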

Locally, this is just files on disk. Run your agent and you'll find a .workflow-data/ directory:

.workflow-data/
├── runs/       # one JSON file per session (workflow state)
├── steps/      # checkpoint per completed step
├── streams/    # event streams (what clients read via /stream)
├── hooks/      # parked continuation tokens (waiting for input)
└── events/     # workflow lifecycle events

This means you can:

Start a conversation, ask the agent something
Kill the server (Ctrl+C)
Restart it
Continue the conversation with the same sessionId and continuationToken

The session picks up exactly where it left off — including mid-turn recovery if a step was already completed before the crash.

Obviously, local files aren't scalable for production. Eve's durability is pluggable via the Workflow SDK's "World" abstraction — the storage/queue/streaming backend. You pick a world package and Eve uses it for all session persistence:

World	Backend	Use case
`@workflow/world-local`	Filesystem (`.workflow-data/`)	Local dev (default)
`@workflow/world-postgres`	PostgreSQL + graphile-worker	Self-hosted production
`@workflow/world-vercel`	Vercel Workflow (managed)	Vercel deployments
`@workflow-worlds/redis`	Redis + BullMQ	Self-hosted, high throughput
`@workflow-worlds/mongodb`	MongoDB	Self-hosted
`@workflow-worlds/turso`	Turso/libSQL (embedded SQLite)	Edge/embedded

To switch, you set it in agent.ts:

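// agent/agent.ts
import { openai } from "@ai-sdk/openai";
import { defineAgent } from "eve";

export default defineAgent({
  model: openai("gpt-4o"),
  modelContextWindowTokens: 128_000,
  experimental: {
    workflow: {
      // Any world package from the table above
      world: "@workflow/world-postgres",
    },
  },
});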

The World interface has three responsibilities: Storage (persisting runs, steps, hooks via an append-only event log), Queue (dispatching workflow/step invocations with at-least-once delivery), and Streams (real-time event delivery to clients). You can also build your own if none of the existing options fit.
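As a rough mental model of those three responsibilities, here's a toy in-memory World. The interface and method names are made up for illustration and do not match the Workflow SDK's actual World interface. The point is the split between an append-only log, an at-least-once queue (a message is redelivered until it is acknowledged), and live streams.

```typescript
// Hypothetical shapes for illustration only.
interface WorldEvent { runId: string; type: string; data?: unknown }

interface World {
  // Storage: append-only event log.
  append(event: WorldEvent): number;
  events(runId: string): readonly WorldEvent[];
  // Queue: at-least-once delivery; a message stays pending until acked.
  enqueue(message: string): void;
  receive(): string | undefined;
  ack(message: string): void;
  // Streams: push appended events to live subscribers of a run.
  subscribe(runId: string, onEvent: (e: WorldEvent) => void): () => void;
}

class InMemoryWorld implements World {
  private log: WorldEvent[] = [];
  private pending: string[] = [];
  private subs = new Map<string, Set<(e: WorldEvent) => void>>();

  append(event: WorldEvent): number {
    this.log.push(event); // never mutated or deleted, only appended
    this.subs.get(event.runId)?.forEach((fn) => fn(event));
    return this.log.length - 1;
  }

  events(runId: string) {
    return this.log.filter((e) => e.runId === runId);
  }

  enqueue(message: string) {
    this.pending.push(message);
  }

  // Returns the oldest pending message without removing it, so a consumer
  // that crashes before ack() gets the same message again.
  receive() {
    return this.pending[0];
  }

  ack(message: string) {
    const i = this.pending.indexOf(message);
    if (i >= 0) this.pending.splice(i, 1);
  }

  subscribe(runId: string, onEvent: (e: WorldEvent) => void) {
    const set = this.subs.get(runId) ?? new Set<(e: WorldEvent) => void>();
    set.add(onEvent);
    this.subs.set(runId, set);
    return () => set.delete(onEvent); // unsubscribe
  }
}
```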

Example: Exposing the Agent as AG-UI

To make this concrete — I built a custom channel that exposes the Eve agent over the AG-UI protocol (SSE-based, used by CopilotKit and similar frameworks). The channel translates Eve's internal event stream into AG-UI's event vocabulary:

// agent/channels/agui.ts
import { randomUUID } from "node:crypto";
import { defineChannel, POST, type Session } from "eve/channels";
import { EventEncoder } from "@ag-ui/encoder";
import { EventType, type BaseEvent, type RunAgentInput } from "@ag-ui/core";

export default defineChannel({
  routes: [
    POST("/agui", async (req, { send }) => {
      const body = (await req.json()) as RunAgentInput;
      if (!body.threadId) {
        return Response.json({ error: "Missing 'threadId'." }, { status: 400 });
      }

      const { threadId } = body;
      const runId = body.runId ?? randomUUID();
      const messages = body.messages ?? [];

      // AG-UI is stateless per request — clients send full message history.
      // Pass prior messages as context so Eve's agent sees the conversation.
      const lastUserIdx = messages.findLastIndex((m) => m.role === "user");
      const context = messages.slice(0, lastUserIdx).map((m) =>
        `[${m.role}]: ${typeof m.content === "string" ? m.content : JSON.stringify(m.content)}`
      );

      const session: Session = await send(
        { message: messages[lastUserIdx]?.content ?? "", context },
        { auth: null, continuationToken: `agui:${threadId}:${randomUUID()}` },
      );

      // Read Eve's event stream and translate to AG-UI SSE
      const eveStream = await session.getEventStream();
      const encoder = new EventEncoder();
      // ... event mapping loop (see full source)
    }),
  ],
});

The event mapping is mostly mechanical — actions.requested → TOOL_CALL_START/ARGS/END, action.result → TOOL_CALL_RESULT, message.appended → TEXT_MESSAGE_CONTENT, turn.completed → RUN_FINISHED. But there were a few non-obvious gotchas:

1. Eve sessions are durable; AG-UI runs are not. Eve's event stream never closes after a turn — it waits for the next message. You must close the response stream yourself after emitting RUN_FINISHED. If you don't, the client hangs forever waiting for more data.

2. Eve emits both turn.completed and session.waiting at turn boundaries. If you naively emit RUN_FINISHED for both, the AG-UI client throws: "Cannot send event type 'RUN_FINISHED': The run has already finished." Guard with a flag and only emit once.

3. AG-UI is stateless; Eve is stateful. Each AG-UI request carries the full message history. Since Eve's send() creates/continues sessions via continuationToken, you need a fresh token per request (otherwise Eve tries to continue a stale session). The conversation history goes through context so the agent sees prior turns.

4. Eve actions are typed unions. actions.requested contains a RuntimeActionRequest[] that can be tool-call, subagent-call, or load-skill. You need to filter for action.kind === "tool-call" and use action.toolName / action.callId (not .name / .id which don't exist on the type).
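The mapping plus gotchas 1, 2 and 4 fit in one small stateful function. This is a sketch with assumed event payloads: apart from `kind`, `toolName` and `callId` (from the type notes above), the Eve field names (`actions`, `input`, `output`, `messageId`, `delta`) are guesses, and the AG-UI events are simplified plain objects rather than `@ag-ui/core` types. A real AG-UI stream also brackets text with TEXT_MESSAGE_START/END and opens with RUN_STARTED, which is omitted here.

```typescript
// Assumed, simplified Eve event shapes (illustration only).
type EveAction =
  | { kind: "tool-call"; toolName: string; callId: string; input: unknown }
  | { kind: "subagent-call" }
  | { kind: "load-skill" };

type EveEvent =
  | { type: "actions.requested"; actions: EveAction[] }
  | { type: "action.result"; callId: string; output: unknown }
  | { type: "message.appended"; messageId: string; delta: string }
  | { type: "turn.completed" }
  | { type: "session.waiting" };

// AG-UI events as plain objects; fields are a simplified subset.
type AguiEvent = { type: string; [key: string]: unknown };

function createMapper(threadId: string, runId: string) {
  let finished = false; // gotcha 2: emit RUN_FINISHED only once
  return (ev: EveEvent): { out: AguiEvent[]; done: boolean } => {
    const out: AguiEvent[] = [];
    switch (ev.type) {
      case "actions.requested":
        for (const a of ev.actions) {
          if (a.kind !== "tool-call") continue; // gotcha 4: skip non-tool actions
          out.push({ type: "TOOL_CALL_START", toolCallId: a.callId, toolCallName: a.toolName });
          out.push({ type: "TOOL_CALL_ARGS", toolCallId: a.callId, delta: JSON.stringify(a.input) });
          out.push({ type: "TOOL_CALL_END", toolCallId: a.callId });
        }
        break;
      case "action.result":
        out.push({ type: "TOOL_CALL_RESULT", toolCallId: ev.callId, content: JSON.stringify(ev.output) });
        break;
      case "message.appended":
        out.push({ type: "TEXT_MESSAGE_CONTENT", messageId: ev.messageId, delta: ev.delta });
        break;
      case "turn.completed":
      case "session.waiting":
        if (!finished) {
          finished = true;
          out.push({ type: "RUN_FINISHED", threadId, runId });
        }
        break;
    }
    // gotcha 1: once done is true, the caller must close the response stream.
    return { out, done: finished };
  };
}
```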

Same agent, same tools, same instructions — but now it speaks AG-UI over SSE at POST /agui. You could have the Eve HTTP channel, a Slack channel, and this AG-UI channel all running simultaneously, each talking to the same underlying agent.

The Developer Experience

Run pnpm dev (or npx eve dev) and you get an interactive terminal UI that speaks the eve channel protocol locally:

☰eve  shopping-agent-orchestrator

> show me laptops under $1000

✓ search_products  query="laptop" maxPrice=1000 → 1 result
✓ get_pricing  product="Dell XPS 13" → $899.10 (Summer Sale)
✓ check_stock  product="Dell XPS 13" → 12 units

I found the Dell XPS 13 for $899.10 (10% off with the Summer Sale).
It's in stock with 12 units available. Would you like to place an order?

The TUI shows:

Tool calls as they happen (collapsed to one-line summaries)
Streaming text as the model generates it
Token usage stats
Slash commands (/model to switch models, /new for a fresh session)

File changes trigger a hot rebuild — edit a tool, and it's live on the next message.

The Gotcha: Custom Models Need `modelContextWindowTokens`

If you're using a non-gateway model (custom baseURL, direct provider), Eve's build will fail with a cryptic error about compaction metadata. You need to explicitly tell it the context window size:

// agent/agent.ts
import { createOpenAI } from "@ai-sdk/openai";
import { defineAgent } from "eve";

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  baseURL: process.env.OPENAI_BASE_URL, // custom endpoint
});

export default defineAgent({
  model: openai("gpt-4o"),
  modelContextWindowTokens: 128_000, // required for direct providers
});

What is modelContextWindowTokens? Eve has a built-in "compaction" system that prevents long conversations from overflowing the model's context window. When the conversation reaches ~90% of the window, Eve automatically summarizes older turns into a compressed form so the session can keep going indefinitely. To do this, Eve needs to know how big the window is. Gateway models (like "openai/gpt-4o") have this metadata in Vercel's catalog. But when you bring your own provider via createOpenAI(), Eve has no way to look it up — so you tell it explicitly.

Without this field, the build fails at compile time rather than silently skipping compaction at runtime. The error message ("does not have known AI Gateway context window metadata") doesn't make the fix obvious.
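To see why the window size matters, here's a stripped-down version of the threshold check. This is not Eve's compaction code: the 4-characters-per-token estimate, the keep-the-last-N-messages policy, and the `maybeCompact` name are assumptions, and `summarize` stands in for the extra model call that writes the summary. Only the ~90% threshold comes from the description above.

```typescript
interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough heuristic (~4 characters per token), an assumption; a real system
// would use the model's tokenizer or the provider's reported usage.
function estimateTokens(messages: Msg[]): number {
  return messages.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);
}

// Without modelContextWindowTokens there is nothing to compare against.
function maybeCompact(
  messages: Msg[],
  modelContextWindowTokens: number,
  summarize: (older: Msg[]) => string,
  keepRecent = 4,
  threshold = 0.9,
): Msg[] {
  if (estimateTokens(messages) < modelContextWindowTokens * threshold) return messages;
  if (messages.length <= keepRecent) return messages;
  const cut = messages.length - keepRecent;
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);
  // Older turns collapse into one summary message; recent turns stay verbatim.
  return [{ role: "system", content: `Summary of earlier turns: ${summarize(older)}` }, ...recent];
}
```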

Key Takeaways

Zero orchestration code. I didn't write a single line of routing, tool dispatch, or streaming logic. The model handles multi-step reasoning natively — search → check stock → order — guided by the markdown instructions.
The filesystem convention removes boilerplate. Adding a new capability is "create a file." No registration, no imports to wire up.
Durable sessions out of the box. Multi-turn conversations, reconnection, human-in-the-loop approval — it's all built in.
The dev TUI is genuinely useful. Seeing tool calls execute in real-time while developing makes the feedback loop fast.

Complete sample code

Please feel free to reach out on twitter @roamingcode

Beyond the Agentic Loop, in TypeScript: building a shopping agent with the Orchestrator pattern

Johnny Z — Wed, 17 Jun 2026 00:26:20 +0000

This post is a TypeScript implementation of the pattern described in "Beyond the Agentic Loop: The Orchestrator Pattern for Multi-Agent Systems" by Amogh Ubale (Stackademic). The original is Python with generic agents; here we keep the idea intact and re-theme it as a shopping assistant so the three execution modes have something concrete to chew on. All the design credit goes to that article — go read it first.

The cast: a handful of shopping agents

Before the pattern, the scene. The demo is a small storefront assistant backed by a
few single-purpose agents:

Catalog — list the categories on offer, or search products by keyword and price.
Inventory — check stock and availability for a product.
Pricing — look up the current price and any active promotions.
Reviews — fetch a product's rating and review highlights.
Order — place an order for a product.

A customer request might need just one of these, several of them at once, or a few in a
strict order — and deciding which of those shapes a request calls for is exactly what
the orchestrator is for.

The problem: the LLM as a `while` loop

The default way to build a multi-agent system is the agentic loop: you hand the
model a bag of tools and let it drive.

think → call a tool → observe the result → think again → call another tool → …

The LLM is both the brain and the control flow. That's wonderfully flexible, and
it's the right tool when the task is open-ended and you genuinely don't know the
steps in advance. But in production it has three nasty properties:

Unpredictable shape. Every "think" step is another LLM round-trip, so a three-agent task might take 3 calls or 9 — you don't know until it runs, and latency swings with it. (The article clocks a representative three-agent query at ~7 calls through the loop; the wall-clock and spend follow, but the unpredictability is the part that actually bites.)
Non-determinism. The same question can take a different path each time, which makes behavior hard to reason about and hard to trust with side effects — like placing an order.
Poor observability. "Why did it do that?" means replaying a transcript of intermingled reasoning and tool calls. There's no single place where the plan lives.
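For contrast with what follows, here's the loop reduced to its skeleton, with a scripted stand-in for the model so it runs offline. The `Model` and `Tools` types and `agenticLoop` are illustrative and not from the repo. Note that `llmCalls` is whatever the model makes it: one call per tool step plus one to answer, capped only by `maxTurns`.

```typescript
// A "model" picks the next action from the transcript so far.
type Action = { tool: string; args: Record<string, unknown> } | { answer: string };
type Model = (transcript: string[]) => Action;
type Tools = Record<string, (args: Record<string, unknown>) => string>;

// The agentic loop: the model is both the brain and the control flow.
// Every iteration is one LLM round-trip; the model decides when to stop.
function agenticLoop(query: string, model: Model, tools: Tools, maxTurns = 10) {
  const transcript = [`user: ${query}`];
  let llmCalls = 0;
  while (llmCalls < maxTurns) {
    const action = model(transcript); // think
    llmCalls++;
    if ("answer" in action) return { answer: action.answer, llmCalls };
    const tool = tools[action.tool];
    const observation = tool ? tool(action.args) : `unknown tool ${action.tool}`; // act
    transcript.push(`tool ${action.tool}: ${observation}`); // observe
  }
  return { answer: "gave up", llmCalls };
}
```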

If you already know which agents exist and what they do, an open-ended reasoning loop
on every request is more freedom than the job needs.

The pattern: decide once, execute deterministically

The orchestrator's move is to separate the decision from the execution. Instead
of letting the model loop, you make exactly two LLM calls with plain,
deterministic code in between:

query ──▶ [ROUTE: LLM #1] ──▶ [EXECUTE: agents, no LLM] ──▶ [SYNTHESIZE: LLM #2] ──▶ answer

Route — one LLM call whose only job is to pick which agent(s) to run.
Execute — ordinary application code runs those agents. No LLM here.
Synthesize — one LLM call turns the structured results into prose.

Two calls, every time, no matter how many agents run. That fixed shape is the whole
point: a plan you can inspect before anything happens, latency that doesn't depend on
the model's mood, and independent work you can fan out. (It's cheaper too — the article
puts the same query at ~2 calls instead of ~7 — but the cost isn't the headline; the
outcomes are.)

1. The registry: agents are just functions

An agent is a name, a description (for the router), a JSON-Schema for its arguments,
and an execute function. Nothing more.

// src/server/orchestrator/types.ts
export type ExecuteFn = (args: AgentArgs, context: AgentContext) => Promise<AgentResult>;

export interface AgentDefinition {
  agent: string;        // human name, e.g. "Catalog Agent"
  description: string;  // shown to the router LLM so it can choose this tool
  parameters: Record<string, unknown>; // JSON Schema for the args
  execute: ExecuteFn;
}

The "registry" is a plain in-process object — agents are registered by hand.
There's deliberately no Redis, no database, no HTTP self-registration. That keeps the
whole thing runnable and testable with zero infrastructure.

// src/server/orchestrator/registry.ts
export const REGISTRY: Record<string, AgentDefinition> = {
  catalog_agent__list_categories: catalogCategoriesAgent,
  catalog_agent__search_products: catalogAgent,
  inventory_agent__check_stock: inventoryAgent,
  pricing_agent__get_deals: pricingAgent,
  reviews_agent__get_reviews: reviewsAgent,
  order_agent__place_order: orderAgent,
};

toolDefinitions() projects that map into the OpenAI tool format the router sees —
each agent becomes one function tool, plus one meta-tool we'll meet shortly.
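The post doesn't show `toolDefinitions()`, so here's one plausible shape, assuming the standard OpenAI Chat Completions function-tool format (`{ type: "function", function: { name, description, parameters } }`). The registry is passed in as an argument to keep the sketch self-contained. The `plan_execution` schema follows the `{ reason, steps: [{ tool, args, reason }] }` outline from section 4; its description string and the `enum` constraint are my own additions.

```typescript
type AgentArgs = Record<string, unknown>;

interface AgentDefinition {
  agent: string;
  description: string;
  parameters: Record<string, unknown>;
  execute: (args: AgentArgs, context: Record<string, unknown>) => Promise<unknown>;
}

interface FunctionTool {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

const PLAN_EXECUTION_TOOL = "plan_execution";

function toolDefinitions(registry: Record<string, AgentDefinition>): FunctionTool[] {
  // One function tool per registered agent; the registry key is the tool name.
  const agentTools = Object.entries(registry).map(([name, def]): FunctionTool => ({
    type: "function",
    function: { name, description: def.description, parameters: def.parameters },
  }));
  // The meta-tool runs no code: its arguments carry the ordered plan.
  const planTool: FunctionTool = {
    type: "function",
    function: {
      name: PLAN_EXECUTION_TOOL,
      description: "Call when steps must run IN ORDER because a later step depends on an earlier one.",
      parameters: {
        type: "object",
        properties: {
          reason: { type: "string" },
          steps: {
            type: "array",
            items: {
              type: "object",
              properties: {
                tool: { type: "string", enum: Object.keys(registry) },
                args: { type: "object" },
                reason: { type: "string" },
              },
              required: ["tool"],
            },
          },
        },
        required: ["steps"],
      },
    },
  };
  return [...agentTools, planTool];
}
```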

2. Route: the one decision-making LLM call

The router is given a blunt system prompt: pick tools, do not answer.

// src/server/orchestrator/router.ts
const SYSTEM_PROMPT = `You are a query router. Your ONLY job is to decide which tool(s) to call.
Rules:
- If the query needs ONE agent, call that one tool.
- If the query needs MULTIPLE INDEPENDENT agents, call all of them.
- If the query needs steps IN ORDER (a later step depends on an earlier one), call plan_execution and provide the ordered steps.
Do NOT answer the user's question — just pick tools.`;

We call the model at temperature: 0 with tool_choice: "auto", then read its tool
calls back out. The shape of that tool-call list is the execution plan — we never
ask the model to "answer," only to choose:

// src/server/orchestrator/router.ts
export async function route(query: string): Promise<RouteDecision> {
  const response = await getOpenAIClient().chat.completions.create({
    model: getConfig().ROUTER_MODEL,
    temperature: 0,
    tools: toolDefinitions(),
    tool_choice: "auto",
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: query },
    ],
  });

  const toolCalls = response.choices[0]?.message.tool_calls ?? [];

  // plan_execution present -> sequential. Take its ordered steps.
  const planCall = toolCalls.find((c) => c.function.name === PLAN_EXECUTION_TOOL);
  if (planCall) {
    const parsed = safeParseArgs(planCall.function.arguments) as {
      steps?: Array<{ tool: string; args?: AgentArgs; reason?: string }>;
    };
    const steps = (parsed.steps ?? []).map((s) => ({ tool: s.tool, args: s.args ?? {}, reason: s.reason }));
    return { mode: "sequential", steps };
  }

  const steps = toolCalls.map((c) => ({ tool: c.function.name, args: safeParseArgs(c.function.arguments) }));
  return { mode: steps.length > 1 ? "parallel" : "single", steps };
}

So the router collapses to three outcomes:

one tool → single
several tools → parallel
the plan_execution meta-tool → sequential
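Since the branching at the end of `route()` doesn't involve the model, it can be pulled out into a pure function and tested on canned tool-call lists. `decide` and this `safeParseArgs` are hypothetical (the post doesn't show the real `safeParseArgs`); here it is assumed to fall back to `{}` when the model emits malformed arguments.

```typescript
interface ToolCall { function: { name: string; arguments: string } }
type AgentArgs = Record<string, unknown>;
type Mode = "single" | "parallel" | "sequential";
interface PlanStep { tool: string; args: AgentArgs; reason?: string }

const PLAN_EXECUTION_TOOL = "plan_execution";

// Assumed behavior: never throw on bad JSON from the model, return {} instead.
function safeParseArgs(raw: string): Record<string, unknown> {
  try {
    const v: unknown = JSON.parse(raw);
    return v && typeof v === "object" && !Array.isArray(v) ? (v as Record<string, unknown>) : {};
  } catch {
    return {};
  }
}

// Same three outcomes as route(): plan_execution -> sequential,
// several tools -> parallel, otherwise single.
function decide(toolCalls: ToolCall[]): { mode: Mode; steps: PlanStep[] } {
  const plan = toolCalls.find((c) => c.function.name === PLAN_EXECUTION_TOOL);
  if (plan) {
    const parsed = safeParseArgs(plan.function.arguments) as {
      steps?: Array<{ tool: string; args?: AgentArgs; reason?: string }>;
    };
    const steps = (parsed.steps ?? []).map((s) => ({ tool: s.tool, args: s.args ?? {}, reason: s.reason }));
    return { mode: "sequential", steps };
  }
  const steps = toolCalls.map((c) => ({ tool: c.function.name, args: safeParseArgs(c.function.arguments) }));
  return { mode: steps.length > 1 ? "parallel" : "single", steps };
}
```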

3. Execute: the heart of the pattern (no LLM)

This is where parallel and sequential actually diverge — and it's pure TypeScript,
no model involved.

// src/server/orchestrator/executor.ts
export async function* executeStream(mode: Mode, steps: PlanStep[]): AsyncGenerator<ExecEvent, AgentContext> {
  const results: AgentContext = {};

  if (mode === "parallel") {
    for (const step of steps) yield { kind: "agent_start", tool: step.tool, args: step.args };
    const settled = await Promise.all(
      steps.map(async (step) => [step.tool, await runAgent(step, {})] as const),
    );
    for (const [tool, result] of settled) {
      results[tool] = result;
      yield { kind: "agent_result", tool, result };
    }
    return results;
  }

  // single + sequential: ordered; each step sees prior results as context.
  for (const step of steps) {
    yield { kind: "agent_start", tool: step.tool, args: step.args };
    const result = await runAgent(step, results);
    results[step.tool] = result;
    yield { kind: "agent_result", tool: step.tool, result };
  }
  return results;
}

Read the two branches side by side:

Parallel is Promise.all. The agents are independent, so they all fire at once and you pay for the slowest one, not the sum. "What's the price, rating, and stock of the iPhone 15?" becomes three lookups that have nothing to say to each other — run them together.
Sequential is an ordered for loop where each step receives the accumulated results as its context. That's how a later agent consumes an earlier one's output. "Find a laptop under $1000, check it's in stock, then order it" can't be parallel — the order step needs the product the search produced.

(The generator yields a small event before and after each agent. That's only so a
transport can show progress; it doesn't change the logic.)

4. `plan_execution`: a signal, not an agent

How does the router say "do these in order"? With a meta-tool that runs no code:

// src/server/orchestrator/registry.ts
export const PLAN_EXECUTION_TOOL = "plan_execution";
// ...its tool schema asks for { reason, steps: [{ tool, args, reason }] }

When the router selects plan_execution, the orchestrator switches to sequential
mode. The original article treats it purely as a signal and leaves the ordering and
data-passing unspecified. This repo makes one deliberate addition so the demo
actually works end-to-end: plan_execution returns the ordered steps, and the
executor threads results forward as context. The order agent then resolves the
product the search found (see resolveTargetProduct in
src/server/lib/resolve-product.ts). That's the difference between a pattern diagram
and a thing you can run.
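The repo's `resolveTargetProduct` isn't shown, so the following is only a guess at the idea: an explicit product in the step's args wins, and otherwise the order step falls back to the first product an earlier search step left in the accumulated context. The `{ products: [...] }` result shape and the `catalog_agent__search_products` context key are assumptions.

```typescript
interface Product { name: string; price: number }
type AgentContext = Record<string, unknown>;

// Hypothetical stand-in for resolveTargetProduct (not the repo's code).
function resolveTargetProduct(args: { product?: string }, context: AgentContext): string | undefined {
  if (args.product) return args.product; // the router named a product explicitly
  // Otherwise consume the output of an earlier sequential step.
  const search = context["catalog_agent__search_products"] as { products?: Product[] } | undefined;
  return search?.products?.[0]?.name;
}
```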

5. Synthesize: the only creative call

Once the agents have produced structured data, a second LLM call turns it into an
answer. This is the only step with any "writing" to do, so it runs warmer and streams
its tokens out.

// src/server/orchestrator/synthesizer.ts
export async function* synthesizeStream(query: string, results: AgentContext): AsyncGenerator<string> {
  const stream = await getOpenAIClient().chat.completions.create({
    model: getConfig().SYNTH_MODEL,
    temperature: 0.7,
    stream: true,
    messages: [
      { role: "system", content: "Summarize the agent results into a clear, helpful answer." },
      { role: "user", content: `User asked: ${query}\nResults: ${JSON.stringify(results)}` },
    ],
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) yield delta;
  }
}

What this buys you

Putting the three phases together, the payoff is exactly the inverse of the loop's
pain points — and these enablements, not the price tag, are the real reason to reach
for it:

A plan you can trust. The decision is a single inspectable object — the RouteDecision — produced before any agent runs. You can log it, assert on it, gate it, replay it. That's what makes it safe to let an agent actually place an order.
Debuggability. The execute phase is deterministic, so a bug there reproduces every time instead of hiding in a different transcript on each run.
Parallelism for free. Independent work is a Promise.all; you didn't have to teach the model to be concurrent.
A testable core. Because the middle phase has no LLM in it, executeStream is an ordinary async function you can unit-test with a stub registry — no API key, no flakiness.
Predictable runs (the boring-but-nice one). Always two LLM calls, whether the request touches one agent or five — so latency is something you can put a number on, and the bill is lower as a side effect.
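To illustrate the "testable core" point, here's a simplified, non-streaming stand-in for `executeStream` (not the repo's code) driven by a stub registry of fake async agents. The parallel/sequential difference can then be checked with no API key: parallel agents receive an empty context, sequential ones see earlier results.

```typescript
type AgentArgs = Record<string, unknown>;
type AgentContext = Record<string, unknown>;
type Agent = (args: AgentArgs, context: AgentContext) => Promise<unknown>;
interface PlanStep { tool: string; args: AgentArgs }

async function execute(
  mode: "single" | "parallel" | "sequential",
  steps: PlanStep[],
  registry: Record<string, Agent>,
): Promise<AgentContext> {
  const results: AgentContext = {};
  const run = (s: PlanStep, ctx: AgentContext) => {
    const agent = registry[s.tool];
    if (!agent) throw new Error(`Unknown tool: ${s.tool}`);
    return agent(s.args, ctx);
  };
  if (mode === "parallel") {
    // Independent agents: fire together, each with an empty context.
    const settled = await Promise.all(steps.map(async (s) => [s.tool, await run(s, {})] as const));
    for (const [tool, result] of settled) results[tool] = result;
    return results;
  }
  // single + sequential: each step sees everything produced before it.
  for (const s of steps) results[s.tool] = await run(s, results);
  return results;
}
```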

Sample queries → how they route

Query	Mode	Agents
`what do you have?`	single	`catalog_agent__list_categories`
`what's the price, rating and availability of the iPhone 15?`	parallel	`pricing` + `reviews` + `inventory` (at once)
`find a laptop under $1000, make sure it's in stock, then order it`	sequential	`search` → `check stock` → `order`

Same agents, same data — the router decides the shape of the run.

When the loop still wins

This isn't "orchestrator good, loop bad." The agentic loop is the right tool when the
task is genuinely exploratory: you don't know the steps ahead of time, the toolset is
open-ended, or the agent needs to re-plan mid-flight based on what it discovers. The
orchestrator trades that adaptability for predictability — and it assumes you can
enumerate your agents up front. Note too that the router here is itself a single LLM
call, so a truly novel multi-hop plan it has never seen is out of scope by design.

The article's framing is the one to keep: loop for exploration, orchestrator for
production. If you already know your agents and you need bounded latency, parallel
execution, and debuggable runs — ask the model once, execute, synthesize. Two calls,
done.

Complete sample code

Please feel free to reach out on twitter @roamingcode

Agent Skills in Microsoft Agent Framework

Johnny Z — Thu, 04 Jun 2026 05:52:25 +0000

The Microsoft Agent Framework recently added skills support, built around progressive disclosure (still in beta). The "Give Your Agents Domain Expertise with Agent Skills" devblog post is an excellent introduction, so I won't re-tread the basics here.

If you've used skills in a coding agent, the idea is familiar: a skill is just a folder — a SKILL.md manifest plus reference documents and scripts — that the agent discovers and pulls in only when it needs to. Instead of stuffing every capability into the system prompt, the agent sees a lightweight catalog of skill names and descriptions, and loads the full content on demand. That's the whole point of progressive disclosure: an agent's context is a budget, and skills are a way to spend it lazily.

In practice that part just works: when a request matches a skill, the model is nudged to call the built-in load_skill tool, and the framework returns the skill's full content for the model to use. Triggering and loading behave exactly as advertised.

But spending the budget is only half the story. Once a skill's content is loaded, where does it actually live — and is it ever dropped? The docs are silent on this, and it's the question the rest of this post digs into.

The short answer: within a session, it isn't dropped at all. Starting a new session drops everything, of course — that much is obvious. The part worth knowing is what happens inside a single session: once loaded, a skill's full content stays in the conversation for the entire life of that session. There's no budget, no sliding window, no eviction. The rest of the post shows how I confirmed this, and why it matters.

Watching a skill get triggered

The sample is a tiny console app running entirely against a local Ollama model — no cloud keys, and every HTTP call is traced so I can see exactly what goes over the wire (complete sample code). There's a single skill on disk:

skills/unit-converter/
├── SKILL.md                        # name + description + usage steps
└── references/conversion-table.md  # the actual conversion factors

Wiring it into the agent is one line — AgentSkillsProvider is just an AIContextProvider:

var agentOptions = new ChatClientAgentOptions
{
    Name = "UnitConverterAgent",
    ChatOptions = new ChatOptions
    {
        Instructions = "You are a helpful assistant that can convert units. ...",
        Tools = [AIFunctionFactory.Create(Tool.Convert)]
    },
    AIContextProviders = [skillsProvider],   // <-- skills plug in here
};

On every request, that provider does two things. First, it injects a catalog of skills — names and descriptions only — into the system prompt. That's the entire "advertisement" the model sees up front; no factors, no usage steps:

<available_skills>
  <skill>
    <name>unit-converter</name>
    <description>Convert between common units using a multiplication factor.
      Use when asked to convert miles, kilometers, pounds, or kilograms.</description>
  </skill>
</available_skills>

Second, it registers three tools the model can call to pull in more on demand: load_skill, read_skill_resource, and run_skill_script.
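Stripped of the framework, the progressive-disclosure contract is small. The sketch below is not the Agent Framework's implementation (which is C#): `Skill`, `renderCatalog`, and `makeSkillTools` are hypothetical, the `path` argument name is assumed (only `skillName` appears in the middleware below), and `run_skill_script` is left out. It shows the split: only names and descriptions go into the prompt, while bodies and resources come back only as tool results.

```typescript
interface Skill {
  name: string;
  description: string;
  body: string; // the full SKILL.md content
  resources: Record<string, string>; // relative path -> file content
}

function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Up-front advertisement: names and descriptions only, no bodies.
function renderCatalog(skills: Skill[]): string {
  const entries = skills.map(
    (s) =>
      `  <skill>\n    <name>${escapeXml(s.name)}</name>\n` +
      `    <description>${escapeXml(s.description)}</description>\n  </skill>`,
  );
  return ["<available_skills>", ...entries, "</available_skills>"].join("\n");
}

// On-demand tools: the model pulls in more context only when it asks.
function makeSkillTools(skills: Skill[]) {
  const byName = new Map(skills.map((s): [string, Skill] => [s.name, s]));
  return {
    load_skill: ({ skillName }: { skillName: string }) =>
      byName.get(skillName)?.body ?? `Unknown skill: ${skillName}`,
    read_skill_resource: ({ skillName, path }: { skillName: string; path: string }) =>
      byName.get(skillName)?.resources[path] ?? `Unknown resource: ${path}`,
  };
}
```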

Intercepting the tool calls

To watch the triggering happen, I don't need to read the trace — the framework lets you intercept every tool call with function-invocation middleware. AIAgentBuilder.Use(...) wraps the agent and hands you each call before it runs:

var agent = chatClient.AsAIAgent(agentOptions);

return new AIAgentBuilder(agent)
    .Use(async (_, ctx, next, ct) =>
    {
        if (ctx.Function.Name is "load_skill" or "read_skill_resource" or "run_skill_script")
        {
            Console.WriteLine($"Skill triggered: {ctx.Function.Name}({ctx.Arguments.GetValueOrDefault("skillName")})");
        }
        return await next(ctx, ct);
    })
    .Build();

The three skill tools are supplied by the provider, but they flow through the same function-invoking pipeline as my own Convert tool — so this one interceptor sees them all, and I just filter by name.

Now I ask a question that needs the skill:

How many kilometers is a marathon (26.2 miles)? And how many pounds is 75 kilograms?

and the triggering shows up live:

Skill triggered: load_skill(unit-converter)
Skill triggered: read_skill_resource(unit-converter)
Agent: A marathon of 26.2 miles is approximately 42.16 kilometers, and 75 kilograms is approximately 165.35 pounds.

So the disclosure unfolds in stages, exactly as designed:

The model sees only the catalog, decides unit-converter is relevant, and calls load_skill("unit-converter").
The framework returns the full SKILL.md as the tool result. Its usage steps tell the model to consult references/conversion-table.md.
The model calls read_skill_resource to pull that reference, then runs the actual conversion.

Each step pulls in a little more context, only when it is needed. This is progressive disclosure working as promised — the part the docs cover well. The interesting question is what happens to all that loaded content next.

Loaded once, kept for the whole session

So where does that loaded content go? Straight into the session history — and it stays. After the run I read the history back and tagged the skill messages:

===== Session history after run: 8 messages =====
  [ 1] [SKILL] assistant call -> load_skill
  [ 2] [SKILL] tool      tool result          ← the full SKILL.md body
  [ 3] [SKILL] assistant call -> read_skill_resource
  [ 4] [SKILL] tool      tool result          ← the reference content
  ...

The load_skill body and the reference are sitting right there as ordinary tool messages, and nothing removes them. That's the part to take away: within a session, loaded skill content lives forever. It's not the skills provider holding on to it — load_skill just returns a normal tool message, and a tool message is history like any other. So every subsequent turn on that session re-sends the whole thing. No budget, no sliding window, no eviction; the only thing that clears it is starting a new session.

Compact automatically — but only when skills are in play

Skills can be large, so on a long-lived session this adds up fast: you can't keep carrying every loaded skill forward. The fix is compaction, and the framework ships it out of the box. CompactionProvider is just another AIContextProvider you add alongside the skills provider, and SummarizationCompactionStrategy summarizes older history instead of dropping it — and it groups messages so a load_skill call is never split from its result.

I don't want to compact on every turn, though — only when there's actually skill content to reclaim. A CompactionTrigger is just a predicate over the message groups, so I gate it on whether a skill tool was called:

CompactionTrigger skillsTriggered = index =>
    index.Groups.SelectMany(g => g.Messages).Any(History.MentionsSkillTool);

AIContextProviders =
[
    skillsProvider,
    new CompactionProvider(
        new SummarizationCompactionStrategy(chatClient, skillsTriggered, minimumPreservedGroups: 2)),
];

Compaction runs before each turn. On a fresh first turn there's nothing to compact; once a skill has been loaded, the next turn triggers a one-off summarization call and the bulky SKILL.md body drops out of what's sent to the model — replaced by a short summary, while the conversation keeps going. Spend the budget lazily on the way in, reclaim it automatically on the way out.
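The same gating idea, sketched in TypeScript rather than with the framework's `CompactionProvider`: group each tool call with its result so they are never split, do nothing unless a skill tool appears, and otherwise summarize everything except the last `minimumPreservedGroups` groups. The message shape, the grouping rule, and `compactIfSkillsUsed` are assumptions, and `summarize` stands in for the one-off summarization call.

```typescript
interface ChatMsg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  toolName?: string; // set on tool calls and tool results
}

const SKILL_TOOLS = new Set(["load_skill", "read_skill_resource", "run_skill_script"]);

// A tool result joins the group of the message before it (its call), so a
// call and its result always move together.
function groupMessages(messages: ChatMsg[]): ChatMsg[][] {
  const groups: ChatMsg[][] = [];
  for (const m of messages) {
    const last = groups[groups.length - 1];
    if (m.role === "tool" && last) last.push(m);
    else groups.push([m]);
  }
  return groups;
}

const mentionsSkillTool = (m: ChatMsg) => m.toolName !== undefined && SKILL_TOOLS.has(m.toolName);

function compactIfSkillsUsed(
  messages: ChatMsg[],
  summarize: (older: ChatMsg[]) => string,
  minimumPreservedGroups = 2,
): ChatMsg[] {
  const groups = groupMessages(messages);
  const triggered = groups.some((g) => g.some(mentionsSkillTool)); // the trigger predicate
  if (!triggered || groups.length <= minimumPreservedGroups) return messages;
  const cut = groups.length - minimumPreservedGroups;
  const older = ([] as ChatMsg[]).concat(...groups.slice(0, cut));
  const kept = ([] as ChatMsg[]).concat(...groups.slice(cut));
  return [{ role: "system", content: `Summary: ${summarize(older)}` }, ...kept];
}
```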

Complete sample code

Please feel free to reach out on twitter @roamingcode

Building an Autonomous Agent Coding Harness

Johnny Z — Thu, 16 Apr 2026 05:32:27 +0000

This is a personal experiment in autonomous coding, built with the Claude Agent SDK. It takes a spec (markdown or text) and builds a full-stack application using three specialized agents, as described in this Anthropic post.

Requirement

Build a full-stack weather chat application (Next.js + .NET).
I manually created an "ideal target solution" reference implementation.

Why This Project Is Hard

While building a weather chat app sounds straightforward, this implementation intentionally introduces architectural challenges that test whether coding agents can work with unfamiliar, cutting-edge libraries — or whether they fall back to well-known patterns:

Backend (Agent Construction & Local LLM Integration): The .NET API utilizes the Microsoft Agent Framework and exposes the agent via the relatively new AG-UI protocol. A key challenge lies in the underlying Microsoft.Extensions.AI pipeline: coding agents must understand how to connect a local Ollama server, correctly register it as an IChatClient, configure the agent with tools, and seamlessly wire everything into the .NET dependency injection container.
Schema-Driven UI Rendering (The Catalyst): To achieve the visual "Generative UI" component, the application uses @vercel-labs/json-render. This adds a deep layer of abstraction. Rather than passing generic data to props, coding agents must grasp an indirect, specification-based rendering model. The frontend strictly expects tool outputs to be converted into a structured UI spec tree (e.g., Container -> WeatherCard -> ForecastGrid), which a component catalog maps dynamically to concrete React components.
Full-Stack Tool Coupling & Protocol Bridging: Driven by the strict schema requirements of the UI, tool execution becomes a highly coupled, full-stack concern. The backend emits raw AG-UI Server-Sent Events (SSE), which the Next.js server must manually parse and map to the Vercel AI SDK 'UIMessage' types. Crucially, because the AG-UI protocol exposes tool execution results directly to the client stream as JSON payloads, coding agents must explicitly co-design the C# backend tool call result types to satisfy the frontend's schema-driven expectations.
Custom Generative UI Transport & State: Because these tightly coupled tool outputs stream directly to the client, the standard AI SDK hooks aren't enough out of the box. The frontend requires configuring useChat with a custom DefaultChatTransport. Agents must design the UI so that incoming JSON payloads inject complex parts into the ChatMessage state. They must understand multi-part message trees, inspecting part.type and part.state === "output-available" to interrupt normal text rendering and conditionally mount the generated JSON UI spec.
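The protocol bridging and part gating described above can be sketched in plain TypeScript, without the real SDKs. Several things are assumptions. Only a subset of AG-UI event types is handled: TEXT_MESSAGE_CONTENT, TOOL_CALL_START and TOOL_CALL_RESULT, with field names taken from the AG-UI event model. Each SSE event is assumed to carry a single `data:` line. The `Part` shapes are simplified stand-ins for the AI SDK's UIMessage parts, not its actual types.

```typescript
// Simplified AG-UI event (assumed subset of fields).
interface AgUiEvent {
  type: string;
  messageId?: string;
  delta?: string;
  toolCallId?: string;
  toolCallName?: string;
  content?: string;
}

// Simplified stand-ins for the AI SDK's UIMessage parts.
interface TextPart {
  type: "text";
  text: string;
}
interface ToolPart {
  type: string; // e.g. "tool-getWeather"
  toolCallId: string;
  state: "input-available" | "output-available";
  output?: unknown;
}
type Part = TextPart | ToolPart;

// Parse an SSE payload where each event is a single `data: {...}` line.
function parseSse(raw: string): AgUiEvent[] {
  return raw
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => JSON.parse(line.slice("data:".length).trim()) as AgUiEvent);
}

// Fold AG-UI events into message parts: text deltas accumulate, tool results complete tool parts.
function toParts(events: AgUiEvent[]): Part[] {
  const parts: Part[] = [];
  for (const e of events) {
    if (e.type === "TEXT_MESSAGE_CONTENT" && e.delta !== undefined) {
      const last = parts[parts.length - 1];
      if (last !== undefined && "text" in last) last.text += e.delta;
      else parts.push({ type: "text", text: e.delta });
    } else if (e.type === "TOOL_CALL_START" && e.toolCallId && e.toolCallName) {
      parts.push({ type: `tool-${e.toolCallName}`, toolCallId: e.toolCallId, state: "input-available" });
    } else if (e.type === "TOOL_CALL_RESULT" && e.toolCallId && e.content !== undefined) {
      const id = e.toolCallId;
      const part = parts.find((p): p is ToolPart => "toolCallId" in p && p.toolCallId === id);
      if (part) {
        part.state = "output-available";
        part.output = JSON.parse(e.content); // the backend's typed tool result
      }
    }
  }
  return parts;
}

// Decide what each part mounts: streamed text, a placeholder, or the generated UI spec.
function renderPlan(parts: Part[]): string[] {
  return parts.map((p) => {
    if ("text" in p) return `text:${p.text}`;
    return p.state === "output-available" ? `spec:${p.type}` : `pending:${p.type}`;
  });
}
```

The shape of the tool result (`content`) is exactly what the C# backend has to co-design. Whatever JSON it emits is what `output` holds when the UI spec mounts.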

First Round Result

Feature requirement file only — intentionally instructed to use simulated/mock weather data to reduce complexity
Result output

Gap Analysis

Dimension	Reference (Target)	Generated (Round 1)
.NET version	.NET 10	.NET 8
Backend framework	Microsoft Agent Framework (`Microsoft.Agents.AI`)	Plain ASP.NET Core MVC
Streaming protocol	AG-UI via SSE	Standard JSON REST
LLM integration	Ollama via `OllamaSharp` + `IChatClient` DI	None — rule-based string matching
Frontend AI SDK	`@ai-sdk/react` `useChat` + `DefaultChatTransport`	Raw `fetch()` + `useState`
UI rendering	`@json-render` (schema-driven spec tree)	Direct hardcoded React components

Every architectural constraint specified in the feature requirements — AG-UI, Microsoft Agent Framework, Ollama, json-render — was ignored. The agents built a conventional CRUD-style app instead.

What it got right: The app is functional end-to-end with good visual design (glassmorphic cards, dynamic backgrounds, custom SVG icons), responsive layout, and clean code structure. About 7 of 16 features work partially or fully.

What it missed: No SSE streaming, no LLM tool calling (just regex location extraction), no schema-driven UI rendering, no AI SDK hooks. The ai npm package was even installed but never imported.

Takeaway: Given only a feature spec, coding agents gravitate toward familiar patterns from training data. The novel integration requirements (AG-UI, json-render, Agent Framework) — which are the architecturally interesting parts — were completely bypassed in favor of well-known alternatives.

Second Round Result

Enhanced feature requirements with explicit architectural instructions, specifying MapAGUI, ChatClientAgent, defineCatalog/defineRegistry, useChat with transport, etc.
After round 1's results, I created custom skills for json-render and Microsoft Agent Framework, and installed the official Vercel Next.js and AI SDK skills to give the agents better guidance
Result output

Gap Analysis

Dimension	Reference (Target)	Generated (Round 2)
.NET version	.NET 10	.NET 10
Backend framework	Microsoft Agent Framework (`MapAGUI`)	Packages installed but not used — plain REST API
Streaming protocol	AG-UI via SSE	Standard JSON REST
LLM integration	Ollama via `OllamaSharp` + `IChatClient`	Package installed, only checks if Ollama is running — never calls it
Frontend AI SDK	`@ai-sdk/react` `useChat` + `DefaultChatTransport`	Package installed but uses raw `fetch()`
UI rendering	`@json-render/react` (real package)	Fake shim — hand-written `json-render-compat.ts` reimplements `defineCatalog`/`defineRegistry` as simple wrappers

Progress from round 1: The agents now acknowledge the required technologies — correct .NET version, right NuGet packages installed, catalog/registry file structure present. The feature requirements with explicit API names clearly helped.

What's still wrong: The acknowledgment is superficial. The agents installed Microsoft.Agents.AI and OllamaSharp but never called MapAGUI() or created a ChatClientAgent. Instead of installing @json-render/react, they wrote a 40-line compatibility shim that mimics the API surface but does nothing — the <Renderer> component from json-render is never used. The backend is still hardcoded pattern matching over 6 cities with no LLM.

Takeaway: Adding skills and explicit architectural instructions moved agents from "completely ignore" to "install the packages and create the right file names." But the actual wiring — the hard part — was still substituted with familiar patterns. The agents created a cargo cult of the architecture: the right shape, with none of the substance.

Conclusion

The progression across rounds tells a clear story. Round 1 completely ignored the architectural requirements. Round 2 acknowledged them superficially — installing the right packages, creating files with the right names — but never actually wired anything up. The hand-written json-render shim and the unused NuGet packages are the most telling evidence.

None of this is entirely surprising. These are integration challenges that even experienced engineers would need to research and iterate on — connecting unfamiliar frameworks across a full-stack boundary is genuinely hard. The deeper issue is that even with upfront planning enforced (preventing agents from "one-shotting" the app), intrinsic technical challenges in the implementation details cause coding agents to silently fall back to what they know.

What these experiments suggest is that producing quality implementations with coding agents requires highly detailed, step-by-step plans — not just feature specs or architectural diagrams, but concrete wiring instructions that leave little room for substitution. Simply adding skills as supplementary context does not bridge the gap when the core integration patterns are unfamiliar to the model.

Next Steps

The experiments above point to a clear gap: the planning agent produces plans that are too high-level for the coding agent to follow faithfully when unfamiliar technologies are involved. The next iteration of the harness will focus on two changes:

Interactive upfront planning: Rather than generating a plan in one shot and handing it off, the planning agent will produce a detailed, step-by-step implementation plan that can be reviewed and refined before any code is written. Each step should be concrete enough that the coding agent knows exactly which API to call, which package to import, and how to wire it — leaving no room for silent substitution.
Step-by-step execution with verification: Instead of letting the coding agent execute the entire plan autonomously, the harness will execute one step at a time, verifying the output of each step (builds, tests, correct imports) before proceeding to the next. This catches drift early — if the agent installs a package but doesn't use it, or writes a shim instead of using the real library, the verification step surfaces the problem immediately rather than letting it compound.

This follows the approach outlined in the autonomous coding quickstart, adapted to the multi-agent harness architecture described in this project.
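A verification gate like the one planned above could be as simple as a list of checks run after each step. The sketch below is hypothetical: the `StepResult` shape, the regex-based import detection and the check wiring are assumptions, not the harness's code. It targets the Round 2 failure mode directly: a package that was installed but never imported.

```typescript
// Hypothetical snapshot of the workspace after one harness step.
interface StepResult {
  dependencies: string[]; // declared packages, e.g. from package.json
  sourceFiles: Record<string, string>; // path -> file contents
}

// A check returns human-readable problems; an empty array means the step passes.
type Check = (result: StepResult) => string[];

// Crude import detection: `from "pkg"`, `import "pkg"`, or `require("pkg")`, including subpaths.
function unusedPackages(result: StepResult): string[] {
  const sources = Object.keys(result.sourceFiles)
    .map((path) => result.sourceFiles[path])
    .join("\n");
  return result.dependencies.filter((pkg) => {
    const escaped = pkg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const imported = new RegExp(`(from\\s+|import\\s+|require\\()["']${escaped}(/[^"']*)?["']`);
    return !imported.test(sources);
  });
}

const defaultChecks: Check[] = [
  (r) => unusedPackages(r).map((pkg) => `installed but never imported: ${pkg}`),
];

// Execute steps one at a time and stop at the first step that fails verification.
function runPlan(steps: Array<() => StepResult>, checks: Check[] = defaultChecks) {
  for (let i = 0; i < steps.length; i++) {
    const result = steps[i]();
    const problems = checks.reduce<string[]>((acc, check) => acc.concat(check(result)), []);
    if (problems.length > 0) {
      return { completed: i, problems }; // surface drift before it compounds
    }
  }
  return { completed: steps.length, problems: [] as string[] };
}
```

A hand-written `json-render-compat.ts` shim would trip this check as soon as `@json-render/react` is declared but never imported. Further checks (build, tests, presence of a `MapAGUI(` call) would slot into the same `Check[]` list.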


Building End-to-End Local AI Agents with Microsoft Agent Framework and AG-UI

Johnny Z — Sun, 23 Nov 2025 06:06:37 +0000

The Microsoft Agent Framework significantly elevates AI agent orchestration. A standout feature is its implementation of the Agent–User Interaction (AG-UI) Protocol, which standardizes how AI agents connect to user-facing applications.

Below is a quick-start guide to connecting these components into a fully end-to-end solution using local Ollama models.

1. Service Configuration

First, configure the dependency injection container. The ChatClientAgent is based on the IChatClient abstraction from Microsoft.Extensions.AI.

Note: We register the agent as a Keyed Service to allow for multiple distinct agents within the same host.

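var builder = WebApplication.CreateBuilder(args);

// Named HttpClient used by the Ollama client below (default local Ollama endpoint)
builder.Services.AddHttpClient("OllamaClient", client =>
    client.BaseAddress = new Uri("http://localhost:11434"));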

// 1. Register the Ollama Client
builder.Services.AddTransient<IChatClient>(provider =>
{
    var factory = provider.GetRequiredService<IHttpClientFactory>();
    // Ensure you use a wrapper that handles standard formatting 
    // (see Implementation Note below)
    return new OllamaApiClient(factory.CreateClient("OllamaClient"), "phi4");
});

// 2. Register the AI Agent
builder.Services.AddKeyedTransient<ChatClientAgent>(
    "local-ollama-agent",
    (provider, key) =>
    {
        var options = new ChatClientAgentOptions
        {
            Id = key.ToString(),
            Name = "Local Assistant",
            Description = "An AI agent running on local Ollama.",
            ChatOptions = new ChatOptions { Temperature = 0 }
        };

        return provider.GetRequiredService<IChatClient>()
            .CreateAIAgent(options, provider.GetRequiredService<ILoggerFactory>());
    });

2. Expose the AG-UI Endpoint

Once configured, map the agent instance directly to an HTTP route. This exposes the agent via the standard AG-UI protocol.

var app = builder.Build();

var agent = app.Services.GetRequiredKeyedService<ChatClientAgent>("local-ollama-agent");

// Expose the agent on the root path
app.MapAGUI("/", agent);

app.Run();

3. Connect a Client

To consume the agent programmatically, the framework provides the AGUIChatClient. This allows .NET applications to communicate with your agent over HTTP seamlessly.

var chatClient = new AGUIChatClient(
    httpClient,
    "http://localhost:5000",
    provider.GetRequiredService<ILoggerFactory>());

var clientAgent = chatClient.CreateAIAgent(
    name: "local-client",
    description: "AG-UI Client Agent");

Frontend Integration: The AG-UI Protocol also offers ready-made libraries for TypeScript and Python, allowing you to spin up frontend interfaces in minutes.

Implementation Note: Protocol Compliance

The AG-UI protocol mandates that all messages contain a messageId property. Native Ollama responses do not currently provide this. To ensure compatibility, I created a simple wrapper class to inject the required IDs into the Ollama response stream.
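The sample's wrapper is C#, but the idea is small enough to sketch in TypeScript. Each streamed update passes through, any update that lacks an ID gets a generated one, and every update of a response reuses that same ID. `ChatUpdate` and `withMessageIds` are hypothetical names. The ID generator is injected so the behavior is deterministic.

```typescript
// Hypothetical streaming update shape (the real one is a Microsoft.Extensions.AI update in C#).
interface ChatUpdate {
  text: string;
  messageId?: string;
}

// Stamp one generated messageId on every update that lacks one, as AG-UI requires.
async function* withMessageIds(
  updates: AsyncIterable<ChatUpdate>,
  newId: () => string,
): AsyncGenerator<ChatUpdate> {
  const id = newId(); // one ID per response, shared by all of its chunks
  for await (const update of updates) {
    yield update.messageId ? update : { ...update, messageId: id };
  }
}
```

Generating the ID once per response, rather than per chunk, matters. AG-UI clients group streamed deltas by `messageId`, so a fresh ID per chunk would render as many one-word messages.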

Complete sample code


Model context protocol server prompts with microsoft semantic kernel

Johnny Z — Wed, 23 Apr 2025 22:34:37 +0000

This post focuses on implementing server prompts, a key feature of the Model Context Protocol (MCP) designed for reusable template definitions. We will explore how to implement these server prompts using both the MCP C# SDK and Semantic Kernel for enhanced templating capabilities. Further details on MCP server prompts can be found in the MCP documentation.

MCP Server Prompts via MCP C# SDK Attributes

The MCP C# SDK lets you define prompts through attributes. For basic string formatting this is a direct implementation that doesn't require Semantic Kernel, as the following example shows.


[McpServerPromptType]
internal sealed class StringFormatPrompt
{
    private readonly string _format;
    private readonly ILogger _logger;

    public StringFormatPrompt(ILogger<StringFormatPrompt> logger)
    {
        _logger = logger;
        _format = "Tell a joke about {0}.";
    }

    [McpServerPrompt(Name = "Joke"), Description("Tell a joke about a topic.")]
    public IReadOnlyCollection<ChatMessage> Format([Description("The topic of the joke.")] string topic)
    {
        _logger.LogInformation("Generating prompt with topic: {Topic}", topic);
        var content = string.Format(CultureInfo.InvariantCulture, _format, topic);
        return [
            new (ChatRole.User, content)
        ];
    }
}

// Register the prompt
var serverBuilder = builder.Services.AddMcpServer()
    .WithHttpTransport()
    .WithPrompts<StringFormatPrompt>();

Semantic Kernel Templates as MCP Server Prompts

Semantic Kernel provides templating capabilities through JSON/YAML, Handlebars, and Liquid formats, along with plugin support. These templates can be exposed as MCP prompts using the MCP C# SDK.

Prompt Templates in Semantic Kernel
Semantic Kernel templates are configured with PromptTemplateConfig, created by IPromptTemplateFactory implementations, and can be easily rendered with input variables for dynamic prompt generation.

var templateConfig = new PromptTemplateConfig("Tell a joke about {{$topic}}.");
IPromptTemplateFactory templateFactory = new KernelPromptTemplateFactory();
var template = templateFactory.Create(templateConfig);
var text = await template.RenderAsync(kernel,
    new KernelArguments
    {
        { "topic", "cats" }
    });
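To make the `{{$topic}}` syntax concrete, here is a hypothetical mini-renderer that handles plain variable references only. Semantic Kernel's real template engine also supports function calls and other syntax. This sketch throws on a missing variable, which is a choice made here rather than documented SK behavior. `templateVariables` lists the referenced names, which is roughly what gets advertised as prompt arguments later.

```typescript
const VARIABLE = /\{\{\s*\$(\w+)\s*\}\}/g;

// Hypothetical renderer for `{{$name}}` variable references only.
function renderTemplate(template: string, args: Record<string, string>): string {
  return template.replace(VARIABLE, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(args, name)) {
      throw new Error(`Missing template variable: ${name}`);
    }
    return args[name];
  });
}

// List the distinct variables a template references, in order of first use.
function templateVariables(template: string): string[] {
  const names: string[] = [];
  const pattern = new RegExp(VARIABLE.source, "g"); // fresh regex: exec() keeps state
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(template)) !== null) {
    if (names.indexOf(m[1]) < 0) names.push(m[1]);
  }
  return names;
}
```

Rendering `"Tell a joke about {{$topic}}."` with `{ topic: "cats" }` yields `"Tell a joke about cats."`, the same text the C# snippet above produces.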

Expose prompts as McpServerPrompt
McpServerPrompt is the abstract base class that represents an MCP prompt we can implement.


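internal sealed class TemplateServerPrompt : McpServerPrompt
{
    private readonly IPromptTemplate _template;

    // MCP protocol metadata (name, description, arguments) for this prompt
    public override Prompt ProtocolPrompt { get; }

    public TemplateServerPrompt(PromptTemplateConfig promptTemplateConfig, IPromptTemplateFactory? promptTemplateFactory, ILoggerFactory? loggerFactory)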
    {
        promptTemplateFactory ??= new KernelPromptTemplateFactory(loggerFactory ?? NullLoggerFactory.Instance);
        _template = promptTemplateFactory.Create(promptTemplateConfig);

        // MCP prompt
        ProtocolPrompt = new()
        {
            Name = promptTemplateConfig.Name ?? _template.GetType().Name,
            Description = promptTemplateConfig.Description,
            Arguments = promptTemplateConfig.InputVariables
                .Select(inputVariable =>
                    new PromptArgument
                    {
                        Name = inputVariable.Name,
                        Description = inputVariable.Description,
                        Required = inputVariable.IsRequired
                    })
                .ToList(),
        };
    }

    public override async ValueTask<GetPromptResult> GetAsync(RequestContext<GetPromptRequestParams> request, CancellationToken cancellationToken = default)
    {
        KernelArguments? arguments = default;

        var dictionary = request.Params?.Arguments;
        if (dictionary is not null)
        {
            arguments = new ();
            foreach (var (key, value) in dictionary)
            {
                arguments[key] = value;
            }
        }

        var kernel = request.Services?.GetService<Kernel>() ?? new Kernel();
        var text = await _template.RenderAsync(kernel, arguments, cancellationToken);

        return new GetPromptResult
        {
            Messages =
            [
                new PromptMessage
                {
                    Content = new Content { Text = text }
                }
            ]
        };
    }
}

// Register the prompt with DI and the MCP server
// builder.Services.AddSingleton<TemplateServerPrompt>(...)
var serverBuilder = builder.Services.AddMcpServer()
    .WithHttpTransport();
serverBuilder.Services.AddSingleton<McpServerPrompt>(provider => 
    provider.GetRequiredService<TemplateServerPrompt>());

Exposing AIFunction as McpServerPrompt
The McpServerPrompt class provides a Create method to expose a Microsoft.Extensions.AI.AIFunction as an MCP server prompt.


internal sealed class TemplateAIFunction : AIFunction 
{
    //...

    protected override async ValueTask<object?> InvokeCoreAsync(AIFunctionArguments arguments, CancellationToken cancellationToken)
    {
        KernelArguments kernelArguments = [];

        foreach (var argument in arguments)
        {
            kernelArguments[argument.Key] = argument.Value;
        }

        var kernel = arguments.Services?.GetService<Kernel>() ?? new Kernel();
        var text = await _template.RenderAsync(kernel, kernelArguments, cancellationToken);
        return text;
    }
}

// Register the prompt with DI and the MCP server
// builder.Services.AddSingleton<TemplateAIFunction>(...)
var serverBuilder = builder.Services.AddMcpServer()
    .WithHttpTransport();
serverBuilder.Services.AddSingleton<McpServerPrompt>(provider =>
    McpServerPrompt.Create(provider.GetRequiredService<TemplateAIFunction>()));

Complete sample code


AWS Bedrock anthropic claude tool call integration with microsoft semantic kernel

Johnny Z — Mon, 14 Apr 2025 23:34:57 +0000

As of April 2025, the official Semantic Kernel connector for Amazon Bedrock (Microsoft.SemanticKernel.Connectors.Amazon) does not natively support tool/function calls. Semantic Kernel appears to be shifting toward an LLM abstraction layer based on Microsoft.Extensions.AI, aiming for a more unified and extensible architecture. Currently, only OpenAI and Ollama implementations are available for that abstraction. An AWS Bedrock Anthropic Claude implementation based on Microsoft.Extensions.AI will likely follow, so in the interim I implemented a custom solution. Because the IChatClient interface already supports function calls, the solution is relatively straightforward: implement IChatClient on top of the AWS Bedrock Runtime SDK.

Implement IChatClient with AWS Bedrock Runtime

The IChatClient interface essentially contains two methods: one for standard chat responses and another for streamed responses. The implementation involves mapping these two methods to the IAmazonBedrockRuntime.ConverseAsync and ConverseStreamAsync methods, as demonstrated in the full implementation of the AnthropicChatClient here.
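The core of such a client is a mapping from provider-neutral chat messages to the Bedrock Converse request shape. The sketch below shows that mapping in TypeScript. It assumes the Converse field names `system`, `messages`, `inferenceConfig`, `toolUse`, `toolResult` and `toolConfig.tools[].toolSpec`. The `ChatMessage` and `ToolDef` input types are hypothetical and only loosely modeled on Microsoft.Extensions.AI. This is not the post's AnthropicChatClient.

```typescript
// Hypothetical provider-neutral input messages (loosely modeled on Microsoft.Extensions.AI).
type ChatMessage =
  | { role: "system"; text: string }
  | { role: "user"; text: string }
  | { role: "assistant"; text?: string; toolCall?: { id: string; name: string; args: unknown } }
  | { role: "tool"; toolCallId: string; result: unknown };

interface ToolDef {
  name: string;
  description: string;
  parameters: object; // JSON Schema for the tool's input
}

// Converse content blocks used here (subset).
type ContentBlock =
  | { text: string }
  | { toolUse: { toolUseId: string; name: string; input: unknown } }
  | { toolResult: { toolUseId: string; content: Array<{ json: unknown }> } };

interface ConverseMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

function toConverseRequest(modelId: string, history: ChatMessage[], tools: ToolDef[], maxTokens = 1024) {
  const system: Array<{ text: string }> = [];
  const messages: ConverseMessage[] = [];

  // Merge consecutive blocks from the same role into one message.
  const push = (role: "user" | "assistant", block: ContentBlock) => {
    const last = messages[messages.length - 1];
    if (last !== undefined && last.role === role) last.content.push(block);
    else messages.push({ role, content: [block] });
  };

  for (const m of history) {
    switch (m.role) {
      case "system":
        system.push({ text: m.text }); // system prompts go in a separate field
        break;
      case "user":
        push("user", { text: m.text });
        break;
      case "assistant":
        if (m.text) push("assistant", { text: m.text });
        if (m.toolCall) {
          push("assistant", { toolUse: { toolUseId: m.toolCall.id, name: m.toolCall.name, input: m.toolCall.args } });
        }
        break;
      case "tool":
        // Tool results are sent back to the model inside a user turn.
        push("user", { toolResult: { toolUseId: m.toolCallId, content: [{ json: m.result }] } });
        break;
    }
  }

  return {
    modelId,
    system,
    messages,
    inferenceConfig: { maxTokens },
    toolConfig: {
      tools: tools.map((t) => ({
        toolSpec: { name: t.name, description: t.description, inputSchema: { json: t.parameters } },
      })),
    },
  };
}
```

The streaming path applies the same mapping on the request side; only the response handling differs.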

Setting up Function Calls with Semantic Kernel

Here's how to set up function calls with Semantic Kernel using our custom AnthropicChatClient:

Set up kernel and functions
This step configures the chat completion service with function invocation capabilities and registers it with the Semantic Kernel.

// Set up chat completion service
IChatClient chatClient = ...;
IChatCompletionService chatService =
    chatClient
        .AsBuilder()
        .UseFunctionInvocation() // Enables function call functionality
        .Build()
        .AsChatCompletionService();

// Register the Bedrock chat completion service
var builder = Kernel.CreateBuilder();
builder.Services.AddKeyedSingleton("bedrock", chatService);
// Add plugins/functions
builder.Plugins.AddFromType<MenuPlugin>();
// ...
var kernel = builder.Build();

Use automatic tool calls
This code demonstrates how to use the configured chat completion service to automatically invoke functions based on the user's input.

// Set up bedrock
var runtimeClient = new AmazonBedrockRuntimeClient(RegionEndpoint.APSoutheast2);
IChatClient client = new AnthropicChatClient(runtimeClient, "anthropic.claude-3-5-sonnet-20241022-v2:0");

// Configure the chat client as shown in step 1.
IChatCompletionService chatCompletionService = client
    .AsBuilder()
    .UseFunctionInvocation()
    .Build()
    .AsChatCompletionService();

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage("What is the special soup and its price?");

var promptExecutionSettings = new PromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto(options: new()
    {
        RetainArgumentTypes = true
    }),
    ExtensionData = new Dictionary<string, object>
    {
        { "temperature", 0 }, 
        { "max_tokens_to_sample", 1024 } // Required parameter for Anthropic models
    }
};

var messageContent = await chatCompletionService
    .GetChatMessageContentAsync(chatHistory,  promptExecutionSettings, kernel);
Console.WriteLine(messageContent.Content);

// Expected output : Today's special soup is Clam Chowder and it costs $9.99.
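UseFunctionInvocation wraps the client in a loop. It calls the model, runs any requested tools, appends their results to the history, and calls the model again until it answers in plain text. A minimal TypeScript sketch of that loop follows. The `Model`, `Tool` and history shapes and the iteration cap are assumptions, not the middleware's actual implementation. The fake model and menu data in the usage are illustrative only.

```typescript
// Hypothetical shapes for a sketch of the function-invocation loop.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}
interface ModelTurn {
  text?: string;
  toolCalls: ToolCall[];
}
interface HistoryEntry {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCallId?: string;
}
type Model = (history: HistoryEntry[]) => Promise<ModelTurn>;
type Tool = (args: Record<string, unknown>) => Promise<unknown>;

// Call the model, execute requested tools, feed results back, repeat until a text answer.
async function invokeWithTools(
  model: Model,
  tools: Record<string, Tool>,
  history: HistoryEntry[],
  maxIterations = 8,
): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    const turn = await model(history);
    if (turn.toolCalls.length === 0) return turn.text ?? "";
    history.push({ role: "assistant", content: JSON.stringify(turn.toolCalls) });
    for (const call of turn.toolCalls) {
      const tool = tools[call.name];
      // Report unknown tools back to the model instead of throwing.
      const result = tool ? await tool(call.args) : `Unknown tool: ${call.name}`;
      history.push({ role: "tool", content: JSON.stringify(result), toolCallId: call.id });
    }
  }
  throw new Error(`No final answer after ${maxIterations} iterations`);
}
```

The iteration cap guards against a model that keeps requesting tools forever.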

Complete sample code


Model context protocol integration with microsoft semantic kernel

Johnny Z — Sat, 05 Apr 2025 05:00:09 +0000

The Model Context Protocol (MCP) aims to standardize connections between AI systems and data sources. This post demonstrates integrating mcp-playwright with Semantic Kernel and phi4-mini (via Ollama) for browser automation.

Setting up the Playwright MCP Server

Install the MCP Playwright package:
```
npm install @playwright/mcp
```

Add a script to package.json:

{
  "scripts": {
    "server": "npx @playwright/mcp --port 8931"
  }
}

Start the server:
```
npm run server
```
This will launch the Playwright MCP server, displaying the port and endpoints in the console.

Running phi4-mini with Ollama for Function Calling

For reliable function calling, phi4-mini:latest (as of March 27, 2025) requires a custom Modelfile.

Create a custom Modelfile: (See example)

Create the model in Ollama:

ollama create phi4-mini:latest -f <path/to/Modelfile>

Implementing the MCP Client in Semantic Kernel

Install the MCP client NuGet package:

dotnet add package ModelContextProtocol --prerelease

Connect to the Playwright MCP server and retrieve tools:

var mcpClient = await McpClientFactory.CreateAsync(
    new McpServerConfig
    {
        Id = "playwright",
        Name = "Playwright",
        TransportType = TransportTypes.Sse,
        Location = "http://localhost:8931"
    });
var tools = await mcpClient.ListToolsAsync();

Configure Semantic Kernel with the MCP tools:

var kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOllamaChatCompletion(modelId: "phi4-mini");
kernelBuilder.Plugins.AddFromFunctions(
    pluginName: "playwright",
    functions: tools.Select(x => x.AsKernelFunction()));
var kernel = kernelBuilder.Build();

var executionSettings = new PromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto(
        options: new()
        {
            RetainArgumentTypes = true
        }),
    ExtensionData = new Dictionary<string, object>
    {
        { "temperature", 0 }
    }
};

var result = await kernel.InvokePromptAsync(
    "open browser and navigate to https://www.google.com",
    new KernelArguments(executionSettings));

This code snippet connects to the MCP server, retrieves available tools, and integrates them into Semantic Kernel as functions. The prompt instructs the model to open a browser and navigate to Google, demonstrating the integration.
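Conceptually, the client sends a JSON-RPC `tools/list` request. It turns each returned tool descriptor (name, description, JSON Schema `inputSchema`) into a function the model can call, then routes calls back as `tools/call`. Here is a hypothetical TypeScript sketch of those shapes. The `FunctionDef` type and the `plugin-tool` naming scheme are assumptions, not what `AsKernelFunction` actually produces.

```typescript
// Subset of an MCP tool descriptor as returned by `tools/list`.
interface McpTool {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
}

// Hypothetical function-calling definition handed to the model.
interface FunctionDef {
  name: string;
  description: string;
  parameters: McpTool["inputSchema"];
}

// JSON-RPC 2.0 requests the client sends to the MCP server.
function listToolsRequest(id: number) {
  return { jsonrpc: "2.0", id, method: "tools/list", params: {} };
}

function callToolRequest(id: number, name: string, args: Record<string, unknown>) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Expose each MCP tool as a function; the "plugin-tool" naming is an assumption.
function toFunctionDefs(pluginName: string, tools: McpTool[]): FunctionDef[] {
  return tools.map((t) => ({
    name: `${pluginName}-${t.name}`,
    description: t.description ?? "",
    parameters: t.inputSchema,
  }));
}

// Route a model's function call back to the original MCP tool name.
function toMcpCall(id: number, pluginName: string, functionName: string, args: Record<string, unknown>) {
  const prefix = `${pluginName}-`;
  if (!functionName.startsWith(prefix)) throw new Error(`Not a ${pluginName} function: ${functionName}`);
  return callToolRequest(id, functionName.slice(prefix.length), args);
}
```

The tool's JSON Schema passes through unchanged. That is why a small local model like phi4-mini can call Playwright tools it has never seen before.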

Complete sample code


Azure OpenAI Error Handling in Semantic Kernel

Johnny Z — Wed, 08 Jan 2025 06:27:40 +0000

In real-world systems it's crucial to handle HTTP errors effectively, especially when calling Large Language Models (LLMs) like Azure OpenAI. Sooner or later you will exceed a rate limit (tokens per minute or requests per minute), and the service responds with HTTP 429. This blog post explores different approaches to HTTP error handling with Semantic Kernel and Azure OpenAI.

Default

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key"); // Or DefaultAzureCredential

This is the default setup for Semantic Kernel with Azure OpenAI, using AddAzureOpenAIChatCompletion. It includes a built-in retry policy that automatically retries requests up to three times with exponential backoff. It can also detect specific HTTP headers like 'retry-after' to tailor the retry delay.
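The retry behavior described above can be sketched as follows: honor `retry-after` when the server sends it, otherwise back off exponentially. This is a hypothetical TypeScript illustration. The 1-second base delay, the 30-second cap and treating 429 and 5xx as transient are assumptions, not Semantic Kernel's actual settings. Only the delta-seconds form of `retry-after` is parsed. `sleep` is injected so the schedule can be tested.

```typescript
// Delay before retry number `attempt` (0-based). Base delay and cap are illustrative assumptions.
function retryDelayMs(attempt: number, retryAfter?: string, baseMs = 1000, capMs = 30000): number {
  if (retryAfter !== undefined) {
    const seconds = Number(retryAfter);
    // Only the delta-seconds form is handled; an HTTP-date falls back to backoff.
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  return Math.min(capMs, baseMs * 2 ** attempt);
}

interface HttpResult<T> {
  status: number;
  retryAfter?: string; // value of the retry-after header, if any
  body?: T;
}

// Retry 429 and 5xx responses up to `maxRetries` times, then return the last response.
async function sendWithRetries<T>(
  send: () => Promise<HttpResult<T>>,
  sleep: (ms: number) => Promise<void>,
  maxRetries = 3,
): Promise<HttpResult<T>> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    const transient = res.status === 429 || res.status >= 500;
    if (!transient || attempt >= maxRetries) return res;
    await sleep(retryDelayMs(attempt, res.retryAfter));
  }
}
```

With three retries, a request that is throttled on every attempt is sent four times in total before the last 429 surfaces to the caller.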

HttpClient

var factory = provider.GetRequiredService<IHttpClientFactory>();
var httpClient = factory.CreateClient("azure:gpt-4o");

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key",  // Or DefaultAzureCredential
    httpClient: httpClient);

Providing your own HttpClient instance gives you more control over HTTP error handling. Semantic Kernel disables its default retry policy when an HttpClient is provided, so you can implement custom retry logic with the Microsoft.Extensions.Http.Resilience library. With this approach, you can define the number of retry attempts, timeouts, and how to handle specific status codes like 429 (rate limit exceeded). Because the default policy is off, it is strongly recommended to add retry policies for transient errors to that HttpClient.

services.AddHttpClient("azure:gpt-4o")
    // The 'standard' handler automatically handles transient errors, including 429
    .AddStandardResilienceHandler()
    .Configure(options =>
    {
        // Options for retry attempts, timeouts, etc.
        options.Retry.MaxRetryAttempts = 5;
    });

An important benefit of using HttpClient is that it's not limited to Azure OpenAI. This approach works with other AI connectors like OpenAI as well.

AzureOpenAIClient

var azureOpenAIClient = new AzureOpenAIClient(
    endpoint: new Uri("https://resource-name.openai.azure.com"),
    new ApiKeyCredential("api-key"), // Or DefaultAzureCredential
    new AzureOpenAIClientOptions());

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    azureOpenAIClient);

This approach offers similar functionality to the default setup, including the built-in retry policy. In addition, AzureOpenAIClient offers more flexibility through AzureOpenAIClientOptions.

var clientOptions = new AzureOpenAIClientOptions
{
    Transport = new HttpClientPipelineTransport(httpClient),
    RetryPolicy = new ClientRetryPolicy(maxRetries: 5)
};

This configuration enables you to combine HTTP retry policies from HttpClient with custom pipeline policy-based retries from the Azure OpenAI SDK.

Recommendations

The default setup might not be suitable for scenarios where you frequently hit token or request rate limits.
If you already have an AzureOpenAIClient registered and require maximum control, the AzureOpenAIClient approach lets you combine HttpClient policies with Azure OpenAI pipeline policy-based retries.


Working with multiple language models in Semantic Kernel

Johnny Z — Sat, 28 Dec 2024 07:33:07 +0000

It is common to work with multiple large language models (LLMs) simultaneously, especially when running evaluations or tests. Semantic Kernel supports registering multiple text generation and embedding services using serviceId and modelId.

Register 'serviceId' and 'modelId'

Suppose we have the following setup:

builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4-1106-Preview",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key",
    modelId: "gpt-4",
    serviceId: "azure:gpt-4");

builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-2024-08-06",
    endpoint: "https://resource-name.openai.azure.com",
    apiKey: "api-key",
    modelId: "gpt-4o",
    serviceId: "azure:gpt-4o");

builder.AddOllamaChatCompletion(
    modelId: "phi3",
    endpoint: new Uri("http://localhost:11434"),
    serviceId: "local:phi3");

When executing kernel functions or prompts, 'serviceId' or 'modelId' can be passed in 'PromptExecutionSettings', as the following shows:

var promptExecutionSettings = new PromptExecutionSettings
{
    ServiceId = "local:phi3"
};
// 
// or just modelId 
//    new PromptExecutionSettings
//     {
//         ModelId = "gpt-4o"
//     }
//
var result = await kernel.InvokePromptAsync(
    """
    Answer with the given fact:
    Sky is blue and violets are purple

    input:
    What color is sky?
    """, 
    new KernelArguments(promptExecutionSettings));

When registering chat completion services, if serviceId is provided, Semantic Kernel also registers chat completion services as keyed. With the above registration, the following would work:

var chatCompletionService = kernel.Services
    .GetRequiredKeyedService<IChatCompletionService>("azure:gpt-4o");

IAIService and IAIServiceSelector

All AI-related services, including chat completion and text embedding, implement the IAIService interface, which defines a metadata property. This metadata contains attributes specific to the service implementation. For instance, the AzureOpenAIChatCompletionService includes the deployment name and model name. The default IAIServiceSelector resolves services by serviceId first, and then by modelId to match the IAIService metadata. To gain full control over AI service selection, you can implement a custom IAIServiceSelector and register it as a service with Semantic Kernel.
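The default selection order can be sketched as a small function. This is a hypothetical TypeScript model of the behavior described above, not Semantic Kernel's code. Several details are assumptions: the `AIService` shape, the `ModelId` metadata key, falling through to modelId when a serviceId isn't found, and returning the first service when no preference is given.

```typescript
// Hypothetical model of a registered AI service.
interface AIService {
  serviceId?: string;
  metadata: Record<string, string>; // e.g. { ModelId: "gpt-4o" }
}

interface ExecutionSettings {
  serviceId?: string;
  modelId?: string;
}

// serviceId first, then modelId against service metadata.
function selectService(services: AIService[], settings: ExecutionSettings): AIService {
  if (settings.serviceId !== undefined) {
    const byServiceId = services.find((s) => s.serviceId === settings.serviceId);
    if (byServiceId) return byServiceId;
  }
  if (settings.modelId !== undefined) {
    const byModelId = services.find((s) => s.metadata["ModelId"] === settings.modelId);
    if (byModelId) return byModelId;
  }
  if (settings.serviceId === undefined && settings.modelId === undefined && services.length > 0) {
    return services[0]; // no preference given: assumed fallback to the first registration
  }
  throw new Error("No AI service matches the execution settings");
}
```

A custom IAIServiceSelector replaces exactly this decision, for example routing by cost or by a tag in the metadata instead of by ID.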

Sample code here


OpenAI chat completion with Json output format

Johnny Z — Fri, 20 Dec 2024 01:31:15 +0000

I can't recall how many times I've tried to convince an LLM to return JSON so that I could make API calls based on natural language input from users. Recently, I discovered that this functionality is natively supported by Semantic Kernel and the Microsoft.Extensions.AI library. It is officially documented by the OpenAI API here. Note that this feature is only available in newer models, GPT-4o/o1 and later. If you are using Azure OpenAI, make sure you deploy a supported model version.

Chat completion

Semantic Kernel supports JSON output formatting in the ResponseFormat property from PromptExecutionSettings, as shown in the code below:

// Configure Azure/OpenAI and semantic kernel first.

var chatCompletionService = kernel.Services.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory();
history.AddSystemMessage("Extract the event information.");
history.AddUserMessage("Alice and Bob are going to a science fair on Friday.");

var jsonSerializerOptions = new JsonSerializerOptions(JsonSerializerOptions.Default)
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow,
};
var responseFormat = CalendarEvent.JsonResponseSchema(jsonSerializerOptions);

var response = await chatCompletionService.GetChatMessageContentAsync(
    history, 
    new AzureOpenAIPromptExecutionSettings
    {
        ResponseFormat = responseFormat // Json schema
    });
// Json result    
var result = JsonSerializer.Deserialize<CalendarEvent>(response.ToString(), jsonSerializerOptions);

Generate Json schema from types

JSON schema can be automatically generated using Microsoft.Extensions.AI.AIJsonUtilities, which is referenced from Semantic Kernel.

using System.ComponentModel;
using System.Text.Json;
using Microsoft.Extensions.AI;
using Microsoft.SemanticKernel;
// Microsoft.Extensions.AI also defines a ChatResponseFormat; this alias selects OpenAI's.
using ChatResponseFormat = OpenAI.Chat.ChatResponseFormat;

public sealed class CalendarEvent
{
    [Description("Name of the event")]
    public required string Name { get; init; }

    [Description("Day of the event")]
    public required string Day { get; init; }

    [Description("List of participants of the event")]
    public required string[] Participants { get; init; }

    // Pass the same serializer options used for deserialization so property names match.
    public static ChatResponseFormat JsonResponseSchema(JsonSerializerOptions? jsonSerializerOptions = default)
    {
        var inferenceOptions = new AIJsonSchemaCreateOptions
        {
            IncludeSchemaKeyword = false,
            DisallowAdditionalProperties = true,
        };

        // JSON schema from the type, including the descriptions on its properties
        var jsonElement = AIJsonUtilities.CreateJsonSchema(
            typeof(CalendarEvent),
            description: "Calendar event result",
            serializerOptions: jsonSerializerOptions,
            inferenceOptions: inferenceOptions);

        var kernelJsonSchema = KernelJsonSchema.Parse(jsonElement.GetRawText());
        var jsonSchemaData = BinaryData.FromObjectAsJson(kernelJsonSchema, jsonSerializerOptions);

        // Strict mode asks the model to follow the schema exactly.
        return ChatResponseFormat.CreateJsonSchemaFormat(
            nameof(CalendarEvent).ToLowerInvariant(),
            jsonSchemaData,
            jsonSchemaIsStrict: true);
    }
}
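
On the wire, a strict JSON schema format like this corresponds to the `response_format` field of OpenAI's Structured Outputs. The TypeScript sketch below shows roughly what the request body looks like for this example. It is an assumption-laden illustration: `buildRequestBody` is a hypothetical helper, "gpt-4o" is a placeholder model name, and the schema is hand-written to approximate what AIJsonUtilities generates with camelCase naming and DisallowAdditionalProperties (the exact generated schema may differ between library versions).

```typescript
// Hand-written approximation of the schema generated for CalendarEvent.
type JsonSchema = Record<string, unknown>;

const calendarEventSchema: JsonSchema = {
  description: "Calendar event result",
  type: "object",
  properties: {
    name: { description: "Name of the event", type: "string" },
    day: { description: "Day of the event", type: "string" },
    participants: {
      description: "List of participants of the event",
      type: "array",
      items: { type: "string" },
    },
  },
  // Strict mode expects every property to be listed as required...
  required: ["name", "day", "participants"],
  // ...and additional properties to be disallowed (DisallowAdditionalProperties = true).
  additionalProperties: false,
};

// Hypothetical helper: builds a Chat Completions request body asking for Structured Outputs.
function buildRequestBody(model: string, system: string, user: string) {
  return {
    model, // placeholder model name
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "calendarevent", // nameof(CalendarEvent).ToLowerInvariant()
        schema: calendarEventSchema,
        strict: true, // jsonSchemaIsStrict: true
      },
    },
  };
}

const body = buildRequestBody(
  "gpt-4o",
  "Extract the event information.",
  "Alice and Bob are going to a science fair on Friday.",
);
console.log(JSON.stringify(body, null, 2));
```

Seeing the payload makes the division of labor clear: the C# type is the single source of truth, and the schema, the format name, and the strict flag are all derived from it rather than maintained by hand.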

Sample code here

Please feel free to reach out on Twitter @roamingcode

Lightweight AI Evaluation with SemanticKernel

Johnny Z — Tue, 17 Dec 2024 23:28:50 +0000

For quick and easy evaluation or comparison of AI responses in .NET applications, particularly in tests, we can leverage the excellent 'LLM-as-a-Judge' prompts from autoevals with the help of Semantic Kernel.

Sample code

Note that you need to set up Semantic Kernel with a chat completion service first. Setting 'Temperature' to 0 is also recommended, so that the judge's verdicts are as repeatable as possible.

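using Microsoft.SemanticKernel.Connectors.OpenAI;

// Temperature 0 keeps the judge's verdicts as repeatable as possible.
var executionSettings = new OpenAIPromptExecutionSettings { Temperature = 0 };

// Evaluator name ("humor") mapped to the output it should judge.
var json =
    """
    {
        "humor" : {
            "output" : "this maybe funny"
        }
    }
    """;

// Run(...) is provided by the sample's source code, not by Semantic Kernel itself.
await foreach (var result in
        kernel.Run(json, executionSettings: executionSettings))
{
    // Key: evaluator name; Item1: the judge's result; Item2: its score.
    Console.WriteLine($"[{result.Key}]: result: {result.Value?.Item1}, score: {result.Value?.Item2}");
}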

Source

Microsoft.Extensions.AI.Evaluation is still in the making, and it currently involves a little too much ceremony for simple use cases.

