A familiar pipeline pattern applied to AI agents
Covers three middleware types, registration scopes, termination, result override, and when to use each
Not a New Idea
If you have used ASP.NET Core or Express.js, you already understand the core concept. Both frameworks let you register a chain of functions around every request. Each function receives a context and a next() delegate. Calling next() continues the chain; not calling it short-circuits it. That is the pipeline pattern: a clean way to apply cross-cutting concerns like logging, authentication, and error handling without touching any business logic.
Microsoft’s Agent Framework applies this exact pattern to AI agents. The next() delegate becomes call_next(), the context object holds the agent’s conversation instead of an HTTP request, and the pipeline wraps an AI reasoning turn instead of a web request. If you know app.Use() or app.use(), you already know the shape of what follows.
What is new, and worth understanding deeply, is that an agent turn is not a single request/response cycle. It is a multi-step reasoning loop, and Agent Framework exposes three distinct interception points within it. The rest of this post covers all three types, how they differ, when to use each, and how they come together in a real SQL agent example.
Middleware
The Agent Framework supports three types of middleware, each intercepting a different layer of execution:
- Agent middleware wraps agent runs, giving you access to inputs, outputs, and overall control flow.
- Function middleware wraps individual tool calls, enabling input validation, result transformation, and execution control.
- Chat middleware wraps the underlying requests sent to AI models, exposing raw messages, options, and responses.
All three types support both function-based and class-based implementations.
Chaining
When multiple middleware of the same type are registered, they execute as a chain: each middleware calls call_next() to hand off control to the next one in line.
Rather than passing updated values into call_next() as arguments, middleware mutates the shared context object directly. This means any changes you make to the context before calling call_next() are automatically visible to downstream middleware, with no need to thread values through the call explicitly.
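The mechanics of this contract can be sketched in plain Python. Everything below is invented for illustration (it is not the framework's actual API): a dict stands in for the shared context, and a zero-argument call_next hands off control.

```python
import asyncio

# Illustrative sketch of the chaining contract: each middleware receives the
# shared context plus a zero-argument call_next; mutations made before
# awaiting call_next are visible to everything downstream.

async def add_trace_id(context, call_next):
    context["trace_id"] = "abc-123"   # mutate shared state; nothing is passed to call_next
    await call_next()

async def log_trace_id(context, call_next):
    # Sees the trace_id written by the earlier middleware.
    context["log"] = f"handling {context['trace_id']}"
    await call_next()

async def run_chain(middlewares, context, terminal):
    async def step(index):
        if index == len(middlewares):
            await terminal(context)
        else:
            await middlewares[index](context, lambda: step(index + 1))
    await step(0)

async def terminal(context):
    context["handled"] = True         # stands in for the agent itself

async def main():
    ctx = {}
    await run_chain([add_trace_id, log_trace_id], ctx, terminal)
    return ctx

print(asyncio.run(main()))
```

Note that a middleware that returns without awaiting call_next() would short-circuit the chain, exactly as described above.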
Execution Order
Agent level middleware always wraps run level middleware. Given agent middleware [A1, A2] and run middleware [R1, R2], the execution order is:
A1 → A2 → R1 → R2 → Agent → R2 → R1 → A2 → A1
Function and chat middleware follow the same wrapping principle, applied at the time of each tool call or chat request respectively.
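The wrapping order can be demonstrated with a small plain-Python sketch (names A1/A2/R1/R2 mirror the diagram above; this is not the framework's composition code, which does this for you):

```python
import asyncio

# Illustrative sketch: agent-level middleware [A1, A2] wraps run-level
# middleware [R1, R2] around the agent core, so "in" order is outermost-first
# and "out" order is the mirror image.

order = []

def make_middleware(name):
    async def middleware(call_next):
        order.append(name)      # on the way in
        await call_next()
        order.append(name)      # on the way out
    return middleware

async def agent_core():
    order.append("Agent")

async def run_turn():
    chain = agent_core
    # Wrap innermost-first: R2, R1, then A2, A1.
    for name in ["R2", "R1", "A2", "A1"]:
        inner, mw = chain, make_middleware(name)
        chain = lambda mw=mw, inner=inner: mw(inner)
    await chain()

asyncio.run(run_turn())
print(" -> ".join(order))
# A1 -> A2 -> R1 -> R2 -> Agent -> R2 -> R1 -> A2 -> A1
```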
Why we need it
The biggest value is not convenience; it is correctness and consistency.
Without middleware, teams usually end up in one or both of these patterns:
Pattern 1: policy hidden in prompts
Example instruction:
"Never run destructive SQL. Never send data to personal email."
This is useful guidance, but it is still model behavior, not a hard gate. As prompts get long, tools increase, and edge cases appear, this policy can become inconsistent. It is also hard to audit after the fact.
Pattern 2: policy duplicated in each tool
def run_sql(query: str) -> str:
    if "drop" in query.lower():
        return "blocked"
    ...

def export_data(target: str) -> str:
    if "gmail.com" in target.lower():
        return "blocked"
    ...

def quote_inventory_line(quantity: int) -> str:
    if quantity > 10000:
        return "blocked"
    ...
This looks safe, but it creates:
- duplicated logic
- inconsistent rules across tools
- expensive updates when policy changes
Middleware fixes both
With middleware, concerns live at the right boundary:
- run level checks in Agent middleware
- per tool checks in Function middleware
- model call telemetry/metadata in Chat middleware
Result:
- cleaner tools
- stronger guardrails
- easier tests
- better observability
1. AgentMiddleware: The Outermost Layer
Agent middleware is the outermost layer of the pipeline. It fires once per turn: before any LLM call is made, and again after the final response is produced. That makes it the right place for concerns that span the entire turn: input validation, security screening, audit logging, and output transformation.
Implementation Styles & Chaining
Agent middleware supports both class-based and function-based implementations. The two are fully equivalent; the choice comes down to whether you need instance state or prefer a lighter syntax.
When multiple middleware components are registered, they form a chain. Each component is responsible for calling call_next() to pass control to the next layer; omitting this call short-circuits the pipeline, preventing any downstream middleware or the LLM from running.
Note that call_next() takes no arguments. Instead of passing updated values explicitly, middleware mutates the shared AgentContext object directly — any changes made before await call_next() are automatically visible to everything further down the chain.
Class-Based Implementation
Subclass AgentMiddleware and override process(). The example below shows SecurityAgentMiddleware. It inspects the latest user message and short-circuits the pipeline if it detects a threat; the LLM is never invoked for blocked requests.
class SecurityAgentMiddleware(AgentMiddleware):
    """Agent-level guard: blocks risky **user chat text** before the model runs.

    Inspects ``context.messages[-1]`` (latest user turn). If :func:`_unsafe_input_reason`
    returns a reason, sets ``context.result`` to a canned assistant reply and **does not**
    call ``call_next()``, so the LLM and tools are skipped for that turn.
    """

    async def process(
        self,
        context: AgentContext,
        call_next: Callable[[], Awaitable[None]],
    ) -> None:
        # Only the latest user utterance is checked (typical for a single-turn REPL).
        last_message = context.messages[-1] if context.messages else None
        if last_message and last_message.text:
            query = last_message.text
            reason = _unsafe_input_reason(query)
            if reason:
                print(f"[SecurityAgentMiddleware] Security Warning: {reason}; blocking request.")
                # Short-circuit: set the assistant reply here; do NOT call call_next() -> no LLM, no tools.
                context.result = AgentResponse(
                    messages=[
                        Message(
                            "assistant",
                            [f"Request blocked: {reason}."],
                        )
                    ]
                )
                return
        print("[SecurityAgentMiddleware] Security check passed.")
        # Continue pipeline: model + optional run_sql; function middleware runs inside tool path.
        await call_next()
Here is the supporting _unsafe_input_reason helper. For brevity, the lower-level check functions it calls are omitted.
def _unsafe_input_reason(query: str) -> str | None:
    """Classify why a user message should be blocked, or ``None`` if it may proceed.

    Checks run in order: injection-style patterns first, then destructive natural language.
    """
    # Order matters: catch obvious SQL fragments before broader NL patterns.
    if _looks_like_dangerous_sql(query):
        return "injection-style or suspicious SQL fragment in your message"
    if _looks_like_destructive_database_intent(query):
        return "destructive database request (e.g. delete/drop/truncate)"
    return None
Function-Based and Decorator-Based Styles
Agent Framework also supports function-based and decorator-based implementations. All three styles are equivalent; choose based on whether you need state or explicit type annotations.
Function-based
async def logging_agent_middleware(
    context: AgentContext,
    next: Callable[[AgentContext], Awaitable[None]],
) -> None:
    print("[Agent] Turn starting")
    await next(context)
    print("[Agent] Turn completed")
Decorator-based (no type annotation required)
@agent_middleware
async def simple_agent_middleware(context, next):
    print("Before agent execution")
    await next(context)
    print("After agent execution")
Registering Middleware
Middleware is registered when constructing the agent. Pass a list to the middleware argument; different middleware types can be mixed in the same list, and the framework routes each to the correct pipeline layer automatically:
FOUNDRY_PROJECT_ENDPOINT = "https://sreeniagent.services.ai.azure.com/api/projects/sreeni_foundry"
FOUNDRY_MODEL = "gpt-4.1"

async with (
    AzureCliCredential() as credential,
    Agent(
        client=FoundryChatClient(
            credential=credential,
            project_endpoint=FOUNDRY_PROJECT_ENDPOINT,  # Your Microsoft Foundry project URL
            model=FOUNDRY_MODEL,                        # The model you deployed
        ),
        name="Sreeni-SqlAssistant",
        instructions=(
            "You help users query a small demo database. "
            "The only table is `customers` with columns id, name, city. "
            "Always use the run_sql tool with a proper SELECT; explain results briefly."
        ),
        tools=run_sql,
        # Agent middleware wraps the turn; function middleware wraps each tool call
        middleware=[SecurityAgentMiddleware(), LoggingFunctionMiddleware()],
    ) as agent,
):
    ...
When to Use Agent Middleware
Agent middleware is the right choice for any concern that applies to the turn as a whole, rather than to a specific tool call or model request.
2. FunctionMiddleware: The Tool Call Layer
FunctionMiddleware fires inside the agent turn, but only when the LLM decides to invoke a tool. A single agent turn can trigger multiple tool calls, and FunctionMiddleware wraps each one independently. This makes it the right place for concerns that are specific to tool execution: timing, input validation, result transformation, and tool call auditing.
The FunctionInvocationContext Object
Each FunctionMiddleware component receives a FunctionInvocationContext, which is scoped to a single tool invocation.
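The exact fields are framework-defined; the ones this post relies on are function, result, terminate, and metadata. As a runnable sketch, here is a timing middleware in that shape, with TinyFunction/TinyContext invented as stand-ins so the example runs without the framework:

```python
import asyncio
import time
from dataclasses import dataclass, field

# Stand-ins mirroring the fields used in this post (not the real classes).
@dataclass
class TinyFunction:
    name: str

@dataclass
class TinyContext:
    function: TinyFunction
    result: object = None
    terminate: bool = False
    metadata: dict = field(default_factory=dict)

async def timing_function_middleware(context, call_next):
    start = time.perf_counter()
    await call_next(context)                       # run the tool (and inner middleware)
    elapsed_ms = (time.perf_counter() - start) * 1000
    context.metadata["elapsed_ms"] = elapsed_ms    # record per-call timing
    print(f"[Function] {context.function.name} took {elapsed_ms:.1f} ms")

async def fake_tool_call(context):
    context.result = "42 rows"                     # stands in for the real tool

ctx = TinyContext(function=TinyFunction(name="run_sql"))
asyncio.run(timing_function_middleware(ctx, fake_tool_call))
```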
When to Use FunctionMiddleware
Use it for concerns specific to tool execution: timing and performance monitoring, validating or sanitising tool arguments before they run, capping the number of times a tool may be called in one turn, transforming tool results before the LLM sees them, or auditing exactly which tools were called and with what arguments.
Terminating the Function Calling Loop
Setting context.terminate = True inside FunctionMiddleware does something powerful: it stops the LLM’s function calling loop entirely. The LLM will not receive the tool result and will not make any further tool calls in this turn. This is useful for enforcing tool call budgets or stopping a loop that is going in an undesirable direction.
@function_middleware
async def budget_middleware(context, next):
    if context.function.name == "run_sql":
        # Allow at most one SQL query per turn
        call_count = context.metadata.get("sql_calls", 0)
        if call_count >= 1:
            context.result = "Query limit reached for this turn."
            context.terminate = True  # stop the LLM tool-calling loop
            return
        context.metadata["sql_calls"] = call_count + 1
    await next(context)
Warning: Termination and Chat History
Terminating the function-calling loop can leave the chat history in an inconsistent state: a tool-call message with no corresponding tool result. This may cause errors if the same history is used in subsequent agent runs. Use termination carefully and consider clearing or repairing the history afterward.
3. ChatMiddleware: The LLM Call Layer
ChatMiddleware is the deepest layer. It wraps the actual inference call sent to the underlying language model: the raw list of messages, the model options, and the response that comes back. This layer fires for every call to the LLM within a turn, which can be more than one if tools are used.
The ChatContext Object
Each ChatMiddleware component receives a ChatContext.
Function Based Example
async def logging_chat_middleware(
    context: ChatContext,
    next: Callable[[ChatContext], Awaitable[None]],
) -> None:
    print(f"[Chat] Sending {len(context.messages)} messages to model")
    await next(context)
    print("[Chat] Model response received")
Because ChatMiddleware sees the exact message list going to the model, it can be used to inject system instructions, strip sensitive content, enforce token budgets, or even substitute a cached response, all without the AgentMiddleware or FunctionMiddleware layers knowing anything changed.
When to Use ChatMiddleware
Use it when you need access to the raw LLM call: injecting or modifying system-level instructions per call, redacting PII from messages before they leave your infrastructure, enforcing token-count limits, caching repeated inference calls, or monitoring every model request for compliance purposes.
Registration: Agent Level vs. Run Level
Microsoft Agent Framework supports two scopes for registering middleware. Understanding the difference is important for designing flexible agent systems.
Agent Level Middleware
Middleware passed in the middleware=[...] list when constructing the Agent applies to every single call to agent.run() for the lifetime of that agent. This is where you put policies that should always be enforced: security guards, mandatory audit logging, content filters.
Run Level Middleware
You can also pass middleware directly to a single agent.run() call. This middleware applies only to that one invocation and is discarded afterward. It is useful for per request customisation: adding a trace ID for a specific call, applying extra validation for a sensitive operation, or attaching a debug logger without affecting every other turn.
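As a toy model of how the two scopes combine (plain Python, not the real Agent API), agent-level middleware persists across runs while run-level middleware is merged in for one invocation and then discarded:

```python
# Invented TinyAgent for illustration only.
class TinyAgent:
    def __init__(self, middleware=None):
        self.middleware = list(middleware or [])       # agent-level: lives with the agent

    def run(self, prompt, middleware=None):
        # run-level middleware applies only to this invocation
        active = self.middleware + list(middleware or [])
        return active

agent = TinyAgent(middleware=["security_guard"])
print(agent.run("q1", middleware=["debug_logger"]))    # both scopes active
print(agent.run("q2"))                                 # agent-level only; debug_logger is gone
```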
Choosing the Right Middleware Type
With three types available, the choice usually comes down to what you need to see and at what granularity:
- Agent middleware fires once per turn and sees the user input and the final response.
- Function middleware fires once per tool call and sees the tool name, arguments, and result.
- Chat middleware fires once per model request and sees the raw messages and options.

Conclusion
Microsoft Agent Framework’s middleware brings the same pipeline contract you know from ASP.NET Core and Express (ordered components, a context object, and a call_next() delegate) into the world of AI agents. The structural difference is that an agent turn is not a single request/response cycle but a multi-step reasoning loop, and Agent Framework exposes three separate interception points within it.
AgentMiddleware is the right home for turn level concerns: security screening, content policy, and audit logging.
FunctionMiddleware is the right home for tool level concerns: execution timing, argument validation, and tool call budgets.
ChatMiddleware is the right home for model level concerns: raw message inspection, token enforcement, and caching.
Thanks
Sreeni Ramadorai



