
René Nijkamp

Posted on • Originally published at itsrene.nl

Azure APIM MCP Audit Logging Without Breaking Everything


In Part 2, we locked down security. Now let's talk about observability.

You need audit logging for compliance. You need distributed tracing for debugging. You need error handling for production resilience.

Here's the problem: by default, you have none of that.

Out of the box, APIM passes MCP requests through with zero logging. You have no idea if someone's calling tools/list or tools/call. You can't tell which API requests are actually MCP traffic. No audit trail. No visibility. Just requests flowing through in the dark. Not what you want, not what compliance wants, not what security wants.

And when you try to add logging? That's when you discover the fun part: accessing the response body in APIM policies causes requests to hang indefinitely.

I found this out the hard way. Days of debugging. Requests hanging. Timeouts everywhere. The fix? Don't touch the response body. Ever.

This issue has been reported to Microsoft and is on their radar. But here's the reality: MCP servers are in preview. When you work with preview functionality, you carry the burden of working around these rough edges yourself. That's the tax you pay for being early.

Let me show you how to get the observability you need, without the trial and error I went through.

Observability Architecture



The Response Body Problem

What You'd Normally Do

In a typical APIM setup, especially during development, you'd log the response body:

<outbound>
    <log-to-eventhub logger-id="my-logger">
        @{
            return new JObject(
                new JProperty("request", context.Request.Body.As<string>()),
                new JProperty("response", context.Response.Body.As<string>())  // HANGS
            ).ToString();
        }
    </log-to-eventhub>
</outbound>

This will hang your requests. The response never completes. Clients time out.

Why This Happens

When you access context.Response.Body in the <outbound> section:

I don't have access to the APIM MCP internals, but here's my suspicion: APIM tries to buffer the entire response, which can be large, and which can be streaming. If it's an in-memory buffer, a large response might simply trigger errors there. If the response is streaming, the policy blocks until the whole stream is read, but the stream doesn't end until the client reads it, and the client can't read it, because the response hasn't been sent yet. In short: a deadlock. No bueno, and it breaks production.
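To make the suspected deadlock concrete, here's a toy Python sketch of a proxy sitting between a streaming backend and a client. This is my mental model, not APIM's actual implementation; the function names and structure are made up purely for illustration.

```python
# Toy model of a proxy between a streaming backend and a client.
# NOT APIM's implementation - just an illustration of the suspected failure mode.

def streaming_backend():
    """A backend that produces its response in chunks (think SSE)."""
    for i in range(3):
        yield f"chunk-{i}".encode()

def passthrough_proxy(stream, audit_log):
    """Metadata-only logging: forward each chunk as soon as it arrives."""
    audit_log.append("logged: status code + headers only")
    for chunk in stream:
        yield chunk  # the client sees the first chunk immediately

def buffering_proxy(stream, audit_log):
    """Body logging: must consume the ENTIRE stream before forwarding.
    With a stream that only ends when the client disconnects, the join()
    below never returns - while the client is still waiting for byte one."""
    body = b"".join(stream)  # blocks until the backend is completely done
    audit_log.append(f"logged: full body ({len(body)} bytes)")
    yield body  # only now does the client see anything
```

With a finite stream the buffering proxy merely delays the response and doubles memory; with an open-ended stream, nobody ever makes progress.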

The frustrating part? This behavior isn't called out clearly in the main policy documentation. The policy saves without any warnings, and you're left wondering why you suddenly get timeout errors. You find out after a lot of trial and error, or, if you're lucky, because someone (hi there ;) ) warns you first. Probably just one of those things that happens in preview.

Microsoft is aware of this issue; it's been reported and is on their radar.

The good news: there's a reliable workaround that gives you everything compliance needs.

The Solution: Log Metadata, Not Payloads

You can't log response bodies. Fine. Log everything else instead—and it turns out that's exactly what you need for compliance anyway:

<outbound>
    <log-to-eventhub logger-id="mcp-audit-logger">
        @{
            var mcpMethod = "";
            var mcpId = "";

            // Extract MCP method from request body (safely)
            try {
                var requestBody = context.Request.Body.As<JObject>(preserveContent: true);
                mcpMethod = requestBody["method"]?.ToString() ?? "";
                mcpId = requestBody["id"]?.ToString() ?? "";
            } catch {
                mcpMethod = "parse-error";
            }

            return new JObject(
                // Request metadata
                new JProperty("timestamp", DateTime.UtcNow.ToString("o")),
                new JProperty("requestId", context.RequestId),
                new JProperty("subscriptionId", context.Subscription?.Id ?? "none"),
                new JProperty("subscriptionName", context.Subscription?.Name ?? "none"),

                // MCP-specific
                new JProperty("mcpMethod", mcpMethod),
                new JProperty("mcpId", mcpId),

                // Response metadata (SAFE - no body access)
                new JProperty("statusCode", context.Response.StatusCode),
                new JProperty("statusReason", context.Response.StatusReason),

                // Timing
                new JProperty("elapsedMs", context.Elapsed.TotalMilliseconds),

                // Client info
                new JProperty("clientIp", context.Request.IpAddress),
                new JProperty("userAgent", context.Request.Headers.GetValueOrDefault("User-Agent", ""))
            ).ToString();
        }
    </log-to-eventhub>
    <base />
</outbound>

Key points:

  • context.Request.Body.As<JObject>(preserveContent: true) is safe in <outbound>
  • context.Response.StatusCode is safe
  • context.Response.Body is NOT safe
  • Wrap parsing in try-catch for malformed requests
  • If you have any other headers you want to log, you can add them quite easily to the JObject.
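To make the record shape concrete, here's a small Python sketch of the same metadata-only record the policy builds. The function name and defaults are mine; the field names mirror the policy above.

```python
import json
from datetime import datetime, timezone

def build_audit_record(request_body, status_code: int, elapsed_ms: float,
                       subscription: str = "none", client_ip: str = "",
                       user_agent: str = "") -> str:
    """Same shape as the policy's JObject: metadata only, never a payload."""
    is_dict = isinstance(request_body, dict)
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "subscriptionName": subscription,
        "mcpMethod": request_body.get("method", "") if is_dict else "parse-error",
        "mcpId": str(request_body.get("id", "")) if is_dict else "",
        "statusCode": status_code,
        "elapsedMs": elapsed_ms,
        "clientIp": client_ip,
        "userAgent": user_agent,
    })
```

Note what's *absent*: no request payload, no response payload. That's the whole trick.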

Logging Tool Discovery

Tracking who is executing tools is necessary, but what about who's discovering your tools? Think of it like an IP port scan: enumeration comes before the attack. To log this activity, a policy snippet needs to be added to the <inbound> section, because a tools/list request never hits any outbound services (and therefore never hits outbound policies).

<inbound>
    <!-- After security checks, before backend -->
    <choose>
        <when condition="@(context.Request.Body.As<JObject>(preserveContent: true)["method"]?.ToString() == "tools/list")">
            <log-to-eventhub logger-id="mcp-audit-logger" partition-id="0">
                @{
                    return new JObject(
                        new JProperty("eventType", "discovery"),
                        new JProperty("timestamp", DateTime.UtcNow.ToString("o")),
                        new JProperty("requestId", context.RequestId),
                        new JProperty("subscriptionId", context.Subscription?.Id ?? "none"),
                        new JProperty("subscriptionName", context.Subscription?.Name ?? "none"),
                        new JProperty("clientIp", context.Request.IpAddress),
                        new JProperty("userAgent", context.Request.Headers.GetValueOrDefault("User-Agent", "")),
                        new JProperty("mcpMethod", "tools/list")
                    ).ToString();
                }
            </log-to-eventhub>
        </when>
    </choose>
</inbound>

Why log discovery separately?

  • Security monitoring: Who is probing for tools, and are they allowed to? Discovery is an easily forgotten attack surface, so log it.
  • Usage analytics: Track which clients are discovering vs actually using tools. If there is an overload of tool discovery calls, you might have an issue in one of your agents.
  • Compliance: Auditors want to know who accessed what capabilities, even if they didn't invoke them
  • Performance: Separate partition for discovery logs (partition-id="0") keeps them isolated from invocation logs

This gives you the complete picture: discovery → invocation → response → errors.
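On the consumer side, that separation pays off quickly. Here's a small Python sketch (the function name is mine) that splits a batch of audit events into discovery versus invocation counts per subscription, which is exactly the signal you need for the "overload of discovery calls" case above:

```python
from collections import Counter

def discovery_vs_invocation(events):
    """Per-subscription counts of tools/list vs tools/call events.
    A subscription with lots of discovery and no invocation is either
    a scanner or a misconfigured agent - both worth a look."""
    counts = {}
    for e in events:
        sub = e.get("subscriptionName", "none")
        method = e.get("mcpMethod", "")
        if method in ("tools/list", "tools/call"):
            counts.setdefault(sub, Counter())[method] += 1
    return counts
```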


Setting Up Event Hub Logging

Reality check: You need Event Hub for production audit logging. Application Insights alone won't give you the retention, queryability, and compliance guarantees you need. Yes, it's another Azure service. Yes, it adds complexity. But it's the right tool for immutable audit trails. Event Hub also very easily integrates with tools like Datadog or Grafana, or whatever SIEM you use for your end product.

If you're running a small pilot, you can skip this and use App Insights only. For regulated environments? Event Hub is non-negotiable.

1. Create Event Hub

# Variables
RESOURCE_GROUP="rg-apim-mcp"
LOCATION="westeurope"
EVENTHUB_NAMESPACE="eh-apim-mcp-logs"
EVENTHUB_NAME="mcp-audit-logs"

# Create namespace
az eventhubs namespace create \
  --resource-group $RESOURCE_GROUP \
  --name $EVENTHUB_NAMESPACE \
  --location $LOCATION \
  --sku Standard

# Create event hub
az eventhubs eventhub create \
  --resource-group $RESOURCE_GROUP \
  --namespace-name $EVENTHUB_NAMESPACE \
  --name $EVENTHUB_NAME \
  --partition-count 4 \
  --message-retention 7

# Get connection string
az eventhubs namespace authorization-rule keys list \
  --resource-group $RESOURCE_GROUP \
  --namespace-name $EVENTHUB_NAMESPACE \
  --name RootManageSharedAccessKey \
  --query primaryConnectionString -o tsv

2. Create APIM Logger

# Get APIM instance
APIM_NAME="apim-mcp-prod"

# Create logger
az apim logger create \
  --resource-group $RESOURCE_GROUP \
  --service-name $APIM_NAME \
  --logger-id "mcp-audit-logger" \
  --logger-type azureEventHub \
  --connection-string "Endpoint=sb://eh-apim-mcp-logs.servicebus.windows.net/;..." \
  --description "MCP audit logging"

Or via Azure Portal:

  1. Navigate to your APIM instance
  2. APIs → Loggers → + Add
  3. Name: mcp-audit-logger
  4. Type: Azure Event Hub
  5. Connection string: (paste from above)
  6. Create

3. Configure Named Values (Optional)

If you're managing multiple environments (dev/staging/prod), use named values. If you're just testing, hard-code it and move on:

az apim nv create \
  --resource-group $RESOURCE_GROUP \
  --service-name $APIM_NAME \
  --named-value-id "audit-logger-id" \
  --display-name "audit-logger-id" \
  --value "mcp-audit-logger"

Then reference in policy:

<log-to-eventhub logger-id="{{audit-logger-id}}">

Complete Production Logging Policy

Here's the full policy with audit logging integrated:

<policies>
    <inbound>
        <!-- Security policies from Part 2 -->
        <choose>
            <when condition="@(context.Request.Headers.GetValueOrDefault(\"Ocp-Apim-Subscription-Key\", \"\") == \"\")">
                <return-response>
                    <set-status code="401" reason="Unauthorized" />
                    <set-body>{"error": "Subscription key required"}</set-body>
                </return-response>
            </when>
        </choose>

        <set-variable name="originalSubKey" 
                      value="@(context.Request.Headers.GetValueOrDefault(\"Ocp-Apim-Subscription-Key\", \"\"))" />
        <set-variable name="originalJwtToken" 
                      value="@(context.Request.Headers.GetValueOrDefault(\"Authorization\", \"\"))" />

        <!-- Store request timestamp for elapsed time calculation -->
        <set-variable name="requestStartTime" value="@(DateTime.UtcNow)" />

        <base />

        <rate-limit calls="100" renewal-period="60" />

        <choose>
            <when condition="@(context.Subscription == null || context.Subscription.Id == null)">
                <return-response>
                    <set-status code="401" reason="Unauthorized" />
                    <set-body>{"error": "Invalid subscription key"}</set-body>
                </return-response>
            </when>
        </choose>

        <set-header name="Ocp-Apim-Subscription-Key" exists-action="override">
            <value>@((string)context.Variables["originalSubKey"])</value>
        </set-header>
        <set-header name="Authorization" exists-action="override">
            <value>@((string)context.Variables["originalJwtToken"])</value>
        </set-header>

        <!-- Distributed tracing - preserve existing or generate new -->
        <set-header name="X-Request-ID" exists-action="skip">
            <value>@(context.RequestId)</value>
        </set-header>

        <!-- Log tool discovery requests -->
        <choose>
            <when condition="@(context.Request.Body.As<JObject>(preserveContent: true)["method"]?.ToString() == "tools/list")">
                <log-to-eventhub logger-id="mcp-audit-logger" partition-id="0">
                    @{
                        return new JObject(
                            new JProperty("eventType", "discovery"),
                            new JProperty("timestamp", DateTime.UtcNow.ToString("o")),
                            new JProperty("requestId", context.RequestId),
                            new JProperty("subscriptionId", context.Subscription?.Id ?? "none"),
                            new JProperty("subscriptionName", context.Subscription?.Name ?? "none"),
                            new JProperty("clientIp", context.Request.IpAddress),
                            new JProperty("userAgent", context.Request.Headers.GetValueOrDefault("User-Agent", "")),
                            new JProperty("mcpMethod", "tools/list")
                        ).ToString();
                    }
                </log-to-eventhub>
            </when>
        </choose>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <!-- Audit logging (SAFE - no response body access) -->
        <log-to-eventhub logger-id="mcp-audit-logger">
            @{
                var mcpMethod = "";
                var mcpId = "";
                var userId = "";

                // Extract MCP details
                try {
                    var requestBody = context.Request.Body.As<JObject>(preserveContent: true);
                    mcpMethod = requestBody["method"]?.ToString() ?? "";
                    mcpId = requestBody["id"]?.ToString() ?? "";
                } catch {
                    mcpMethod = "parse-error";
                }

                // Extract user ID from JWT (if present)
                try {
                    var authHeader = context.Request.Headers.GetValueOrDefault("Authorization", "");
                    if (!string.IsNullOrEmpty(authHeader) && authHeader.StartsWith("Bearer ")) {
                        var token = authHeader.Substring(7);
                        // Use APIM's built-in AsJwt() helper;
                        // JwtSecurityTokenHandler isn't available in policy expressions
                        var jwt = token.AsJwt();
                        if (jwt != null && jwt.Claims.ContainsKey("sub")) {
                            userId = jwt.Claims["sub"].FirstOrDefault() ?? "";
                        }
                    }
                } catch {
                    userId = "jwt-parse-error";
                }

                return new JObject(
                    // Timestamps
                    new JProperty("timestamp", DateTime.UtcNow.ToString("o")),
                    new JProperty("requestStartTime", ((DateTime)context.Variables["requestStartTime"]).ToString("o")),

                    // Request metadata
                    new JProperty("requestId", context.RequestId),
                    new JProperty("operationId", context.Operation?.Id ?? ""),
                    new JProperty("apiId", context.Api?.Id ?? ""),

                    // Subscription details
                    new JProperty("subscriptionId", context.Subscription?.Id ?? "none"),
                    new JProperty("subscriptionName", context.Subscription?.Name ?? "none"),

                    // User context
                    new JProperty("userId", userId),
                    new JProperty("clientIp", context.Request.IpAddress),
                    new JProperty("userAgent", context.Request.Headers.GetValueOrDefault("User-Agent", "")),

                    // MCP-specific
                    new JProperty("mcpMethod", mcpMethod),
                    new JProperty("mcpId", mcpId),

                    // Response metadata (SAFE)
                    new JProperty("statusCode", context.Response.StatusCode),
                    new JProperty("statusReason", context.Response.StatusReason),

                    // Performance
                    new JProperty("elapsedMs", context.Elapsed.TotalMilliseconds),
                    new JProperty("backendTimeMs", context.Response.Headers.GetValueOrDefault("X-Backend-Time", "0")),

                    // Errors
                    new JProperty("isError", context.Response.StatusCode >= 400),
                    new JProperty("lastError", context.LastError?.Message ?? "")
                ).ToString();
            }
        </log-to-eventhub>

        <!-- Response headers - return the request ID (original or generated) -->
        <set-header name="X-Request-ID" exists-action="skip">
            <value>@(context.Request.Headers.GetValueOrDefault("X-Request-ID", context.RequestId))</value>
        </set-header>
        <set-header name="X-RateLimit-Limit" exists-action="override">
            <value>100</value>
        </set-header>
        <set-header name="X-RateLimit-Window" exists-action="override">
            <value>60</value>
        </set-header>

        <base />
    </outbound>
    <on-error>
        <!-- Error logging -->
        <log-to-eventhub logger-id="mcp-audit-logger">
            @{
                return new JObject(
                    new JProperty("timestamp", DateTime.UtcNow.ToString("o")),
                    new JProperty("requestId", context.RequestId),
                    new JProperty("subscriptionId", context.Subscription?.Id ?? "none"),
                    new JProperty("isError", true),
                    new JProperty("errorSource", context.LastError?.Source ?? ""),
                    new JProperty("errorReason", context.LastError?.Reason ?? ""),
                    new JProperty("errorMessage", context.LastError?.Message ?? ""),
                    new JProperty("statusCode", context.Response.StatusCode),
                    new JProperty("elapsedMs", context.Elapsed.TotalMilliseconds)
                ).ToString();
            }
        </log-to-eventhub>

        <base />
    </on-error>
</policies>

Distributed Tracing

Request ID Propagation

Of course you want to be able to trace a request end-to-end. Because what's the use otherwise, besides burning money on logs?
The policy above includes:

<!-- Inbound: Preserve existing or generate new -->
<set-header name="X-Request-ID" exists-action="skip">
    <value>@(context.RequestId)</value>
</set-header>

<!-- Outbound: Return the request ID (original or generated) -->
<set-header name="X-Request-ID" exists-action="skip">
    <value>@(context.Request.Headers.GetValueOrDefault("X-Request-ID", context.RequestId))</value>
</set-header>

Use exists-action="skip" instead of "override". This preserves any X-Request-ID that's already present from upstream services (AI agents, proxies, gateways). Only generate a new one if it doesn't exist.

The inbound part makes sure the ID actually reaches the backend service; the outbound part returns it to the client. Of course, it's your backend's responsibility to use that ID in its own logging and to propagate it downstream as well.
We don't want to break the chain.
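If the skip-versus-override semantics feel abstract, here's a tiny Python model of what <set-header> does. This is my own simplification, not APIM code:

```python
def set_header(headers: dict, name: str, value: str, exists_action: str) -> dict:
    """Model of APIM's <set-header>: 'skip' preserves an existing header,
    'override' always replaces. Returns a new dict; the input is untouched."""
    out = dict(headers)
    if exists_action == "skip" and name in headers:
        return out  # keep whatever the upstream caller sent
    out[name] = value
    return out
```

With "skip", an AI agent's own X-Request-ID survives the gateway; with "override", you'd sever the trace at the APIM boundary.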


What to Monitor

Once Event Hub is streaming your logs, pipe them to whatever observability platform you're using:

Common destinations:

  • Grafana Cloud - Event Hub → Grafana Alloy → Grafana Cloud (uses Loki backend)
  • Datadog - Event Hub → Azure Function forwarder → Datadog
  • Splunk - Event Hub → Splunk HEC connector
  • Application Insights - Built-in Azure integration if you're all-in on Azure

The Event Hub JSON format (shown in the policy above) works with pretty much any log aggregator. You're getting structured JSON with timestamps, request IDs, and all the metadata you need.

Key metrics to track:

Request Metrics: Requests/min, error rate by status code, P50/P95/P99 latency, rate limit hits (429s)

MCP-Specific: tools/list vs tools/call distribution, per-tool error rates, per-subscription usage patterns

Security: Unauthorized attempts (401/403), invalid subscription keys, unusual client IPs, rate limit violations
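Most of these metrics fall straight out of the JSON events. A rough Python sketch of computing them from a batch (the function names and the nearest-rank percentile choice are mine):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile over a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]

def summarize(events):
    """Boil a batch of audit events down to the headline numbers."""
    latencies = [e["elapsedMs"] for e in events]
    errors = sum(1 for e in events if e.get("statusCode", 0) >= 400)
    return {
        "count": len(events),
        "errorRate": errors / len(events),
        "p95Ms": percentile(latencies, 95),
        "rateLimitHits": sum(1 for e in events if e.get("statusCode") == 429),
    }
```

In practice your observability platform computes these for you; the point is that the metadata-only events carry everything the computation needs.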


Performance Considerations

What to Log (And What Not To)

Always log: Request metadata, response status codes, timing, security context, and errors. These are safe, fast, and give you what compliance needs.

Never log: Response bodies (hangs requests), large payloads (kills performance), or sensitive data (violates compliance). Anything that slows down the critical path or exposes PII or PCI is out.

The metadata-only approach handles millions of requests without breaking a sweat. You don't want verbose logging slowing down the critical path and losing you money when users bail out.

Sampling Strategy

For high-volume APIs:

<!-- Log 100% of errors, 10% of successes -->
<choose>
    <when condition="@(context.Response.StatusCode >= 400)">
        <log-to-eventhub logger-id="mcp-audit-logger">
            @{ /* full logging */ }
        </log-to-eventhub>
    </when>
    <when condition="@(new Random().Next(100) < 10)">
        <log-to-eventhub logger-id="mcp-audit-logger">
            @{ /* sampled logging */ }
        </log-to-eventhub>
    </when>
</choose>
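The same decision logic in plain Python, if it helps to reason about the volume reduction (the function name and defaults are mine):

```python
import random

def should_log(status_code: int, sample_pct: int = 10, rng=random) -> bool:
    """Mirror of the sampling policy: every error is logged,
    successes are sampled at sample_pct percent."""
    if status_code >= 400:
        return True
    return rng.randrange(100) < sample_pct
```

At a 10% sample rate and a 1% error rate, you keep roughly 11% of your log volume while retaining every failure.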

Compliance & Audit Requirements

Data Retention

(Do check this with your compliance team, these are just guidelines)

  • Event Hub: 7-90 days retention
  • Log Analytics: 30-365 days retention
  • Archive Storage: Unlimited (cold storage)

Audit Trail Fields

For compliance (SOC2, ISO 27001):

{
  "timestamp": "2025-11-05T10:30:45.123Z",
  "requestId": "abc-123-def-456",
  "userId": "user@company.com",
  "subscriptionName": "Company-Production",
  "mcpMethod": "tools/call",
  "toolName": "processPayment",
  "statusCode": 200,
  "clientIp": "203.0.113.42",
  "action": "execute",
  "resource": "/api/payment",
  "result": "success"
}
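A quick way to keep yourself honest before an audit is a schema check on the events you actually emit. A minimal Python sketch, with a field list based on the example above (adjust it to your auditor's actual requirements):

```python
# Fields an auditor will typically ask about - tune to your compliance regime.
REQUIRED_AUDIT_FIELDS = {
    "timestamp", "requestId", "userId", "subscriptionName",
    "mcpMethod", "statusCode", "clientIp",
}

def missing_audit_fields(record: dict) -> set:
    """Return the required fields that are absent or empty in a record."""
    return {f for f in REQUIRED_AUDIT_FIELDS
            if f not in record or record[f] in ("", None)}
```

Run it over a sample of production events in CI; an empty result means your policy is still emitting what compliance signed off on.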

Note: If you have a WAF in front of your APIM, make sure to collect WAF logs as well, because they contain important security events (blocked attacks, suspicious IPs, etc).

GDPR Considerations

During my career I have made my mistakes, and seen others make theirs. To save you from having to sanitize logs after the fact, which is probably going to ruin your weekend, here are some guidelines:

For user data:

  • Don't log full request/response bodies (they can contain PII, PCI data)
  • Log user IDs, but only if needed (pseudonymized from JWT claims)
  • Support data deletion requests (Event Hub retention handles this)
  • Implement retention policies (compliance will ask for this)
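For the pseudonymized user IDs, a keyed hash works well. A minimal Python sketch using only the standard library (the key shown is hypothetical; keep the real one in Key Vault, never in code):

```python
import hashlib
import hmac

def pseudonymize(user_id: str, secret: bytes) -> str:
    """Keyed hash of the user ID: stable, so you can still correlate one
    user's requests across logs, but not reversible without the key.
    Rotating the key severs the link, which helps with deletion requests."""
    return hmac.new(secret, user_id.encode(), hashlib.sha256).hexdigest()[:16]
```

Plain unsalted hashing is not enough for low-entropy identifiers like email addresses; the keyed variant is what makes this defensible as pseudonymization.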

Before You Go Live

Core Requirements (Everyone needs this):

  • Event Hub logger configured
  • Audit logging policy deployed
  • Response body access removed (no hanging, although you will notice this quite fast)
  • Request IDs propagated to backends (preserving upstream IDs)
  • Log retention policies set (check with compliance team)
  • Sampling configured for high volume (if needed)

Monitoring Stack (Choose your poison):

  • Event Hub → Your SIEM/observability platform (Datadog, Grafana, Splunk, etc.)
  • Or Application Insights (if you're all-in on Azure)
  • Dashboards created (whatever tool you use)
  • Alerts configured for errors/latency/security events to disrupt your sleep

What's Next

Security: locked down. Observability: sorted. Now let's tackle the elephant in the room: automation.

Coming soon: Part 4 - GitOps for Azure APIM MCP: Custom Automation Guide

MCP servers don't support Terraform or ARM templates yet. Microsoft knows this is a gap and it's on the roadmap. In the meantime, I'll show you ideas for automating deployments using custom REST API scripts and CI/CD pipelines, because manual Portal clicks don't scale. This is still something we have to implement ourselves, but let's drill down on the concept!


I'm a Product Architect at Backbase, where I design cloud-native banking platforms serving millions of users. The patterns in this series come from real production implementations at enterprise scale. Views are my own.

How are you handling APIM observability? Share your patterns in the comments.

Follow me on LinkedIn for more Azure and platform engineering content.
