DEV Community

Cover image for Implementing Zero-Trust Networking and Identity for Microsoft Foundry Agents
Jubin Soni
Jubin Soni Subscriber

Posted on

Implementing Zero-Trust Networking and Identity for Microsoft Foundry Agents

Once your multi-agent system (Parts 6-8) is functionally solid, the question that comes up in every enterprise security review is the same: how do you know an agent is only doing what it's authorized to do, over a network path you actually trust, with an audit trail that holds up when someone asks "what did this agent do and why" six months later? This post covers private networking, Entra Agent ID, and audit trail design.

Zero-trust network boundary

Zero-trust network boundary description

Private endpoints: keeping traffic off the public internet

By default, calls from your application to Foundry Models travel over the public internet (encrypted, but publicly routable). For regulated workloads, route through Private Link instead:

resource foundryPrivateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: 'pe-foundry-project'
  location: location
  properties: {
    subnet: {
      id: subnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'foundry-connection'
        properties: {
          privateLinkServiceId: foundryResourceId
          groupIds: ['account']
        }
      }
    ]
  }
}
Enter fullscreen mode Exit fullscreen mode

Pair this with a Managed VNET at the project level so that outbound calls from your compute (the orchestration layer running the agent code from Part 7) never leave the private network boundary. The failure mode to check for: a dependency library making an unexpected public call (a package doing telemetry callback, for instance) that bypasses your intended network boundary entirely — audit your dependencies' network behavior, not just your own code's.

Entra Agent ID: identity for autonomous agents, not just services

Traditional managed identity was designed for services calling APIs on a fixed schedule with predictable behavior. Entra Agent ID extends this model specifically for autonomous agents that make independent decisions — the identity model needs to answer not just "is this call authenticated" but "was this agent authorized to take this specific action in this specific context."

from azure.identity import DefaultAzureCredential

class AgentIdentityContext:
    def __init__(self, agent_id: str, allowed_actions: list[str]):
        self.agent_id = agent_id
        self.allowed_actions = allowed_actions
        self.credential = DefaultAzureCredential()

    def authorize_action(self, action: str, resource_scope: str) -> bool:
        if action not in self.allowed_actions:
            log_authorization_denial(self.agent_id, action, resource_scope)
            return False
        token = self.credential.get_token(resource_scope)
        return token is not None

def execute_agent_action(agent_context: AgentIdentityContext, action: str, resource_scope: str, fn):
    if not agent_context.authorize_action(action, resource_scope):
        raise PermissionError(f"Agent {agent_context.agent_id} not authorized for {action}")
    return fn()
Enter fullscreen mode Exit fullscreen mode

The key design point: each agent in your multi-agent chain gets its own scoped identity with its own explicit allowed_actions list, rather than all agents sharing one broad service principal. This means a compromised or misbehaving refund agent can't accidentally (or maliciously, if prompt-injected) invoke actions scoped only to the fraud-check agent.

Entra Image description

Security control What it protects against Enforced where
Private Link / Managed VNET Traffic leaving the trusted network boundary Network layer
Entra Agent ID scoped identity An agent invoking actions outside its role Identity/authorization layer, in code
Structured audit log with reasoning Inability to explain why an action was taken Application logging layer
Authorization enforced in code, not model claims Prompt injection claiming false authorization Application logic, never the model's own output

Data residency and customer-managed keys

Private networking and identity handle "who can reach this system and act as whom." A separate, equally common enterprise requirement is "where does this data physically live, and who holds the encryption keys" — data residency and customer-managed keys (CMK) address this and get missed by teams who've handled networking and identity but stop there.

For data residency, confirm which region your Foundry project, its underlying Azure AI Search index, and any storage backing Foundry IQ actually run in — these can silently default to a different region than your primary application if not explicitly configured, which is a real problem for workloads subject to data-sovereignty requirements (GDPR-adjacent obligations, government contracts with residency clauses, etc.):

resource foundryProject 'Microsoft.CognitiveServices/accounts/projects@2024-10-01' = {
  name: 'your-project'
  location: 'westeurope'  // must match your data residency requirement explicitly —
                            // don't rely on the default location of the parent resource group
  properties: {
    // ...
  }
}
Enter fullscreen mode Exit fullscreen mode

For customer-managed keys, the default is Microsoft-managed encryption at rest, which is adequate for most workloads but insufficient for some regulated industries that require the ability to revoke access to data by rotating or deleting a key the organization itself controls:

resource foundryAccount 'Microsoft.CognitiveServices/accounts@2024-10-01' = {
  properties: {
    encryption: {
      keySource: 'Microsoft.KeyVault'
      keyVaultProperties: {
        keyVaultUri: keyVaultUri
        keyName: 'foundry-cmk'
      }
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Neither of these is something you retrofit easily after data has already been written under the default configuration — data residency and CMK are decisions to make explicitly during initial provisioning of the Foundry project, not something to leave as "we'll configure that before the security review." If your compliance requirements are still being finalized when you provision, default to the more restrictive configuration (explicit region pinning, CMK) rather than the platform default, since loosening a restriction later is far easier than migrating already-written data into a stricter configuration retroactively.

Audit trails that actually hold up

A log line saying "agent called refund API" is not an audit trail — when someone asks "why did the agent decide to issue this refund," you need the reasoning captured, not just the action:

import json
import time

def log_agent_decision(agent_id, state, decision, reasoning, authorized_by):
    audit_entry = {
        "timestamp": time.time(),
        "agent_id": agent_id,
        "task_id": state.task_id,
        "decision": decision,
        "reasoning_summary": reasoning,   # the model's stated rationale, captured verbatim
        "state_snapshot": {
            "order_id": state.order_id,
            "fraud_check_result": state.fraud_check_result,
        },
        "authorized_by": authorized_by,   # which identity/policy allowed this
        "input_hash": hash_conversation_context(state),  # for reproducibility without storing PII long-term
    }
    write_to_audit_log(audit_entry)
Enter fullscreen mode Exit fullscreen mode

Three things a real audit trail needs that a simple action log doesn't: the model's stated reasoning at the decision point (not just the outcome), the authorization context (which identity/policy permitted the action), and enough state snapshot to reconstruct why the decision made sense given the inputs — without necessarily storing full raw PII long-term, which is why hashing the input context is often the right tradeoff between auditability and data minimization.

Testing the security boundary, not just the happy path

Write tests that deliberately attempt to violate the boundaries you've set up:

  • Attempt to call an action outside an agent's allowed_actions list and confirm it's denied and logged.
  • Attempt a request that would resolve to a public endpoint and confirm the network boundary rejects it.
  • Simulate a prompt-injection attempt that tries to get an agent to claim authorization for an action it doesn't have, and confirm the authorization check (not the model's own claim) is what gates execution.

That last point matters most: authorization must be enforced in code against the identity's actual permitted action list, never based on what the model says about its own permissions in its output text.

References

Top comments (0)