Separating Agent Tool Calls from Authorization and Evidence

Kazuma Horishita

As LLM applications evolve from chat interfaces into agentic systems that call tools, APIs, workflows, and external services, the security question changes.

The question is no longer only:

Did the model generate the right answer?

It becomes:

What happens when model output turns into an actual action?

For example, a model may generate something like this:

{
  "tool": "send_email",
  "args": {
    "to": "external@example.com",
    "subject": "Report",
    "body": "..."
  }
}

At that point, the most important question is not whether the JSON is syntactically valid.

The real questions are:

  • On whose behalf is this email being sent?
  • Is this destination allowed?
  • Does the body contain sensitive information?
  • Was this action influenced by untrusted retrieved content?
  • Is this a high-impact action?
  • Was it authorized at execution time?
  • Will the resulting allow, deny, defer, or escalate decision be recorded as evidence?

A tool call is not authorization.

A model-generated tool call is a proposed action.

That proposed action still needs to pass through authorization, enforcement, and evidence boundaries before execution.


TL;DR

For agentic AI systems, model-generated tool calls should be treated as proposed actions, not executable authority.

A safer design separates:

Layer          Role
Model          Proposes an action
Authorization  Decides whether the action is allowed
Enforcement    Ensures only the authorized action executes
Evidence       Records what was proposed, decided, executed, denied, or escalated

A minimal pattern looks like this:

Model
  ↓
Proposed Tool Call
  ↓
Authorization Decision Point
  ↓
Tool Dispatch Enforcement Point
  ↓
Tool / API Execution
  ↓
Evidence Writer

The key implementation ideas are:

  • Do not execute model-generated tool calls directly.
  • Normalize proposed actions before authorization.
  • Treat backend authorization as still required.
  • Track untrusted input sources conservatively.
  • Bind authorization decisions to action hashes, principal, scope, and expiry.
  • Record not only successful execution, but also deny, defer, escalate, freeze, and reauthorization decisions.
  • Make evidence tamper-resistant and separate from the agent runtime where possible.
  • Treat human approval as a control that must be designed, not a magic safety layer.

1. The problem: tool calls are proposed actions

Tool calling is powerful.

It lets an AI system do things like:

  • send messages,
  • query databases,
  • update tickets,
  • read documents,
  • summarize email,
  • call internal APIs,
  • create pull requests,
  • change access rights,
  • trigger deployments,
  • write persistent memory,
  • delegate work to another agent.

But tool calling also creates a new security boundary.

The model may be influenced by many different sources:

  • user instructions,
  • system prompts,
  • retrieved documents,
  • external emails,
  • web pages,
  • issue comments,
  • chat logs,
  • support tickets,
  • previous tool outputs,
  • memory,
  • workflow state.

To the model, these may all become “context.”

From a security perspective, they are not equivalent.

An external email is not user intent.

A web page is not organizational approval.

A GitHub issue is not production deployment authorization.

A retrieved document is not permission to exfiltrate its contents.

A model-generated tool call is not authority.

That is the core problem.

When a model proposes a tool call, the system should treat it as an action request that still needs independent evaluation.


2. The problem with direct tool execution

A simple agent implementation may look like this:

tool_call = model.generate_tool_call(context)
result = dispatch_tool(tool_call)

This is easy to build.

It is also dangerous.

There is no explicit authorization boundary.

There is no clear evidence record.

There is no check that the model-generated action matches the user’s authority, workflow purpose, data classification, destination policy, or runtime state.

A safer system should ask:

  • Who is the principal?
  • What action is being requested?
  • Which tool will be called?
  • What resource is affected?
  • Is the destination internal or external?
  • Was untrusted content involved?
  • Is this action high-impact?
  • Which policy applies?
  • Can evidence be written?
  • Should this be allowed, denied, deferred, escalated, or frozen?

The action should only execute after that decision.


3. Minimal architecture

A minimal design separates proposed action generation from authorization and execution.

User / Workflow
      ↓
Agent Runtime
      ↓
Model
      ↓
Proposed Tool Call
      ↓
Authorization Decision Point
      ↓
Tool Dispatch Enforcement Point
      ↓
Tool / API
      ↓
Evidence Writer

The model proposes.

The authorization layer decides.

The enforcement layer constrains.

The evidence layer records.

A proposed tool call should be normalized into a structured action request before authorization.

Example:

{
  "principal": {
    "principal_id": "user_123",
    "principal_type": "user"
  },
  "agent": {
    "agent_id": "support_agent",
    "agent_instance_id": "support_agent_001"
  },
  "requested_action": {
    "tool": "send_email",
    "action_type": "external_communication",
    "resource_type": "email_message",
    "destination": {
      "address": "external@example.com",
      "domain": "example.com",
      "classification": "external"
    },
    "data_classification": "internal",
    "attachment_present": false,
    "requires_human_review": true
  },
  "context": {
    "source": "user_request",
    "contains_untrusted_content": true
  }
}
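In code, the normalization step might look like the following sketch. The field names, the EXTERNAL_COMM_TOOLS set, and the classify_destination helper are illustrative assumptions, not part of any specific library:

# Sketch: normalize a raw model-generated tool call into a structured
# action request before authorization. All names here are illustrative.

EXTERNAL_COMM_TOOLS = {"send_email", "post_webhook"}
INTERNAL_DOMAINS = {"ourcorp.example"}  # hypothetical internal domain list

def classify_destination(address: str) -> dict:
    """Classify an email destination as internal or external."""
    domain = address.split("@")[-1].lower()
    classification = "internal" if domain in INTERNAL_DOMAINS else "external"
    return {"address": address, "domain": domain, "classification": classification}

def normalize_tool_call(raw_call: dict, principal: dict, agent: dict, context: dict) -> dict:
    """Turn a raw {"tool": ..., "args": ...} call into an action request."""
    tool = raw_call["tool"]
    args = raw_call.get("args", {})
    requested_action = {
        "tool": tool,
        "action_type": ("external_communication"
                        if tool in EXTERNAL_COMM_TOOLS else "internal_operation"),
    }
    if tool == "send_email":
        requested_action["resource_type"] = "email_message"
        requested_action["destination"] = classify_destination(args["to"])
        requested_action["attachment_present"] = bool(args.get("attachments"))
    return {
        "principal": principal,
        "agent": agent,
        "requested_action": requested_action,
        "context": {
            "source": context.get("source", "unknown"),
            # conservative default: assume untrusted unless proven otherwise
            "contains_untrusted_content": context.get("contains_untrusted_content", True),
        },
    }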

The authorization layer then returns a decision.

{
  "decision": "deny",
  "reason": "external communication includes content influenced by untrusted retrieved input"
}

Only an allowed action should proceed to tool dispatch.


4. Where to implement the authorization boundary

The Authorization Decision Point should usually sit after the model proposes a tool call and before the tool dispatcher executes it.

But in practice, you should not rely on a single control point.

A practical implementation may look like this:

Location                  Responsibility
Agent Runtime             Normalize proposed tool calls and request authorization
Tool Router / Dispatcher  Verify authorization decision ID and action hash before execution
Backend API               Re-check RBAC, ABAC, tenant boundary, ownership, and business rules
Evidence Pipeline         Record allow, deny, defer, escalate, freeze, and execution results

This is important:

Agent-side authorization does not replace backend authorization.

The backend must still enforce normal security controls.

The agent layer answers a different question:

Should this proposed tool call be allowed to reach execution at all?

The backend still answers:

Is this API request allowed for this authenticated principal, tenant, resource, and operation?

Both are needed.


5. Implementation sketch

A basic implementation may look like this:

proposed_action = model.generate_tool_call(context)

authorization_decision = authorize(
    principal=current_user,
    agent=agent_instance,
    action=proposed_action,
    context=context,
    policy=policy_store,
    runtime_state=runtime_state
)

write_evidence(
    principal=current_user,
    agent=agent_instance,
    proposed_action=proposed_action,
    authorization_decision=authorization_decision,
    context=context
)

if authorization_decision.decision == "allow":
    result = dispatch_tool(proposed_action)
    write_result_evidence(result)
else:
    handle_non_execution(authorization_decision)

The important part is that authorize() happens before dispatch_tool().

Also, the authorization decision itself is recorded.

Non-execution should also be recorded.

A denial can be just as important as an execution event.

In a real system, authorization or evidence services may fail.

High-impact actions should not silently proceed when the system cannot authorize or record them.

try:
    authorization_decision = authorize(
        principal=current_user,
        agent=agent_instance,
        action=proposed_action,
        context=context,
        policy=policy_store,
        runtime_state=runtime_state
    )
except AuthorizationServiceUnavailable:
    authorization_decision = Decision(
        decision="defer",
        reason="authorization service unavailable"
    )

evidence_written = write_evidence(
    principal=current_user,
    agent=agent_instance,
    proposed_action=proposed_action,
    authorization_decision=authorization_decision,
    context=context
)

if is_high_impact(proposed_action) and not evidence_written:
    raise ExecutionBlocked("evidence required but could not be written")

if authorization_decision.decision == "allow":
    result = dispatch_tool(proposed_action)
    write_result_evidence(result)
else:
    handle_non_execution(authorization_decision)

For high-impact actions, failure to authorize or failure to write evidence should often result in deny or defer, not implicit allow.


6. Source tracking for untrusted input

A difficult question is:

How do we know whether a proposed tool call was influenced by untrusted input?

In practice, we usually cannot perfectly prove semantic influence.

We cannot fully inspect the model’s internal reasoning process.

So the goal should not be “perfect proof of influence.”

A more practical goal is conservative source tracking.

For example:

  • external emails get trust_level: untrusted,
  • web pages get trust_level: untrusted,
  • customer attachments get trust_level: untrusted,
  • retrieved documents carry source_id, origin, document_type, and retrieved_at,
  • contexts containing untrusted sources are marked accordingly,
  • tool arguments derived from external sources retain source IDs,
  • evidence includes confidence and limitations.

Example:

{
  "context": {
    "input_sources": [
      {
        "source_id": "doc_ext_456",
        "source_type": "retrieved_document",
        "origin": "external",
        "trust_level": "untrusted"
      }
    ],
    "contains_untrusted_content": true,
    "input_influence_assessment": {
      "determined_by": "source_tracker",
      "method": "conservative_context_tainting",
      "confidence": "medium",
      "limitations": "does not prove semantic influence; tracks untrusted sources present in context"
    }
  }
}

This is not a claim that the system understands the model’s internal reasoning.

It is a claim that the system knows which sources were present when the high-impact action was proposed.

That difference matters.

If you ask the model whether it was influenced by malicious content, you may already be asking the compromised component to judge itself.

Source tracking should be outside the model where possible.
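A conservative tracker can live entirely outside the model. A minimal sketch, assuming a simple taint-everything rule; the class and field names are illustrative:

# Sketch: conservative context tainting outside the model.
# If any untrusted source enters the context, the whole context is tainted.

TRUSTED_ORIGINS = {"system_prompt", "internal_policy"}  # illustrative

class SourceTracker:
    def __init__(self):
        self.sources = []

    def add_source(self, source_id: str, source_type: str, origin: str) -> None:
        trust_level = "trusted" if origin in TRUSTED_ORIGINS else "untrusted"
        self.sources.append({
            "source_id": source_id,
            "source_type": source_type,
            "origin": origin,
            "trust_level": trust_level,
        })

    def context_annotation(self) -> dict:
        """Produce the context block attached to every proposed action."""
        contains_untrusted = any(
            s["trust_level"] == "untrusted" for s in self.sources
        )
        return {
            "input_sources": self.sources,
            "contains_untrusted_content": contains_untrusted,
            "input_influence_assessment": {
                "determined_by": "source_tracker",
                "method": "conservative_context_tainting",
                "confidence": "medium",
                "limitations": ("does not prove semantic influence; "
                                "tracks untrusted sources present in context"),
            },
        }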


7. Binding authorization decisions to action hashes

The Tool Dispatch Enforcement Point should not merely check whether a decision says "allow".

It should verify that the authorization decision applies to the exact action being executed.

Otherwise, an attacker or bug could modify the tool call after authorization.

For example, suppose the authorized destination was:

external@example.com

If the destination, message body, attachment, principal, scope, or resource changes before dispatch, the original authorization decision should no longer apply.

A practical authorization decision may include:

{
  "authorization_decision_id": "authz_decision_789",
  "decision": "allow",
  "action_hash": "sha256:...",
  "principal_id": "user_123",
  "tool": "send_email",
  "resource_type": "email_message",
  "destination": {
    "address": "external@example.com",
    "domain": "example.com",
    "classification": "external"
  },
  "scope": ["external_communication:send"],
  "policy_version": "external_communication_policy@2026-04-26",
  "expires_at": "2026-04-26T12:05:00Z",
  "decision_nonce": "nonce_abc123"
}

At dispatch time, the system should check:

  • authorization decision ID exists,
  • action hash matches the current tool call,
  • principal matches,
  • tool matches,
  • resource matches,
  • destination matches,
  • scope matches,
  • policy version is acceptable,
  • decision has not expired,
  • nonce has not been reused,
  • revocation or freeze state is not active.

An authorization decision should be bound to a specific action.

It should not be a reusable “allow token” for arbitrary future tool calls.
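At the enforcement point, a minimal binding check might look like this sketch. The canonical-JSON hashing, the decision field names, and the nonce and freeze registries are assumptions for illustration; checks for tool, resource, destination, scope, and policy version would follow the same pattern:

import hashlib
import json
from datetime import datetime, timezone

def action_hash(action: dict) -> str:
    """Hash a canonical JSON serialization of the normalized action."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_binding(decision: dict, action: dict, principal_id: str,
                   used_nonces: set, freeze_active: bool) -> bool:
    """Check that an allow decision still applies to this exact action."""
    if decision["decision"] != "allow":
        return False
    if decision["action_hash"] != action_hash(action):
        return False  # action was modified after authorization
    if decision["principal_id"] != principal_id:
        return False
    expires_at = datetime.fromisoformat(decision["expires_at"].replace("Z", "+00:00"))
    if expires_at < datetime.now(timezone.utc):
        return False  # decision expired
    if decision["decision_nonce"] in used_nonces:
        return False  # replay of a previous decision
    if freeze_active:
        return False  # revocation or freeze state is active
    used_nonces.add(decision["decision_nonce"])
    return True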


8. Allow, deny, defer, escalate, freeze, and reauthorization

Authorization does not have to be binary.

A useful decision model may include:

Decision                  When to use
allow                     The action is permitted
deny                      The action is clearly prohibited
defer                     Required information is missing
escalate                  A human or higher authority must review
freeze                    Runtime state changed and actions must pause
reauthorization_required  Original authorization assumptions changed

For example:

{
  "decision": "escalate",
  "reason": "high-impact external communication may include sensitive retrieved content",
  "required_review": "human_approval"
}
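In code, the non-binary decision set can be made explicit, for example as a small enum (a sketch; the names simply mirror the table above):

from enum import Enum

class Decision(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    DEFER = "defer"                    # required information is missing
    ESCALATE = "escalate"              # human or higher authority must review
    FREEZE = "freeze"                  # runtime risk state changed; pause
    REAUTHORIZATION_REQUIRED = "reauthorization_required"

# Only ALLOW may reach dispatch; every other value is a recorded non-execution.
EXECUTABLE_DECISIONS = {Decision.ALLOW}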

freeze is not just another word for delay.

It should represent a meaningful risk-state change.

Examples:

  • user authority was revoked,
  • session anomaly was detected,
  • tenant boundary mismatch was found,
  • target resource is under incident response,
  • downstream delegation expired,
  • external destination became blocked.

Non-execution decisions should also be recorded.

If an agent tried to perform a high-impact action and the system stopped it, that is useful evidence.

It can help with audits, incident review, policy tuning, and threat detection.


9. Evidence and auditability

For agentic AI tool calls, ordinary application logs may not be enough.

You may need to know:

  • which agent instance proposed the action,
  • which principal it acted for,
  • which tool was requested,
  • which resource was involved,
  • whether the action was high-impact,
  • which input sources were present,
  • whether untrusted input was involved,
  • which policy applied,
  • what authorization decision was made,
  • whether the action executed,
  • whether it was denied, deferred, escalated, or frozen,
  • what the result was.

Example evidence event for a denied tool call:

{
  "event_type": "agentic_action_denied",
  "timestamp": "2026-04-26T12:00:00Z",
  "agent": {
    "agent_id": "support_agent",
    "agent_instance_id": "support_agent_001"
  },
  "principal": {
    "principal_id": "user_123",
    "principal_type": "user"
  },
  "requested_action": {
    "tool": "send_email",
    "action_type": "external_communication",
    "resource_type": "email_message",
    "destination": {
      "address": "external@example.com",
      "domain": "example.com",
      "classification": "external"
    },
    "data_classification": "internal",
    "attachment_present": false
  },
  "authorization": {
    "decision": "deny",
    "policy_reference": "external_communication_policy@2026-04-26",
    "reason": "untrusted retrieved content influenced a high-impact external communication action"
  },
  "context": {
    "input_sources": [
      {
        "source_type": "retrieved_document",
        "origin": "external",
        "trust_level": "untrusted"
      }
    ],
    "input_influence_assessment": {
      "determined_by": "policy_engine",
      "method": "source_tracking",
      "confidence": "medium"
    }
  },
  "result": {
    "executed": false,
    "outcome": "blocked_at_authorization_boundary"
  }
}

The point is not just to record that something was denied.

The point is to preserve enough context to understand why.


10. Making evidence trustworthy

Evidence is only useful if it can be trusted.

A system should consider:

  • Who writes the evidence?
  • Can the agent runtime modify or delete it?
  • Is the evidence store append-only?
  • Is sensitive content over-collected?
  • What happens if evidence writing fails?

For high-impact actions, evidence should ideally be written to a system independent from the agent runtime.

Common patterns include:

  • append-only logs,
  • WORM storage,
  • object lock,
  • SIEM forwarding,
  • audit log pipelines,
  • cryptographic digests,
  • redaction of sensitive raw content,
  • correlation IDs across model, authorization, tool, and backend logs.

A key design question is:

Should a high-impact action be allowed if evidence cannot be written?

For many systems, the answer should be no.

If an external communication, access-rights change, financial transaction, or production change cannot be evidenced, the safer decision may be deny or defer.
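One common tamper-evidence pattern is hash chaining: each event carries a digest of the previous event, so any later edit or deletion breaks the chain. A minimal sketch, assuming in-memory storage for illustration; it complements, and does not replace, WORM storage or an independent audit pipeline:

import hashlib
import json

class HashChainedEvidenceLog:
    """Append-only, hash-chained evidence log (illustrative sketch)."""

    GENESIS = "sha256:" + hashlib.sha256(b"genesis").hexdigest()

    def __init__(self):
        self._events = []
        self._last_digest = self.GENESIS

    def append(self, event: dict) -> str:
        record = dict(event, previous_digest=self._last_digest)
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        digest = "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        record["digest"] = digest
        self._events.append(record)
        self._last_digest = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain; a modified or deleted event breaks it."""
        previous = self.GENESIS
        for record in self._events:
            if record["previous_digest"] != previous:
                return False
            body = {k: v for k, v in record.items() if k != "digest"}
            canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
            expected = "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
            if record["digest"] != expected:
                return False
            previous = expected
        return True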


11. Policy example

A simple policy example might look like this:

policies:
  - id: external_communication_policy
    match:
      action_type: external_communication
      destination.classification: external
    conditions:
      - data_classification in ["confidential", "internal"]
      - contains_untrusted_content == true
    decision: escalate
    required_review: human_approval
    evidence_required: true

  - id: production_change_policy
    match:
      action_type: production_system_change
    conditions:
      - principal.role not in ["sre", "release_manager"]
    decision: deny
    evidence_required: true

  - id: sensitive_read_policy
    match:
      action_type: sensitive_data_access
    conditions:
      - data_classification in ["confidential", "restricted"]
      - purpose not in ["user_requested_summary", "approved_workflow"]
    decision: deny
    evidence_required: true

This is only an illustrative example.

Real policies should align with the organization’s IAM model, data classification, tenant boundaries, business workflows, audit requirements, and risk appetite.
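To make the mapping from policy to decision concrete, here is a toy evaluator for policies of this shape. It is a sketch: conditions are written as Python predicates rather than the YAML expressions above, and a real deployment would more likely use a dedicated policy engine:

# Sketch: evaluate a normalized action request against simple policies.
# The first matching policy whose conditions all hold determines the decision.

def field(action: dict, dotted_key: str):
    """Resolve a dotted key such as "destination.classification"."""
    value = action
    for part in dotted_key.split("."):
        value = value.get(part) if isinstance(value, dict) else None
    return value

def evaluate(policies: list, request: dict) -> dict:
    action = request["requested_action"]
    for policy in policies:
        if all(field(action, k) == v for k, v in policy["match"].items()) \
                and all(cond(request) for cond in policy["conditions"]):
            return {
                "decision": policy["decision"],
                "policy_reference": policy["id"],
                "evidence_required": policy.get("evidence_required", True),
            }
    # default-deny is one safe choice when no policy explicitly decides
    return {"decision": "deny", "policy_reference": "default",
            "evidence_required": True}

external_communication_policy = {
    "id": "external_communication_policy",
    "match": {"action_type": "external_communication",
              "destination.classification": "external"},
    "conditions": [
        lambda r: r["requested_action"].get("data_classification")
                  in ("confidential", "internal"),
        lambda r: r["context"].get("contains_untrusted_content") is True,
    ],
    "decision": "escalate",
    "evidence_required": True,
}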


12. Read-only tool calls can still be high-impact

High-impact actions are not only write operations.

Read-only actions can also be high-impact.

Examples:

  • reading customer records,
  • searching internal documents,
  • reading Slack logs,
  • reading Gmail,
  • querying CRM data,
  • accessing source code,
  • reading secrets,
  • retrieving incident reports,
  • reading financial data.

A read-only tool call may place sensitive content into the model context.

That content may then influence a later external communication, file share, webhook, or API call.

So “read-only” does not automatically mean “low-risk.”

The risk depends on what is read, why it is read, who requested it, what context it enters, and what downstream actions can use it.


13. Human approval is not magic

Human approval can be useful.

But it is not automatically meaningful.

In real systems, human approval can fail because:

  • reviewers do not read the details,
  • approval prompts are too long,
  • reviewers trust the model’s natural-language explanation,
  • approval fatigue develops,
  • untrusted input influence is hidden,
  • sensitive data classification is unclear,
  • diffs are not visible,
  • downstream consequences are not explained.

So if human approval is required, reviewers should be shown more than a model-generated summary.

They should see structured information such as:

  • normalized tool call,
  • destination,
  • target resource,
  • data classification,
  • whether untrusted input was involved,
  • diff or change summary,
  • policy reason,
  • expected impact,
  • evidence status.

Human approval is not “safe because a human clicked approve.”

It is only useful when the human receives enough information to make a meaningful decision.
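A structured approval request, as opposed to a free-text model summary, might look like this (the field names are illustrative):

{
  "approval_request_id": "approval_456",
  "normalized_tool_call": {
    "tool": "send_email",
    "action_type": "external_communication",
    "destination": {
      "address": "external@example.com",
      "classification": "external"
    }
  },
  "data_classification": "internal",
  "untrusted_input_involved": true,
  "change_summary": "sends report body to an external recipient; no attachments",
  "policy_reason": "external_communication_policy requires human approval",
  "expected_impact": "content leaves the organization boundary",
  "evidence_status": "written"
}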


14. Relation to AAEF

This article does not require any specific framework.

The design ideas above can be implemented independently.

However, I have been working on a public review draft framework that organizes these ideas more systematically:

AAEF: Agentic Authority & Evidence Framework

The core thesis is:

Model output is not authority.

AAEF v0.2.0 Public Review Draft includes:

  • 44 controls,
  • Evidence Event JSON Schema,
  • High-Impact Action Taxonomy,
  • Assurance Model and Residual Risk Mapping,
  • Assessment Worksheet,
  • Reference Architecture,
  • OWASP Agentic Top 10 mapping.

AAEF is not a certification scheme or formal standard.

It is a public review draft intended to help structure discussion around agentic AI action assurance, authority boundaries, evidence design, and assessment.

Repository:

https://github.com/mkz0010/agentic-authority-evidence-framework

Release:

https://github.com/mkz0010/agentic-authority-evidence-framework/releases/tag/v0.2.0

Discussion:

https://github.com/mkz0010/agentic-authority-evidence-framework/discussions/42

Japanese implementation note:

https://qiita.com/mkz0010/items/a7fb683cb2ef395bda35

Feedback is welcome.
