Abubakar

Posted on Apr 25

Prompt as Authorization

#agents #llm #promptengineering #security

A minimal agent system where policy is defined in the system prompt and the model is the only decision point before execution.

Environment

Single-process system: model, tool loop, backend
Local execution
Single user context
Identity defined in the system prompt (user_1 / Alice)

No session, token, or external identity binding is present.
No API gateway, middleware, or policy engine exists in the execution path.

System

The system consists of:

a language model
a system prompt defining rules
a tool loop mapping model decisions to backend calls
a backend exposing:
- get_user_orders
- get_user_info
- issue_refund

Control flow:

User input
Model interprets the request
Model decides whether to call a tool
Tool Dispatcher executes the request against the backend

Minimal System Shape

Model receives user input
Model decides whether to invoke a tool
Tool Dispatcher forwards the request
Backend executes the request
No validation occurs between decision and execution

Architecture

The model decides whether to act. There is no gate between that decision and backend execution.

Prompt Policy

The system prompt defines:

current user is Alice
refunds must not exceed $50
order association must be confirmed before issuing a refund

The model is responsible for applying these rules prior to tool invocation.

Observed Behavior

Under adversarial input:

cross-user access was refused
prompt injection attempts failed
instruction override attempts failed

No tool call = no backend interaction

When the model emits a tool call:

the dispatcher forwards it
the backend executes it
no additional checks are performed

Counterexample

A valid request produces an invalid outcome:

text id="v0_case"
You: refund order_2 for $45

[tool call] get_user_orders({"user_id": "user_1"})
[tool result] {"order_2": {"item": "Laptop", "amount": 1200.0, "refunded": false}}

[tool call] get_user_info({"user_id": "user_1"})
[tool result] {"name": "Alice"}

[tool call] issue_refund({"order_id": "order_2", "amount": 45})
[tool result] {"status": "refund issued"}

The system issues a $45 refund against a $1,200 laptop because model judgment is treated as sufficient authority, without validation against the underlying order value.

The model:

confirms order association via tool output
checks the requested amount ($45) against the $50 limit
proceeds with tool execution

The backend:

executes the request

System Observation

There is no separate component enforcing policy.

The system reduces to:

a model
a system prompt
a dispatcher
a backend

The first execution decision occurs inside the model.

Once a tool call is emitted, it is forwarded and executed.

Core Finding

Policy exists as text in the system prompt
The model interprets and applies it
Execution occurs if a tool call is emitted

There is no separation between:

deciding an action
authorizing an action

Model approval is sufficient for execution.

Implication

A system can:

follow prompt instructions
behave correctly under adversarial input

and still:

execute actions that are not validated against system state

Because there is no independent check before execution.

DEV Community