There is a moment in every AI coding rollout where the question shifts from "can we make this work" to "what is the worst thing this can do". If you have not had that moment yet, this article will save you a quarter.
The OWASP Top 10 for Agentic Applications, published in late 2025, is the cleanest shared vocabulary we have for the failure modes. It is short, opinionated, and useful. This post takes each item, names the failure pattern in plain language, and pairs it with a control you can ship around an AI coding agent today.
The configuration shown uses Akmon's policy profiles, packs, and CLI flags. The pattern is general; if you use a different tool, the lessons translate.
## How to read each section
For each item:
- What it is, in one paragraph.
- The failure story, the kind of incident this prevents.
- The control, the actual lever, with code or commands.
- The trade-off, the thing the control costs you.
## 1. Prompt injection in tool inputs
What it is. A tool returns text. The text contains a hidden instruction. The agent reads the text and the next decision is reshaped.
The failure story. A scout dossier reads a third-party README. The README has a hidden instruction. The agent later writes a config that the README told it to write.
The control. Use the prod profile in production paths. Restrict the tool surface in a pack. Constrain web_fetch to allowed hosts only.
```toml
# .akmon/policy-packs/web.toml
[network]
web_fetch_allowed_hosts = ["docs.example.com", "api.internal"]
web_fetch_require_https = true
```

```bash
akmon --policy-profile prod --policy-pack .akmon/policy-packs/web.toml \
  --task "summarize the API spec at docs.example.com/spec.html"
```
The trade-off. Some legitimate fetches will fail until you add the host. That is a feature.
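Deny-by-default host filtering is simple enough to sanity-check outside the agent. A minimal sketch in Python, mirroring the allowlist in the pack above; the guard itself is illustrative, not Akmon's implementation:

```python
from urllib.parse import urlparse

# Assumed allowlist, matching the hypothetical pack above.
ALLOWED_HOSTS = {"docs.example.com", "api.internal"}

def fetch_permitted(url: str) -> bool:
    """Allow only HTTPS URLs whose host is explicitly on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Anything not on the list fails closed, which is exactly the friction the control is meant to create.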
## 2. Excessive agency
What it is. The agent has access to tools it does not need for the task. Breadth becomes surface area.
The failure story. A documentation task has access to a shell tool that runs migrations. The model invents a migration command on a misread.
The control. Profile-driven tool surface. Use --plan for read-only scoping before a real run. Add --add-dir to lock the sandbox.
```bash
akmon --policy-profile prod --plan \
  --task "list outdated dependencies and propose updates"
```
The trade-off. A two-step workflow: plan first, implement second. Worth it.
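The underlying idea, a profile that maps to an explicit tool surface with deny-by-default for anything unknown, can be sketched in a few lines. Profile and tool names here are hypothetical, not Akmon's:

```python
# Hypothetical profile -> tool-surface mapping; illustrative names only.
PROFILES = {
    "plan": {"read_file", "list_dir"},             # read-only scoping
    "prod": {"read_file", "list_dir", "write_file"},
}

def tool_allowed(profile: str, tool: str) -> bool:
    """Unknown profiles grant nothing: breadth must be opted into, never inherited."""
    return tool in PROFILES.get(profile, set())
```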
## 3. Sensitive information disclosure
What it is. Sensitive data ends up in the model context, the logs, or the agent output.
The failure story. A test fixture has a real customer record. The agent surfaces the record in a comment in a generated PR.
The control. Redact specific objects from the session before sharing. Use Ollama for sensitive paths so the prompt never leaves the machine.
```bash
akmon redact <session-id> \
  --output sanitized.akmon \
  --object <object-hash> \
  --reason "Customer record removed before audit handoff"
```
The trade-off. Redaction adds friction at handoff. The friction is the point.
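Hash-addressed redaction is easy to reason about: replace exactly one object by hash, leave everything else byte-identical. A sketch of the idea, not Akmon's actual session format:

```python
import hashlib

def object_hash(data: bytes) -> str:
    """Content-address an object by its SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()

def redact_session(objects: dict[str, bytes], target_hash: str, reason: str) -> dict[str, bytes]:
    """Return a copy of the session with the target object replaced by a tombstone."""
    tombstone = f"[redacted: {reason}]".encode()
    return {h: (tombstone if h == target_hash else body) for h, body in objects.items()}
```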
## 4. Improper output handling
What it is. The agent's output is rendered or executed somewhere it should not be.
The failure story. The agent writes a Markdown reply that includes a fake confirmation block. A downstream automation parses the block as a structured action.
The control. Force structured output where it matters. Use --output json in headless flows so the response is machine-parseable, and validate against your own schema downstream.
```bash
akmon --yes --output json --task "$task" | jq '.summary'
```
The trade-off. Free-form prose has its place. For action-triggering paths, structured output is non-negotiable.
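Downstream validation is the half of the control you own. A minimal sketch, assuming a response schema with a single required `summary` string field (your real schema will be richer):

```python
import json

def parse_summary(raw: str) -> str:
    """Validate the agent's JSON reply before any automation acts on it."""
    doc = json.loads(raw)  # raises ValueError on non-JSON input
    if not isinstance(doc, dict) or not isinstance(doc.get("summary"), str):
        raise ValueError("response does not match the expected schema")
    return doc["summary"]
```

A fake confirmation block embedded in prose never reaches this function as valid JSON, so it never becomes an action.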
## 5. Supply chain weaknesses
What it is. A dependency the agent uses changes in a way that affects behavior.
The failure story. An MCP server you use upgraded a tool. The output shape changed. The agent silently misroutes.
The control. Pin model and tool versions in AKMON.md and in policy packs. Run akmon replay in strict mode for a small set of canonical sessions on every PR that touches a tool wrapper.
```bash
akmon replay <baseline-session-id> --mode strict --format json | jq '.passed'
```
The trade-off. A small set of canonical sessions has to be maintained. Treat them as part of your test suite.
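Strict-mode replay reduces, conceptually, to field-by-field equality against a baseline. A sketch assuming session records expose the audited keys as plain values; Akmon's real comparison is certainly richer:

```python
def replay_passed(baseline: dict, rerun: dict,
                  keys: tuple = ("tool_calls", "final_output")) -> bool:
    """Strict comparison: the rerun must match the baseline on every audited key."""
    return all(baseline.get(k) == rerun.get(k) for k in keys)
```

A silently changed tool output shape shows up here as a diff on `tool_calls`, which is the point of running it on every PR that touches a wrapper.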
## 6. Insecure plugin or tool design
What it is. A tool was designed without least privilege in mind.
The failure story. A generic http.fetch tool can hit any URL, including internal addresses.
The control. Restrict web_fetch to public allowed hosts and an HTTPS requirement. Use --add-dir to lock filesystem reads to the project root. Avoid generic shell tools in production profiles.
```bash
akmon --policy-profile prod \
  --add-dir ./src --add-dir ./docs \
  --task "patch the parser to accept ISO 8601 with offset"
```
The trade-off. Some legitimate workflows need broader access. Use staging for those, not prod.
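The internal-address hole is worth testing directly. A standard-library sketch of a resolve-then-check guard; a real deployment would also pin the resolved address when making the request, to avoid DNS rebinding, which this sketch omits:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def resolves_to_public(url: str) -> bool:
    """Reject URLs whose host resolves to any non-global (private, loopback,
    link-local) address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return all(ipaddress.ip_address(info[4][0]).is_global for info in infos)
```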
## 7. Excessive resource consumption
What it is. The agent loops, retries, or expands recursively. Tokens, dollars, and tool calls climb without a ceiling.
The failure story. A planning prompt recurses. Over a long evening it racks up provider charges.
The control. Use --max-budget-usd for headless runs. Use --fallback-model for graceful degradation. Use slo verify to alarm on retry attempts.
```bash
akmon --yes --max-budget-usd 2.50 --fallback-model "ollama:qwen-coder-7b" \
  --task "..."

akmon slo verify .akmon/evidence/<session>.json --strict
```
The trade-off. Some legitimate runs will hit the budget. Calibrate per task class, then alarm and review.
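A hard budget ceiling is a few lines of state. This sketch shows the mechanism with hypothetical per-call charges; the flag above is the real control:

```python
class BudgetExceeded(RuntimeError):
    pass

class BudgetMeter:
    """Accumulates spend and refuses any call that would cross a hard ceiling."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent = 0.0

    def charge(self, usd: float) -> None:
        if self.spent + usd > self.max_usd:
            raise BudgetExceeded(f"would exceed ${self.max_usd:.2f} cap")
        self.spent += usd
```

The key design choice is refusing *before* the call rather than alarming after, so a recursing planner stops at the cap instead of an evening later.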
## 8. Vector and embedding weaknesses
What it is. Retrieval introduces content from an index. If the index can be poisoned, your prompts are poisoned.
The failure story. A staging dataset got merged into the production index. An old test record contained a prompt injection. The production agent surfaced it on a real query.
The control. Provenance on every index entry. Use the spec workflow (akmon spec) to gate retrieval-heavy changes through a planning step. Treat RetrievalCall events as the audit trail.
```bash
akmon --index --policy-profile prod \
  --task "ground the implementation in the design doc"
```
The trade-off. Provenance metadata is harder to retrofit than to design in. Start with the most sensitive index.
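Provenance-gated retrieval is mostly a filter over metadata you committed to storing. A sketch with an illustrative entry shape; the field names are assumptions, not a real index schema:

```python
from dataclasses import dataclass

@dataclass
class IndexEntry:
    text: str
    source: str       # where the chunk came from, e.g. a file path
    environment: str  # "prod" or "staging"

def retrieve(entries: list[IndexEntry], allowed_env: str = "prod") -> list[IndexEntry]:
    """Drop anything whose provenance is missing or from the wrong environment."""
    return [e for e in entries if e.source and e.environment == allowed_env]
```

The staging-fixture incident above is exactly the case this filter catches: the poisoned record carries the wrong environment tag and never reaches the prompt.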
## 9. Misinformation and overreliance
What it is. The agent claims things confidently that are not true. The user trusts the agent.
The failure story. The agent invents a function in a library. The reviewer trusts it. CI catches it; the team's calibration drops.
The control. Require structured outputs for fact-bearing tasks. Use --architect for two-phase plan plus implementation, where the planner uses a stronger model. Layer human review for any change that touches public APIs.
```bash
akmon --architect --planner-model "anthropic:claude-sonnet-4-6" \
  --task "design and implement the JWT rotation flow"
```
The trade-off. Two-phase runs cost more tokens. The reviewer can read the plan first, which is its own win.
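One cheap calibration aid is to mechanically check that a symbol the agent cites actually exists before a human spends trust on it. A sketch for Python targets (other ecosystems have equivalents via their module loaders):

```python
import importlib

def symbol_exists(module_name: str, attr: str) -> bool:
    """Check that a function the agent cites is real before trusting the claim."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)
```

It will not catch a wrong *description* of a real function, but it catches the invented-function failure story outright, before CI does.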
## 10. Unbounded consumption of context
What it is. Context grows over a long session. Old, irrelevant content shapes new decisions.
The failure story. A multi-hour session keeps adding context until the model truncates from the middle. Behavior shifts in ways nobody can explain.
The control. Use the spec workflow to break work into discrete sessions. Use --continue and --session deliberately, not by habit. Inspect the session journal periodically.
```bash
akmon spec parser-iso8601-offset "Accept ISO 8601 timestamps with timezone offsets"
akmon spec parser-iso8601-offset design
akmon spec parser-iso8601-offset tasks
akmon spec parser-iso8601-offset implement
```
The trade-off. Less ambient context. The win is reproducibility.
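Deliberate context management means an explicit trimming rule instead of silent middle-truncation. A character-budget sketch; a real implementation would count tokens, not characters:

```python
def trim_context(messages: list[str], max_chars: int) -> list[str]:
    """Keep the most recent messages that fit the budget, dropping the oldest first."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk newest to oldest
        if total + len(msg) > max_chars:
            break
        kept.append(msg)
        total += len(msg)
    return list(reversed(kept))    # restore chronological order
```

The behavior is boring and explainable, which is exactly what the multi-hour-session failure story lacked.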
## Putting it together
If you implemented all ten controls, you would have a system with:
- A small, well-known tool surface, profile-driven.
- Structured output where it matters.
- Redaction available before any external handoff.
- Hard caps on resource use.
- Provenance on retrieval, with RetrievalCall events as evidence.
- Replay-based regression detection in CI.
Most teams will not need all ten on day one. Pick the three that match your top risks. Get them in production. Watch them work. Add the rest as they earn their place.
## Configuration to put in a pack
A small pack to start with, ready to drop in .akmon/policy-packs/:
```toml
# .akmon/policy-packs/baseline.toml
[network]
web_fetch_allowed_hosts = ["docs.example.com", "api.internal"]
web_fetch_require_https = true

[shell]
allowed_commands = ["cargo", "npm", "go", "make", "git"]
deny_commands = ["rm -rf", "sudo"]

[mcp]
allowed_servers = ["https://mcp.tools.internal/orders"]

[tools]
web_fetch_default = "deny"
shell_default = "ask"
```
Then run:
```bash
akmon --policy-profile prod \
  --policy-pack .akmon/policy-packs/baseline.toml \
  --task "..."
```
Inspect the merged effective policy:
```bash
akmon policy show-effective \
  --profile prod \
  --policy-pack .akmon/policy-packs/baseline.toml
```
## The honest part
Controls do not make AI coding agents safe. They make AI coding agents survivable. There will still be model failures, tool bugs, and edge cases nobody saw. The job of the policy and evidence layer is to keep the consequences small and the explanations available.
If you want to dig deeper into the evidence side of the loop, the next post in this series breaks down the redaction workflow. The repo is at github.com/radotsvetkov/akmon. The format is at github.com/radotsvetkov/agef. The site is at radotsvetkov.github.io/akmon.