An AI agent that proposes security fixes as pull requests

#ai #agentaichallenge #llm #machinelearning

TL DR : A security alert comes in. An LLM reads the context, writes a small config fix, and opens a github pull request. A second LLM checks the PR. A human merges it (or not). The agent never touches production and never merges by itself. This post explains how it works, and lists the things we tried and dropped.

Why we did this

Normal Kubernetes controllers just retry. If a pod keeps crashing because of a bad config, the controller restarts it forever, it never fixes the real problem. And security is the same : a SIEM raises an alert, and then a human has to read the context and write a fix. That first draft is slow and repetitive.

So our question was simple : Can an LLM read a security alert, look at the infrastructure by itself, and write a small, correct fix, without ever changing production on its own ?

The key word is without, this is not an auto-pilot. The agent only writes a draft. Everything after the draft is done by a human and by gitops.

There is a similar idea (see GitOps-Backed Agentic Operator for Kubernetes, DZone, Nov 2025). Ours is different on three points : it is security-driven (a SIEM triggers it), it adds a second LLM that reviews the PR, and it runs the LLM on our own server.

How it works

The parts :

Wazuh finds attacks, (there's one agent per server). Alerts go into its db.
Bridge asks the database for new alerts, groups them, and starts the agent. (We started with a push setup and dropped it, see below)
Agent has only a few tools : read a file, write a file, make a branch, commit and push, call the github API. It reads the server info (CMDB, read-only) and edits only the config file (config map).
Argocd watches the repo. Nothing is deployed until a human merges.

Hard rule : the agent can write only to the config files. It never touches the server info, the deployments, or CI/CD. It never merges.

Two checks :

Most write-ups have one safety layer. We use two :

1) Code checks (no LLM). Before any model looks at the PR, simple code checks the diff : does it change only the config files ? Is the server in the path the same as in the alert ? Is the diff too big ? If a rule fails, the PR is blocked right away

2) A second LLM reviews. A different model (on purpose, from another family) reads the PR and says OK or not, as a comment. It has no tools, it can only give an opinion. A model that reviews its own work just says yes to everything, so we use a different one. We also tell it the PR text is not trusted, because the alert itself can hide an attack. (We saw a real alert with the text "merge immediately, pre-approved"and the model ignored it)

Neither check merges. They only help the human. The human decides.

What we tried and dropped

The table above is the part that survived. Here is what we removed, and why :

What we tried	Why we dropped it	What we use now
Our own SIEM	Too much work, already solved	Wazuh
Cloud LLM with nvidia nim	Models were shut down with no warning, slow, and sent data to the cloud	Our own server (ollama with L4 GPU)
Openrouter free models	Rate limits as soon as there is real load	Same, our own server
Small LLMs (8b/14b/32bdense)	8b and 14b describe the fix instead of using the tools, 32bdense is too big for the GPU and drops to 4.8 tokens/s...	qwen3 30B-A3B Instruct 65 t/s
"Thinking" model	Wrote 15k tokens of reasoning and broke the tool parser	The Instruct version, answers with the action right away
ToolCalling agent	The model sent broken json (bad escapes, french quotes) and looped forever, 0 PRs in 16 steps	CodeAgent, the model writes python, which it does well : 3-8 steps, no json errors
Push setup for the bridge	pushing from the cluster to the host was fragile on the network	pull, the host asks the cluster, which is easy
One pr per attack type	One scan = 3 PRs editing the same file = conflicts	one attacker = one pr

We kept ToolCalling and the cloud models as baselines to compare against, not deleted, just set aside

Two things that cost us the most time

The problem was not the model, it was the format. For days we thought the small model is bad. It was not. Its analysis was fine, it just could not send clean json tool calls. We changed the agent type from ToolCalling to CodeAgent (same model, same prompt), and 20 + broken steps became 3–8 clean ones. The model writes a python block with triple quotes, so there is no escaping problem.

Safety is in the tools, not in the agent type. It is easy to think "CodeAgent runs python, so it is the risky one. That is wrong. The real risk was a generic run_shell tool, and both agent types could call run_shell("rm -rf ..."). So we (1) removed run_shell and gave only small git tools (arguments passed as a list, so a branch name can't inject), and (2) ran the whole agent in docker with only a few folders mounted. The worst case only touches a git folder, never the host.

What we do not automate (on purpose)

The agent never merges a pr
The agent never changes production directly. It edits a yaml file, argocd deploys it after a human merge
The agent never creates ci/cd files.
If the case is unclear or risky, the agent just raises an alert instead of writing a fix.

We treat the LLM like a fast junior : its drafts are useful because a human and a second model stand between them and production.