DEV Community

Mehdi BOUTAYEB

Why evidence matters more than model memory in AI pentesting

An AI finding you cannot reproduce is a liability, not a result. Darkmoon attaches the exact commands and raw tool output to every finding so a human can peer review it.

The trust problem

Most AI security tools return a confidence score and a paragraph. In offensive security that is not enough. If you cannot show the command that proved a vulnerability, you cannot defend it in a report or a remediation meeting.

What an evidence trail actually contains

For every finding Darkmoon keeps the executed command, the raw output, and the reasoning that connected them. The finding is a reproducible artifact, not a claim you have to take on faith.
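As a rough illustration of those three parts, a finding could be modelled as a small record like the one below. This is a minimal sketch, not Darkmoon's actual schema: the `Evidence` and `Finding` classes, their field names, the SHA-256 digest, and the sample command and output are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Evidence:
    """One step of a hypothetical evidence trail (not Darkmoon's real format)."""
    command: str     # the exact command that was executed
    raw_output: str  # the unmodified tool output
    reasoning: str   # why this output supports the finding

    def output_digest(self) -> str:
        # Hash the raw output so a reviewer can detect later edits to it.
        return hashlib.sha256(self.raw_output.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Finding:
    title: str
    evidence: Tuple[Evidence, ...]

    def to_json(self) -> str:
        # Serialize the finding and attach a digest to each evidence step.
        record = asdict(self)
        for step, item in zip(record["evidence"], self.evidence):
            step["output_sha256"] = item.output_digest()
        return json.dumps(record, indent=2)


# Illustrative sample data only; the host and the output are made up.
finding = Finding(
    title="Admin panel reachable without authentication",
    evidence=(
        Evidence(
            command="curl -s -I http://lab.invalid/admin",
            raw_output="HTTP/1.1 200 OK",
            reasoning="The admin path answered 200 without any credentials.",
        ),
    ),
)
report_json = finding.to_json()
```

Keeping the command and the raw output as plain strings, untouched by the model, is what lets a reviewer rerun the step and compare results instead of trusting a summary.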

Why this beats a bigger model

A larger model reduces some errors but never removes them. The evidence trail is what lets a human catch the ones that remain, which is exactly why we made it the core of the design rather than an afterthought.

How it changes the workflow

Reviewers stop re-verifying everything by hand and start spot-checking the trail. Reports are assembled from the recorded commands and raw output instead of paraphrased model output.
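A spot check can be as simple as rerunning one recorded command and comparing the result with what was stored. The sketch below assumes the trail stores the command as a shell-style string and the output as plain text; the `spot_check` function is hypothetical and is not part of Darkmoon. Exact string comparison is a simplification, since real tool output often contains timestamps or other values that change between runs.

```python
import shlex
import subprocess
import sys


def spot_check(command: str, recorded_output: str, timeout: float = 30.0) -> bool:
    """Rerun a recorded command and compare its stdout to the stored output."""
    result = subprocess.run(
        shlex.split(command),  # split the stored string into an argument list
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,           # a non-zero exit code is still evidence to compare
    )
    # Ignore leading and trailing whitespace so a final newline does not matter.
    return result.stdout.strip() == recorded_output.strip()


# Harmless stand-in for a recorded command: the current interpreter printing 42.
recorded_command = f'{shlex.quote(sys.executable)} -c "print(42)"'
matches = spot_check(recorded_command, "42")
mismatch = spot_check(recorded_command, "43")
```

Only rerun recorded commands against targets you are authorized to test, such as a lab environment.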

Try it

If you want to see the evidence trail on a live target, clone the Community Edition and point it at a lab.

Built by pentesters, open-sourced for pentesters. Feedback on the methodology and the evidence trail is genuinely welcome.
