DEV Community

jg-noncelogic
jg-noncelogic

Posted on • Originally published at github.com

Show HN: I built AgentSafety, an open benchmark for coding-agent safety

AgentSafety tests whether coding agents pick allow/ask/refuse on risky ops. 50 practical cases (prompt injection, secret access, destructive cmds, deps, out‑of‑workspace writes). Useful baseline — needs more multi-step and polyglot scenarios. Repo: https://github.com/serkanaltuntas/AgentSafety

Top comments (0)