Every AI coding assistant is shipping the same security bugs.

#security #webdev #programming #ai

*Not a promo.. I mean why would anyone promote something free, actually looking to get some contributors to help us seal sone holes of ai-coded products and encourage founders of ai-written products to respect security and privacy.
*

So, here it goes.. Nowadays many of us are building with Claude Code, Copilot, Cursor, Codex, Gemini, or any AI coding assistant, this is worth running against your project. - To be honest, I did think of building a tool around this, but it doesn't sound nice to monetize on vulnerabilities for me, nor do I see much logic having a 'blackbox' that allegedly scans your projects. We're talking about security here, so IMO such things should be open source and allow contributions.

And of course - my good friend AI helped me speed up the shipment of this repo :)

Some of most common things that appear :

JWT secrets set to "secret" or "changeme"
API keys in NEXT_PUBLIC_ env vars, fully exposed to the browser
User input going directly into system prompts via string interpolation
Vector databases using one shared namespace for all users — any user's RAG query can surface another user's documents
Agents handed child_process access with no scope restrictions

These aren't obscure edge cases, this is how most of AI-generated code comes out, if you allow it to produce HUGE chunks instead of targeted and controlled ai-coding. Even knowing tons about security and vulnerabilities, having AI write code might still expose you to some common cases.

The problem with existing references

OWASP, NIST, and CWE are good. They were written for a world where developers wrote most of their code by hand. They don't cover MCP tool poisoning, cross-agent prompt injection, or what happens when your agent's long-term memory accepts unsanitized writes. Ok, that's not entirely true - today AI-generated code is allover the place, so we see more and more tools to review the code, etc, but many are paid and/or complicated which is an entry barrier for a vibe coder.

What I and few AIs shipped

A 258-item checklist across 17 categories, with a detection method for every item: static grep or AST pattern, runtime test, or config inspection. Severity rated. 33 items in Category 6 specifically cover LLM integration vulnerabilities that don't appear elsewhere.

More usefully: a companion prompt.md that turns the full checklist into a structured codebase scan you can run in one command.

Running it

From your project root, with Claude Code installed:

claude "$(curl -s https://raw.githubusercontent.com/a-leks/genai-app-security-checklist/main/prompt.md)"

With Gemini CLI:

gemini "$(curl -s https://raw.githubusercontent.com/a-leks/genai-app-security-checklist/main/prompt.md)"

The model reads your codebase, runs all 258 checks, and returns a markdown report with severity, file path, line number, code snippet, and a specific remediation for each finding.

What the output looks like

### [6.1] Prompt injection — user input in system prompt
- Severity: Critical
- File: app/api/chat/route.ts
- Line: 34
- Snippet:
    const systemPrompt = `You are a helpful assistant. User context: ${req.body.userBio}`
- Remediation: Move user-supplied content to the user message role, never system.
  Strip prompt control characters before passing any user string to the model.

The LLM-specific items worth knowing

6.26 — MCP tool poisoning. If your agent uses third-party MCP servers, tool results from those servers enter the agent's context as trusted input. An attacker who controls one of those servers can inject instructions through it.

6.27 — Agent memory poisoning. Whatever your agent writes to long-term memory gets read back in future sessions. If malicious content reaches that memory store, it executes next time the agent retrieves it.

6.30 — Cross-agent prompt injection. In multi-agent systems, output from Agent A becomes input to Agent B. If an attacker can influence Agent A's output, Agent B processes the attack payload without knowing its origin is untrusted.