Niat Murad

Posted on Jun 28 • Originally published at wpcoder.ai

Why LLMs Write Insecure WordPress Code — and the Architecture We Built to Fix It.

#wordpress #ai #php #architecture

The Naked Truth About Chatbots and PHP

If you've ever asked a general-purpose LLM to "write me a WordPress plugin," you already know how this story ends. You get a single 400-line file. It registers a hook with the wrong priority. It passes an unsanitized $_POST value straight into a $wpdb query. It calls a function that was deprecated three major versions ago. And the AJAX handler it just generated would happily execute for any unauthenticated visitor who knows the action name.

The frustrating part isn't that the model is "dumb." It's that it's doing exactly what it was built to do.

A general-purpose model is a statistical next-token predictor. It was trained on a giant pile of internet text, which includes excellent code, mediocre code, and a frankly alarming amount of insecure 2014-era Stack Overflow snippets that someone copy-pasted into a tutorial. When you ask it for a plugin, it isn't reasoning about the WordPress Plugin Handbook. It's predicting the next plausible token, weighted by everything it ever absorbed, good and bad alike. Security isn't a constraint in that process. It's just one more pattern competing with thousands of insecure ones.

So the question we set out to answer wasn't "how do we make the model smarter?" It was a software engineering question: how do you wrap a probabilistic text generator inside a deterministic system that refuses to emit code violating a fixed set of rules?

This post is a walkthrough of the architecture we landed on — a multi-step agentic system that plans, structures, and audits its own output before it ever reaches the developer. No magic, no hand-waving. Just the design patterns that made the output trustworthy.

The Vulnerability Vector: Why "Autocompleted" Plugins Are Dangerous

Let's be specific about why raw LLM output is a security liability in a CMS context, because the failure mode is different from general application code.

WordPress is a hook-heavy environment. Almost everything meaningful happens by attaching callbacks to actions and filters, and those callbacks frequently touch the database, render HTML, or perform privileged operations. That architecture has two consequences for AI-generated code:

The blast radius of one missing check is enormous. A REST endpoint registered without a permission_callback isn't a minor bug; it's an open door. An AJAX handler that skips wp_verify_nonce() is a CSRF vulnerability: any page a logged-in user visits can fire the request on their behalf.
The dangerous lines look harmless. A model will confidently generate $wpdb->query("UPDATE ... WHERE id = " . $_GET['id']). It parses. It "works" in a demo. And it's a textbook SQL injection.
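
To make the first point concrete, here is a minimal sketch of a REST route that declares its permission check at registration time. The post-review-flags/v1 namespace, the prf_ function prefix, and the _needs_review meta key are illustrative assumptions, and the function_exists guards are there only so the file can load outside WordPress.

```php
<?php
// Hypothetical names throughout: the prf_ prefix, the route, and the meta key are illustrative.

// Authorization lives in its own callback so the route is never registered without it.
function prf_can_flag_posts(): bool
{
    // The guard only lets this file load outside WordPress, where the function is missing.
    return function_exists('current_user_can') && current_user_can('edit_others_posts');
}

// Runs only after the permission callback has returned true.
function prf_flag_post($request)
{
    $post_id = absint($request['id']); // cast the route parameter to a non-negative integer
    update_post_meta($post_id, '_needs_review', 1);

    return rest_ensure_response(['flagged' => $post_id]);
}

function prf_register_routes(): void
{
    register_rest_route('post-review-flags/v1', '/flag/(?P<id>\d+)', [
        'methods'             => 'POST',
        'callback'            => 'prf_flag_post',
        'permission_callback' => 'prf_can_flag_posts', // without this, any visitor can call the route
    ]);
}

if (function_exists('add_action')) {
    add_action('rest_api_init', 'prf_register_routes');
}
```

The second point has an equally short fix: building the same query with $wpdb->prepare() and a %d placeholder keeps the id out of the SQL string entirely.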

The three checks that matter most in WordPress — input sanitization, output escaping, and authorization — are also the three things a next-token predictor is most likely to drop, because in its training distribution they're frequently absent. Plenty of working tutorial code omits them.

So the real engineering challenge crystallized into a single sentence:

How do we force the system to treat sanitize_text_field(), wp_verify_nonce(), and current_user_can() as non-negotiable compilation requirements rather than optional stylistic suggestions?

You can't solve that with a better prompt alone. Prompts are persuasion; they're probabilistic nudges. We needed structure.

The Architecture: A Domain-Specific Grounding Engine

The core idea is to stop treating code generation as a single text-completion call and start treating it as a pipeline of specialized agents, each with one job, passing structured data between them. We landed on a three-tier agentic cycle.

Phase 1 — Intent Analysis & Blueprinting (the "PM" agent)

The first agent never writes a line of PHP. Its only job is to convert the user's plain-English request into a structured JSON manifest. So a prompt like "a plugin that lets editors flag posts for review" becomes something closer to:

{
  "plugin_slug": "post-review-flags",
  "capabilities_required": ["edit_others_posts"],
  "hooks": [
    { "type": "action", "name": "admin_menu", "callback": "register_review_page" },
    { "type": "ajax", "name": "flag_post", "auth": "logged_in", "nonce": true }
  ],
  "data_layer": { "storage": "post_meta", "meta_key": "_needs_review" },
  "files": ["post-review-flags.php", "includes/class-flag-controller.php", "admin/views/review-list.php"]
}

This manifest is the contract. By forcing the system to declare its capabilities, hooks, and nonce requirements up front — before any code exists — we make security a planning decision instead of a generation accident. If the AJAX handler is declared with "nonce": true here, downstream agents are obligated to honor it.
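
One way to picture "the manifest is the contract" is a validator that rejects a manifest before any PHP is generated. This is a sketch under assumed rules (every AJAX hook must declare auth and a boolean nonce, and at least one capability must be listed); the function name validate_manifest and the rule set are hypothetical, not the tool's actual schema.

```php
<?php
// Assumed rule set: these checks are illustrative, not the tool's real schema.
function validate_manifest(string $json): array
{
    $manifest = json_decode($json, true);
    if (!is_array($manifest)) {
        return ['manifest is not a valid JSON object'];
    }

    $errors = [];
    if (empty($manifest['capabilities_required'])) {
        $errors[] = 'capabilities_required must name at least one capability';
    }

    foreach ($manifest['hooks'] ?? [] as $index => $hook) {
        if (!is_array($hook) || ($hook['type'] ?? '') !== 'ajax') {
            continue; // in this sketch only AJAX hooks carry auth and nonce declarations
        }
        $name = $hook['name'] ?? '(unnamed)';
        if (!isset($hook['auth'])) {
            $errors[] = "hooks[$index] $name: ajax hook must declare auth";
        }
        if (!is_bool($hook['nonce'] ?? null)) {
            $errors[] = "hooks[$index] $name: ajax hook must declare nonce as true or false";
        }
    }

    return $errors;
}

// A trimmed copy of the manifest shown above.
$manifest = '{"capabilities_required":["edit_others_posts"],"hooks":['
    . '{"type":"action","name":"admin_menu","callback":"register_review_page"},'
    . '{"type":"ajax","name":"flag_post","auth":"logged_in","nonce":true}]}';

$errors = validate_manifest($manifest); // an empty array means generation may proceed
```

Dropping the "nonce" key from the flag_post hook would produce one error, and under this design the pipeline would stop there rather than hope a later stage notices.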

Phase 2 — Context-Aware Generation (the "Developer" agent)

The second agent compiles the manifest into actual files, one module at a time. The important architectural choice here is separation of concerns enforced by the file tree: data-layer logic, controllers, and view templates are generated as distinct files rather than one monolith.

The agent doesn't generate each file in isolation, though. It maintains an active reference map of the multi-file dependency graph (more on how that works in the context-window section below), so a function declared in the controller is known to the view that calls it.

Phase 3 — The Deterministic Audit Loop (the "Security" agent)

This is the part that actually buys the trust. After generation, a dedicated agent parses the produced code — not by "asking the model if it looks safe," but through a deterministic execution path that walks the structure and verifies invariants:

Every registered AJAX/REST callback has a corresponding capability check and (where state-changing) a nonce verification.
Every value originating from a superglobal passes through a sanitization function before use.
Every dynamic value reaching output is escaped at the point of echo (late escaping).

If an invariant fails, the code is rejected and routed back for regeneration with the specific violation flagged. The loop closes. Nothing ships that hasn't passed the audit.
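
The loop can be sketched in a few dozen lines. The example below checks only the second invariant (superglobals must be read inside a sanitizer call) by walking PHP's own token stream, then feeds violations back to a generator callback. The allowlists, both function names, and the three-attempt limit are assumptions made for illustration; the post does not describe the real audit at this level of detail.

```php
<?php
// Assumed allowlists: a real audit would cover many more sources and sanitizers.
const SUPERGLOBALS = ['$_GET', '$_POST', '$_REQUEST', '$_COOKIE'];
const SANITIZERS   = ['sanitize_text_field', 'sanitize_key', 'absint', 'intval'];

// Returns one message per superglobal read that is not nested inside a sanitizer call.
function find_unsanitized_superglobals(string $php_source): array
{
    $violations = [];
    $call_stack = [];   // for each open "(", the function being called ('' if none)
    $previous   = null; // last non-whitespace token seen

    foreach (token_get_all($php_source) as $token) {
        if (is_array($token) && $token[0] === T_WHITESPACE) {
            continue;
        }
        if ($token === '(') {
            $is_call      = is_array($previous) && $previous[0] === T_STRING;
            $call_stack[] = $is_call ? $previous[1] : '';
        } elseif ($token === ')') {
            array_pop($call_stack);
        } elseif (
            is_array($token)
            && $token[0] === T_VARIABLE
            && in_array($token[1], SUPERGLOBALS, true)
            && array_intersect($call_stack, SANITIZERS) === []
        ) {
            $violations[] = sprintf('line %d: %s read without sanitization', $token[2], $token[1]);
        }
        $previous = $token;
    }

    return $violations;
}

// Regenerates until the audit passes; $generate receives the previous violations.
function generate_until_clean(callable $generate, int $max_attempts = 3): ?string
{
    $violations = [];
    for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
        $source     = $generate($violations);
        $violations = find_unsanitized_superglobals($source);
        if ($violations === []) {
            return $source;
        }
    }

    return null; // nothing ships if the audit never passes
}
```

The check is deliberately strict: it would also flag a harmless isset($_POST['x']). In a sketch like this, false positives are the cheaper failure, because they cost a regeneration rather than a vulnerability.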

When we were designing the core engine behind the WPCoder AI plugin generator, we realized that standard semantic embeddings weren't enough. To eliminate hallucinations, we had to anchor the generation loop within a custom-built WordPress Core parsing layer. This ensures the model dynamically validates hooks against active Core APIs before writing the final PHP strings — so a hallucinated function name fails fast instead of shipping silently.
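
The post does not show how the Core parsing layer works, so the following is only a guess at the simplest version of the idea: collect every plain function call in the generated source and compare it against an index of known symbols. The function name find_unknown_functions and the tiny index passed to it are hypothetical, and wp_sanitize_input is a deliberately made-up name standing in for a hallucination.

```php
<?php
// Returns name => first line for each called function missing from the known-symbol index.
function find_unknown_functions(string $php_source, array $known_functions): array
{
    // Drop whitespace so "name (" and "name(" look the same.
    $tokens = array_values(array_filter(
        token_get_all($php_source),
        fn($t) => !(is_array($t) && $t[0] === T_WHITESPACE)
    ));
    $known   = array_flip($known_functions);
    $unknown = [];

    foreach ($tokens as $i => $token) {
        $is_name   = is_array($token) && $token[0] === T_STRING;
        $is_called = ($tokens[$i + 1] ?? null) === '(';
        if (!$is_name || !$is_called) {
            continue;
        }

        // Skip declarations, method calls, static calls, and constructors.
        $before    = $tokens[$i - 1] ?? null;
        $before_id = is_array($before) ? $before[0] : null;
        if (in_array($before_id, [T_FUNCTION, T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_NEW], true)) {
            continue;
        }

        if (!isset($known[$token[1]])) {
            $unknown[$token[1]] ??= $token[2];
        }
    }

    return $unknown;
}

// Assumed index: in practice it would be built from the WordPress Core source.
$core_index = ['sanitize_text_field', 'wp_unslash', 'update_option'];

$generated = '<?php $v = sanitize_text_field(wp_unslash($_POST["v"] ?? ""));' . "\n"
    . 'wp_sanitize_input($v);';

$unknown = find_unknown_functions($generated, $core_index);
```

Here $unknown contains only wp_sanitize_input, reported on line 2, which is the "fails fast" behaviour described above: the invented name is caught before the file is written.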

Code Breakdown: Chatbot Output vs. Grounded Agent Output

Architectural claims are cheap. Here's the difference made concrete.

Example A — typical generic AI output. Looks fine at a glance. It's a CSRF hole with a side of SQL injection.

add_action('wp_ajax_save_setting', 'save_setting');

function save_setting() {
    global $wpdb;
    $value = $_POST['value'];
    $wpdb->query(
        "UPDATE {$wpdb->prefix}options SET option_value = '$value' WHERE option_name = 'my_setting'"
    );
    echo "Saved: " . $value;
    wp_die();
}

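No nonce. No capability check. Raw $_POST interpolated straight into SQL. Unescaped output. The wp_ajax_ prefix happens to limit the handler to logged-in users (only wp_ajax_nopriv_ hooks serve anonymous visitors), but that is luck of the hook name: any logged-in account, down to a subscriber, can still call it, and nothing in the code reasons about authorization at all.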

Example B — grounded agent output. Same feature, every invariant satisfied.

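// Hypothetical wrapper class so that $this resolves; the class name is illustrative.
final class Setting_Controller {
    public function register(): void {
        add_action('wp_ajax_save_setting', [$this, 'save_setting']);
    }

    public function save_setting(): void {
        // CSRF check. wp_send_json_error() ends the request, so nothing below runs on failure.
        if (!check_ajax_referer('save_setting_action', 'nonce', false)) {
            wp_send_json_error(['message' => 'Invalid request.'], 403);
        }

        // Authorization: only users who can manage options may change the setting.
        if (!current_user_can('manage_options')) {
            wp_send_json_error(['message' => 'Insufficient permissions.'], 403);
        }

        // Remove the slashes WordPress adds to request data, then sanitize.
        $value = sanitize_text_field(wp_unslash($_POST['value'] ?? ''));

        // The Options API handles the database write; no hand-built SQL.
        update_option('my_setting', $value);

        // Escape at the point of output.
        wp_send_json_success([
            'message' => sprintf('Saved: %s', esc_html($value)),
        ]);
    }
}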

Nonce verified, capability checked, input unslashed and sanitized, output escaped, and the unsafe raw $wpdb write replaced with the Options API. None of that is the model "remembering" to be careful. It's the manifest from Phase 1 being enforced by the audit in Phase 3.

Solving the Context Window Wall for Multi-File Plugins

Here's the problem that bites everyone who tries to generate real multi-file projects: the model forgets.

You generate core.php, which registers a script handle via wp_register_script('my-admin-bundle', ...). Three files later you generate the admin dashboard, and the model, having moved past that text in its context, invents a different handle, or enqueues an asset that was never registered. The dashboard silently loads nothing. Classic context-window amnesia.

Our fix is a shared internal state registry that lives outside any single model call. As each sub-agent produces a file, it writes structural facts into this registry — registered script handles, declared class names, public method signatures, meta keys, hook names. Before another sub-agent generates a file that depends on those facts, the relevant registry slice is injected into its context.

So the dashboard agent doesn't recall the handle from a thousand tokens ago. It's handed the exact string 'my-admin-bundle' as a hard fact, because the core agent wrote it to the registry the moment it was created. The dependency graph becomes shared state rather than something each agent has to hold in working memory. That single change is what made plugins beyond a couple of files actually coherent.
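
A registry like this needs very little machinery. The sketch below is one possible shape, assuming facts are stored as kind, name, and declaring file; the class name StateRegistry, its two methods, and the fact kinds are all hypothetical.

```php
<?php
// Hypothetical registry: the class and method names are illustrative.
final class StateRegistry
{
    /** @var array<string, array<string, string>> kind => name => declaring file */
    private array $facts = [];

    // Called by a sub-agent right after it produces a file.
    public function record(string $kind, string $name, string $declared_in): void
    {
        $this->facts[$kind][$name] = $declared_in;
    }

    // Renders only the requested kinds as plain lines for the next agent's context.
    public function slice_for_prompt(array $kinds): string
    {
        $lines = [];
        foreach ($kinds as $kind) {
            foreach ($this->facts[$kind] ?? [] as $name => $declared_in) {
                $lines[] = "$kind: $name (declared in $declared_in)";
            }
        }

        return implode("\n", $lines);
    }
}

$registry = new StateRegistry();
$registry->record('script_handle', 'my-admin-bundle', 'core.php');
$registry->record('meta_key', '_needs_review', 'includes/class-flag-controller.php');

// The dashboard agent only needs script handles, so only that slice is injected.
$context = $registry->slice_for_prompt(['script_handle']);
// $context is "script_handle: my-admin-bundle (declared in core.php)"
```

Injecting a slice instead of the whole registry keeps each agent's prompt small, which matters because the context window is exactly the resource this design is trying to protect.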

Shifting from Code Writer to Code Auditor

What this architecture really changes is the developer's day.

When boilerplate becomes reliable, you stop spending cognitive energy typing add_action, wiring up wp_enqueue_script, or hand-rolling the same nonce dance for the hundredth time. That work doesn't disappear — it gets generated correctly and handed to you for review.

So the job shifts. Your primary role moves from manual typing to architectural auditing. You're reading the generated manifest and asking the questions only a human should own: Is this the right capability for this action? Is post_meta actually the right storage layer here, or should this be a custom table? Does this hook priority interact badly with that other plugin?

To be clear, this is emphatically not a replacement for developers. A system that audits its own output still produces output that a human needs to judge in context, against requirements the model can't see. What agentic generation gives you is leverage: it removes the tedious, error-prone mechanical layer so your attention lands where it's actually valuable — verification, integration, and design.

Conclusion & Community Discussion

Building a code-generation tool you can actually trust means giving up on the single-prompt chat script. A raw next-token predictor will always be a coin flip on security. The reliability comes from the system you wrap around it: structured blueprinting up front, modular generation with shared state, and a deterministic audit loop that refuses to ship code violating its invariants.

None of these pieces are exotic on their own — they're ordinary software engineering patterns. The trick is applying them to constrain a probabilistic generator inside hard, domain-specific rules.

So I'll turn it over to the community: how are you handling security validation when you integrate AI tools into your development stack? Prompt-level guardrails, post-generation static analysis, human review gates, something else entirely? Let's talk about it in the comments.
