If you ship public contact forms for a living, you already use CAPTCHA. Probably reCAPTCHA v3 or Cloudflare Turnstile. You probably also have a honeypot field. And the inbox is still full of "Introducing our B2B marketing services."
The reason is not that CAPTCHA is broken. The reason is that the things filling your form are not bots anymore.
This article is about what comes after CAPTCHA. The short version: stop trying to block at the gate, and start sorting at the inbox. Here's how to build that.
The five front-end defenses you should already have
A quick rundown, with the implementation path for each:
- Honeypot field -- Hidden input; the server drops any submission that fills it. Vendor numbers claim it catches 70-80% of bots. The cheapest line of defense.
- CAPTCHA (Turnstile / reCAPTCHA v3) -- Risk-score based, no UX cost. Industry baseline.
- Rate limiting -- Per IP + form id, 60-second window. KV store or Redis.
- Signed form token -- Server issues HMAC token on render, verifies on submit. Doubles as CSRF.
- Anti-solicitation notice -- Right above the form, not in the footer. Read by reputable senders, ignored by the rest, but cheap.
Get them all in. They will reduce your noise floor.
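The signed form token deserves a sketch, since it is the least obvious of the five. Below is a minimal version using Node's `crypto` module; the token format and the `issueToken` / `verifyToken` names are illustrative choices of mine, not a standard, and the one-hour TTL is an assumption you should tune.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumption: secret comes from the environment in real deployments.
const SECRET = process.env.FORM_TOKEN_SECRET ?? 'dev-only-secret';

// Issued when the form is rendered. Format: formId.issuedAt.signature
// (assumes formId contains no '.' characters).
export function issueToken(formId: string, now = Date.now()): string {
  const payload = `${formId}.${now}`;
  const sig = createHmac('sha256', SECRET).update(payload).digest('hex');
  return `${payload}.${sig}`;
}

// Verified on submit: signature must match and the token must be fresh.
export function verifyToken(
  token: string,
  formId: string,
  maxAgeMs = 60 * 60 * 1000, // 1 hour TTL, an assumption
  now = Date.now(),
): boolean {
  const [id, issuedAt, sig] = token.split('.');
  if (id !== formId || !issuedAt || !sig) return false;
  const expected = createHmac('sha256', SECRET)
    .update(`${id}.${issuedAt}`)
    .digest('hex');
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  // constant-time comparison; length check first so timingSafeEqual can't throw
  if (a.length !== b.length || !timingSafeEqual(a, b)) return false;
  return now - Number(issuedAt) <= maxAgeMs;
}
```

Because the token is bound to the form id and signed server-side, a replayed or cross-site submission fails verification, which is why this layer doubles as CSRF protection.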
Why this stops being enough
Two things are happening at the same time.
Human contractors. Outreach vendors hire people to type pitches into your form by hand, paid roughly USD 0.30-0.70 per send. CAPTCHA targets machines, not humans. The contractor never even sees your aria-hidden honeypot; they fill the visible fields and click submit. Your defenses don't fire because nothing about the request is wrong.
AI form-fillers. Open-source stacks like Browser Use, or simple Playwright + GPT scripts, can read a form's DOM, identify required fields and checkboxes contextually, and fill them. The "I confirm this is not a sales inquiry" checkbox is now click-through for an LLM. This was a real psychological barrier three years ago. It is rapidly becoming a non-barrier.
Both of these defeat front-end defenses by design. There's a structural ceiling on "make sending harder." Past it, you need a different layer.
The next layer: classify what arrives
The change in framing: stop trying to block, start sorting. Every inbound response gets a label, and the operator chooses what to do with each one.
The labels we use are deliberately simple:
- legitimate -- a real inquiry
- sales -- a pitch
- suspicious -- model is uncertain
Plus a 0-100 score, so the operator can see uncertainty.
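Since the classifier's output crosses a trust boundary (it is LLM-generated JSON), it is worth validating before it touches your database. A defensive parser might look like this; the `Classification` shape mirrors the labels and score above, but the `parseLabel` helper is my own sketch, not part of any SDK.

```typescript
type Label = 'legitimate' | 'sales' | 'suspicious';

interface Classification {
  label: Label;
  score: number; // 0-100
  reason: string;
}

// Returns null on any malformed output, so the caller can fall back
// to "no label" instead of storing garbage.
function parseLabel(raw: string): Classification | null {
  try {
    const obj = JSON.parse(raw);
    const labels: readonly string[] = ['legitimate', 'sales', 'suspicious'];
    if (!labels.includes(obj.label)) return null;
    const score = Number(obj.score);
    if (!Number.isFinite(score) || score < 0 || score > 100) return null;
    return { label: obj.label as Label, score, reason: String(obj.reason ?? '') };
  } catch {
    return null;
  }
}
```

The null-on-failure contract matters: a malformed model response should degrade to "unlabeled", never to a crash or a wrong label.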
Building the pipeline
The skeleton, framework-neutral:
```typescript
// 1. Submit handler -- never blocks on classification
async function handleSubmit(req: Request) {
  const data = await req.formData();
  const responseId = await saveResponse(data);
  // critical path ends here
  scheduleClassification(responseId);
  return Response.redirect('/thank-you', 303);
}

// 2. Classification job -- async, fail-soft
async function classifyResponse(responseId: string) {
  const text = await loadResponseText(responseId);
  if (!text) return;
  try {
    const result = await Promise.race([
      callClassifier(text.slice(0, 2000)),
      timeout(10_000), // rejects after 10s so a hung call can't stall the job
    ]);
    await saveLabel(responseId, result.label, result.score);
  } catch (err) {
    // never fail the form. leave label null and move on.
    logSilently(err);
  }
}

// 3. The classifier call -- here using OpenRouter + Claude Haiku 4.5
async function callClassifier(text: string) {
  const res = await openrouter.chat.completions.create({
    model: 'anthropic/claude-haiku-4.5',
    temperature: 0,
    max_tokens: 256,
    messages: [
      { role: 'system', content: SYSTEM_PROMPT }, // see below
      { role: 'user', content: text }, // user text never enters the system prompt
    ],
  });
  // content can be null in the SDK types; a bad parse throws inside the
  // caller's try block, which is exactly the fail-soft path we want
  return JSON.parse(res.choices[0].message.content ?? 'null');
}
```
A few non-obvious decisions are doing most of the work here.
Asynchronous, off the critical path
The submitter sees no latency change. If the classifier is slow, down, or rate-limited, the form still works. On Next.js, after() is the cleanest way to fire-and-forget. On other stacks, a lightweight queue (BullMQ, AWS SQS, Vercel Queues) is fine.
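Whatever the transport, the contract of `scheduleClassification` is the same: enqueue and return immediately, never await the classifier. A deliberately tiny in-process sketch of that contract (the `scheduleJob` name and injectable worker are my own illustration; production stacks would swap in `after()` or a real queue):

```typescript
// Fire-and-forget: defer the worker past the current call stack so the
// submit handler returns without waiting, and swallow worker errors so a
// classifier failure never surfaces to the submitter (fail-soft).
function scheduleJob(id: string, worker: (id: string) => Promise<void>): void {
  queueMicrotask(() => {
    worker(id).catch(() => {
      // intentionally silent here; real code would log out-of-band
    });
  });
}
```

The same two properties -- deferred execution and swallowed errors -- are what you are buying from BullMQ or SQS, just with durability on top.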
Fail-soft, never fail-hard
If the classifier fails for any reason, the response just gets label = null. The form submission was never coupled to it. This is a hard rule -- once you couple a third-party API to your form's success path, you've created a worse problem than you started with.
Prompt structure: system / user separation
The user-submitted text goes only into the user message, never into the system prompt. This is the simplest defense against prompt injection attempts in the form body. Truncate to 2000 characters; longer messages don't add classification accuracy, they only add cost and risk.
"When in doubt, legitimate"
The single most important line of the system prompt is the disposition rule. Sample structure:
```
You classify a contact-form response into one of three labels:

- legitimate: a real inquiry from a prospect, customer, or contact
- sales: an unsolicited sales / partnership / outreach pitch
- suspicious: ambiguous, possibly sales but not certain

Output JSON: {"label": "...", "score": <0-100>, "reason": "..."}

When in doubt, return "legitimate". Misclassifying a real inquiry as
sales has higher cost than letting a sales pitch through.
```
This bias is the difference between a useful tool and a tool that quietly drops your real customers. Wire it in and trust it.
The design call: label, don't delete
Here's the temptation. You have a sales label now. Why not just hide those responses, drop them, suppress notifications?
Don't.
Even at 99% accuracy, one in a hundred real inquiries gets the wrong label. Reading a sales pitch costs a minute. Silently dropping a real prospect costs a lead, a relationship, and the trust of someone who tried to reach you. The asymmetry is enormous.
So the AI's job ends at the label. The operator decides what to filter, what to surface, what to manually correct. Manual corrections should be flagged (label_source = 'manual') and never overwritten by future automatic re-classification.
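That "never overwritten" rule is easy to encode as a guard at the write site. A sketch, assuming a stored label carries its source (the `applyAutoLabel` helper and `StoredLabel` shape are hypothetical, not the article's exact schema):

```typescript
type LabelSource = 'auto' | 'manual';

interface StoredLabel {
  label: string;
  source: LabelSource;
}

// An automatic re-classification result never overwrites a label an
// operator set by hand: the manual label always wins.
function applyAutoLabel(
  current: StoredLabel | null,
  incoming: string,
): StoredLabel {
  if (current?.source === 'manual') return current;
  return { label: incoming, source: 'auto' };
}
```

Putting the guard in one function (or one SQL `WHERE label_source != 'manual'` clause) means no future batch re-classification job can quietly undo an operator's correction.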
What you're left with
Five front-end layers plus an async classifier with the right defaults get you to a place where your inbox is mostly real. Not 100% -- nothing is -- but mostly. The operator runs filters like "exclude sales from this analytics view" or "only notify Slack on non-sales responses" as conscious choices, not silent automation.
I work on FORMLOVA, which ships all six layers as built-in features and absorbs the LLM cost on every plan including the free one. As of writing, it's the only mainstream form product where this whole stack is default. If you'd rather not run the classifier yourself, that's the shortcut. If you'd rather build it, the patterns above are the same shape we use internally.
Series cross-links
This piece is part of a multi-platform English-language series on contact-form spam defense.
- Canonical post (full guide)
- Indie Hackers (operator-focused)
- Medium "Receiver-Side Defense Guide Now Live" -- receiver-side guide
- Medium "8 Out of 10 Inquiries Were Sales Pitches" -- founder narrative companion