Jakob Sandström

Building a "Bullshit Detector" for LLMs using Node.js and pgvector

You’ve spent weeks polishing your prompts. You have set up a robust retrieval system. You validate every piece of data going into your context window.

And yet, your RAG (Retrieval-Augmented Generation) bot still confidently tells users things that are completely wrong.

It doesn't happen often, but when it does, it destroys user trust. The problem with LLMs in production isn't just getting them to answer; it's knowing when they are lying (hallucinating).

Standard software engineering practices, like regex-based unit tests, don't work on non-deterministic natural language output. We need a new layer in our stack.

Here is how I approached building a "Bullshit Detector" middleware using TypeScript, Node.js, and PostgreSQL with pgvector.

The Architecture Problem

A typical RAG flow looks like this:

  1. User asks question.
  2. App retrieves relevant context documents.
  3. LLM generates an answer based on context.
  4. User sees the answer (even if it's wrong).

The issue is step 4. We are trusting the model implicitly.

To catch hallucinations, we need to introduce an adversarial step after generation but before the user sees it. We need a middleware that acts as a relentless fact-checker.
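
Concretely, the request flow ends up looking something like the sketch below. retrieveContext and generateAnswer are placeholders for whatever RAG plumbing you already have; validateResponse is the checker built in the next section.

import { validateResponse } from "./validateResponse"; // wherever the checker below lives

// Placeholders for the RAG pieces you already have:
declare function retrieveContext(question: string): Promise<string[]>;
declare function generateAnswer(question: string, context: string[]): Promise<string>;

export async function answerQuestion(question: string): Promise<string> {
  const context = await retrieveContext(question);             // step 2
  const draftAnswer = await generateAnswer(question, context); // step 3

  // Step 3.5: adversarial check before anything reaches the user
  const audit = await validateResponse({
    llmAnswer: draftAnswer,
    retrievedContext: context,
    threshold: 0.75,
  });

  if (audit.action === "REJECT") {
    // Retry, fall back to "I don't know", or escalate to a human
    return "I couldn't verify that answer against the source documents.";
  }

  return draftAnswer; // step 4, now gated
}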

The Solution: Semantic Proximity Check

Since we already have the "source truth" (the documents we retrieved in step 2) and the generated "claim" (the LLM's answer), we can mathematically measure how closely they align.

If the LLM's answer is semantically distant from the source documents it was supposed to use, it's likely bullshitting.

My stack for this middleware:

  • Runtime: Node.js (lightweight, fast for I/O).
  • Language: TypeScript (for type safety on the data structures).
  • Vector DB: PostgreSQL with the pgvector extension.

I chose pgvector because keeping the operational data and vectors in the same database simplifies the architecture immensely compared to managing a separate Pinecone or Weaviate instance for just this validation step.
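
For reference, the pgvector side can be as small as one table and one query. Here is a minimal sketch assuming the node-postgres ("pg") client and a 1536-dimension embeddings model; the table and column names are illustrative, not a prescribed schema.

import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// One-time setup, usually done in a migration:
//   CREATE EXTENSION IF NOT EXISTS vector;
//   CREATE TABLE context_chunks (
//     id BIGSERIAL PRIMARY KEY,
//     content TEXT NOT NULL,
//     embedding VECTOR(1536) NOT NULL  -- match your embeddings model's dimension
//   );

// Find the stored chunks closest in meaning to a given vector.
// "<=>" is pgvector's cosine distance operator, so similarity = 1 - distance.
export async function closestChunks(vector: number[], limit = 5) {
  const literal = `[${vector.join(",")}]`;
  const { rows } = await pool.query(
    `SELECT content, 1 - (embedding <=> $1::vector) AS similarity
       FROM context_chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [literal, limit]
  );
  return rows as { content: string; similarity: number }[];
}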

The Core Logic

The goal isn't to re-run the entire RAG process. The goal is to take the final output and verify its "grounding."

Here is a simplified TypeScript view of the evaluation logic. We use an embeddings model to convert both the generated answer and the source text into vectors, and then calculate the cosine similarity.

import { embedText, cosineSimilarity } from './vectorUtils';

interface AuditRequest {
  llmAnswer: string;
  retrievedContext: string[]; // The raw text chunks passed to the LLM
  threshold: number; // e.g., 0.75
}

export async function validateResponse(req: AuditRequest) {
  // 1. Vectorize the "Claim" (the LLM's answer)
  const answerVector = await embedText(req.llmAnswer);

  let totalSimilarityScore = 0;

  // 2. Compare the claim against every piece of context used
  for (const sourceText of req.retrievedContext) {
    // Vectorize the source truth
    const sourceVector = await embedText(sourceText);

    // Calculate semantic overlap (1.0 = identical meaning, 0.0 = unrelated)
    const similarity = cosineSimilarity(answerVector, sourceVector);
    totalSimilarityScore += similarity;
  }

  // 3. Calculate an average "Trust Score"
  // (In production, we use weighted averages based on relevance)
  // Guard against an empty context array: with nothing to ground against,
  // score 0 so the threshold check below rejects instead of passing NaN.
  const averageTrustScore = req.retrievedContext.length > 0
    ? totalSimilarityScore / req.retrievedContext.length
    : 0;

  // 4. Make a Pass/Fail decision
  if (averageTrustScore < req.threshold) {
    return {
      action: "REJECT",
      score: averageTrustScore,
      reason: "The generated response does not align semantically with the provided source context."
    };
  }

  return {
    action: "PASS",
    score: averageTrustScore
  };
}
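
The two helpers imported from ./vectorUtils are left out above, so here is one possible implementation. It assumes the official openai SDK for embeddings, but any provider works as long as both texts go through the same model.

// vectorUtils.ts -- one possible implementation of the helpers used above.
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function embedText(text: string): Promise<number[]> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small", // any embeddings model works
    input: text,
  });
  return res.data[0].embedding;
}

// Cosine similarity: 1.0 = same direction (same meaning), ~0.0 = unrelated.
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}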

The Resulting Data Structure

For this to be useful in a real application, the middleware can't just return true/false. The frontend needs to know why something was flagged.

If the system detects a hallucination, it generates a detailed JSON object that can be logged for engineers or used to show a warning in the UI.

{
  "id": "audit_123xyz",
  "timestamp": "2023-10-27T10:00:00Z",
  "trust_score": 0.42,
  "action": "REJECT",
  "audit_details": {
    "reason": "Critical hallucination detected. Answer claims X, but source documents contain Y.",
    "contradictions": [
      {
        "claim": "Product supports XML export",
        "source_truth": "Export formats supported: JSON, CSV only."
      }
    ]
  }
}
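
In TypeScript, that payload can be typed roughly like this. The field names mirror the JSON example above; treat it as a sketch rather than the exact schema AgentAudit ships with.

interface Contradiction {
  claim: string;        // what the LLM asserted
  source_truth: string; // what the retrieved documents actually say
}

interface AuditResult {
  id: string;
  timestamp: string;    // ISO 8601
  trust_score: number;  // average cosine similarity, 0.0 - 1.0
  action: "PASS" | "REJECT";
  audit_details?: {     // present when the response is flagged
    reason: string;
    contradictions: Contradiction[];
  };
}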

Conclusion

Input validation is crucial, but for production-grade AI agents, output verification is mandatory. You cannot rely solely on prompt engineering to prevent hallucinations.

By treating the LLM as an untrusted component and wrapping it with a semantic validation layer using tools like Node.js and pgvector, we can build guardrails that actually work.

I packaged this exact logic into a standalone middleware tool called AgentAudit. It’s designed to drop into existing Node/TS backends to start catching lies immediately.

I'd love to hear how you handle this problem. Are you manually reviewing logs, or do you have automated checks in place?

You can check out the interactive demo here: https://agentaudit-dashboard.vercel.app/

Top comments (8)

Sloan the DEV Moderator

We loved your post so we shared it on social.

Keep up the great work!

Jakob Sandström

Wow, that is amazing news! Thank you so much for the support, I really appreciate the DEV team helping to spread the word!

Jakob Sandström

Author here. I built this because I was tired of manually verifying logs to catch hallucinations. Happy to answer any questions about the architecture or the pgvector implementation.

Narnaiezzsshaa Truong

I noticed your motivation came from log fatigue. How do you see this detector connecting back to logs—is it meant to replace that process, or just shift the verification upstream?

Jakob Sandström

It is definitely about shifting verification upstream.
I still keep logs; in fact, the JSON output from this tool creates much better, structured logs. But the goal is to stop the bad response from reaching the user in the first place.
So instead of manually reading logs on Friday to find out if the bot lied on Monday, I just check the logs to see what the system already caught and rejected. It turns logging into a debugging tool rather than a safety net.

Narnaiezzsshaa Truong

I love how you’ve reframed logs as debugging artifacts. Do you see any risk that the detector’s assumptions might filter out valid but novel answers, and how would you catch that upstream?

Jakob Sandström

100%. That is the main trade-off. For enterprise RAG, we usually prioritize strict grounding over novelty, so we accept the risk of filtering out "creative" answers to ensure safety.
To manage that risk, I always recommend deploying in "Shadow Mode" first. You let the detector run and log flags, but you don't actually block the user. This allows you to review the "REJECT" logs and tune the sensitivity threshold to the right level before you turn on the actual gate.
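
A rough sketch of that toggle (the environment variable and return shape are just illustrative):

import { validateResponse } from "./validateResponse"; // the checker from the post

const SHADOW_MODE = process.env.AUDIT_SHADOW_MODE === "true";

export async function gateAnswer(llmAnswer: string, retrievedContext: string[]) {
  const audit = await validateResponse({ llmAnswer, retrievedContext, threshold: 0.75 });

  // Always log the verdict so the threshold can be tuned against real traffic.
  console.log(JSON.stringify({ event: "agent_audit", ...audit }));

  if (audit.action === "REJECT" && !SHADOW_MODE) {
    // Only block the user once the threshold has been tuned in shadow mode.
    return { blocked: true, audit };
  }

  return { blocked: false, answer: llmAnswer, audit };
}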

Narnaiezzsshaa Truong

I like the idea of Shadow Mode as training wheels. Do you see contexts where novelty might be worth more than strict grounding, and how would you adapt the detector for those?