The problem with most LLM agent workflows is that nobody is checking the quality of the prompts going in.
Garbage in, garbage out; at scale, with agents firing hundreds of prompts per day, the garbage compounds fast.
I built x402-pqs to fix this. It's an Express middleware that intercepts prompts before they hit any LLM endpoint, scores them for quality, and adds the score to the request headers.
Install
```shell
npm install x402-pqs
```
Usage
```javascript
const express = require("express");
const { pqsMiddleware } = require("x402-pqs");

const app = express();
app.use(express.json());

app.use(pqsMiddleware({
  threshold: 10,      // warn if the prompt scores below 10/40
  vertical: "crypto", // scoring context
  onLowScore: "warn", // "warn" | "block" | "ignore"
}));

app.post("/api/chat", (req, res) => {
  console.log("Prompt score:", req.pqs.score, req.pqs.grade);
  res.json({ message: "ok" });
});
```
Every request gets these headers added automatically:
- `X-PQS-Score` → numeric score (0-40)
- `X-PQS-Grade` → letter grade (A-F)
- `X-PQS-Out-Of` → maximum score (40)
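Downstream consumers can read those headers off any response that passed through the middleware. A minimal sketch using a plain `Headers` object as a stand-in for a real response (Node 18+):

```javascript
// Pull the PQS headers off a fetch Response (or any Headers object).
function parsePqsHeaders(headers) {
  return {
    score: Number(headers.get("X-PQS-Score")),
    grade: headers.get("X-PQS-Grade"),
    outOf: Number(headers.get("X-PQS-Out-Of")),
  };
}

// Example with a Headers object standing in for a middleware response:
const h = new Headers({
  "X-PQS-Score": "35",
  "X-PQS-Grade": "A",
  "X-PQS-Out-Of": "40",
});
console.log(parsePqsHeaders(h)); // { score: 35, grade: "A", outOf: 40 }
```

Header lookup via `Headers.get` is case-insensitive, so this works regardless of how the proxy in front of your service normalizes casing.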
How the scoring works
PQS scores prompts across 8 dimensions using 5 cited academic frameworks:
Prompt-side (4 dimensions):
- Specificity → does the prompt define what it wants precisely?
- Context → does it give the model enough to work with?
- Clarity → are the directives unambiguous?
- Predictability → would different runs produce consistent results?
Output-side (4 dimensions):
- Completeness, Relevancy, Reasoning depth, Faithfulness
Source frameworks: PEEM (Dongguk University, 2026) · RAGAS · MT-Bench · G-Eval · ROUGE
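The 0-40 ceiling suggests eight dimensions at up to 5 points each. Here is a hypothetical sketch of how the aggregate and letter grade could be derived — the dimension weights and grade cutoffs below are my assumptions (chosen to be consistent with the examples in this post), not the library's actual rubric:

```javascript
// Hypothetical aggregation: 8 dimensions, each scored 0-5, summed to 0-40.
// The real PQS rubric may weight dimensions differently.
const DIMENSIONS = [
  "specificity", "context", "clarity", "predictability",         // prompt-side
  "completeness", "relevancy", "reasoningDepth", "faithfulness", // output-side
];

function aggregate(scores) {
  // scores: { specificity: 4, context: 3, ... }, each in 0-5
  return DIMENSIONS.reduce((sum, d) => sum + (scores[d] ?? 0), 0);
}

function gradeFor(score) {
  // Assumed cutoffs; the package may use different thresholds.
  if (score >= 32) return "A";
  if (score >= 24) return "B";
  if (score >= 16) return "C";
  if (score >= 8)  return "D";
  return "F";
}
```

With these cutoffs, the 35/40 example below lands at A and the 9/40 example at D, matching the grades reported by the API.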
Real example
This prompt: "who are the smartest wallets on solana right now"
It scored 9/40 → Grade D.
The optimized version scored 35/40 → Grade A.
A 26-point jump.
Same model. Same API. Completely different output quality.
The payment layer
The scoring API uses x402, an HTTP-native micropayment protocol now governed by the Linux Foundation, with Coinbase, Cloudflare, AWS, Stripe, Google, Microsoft, Visa, and Mastercard as founding members.
Agents can call and pay for scoring autonomously: no API keys, no subscriptions, just a wallet and $0.001 USDC per score.
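At the HTTP level, x402 revives status code 402 Payment Required: the first request comes back 402 with payment requirements, and the client retries with a signed payment in an `X-PAYMENT` header. A simplified in-process simulation of that handshake — `signPayment` and `mockScoreEndpoint` are stand-ins I made up for illustration; a real agent would use an x402 client library and a live endpoint:

```javascript
// Stand-in: a real x402 client signs an actual USDC payment payload.
function signPayment(requirements) {
  return { scheme: requirements.scheme, amount: requirements.maxAmountRequired };
}

// Mock server: demands payment on the first pass, scores on the second.
function mockScoreEndpoint(headers) {
  if (!headers["X-PAYMENT"]) {
    return { status: 402, body: { scheme: "exact", maxAmountRequired: "0.001" } };
  }
  return { status: 200, body: { score: 35, out_of: 40, grade: "A" } };
}

function scoreWithPayment(prompt) {
  const first = mockScoreEndpoint({});
  if (first.status !== 402) return first.body;
  // Pay and retry with the X-PAYMENT header attached.
  const payment = signPayment(first.body);
  const second = mockScoreEndpoint({ "X-PAYMENT": JSON.stringify(payment) });
  return second.body;
}
```

The point of the protocol is that this entire loop needs no human in it: a 402 response is machine-readable, so an agent can pay and retry without ever provisioning an API key.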
There's also a free tier with no payment required:
```shell
curl -X POST https://pqs.onchainintel.net/api/score/free \
  -H "Content-Type: application/json" \
  -d '{"prompt": "your prompt here", "vertical": "general"}'
```
Returns:
```json
{
  "score": 11,
  "out_of": 40,
  "grade": "D",
  "upgrade": "Get full dimension breakdown at /api/score for $0.001 USDC"
}
```
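The same free-tier call from Node, for agents that already live in JavaScript. The `fetchImpl` parameter is my addition so the function can be tested without hitting the network; the endpoint and payload match the curl example above:

```javascript
// Call the free scoring endpoint (fetch is built into Node 18+).
async function scoreFree(prompt, vertical = "general", fetchImpl = fetch) {
  const res = await fetchImpl("https://pqs.onchainintel.net/api/score/free", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, vertical }),
  });
  if (!res.ok) throw new Error(`Scoring failed: HTTP ${res.status}`);
  return res.json(); // { score, out_of, grade, upgrade }
}
```

Swap `fetchImpl` for a stub in tests, or leave the default to call the live endpoint.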
The data angle
Every scored prompt pair goes into a corpus. At scale this becomes training data for a domain-specific prompt quality model. The thesis is similar to what Andrej Karpathy described recently about LLM knowledge bases: the data compounds in value over time.
Links
- npm: x402-pqs
- GitHub: OnChainAIIntel/x402-pqs
- API: pqs.onchainintel.net
- Free endpoint: POST https://pqs.onchainintel.net/api/score/free
Would love feedback from anyone building agent workflows. What scoring dimensions would you add?