The problem with most LLM agent workflows is that nobody is checking the quality of the prompts going in.
Garbage in, garbage out; at scale, with agents firing hundreds of prompts per day, the garbage compounds fast.
I built x402-pqs to fix this. It's an Express middleware that intercepts prompts before they hit any LLM endpoint, scores them for quality, and adds the score to the request headers.
Install
```shell
npm install x402-pqs
```
Usage
```javascript
const express = require("express");
const { pqsMiddleware } = require("x402-pqs");

const app = express();
app.use(express.json());

app.use(pqsMiddleware({
  threshold: 10,      // warn if the prompt scores below 10/40
  vertical: "crypto", // scoring context
  onLowScore: "warn", // "warn" | "block" | "ignore"
}));

app.post("/api/chat", (req, res) => {
  console.log("Prompt score:", req.pqs.score, req.pqs.grade);
  res.json({ message: "ok" });
});
```
Every request gets these headers added automatically:
- `X-PQS-Score` → numeric score (0-40)
- `X-PQS-Grade` → letter grade (A-F)
- `X-PQS-Out-Of` → maximum score (40)
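Downstream consumers can read those headers off any response that passed through the middleware. A minimal sketch using a plain `Headers` object as a stand-in for a real response (Node 18+):

```javascript
// Pull the PQS headers off a fetch Response (or any Headers object).
function parsePqsHeaders(headers) {
  return {
    score: Number(headers.get("X-PQS-Score")),
    grade: headers.get("X-PQS-Grade"),
    outOf: Number(headers.get("X-PQS-Out-Of")),
  };
}

// Example with a Headers object standing in for a middleware response:
const h = new Headers({
  "X-PQS-Score": "35",
  "X-PQS-Grade": "A",
  "X-PQS-Out-Of": "40",
});
console.log(parsePqsHeaders(h)); // { score: 35, grade: "A", outOf: 40 }
```

Header lookup via `Headers.get` is case-insensitive, so this works regardless of how the proxy in front of your service normalizes casing.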
How the scoring works
PQS scores prompts across 8 dimensions using 5 cited academic frameworks:
Prompt-side (4 dimensions):
- Specificity → does the prompt define what it wants precisely?
- Context → does it give the model enough to work with?
- Clarity → are the directives unambiguous?
- Predictability → would different runs produce consistent results?
Output-side (4 dimensions):
- Completeness, Relevancy, Reasoning depth, Faithfulness
Source frameworks: PEEM (Dongguk University, 2026) · RAGAS · MT-Bench · G-Eval · ROUGE
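The 0-40 ceiling suggests eight dimensions at up to 5 points each. Here is a hypothetical sketch of how the aggregate and letter grade could be derived — the dimension weights and grade cutoffs below are my assumptions (chosen to be consistent with the examples in this post), not the library's actual rubric:

```javascript
// Hypothetical aggregation: 8 dimensions, each scored 0-5, summed to 0-40.
// The real PQS rubric may weight dimensions differently.
const DIMENSIONS = [
  "specificity", "context", "clarity", "predictability",         // prompt-side
  "completeness", "relevancy", "reasoningDepth", "faithfulness", // output-side
];

function aggregate(scores) {
  // scores: { specificity: 4, context: 3, ... }, each in 0-5
  return DIMENSIONS.reduce((sum, d) => sum + (scores[d] ?? 0), 0);
}

function gradeFor(score) {
  // Assumed cutoffs; the package may use different thresholds.
  if (score >= 32) return "A";
  if (score >= 24) return "B";
  if (score >= 16) return "C";
  if (score >= 8)  return "D";
  return "F";
}
```

With these cutoffs, the 35/40 example below lands at A and the 9/40 example at D, matching the grades reported by the API.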
Real example
This prompt: "who are the smartest wallets on solana right now"
It scored 9/40 → Grade D.
The optimized version scored 35/40 → Grade A.
A 26-point jump.
Same model. Same API. Completely different output quality.
The payment layer
The scoring API uses x402, an HTTP-native micropayment protocol now governed by the Linux Foundation, with Coinbase, Cloudflare, AWS, Stripe, Google, Microsoft, Visa, and Mastercard as founding members.
Agents can call and pay for scoring autonomously: no API keys, no subscriptions, just a wallet and $0.001 USDC per score.
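At the HTTP level, x402 revives status code 402 Payment Required: the first request comes back 402 with payment requirements, and the client retries with a signed payment in an `X-PAYMENT` header. A simplified in-process simulation of that handshake — `signPayment` and `mockScoreEndpoint` are stand-ins I made up for illustration; a real agent would use an x402 client library and a live endpoint:

```javascript
// Stand-in: a real x402 client signs an actual USDC payment payload.
function signPayment(requirements) {
  return { scheme: requirements.scheme, amount: requirements.maxAmountRequired };
}

// Mock server: demands payment on the first pass, scores on the second.
function mockScoreEndpoint(headers) {
  if (!headers["X-PAYMENT"]) {
    return { status: 402, body: { scheme: "exact", maxAmountRequired: "0.001" } };
  }
  return { status: 200, body: { score: 35, out_of: 40, grade: "A" } };
}

function scoreWithPayment(prompt) {
  const first = mockScoreEndpoint({});
  if (first.status !== 402) return first.body;
  // Pay and retry with the X-PAYMENT header attached.
  const payment = signPayment(first.body);
  const second = mockScoreEndpoint({ "X-PAYMENT": JSON.stringify(payment) });
  return second.body;
}
```

The point of the protocol is that this entire loop needs no human in it: a 402 response is machine-readable, so an agent can pay and retry without ever provisioning an API key.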
There's also a free tier with no payment required:
```shell
curl -X POST https://pqs.onchainintel.net/api/score/free \
  -H "Content-Type: application/json" \
  -d '{"prompt": "your prompt here", "vertical": "general"}'
```
Returns:
```json
{
  "score": 11,
  "out_of": 40,
  "grade": "D",
  "upgrade": "Get full dimension breakdown at /api/score for $0.001 USDC"
}
```
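The same free-tier call from Node, for agents that already live in JavaScript. The `fetchImpl` parameter is my addition so the function can be tested without hitting the network; the endpoint and payload match the curl example above:

```javascript
// Call the free scoring endpoint (fetch is built into Node 18+).
async function scoreFree(prompt, vertical = "general", fetchImpl = fetch) {
  const res = await fetchImpl("https://pqs.onchainintel.net/api/score/free", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, vertical }),
  });
  if (!res.ok) throw new Error(`Scoring failed: HTTP ${res.status}`);
  return res.json(); // { score, out_of, grade, upgrade }
}
```

Swap `fetchImpl` for a stub in tests, or leave the default to call the live endpoint.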
The data angle
Every scored prompt pair goes into a corpus. At scale this becomes training data for a domain-specific prompt quality model. The thesis is similar to what Andrej Karpathy described recently about LLM knowledge bases: the data compounds in value over time.
Links
- npm: x402-pqs
- GitHub: OnChainAIIntel/x402-pqs
- API: pqs.onchainintel.net
- Free endpoint: POST https://pqs.onchainintel.net/api/score/free
Would love feedback from anyone building agent workflows. What scoring dimensions would you add?