Every AI model you're using is better than you think.
Your prompts are the bottleneck.
I've been building OnChainIntel (an AI-powered crypto wallet behavioral analysis tool) and Prompt Quality Score for the past several months. Every piece of content we produce runs through LLMs for analysis. And the single biggest lever on output quality isn't which model we use. It's how precisely we instruct it.
The problem: until now, there has been no standardized way to measure prompt quality. No equivalent of advertising's CPM. No cited framework that a non-technical user could apply.
So we built one.
What is PQS?
PQS (Prompt Quality Score) is the world's first named quality score for AI prompts, built on cited academic research and industry evaluation frameworks.
It works in two layers:
Prompt-side scoring: how well-constructed is your input?
-Specificity: does the prompt define what it wants precisely?
-Context: does it give the model enough to work with?
-Clarity: are the directives unambiguous?
-Predictability: would different runs produce consistent results?
Output-side scoring: how good is the result?
-Completeness: did the output cover what the prompt implied?
-Relevancy: is it answering the actual question?
-Reasoning depth: does it demonstrate structured thinking?
-Faithfulness: does it stay grounded in what was asked?
Each dimension scored 1–10. Total out of 80.
8 attributes. 5 frameworks. 1 score.
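If you prefer to see the shape of the score rather than read it, here's a minimal TypeScript sketch of the structure described above. The eight dimension names come straight from the two lists; the type and helper names (PQSScores, totalPQS) are illustrative placeholders, not the actual PQS implementation.

```typescript
// Sketch only: dimension names from the post, everything else hypothetical.
type Dimension =
  // Prompt-side
  | "specificity" | "context" | "clarity" | "predictability"
  // Output-side
  | "completeness" | "relevancy" | "reasoningDepth" | "faithfulness";

// Each dimension is scored 1-10.
type PQSScores = Record<Dimension, number>;

function totalPQS(scores: PQSScores): number {
  // 8 dimensions x 10 points = a maximum of 80.
  return Object.values(scores).reduce((sum, s) => sum + s, 0);
}

// Example: a prompt/output pair scoring 6 on every dimension totals 48/80.
```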
The Proof Layer
Here's what makes PQS different from every other prompt optimizer.
We don't just rewrite your prompt and tell you it's better. We run both the original and the optimized version through the same model, then show you the actual outputs side by side.
The output difference is the proof. No trust required.
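As a mental model, here's a rough TypeScript sketch of that flow. callModel and scorePrompt are hypothetical stand-ins for whatever model call and scoring logic runs behind the product; this illustrates the side-by-side proof, not the published API.

```typescript
interface ProofResult {
  originalOutput: string;
  optimizedOutput: string;
  originalScore: number;   // PQS score for the original prompt
  optimizedScore: number;  // PQS score for the optimized rewrite
}

// Run both prompts through the same model, same settings, and return
// everything needed for a side-by-side comparison.
async function proveIt(
  originalPrompt: string,
  optimizedPrompt: string,
  callModel: (prompt: string) => Promise<string>, // hypothetical model wrapper
  scorePrompt: (prompt: string) => number,        // hypothetical PQS scorer
): Promise<ProofResult> {
  const [originalOutput, optimizedOutput] = await Promise.all([
    callModel(originalPrompt),
    callModel(optimizedPrompt),
  ]);

  return {
    originalOutput,
    optimizedOutput,
    originalScore: scorePrompt(originalPrompt),
    optimizedScore: scorePrompt(optimizedPrompt),
  };
}
```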
A real example from today. Prompt: "explain machine learning"
Original output: Machine learning is a method of building computer systems that improve their performance on specific tasks through experience, rather than being explicitly programmed for every scenario.
Original PQS score: 9/40
Optimized prompt: "You are an expert educator and technical communicator. I need you to explain machine learning in a comprehensive yet accessible way. Please structure your explanation as follows: 1) Start with a clear definition and core concept, 2) Explain the main types (supervised, unsupervised, reinforcement learning) with real-world examples, 3) Describe the basic process of how ML models learn from data, 4) Provide 2-3 concrete applications people encounter daily, 5) Address common misconceptions, and 6) Conclude with why it matters for the future. Target your explanation for someone with basic technical literacy but no ML background. Use analogies where helpful, avoid excessive jargon, and aim for 400-600 words total."
Optimized PQS score: 35/40. +84% improvement.
Same model. Same API. Completely different output.
The Frameworks Behind It
PQS is not opinion dressed up as a number. Every dimension traces to a cited, peer-reviewed framework:
-PEEM (Prompt Engineering Evaluation Metrics): published March 11, 2026 by Dongguk University. The first academic framework for joint prompt and response evaluation. Validated across 7 benchmarks and 5 task models, it showed that PEEM-guided rewriting improves downstream accuracy by up to 11.7 points, outperforming supervised and reinforcement learning baselines. Three weeks old. Nobody has built a product on it yet. Until tonight.
-RAGAS: evaluates faithfulness, answer relevancy, and context precision. Used in production pipelines by teams running Claude, GPT-4o, and Gemini.
-MT-Bench: the LMSYS multi-turn benchmark. GPT-4-as-judge scores showed >0.8 correlation with human ratings. The industry standard for evaluating reasoning quality.
-G-Eval: an LLM-as-judge framework using chain-of-thought reasoning. Improves scoring reliability by 10–15% over direct scoring.
-ROUGE: the original NLP completeness metric, used in summarization evaluation since 2004.
PEEM · RAGAS · MT-Bench · G-Eval · ROUGE
First time applied at the consumer level.
Why Now?
Three reasons this matters right now:
The tooling is developer-only. Every prompt evaluation tool that exists (LangSmith, DeepEval, Opik, RAGAS, etc.) requires Python, datasets, and engineering setup. There is no consumer-facing product with a named quality score. PQS is that product.
The academic work just landed. PEEM was published March 11, 2026. It is the most rigorous prompt evaluation framework ever proposed and it has not been turned into a product. We built on it first.
The market is massive and extremely underserved. Prompt engineering is a $1.5B market growing at 32% CAGR. Every tool serving it is aimed at developers. The consumer layer did not exist until tonight.
The Defensibility Question
Someone will ask: is this really the first?
Here's the honest answer.
Academic frameworks exist: PEEM, ROUGE, G-Eval. Developer tools exist: LangSmith, DeepEval. None of them have produced a named, consumer-facing, citable prompt quality score with a methodology anyone can reference and build on.
PQS is not a product feature. It's the first serious attempt by anyone to create a named AI prompt quality score.
We're not claiming it's perfect. We're claiming it's first. And we're making the methodology open so anyone can improve it.
Try It
Paste a prompt. Hit submit.
https://pqs.onchainintel.net
Like what you see? Go Pro for $19.99/mo. Unlimited optimizations, API access, better output at scale:
PQS: Prompt Quality Score
The fastest way to get better output from any AI model. Paste a prompt. Get an optimized version. Ship better work.
- Web: https://pqs.onchainintel.net
- Pricing: https://pqs.onchainintel.net/pricing
- MCP server: npm i pqs-mcp-server
- GitHub: https://github.com/OnChainAIIntel/prompt-optimization-engine
- X: @OnChainAIIntel
- Built by: @kenburbary
PQS is x402-native on Base mainnet. Pay-per-call with USDC, or subscribe via Stripe.
What's Next
-Cross-model scoring: the same prompt through Claude and GPT-4o simultaneously, scored by a third model as judge. Shows you not just a better prompt, but which model executes it best (rough sketch after this list).
-PQS Leaderboard: weekly rankings of the highest-scoring prompts by vertical, published publicly.
-PQS Whitepaper: full academic-style documentation of the framework, to be submitted to arXiv within 30 days.
-PQS API: so other tools can integrate the standard and display PQS scores natively.
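For the cross-model item above, here's a hedged sketch of the flow: the same prompt runs on two execution models and a third model picks the better output. callModel and judgeOutputs are hypothetical placeholders; the feature isn't shipped yet, so treat this as an architecture illustration, not an implementation.

```typescript
interface CrossModelResult {
  outputs: Record<string, string>; // model name -> output
  winner: string;                  // model the judge preferred
}

async function crossModelScore(
  prompt: string,
  models: string[],                                              // e.g. ["claude", "gpt-4o"]
  callModel: (model: string, prompt: string) => Promise<string>, // hypothetical
  judgeOutputs: (
    prompt: string,
    outputs: Record<string, string>,
  ) => Promise<string>,                                          // hypothetical judge model
): Promise<CrossModelResult> {
  const outputs: Record<string, string> = {};

  // Execute the same prompt on every model in parallel.
  await Promise.all(
    models.map(async (m) => {
      outputs[m] = await callModel(m, prompt);
    }),
  );

  // A third model judges which output executed the prompt best.
  const winner = await judgeOutputs(prompt, outputs);
  return { outputs, winner };
}
```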

