<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: OnChainAIIntel</title>
    <description>The latest articles on DEV Community by OnChainAIIntel (@onchainaiintel).</description>
    <link>https://dev.to/onchainaiintel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860854%2F8478277f-090c-44c8-9fdd-c450782b28b5.png</url>
      <title>DEV Community: OnChainAIIntel</title>
      <link>https://dev.to/onchainaiintel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/onchainaiintel"/>
    <language>en</language>
    <item>
      <title>Google Cloud NEXT '26 Shipped a Full Agentic Stack. One Layer Is Missing.</title>
      <dc:creator>OnChainAIIntel</dc:creator>
      <pubDate>Sun, 26 Apr 2026 14:11:19 +0000</pubDate>
      <link>https://dev.to/onchainaiintel/google-cloud-next-26-shipped-a-full-agentic-stack-one-layer-is-missing-ni0</link>
      <guid>https://dev.to/onchainaiintel/google-cloud-next-26-shipped-a-full-agentic-stack-one-layer-is-missing-ni0</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-cloud-next-2026-04-22"&gt;Google Cloud NEXT Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5slv96ewuul2klxoe73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5slv96ewuul2klxoe73.png" alt=" " width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Google Cloud NEXT '26 Shipped a Full Agentic Stack. One Layer Is Missing.
&lt;/h1&gt;

&lt;p&gt;Google Cloud NEXT '26 kicked off April 22 in Las Vegas, and the story Thomas Kurian told from the keynote stage was one of the most coherent ones that a hyperscaler has put together in years. The pitch: a unified stack. Silicon built for the models, models grounded in your data, agents running on those models, all of it secured by the infrastructure underneath. It is the same stack Google runs for Search, YouTube, Chrome, and Android, now pointed at your enterprise.&lt;/p&gt;

&lt;p&gt;The announcements are heavy. &lt;strong&gt;Gemini Enterprise Agent Platform&lt;/strong&gt; landed as the Vertex AI successor, with Agent Studio, Agent-to-Agent Orchestration, Agent Registry, Agent Identity, Agent Gateway, and Agent Observability. Google unveiled &lt;strong&gt;8th Generation TPUs&lt;/strong&gt; split across two chips, TPU 8t for training (scaling to 9,600 chips and 2 petabytes of shared memory in a single superpod) and TPU 8i tuned for inference. &lt;/p&gt;

&lt;p&gt;Underneath, a new megascale fabric called &lt;strong&gt;Virgo Network&lt;/strong&gt; was introduced to power the AI Hypercomputer. &lt;strong&gt;Agentic Data Cloud&lt;/strong&gt; brought a cross-cloud Lakehouse and Knowledge Catalog. &lt;strong&gt;Agentic Defense&lt;/strong&gt; folded Google Threat Intelligence, Security Operations, and the recently acquired Wiz into an AI Application Protection Platform. The &lt;strong&gt;Gemini Enterprise app&lt;/strong&gt; got an Agent Designer, an Inbox for managing agent activity, long-running agents, Skills, Projects (which give agents permanent memory), Deep Think, and Microsoft 365 interoperability.&lt;/p&gt;

&lt;p&gt;Sundar Pichai dropped the stat that has been making the rounds all week: roughly 75% of all new Google code is now AI-generated and reviewed by engineers, up from about half last fall. First-party model traffic is running at 16 billion tokens per minute. Just over half of Alphabet's machine learning compute investment in 2026 is earmarked for the Cloud business.&lt;/p&gt;

&lt;p&gt;This is not a slide deck. It is a serious bet on what SiliconANGLE correctly called the control plane of the agent era.&lt;/p&gt;

&lt;p&gt;And there is exactly one layer missing from the stack. Nobody on stage named it. Nobody shipped a product for it. And it is the layer that will decide whether any of this actually works in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Layer Nobody Announced
&lt;/h2&gt;

&lt;p&gt;Look at the Agent Platform feature list one more time: Studio, Orchestration, Registry, Identity, Gateway, Observability. That is a respectable control plane for agents. You can build them, connect them, catalog them, authenticate them, route them, and watch them.&lt;/p&gt;

&lt;p&gt;What you cannot do, based on anything announced this week, is score the quality of what goes into them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every agent runs on inputs.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Prompts from humans. Prompts from other agents. Tool-call payloads. Context injected from a knowledge catalog or a RAG step. In a world of multi-agent workflows, those inputs are not just the user's problem anymore. Agent A writes a prompt for Agent B. Agent B interprets that prompt, calls a tool, receives a response, and drafts its own prompt for Agent C. The Agent-to-Agent protocol that Google is now pushing at 150 organizations means those chains are about to get longer and more autonomous.&lt;/p&gt;

&lt;p&gt;The quality of every link in that chain is, right now, unmeasured.&lt;/p&gt;

&lt;p&gt;Observability tells you what happened. It does not tell you whether the input that caused it was any good to begin with. You see that Agent B failed. You do not see that Agent A handed it an ambiguous, under-specified, or context-poisoned prompt. You end up debugging the failure as a model problem, a tool problem, or a routing problem. It was an input problem.&lt;/p&gt;

&lt;p&gt;This is what I have been calling the 'AI input quality problem'. It is not a prompt engineering problem. Prompt engineering is a craft humans do at a keyboard. The AI input quality problem is what happens when LLMs write prompts for other LLMs, at scale, with no human in the loop, and nobody is scoring the quality of the handoff.&lt;/p&gt;

&lt;p&gt;I have been writing about this problem for months, and I shipped a tool to score it pre-flight. Watching Google's announcements this week, I realized the gap I've been measuring is now an industry-scale problem, not just a startup-scale one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Gets Louder Over the Next Twelve Months
&lt;/h2&gt;

&lt;p&gt;Two things Google announced at NEXT '26 will exacerbate this problem, not solve it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-running agents.&lt;/strong&gt; &lt;br&gt;
Gemini Enterprise now supports agents that operate for hours or days, with Projects giving them permanent memory. "Long-running" means more steps, more chances for context drift, more accumulated prompt decay. A weakly formed prompt at step one becomes a badly framed decision at step fifty. Memory does not fix this. Memory preserves the weakness and lets it propagate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A2A at scale.&lt;/strong&gt;&lt;br&gt;
Google says 150 organizations are already running Agent-to-Agent orchestration. When agents talk to each other, every message is a prompt generated by an LLM and consumed by an LLM. The humans who would have caught ambiguity in a Slack thread or a design review are not in the loop. The protocol ships the envelope. It does not grade the letter inside.&lt;/p&gt;

&lt;p&gt;Vertical integration buys you a lot. Google's stack means your TPU, your model, your runtime, your data layer, and your governance all speak the same language. What vertical integration does not buy you is a quality signal on the content flowing through that stack. The input is still just text. Text is still ambiguous. Ambiguity compounds through agent chains the same way floating point errors compound through a long numerical pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Developers Should Actually Do This Week&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are building on Gemini Enterprise Agent Platform, or planning to, here is the operator's version of the argument.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1) Treat agent-to-agent handoffs as a product surface, not plumbing.&lt;/strong&gt; They are the place where most of your production issues will originate. Log the prompts Agent A sends to Agent B. Store them. Review a sampled slice weekly. The first time you do this you will be a little shocked at what your agents are saying to each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2) Add a pre-flight check, not just a post-hoc trace.&lt;/strong&gt; Observability after the fact tells you the agent failed. A pre-flight quality check on the prompt before it enters the next agent tells you whether the failure was even preventable. This is the difference between a crash log and a linter. Both are useful. Only one gets you home at a reasonable hour.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3) Assume the input is the bug until proven otherwise.&lt;/strong&gt; When your agent chain breaks, the most common cause in the next generation of these workflows will not be the model or the tool. It will be an input that was too vague, too verbose, or too contaminated with irrelevant context. Debug the prompt first, the model second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4) Score your prompts the way you score your code.&lt;/strong&gt; Code has coverage, complexity, and lint. Prompts have nothing, yet. Define quality dimensions (clarity, specificity, context sufficiency, safety, retrievability), score them, gate on them. "It felt like a good prompt" stops being an acceptable quality signal the moment an agent is writing prompts on your behalf.&lt;/p&gt;
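&lt;p&gt;To make point 2 concrete, here is a minimal pre-flight gate sketch. Everything in it is hypothetical: the dimension fields, the threshold, and the &lt;code&gt;preflight&lt;/code&gt; helper are illustrations, not any Gemini or PQS API.&lt;/p&gt;

```python
# Minimal pre-flight input gate. A sketch only: the dimension checks and
# threshold are illustrative, not an official Gemini or PQS interface.
from dataclasses import dataclass

@dataclass
class InputScore:
    clarity: int        # 1-10: is the task unambiguous?
    specificity: int    # 1-10: concrete targets, numbers, scope?
    context: int        # 1-10: enough background to act on?

    @property
    def total(self) -> int:
        return self.clarity + self.specificity + self.context

def preflight(prompt: str, score: InputScore, threshold: int = 15) -> str:
    """Gate an agent-to-agent handoff before it enters the next agent.

    Returns the prompt unchanged if it clears the threshold; raises
    otherwise so the orchestrator can reroute or request a rewrite.
    """
    if score.total in range(threshold):  # i.e. total is below threshold
        raise ValueError(
            f"handoff blocked: input scored {score.total}/30, "
            f"needs at least {threshold}"
        )
    return prompt

# Agent A's draft for Agent B gets checked before B ever sees it.
handoff = preflight("Summarize Q3 incidents, top 5 by severity, as JSON.",
                    InputScore(clarity=7, specificity=6, context=5))
```

&lt;p&gt;The point is the shape, not the scorer: the check runs before the handoff, and a failure is routable, not just observable after the fact.&lt;/p&gt;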

&lt;h2&gt;
  
  
  The Bet and the Gap
&lt;/h2&gt;

&lt;p&gt;Kurian is probably right that enterprises adopting AI agents at scale will tend to choose the platform where model, runtime, silicon, governance, and productivity all come from one company. Vertical integration is a real advantage at this layer, and the economics of 8th gen TPUs plus Virgo Network suggest Google is preparing to compete on inference pricing that Nvidia-dependent competitors will struggle to match.&lt;/p&gt;

&lt;p&gt;But the bet has a weak link, and it is not Google's alone. It is the industry's. We have assembled a full agentic stack, top to bottom, without a quality layer for the one thing flowing through all of it. The control plane without a quality plane is the same shape as the early web with HTTP and no SSL, or early databases before ACID. It works until it really, really does not.&lt;/p&gt;

&lt;p&gt;Eventually somebody names the layer. Somebody ships the tool. Somebody writes the analysis pointing out that the emperor's stack is beautiful and well-governed and serving inference at record speed, and also, every prompt in it is being trusted on vibes.&lt;/p&gt;

&lt;p&gt;The AI input quality problem is real. It is getting louder. NEXT '26 was the week the stack caught up to the agent era and the quality layer fell one more beat behind.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: I build in this space. I run &lt;a href="https://pqs.onchainintel.net" rel="noopener noreferrer"&gt;PQS (Prompt Quality Score)&lt;/a&gt; under OnChainIntel, a pre-flight quality score for prompts and agent inputs. I wrote this because NEXT '26 is a real moment for the agent era, and the layer I work on every day is the one nobody put on a slide this week. If you are building on Gemini Enterprise Agent Platform, pay attention to what goes into your agents, not just what comes out.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>cloudnextchallenge</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Introducing PQS: The Fastest Way to Get Better Output From Any AI Model</title>
      <dc:creator>OnChainAIIntel</dc:creator>
      <pubDate>Mon, 20 Apr 2026 06:20:23 +0000</pubDate>
      <link>https://dev.to/onchainaiintel/introducing-prompt-quality-score-pqs-the-worlds-first-named-ai-prompt-quality-score-1a4a</link>
      <guid>https://dev.to/onchainaiintel/introducing-prompt-quality-score-pqs-the-worlds-first-named-ai-prompt-quality-score-1a4a</guid>
      <description>&lt;p&gt;Every AI model you're using is better than you think.&lt;/p&gt;

&lt;p&gt;Your prompts are the bottleneck.&lt;/p&gt;

&lt;p&gt;I've been building OnChainIntel (an AI-powered crypto wallet behavioral analysis tool) and Prompt Quality Score for the past several months. Every piece of content we produce runs through LLMs for analysis. And the single biggest lever on output quality isn't which model we use. It's how precisely we instruct it.&lt;/p&gt;

&lt;p&gt;The problem: until now, there has been no standardized way to measure prompt quality. No equivalent of the advertising industry's CPM, a single number everyone can cite. No cited framework that a non-technical user could apply.&lt;/p&gt;

&lt;p&gt;So we built one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is PQS?
&lt;/h2&gt;

&lt;p&gt;PQS (Prompt Quality Score) is the world's first named AI prompt quality score, built on cited academic research and industry evaluation frameworks.&lt;/p&gt;

&lt;p&gt;It works in two layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt-side scoring:&lt;/strong&gt; how well-constructed is your input?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Specificity:&lt;/strong&gt; does the prompt define what it wants precisely?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context:&lt;/strong&gt; does it give the model enough to work with?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clarity:&lt;/strong&gt; are the directives unambiguous?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Predictability:&lt;/strong&gt; would different runs produce consistent results?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Output-side scoring:&lt;/strong&gt; how good is the result?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Completeness:&lt;/strong&gt; did the output cover what the prompt implied?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Relevancy:&lt;/strong&gt; is it answering the actual question?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reasoning depth:&lt;/strong&gt; does it demonstrate structured thinking?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faithfulness:&lt;/strong&gt; does it stay grounded in what was asked?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each dimension scored 1–10. Total out of 80.&lt;/p&gt;

&lt;p&gt;8 attributes. 5 frameworks. 1 score.&lt;/p&gt;
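&lt;p&gt;The two-layer rollup is plain arithmetic. Here is a minimal sketch, assuming a flat unweighted sum; the dimension names come from the lists above, and any real PQS weighting may differ.&lt;/p&gt;

```python
# How the two layers roll up into one number. Sketch of the arithmetic
# only; PQS's actual weighting is an assumption here (flat sum).
PROMPT_DIMS = ("specificity", "context", "clarity", "predictability")
OUTPUT_DIMS = ("completeness", "relevancy", "reasoning_depth", "faithfulness")

def pqs_total(scores: dict) -> dict:
    """Sum eight 1-10 dimension scores into prompt-side (/40),
    output-side (/40), and overall (/80) totals."""
    prompt_side = sum(scores[d] for d in PROMPT_DIMS)
    output_side = sum(scores[d] for d in OUTPUT_DIMS)
    return {
        "prompt_side": prompt_side,          # out of 40
        "output_side": output_side,          # out of 40
        "total": prompt_side + output_side,  # out of 80
    }

# Hypothetical scores for a weak prompt that produced a decent answer.
example = pqs_total({
    "specificity": 2, "context": 3, "clarity": 2, "predictability": 2,
    "completeness": 5, "relevancy": 6, "reasoning_depth": 4, "faithfulness": 7,
})
```

&lt;p&gt;Splitting the total this way is what lets a tool report a prompt-side score separately from an output-side score, as in the example below.&lt;/p&gt;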

&lt;h2&gt;
  
  
  The Proof Layer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2em4v4tx9sno509utwuj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2em4v4tx9sno509utwuj.png" alt=" " width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's what makes PQS different from every other prompt optimizer.&lt;/p&gt;

&lt;p&gt;We don't just rewrite your prompt and tell you it's better. We run both the original and the optimized version, then show you the actual outputs side by side.&lt;/p&gt;

&lt;p&gt;The output difference is the proof. No trust required.&lt;/p&gt;

&lt;p&gt;Real example from today. Prompt: "explain machine learning"&lt;/p&gt;

&lt;p&gt;Original output: Machine learning is a method of building computer systems that improve their performance on specific tasks through experience, rather than being explicitly programmed for every scenario.&lt;/p&gt;

&lt;p&gt;Original prompt-side PQS score: 9/40&lt;/p&gt;

&lt;p&gt;Optimized prompt: "You are an expert educator and technical communicator. I need you to explain machine learning in a comprehensive yet accessible way. Please structure your explanation as follows: 1) Start with a clear definition and core concept, 2) Explain the main types (supervised, unsupervised, reinforcement learning) with real-world examples, 3) Describe the basic process of how ML models learn from data, 4) Provide 2-3 concrete applications people encounter daily, 5) Address common misconceptions, and 6) Conclude with why it matters for the future. Target your explanation for someone with basic technical literacy but no ML background. Use analogies where helpful, avoid excessive jargon, and aim for 400-600 words total."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimized prompt-side PQS score: 35/40. A 26-point improvement.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6ez1dxu7bhcsjuj5po1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6ez1dxu7bhcsjuj5po1.png" alt=" " width="800" height="742"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same model. Same API. Completely different output.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Frameworks Behind it
&lt;/h2&gt;

&lt;p&gt;PQS is not opinion dressed up as a number. Every dimension traces to a cited, peer-reviewed framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PEEM&lt;/strong&gt; (Prompt Engineering Evaluation Metrics): published March 11, 2026 by Dongguk University. The first academic framework for joint prompt and response evaluation, validated across 7 benchmarks and 5 task models. PEEM-guided rewriting improved downstream accuracy by up to 11.7 points, outperforming supervised and reinforcement learning baselines. Nobody had built a product on it. Until tonight.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RAGAS&lt;/strong&gt;: evaluates faithfulness, answer relevancy, and context precision. Used in production pipelines by teams running Claude, GPT-4o, and Gemini.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MT-Bench&lt;/strong&gt;: the LMSYS multi-turn benchmark. GPT-4 judge scores showed &amp;gt;0.8 correlation with human ratings. An industry standard for evaluating reasoning quality.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;G-Eval&lt;/strong&gt;: an LLM-as-judge framework using chain-of-thought reasoning. Improves scoring reliability by 10–15% over direct scoring.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROUGE&lt;/strong&gt;: the original NLP completeness metric, used in summarization evaluation since 2004.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PEEM · RAGAS · MT-Bench · G-Eval · ROUGE&lt;br&gt;
First time applied at the consumer level.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Now?
&lt;/h2&gt;

&lt;p&gt;Three reasons this matters right now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The tooling is developer-only. Every prompt evaluation tool that exists (LangSmith, DeepEval, Opik, RAGAS) requires Python, datasets, and engineering setup. There is no consumer-facing product with a named quality score. PQS is that product.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The academic work just landed. PEEM was published March 11, 2026. It is the most rigorous prompt evaluation framework ever proposed and it has not been turned into a product. We built on it first.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The market is massive and extremely underserved. Prompt engineering is a $1.5B market growing at 32% CAGR. Every tool serving it is aimed at developers. The consumer layer did not exist until tonight.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Defensibility Question
&lt;/h2&gt;

&lt;p&gt;Someone will ask: is this really the first?&lt;br&gt;
Here's the honest answer.&lt;/p&gt;

&lt;p&gt;Academic frameworks exist: PEEM, ROUGE, G-Eval. Developer tools exist: LangSmith, DeepEval. None of them have produced a named, consumer-facing, citable prompt quality score with a methodology anyone can reference and build on.&lt;/p&gt;

&lt;p&gt;PQS is not a product feature. It's the first serious attempt by anyone to create a named AI prompt quality score.&lt;/p&gt;

&lt;p&gt;We're not claiming it's perfect. We're claiming it's first. And we're making the methodology open so anyone can improve it.&lt;/p&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;Paste a prompt. Hit submit.&lt;br&gt;
&lt;a href="https://shorturl.at/ljKnH" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Like what you see? Go Pro for $19.99/mo. Unlimited optimizations, API access, better output at scale.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;PQS: Prompt Quality Score&lt;/strong&gt;&lt;br&gt;
The fastest way to get better output from any AI model. Paste a prompt. Get an optimized version. Ship better work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web: &lt;a href="https://shorturl.at/ljKnH" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pricing: &lt;a href="https://shorturl.at/FyIdk" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net/pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP server: &lt;code&gt;npm i pqs-mcp-server&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/OnChainAIIntel/prompt-optimization-engine" rel="noopener noreferrer"&gt;https://github.com/OnChainAIIntel/prompt-optimization-engine&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;X: &lt;a href="https://x.com/OnChainAIIntel" rel="noopener noreferrer"&gt;@OnChainAIIntel&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Built by: &lt;a href="https://x.com/kenbubary" rel="noopener noreferrer"&gt;@kenburbary&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;PQS is x402-native on Base mainnet. Pay-per-call with USDC, or subscribe via Stripe.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cross-model scoring:&lt;/strong&gt; the same prompt run through Claude and GPT-4o simultaneously, scored by a third model as judge. Shows you not just a better prompt, but which model executes it best.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PQS Leaderboard:&lt;/strong&gt; weekly rankings of the highest-scoring prompts by vertical, published publicly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PQS Whitepaper:&lt;/strong&gt; full academic-style documentation of the framework, to be submitted to arXiv within 30 days.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PQS API:&lt;/strong&gt; so other tools can integrate the standard and display PQS scores natively.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I scored 500 AI prompts across 8 quality dimensions — here's what broke</title>
      <dc:creator>OnChainAIIntel</dc:creator>
      <pubDate>Sat, 18 Apr 2026 18:59:04 +0000</pubDate>
      <link>https://dev.to/onchainaiintel/i-scored-500-ai-prompts-across-8-quality-dimensions-heres-what-broke-23kn</link>
      <guid>https://dev.to/onchainaiintel/i-scored-500-ai-prompts-across-8-quality-dimensions-heres-what-broke-23kn</guid>
      <description>&lt;p&gt;Most teams are getting 10 to 30% of what their LLM model can actually do.Not because the model is weak. Because the prompt is.&lt;/p&gt;

&lt;p&gt;I’ve spent the last two weeks scoring prompts. Real ones, from real builders, across real verticals, against an 8-dimension quality rubric. This weekend I ran another 500 through the scorer to pressure-test the pattern. Every dataset converges on the same number: the average production prompt scores 13 to 16 out of 80. That’s 16 to 20% of what the rubric says a well-formed prompt looks like.&lt;/p&gt;

&lt;p&gt;You’re paying for a Ferrari and driving it in first gear to the mailbox.&lt;/p&gt;

&lt;h2&gt;What I Measured&lt;/h2&gt;

&lt;p&gt;Every prompt got scored on 8 dimensions. Each scored 1 to 10, totaling 80:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clarity. Is the task unambiguous?&lt;/li&gt;
&lt;li&gt;Specificity. Concrete targets, numbers, scope?&lt;/li&gt;
&lt;li&gt;Context. Background, assumptions, domain?&lt;/li&gt;
&lt;li&gt;Constraints. Limits, rules, edge cases?&lt;/li&gt;
&lt;li&gt;Output format. What shape should the response take?&lt;/li&gt;
&lt;li&gt;Role definition. “Act as a ___”?&lt;/li&gt;
&lt;li&gt;Examples. Few-shot or reference cases?&lt;/li&gt;
&lt;li&gt;Chain-of-thought structure. Reasoning scaffolding?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren’t arbitrary. They map to what the prompt engineering literature has known for five years. PEEM, RAGAS, G-Eval, MT-Bench, the Anthropic and OpenAI prompting guides. Everyone agrees these dimensions matter. Nobody’s checking whether their production prompts actually hit them.&lt;/p&gt;

&lt;h2&gt;The Data&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzcee793uwm0gt1mu8nr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzcee793uwm0gt1mu8nr.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;500 software engineering prompts, in the real-world format “Build X using Y.”&lt;/p&gt;

&lt;p&gt;Average score: 13.3 out of 80&lt;br&gt;
83% graded F. 17% graded D. Zero scored C or above.&lt;/p&gt;

&lt;p&gt;After rewriting against the rubric: average 68.5 out of 80. A B+.&lt;br&gt;
Average improvement: +55 points. 425% relative gain.&lt;/p&gt;
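&lt;p&gt;For readers who want the letter grades as code: a sketch with assumed cutoffs. The article pins only a few anchor points (13.3/80 grades F, 68.5/80 grades B+), so these percentage bands are illustrative guesses, not the scorer's actual rubric.&lt;/p&gt;

```python
# Letter-grade banding, sketched with ASSUMED percentage cutoffs chosen
# to be consistent with the anchor points in the text (13.3/80 -&gt; F,
# 68.5/80 -&gt; B+). The real grading bands may differ.
BANDS = [(90.0, "A"), (85.0, "B+"), (75.0, "B"), (65.0, "C"),
         (25.0, "D"), (0.0, "F")]

def grade(total: float, max_score: int = 80) -> str:
    """Map a 0-80 total to a letter grade via percentage bands."""
    pct = 100.0 * total / max_score
    for cutoff, letter in BANDS:
        if pct >= cutoff:
            return letter
    return "F"
```

&lt;p&gt;Under these assumed bands, the before-average of 13.3 lands deep in F territory and the after-average of 68.5 clears the B+ line.&lt;/p&gt;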

&lt;p&gt;For context, the organic dataset (248 prompts submitted by real users of the scoring tool, across 7 verticals) showed the same pattern: 89% graded D or F, with an average before-score of 15.8/80.&lt;/p&gt;

&lt;p&gt;Software prompts were slightly worse than average. Not better. Engineers aren’t exempt from this. If anything, the “it’s a technical task so it must be rigorous” assumption is the trap.&lt;/p&gt;

&lt;h2&gt;What’s Actually Missing&lt;/h2&gt;

&lt;p&gt;Here’s the dimension breakdown. Look at how specific the failure pattern is:&lt;/p&gt;

&lt;p&gt;Examples scored 1.01 out of 10. Across 500 prompts that developers wrote to build production software, essentially zero included a reference case, a shape to follow, or a “here’s what good looks like.”&lt;/p&gt;

&lt;p&gt;This is the dimension every prompt engineering guide tells you matters most. The gap between what engineers know they should do and what they actually write is near-total.&lt;/p&gt;

&lt;p&gt;Constraints at 1.09. Role definition at 1.18. Clarity, the only dimension averaging above 2, sits at 3.19.&lt;/p&gt;

&lt;p&gt;Engineers are writing English sentences with tech keywords. The structural scaffolding that turns a wish into a spec is almost entirely absent.&lt;/p&gt;

&lt;h2&gt;What This Looks Like In Practice&lt;/h2&gt;

&lt;p&gt;A representative prompt from the dataset:&lt;/p&gt;

&lt;p&gt;“Build a real-time collaborative text editor using React for the frontend.”&lt;/p&gt;

&lt;p&gt;Scores 14/80.&lt;/p&gt;

&lt;p&gt;It sounds specific. It names a technology. It has a verb. But the model receiving it has to guess. Collaboration for how many users? What’s the sync strategy? Operational transform or CRDT? What’s the latency budget? What does “done” look like? No examples. No output format. No constraints.&lt;/p&gt;

&lt;p&gt;The rewritten version, same task, generated by the scorer to address the rubric gaps, scored 69/80. It defined the target user. It specified real-time sync requirements. It listed technical constraints including concurrent editors, conflict resolution strategy, and latency targets. It specified the response format. It included an example implementation signature.&lt;/p&gt;

&lt;p&gt;Same end goal. Different input. 5x the quality score. The before version wastes tokens on clarification and produces generic output. The after version reads more like a spec than a prompt.&lt;/p&gt;

&lt;h2&gt;Why This Matters Now&lt;/h2&gt;

&lt;p&gt;The easy dismissal is “people should write better prompts.” That misses the systemic problem.&lt;/p&gt;

&lt;p&gt;The industry has spent years focused on output evals. Every eval platform measures what the model produced. Almost nobody measures what it was given (inputs).&lt;/p&gt;

&lt;p&gt;That worked when prompts were single-shot, human-written, and reviewed before shipping. It stops working the moment prompts become infrastructure.&lt;/p&gt;

&lt;p&gt;In agentic workflows where one LLM call feeds the next, a 13/80 input becomes the input to the next call, which is already compromised before you add retrieval or structured tool calls. In the last week alone, three x402-native agent systems went live that share the same natural-language input surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daydreams Taskmarket. Agents bidding on work described in plain text.&lt;/li&gt;
&lt;li&gt;PeptAI. Autonomous peptide discovery running wet-lab orders.&lt;/li&gt;
&lt;li&gt;AlliGo. A credit bureau scoring agent behavior across endpoints.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the descriptions populating those systems score like the 500 in this dataset, the agent economy is routing compute and payments on structurally empty inputs.&lt;/p&gt;

&lt;p&gt;The output eval loop can’t catch this. By the time the output looks wrong, the compute bill is already on your card.&lt;/p&gt;

&lt;h2&gt;An Infrastructure Problem&lt;/h2&gt;

&lt;p&gt;Telling engineers to write better prompts is telling them to write better SQL without giving them a linter. Telling teams to review prompts manually is telling them to do code review without git blame.&lt;/p&gt;

&lt;p&gt;The answer is the same answer every other quality-assurance problem eventually reached. Measure it. Instrument it. Put the measurement in the continuous integration pipeline. Block the bad ones before they ship.&lt;/p&gt;
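&lt;p&gt;The CI step can be sketched in a few lines. The scorer below is a stand-in heuristic so the script runs on its own; the file layout, threshold, and heuristic are all assumptions, and in practice you would wire in whatever scoring service you actually use.&lt;/p&gt;

```python
# "Put the measurement in CI" as a minimal check script: score every
# prompt file, fail the job if any falls below the floor. The scorer is
# a PLACEHOLDER heuristic, not a real scoring API; the prompts/ layout
# and threshold are assumptions for illustration.
import json
import sys
from pathlib import Path

THRESHOLD = 40  # out of 80; pick a floor your team can actually meet

def score_prompt(text: str) -> int:
    # Stand-in heuristic: reward structure cues and non-trivial length.
    cues = ("you are", "format", "example", "constraint", "step")
    hits = sum(1 for c in cues if c in text.lower())
    return min(80, 10 * hits + min(30, len(text) // 20))

def main(prompt_dir: str) -> int:
    failures = []
    for path in sorted(Path(prompt_dir).glob("*.txt")):
        s = score_prompt(path.read_text())
        if s in range(THRESHOLD):  # i.e. score below the floor
            failures.append({"file": path.name, "score": s})
    if failures:
        print(json.dumps(failures, indent=2))
        return 1  # nonzero exit fails the CI job
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "prompts"))
```

&lt;p&gt;The design point is the exit code: a low-scoring prompt blocks the merge the same way a failing test does, instead of surfacing later as a bad agent run.&lt;/p&gt;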

&lt;h2&gt;The Solution&lt;/h2&gt;

&lt;p&gt;It’s already here at &lt;a href="https://pqs.onchainintel.net" rel="noopener noreferrer"&gt;pqs.onchainintel.net&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Free tier:&lt;/strong&gt; paste a prompt, get the 8-dimension breakdown plus a suggested rewrite. No signup required.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Paid tiers:&lt;/strong&gt; start at $19.99/mo for unlimited private-repo CLI usage; $99.99/mo for teams with GitHub PR checks, Slack alerts, and shared dashboards.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;x402 tier:&lt;/strong&gt; the paid API charges $0.025 to $0.125 per scoring call in USDC on Base and Solana.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://pqs.onchainintel.net/pricing" rel="noopener noreferrer"&gt;Subscription available&lt;/a&gt; for all tiers via Stripe.&lt;/p&gt;

&lt;p&gt;You don’t have to adopt it. But you should at minimum run 10 of your production prompts through the free tier this week and look at the numbers.&lt;/p&gt;

&lt;p&gt;If you’re shipping anything that takes prompts from humans or other agents, the input layer is measurable. Start measuring. You’re leaving most of the model’s capability on the table, and you don’t have to.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you want to score your own prompts, PQS is live at &lt;a href="https://pqs.onchainintel.net?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=500-prompts" rel="noopener noreferrer"&gt;pqs.onchainintel.net&lt;/a&gt; — free tier available, paid tiers for full 8-dimension scoring and batch runs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt; on npm: &lt;code&gt;npm install pqs-mcp-server&lt;/code&gt; — drop-in for Claude and other MCP-compatible agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Action&lt;/strong&gt;: &lt;a href="https://github.com/marketplace/actions/pqs-check" rel="noopener noreferrer"&gt;PQS Check on the Marketplace&lt;/a&gt; — score prompts in CI before they ship&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Direct x402 micropayments on Base or Bearer API key (subscription tiers)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data from this teardown: 500 software prompts, average score before optimization 13.27/80, average after 68.47/80, 425% improvement. 416 graded F, 84 graded D, 0 at C+ or above.&lt;/p&gt;

&lt;p&gt;If you're in DevRel, DevAdvocate, or DevEx working on AI pipelines — this is the input-quality data your builders need to see. Feel free to forward.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What I'd love feedback on: which of the 8 dimensions surprised you most as the common failure mode? My hypothesis was clarity, but the data says &lt;code&gt;examples&lt;/code&gt; at 1.01 average and &lt;code&gt;constraints&lt;/code&gt; at 1.09 — the structural stuff almost nobody includes.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>showdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Magic Prompt Formula is good. It's still missing a layer.</title>
      <dc:creator>OnChainAIIntel</dc:creator>
      <pubDate>Thu, 16 Apr 2026 05:52:56 +0000</pubDate>
      <link>https://dev.to/onchainaiintel/the-magic-prompt-formula-is-good-its-still-missing-a-layer-1oj4</link>
      <guid>https://dev.to/onchainaiintel/the-magic-prompt-formula-is-good-its-still-missing-a-layer-1oj4</guid>
      <description>&lt;p&gt;Everyone in AI circles eventually discovers the Magic Prompt Formula. If you haven't, here it is. It's the most widely shared structured approach to prompting, and it genuinely works.&lt;/p&gt;

&lt;p&gt;The formula has five parts:&lt;/p&gt;

&lt;p&gt;Role (Who): Assign a specific expert persona: "You are a senior content strategist" or "Act as a seasoned Python developer"&lt;/p&gt;

&lt;p&gt;Action (What): Use a clear verb defining what the AI should do: "Draft," "Refactor," "Analyze," "Summarize"&lt;/p&gt;

&lt;p&gt;Context (Why): Provide relevant background so the model doesn't give generic answers: your audience, your product, your use case&lt;/p&gt;

&lt;p&gt;Examples (How): Give one or two samples of the expected output style — this is few-shot prompting in practice&lt;/p&gt;

&lt;p&gt;Constraints &amp;amp; Format (Boundaries): Set limits and define structure: "Under 200 words," "No jargon," "Single block of copy," "Use bullet points"&lt;/p&gt;

&lt;p&gt;Instead of typing "write me a LinkedIn post," you write:&lt;/p&gt;

&lt;p&gt;You are a senior content strategist specializing in B2B SaaS. Write a LinkedIn post announcing [product]. My audience is AI developers and technical founders. Here's an example of the tone I want: [example]. Keep it under 200 words, no buzzword fluff, single block of copy.&lt;/p&gt;

&lt;p&gt;That's Role, Action, Context, Examples, and Constraints all in one prompt. You've gone from a generic instruction to a specialized brief. The output quality difference is real and immediate.&lt;/p&gt;
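&lt;p&gt;The five ingredients compose mechanically, which means you can template them. A sketch (&lt;code&gt;buildMagicPrompt&lt;/code&gt; is a hypothetical helper, not part of the formula or of PQS):&lt;/p&gt;

```javascript
// Assemble a Magic Formula prompt from its five parts.
// buildMagicPrompt is a hypothetical helper, not an official API.
function buildMagicPrompt({ role, action, context, examples, constraints }) {
  return [
    `You are ${role}.`,
    `${action}.`,
    `Context: ${context}.`,
    examples ? `Here's an example of the tone I want: ${examples}` : null,
    constraints ? `Constraints: ${constraints}.` : null,
  ]
    .filter(Boolean) // drop the parts you didn't supply
    .join(" ");
}

const prompt = buildMagicPrompt({
  role: "a senior content strategist specializing in B2B SaaS",
  action: "Write a LinkedIn post announcing [product]",
  context: "my audience is AI developers and technical founders",
  constraints: "under 200 words, no buzzword fluff, single block of copy",
});
console.log(prompt);
```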

&lt;p&gt;So what's the problem?&lt;/p&gt;

&lt;p&gt;The Magic Prompt Formula covers about half of what actually determines prompt quality.&lt;/p&gt;

&lt;p&gt;I know this because I built a tool that measures it.&lt;/p&gt;

&lt;p&gt;PQS (Prompt Quality Score) scores prompts across 8 dimensions before you send them to a model. The five ingredients of the Magic Formula map cleanly onto five of those dimensions: clarity, specificity, context, examples, and constraints. A well-structured Magic Formula prompt typically scores in the 47–52 range out of 80. Solid. Grade A territory, even.&lt;/p&gt;

&lt;p&gt;But there's a dimension the formula doesn't touch at all: chain-of-thought structure.&lt;/p&gt;

&lt;p&gt;CoT structure measures whether your prompt scaffolds the model's reasoning: numbered steps, analysis frameworks, structured output sequences, decision trees. It's the difference between asking an expert to answer a question and asking them to walk you through how they'd think about it.&lt;/p&gt;

&lt;p&gt;Magic Formula prompts score 3 or 4 out of 10 on CoT structure. Consistently. Across every vertical we've tested.&lt;/p&gt;

&lt;p&gt;I ran three Magic Formula prompts through PQS this week. One content prompt, one software prompt, one crypto analysis prompt.&lt;/p&gt;

&lt;p&gt;All three were well-constructed. All three scored Grade A before optimization.&lt;/p&gt;

&lt;p&gt;After PQS optimization, all three jumped by 63–71%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrfeig4v2pkc2jaiw8nb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrfeig4v2pkc2jaiw8nb.png" alt=" " width="800" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CoT dimension went from an average of 3.7 to 9.0 across all three. Every other dimension moved 1–4 points. CoT moved 5–6.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch1s7es71zykcui3iris.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch1s7es71zykcui3iris.png" alt=" " width="800" height="146"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's what that looks like in practice. The content prompt went from:&lt;/p&gt;

&lt;p&gt;"Keep it under 200 words, no buzzword fluff, end with a clear CTA, format as a single block of copy"&lt;/p&gt;

&lt;p&gt;To: "Structure your post: 1. Hook: open with a relatable developer pain point. 2. Problem: briefly explain the cost of discovering prompt issues too late. 3. Solution: introduce PQS as the pre-inference fix. 4. Benefit: one concrete outcome. 5. CTA: direct readers to pqs.onchainintel.net"&lt;/p&gt;

&lt;p&gt;Same constraints. Completely different reasoning scaffolding. The model doesn't just know what to write, it knows how to think through the writing.&lt;/p&gt;

&lt;p&gt;This is the AI 'input quality' problem in concrete form.&lt;/p&gt;

&lt;p&gt;The Magic Formula solves the WHO and the WHAT. It tells the model who it is and what you want. What it doesn't solve is the HOW: the reasoning path the model should follow to get there.&lt;/p&gt;

&lt;p&gt;Most people find out their prompt was weak after the output disappoints them. By then you've already burned tokens, lost time, and often shipped something mediocre. PQS scores the prompt before inference. It catches the CoT gap, and every other gap, before you run it.&lt;/p&gt;

&lt;p&gt;If you're already using the Magic Formula, you're ahead of most people! Prompt Quality Score shows you exactly how much further ahead you could be.&lt;/p&gt;

&lt;p&gt;Score your next prompt before you send it → &lt;a href="https://pqs.onchainintel.net" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;About OnChainIntel — AI-powered on-chain wallet analysis. We decode the behavioral patterns, hidden biases, and implicit bets behind any wallet's transaction history. Try it free at onchainintel.net · Follow us on X: &lt;a class="mentioned-user" href="https://dev.to/onchainaiintel"&gt;@onchainaiintel&lt;/a&gt; · TikTok: &lt;a class="mentioned-user" href="https://dev.to/onchainintel"&gt;@onchainintel&lt;/a&gt; · YouTube: &lt;a class="mentioned-user" href="https://dev.to/onchainaiintel"&gt;@onchainaiintel&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Prompt Quality Score (PQS) Now Supports x402 Payments on Solana</title>
      <dc:creator>OnChainAIIntel</dc:creator>
      <pubDate>Mon, 13 Apr 2026 10:26:45 +0000</pubDate>
      <link>https://dev.to/onchainaiintel/prompt-quality-score-pqs-now-supports-x402-payments-on-solana-40nd</link>
      <guid>https://dev.to/onchainaiintel/prompt-quality-score-pqs-now-supports-x402-payments-on-solana-40nd</guid>
      <description>&lt;p&gt;Been heads down building. Quick update for anyone working in the agentic payments space.&lt;/p&gt;

&lt;p&gt;PQS (Prompt Quality Score), the pre-flight quality gate for AI agent workflows, now accepts x402 payments on both Base mainnet and Solana, in addition to the existing free tier and API key access.&lt;/p&gt;

&lt;h1&gt;
  
  
  What PQS does
&lt;/h1&gt;

&lt;p&gt;Before your agent sends a prompt to an expensive LLM endpoint, PQS scores it across 8 dimensions: clarity, specificity, context, constraints, output format, role definition, examples, and chain-of-thought structure. It returns a score (0–80), a grade (A–F), a percentile ranking, and the top 3 fixes.&lt;/p&gt;

&lt;p&gt;Catch weak prompts before they cost you tokens or USDC.&lt;/p&gt;
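&lt;p&gt;One way an agent might act on that score before the expensive call. The 60% threshold and the send/revise actions are illustrative, not PQS defaults:&lt;/p&gt;

```javascript
// Gate an LLM call on the PQS score before spending tokens.
// The 60% threshold and the send/revise actions are illustrative, not PQS defaults.
function gateDecision(score, outOf, threshold = 0.6) {
  return score / outOf >= threshold ? "send" : "revise";
}

console.log(gateDecision(21, 80)); // "revise": below threshold, rework the prompt first
console.log(gateDecision(68, 80)); // "send": good enough to fire the inference
```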

&lt;h1&gt;
  
  
  Endpoint reference
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Free tier&lt;/strong&gt; — no auth required&lt;br&gt;
POST &lt;a href="https://pqs.onchainintel.net/api/score/free" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net/api/score/free&lt;/a&gt;&lt;br&gt;
{ "prompt": "your prompt here", "vertical": "software" }&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Score&lt;/strong&gt; + dimensional breakdown — $0.025 USDC via x402&lt;br&gt;
POST &lt;a href="https://pqs.onchainintel.net/api/score" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net/api/score&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Full analysis&lt;/strong&gt; + optimized rewrite — $0.125 USDC via x402&lt;br&gt;
POST &lt;a href="https://pqs.onchainintel.net/api/score/full" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net/api/score/full&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Batch&lt;/strong&gt; up to 5 prompts — $0.25 USDC via x402&lt;br&gt;
POST &lt;a href="https://pqs.onchainintel.net/api/score/batch" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net/api/score/batch&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A/B&lt;/strong&gt; compare two prompts — $1.25 USDC via x402&lt;br&gt;
POST &lt;a href="https://pqs.onchainintel.net/api/score/compare" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net/api/score/compare&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  x402 payment details
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Base mainnet:&lt;/strong&gt; CDP facilitator, USDC ERC-20, EIP-3009 TransferWithAuthorization&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solana:&lt;/strong&gt; x402-compatible, USDC SPL token&lt;br&gt;
No account required for x402 path — agent pays per call, no subscription&lt;/p&gt;

&lt;p&gt;An API key path is also available for subscription-based access.&lt;/p&gt;
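&lt;p&gt;For readers new to x402, the flow behind the paid endpoints looks roughly like this. It is a sketch of the 402 handshake only: in practice an x402 client library signs the payment with your wallet, and &lt;code&gt;buildPayment&lt;/code&gt; below is a stub:&lt;/p&gt;

```javascript
// Sketch of the x402 request flow. The X-PAYMENT header and the 402 status
// come from the x402 protocol; buildPayment is a stub standing in for a real
// client library that signs a USDC transfer with your wallet.
function buildPayment(requirements) {
  // Stub only: a real x402 client produces a signed, base64-encoded payment
  // payload matching the server's stated scheme, network, and amount.
  return Buffer.from(JSON.stringify({ stub: true, for: requirements })).toString("base64");
}

async function callPaid(url, body) {
  const opts = {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
  const first = await fetch(url, opts);
  if (first.status !== 402) return first.json(); // free tier or already paid
  const requirements = await first.json(); // server states price and accepted networks
  const retry = await fetch(url, {
    ...opts,
    headers: { ...opts.headers, "X-PAYMENT": buildPayment(requirements) },
  });
  return retry.json();
}
```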

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt;: &lt;code&gt;npx pqs-mcp-server&lt;/code&gt; — listed on Smithery (88/100), the Official MCP Registry, Glama, and mcp.so. Three tools: score_prompt (free), optimize_prompt ($0.025), compare_models ($1.25).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python SDK&lt;/strong&gt;: &lt;code&gt;pip install pqs-sdk&lt;/code&gt; — async support via AsyncPQSClient, with full x402 and API key auth.&lt;/li&gt;
&lt;li&gt;OpenAPI spec: &lt;a href="https://pqs.onchainintel.net/openapi.json" rel="noopener noreferrer"&gt;https://pqs.onchainintel.net/openapi.json&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building this as infrastructure for the agent economy: score the input before the inference fires. Cheaper than one bad prompt.&lt;/p&gt;

&lt;p&gt;@kenburbary | pqs.onchainintel.net&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>promptengineering</category>
      <category>web3</category>
    </item>
    <item>
      <title>I built an npm middleware that scores your LLM prompts before they hit your agent workflow</title>
      <dc:creator>OnChainAIIntel</dc:creator>
      <pubDate>Sat, 04 Apr 2026 12:14:05 +0000</pubDate>
      <link>https://dev.to/onchainaiintel/i-built-an-npm-middleware-that-scores-your-llm-prompts-before-they-hit-your-agent-workflow-53ci</link>
      <guid>https://dev.to/onchainaiintel/i-built-an-npm-middleware-that-scores-your-llm-prompts-before-they-hit-your-agent-workflow-53ci</guid>
      <description>&lt;p&gt;The problem with most LLM agent workflows is that nobody is checking the quality of the prompts going in.&lt;/p&gt;

&lt;p&gt;Garbage in, garbage out. At scale, with agents firing hundreds of prompts per day, the garbage compounds fast.&lt;/p&gt;

&lt;p&gt;I built &lt;code&gt;x402-pqs&lt;/code&gt; to fix this. It's an Express middleware that intercepts prompts before they hit any LLM endpoint, scores them for quality, and adds the score to the request headers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;x402-pqs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;pqsMiddleware&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x402-pqs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;pqsMiddleware&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// warn if prompt scores below 10/40&lt;/span&gt;
  &lt;span class="na"&gt;vertical&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;crypto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// scoring context&lt;/span&gt;
  &lt;span class="na"&gt;onLowScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;warn&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// warn | block | ignore&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Prompt score:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;grade&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every request gets these headers added automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;X-PQS-Score&lt;/code&gt; —&amp;gt; numeric score (0-40)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-PQS-Grade&lt;/code&gt; —&amp;gt; letter grade (A-F)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X-PQS-Out-Of&lt;/code&gt; —&amp;gt; maximum score (40)&lt;/li&gt;
&lt;/ul&gt;
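&lt;p&gt;A downstream handler can act on those headers without knowing anything about PQS. A sketch: the parse helper and the C-or-better policy below are my own, not middleware behavior:&lt;/p&gt;

```javascript
// Act on the PQS headers set by the middleware upstream.
// pqsFromHeaders and the C-or-better rule are illustrative, not part of x402-pqs.
function pqsFromHeaders(headers) {
  return {
    score: Number(headers["x-pqs-score"]),
    outOf: Number(headers["x-pqs-out-of"]),
    grade: headers["x-pqs-grade"],
  };
}

function acceptable({ grade }) {
  // Illustrative policy: only forward prompts graded C or better.
  return ["A", "B", "C"].includes(grade);
}

const sample = pqsFromHeaders({ "x-pqs-score": "31", "x-pqs-grade": "B", "x-pqs-out-of": "40" });
console.log(sample.score, acceptable(sample)); // 31 true
```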

&lt;h2&gt;
  
  
  How the scoring works
&lt;/h2&gt;

&lt;p&gt;PQS scores prompts across 8 dimensions using 5 cited academic frameworks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt-side (4 dimensions):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specificity —&amp;gt; does the prompt define what it wants precisely?&lt;/li&gt;
&lt;li&gt;Context —&amp;gt; does it give the model enough to work with?&lt;/li&gt;
&lt;li&gt;Clarity —&amp;gt; are the directives unambiguous?&lt;/li&gt;
&lt;li&gt;Predictability —&amp;gt; would different runs produce consistent results?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Output-side (4 dimensions):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Completeness, Relevancy, Reasoning depth, Faithfulness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source frameworks: PEEM (Dongguk University, 2026) · RAGAS · MT-Bench · G-Eval · ROUGE&lt;/p&gt;
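&lt;p&gt;To make the prompt-side dimensions concrete, here is a toy heuristic in that spirit. This is emphatically not the PQS implementation, just an illustration of how a prompt-side signal can be computed:&lt;/p&gt;

```javascript
// Toy prompt-side signals (illustrative only; PQS uses its own rubric).
function toySignals(prompt) {
  const words = prompt.trim().split(/\s+/);
  return {
    // Specificity: longer, number-bearing prompts tend to be more precise.
    specificity: Math.min(10, Math.floor(words.length / 10) + (/\d/.test(prompt) ? 2 : 0)),
    // Clarity: penalize vague directives.
    clarity: /\b(something|stuff|etc)\b/i.test(prompt) ? 3 : 7,
  };
}

console.log(toySignals("who are the smartest wallets on solana right now"));
```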

&lt;h2&gt;
  
  
  Real example
&lt;/h2&gt;

&lt;p&gt;This prompt: &lt;code&gt;"who are the smartest wallets on solana right now"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Scored &lt;strong&gt;9/40&lt;/strong&gt; —&amp;gt; Grade D.&lt;/p&gt;

&lt;p&gt;The optimized version scored &lt;strong&gt;35/40&lt;/strong&gt; —&amp;gt; Grade A. &lt;/p&gt;

&lt;p&gt;A +289% improvement.&lt;/p&gt;

&lt;p&gt;Same model. Same API. Completely different output quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The payment layer
&lt;/h2&gt;

&lt;p&gt;The scoring API uses &lt;a href="https://x402.org" rel="noopener noreferrer"&gt;x402&lt;/a&gt;, an HTTP-native micropayment protocol now governed by the Linux Foundation, with Coinbase, Cloudflare, AWS, Stripe, Google, Microsoft, Visa, and Mastercard as founding members.&lt;/p&gt;

&lt;p&gt;Agents can call and pay for scoring autonomously — no API keys, no subscriptions. Just a wallet and $0.001 USDC per score.&lt;/p&gt;

&lt;p&gt;There's also a &lt;strong&gt;free tier&lt;/strong&gt; with no payment required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://pqs.onchainintel.net/api/score/free &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "your prompt here", "vertical": "general"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"out_of"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"grade"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"D"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"upgrade"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Get full dimension breakdown at /api/score for $0.001 USDC"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The data angle
&lt;/h2&gt;

&lt;p&gt;Every scored prompt pair goes into a corpus. At scale this becomes training data for a domain-specific prompt quality model. The thesis is similar to what Andrej Karpathy described recently about LLM knowledge bases: the data compounds in value over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;npm: &lt;a href="https://npmjs.com/package/x402-pqs" rel="noopener noreferrer"&gt;x402-pqs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/OnChainAIIntel/x402-pqs" rel="noopener noreferrer"&gt;OnChainAIIntel/x402-pqs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;API: &lt;a href="https://pqs.onchainintel.net" rel="noopener noreferrer"&gt;pqs.onchainintel.net&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Free endpoint: &lt;code&gt;POST https://pqs.onchainintel.net/api/score/free&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Would love feedback from anyone building agent workflows. What scoring dimensions would you add?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>node</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
