<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ken</title>
    <description>The latest articles on DEV Community by Ken (@kenerator).</description>
    <link>https://dev.to/kenerator</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3979996%2F56e8dc59-5eb2-43ca-858d-4b0b69c6ed63.jpeg</url>
      <title>DEV Community: Ken</title>
      <link>https://dev.to/kenerator</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kenerator"/>
    <language>en</language>
    <item>
      <title>A Fluent LLM Answer Is Not the Same as an Inspected Answer</title>
      <dc:creator>Ken</dc:creator>
      <pubDate>Thu, 11 Jun 2026 19:12:25 +0000</pubDate>
      <link>https://dev.to/kenerator/a-fluent-llm-answer-is-not-the-same-as-an-inspected-answer-o98</link>
      <guid>https://dev.to/kenerator/a-fluent-llm-answer-is-not-the-same-as-an-inspected-answer-o98</guid>
      <description>&lt;p&gt;Last time I hit a guardrail, it did not offer to repair my car.&lt;/p&gt;

&lt;p&gt;This one will not repair the car either. But it can help repair an answer that&lt;br&gt;
forgot where the car is.&lt;/p&gt;

&lt;p&gt;Here is the small version of the problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I need to get my car washed and the carwash is only 50 meters away. Should I&lt;br&gt;
drive there or just walk?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An LLM can answer that walking is better. The distance is short. Walking saves&lt;br&gt;
fuel. Walking is simple.&lt;/p&gt;

&lt;p&gt;That sounds reasonable until you ask what actually moved.&lt;/p&gt;

&lt;p&gt;Walking moves the person to the car wash. It does not move the car.&lt;/p&gt;

&lt;p&gt;That is not a grammar problem or a tone problem. The answer violates a&lt;br&gt;
precondition: the car must be at the wash before the car can be washed.&lt;/p&gt;

&lt;p&gt;Prompting can sometimes fix this one case. So can switching models. But the&lt;br&gt;
same class of failure can still show up across local models, hosted commercial&lt;br&gt;
models, coding assistants, and agent frameworks.&lt;/p&gt;

&lt;p&gt;A more useful pattern than "write a better prompt and hope" is hybrid reasoning:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM draft
  -&amp;gt; structured facts
  -&amp;gt; selected inspection
  -&amp;gt; evidence-backed repair packet
  -&amp;gt; revised answer
  -&amp;gt; fact extraction again
  -&amp;gt; selected inspection again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is the last line.&lt;/p&gt;

&lt;p&gt;The repair is not the finish line. The repaired answer still has to pass&lt;br&gt;
inspection.&lt;/p&gt;
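
&lt;p&gt;A minimal Python sketch of that loop, under stated assumptions:&lt;br&gt;
&lt;code&gt;extract_facts&lt;/code&gt;, &lt;code&gt;run_inspections&lt;/code&gt;, &lt;code&gt;repair&lt;/code&gt;, and &lt;code&gt;run_loop&lt;/code&gt; are hypothetical names, not the nxusKit API,&lt;br&gt;
and the keyword checks stand in for real LLM calls and a real rule engine. It&lt;br&gt;
only illustrates the control flow.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Finding:
    rule_id: str
    status: str  # "pass" or "fail"
    message: str


def extract_facts(answer):
    # Hypothetical stand-in for LLM fact extraction: a crude keyword check.
    text = answer.lower()
    return {
        "moved_object": "car" if "drive" in text else "person",
        "required_object": "car",
    }


def run_inspections(facts):
    # Hypothetical stand-in for a selected inspection such as a CLIPS rule.
    if facts["moved_object"] != facts["required_object"]:
        return [Finding("car-required-at-wash", "fail",
                        "Walking moves the person, but the car stays put.")]
    return [Finding("car-required-at-wash", "pass", "The car reaches the wash.")]


def repair(answer, failed_findings):
    # Hypothetical stand-in for the LLM revising from the repair packet.
    return "Drive the car to the wash; it has to be there to be washed."


def run_loop(draft, max_attempts=3):
    answer = draft
    history = []
    for attempt in range(1, max_attempts + 1):
        facts = extract_facts(answer)
        findings = run_inspections(facts)
        failed = [f for f in findings if f.status == "fail"]
        history.append((attempt, answer, failed))
        if not failed:
            # Only an answer that has just passed inspection is returned.
            return answer, history
        answer = repair(answer, failed)
    # The last repair was never inspected, so it is not returned.
    raise RuntimeError("no answer passed inspection")


answer, history = run_loop("Just walk, it is only 50 meters.")
print(len(history), answer)
```

&lt;p&gt;In this sketch the walking draft fails on the first attempt and the repaired&lt;br&gt;
answer passes on the second. If no attempt passes, the loop raises instead of&lt;br&gt;
returning an uninspected repair.&lt;/p&gt;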
&lt;h2&gt;Why This Is Hybrid Reasoning&lt;/h2&gt;

&lt;p&gt;"Guardrails" has become a popular word for LLM safety and reliability, but the&lt;br&gt;
term can hide very different mechanisms. A keyword filter, a schema validator,&lt;br&gt;
a formal solver, a decision table, and a Bayesian network are not the same&lt;br&gt;
tool.&lt;/p&gt;

&lt;p&gt;The pattern here is more specific:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;language model
  -&amp;gt; structured representation
  -&amp;gt; selected reasoning mechanism
  -&amp;gt; feedback
  -&amp;gt; revised language
  -&amp;gt; selected reasoning mechanism again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM drafts, extracts, and repairs. The non-LLM components do the parts they&lt;br&gt;
are better suited for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CLIPS inspects explicit rules.&lt;/li&gt;
&lt;li&gt;Solver/Z3 inspects feasibility and constraints.&lt;/li&gt;
&lt;li&gt;ZEN inspects decision tables and policy admissibility.&lt;/li&gt;
&lt;li&gt;Bayesian networks (BN) update review-risk posteriors under uncertainty.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key design choice is selection. Do not force every mechanism into every&lt;br&gt;
problem.&lt;/p&gt;
&lt;h2&gt;Four Small Scenarios&lt;/h2&gt;

&lt;p&gt;The public &lt;code&gt;common-sense-guardrails&lt;/code&gt; example uses four scenarios:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;What can go wrong&lt;/th&gt;
&lt;th&gt;Inspection that fits&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;car-wash&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The answer moves the person, not the car.&lt;/td&gt;
&lt;td&gt;CLIPS for object presence; Solver/Z3 for feasibility evidence.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;coupon-stack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The answer stacks discounts that policy or margin rules do not allow.&lt;/td&gt;
&lt;td&gt;CLIPS and ZEN for policy; BN for review risk.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pallet-door&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The answer suggests pushing a wide pallet through a narrower door.&lt;/td&gt;
&lt;td&gt;CLIPS for the rule surface; Solver/Z3 for dimensional feasibility.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cold-chain&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The answer ignores certified refrigerated handling and traceability.&lt;/td&gt;
&lt;td&gt;CLIPS and ZEN for policy; BN for incomplete compliance evidence.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
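
&lt;p&gt;One way to picture selection is a plain lookup from scenario to inspections.&lt;br&gt;
The pairings below come from the table above; the dictionary layout and the&lt;br&gt;
&lt;code&gt;select_inspections&lt;/code&gt; name are assumptions for illustration, not how the example&lt;br&gt;
necessarily implements it.&lt;/p&gt;

```python
# Scenario names and pairings follow the table above.
# The structure and function name are hypothetical.
INSPECTIONS_BY_SCENARIO = {
    "car-wash": ("clips", "solver"),
    "coupon-stack": ("clips", "zen", "bn"),
    "pallet-door": ("clips", "solver"),
    "cold-chain": ("clips", "zen", "bn"),
}


def select_inspections(scenario):
    # Fail loudly for an unknown scenario instead of running every mechanism.
    try:
        return INSPECTIONS_BY_SCENARIO[scenario]
    except KeyError:
        raise ValueError(f"no inspections registered for {scenario!r}") from None


print(select_inspections("coupon-stack"))
```

&lt;p&gt;Keeping the mapping as data means a new scenario adds an entry rather than a&lt;br&gt;
new branch in the loop.&lt;/p&gt;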

&lt;p&gt;The pallet-door case has the same practical absurdity as the car-wash case.&lt;br&gt;
"Just push the wide pallet through the narrow door" is not a logistics plan.&lt;br&gt;
It is a sentence that avoided doing geometry.&lt;/p&gt;
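
&lt;p&gt;The geometry itself is small. A rough sketch, assuming only widths matter and&lt;br&gt;
the pallet may be turned: plain arithmetic stands in here for the Solver/Z3&lt;br&gt;
check, and &lt;code&gt;pallet_fits_door&lt;/code&gt;, the clearance, and the dimensions are all&lt;br&gt;
hypothetical.&lt;/p&gt;

```python
def pallet_fits_door(pallet_width_cm, pallet_depth_cm, door_width_cm, clearance_cm=2.0):
    # A pallet can be turned, so its narrower side is what must clear the door.
    narrow_side_cm = min(pallet_width_cm, pallet_depth_cm)
    return door_width_cm >= narrow_side_cm + clearance_cm


# Hypothetical dimensions, not taken from the example repository.
print(pallet_fits_door(120.0, 100.0, 90.0))  # False: narrow side 100 cm, door 90 cm
print(pallet_fits_door(120.0, 80.0, 90.0))   # True: narrow side 80 cm plus 2 cm clearance
```

&lt;p&gt;Height, turning radius, and load overhang are ignored in this sketch; a real&lt;br&gt;
feasibility check would add them as further constraints.&lt;/p&gt;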

&lt;p&gt;The ultimate comic version would combine all four:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Someone needs their car washed, wants to use multiple coupons, and has an&lt;br&gt;
extra-wide pallet of fresh-frozen fish strapped to the roof of their car.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That would exercise object presence, coupon policy, dimensional feasibility,&lt;br&gt;
and cold-chain handling in one memorable errand.&lt;/p&gt;

&lt;p&gt;It is ridiculous. It is also a good reminder that production guardrails often&lt;br&gt;
belong to different owners. Marketing or finance may own coupon policy.&lt;br&gt;
Logistics may own pallet feasibility. QA or safety may own cold-chain handling.&lt;br&gt;
A platform team may own the repair loop.&lt;/p&gt;

&lt;p&gt;Those groups should not all be forced to edit one monolithic prompt every time&lt;br&gt;
one policy or constraint changes.&lt;/p&gt;
&lt;h2&gt;What The Guardrail Looks Like&lt;/h2&gt;

&lt;p&gt;For the car-wash case, the native CLIPS rule is direct:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; Fails when the car is required at the wash but the plan only moves the person.
(defrule car-required-at-wash
  ; The car is required at the car wash and is not there yet.
  (required-object
    (object car)
    (required-location car_wash)
    (current-location ?where)
    (present-at-required-location false))
  ; The proposed action moves the person, not the car.
  (moved-object
    (action-id ?action)
    (object person)
    (to car_wash))
  =&amp;gt;
  ; Record a failing finding.
  (assert
    (guardrail-finding
      (status fail)
      (rule-id car-required-at-wash)
      (severity error)
      (message "Walking moves the person to the wash, but the car remains at home."))))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the coupon and cold-chain scenarios, Bayesian network (BN) scoring adds a&lt;br&gt;
different kind of inspection. It does not prove a contradiction; it makes review&lt;br&gt;
risk explicit enough to route, repair, or escalate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;coupon-stack / --guardrails auto
  selected: clips, zen, bn
  BN attempt 1: needs_review = 0.95064 -&amp;gt; fail
  BN attempt 2: needs_review = 0.222 -&amp;gt; pass

cold-chain / --guardrails auto
  selected: clips, zen, bn
  BN attempt 1: needs_review = 0.921 -&amp;gt; fail
  BN attempt 2: needs_review = 0.1247 -&amp;gt; pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
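
&lt;p&gt;The pass/fail step on a posterior can be sketched as a threshold check. The&lt;br&gt;
scores below are the ones from the output above; the 0.5 cutoff and the&lt;br&gt;
&lt;code&gt;bn_verdict&lt;/code&gt; name are assumptions, since the output does not state the&lt;br&gt;
threshold the example uses.&lt;/p&gt;

```python
REVIEW_THRESHOLD = 0.5  # hypothetical cutoff; the source output does not state one


def bn_verdict(needs_review, threshold=REVIEW_THRESHOLD):
    # A posterior outside [0, 1] means something upstream is broken.
    if needs_review > 1.0 or 0.0 > needs_review:
        raise ValueError("needs_review must be a probability")
    return "fail" if needs_review >= threshold else "pass"


# Posteriors copied from the output above.
attempts = {
    "coupon-stack": [0.95064, 0.222],
    "cold-chain": [0.921, 0.1247],
}

for scenario, scores in attempts.items():
    for number, score in enumerate(scores, start=1):
        print(f"{scenario} attempt {number}: {score} -> {bn_verdict(score)}")
```

&lt;p&gt;Any cutoff between the highest passing score and the lowest failing score&lt;br&gt;
would reproduce the verdicts shown, so treat 0.5 as a placeholder for whatever&lt;br&gt;
the policy owner sets.&lt;/p&gt;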



&lt;h2&gt;Live Output Will Vary&lt;/h2&gt;

&lt;p&gt;While preparing the full Field Note, we tried to get a neat live capture from a&lt;br&gt;
local Ollama model.&lt;/p&gt;

&lt;p&gt;We got one, but not every time, and not in exactly the same way.&lt;/p&gt;

&lt;p&gt;One model reproduced the naive car-wash failure and repaired cleanly. A second&lt;br&gt;
reached a final pass, but the repaired prose was awkward. A third exposed&lt;br&gt;
structured-output fragility before the deeper inspections could run cleanly.&lt;/p&gt;

&lt;p&gt;For a minute, that was frustrating.&lt;/p&gt;

&lt;p&gt;Then it became the point.&lt;/p&gt;

&lt;p&gt;Live LLM output can vary. Model version, local server load, decoding behavior,&lt;br&gt;
context handling, provider adapters, JSON behavior, timeout behavior, and small&lt;br&gt;
prompt/runtime differences can all change what comes back.&lt;/p&gt;

&lt;p&gt;That is why the intermediate artifacts matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did the draft recommend?&lt;/li&gt;
&lt;li&gt;What facts were extracted?&lt;/li&gt;
&lt;li&gt;Which inspections were selected?&lt;/li&gt;
&lt;li&gt;Which findings failed?&lt;/li&gt;
&lt;li&gt;What repair packet was built?&lt;/li&gt;
&lt;li&gt;Did the revised answer pass inspection?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final answer text alone is not enough.&lt;/p&gt;
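
&lt;p&gt;One possible shape for those artifacts is a per-attempt record that can be&lt;br&gt;
written out as JSON. &lt;code&gt;AttemptRecord&lt;/code&gt;, its field names, and &lt;code&gt;to_json&lt;/code&gt; are&lt;br&gt;
hypothetical; they mirror the questions above rather than any format the&lt;br&gt;
example is known to emit.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AttemptRecord:
    attempt: int
    answer: str                      # what the draft or revision recommended
    facts: dict                      # what facts were extracted
    selected_inspections: list       # which inspections were selected
    failed_findings: list = field(default_factory=list)  # which findings failed
    repair_packet: dict = field(default_factory=dict)    # what repair packet was built

    @property
    def passed(self):
        # An attempt passes only when no finding failed.
        return not self.failed_findings


def to_json(records):
    # Include the derived pass/fail flag alongside the stored fields.
    return json.dumps([dict(asdict(r), passed=r.passed) for r in records], indent=2)


records = [
    AttemptRecord(1, "Walk to the car wash.", {"moved_object": "person"}, ["clips"],
                  failed_findings=["car-required-at-wash"],
                  repair_packet={"rule_id": "car-required-at-wash"}),
    AttemptRecord(2, "Drive the car to the wash.", {"moved_object": "car"}, ["clips"]),
]
print(to_json(records))
```

&lt;p&gt;Two runs that end in similar prose can then be compared attempt by attempt,&lt;br&gt;
which is where the model-to-model differences described above become visible.&lt;/p&gt;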

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Full Field Note:
&lt;a href="https://nxus.systems/field-notes/guardrail-loops-for-llm-repair" rel="noopener noreferrer"&gt;https://nxus.systems/field-notes/guardrail-loops-for-llm-repair&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Example docs:
&lt;a href="https://docs.nxus.systems/nxuskit/examples/integrations/common-sense-guardrails/" rel="noopener noreferrer"&gt;https://docs.nxus.systems/nxuskit/examples/integrations/common-sense-guardrails/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Example source:
&lt;a href="https://github.com/nxus-SYSTEMS/nxusKit-examples/tree/main/examples/integrations/common-sense-guardrails" rel="noopener noreferrer"&gt;https://github.com/nxus-SYSTEMS/nxusKit-examples/tree/main/examples/integrations/common-sense-guardrails&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SDK:
&lt;a href="https://github.com/nxus-SYSTEMS/nxusKit" rel="noopener noreferrer"&gt;https://github.com/nxus-SYSTEMS/nxusKit&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The lesson is not that one model always gets the car-wash question wrong. The&lt;br&gt;
lesson is that a fluent answer is not the same thing as an inspected answer.&lt;/p&gt;

&lt;p&gt;For workflows where correctness matters, let the LLM draft. Then make the facts&lt;br&gt;
explicit, run the selected inspections, repair from evidence, and inspect the&lt;br&gt;
repair.&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>python</category>
      <category>ai</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
