<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: verdic28</title>
    <description>The latest articles on DEV Community by verdic28 (@verdic28).</description>
    <link>https://dev.to/verdic28</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3709492%2Fc5ae5022-9023-4d91-90b3-731995271efd.png</url>
      <title>DEV Community: verdic28</title>
      <link>https://dev.to/verdic28</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/verdic28"/>
    <language>en</language>
    <item>
      <title>AI in Production: Why Prompts, Filters, and Monitoring Aren’t Enough</title>
      <dc:creator>verdic28</dc:creator>
      <pubDate>Tue, 13 Jan 2026 17:35:46 +0000</pubDate>
      <link>https://dev.to/verdic28/ai-in-production-why-prompts-filters-and-monitoring-arent-enough-1bp2</link>
      <guid>https://dev.to/verdic28/ai-in-production-why-prompts-filters-and-monitoring-arent-enough-1bp2</guid>
      <description>&lt;p&gt;Over the past year, LLMs have moved from demos into real production systems — agentic workflows, internal tools, customer-facing automations, and decision pipelines.&lt;/p&gt;

&lt;p&gt;What’s surprising is that most production failures aren’t about model intelligence or latency.&lt;/p&gt;

&lt;p&gt;They’re about trust.&lt;/p&gt;

&lt;p&gt;The kinds of failures we keep seeing in production&lt;/p&gt;

&lt;p&gt;In real deployments, teams usually rely on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;careful prompt engineering&lt;/li&gt;
&lt;li&gt;structured outputs (JSON, schemas)&lt;/li&gt;
&lt;li&gt;post-hoc monitoring and logs&lt;/li&gt;
&lt;li&gt;human review for high-risk cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These work… until systems get more complex.&lt;/p&gt;

&lt;p&gt;Some recurring failure modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Confident hallucinations&lt;br&gt;
Outputs are fluent, structured, and pass surface checks — but contain fabricated facts or incorrect assumptions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Intent drift&lt;br&gt;
The model technically answers the prompt, but exceeds what it was allowed to do (e.g. advice instead of summarization, inference instead of extraction).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Contextual overreach&lt;br&gt;
LLMs pull in outside knowledge that violates domain or regulatory boundaries — especially common in agent + tool-calling setups.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Silent failures&lt;br&gt;
Nothing crashes. Logs look fine. But downstream systems act on invalid outputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
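&lt;p&gt;To see why surface checks are not enough, consider a purely illustrative sketch (the function and field names below are hypothetical): a fabricated output can be perfectly parseable and schema-complete.&lt;/p&gt;

```python
import json

REQUIRED_KEYS = {"invoice_id", "amount"}

def passes_surface_checks(raw: str) -> bool:
    """The kind of validation many pipelines stop at: parseable JSON
    with the required keys present. It says nothing about whether the
    values are real."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return REQUIRED_KEYS.issubset(data)

# A fabricated invoice that never existed sails through:
hallucinated = '{"invoice_id": "INV-9999", "amount": 120.50}'
print(passes_surface_checks(hallucinated))  # True, yet the data is invented
```

&lt;p&gt;Everything a format check can see is fine; the failure is in the facts, which is exactly what makes it silent.&lt;/p&gt;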

&lt;p&gt;These issues often slip past prompts and basic validation because those safeguards are probabilistic and best-effort, not enforceable.&lt;/p&gt;

&lt;p&gt;Why prompt engineering doesn’t scale&lt;/p&gt;

&lt;p&gt;Prompts are great at guiding behavior.&lt;br&gt;
They’re bad at enforcing boundaries.&lt;/p&gt;

&lt;p&gt;Once you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-running agents&lt;/li&gt;
&lt;li&gt;retries and memory&lt;/li&gt;
&lt;li&gt;tool invocation&lt;/li&gt;
&lt;li&gt;multiple stakeholders relying on outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;you need something closer to contracts, not suggestions.&lt;/p&gt;

&lt;p&gt;Introducing &lt;a href="https://www.verdic.dev/" rel="noopener noreferrer"&gt;Verdic&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;I’m working on Verdic, a validation and enforcement layer for production AI systems.&lt;/p&gt;

&lt;p&gt;The core idea is simple:&lt;/p&gt;

&lt;p&gt;Before an LLM output is executed, stored, or acted upon, it should be validated against an explicit intent and scope contract.&lt;/p&gt;

&lt;p&gt;Verdic focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-execution validation (not just monitoring after the fact)&lt;/li&gt;
&lt;li&gt;Intent, scope, and domain compliance&lt;/li&gt;
&lt;li&gt;Deterministic enforcement, not “hope the prompt holds”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s designed to sit between the LLM and the application, similar to how we validate inputs in traditional systems — but applied to AI outputs.&lt;/p&gt;
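&lt;p&gt;As a rough sketch of the idea (not Verdic’s actual API; every name below is hypothetical), a pre-execution contract check might look like this:&lt;/p&gt;

```python
import json

ALLOWED_TASKS = {"summarize", "extract"}        # scope contract: what the model may do
ALLOWED_FIELDS = {"task", "result", "sources"}  # schema contract: what it may return

def validate_output(raw_output: str) -> dict:
    """Validate an LLM output against an explicit intent/scope contract
    before it is executed, stored, or acted upon."""
    data = json.loads(raw_output)  # malformed JSON fails loudly, not silently

    extra = set(data) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"fields outside contract: {sorted(extra)}")

    if data.get("task") not in ALLOWED_TASKS:
        raise ValueError(f"intent drift: task {data.get('task')!r} not permitted")

    if not data.get("sources"):
        raise ValueError("contextual overreach: result cites no in-scope sources")

    return data  # only contract-compliant outputs reach the application

ok = validate_output('{"task": "summarize", "result": "...", "sources": ["doc-1"]}')
```

&lt;p&gt;The point is placement: the check runs before the output is executed or stored, and a violation raises instead of merely logging, so invalid outputs never reach downstream systems.&lt;/p&gt;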

&lt;p&gt;This isn’t about removing creativity.&lt;br&gt;
It’s about making AI safe to rely on in production environments.&lt;/p&gt;

&lt;p&gt;Why this matters now&lt;/p&gt;

&lt;p&gt;As AI systems move into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fintech and regulated domains&lt;/li&gt;
&lt;li&gt;enterprise workflows&lt;/li&gt;
&lt;li&gt;internal decision systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the key question isn’t:&lt;/p&gt;

&lt;p&gt;“Is the model smart?”&lt;/p&gt;

&lt;p&gt;It’s:&lt;/p&gt;

&lt;p&gt;“Can we trust the output every time, even under drift and edge cases?”&lt;br&gt;
That’s an engineering and governance problem, not a model problem.&lt;/p&gt;

&lt;p&gt;Open question to the community&lt;/p&gt;

&lt;p&gt;For those running LLMs in production:&lt;/p&gt;

&lt;p&gt;What failures surprised you the most?&lt;br&gt;
Where did prompts and monitoring fall short?&lt;br&gt;
How are you validating outputs before they cause damage?&lt;br&gt;
I’m especially interested in real post-mortems, not theory.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
