<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yefym Dmukh</title>
    <description>The latest articles on DEV Community by Yefym Dmukh (@yefymdmukh).</description>
    <link>https://dev.to/yefymdmukh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3708811%2F7c8e8ae0-723d-45e1-800b-01fcb97883ac.png</url>
      <title>DEV Community: Yefym Dmukh</title>
      <link>https://dev.to/yefymdmukh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yefymdmukh"/>
    <language>en</language>
    <item>
      <title>We kept breaking production workflows with prompt changes — so we started treating prompts as code</title>
      <dc:creator>Yefym Dmukh</dc:creator>
      <pubDate>Tue, 13 Jan 2026 12:34:09 +0000</pubDate>
      <link>https://dev.to/yefymdmukh/we-kept-breaking-production-workflows-with-prompt-changes-so-we-started-treating-prompts-as-code-4eel</link>
      <guid>https://dev.to/yefymdmukh/we-kept-breaking-production-workflows-with-prompt-changes-so-we-started-treating-prompts-as-code-4eel</guid>
      <description>&lt;p&gt;Hey folks,&lt;/p&gt;

&lt;p&gt;At the beginning of &lt;strong&gt;2024&lt;/strong&gt;, we were working as a service company for enterprise customers with a very concrete request:&lt;br&gt;
automate &lt;strong&gt;incoming emails → contract updates → ERP systems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The first versions worked.&lt;br&gt;
Then, over time, they quietly stopped working.&lt;/p&gt;

&lt;p&gt;And not just because of new edge cases or creative wording.&lt;/p&gt;

&lt;p&gt;Emails we had already processed correctly started failing again.&lt;br&gt;
The same supplier messages produced different outputs weeks later.&lt;br&gt;
Minor prompt edits broke unrelated extraction logic.&lt;br&gt;
Model updates changed behavior without any visible signal.&lt;br&gt;
And business rules ended up split across prompts, workflows, and human memory.&lt;/p&gt;

&lt;p&gt;In an ERP context, this is unacceptable — you don’t get partial credit for “mostly correct”.&lt;/p&gt;

&lt;p&gt;We looked for existing tools that could stabilize AI logic under these conditions. We didn’t find any that handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;regression against &lt;strong&gt;previously working inputs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;controlled evolution of prompts&lt;/li&gt;
&lt;li&gt;decoupling AI logic from automation workflows&lt;/li&gt;
&lt;li&gt;explainability when something changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we did what we knew from software engineering and automation work:&lt;br&gt;
we treated prompts as &lt;strong&gt;business logic&lt;/strong&gt;, and built a continuous development, testing, and deployment framework around them.&lt;/p&gt;

&lt;p&gt;That meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;versioned prompts&lt;/li&gt;
&lt;li&gt;explicit output schemas&lt;/li&gt;
&lt;li&gt;regression tests against historical inputs&lt;/li&gt;
&lt;li&gt;model upgrades treated as migrations, not surprises&lt;/li&gt;
&lt;li&gt;and releases that were blocked unless every regression test still passed&lt;/li&gt;
&lt;/ul&gt;
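
&lt;p&gt;To make that list concrete, here is a minimal Python sketch of such a release gate. It is an illustration under stated assumptions, not the production framework: &lt;code&gt;PromptVersion&lt;/code&gt;, &lt;code&gt;RegressionCase&lt;/code&gt;, &lt;code&gt;OUTPUT_SCHEMA&lt;/code&gt; and &lt;code&gt;can_release&lt;/code&gt; are hypothetical names, the two schema fields are invented for the example, and &lt;code&gt;fake_model&lt;/code&gt; is a toy stub standing in for a real LLM call so the code runs offline.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical and exist only for this illustration.


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str


@dataclass(frozen=True)
class RegressionCase:
    email_text: str   # a historical input
    expected: dict    # the output that was accepted as correct for it


# Explicit output schema: field name mapped to the required Python type.
OUTPUT_SCHEMA = {"price_changed": bool, "new_price": float}

ModelFn = Callable[[PromptVersion, str], dict]


def matches_schema(output: dict) -> bool:
    """Return True when output has exactly the schema's keys and types."""
    if set(output) != set(OUTPUT_SCHEMA):
        return False
    return all(isinstance(output[key], kind) for key, kind in OUTPUT_SCHEMA.items())


def failing_cases(prompt: PromptVersion, run_model: ModelFn, cases: list) -> list:
    """Replay every historical input and collect the cases that regress."""
    failures = []
    for case in cases:
        output = run_model(prompt, case.email_text)
        if not matches_schema(output) or output != case.expected:
            failures.append(case)
    return failures


def can_release(prompt: PromptVersion, run_model: ModelFn, cases: list) -> bool:
    """Block the release unless every regression case still passes."""
    return not failing_cases(prompt, run_model, cases)


def fake_model(prompt: PromptVersion, email_text: str) -> dict:
    """Toy stub standing in for a real LLM call (assumption for the demo)."""
    numbers = [w for w in email_text.split() if w.replace(".", "", 1).isdigit()]
    if numbers:
        return {"price_changed": True, "new_price": float(numbers[-1])}
    return {"price_changed": False, "new_price": 0.0}
```

&lt;p&gt;In a CI pipeline, the release step would call &lt;code&gt;can_release&lt;/code&gt; with the candidate prompt and the stored cases, and fail the build when it returns &lt;code&gt;False&lt;/code&gt;. A model upgrade fits the same shape: pass a different &lt;code&gt;run_model&lt;/code&gt; and rerun the same cases.&lt;/p&gt;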

&lt;p&gt;By late &lt;strong&gt;2024&lt;/strong&gt;, this approach allowed us to reliably extract contract updates from the unstructured emails of more than 100 suppliers into ERP systems with &lt;strong&gt;100% signal accuracy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Since &lt;strong&gt;2025&lt;/strong&gt;, our product has been deployed across multiple enterprises.&lt;br&gt;
&lt;strong&gt;We’re sharing it as open source because this problem isn’t unique to us&lt;/strong&gt; — it’s what happens when LLMs leave experiments and enter real workflows.&lt;/p&gt;

&lt;p&gt;You can think of it as &lt;strong&gt;Cursor for prompts + GitHub + an execution and integration environment&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The mental model that finally clicked for us wasn’t “prompt engineering”, but prompt = code.&lt;/p&gt;

&lt;h2&gt;Patterns that actually mattered for us&lt;/h2&gt;

&lt;p&gt;These weren’t theoretical ideas — they came from production failures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Narrow surface decomposition&lt;/strong&gt; One prompt = one signal. No “do everything” prompts. Boolean / scalar outputs instead of free text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test before production (always)&lt;/strong&gt; If behavior isn’t testable, it doesn’t ship. No runtime magic, no self-healing agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decouple AI logic from workflows&lt;/strong&gt; Prompts don’t live inside n8n / agents / app code. Workflows call versioned prompt releases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model changes are migrations, not surprises&lt;/strong&gt; New model → rerun regressions offline → commit or reject.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach is already running in several enterprise deployments.&lt;br&gt;
One example: extracting business signals from incoming emails into ERP systems with 100% signal accuracy at the indicator level (not “pretty text”, but actual machine-actionable flags).&lt;/p&gt;
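
&lt;p&gt;The first and third patterns above (one prompt per signal, and workflows that call versioned releases instead of embedding prompt text) can be sketched roughly as follows. This is a hedged illustration, not Genum’s API: &lt;code&gt;PromptRelease&lt;/code&gt;, &lt;code&gt;ReleaseRegistry&lt;/code&gt;, &lt;code&gt;extract_signal&lt;/code&gt; and &lt;code&gt;run_workflow&lt;/code&gt; are hypothetical names, and &lt;code&gt;fake_model&lt;/code&gt; is a toy stand-in for an LLM call.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names for illustration; this is not Genum's API.


@dataclass(frozen=True)
class PromptRelease:
    signal: str     # the single signal this prompt is responsible for
    version: str
    template: str   # must contain an {email} placeholder


class ReleaseRegistry:
    """Stores immutable prompt releases outside of any workflow code."""

    def __init__(self) -> None:
        self._releases = {}

    def publish(self, release: PromptRelease) -> None:
        key = (release.signal, release.version)
        if key in self._releases:
            raise ValueError(f"{release.signal}@{release.version} already published")
        self._releases[key] = release

    def get(self, signal: str, version: str) -> PromptRelease:
        return self._releases[(signal, version)]


def extract_signal(release: PromptRelease, email_text: str,
                   ask_model: Callable[[str], str]) -> bool:
    """Run one narrow prompt and accept only a strict boolean answer."""
    answer = ask_model(release.template.format(email=email_text)).strip().lower()
    if answer not in ("true", "false"):
        raise ValueError(f"{release.signal}: expected true/false, got {answer!r}")
    return answer == "true"


def run_workflow(registry: ReleaseRegistry, pins: dict, email_text: str,
                 ask_model: Callable[[str], str]) -> dict:
    """The workflow only names pinned releases; it holds no prompt text."""
    return {
        signal: extract_signal(registry.get(signal, version), email_text, ask_model)
        for signal, version in pins.items()
    }


def fake_model(prompt_text: str) -> str:
    """Toy stand-in for an LLM call so the sketch runs offline."""
    question, email = prompt_text.split("Email:", 1)
    keyword = "price" if "price" in question else "terminat"
    return "true" if keyword in email.lower() else "false"
```

&lt;p&gt;A workflow would then carry only a mapping such as &lt;code&gt;{"price_changed": "1.2.0"}&lt;/code&gt;. Changing a prompt means publishing a new version and updating the pin, which is a reviewable, revertible change rather than an edit buried inside an automation.&lt;/p&gt;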

&lt;h2&gt;What Genum is (and isn’t)&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Open source (on-prem)&lt;/li&gt;
&lt;li&gt;Free to use (SaaS optional, lifetime free tier)&lt;/li&gt;
&lt;li&gt;Includes a small $5 credit for major model providers so testing isn’t hypothetical&lt;/li&gt;
&lt;li&gt;Not a prompt playground&lt;/li&gt;
&lt;li&gt;Not an agent framework&lt;/li&gt;
&lt;li&gt;Not runtime policy enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s infrastructure for making &lt;strong&gt;AI behavior boring and reliable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you’re:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shipping LLMs inside real systems&lt;/li&gt;
&lt;li&gt;maintaining business automations&lt;/li&gt;
&lt;li&gt;trying to separate experimental AI from production logic&lt;/li&gt;
&lt;li&gt;tired of prompts behaving like vibes instead of software&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;we’d genuinely love feedback — especially critical feedback.&lt;/p&gt;

&lt;h2&gt;Links (if you want to dig in):&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/genumai/" rel="noopener noreferrer"&gt;https://github.com/genumai/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://genum.ai/docs" rel="noopener noreferrer"&gt;https://genum.ai/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Website: &lt;a href="https://genum.ai" rel="noopener noreferrer"&gt;https://genum.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;YouTube (patterns &amp;amp; deep dives): &lt;a href="https://www.youtube.com/@Genum-ai" rel="noopener noreferrer"&gt;https://www.youtube.com/@Genum-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;We are looking for advisors: &lt;a href="https://cdn.genum.ai/docs/advisor_pitch.pdf" rel="noopener noreferrer"&gt;https://cdn.genum.ai/docs/advisor_pitch.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re not here to sell anything — this exists because we needed it ourselves.&lt;br&gt;
Happy to answer questions, debate assumptions, or collaborate with people who are actually running this stuff in production.&lt;/p&gt;

&lt;p&gt;— The Genum team&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>discuss</category>
      <category>promptengineering</category>
    </item>
  </channel>
</rss>
