<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bob</title>
    <description>The latest articles on DEV Community by Bob (@jsxyzb).</description>
    <link>https://dev.to/jsxyzb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3869111%2Feac87454-3c4f-4595-922c-669cd0221fbc.png</url>
      <title>DEV Community: Bob</title>
      <link>https://dev.to/jsxyzb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jsxyzb"/>
    <language>en</language>
    <item>
      <title>Why Prompt-Only Moderation Failed in My AI Generation App</title>
      <dc:creator>Bob</dc:creator>
      <pubDate>Fri, 10 Apr 2026 06:50:29 +0000</pubDate>
      <link>https://dev.to/jsxyzb/why-prompt-only-moderation-failed-in-my-ai-generation-app-1m11</link>
      <guid>https://dev.to/jsxyzb/why-prompt-only-moderation-failed-in-my-ai-generation-app-1m11</guid>
      <description>&lt;p&gt;When I first added moderation to my AI generation app, I treated it as a text problem.&lt;/p&gt;

&lt;p&gt;That seemed reasonable at the time. A user sends a prompt, I check the prompt, and if it looks unsafe, I block the request before it reaches the model.&lt;/p&gt;

&lt;p&gt;That approach worked for a very short time.&lt;/p&gt;

&lt;p&gt;It stopped working the moment I supported image inputs, reference images, and multiple generation flows. At that point, I realized something important: prompt-only moderation is not really moderation. It is just one partial check inside a much larger pipeline.&lt;/p&gt;

&lt;p&gt;This post is about what changed in my backend once I accepted that.&lt;/p&gt;

&lt;h2&gt;The mistake: treating moderation as a wrapper&lt;/h2&gt;

&lt;p&gt;A lot of AI products start with moderation as a thin wrapper around generation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;receive a prompt&lt;/li&gt;
&lt;li&gt;run a text safety check&lt;/li&gt;
&lt;li&gt;call the model provider&lt;/li&gt;
&lt;li&gt;return the result&lt;/li&gt;
&lt;/ol&gt;
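&lt;p&gt;As a minimal sketch, the four-step wrapper above might look like this. Everything here is illustrative: &lt;code&gt;check_text&lt;/code&gt; and &lt;code&gt;call_model&lt;/code&gt; are stand-ins, not a real provider SDK.&lt;/p&gt;

```python
# Sketch of the prompt-only wrapper pattern described above.
# check_text and call_model are placeholders, not a real provider API.

BANNED_TERMS = {"example-banned-term"}

def check_text(prompt: str) -> bool:
    """Toy text check: flag prompts containing a banned term."""
    lowered = prompt.lower()
    return any(term in lowered for term in BANNED_TERMS)

def call_model(prompt: str) -> str:
    """Stand-in for the real model provider call."""
    return f"generated output for: {prompt}"

def generate(prompt: str) -> str:
    # 1. receive a prompt  2. run a text safety check
    if check_text(prompt):
        raise ValueError("prompt rejected by text moderation")
    # 3. call the model provider  4. return the result
    return call_model(prompt)
```

&lt;p&gt;Note that nothing in this flow ever looks at anything except the prompt string, which is exactly the blind spot discussed next.&lt;/p&gt;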

&lt;p&gt;The problem is that real generation workflows are rarely that simple.&lt;/p&gt;

&lt;p&gt;Once users can upload source images, provide reference images, or switch between text-to-image and image-to-image generation flows, the prompt becomes just one component of the overall request. A completely harmless prompt can still be paired with problematic input images. If the backend only inspects the text, the system will inevitably have a blind spot.&lt;/p&gt;

&lt;p&gt;That was the first issue I had to fix.&lt;/p&gt;

&lt;h2&gt;Moderation belongs inside the generation pipeline&lt;/h2&gt;

&lt;p&gt;I ended up moving moderation into the backend generation workflow itself instead of treating it as a separate utility.&lt;/p&gt;

&lt;p&gt;Conceptually, the flow became:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;validate the request&lt;/li&gt;
&lt;li&gt;load the selected provider and model&lt;/li&gt;
&lt;li&gt;inspect both prompt text and image inputs&lt;/li&gt;
&lt;li&gt;block flagged requests before spending credits&lt;/li&gt;
&lt;li&gt;create the generation task only if moderation passes&lt;/li&gt;
&lt;/ol&gt;
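&lt;p&gt;The five steps above can be sketched roughly like this. Every name here (&lt;code&gt;Request&lt;/code&gt;, &lt;code&gt;moderate&lt;/code&gt;, &lt;code&gt;handle_generation&lt;/code&gt;) is invented for the example, and the moderation rules are toy placeholders.&lt;/p&gt;

```python
# Hypothetical sketch of the five-step flow: validate, load model,
# moderate both text and images, block before spending credits,
# and only then create the task. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    image_urls: list = field(default_factory=list)
    provider: str = "default"
    model: str = "default-model"

def moderate(req: Request) -> bool:
    """Inspect both prompt text and image inputs (toy rules)."""
    if "blocked" in req.prompt.lower():
        return False
    return all(url.startswith("https://") for url in req.image_urls)

def handle_generation(req: Request, credits: dict) -> str:
    # 1. validate the request
    if not req.prompt:
        raise ValueError("empty prompt")
    # 2. provider/model loading would happen here (elided)
    # 3-4. moderate before any credits are spent or jobs created
    if not moderate(req):
        raise PermissionError("request blocked by moderation")
    # 5. only now consume credits and create the generation task
    credits["balance"] -= 1
    return f"task created for {req.model}"
```

&lt;p&gt;The key property is ordering: a blocked request raises before the credit deduction, so there is nothing to refund or clean up.&lt;/p&gt;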

&lt;p&gt;That decision helped for two reasons.&lt;/p&gt;

&lt;p&gt;First, it kept moderation close to the actual business rules. I did not want unsafe requests to consume credits, create external jobs, or leave behind half-failed task records.&lt;/p&gt;

&lt;p&gt;Second, it forced me to normalize the input shape. Instead of only thinking in terms of the prompt, I had to define a moderation input that could include prompt text, image URLs, model context, and generation scene.&lt;/p&gt;

&lt;p&gt;That made the system much easier to reason about.&lt;/p&gt;
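&lt;p&gt;One possible shape for that normalized input, with field names that are my own rather than any standard schema:&lt;/p&gt;

```python
# A sketch of a normalized moderation input; the field names and
# defaults are assumptions, not a real schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModerationInput:
    prompt: str                    # user-supplied text
    image_urls: tuple = ()         # source and reference images
    model: str = ""                # model context, e.g. a provider/model id
    scene: str = "text-to-image"   # generation scene / flow
```

&lt;p&gt;Once every route builds this one shape, the moderation layer no longer cares which flow the request came from.&lt;/p&gt;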

&lt;h2&gt;Prompt checks are useful, but incomplete&lt;/h2&gt;

&lt;p&gt;Text moderation is still valuable. It catches a lot of obvious cases early, and it is usually cheaper and faster than processing images.&lt;/p&gt;

&lt;p&gt;But text-only checks have two major limitations.&lt;/p&gt;

&lt;p&gt;The first is obvious: users can submit problematic visual input even if the prompt itself looks harmless.&lt;/p&gt;

&lt;p&gt;The second is less obvious: language coverage is uneven. Depending on the moderation provider, some languages are better supported than others. That means your confidence in a clean text check should not be the same across all prompts.&lt;/p&gt;

&lt;p&gt;In my case, that pushed me toward a more defensive design: if text checks are incomplete, the rest of the safety system has to acknowledge that limitation instead of pretending the problem is solved.&lt;/p&gt;

&lt;h2&gt;Images changed the design&lt;/h2&gt;

&lt;p&gt;The biggest improvement came from treating image inputs as first-class moderation targets.&lt;/p&gt;

&lt;p&gt;That sounds straightforward, but it changed several implementation details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the moderation step now had to collect image URLs from different request fields&lt;/li&gt;
&lt;li&gt;the backend needed one normalized moderation interface, even if the underlying provider had different APIs for text and image checks&lt;/li&gt;
&lt;li&gt;moderation results had to return structured categories and scores, not just a single boolean&lt;/li&gt;
&lt;li&gt;failure behavior had to be explicit&lt;/li&gt;
&lt;/ul&gt;
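&lt;p&gt;The third point above, returning structured categories and scores instead of a single boolean, might look like this. The category names and the 0-to-1 score scale are illustrative assumptions, not any particular provider's format.&lt;/p&gt;

```python
# A structured moderation result instead of a single boolean.
# Category names and the 0-1 score scale are illustrative.
from dataclasses import dataclass, field

@dataclass
class ModerationResult:
    flagged: bool
    categories: dict = field(default_factory=dict)  # name -> score in [0, 1]
    provider: str = "unknown"

    def top_category(self):
        """Return the highest-scoring category, or None if empty."""
        if not self.categories:
            return None
        return max(self.categories, key=self.categories.get)
```

&lt;p&gt;Keeping the per-category scores around also makes later tuning possible: you can log why a request was blocked, not just that it was.&lt;/p&gt;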

&lt;p&gt;That last point matters more than it seems.&lt;/p&gt;

&lt;p&gt;If a moderation provider fails, what should happen?&lt;/p&gt;

&lt;p&gt;You have to choose between two imperfect options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fail-open: allow the request and accept some risk&lt;/li&gt;
&lt;li&gt;fail-closed: block the request and accept some false positives or degraded UX&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is no universal correct answer. It depends on the kind of product you are building, your abuse tolerance, and how costly a bad generation is for you. But the important part is to make the decision deliberately. Silent fallback logic is where safety systems get weak.&lt;/p&gt;
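&lt;p&gt;Whatever you choose, the choice should live in one explicit place. A sketch of that idea, with an invented &lt;code&gt;FailPolicy&lt;/code&gt; enum:&lt;/p&gt;

```python
# Making the failure decision explicit rather than implicit.
# FailPolicy and is_allowed are hypothetical names for this sketch.
from enum import Enum

class FailPolicy(Enum):
    OPEN = "fail-open"      # allow the request if moderation errors out
    CLOSED = "fail-closed"  # block the request if moderation errors out

def is_allowed(run_check, policy: FailPolicy) -> bool:
    """run_check returns True when the content is safe."""
    try:
        return run_check()
    except Exception:
        # No silent fallback: the configured policy decides,
        # and the decision point is visible in one place.
        return policy is FailPolicy.OPEN
```

&lt;p&gt;The point is not which branch you pick; it is that the branch exists on purpose and can be audited later.&lt;/p&gt;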

&lt;h2&gt;Provider-specific APIs should not leak everywhere&lt;/h2&gt;

&lt;p&gt;Another lesson was that moderation providers should be isolated behind a small internal interface.&lt;/p&gt;

&lt;p&gt;Not because provider abstraction is fashionable, but because safety logic tends to spread if you let it.&lt;/p&gt;

&lt;p&gt;If one route handler knows how text moderation works, another knows how image moderation works, and a third knows how to interpret provider-specific category names, you do not have a moderation layer anymore. You have moderation fragments.&lt;/p&gt;

&lt;p&gt;I found it much cleaner to keep a moderation manager in the backend and let the generation route ask one question: “Is this request safe enough to proceed?”&lt;/p&gt;

&lt;p&gt;That does not remove complexity. It contains it.&lt;/p&gt;
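&lt;p&gt;One way to sketch that containment, assuming an invented &lt;code&gt;Provider&lt;/code&gt; protocol; only the manager ever touches provider-specific methods, and routes ask a single question:&lt;/p&gt;

```python
# Containing provider details behind one question. Provider and its
# methods are invented for this sketch, not a real SDK surface.
from typing import Protocol

class Provider(Protocol):
    def check_text(self, text: str) -> bool: ...
    def check_image(self, url: str) -> bool: ...

class ModerationManager:
    def __init__(self, provider):
        self._provider = provider  # only this class knows the provider API

    def is_safe(self, prompt: str, image_urls=()) -> bool:
        """The one question routes ask: safe enough to proceed?"""
        if not self._provider.check_text(prompt):
            return False
        return all(self._provider.check_image(u) for u in image_urls)
```

&lt;p&gt;Swapping providers then means writing one new adapter, not hunting down moderation fragments across route handlers.&lt;/p&gt;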

&lt;h2&gt;The practical takeaway&lt;/h2&gt;

&lt;p&gt;The most useful shift in my thinking was this:&lt;/p&gt;

&lt;p&gt;Moderation is not a feature attached to generation. It is part of generation.&lt;/p&gt;

&lt;p&gt;Once I started treating it that way, the backend became easier to evolve. I could add checks for both prompt text and image inputs, make blocking decisions before credits were consumed, and keep provider-specific moderation details out of the rest of the app.&lt;/p&gt;

&lt;p&gt;I am using this approach while building &lt;a href="https://videoflux.video" rel="noopener noreferrer"&gt;videoflux.video&lt;/a&gt;, where one workflow needs to support AI image and video generation without assuming that a prompt alone tells the full safety story.&lt;/p&gt;

&lt;p&gt;Disclosure: I’m the builder of videoflux.video.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>security</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
