<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Guardex</title>
    <description>The latest articles on DEV Community by Guardex (@guardex).</description>
    <link>https://dev.to/guardex</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4016192%2Fc15e8f23-779a-4bf0-8b1c-18b0b087d542.png</url>
      <title>DEV Community: Guardex</title>
      <link>https://dev.to/guardex</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/guardex"/>
    <language>en</language>
    <item>
      <title>How We Built a 200ms Image Moderation API on Cheap CPUs Using YOLOv8 and ONNX</title>
      <dc:creator>Guardex</dc:creator>
      <pubDate>Sun, 05 Jul 2026 12:59:50 +0000</pubDate>
      <link>https://dev.to/guardex/how-we-built-a-200ms-image-moderation-api-on-cheap-cpus-using-yolov8-and-onnx-216k</link>
      <guid>https://dev.to/guardex/how-we-built-a-200ms-image-moderation-api-on-cheap-cpus-using-yolov8-and-onnx-216k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F7kickjcb07wiiwp59yre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F7kickjcb07wiiwp59yre.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moderating user-generated content (UGC) is a necessity for almost any modern web application. But if you rely on managed cloud services like Amazon Rekognition or Google Cloud Vision, the API bills can become eye-watering as your platform scales.&lt;/p&gt;

&lt;p&gt;Self-hosting is not much easier: running heavy PyTorch or TensorFlow models on GPU-enabled servers adds overhead that most indie projects cannot justify.&lt;/p&gt;

&lt;p&gt;I wanted to solve this, so I spent the last few months building &lt;strong&gt;SafeVision&lt;/strong&gt;, a real-time, CPU-optimized image moderation API that runs in under 200ms on a basic VPS.&lt;/p&gt;

&lt;p&gt;Here is the exact architecture and optimization stack I used to make it happen.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Architecture: Object Detection + Scene Classification
&lt;/h3&gt;

&lt;p&gt;A single model isn't enough to keep false positives low, so we implemented a dual-model consensus engine:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;YOLOv8 Object Detector (Hawk Model):&lt;/strong&gt; Specialized in identifying specific threat objects like weapons, blades, and blood. It returns precise bounding boxes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EfficientNet Classifier:&lt;/strong&gt; Evaluates the overall scene context (NSFW, violence, gore). This prevents a medical surgery image from being flagged as a crime scene.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Engine:&lt;/strong&gt; Merges results from both models based on dynamic threshold rules to make the final "allow" or "block" decision.&lt;/li&gt;
&lt;/ol&gt;
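
&lt;p&gt;To make the consensus idea concrete, here is a minimal sketch of how such a decision step could look. It is not SafeVision's actual rule set: the &lt;code&gt;Detection&lt;/code&gt; type, the &lt;code&gt;decide&lt;/code&gt; function, and every threshold value are assumptions made up for illustration, and the thresholds are static rather than dynamic. A detector hit only blocks when the scene classifier agrees, which is how a surgery photo with a visible blade could still be allowed.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # e.g. "weapon", as reported by the object detector
    confidence: float  # detector score in the range 0..1


# Hypothetical thresholds, chosen only for illustration.
DETECTOR_SUSPECT = 0.50  # detector hit that still needs scene confirmation
SCENE_CONFIRM = 0.60     # scene score that confirms a suspect detection
SCENE_BLOCK = 0.85       # scene score high enough to block on its own


def decide(detections: list[Detection], scene_scores: dict[str, float]) -> str:
    """Return "block" or "allow" by combining both model outputs."""
    top_detection = max((d.confidence for d in detections), default=0.0)
    top_scene = max(scene_scores.values(), default=0.0)

    # Rule 1: the scene classifier alone is very confident.
    if top_scene >= SCENE_BLOCK:
        return "block"
    # Rule 2: consensus, the detector found something and the scene agrees.
    if top_detection >= DETECTOR_SUSPECT and top_scene >= SCENE_CONFIRM:
        return "block"
    return "allow"
```

&lt;p&gt;With these made-up numbers, a confident blade detection in a scene that scores low for violence is allowed, while the same detection in a scene that scores 0.70 is blocked.&lt;/p&gt;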




&lt;h3&gt;
  
  
  The Optimization: Porting to ONNX Runtime
&lt;/h3&gt;

&lt;p&gt;Running PyTorch models on standard CPU servers usually results in terrible latency (often &amp;gt;1.5 seconds per image). To optimize the engine, we did the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ONNX Conversion:&lt;/strong&gt; We converted our trained YOLOv8 and EfficientNet models to ONNX format.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU Execution Provider:&lt;/strong&gt; Running the models through ONNX Runtime's CPU execution provider reduced the memory footprint by 70% and cut inference time to &lt;strong&gt;150-200ms&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy Loading &amp;amp; Caching:&lt;/strong&gt; Weights are loaded into memory once on startup and cached, avoiding filesystem I/O overhead on incoming requests.&lt;/li&gt;
&lt;/ul&gt;
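
&lt;p&gt;The load-once-and-cache idea from the last bullet can be sketched with the standard library alone. This is an illustration under assumptions, not our production code: &lt;code&gt;get_model&lt;/code&gt;, &lt;code&gt;_load_from_disk&lt;/code&gt;, and &lt;code&gt;LOAD_CALLS&lt;/code&gt; are hypothetical names, and the loader is a stand-in. In a real service the stand-in would presumably construct something like &lt;code&gt;onnxruntime.InferenceSession(model_path, providers=["CPUExecutionProvider"])&lt;/code&gt; instead of returning a dict.&lt;/p&gt;

```python
from functools import lru_cache

LOAD_CALLS = []  # records loader invocations so the caching is observable


def _load_from_disk(model_path: str) -> dict:
    """Hypothetical stand-in for an expensive model load."""
    LOAD_CALLS.append(model_path)
    return {"path": model_path}


@lru_cache(maxsize=None)
def get_model(model_path: str) -> dict:
    """Load a model on first use and return the same object afterwards."""
    return _load_from_disk(model_path)
```

&lt;p&gt;Calling &lt;code&gt;get_model&lt;/code&gt; for each model once at startup warms the cache, so request handlers only ever hit the in-memory copy.&lt;/p&gt;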




&lt;h3&gt;
  
  
  The API and Client-Side Blurring
&lt;/h3&gt;

&lt;p&gt;We built the backend with &lt;strong&gt;FastAPI&lt;/strong&gt; for its asynchronous request handling. Instead of doing heavy image manipulation on the server, the API returns the bounding boxes of the flagged objects so the client can blur those regions itself:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"safe"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"categories"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"weapon"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"box"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
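
&lt;p&gt;As a rough sketch of what a client might do with this response, the function below turns each flagged box into a rectangle to blur, clamped to the image size. It assumes that &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; are the top-left corner in pixels, which the response format above does not state explicitly. The &lt;code&gt;blur_regions&lt;/code&gt; name and the &lt;code&gt;min_confidence&lt;/code&gt; parameter are hypothetical.&lt;/p&gt;

```python
import json


def blur_regions(response_text: str, image_width: int, image_height: int,
                 min_confidence: float = 0.5) -> list:
    """Return (left, top, right, bottom) boxes to blur, clamped to the image."""
    payload = json.loads(response_text)
    regions = []
    for category in payload.get("categories", []):
        if min_confidence > category["confidence"]:
            continue  # skip low-confidence hits
        box = category["box"]
        # Assumption: x/y mark the top-left corner, in pixels.
        left = max(0, box["x"])
        top = max(0, box["y"])
        right = min(image_width, box["x"] + box["width"])
        bottom = min(image_height, box["y"] + box["height"])
        if right > left and bottom > top:
            regions.append((left, top, right, bottom))
    return regions
```

&lt;p&gt;Each tuple uses the (left, top, right, bottom) order, so it can be handed to whatever drawing or imaging layer the client uses to apply the blur.&lt;/p&gt;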






&lt;h2&gt;
  
  
  &lt;strong&gt;Give it a try!&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I have just launched SafeVision on Product Hunt and opened a free Developer Sandbox (1,000 monthly scans).&lt;/p&gt;

&lt;p&gt;Check the live demo here: &lt;a href="https://guardextech.com" rel="noopener noreferrer"&gt;SafeVision&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to support a solo developer building open alternatives, check us out on &lt;a href="https://www.producthunt.com/products/guardex" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I would love to hear your feedback on the API latency or how we can optimize ONNX inference even further!&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>python</category>
      <category>webdev</category>
      <category>saas</category>
    </item>
  </channel>
</rss>
