<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ratonpeureu</title>
    <description>The latest articles on DEV Community by Ratonpeureu (@ratonpeureu).</description>
    <link>https://dev.to/ratonpeureu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3909700%2Fc740ca2d-0f85-4a70-80fa-e8a6c31c560f.png</url>
      <title>DEV Community: Ratonpeureu</title>
      <link>https://dev.to/ratonpeureu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ratonpeureu"/>
    <language>en</language>
    <item>
      <title>I built a real AI video processing SaaS from Senegal: no GPT wrappers, just HuggingFace + OpenCV + YOLO + Detectron2 + MediaPipe + Celery</title>
      <dc:creator>Ratonpeureu</dc:creator>
      <pubDate>Sun, 03 May 2026 00:25:07 +0000</pubDate>
      <link>https://dev.to/ratonpeureu/i-built-a-real-ai-video-processing-saas-from-senegal-no-gpt-wrappers-just-huggingface-opencv--1l3i</link>
      <guid>https://dev.to/ratonpeureu/i-built-a-real-ai-video-processing-saas-from-senegal-no-gpt-wrappers-just-huggingface-opencv--1l3i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj60tf8auypualy3n0979.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj60tf8auypualy3n0979.png" alt=" " width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem I was solving
&lt;/h2&gt;

&lt;p&gt;Every creator I know spends 3-4 hours manually cutting &lt;br&gt;
one video into clips for TikTok and Instagram.&lt;/p&gt;

&lt;p&gt;The algorithm rewards volume — not perfection.&lt;br&gt;
Post 20 clips, maybe 2 go viral.&lt;br&gt;
Post 1 perfectly edited video, maybe 0 do.&lt;/p&gt;

&lt;p&gt;So I built ClipFarmer.&lt;/p&gt;


&lt;h2&gt;
  
  
  Not a GPT wrapper — real computer vision
&lt;/h2&gt;

&lt;p&gt;This is the part I want to be clear about.&lt;/p&gt;

&lt;p&gt;Most "AI tools" people encounter — especially in &lt;br&gt;
West Africa — are scams. Someone charges you to &lt;br&gt;
access ChatGPT through a Telegram bot and calls it &lt;br&gt;
"AI formation."&lt;/p&gt;

&lt;p&gt;ClipFarmer uses actual machine learning models &lt;br&gt;
running in the processing pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Whisper (HuggingFace)&lt;/strong&gt; — automatic speech &lt;br&gt;
recognition for subtitle generation. Runs locally &lt;br&gt;
on the worker, no API call, no per-minute billing.&lt;/p&gt;
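&lt;p&gt;The downstream step is plain Python: Whisper hands back timed segments, and those become subtitle files. A minimal sketch of that formatting step (the segment dicts mirror Whisper's output shape; the transcription call itself is omitted):&lt;/p&gt;

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of {start, end, text} segments as SRT subtitle blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Whisper-style segments, hardcoded here for the sketch:
segments = [
    {"start": 0.0, "end": 2.4, "text": " Welcome to the channel."},
    {"start": 2.4, "end": 5.1, "text": " Today we build a pipeline."},
]
print(segments_to_srt(segments))
```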

&lt;p&gt;&lt;strong&gt;YOLO + OpenCV (cv2)&lt;/strong&gt; — scene detection and &lt;br&gt;
object tracking. Used to find the best cut points &lt;br&gt;
in a video — not just splitting at fixed intervals &lt;br&gt;
but finding where scenes actually change.&lt;/p&gt;
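&lt;p&gt;The core of "find where scenes change" is simple to sketch: flag a cut when consecutive frames differ sharply. In ClipFarmer this runs on frames decoded with cv2; here the frames are synthetic NumPy arrays so the idea is self-contained, and the threshold is an illustrative value:&lt;/p&gt;

```python
import numpy as np

def scene_changes(frames, threshold=30.0):
    """Return indices where frame i differs sharply from frame i-1
    (mean absolute pixel difference above threshold)."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.int16) - frames[i - 1].astype(np.int16))
        if diff.mean() > threshold:
            cuts.append(i)
    return cuts

# Two "scenes": dark frames, then bright frames -- the jump is at index 3.
dark = [np.full((4, 4), 10, dtype=np.uint8)] * 3
bright = [np.full((4, 4), 200, dtype=np.uint8)] * 3
print(scene_changes(dark + bright))  # [3]
```

A fixed-interval splitter would cut mid-scene; this only cuts where the content jumps.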

&lt;p&gt;&lt;strong&gt;Detectron2&lt;/strong&gt; — instance segmentation. Powers &lt;br&gt;
background removal and masking effects directly &lt;br&gt;
on video frames.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MediaPipe&lt;/strong&gt; — pose and face landmark detection. &lt;br&gt;
Used for smart reframing — keeping the subject &lt;br&gt;
centered when converting 16:9 to 9:16 vertical &lt;br&gt;
format for TikTok.&lt;/p&gt;
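&lt;p&gt;Once MediaPipe gives you the subject's position, the reframing itself is arithmetic: slide a 9:16 window over the 16:9 frame, centered on the subject and clamped to the frame edges. A sketch of that math (landmark detection omitted; the function name is mine):&lt;/p&gt;

```python
def vertical_crop(frame_w, frame_h, subject_x):
    """Return (x0, x1) bounds of a 9:16 crop centered on subject_x."""
    crop_w = int(frame_h * 9 / 16)          # 9:16 width for the full frame height
    x0 = int(subject_x - crop_w / 2)
    x0 = max(0, min(x0, frame_w - crop_w))  # clamp so the crop stays inside the frame
    return x0, x0 + crop_w

# 1920x1080 source, subject near the right edge: the window hits the border.
print(vertical_crop(1920, 1080, 1800))  # (1313, 1920)
```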

&lt;p&gt;&lt;strong&gt;OpenCV (cv2)&lt;/strong&gt; — the backbone of all frame-level &lt;br&gt;
processing. Every effect, every transition, every &lt;br&gt;
crop runs through cv2 pipelines.&lt;/p&gt;

&lt;p&gt;These aren't API calls to someone else's model.&lt;br&gt;
They run on our workers.&lt;/p&gt;


&lt;h2&gt;
  
  
  The effects and transitions pipeline
&lt;/h2&gt;

&lt;p&gt;This was the hardest part to build.&lt;/p&gt;

&lt;p&gt;Each effect is a cv2 pipeline that processes frames &lt;br&gt;
individually and reassembles them into a video. &lt;br&gt;
Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Color grading (dark moody, vintage grain, RGB split)&lt;/li&gt;
&lt;li&gt;CRT scanline overlay&lt;/li&gt;
&lt;li&gt;Motion blur&lt;/li&gt;
&lt;li&gt;Skeleton overlay (MediaPipe pose)&lt;/li&gt;
&lt;li&gt;Background removal (Detectron2 masks)&lt;/li&gt;
&lt;/ul&gt;
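&lt;p&gt;To make "a cv2 pipeline that processes frames individually" concrete, here is a sketch of one effect from the list, RGB split, as pure array ops of the kind those pipelines run (channel order assumed RGB here; cv2 decodes as BGR, so the indices flip there):&lt;/p&gt;

```python
import numpy as np

def rgb_split(frame, shift=4):
    """Offset the R and B channels horizontally on an HxWx3 uint8 frame,
    producing the classic chromatic-aberration ghosting."""
    out = frame.copy()
    out[:, :, 0] = np.roll(frame[:, :, 0], -shift, axis=1)  # red shifts left
    out[:, :, 2] = np.roll(frame[:, :, 2], shift, axis=1)   # blue shifts right
    return out

frame = np.zeros((8, 8, 3), dtype=np.uint8)
frame[:, 4, :] = 255                 # a white vertical line
ghosted = rgb_split(frame, shift=2)
print(ghosted[0, 2], ghosted[0, 4], ghosted[0, 6])
```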

&lt;p&gt;Transitions between clips use frame blending and &lt;br&gt;
optical flow — not simple cuts or crossfades.&lt;/p&gt;

&lt;p&gt;The whole thing runs as a Celery chord:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;celery&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chord&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;spliter_clip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;s&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;workflow_tasks_parallel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;s&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;task_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Split first → then effects + subtitles + transitions &lt;br&gt;
run in parallel on the clips → reassemble.&lt;/p&gt;




&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; FastAPI + Celery + RabbitMQ + Redis&lt;br&gt;&lt;br&gt;
&lt;strong&gt;AI/CV:&lt;/strong&gt; Whisper + YOLO + Detectron2 + MediaPipe + OpenCV&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; MinIO (self-hosted S3-compatible, presigned uploads)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; React + Vite + TailwindCSS&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL + SQLAlchemy async&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Docker Compose on a VPS&lt;/p&gt;

&lt;p&gt;Each AI model runs in its own conda environment &lt;br&gt;
inside the worker container — Whisper, Detectron2, &lt;br&gt;
and MediaPipe have conflicting dependencies so &lt;br&gt;
isolating them was non-negotiable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The African creator angle
&lt;/h2&gt;

&lt;p&gt;In Senegal and West Africa:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mobile money (Wave, Orange Money) is how people pay&lt;/li&gt;
&lt;li&gt;Credit cards are rare&lt;/li&gt;
&lt;li&gt;Most AI tools people see are scams or inaccessible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ClipFarmer accepts Wave and Orange Money natively. &lt;br&gt;
And it runs real models — not a chat interface &lt;br&gt;
pretending to be a video tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Conflicting ML dependencies are brutal.&lt;/strong&gt; &lt;br&gt;
Whisper, Detectron2, and MediaPipe cannot share &lt;br&gt;
a Python environment cleanly. The solution was &lt;br&gt;
separate conda envs and subprocess calls between &lt;br&gt;
them from the main worker.&lt;/p&gt;
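&lt;p&gt;What "subprocess calls between them" looks like in practice: the main worker shells out to a script pinned to a dedicated conda env via &lt;code&gt;conda run&lt;/code&gt;, so each model sees only its own dependencies. Env and script names below are illustrative:&lt;/p&gt;

```python
import subprocess  # the worker runs the built command with subprocess.run

def env_command(env_name, script, *args):
    """Build a `conda run` invocation for a script pinned to one conda env."""
    return ["conda", "run", "-n", env_name, "python", script, *args]

cmd = env_command("whisper-env", "transcribe.py", "/tmp/clip.mp4")
print(cmd)
# In the worker this is executed as:
#   subprocess.run(cmd, check=True, capture_output=True)
```

The cost is process-spawn overhead per call; the benefit is that Whisper's and Detectron2's pinned versions never fight over one interpreter.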

&lt;p&gt;&lt;strong&gt;Presigned uploads are mandatory for video.&lt;/strong&gt; &lt;br&gt;
Having the client upload directly to MinIO instead &lt;br&gt;
of streaming through FastAPI was the difference &lt;br&gt;
between a server that crashes on large files and &lt;br&gt;
one that handles them fine.&lt;/p&gt;
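&lt;p&gt;The mechanism behind presigning, boiled down: the API signs the object path plus an expiry with a secret, and storage verifies that signature statelessly, so the video bytes never pass through FastAPI. This is a toy HMAC scheme to show the idea, not MinIO's actual S3 v4 signing (the SDK's &lt;code&gt;presigned_put_object&lt;/code&gt; handles that):&lt;/p&gt;

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # shared between API and storage; illustrative only

def presign(path, expires_in=3600, now=None):
    """Return the query params a client attaches to its direct upload."""
    expiry = int(time.time() if now is None else now) + expires_in
    msg = f"{path}:{expiry}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return {"path": path, "expires": expiry, "signature": sig}

def verify(params, now=None):
    """Storage-side check: signature matches and the link is still fresh."""
    msg = f"{params['path']}:{params['expires']}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    now_val = time.time() if now is None else now
    fresh = params["expires"] >= now_val
    return fresh and hmac.compare_digest(expected, params["signature"])

p = presign("videos/raw/clip.mp4", expires_in=3600, now=1_700_000_000)
print(verify(p, now=1_700_000_100))   # True
print(verify(p, now=1_700_010_000))   # False: expired
```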

&lt;p&gt;&lt;strong&gt;cv2 frame processing is slow without batching.&lt;/strong&gt; &lt;br&gt;
Processing frames one by one destroyed performance. &lt;br&gt;
Batching frame reads and writes cut processing &lt;br&gt;
time significantly.&lt;/p&gt;
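&lt;p&gt;The shape of the batching win, sketched with synthetic frames (with cv2 this pairs with reading N frames before processing; the effect here is a stand-in gain adjustment): stack frames into one (N, H, W, 3) array and apply the op once, instead of once per frame in a Python loop.&lt;/p&gt;

```python
import numpy as np

def brighten_one_by_one(frames, gain=1.2):
    """Per-frame Python loop: one NumPy dispatch per frame."""
    return [np.clip(f * gain, 0, 255).astype(np.uint8) for f in frames]

def brighten_batched(frames, gain=1.2):
    """One vectorized op over a stacked (N, H, W, 3) batch."""
    batch = np.stack(frames).astype(np.float32)
    return list(np.clip(batch * gain, 0, 255).astype(np.uint8))

frames = [np.full((4, 4, 3), 100, dtype=np.uint8) for _ in range(16)]
a = brighten_one_by_one(frames)
b = brighten_batched(frames)
print(all(np.array_equal(x, y) for x, y in zip(a, b)))  # True: same result
```

Same output either way; the batched path amortizes per-call overhead, which is where the loop version loses on long videos.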

&lt;p&gt;&lt;strong&gt;Docker networking will humble you.&lt;/strong&gt; &lt;br&gt;
My Celery worker couldn't reach RabbitMQ because &lt;br&gt;
the FastAPI container was missing &lt;code&gt;RABBITMQ_URL&lt;/code&gt; — &lt;br&gt;
cost me an afternoon of traceback reading.&lt;/p&gt;
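&lt;p&gt;The fix is to declare the broker URL on every service that talks to it, using the compose service name as the hostname. A minimal fragment of what that looks like (service names and credentials here are illustrative, not ClipFarmer's actual config):&lt;/p&gt;

```yaml
services:
  api:
    environment:
      RABBITMQ_URL: amqp://guest:guest@rabbitmq:5672//
  worker:
    environment:
      RABBITMQ_URL: amqp://guest:guest@rabbitmq:5672//
    depends_on:
      - rabbitmq
  rabbitmq:
    image: rabbitmq:3-management
```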

&lt;h2&gt;
  
  
  Where it is now
&lt;/h2&gt;

&lt;p&gt;Live at &lt;a href="https://clipfarmer.site" rel="noopener noreferrer"&gt;clipfarmer.site&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Free credits to try it out. Mobile payment for &lt;br&gt;
West African creators.&lt;/p&gt;

&lt;p&gt;I'm curious — has anyone else built cv2 processing &lt;br&gt;
pipelines at scale? The frame batching and memory &lt;br&gt;
management on long videos is still something I'm &lt;br&gt;
optimizing.&lt;/p&gt;

&lt;p&gt;What would make you switch from manual editing?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
      <category>python</category>
    </item>
  </channel>
</rss>
