<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mohamed Heni</title>
    <description>The latest articles on DEV Community by Mohamed Heni (@henimohamed).</description>
    <link>https://dev.to/henimohamed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3944975%2F65533033-4b94-4c8c-93db-1bc70dad858c.png</url>
      <title>DEV Community: Mohamed Heni</title>
      <link>https://dev.to/henimohamed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/henimohamed"/>
    <language>en</language>
    <item>
      <title>How I Built a Real-Time Face Recognition Security System on a Raspberry Pi</title>
      <dc:creator>Mohamed Heni</dc:creator>
      <pubDate>Fri, 05 Jun 2026 08:38:05 +0000</pubDate>
      <link>https://dev.to/henimohamed/how-i-built-a-real-time-face-recognition-security-system-on-a-raspberry-pi-2oi2</link>
      <guid>https://dev.to/henimohamed/how-i-built-a-real-time-face-recognition-security-system-on-a-raspberry-pi-2oi2</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a face-based access control system that runs completely offline on a Raspberry Pi. It uses cosine similarity between face vectors to grant or deny folder access. The whole thing is under 300 lines of Python.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Folder-level security is usually handled with passwords or encryption. Both share a weakness: anyone who has the password gets in. I wanted something that physically identifies the user and runs on cheap hardware with no cloud dependency.&lt;/p&gt;

&lt;p&gt;So I set up a Raspberry Pi with a camera module and built a facial authentication loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The flow is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Registration phase:&lt;/strong&gt; The camera captures the authorized user's face. The system converts it to grayscale, extracts facial landmarks using &lt;code&gt;face_recognition&lt;/code&gt;, and stores the resulting 128-dimensional face vector locally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verification phase:&lt;/strong&gt; When someone opens the protected folder, the camera activates, captures the current face, extracts its vector, and compares it to the stored vector using cosine similarity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decision:&lt;/strong&gt; If similarity exceeds a configurable threshold — access granted. If not — the folder closes immediately.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
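
&lt;p&gt;The three steps above can be sketched in plain Python. This is a minimal sketch, not the repository's code. It assumes the 128-dimensional encodings have already been computed (for example with &lt;code&gt;face_recognition&lt;/code&gt;). The file name &lt;code&gt;face_vector.json&lt;/code&gt;, the helper names and the 0.9 threshold are hypothetical placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import math
import operator

# Hypothetical storage location and threshold; tune the threshold on your own captures
VECTOR_FILE = "face_vector.json"
THRESHOLD = 0.9

def register(encoding, path=VECTOR_FILE):
    """Registration phase: persist the authorized user's face vector locally."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(encoding), f)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def verify(encoding, path=VECTOR_FILE, threshold=THRESHOLD):
    """Verification and decision phases: True grants access, False closes the folder."""
    with open(path, encoding="utf-8") as f:
        stored = json.load(f)
    score = cosine_similarity(stored, encoding)
    return operator.ge(score, threshold)  # grant only if score is at least the threshold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the threshold as a parameter makes it easy to tighten or loosen the decision without touching the registration data.&lt;/p&gt;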

&lt;h2&gt;
  
  
  The Tech
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hardware: Raspberry Pi 3B+ + USB Camera
Software: Python 3, OpenCV, face_recognition (dlib), NumPy
Metric: Cosine similarity via scikit-learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key decision was using cosine similarity over Euclidean distance. Face vectors are direction-sensitive — two images of the same person under different lighting produce vectors pointing in roughly the same direction but with different magnitudes. Cosine similarity ignores magnitude and compares direction, which makes it much more robust to lighting changes.&lt;/p&gt;
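
&lt;p&gt;A quick way to see the magnitude argument is to scale a vector and compare both metrics. The toy 4-dimensional vector below is made up for illustration, since real encodings have 128 dimensions. Halving every component stands in for the darker capture described above. One caveat: if the encodings are already unit-length, the two metrics rank candidates identically, because ‖a − b‖² = 2 − 2 cos θ. The choice only matters when magnitudes actually vary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def cosine_similarity(a, b):
    """Compares direction only: the dot product divided by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

reference = [0.2, -0.4, 0.6, 0.1]       # made-up toy vector
dimmer = [0.5 * x for x in reference]    # same direction, half the magnitude

print(round(cosine_similarity(reference, dimmer), 6))  # 1.0
print(round(math.dist(reference, dimmer), 6))          # 0.377492
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;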

&lt;h2&gt;
  
  
  Edge Case That Took the Longest to Solve
&lt;/h2&gt;

&lt;p&gt;The hardest bug wasn't the ML — it was the camera initialization race condition.&lt;/p&gt;

&lt;p&gt;When the folder-trigger script launches the camera, there's a brief window where the first frame is black because auto-exposure hasn't settled yet. If the system captures that black frame as the test face, feature extraction returns an all-zero vector, and cosine similarity with a zero vector is undefined: its norm in the denominator is zero, so the computation divides by zero.&lt;/p&gt;

&lt;p&gt;Fix: I added a 3-frame warmup that discards the first captures and only processes once pixel variance stabilizes above a threshold. Simple, effective, and took way too long to figure out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;warmup_camera&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;warmup_frames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Discard initial dark frames until exposure stabilizes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;warmup_frames&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
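
&lt;p&gt;The warmup fixes the cause. A defensive comparison can also refuse a degenerate vector instead of dividing by zero if one ever slips through. This is a minimal sketch: the function name and the &lt;code&gt;eps&lt;/code&gt; tolerance are illustrative assumptions, not taken from the repository.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def safe_cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity that refuses (near-)zero vectors instead of dividing by zero."""
    norm_a = math.hypot(*a)
    norm_b = math.hypot(*b)
    if math.isclose(norm_a, 0.0, abs_tol=eps) or math.isclose(norm_b, 0.0, abs_tol=eps):
        # e.g. a vector extracted from an all-black frame; capture again instead
        raise ValueError("zero-length face vector: capture a new frame and retry")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Treating the &lt;code&gt;ValueError&lt;/code&gt; as "deny and retry" keeps a bad frame from ever reaching the threshold check.&lt;/p&gt;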



&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;A Raspberry Pi costs $35. A camera module costs $15. For $50, you get biometric folder security that works entirely offline — no cloud API calls, no subscription fees, no internet required.&lt;/p&gt;

&lt;p&gt;This approach — edge AI with cheap hardware — is massively underused. Most "AI security" products are cloud-dependent, which means latency, privacy risks, and recurring costs. Running the inference locally on a $35 computer changes the economics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Face recognition on a Raspberry Pi is entirely feasible with Python + OpenCV + dlib&lt;/li&gt;
&lt;li&gt;Cosine similarity handles lighting variation better than Euclidean distance for face vectors&lt;/li&gt;
&lt;li&gt;Camera initialization race conditions are a real edge case — always warm up&lt;/li&gt;
&lt;li&gt;Edge AI deployment doesn't need expensive hardware; it needs the right tradeoffs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm exploring two improvements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Adding liveness detection (blink detection) to prevent photo-based spoofing&lt;/li&gt;
&lt;li&gt;Converting the model to TensorFlow Lite for faster inference on the Pi&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full source is on GitHub: &lt;a href="https://github.com/HENI-MOHAMED/FaceRegnition" rel="noopener noreferrer"&gt;github.com/HENI-MOHAMED/FaceRegnition&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you run face recognition on edge hardware? What issues did you run into?&lt;/p&gt;

</description>
      <category>python</category>
      <category>raspberrypi</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building an AI-Powered Short Video Automation Platform with Flask, React Native, and Playwright</title>
      <dc:creator>Mohamed Heni</dc:creator>
      <pubDate>Mon, 01 Jun 2026 08:30:57 +0000</pubDate>
      <link>https://dev.to/henimohamed/building-an-ai-powered-short-video-automation-platform-with-flask-react-native-and-playwright-3cai</link>
      <guid>https://dev.to/henimohamed/building-an-ai-powered-short-video-automation-platform-with-flask-react-native-and-playwright-3cai</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built TikTonik — an AI platform that detects trending content, generates scripts, renders videos with FFMPEG, and publishes them to TikTok — all without human intervention. Here's the architecture and what I learned building it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Short-form video is the most effective content format right now, but producing it at scale is brutally manual:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find a trend → Write a script → Record voiceover → Find B-roll → Edit → Render → Upload&lt;/li&gt;
&lt;li&gt;Each video takes 1-3 hours for a decent output&lt;/li&gt;
&lt;li&gt;Most creators burn out after maintaining this cadence for a few weeks&lt;/li&gt;
&lt;li&gt;"AI video tools" are mostly wrappers that still require manual steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted to build a system that could go from "trend detected" to "video published" without a human touching the pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;TikTonik is organized as a multi-application workspace with four major components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;└─ LibTikTonk V2.2 (Backend) ── Flask API + Queue System + ThreadPoolExecutor
    ├─ AI Trend Pipeline  (LLM + trend analysis)
    ├─ FFMPEG/MoviePy   (video rendering)
    └─ Playwright Uploader (TikTok auth + post)

└─ frontend-window (Tauri Desktop)  ── Vite + React + Rust
└─ FrontendMobile (React Native/Expo)  ── iOS + Android
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1. Backend (LibTikTonkV2.2)
&lt;/h3&gt;

&lt;p&gt;The core is a &lt;strong&gt;Flask application&lt;/strong&gt; with a threaded task queue. When you submit a job:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Trend Detection&lt;/strong&gt; — LLM scans trends and generates video concepts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Script Generation&lt;/strong&gt; — AI writes a short-form script&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice Synthesis&lt;/strong&gt; — Kokoro TTS converts script to voiceover&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video Rendering&lt;/strong&gt; — FFMPEG + MoviePy composites the final video&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upload&lt;/strong&gt; — Playwright automates TikTok upload with session cookies
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Worker thread architecture
&lt;/span&gt;&lt;span class="n"&gt;worker_thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;queue_worker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daemon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;worker_thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# ThreadPoolExecutor for parallel video processing
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;create_video&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_create_video_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;upload&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_upload_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
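
&lt;p&gt;The five stages listed above form a linear chain in which each stage consumes the previous stage's output. The sketch below shows that shape with stub functions. Every stage name and job field is hypothetical. The real stages would call the LLM, Kokoro TTS, FFMPEG/MoviePy and Playwright instead of returning placeholder strings.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical stubs: each stage takes the job dict, adds its output, and returns it
def detect_trend(job):
    job["concept"] = f"Explainer on {job['topic']}"
    return job

def generate_script(job):
    job["script"] = f"Script for: {job['concept']}"
    return job

def synthesize_voice(job):
    job["audio_path"] = f"{job['id']}.wav"
    return job

def render_video(job):
    job["video_path"] = f"{job['id']}.mp4"
    return job

def upload(job):
    job["uploaded"] = True
    return job

PIPELINE = [detect_trend, generate_script, synthesize_voice, render_video, upload]

def run_pipeline(job, stages=PIPELINE):
    """Run each stage in order, recording progress so a failure shows where it stopped."""
    job.setdefault("completed", [])
    for stage in stages:
        job = stage(job)
        job["completed"].append(stage.__name__)
    return job

result = run_pipeline({"id": "job-1", "topic": "home espresso"})
print(result["completed"])
# ['detect_trend', 'generate_script', 'synthesize_voice', 'render_video', 'upload']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;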



&lt;h3&gt;
  
  
  2. Desktop Dashboard (Tauri + React)
&lt;/h3&gt;

&lt;p&gt;Built with &lt;strong&gt;Vite + React + Tauri&lt;/strong&gt;. Tauri wraps the web app as a native desktop app using Rust. Operators can view the job queue, configure AI limits, and monitor upload history — all from a native window with system tray integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Mobile Clients (React Native / Expo)
&lt;/h3&gt;

&lt;p&gt;Two mobile apps using &lt;strong&gt;React Native (Expo)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TikTonk&lt;/strong&gt; — Full client for viewing scheduled jobs, monitoring queue state, uploading from mobile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TickTonik 2.0&lt;/strong&gt; — Lightweight monitoring for quick status checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both connect via REST API to the Flask backend with Appwrite cloud sync.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hard Parts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Session-based TikTok Upload
&lt;/h3&gt;

&lt;p&gt;TikTok's API is restricted, so the uploader uses &lt;strong&gt;Playwright browser automation&lt;/strong&gt; with cached session cookies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;playwright&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;storage_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CookiesDir/tiktok_session.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.tiktok.com/upload/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The challenge: TikTok changes its DOM structure frequently, so an upload selector that worked last week can break this week. I mitigated this by maintaining a cookie cache that preserves sessions between uploads, so the uploader rarely has to go through the login flow again.&lt;/p&gt;
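
&lt;p&gt;The cookie cache can be as small as a load and save pair around Playwright's storage state. This is a minimal sketch. &lt;code&gt;load_session&lt;/code&gt; and &lt;code&gt;save_session&lt;/code&gt; are hypothetical helper names, and the path reuses the one from the snippet above. The "no cookies means no session" rule is an assumption. Playwright's storage-state JSON does carry a &lt;code&gt;cookies&lt;/code&gt; list.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import os

SESSION_FILE = "CookiesDir/tiktok_session.json"

def load_session(path=SESSION_FILE):
    """Return the cached storage state, or None when no usable session exists."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    # A state without cookies cannot keep the account logged in
    return state if state.get("cookies") else None

def save_session(state, path=SESSION_FILE):
    """Persist the storage state after a successful login or upload."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With Playwright's sync API this plugs in as &lt;code&gt;browser.new_context(storage_state=load_session())&lt;/code&gt; before the upload and &lt;code&gt;save_session(context.storage_state())&lt;/code&gt; after it. When &lt;code&gt;load_session()&lt;/code&gt; returns &lt;code&gt;None&lt;/code&gt;, the context starts logged out and the login flow has to run.&lt;/p&gt;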

&lt;h3&gt;
  
  
  Queue-Based Parallel Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;task_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;task_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;task_lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;task_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;task_status&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;processing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lock prevents status update races. ThreadPoolExecutor handles parallel execution with configurable MAX_WORKERS.&lt;/p&gt;
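
&lt;p&gt;Put together, the queue, the lock and the executor can be exercised end to end with the standard library alone. This is a minimal sketch under stated assumptions. The task fields, the upper-casing stand-in for real work, &lt;code&gt;MAX_WORKERS = 2&lt;/code&gt; and the &lt;code&gt;None&lt;/code&gt; sentinel are all illustrative, not the project's actual values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import queue
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2  # illustrative; the project makes this configurable

task_queue = queue.Queue()
task_status = {}
task_lock = threading.Lock()

def set_status(task_id, status):
    # Every write to the shared dict goes through the lock
    with task_lock:
        task_status[task_id] = {"status": status}

def process_task(task):
    set_status(task["id"], "processing")
    try:
        task["result"] = task["payload"].upper()  # stand-in for render/upload work
        set_status(task["id"], "done")
    except Exception:
        set_status(task["id"], "failed")
        raise

def queue_worker(executor):
    """Feed queued tasks to the pool until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            return
        future = executor.submit(process_task, task)
        future.add_done_callback(lambda _: task_queue.task_done())

def run(payloads):
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        worker = threading.Thread(target=queue_worker, args=(executor,), daemon=True)
        worker.start()
        for task_id, payload in enumerate(payloads):
            task_queue.put({"id": task_id, "payload": payload})
        task_queue.put(None)
        task_queue.join()  # returns once every task and the sentinel are marked done
        worker.join()
    with task_lock:
        return dict(task_status)

print(run(["intro", "outro", "teaser"]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Marking a task done from the future's callback, rather than right after &lt;code&gt;submit&lt;/code&gt;, is what lets &lt;code&gt;task_queue.join()&lt;/code&gt; wait for the work itself and not just for the hand-off to the pool.&lt;/p&gt;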

&lt;h3&gt;
  
  
  Cross-Platform Startup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python_cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linux&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;executable&lt;/span&gt;
&lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Popen&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;python_cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;backend_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;frontend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Popen&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;frontend_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Graceful shutdown with signal handlers ensures both processes clean up properly.&lt;/p&gt;
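
&lt;p&gt;One way to implement that shutdown with the standard library uses a single cleanup function that terminates both children. If one does not exit in time, it escalates to kill. The cleanup is registered for SIGINT and SIGTERM. This is a sketch: the helper names and the 5-second grace period are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import signal
import subprocess
import sys

children = []  # Popen objects, e.g. the backend and frontend processes

def cleanup(timeout=5):
    """Ask every child to exit, then force-kill any that ignore the request."""
    for proc in children:
        if proc.poll() is None:
            proc.terminate()
    for proc in children:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()

def handle_signal(signum, frame):
    cleanup()
    sys.exit(0)

def install_signal_handlers():
    # signal.signal must be called from the main thread
    signal.signal(signal.SIGINT, handle_signal)
    signal.signal(signal.SIGTERM, handle_signal)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After starting the two processes, the launcher would append &lt;code&gt;backend&lt;/code&gt; and &lt;code&gt;frontend&lt;/code&gt; to &lt;code&gt;children&lt;/code&gt; and call &lt;code&gt;install_signal_handlers()&lt;/code&gt;.&lt;/p&gt;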

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Redis queue&lt;/strong&gt; instead of in-memory for persistence across restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenancy&lt;/strong&gt; — currently single-tenant, needs user isolation for SaaS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better error recovery&lt;/strong&gt; — cleanup works but task states could be more descriptive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct API upload&lt;/strong&gt; — TikTok Business API would be more reliable than Playwright&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flask + ThreadPoolExecutor is capable&lt;/strong&gt; for queue-based workloads. You don't always need Celery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright + session cookies works&lt;/strong&gt; for platforms without APIs, but plan for DOM changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Four-platform deployment&lt;/strong&gt; (Flask API + Tauri Desktop + React Native iOS + Android) from one codebase is viable with shared API contracts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video rendering is the bottleneck&lt;/strong&gt; — FFMPEG is CPU-bound. GPU acceleration would be a major upgrade&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Python, Flask, Flask-Limiter, ThreadPoolExecutor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Desktop:&lt;/strong&gt; Vite, React, Tauri (Rust)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile:&lt;/strong&gt; React Native (Expo)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI:&lt;/strong&gt; LLM script generation, Kokoro TTS, trend analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation:&lt;/strong&gt; Playwright&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rendering:&lt;/strong&gt; FFMPEG, MoviePy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sync:&lt;/strong&gt; Appwrite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Docker&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comments / Discussion
&lt;/h2&gt;

&lt;p&gt;Have you built an automated content pipeline? What's your approach to handling platform-specific upload APIs versus browser automation? I'd love to hear how others solve the "last mile" problem.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>python</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Built a Desktop AI Assistant That Controls Your Computer — Here's How</title>
      <dc:creator>Mohamed Heni</dc:creator>
      <pubDate>Mon, 25 May 2026 09:30:57 +0000</pubDate>
      <link>https://dev.to/henimohamed/i-built-a-desktop-ai-assistant-that-controls-your-computer-heres-how-58l1</link>
      <guid>https://dev.to/henimohamed/i-built-a-desktop-ai-assistant-that-controls-your-computer-heres-how-58l1</guid>
      <description>&lt;p&gt;TL;DR: I built Yaldabaoth — a desktop AI assistant that doesn't just answer questions. It reads your screen, runs PowerShell commands, clicks buttons, types text, and automates your entire workflow. No cloud dependency. No API calls for automation. Just Python, Rust, and raw OS control.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;In 2025, "AI assistants" meant one of two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chatbots&lt;/strong&gt; — A text box where you type and an LLM talks back&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API wrappers&lt;/strong&gt; — Tools that chain API calls together but have zero ability to touch the actual operating system&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Neither of these actually helps you &lt;em&gt;do&lt;/em&gt; things on your computer. Want to open an application, take a screenshot, parse a PDF, run a PowerShell script, and compile a report? Good luck chaining that through a chat interface.&lt;/p&gt;

&lt;p&gt;I wanted something different. An assistant that sits on your desktop, sees what you see, and acts on your behalf — like having an engineer sitting next to you who can operate any part of the system.&lt;/p&gt;

&lt;p&gt;So I built Yaldabaoth.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Yaldabaoth is a &lt;strong&gt;four-layer system&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: The Shell — Tauri + React
&lt;/h3&gt;

&lt;p&gt;Instead of Electron (which would have added 150+ MB to the binary), I used &lt;strong&gt;Tauri&lt;/strong&gt; — a Rust-based framework that wraps a webview frontend in a native shell. The UI is React with a glassmorphism design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Tauri:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Binary size: ~5 MB vs Electron's ~150 MB&lt;/li&gt;
&lt;li&gt;Native performance for intensive operations&lt;/li&gt;
&lt;li&gt;Direct Rust system access when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 2: The Orchestrator — Python Backend
&lt;/h3&gt;

&lt;p&gt;The Rust shell communicates with a Python backend that handles all the heavy lifting. The two exchange JSON messages over stdin/stdout, which is lightweight and needs no HTTP server.&lt;/p&gt;

&lt;p&gt;The orchestrator manages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Command routing (voice/text → appropriate handler)&lt;/li&gt;
&lt;li&gt;State persistence between commands&lt;/li&gt;
&lt;li&gt;Multi-step task chaining&lt;/li&gt;
&lt;li&gt;Personality profile switching (professional vs. creative modes)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 3: The Automation Engine — Win32 API + PowerShell
&lt;/h3&gt;

&lt;p&gt;This is where the magic happens. The Python backend has direct access to the Windows OS through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;pywinauto&lt;/strong&gt; — Native Win32 API control for clicking, typing, window management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PowerShell subprocess&lt;/strong&gt; — OS-level commands (service control, registry edits, file operations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WMI&lt;/strong&gt; — System information queries (processes, hardware, network)&lt;/li&gt;
&lt;/ul&gt;
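
&lt;p&gt;For the PowerShell path, a subprocess call with captured output is enough. This is a minimal sketch that assumes Windows PowerShell is on PATH as &lt;code&gt;powershell&lt;/code&gt;. The helper names and the 30-second timeout are hypothetical. &lt;code&gt;-NoProfile&lt;/code&gt;, &lt;code&gt;-NonInteractive&lt;/code&gt; and &lt;code&gt;-Command&lt;/code&gt; are standard &lt;code&gt;powershell.exe&lt;/code&gt; switches.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

def powershell_argv(command):
    """Build the argument list for a non-interactive PowerShell call."""
    return ["powershell", "-NoProfile", "-NonInteractive", "-Command", command]

def run_powershell(command, timeout=30):
    """Run one command and return its stdout; raise if PowerShell reports failure."""
    completed = subprocess.run(
        powershell_argv(command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a non-zero exit raises CalledProcessError with stderr attached
    )
    return completed.stdout.strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;run_powershell("Get-Service -Name Spooler")&lt;/code&gt; would return the text PowerShell prints for that service.&lt;/p&gt;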

&lt;h3&gt;
  
  
  Layer 4: The Perception System — OCR + Screen Parsing
&lt;/h3&gt;

&lt;p&gt;Screen parsing runs in a separate thread to keep the UI responsive. It uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OCR-based text extraction&lt;/strong&gt; — Screenshots → text → action decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-threaded processing&lt;/strong&gt; — One thread for screen capture + OCR, another for command execution, a third for UI responsiveness&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chained automation&lt;/strong&gt; — Click → wait for UI update → re-scan screen → next action&lt;/li&gt;
&lt;/ul&gt;
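
&lt;p&gt;The click, wait, re-scan loop can be written so that each action waits until the expected text appears in a fresh OCR pass before moving on. In this sketch, &lt;code&gt;scan_screen&lt;/code&gt; stands in for the screenshot plus OCR step, and the steps are (action, expected_text) pairs. The polling limits are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

def run_chain(steps, scan_screen, max_polls=10, poll_interval=0.2):
    """Run (action, expected_text) steps; after each action, re-scan until the text appears."""
    for action, expected_text in steps:
        action()
        for _ in range(max_polls):
            if expected_text in scan_screen():
                break  # the UI has updated; move on to the next step
            time.sleep(poll_interval)
        else:
            raise TimeoutError(f"'{expected_text}' never appeared after {action.__name__}")

# Toy usage: a list of strings plays the role of the OCR'd screen
screen = ["Desktop"]

def open_settings():
    screen.append("Settings")

def open_display():
    screen.append("Display")

run_chain([(open_settings, "Settings"), (open_display, "Display")],
          scan_screen=lambda: " ".join(screen), poll_interval=0.0)
print(screen)  # ['Desktop', 'Settings', 'Display']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;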

&lt;h2&gt;
  
  
  The Hard Parts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Threading Nightmares
&lt;/h3&gt;

&lt;p&gt;Screen parsing is slow. OCR a screenshot, parse the text, decide what to do — you're looking at 500ms to 2 seconds per cycle. If you do this on the main thread, your entire app freezes.&lt;/p&gt;

&lt;p&gt;The solution was a &lt;strong&gt;producer-consumer architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Thread 1&lt;/strong&gt;: Screen capture → OCR → queue the parsed text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thread 2&lt;/strong&gt;: Command executor — reads from queue, takes action&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thread 3&lt;/strong&gt;: Main UI thread — stays responsive
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────┐    ┌──────────┐    ┌──────────┐
│ Screen  │───&amp;gt;│ Queue     │───&amp;gt;│ Command  │
│ Capture │    │ (JSON)   │    │ Executor │
└─────────┘    └──────────┘    └──────────┘
      │                              │
      v                              v
 ┌─────────┐                   ┌──────────┐
 │ OCR     │                   │ Win32 API│
 │ Parser  │                   │/PowerShell│
 └─────────┘                   └──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python's &lt;code&gt;queue.Queue&lt;/code&gt; with &lt;code&gt;daemon=True&lt;/code&gt; threads was sufficient; there was no need for multiprocessing or async in this use case.&lt;/p&gt;
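
&lt;p&gt;Here is the same producer-consumer pattern in miniature, with the standard library's &lt;code&gt;queue.Queue&lt;/code&gt; between a producer thread and a consumer thread. This is only a sketch. &lt;code&gt;capture_and_ocr&lt;/code&gt; is a hypothetical stub for the real screenshot and OCR step, and the &lt;code&gt;None&lt;/code&gt; sentinel is one common way to stop the consumer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import queue
import threading

parsed_text = queue.Queue()
executed = []

def capture_and_ocr(frame_id):
    """Hypothetical stand-in for screenshot + OCR (500 ms to 2 s per cycle in practice)."""
    return {"frame": frame_id, "text": f"Button {frame_id}"}

def producer(frames):
    for frame_id in range(frames):
        parsed_text.put(capture_and_ocr(frame_id))
    parsed_text.put(None)  # sentinel: no more screens to parse

def consumer():
    while True:
        item = parsed_text.get()
        if item is None:
            break
        executed.append(f"click {item['text']}")  # stand-in for a Win32/PowerShell action

producer_thread = threading.Thread(target=producer, args=(3,), daemon=True)
consumer_thread = threading.Thread(target=consumer, daemon=True)
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()
print(executed)  # ['click Button 0', 'click Button 1', 'click Button 2']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;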

&lt;h3&gt;
  
  
  Voice-First vs. Text-First UX
&lt;/h3&gt;

&lt;p&gt;I wanted a &lt;strong&gt;Push-to-Talk&lt;/strong&gt; interface (F10 key) so you could speak commands naturally. But speech recognition introduces latency and errors. The compromise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice input preferred for simple commands ("Open Chrome", "Check CPU usage")&lt;/li&gt;
&lt;li&gt;Text fallback for complex multi-step sequences&lt;/li&gt;
&lt;li&gt;The orchestrator normalizes both into the same command pipeline&lt;/li&gt;
&lt;/ul&gt;
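
&lt;p&gt;Here is a sketch of that normalization step: whatever the input channel, the orchestrator hands the rest of the pipeline the same command shape. The verb-plus-target split and the field names are hypothetical simplifications. A real router would need more than splitting on the first space.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import string

def normalize_command(raw, source):
    """Turn a voice transcript or typed text into one command shape."""
    cleaned = raw.strip().rstrip(string.punctuation)  # transcripts often end with "."
    verb, _, target = cleaned.partition(" ")
    return {"action": verb.lower(), "target": target, "source": source}

print(normalize_command("Open Chrome.", source="voice"))
# {'action': 'open', 'target': 'Chrome', 'source': 'voice'}
print(normalize_command("check CPU usage", source="text"))
# {'action': 'check', 'target': 'CPU usage', 'source': 'text'}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;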

&lt;h3&gt;
  
  
  Rust ↔ Python Bridge
&lt;/h3&gt;

&lt;p&gt;Tauri apps expect Rust backends. Yaldabaoth needs Python. Bridging them without adding a web server was tricky.&lt;/p&gt;

&lt;p&gt;The solution: &lt;strong&gt;stdin/stdout JSON-RPC&lt;/strong&gt;. The Rust shell spawns the Python process and communicates via JSON messages on stdin/stdout. No sockets, no HTTP, no dependency on a running server. The Python process lives as long as the app is open.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Rust side — minimal example&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"backend/main.py"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.stdin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Stdio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;piped&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;.stdout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Stdio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;piped&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;.spawn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Send command&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;r#"{"action": "click", "target": "Chrome"}"#&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="py"&gt;.stdin&lt;/span&gt;&lt;span class="nf"&gt;.as_ref&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.write_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Read response&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;String&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="py"&gt;.stdout&lt;/span&gt;&lt;span class="nf"&gt;.as_ref&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.read_to_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
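
&lt;p&gt;The Python backend on the other end of the pipe needs a read-dispatch-reply loop. Here is a minimal sketch. It assumes one JSON command per line on stdin and one JSON reply per line on stdout. &lt;code&gt;handle_command&lt;/code&gt;, &lt;code&gt;serve&lt;/code&gt;, and the handler table are hypothetical names, and the &lt;code&gt;click&lt;/code&gt; handler is a stub rather than the project's automation code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical backend loop: one JSON command per stdin line, one JSON reply per stdout line.
import json
import sys


def handle_command(cmd):
    """Dispatch a decoded command to a handler and return a reply dict."""
    # Stub handler table; real handlers would call the automation layer.
    handlers = {
        "click": lambda c: {"status": "ok", "clicked": c.get("target")},
    }
    handler = handlers.get(cmd.get("action"))
    if handler is None:
        return {"status": "error", "error": f"unknown action: {cmd.get('action')}"}
    return handler(cmd)


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Answer newline-delimited JSON commands until stdin reaches EOF."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            reply = handle_command(json.loads(line))
        except json.JSONDecodeError as exc:
            reply = {"status": "error", "error": f"bad JSON: {exc}"}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # flush so the caller's line read returns right away
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;serve()&lt;/code&gt; from the script's entry point keeps the process alive until the Rust side closes stdin. Malformed input gets an error object back instead of crashing the loop, so one bad message cannot take down the backend.&lt;/p&gt;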



&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Desktop automation is harder than cloud automation.&lt;/strong&gt; Cloud APIs are designed to be called programmatically. Desktop UIs are designed for humans. Parsing a rendered UI and making decisions from it is fundamentally different from calling an API endpoint.&lt;/p&gt;
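
&lt;p&gt;An API hands back structured data, but a desktop agent first has to recover that structure from pixels. The sketch below illustrates one such decision: turning OCR output into a click coordinate. It is an illustration only. It assumes a hypothetical OCR result format of &lt;code&gt;(text, (left, top, width, height))&lt;/code&gt; tuples, which is not necessarily what this project's pipeline produces.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch: choose a click point from OCR output. Each result is assumed
# to be (text, (left, top, width, height)) in screen pixels.


def find_click_point(ocr_results, target):
    """Return the center (x, y) of the best OCR match for target, or None."""
    wanted = target.casefold()
    exact = [box for text, box in ocr_results if text.casefold() == wanted]
    partial = [box for text, box in ocr_results if wanted in text.casefold()]
    candidates = exact or partial  # prefer exact matches over substring matches
    if not candidates:
        return None  # label not on screen; the caller decides whether to scroll or give up
    # Several matches are ambiguous; take the top-most, then left-most, box.
    left, top, width, height = min(candidates, key=lambda box: (box[1], box[0]))
    return (left + width // 2, top + height // 2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Even this toy version makes choices an API never forces on you: what counts as a match, and which of several identical labels to pick.&lt;/p&gt;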

&lt;p&gt;&lt;strong&gt;2. Threading early, threading often.&lt;/strong&gt; I rebuilt the threading model three times. The first version was single-threaded and froze constantly. The second was over-engineered with multiprocessing. The third, simple Queue-based threading, was just right.&lt;/p&gt;
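
&lt;p&gt;Here is a hedged sketch of the Queue-based pattern, not the project's actual code. One worker thread drains a command queue, while the UI thread only ever puts commands in and polls for results. &lt;code&gt;start_worker&lt;/code&gt; and &lt;code&gt;handle&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch of Queue-based threading (hypothetical names): the UI thread only
# puts commands on a queue, and a single worker thread runs the slow automation.
import queue
import threading


def start_worker(handle):
    """Start a daemon thread that applies handle() to each queued command.

    Returns (commands, results, thread). Put None on commands to stop the worker.
    """
    commands = queue.Queue()
    results = queue.Queue()

    def run():
        while True:
            cmd = commands.get()  # blocks this worker thread, never the UI thread
            if cmd is None:
                break  # shutdown sentinel
            try:
                results.put(("ok", handle(cmd)))
            except Exception as exc:  # report the failure instead of killing the worker
                results.put(("error", str(exc)))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return commands, results, thread
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On the UI side, a timer callback can drain results with &lt;code&gt;results.get_nowait()&lt;/code&gt; and catch &lt;code&gt;queue.Empty&lt;/code&gt;, so the interface never waits on automation. A single worker also serializes commands, which keeps two actions from driving the mouse and keyboard at the same time.&lt;/p&gt;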

&lt;p&gt;&lt;strong&gt;3. Computer use was already possible in 2025.&lt;/strong&gt; Before "computer use" became a buzzword in 2026, a Python script + OCR + Win32 API was all you needed. The novelty isn't the technology — it's wiring it together with a voice-first, responsive UI that feels like an assistant, not a script.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Shell&lt;/td&gt;
&lt;td&gt;Tauri (Rust + WebView2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automation&lt;/td&gt;
&lt;td&gt;pywinauto, WMI, PowerShell&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice&lt;/td&gt;
&lt;td&gt;Push-to-Talk (F10)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screen parsing&lt;/td&gt;
&lt;td&gt;OCR + multi-threaded pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary size&lt;/td&gt;
&lt;td&gt;~5 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform support&lt;/strong&gt; — Currently Windows-only due to Win32 API dependency. Linux adaptation via X11/Wayland is on the roadmap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better screen parsing&lt;/strong&gt; — Using vision models directly instead of OCR for richer UI understanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin system&lt;/strong&gt; — Let users write custom automation modules.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;The full source code is on GitHub: &lt;a href="https://github.com/HENI-MOHAMED/Yaldabaoth" rel="noopener noreferrer"&gt;github.com/HENI-MOHAMED/Yaldabaoth&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Tauri, React, Python, Rust, and more coffee than I'd like to admit.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>tauri</category>
    </item>
    <item>
      <title>Building a Multi-Agent Tax Audit System with LangGraph and Odoo</title>
      <dc:creator>Mohamed Heni</dc:creator>
      <pubDate>Fri, 22 May 2026 00:34:31 +0000</pubDate>
      <link>https://dev.to/henimohamed/building-a-multi-agent-tax-audit-system-with-langgraph-and-odoo-2a06</link>
      <guid>https://dev.to/henimohamed/building-a-multi-agent-tax-audit-system-with-langgraph-and-odoo-2a06</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a multi-agent system that audits invoices, detects fiscal inconsistencies, and generates compliance reports. It integrates with Odoo ERP and standalone databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Tax auditing for small and medium businesses in Tunisia is manual, error-prone, and slow. Most accounting firms rely on Excel and manual checks. Regulations change frequently, and keeping up is a full-time job.&lt;/p&gt;

&lt;p&gt;I wanted to build something that could ingest financial data, run audit rules automatically, and produce compliance reports — without needing a team of accountants.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;The system uses a LangGraph-based multi-agent architecture:&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent 1: Data Ingestion Agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Connects to Odoo ERP or standalone SQL databases&lt;/li&gt;
&lt;li&gt;Extracts invoices, ledgers, and financial statements&lt;/li&gt;
&lt;li&gt;Normalizes data into a unified schema&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Agent 2: Audit Agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Applies tax rules against the data&lt;/li&gt;
&lt;li&gt;Detects anomalies: missing invoices, misclassified expenses, VAT discrepancies&lt;/li&gt;
&lt;li&gt;Flags items for human review&lt;/li&gt;
&lt;/ul&gt;
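
&lt;p&gt;Two of these checks can be sketched without any AI at all:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recomputing VAT from the taxable base&lt;/li&gt;
&lt;li&gt;looking for gaps in invoice numbering as a signal of missing invoices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a hedged illustration rather than the project's rules engine. Several details are assumptions to be checked against the applicable regulation: the field names (&lt;code&gt;number&lt;/code&gt;, &lt;code&gt;base&lt;/code&gt;, &lt;code&gt;vat&lt;/code&gt;), the 19% default rate, and half-up rounding to the cent.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical audit checks. Field names, the 19% default rate and half-up
# rounding to the cent are illustrative assumptions, not the project's rules.
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def vat_discrepancies(invoices, rate=Decimal("0.19")):
    """Flag invoices whose recorded VAT differs from base * rate rounded to the cent."""
    flagged = []
    for inv in invoices:
        expected = (inv["base"] * rate).quantize(CENT, rounding=ROUND_HALF_UP)
        if expected != inv["vat"]:
            flagged.append({"number": inv["number"], "expected": expected, "recorded": inv["vat"]})
    return flagged


def missing_invoice_numbers(numbers):
    """Return the gaps in a sequential numbering, e.g. [1, 2, 4] gives [3]."""
    if not numbers:
        return []
    present = set(numbers)
    return sorted(set(range(min(present), max(present) + 1)) - present)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using &lt;code&gt;Decimal&lt;/code&gt; with explicit rounding avoids binary floating-point drift on money amounts. That drift would make an exact comparison unreliable; with &lt;code&gt;Decimal&lt;/code&gt;, comparing after rounding is meaningful.&lt;/p&gt;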

&lt;h3&gt;
  
  
  Agent 3: Reporting Agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Generates compliance reports&lt;/li&gt;
&lt;li&gt;Produces a summary of findings with risk levels&lt;/li&gt;
&lt;li&gt;Suggests corrective actions&lt;/li&gt;
&lt;/ul&gt;
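
&lt;p&gt;The structured part of such a report is simple to sketch. The sketch assumes each finding carries hypothetical &lt;code&gt;risk&lt;/code&gt;, &lt;code&gt;issue&lt;/code&gt;, and &lt;code&gt;action&lt;/code&gt; fields on an assumed three-level scale. A report skeleton could then count findings per level and list them highest risk first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical report skeleton: counts findings per risk level and lists them
# highest risk first. The three-level scale and field names are assumptions.
from collections import Counter

RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


def summarize(findings):
    """Return report lines: a per-level count, then one line per finding."""
    if not findings:
        return ["No findings."]
    counts = Counter(item["risk"] for item in findings)
    parts = [f"{counts[level]} {level}" for level in RISK_ORDER if counts[level]]
    lines = ["Findings: " + ", ".join(parts)]
    for item in sorted(findings, key=lambda item: RISK_ORDER[item["risk"]]):
        lines.append(f"[{item['risk'].upper()}] {item['issue']} | suggested: {item['action']}")
    return lines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;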

&lt;h3&gt;
  
  
  Orchestrator
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;LangGraph manages the flow between agents&lt;/li&gt;
&lt;li&gt;Handles state, retries, and error recovery&lt;/li&gt;
&lt;/ul&gt;
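
&lt;p&gt;In LangGraph, each node receives the current state and returns an update that the graph merges into it. The sketch below is not LangGraph code. It is a dependency-free illustration of the same control flow, assuming hypothetical node functions, with a simple bounded-retry policy as one possible form of error recovery.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Dependency-free sketch of the orchestration idea (not LangGraph's API):
# nodes run in order over a shared state dict, and a failing node is retried.
import time


def run_pipeline(nodes, state, max_attempts=3, backoff_s=0.0):
    """Run (name, fn) nodes in order; each fn takes the state and returns updates.

    A node that raises is retried up to max_attempts times before the error
    is recorded in state["errors"] and the pipeline stops.
    """
    state = dict(state, errors=[])
    for name, fn in nodes:
        for attempt in range(1, max_attempts + 1):
            try:
                state.update(fn(state))  # merge the node's partial update into the state
                break
            except Exception as exc:
                if attempt == max_attempts:
                    state["errors"].append(f"{name}: {exc}")
                    return state  # stop: later nodes depend on this node's output
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    return state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Stopping at the first node that exhausts its retries is a deliberate choice here. The audit step has nothing to work on without ingested data, so continuing would only produce empty reports.&lt;/p&gt;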

&lt;h2&gt;
  
  
  Technical Challenges
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Schema Mismatch:&lt;/strong&gt; Every Odoo instance is customized differently. The ingestion agent had to handle dynamic schemas — detecting table structures at runtime and mapping them to a canonical audit model.&lt;/p&gt;
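
&lt;p&gt;Here is a hedged sketch of that runtime detection. It uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in for the real database. Odoo itself runs on PostgreSQL, where &lt;code&gt;information_schema.columns&lt;/code&gt; would play the role of SQLite's &lt;code&gt;PRAGMA table_info&lt;/code&gt;. The alias table and column names are illustrative assumptions, not a description of any particular Odoo instance.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch of runtime schema detection. sqlite3 stands in for the real
# database (Odoo itself runs on PostgreSQL); all table and column names are illustrative.
import sqlite3

# Each canonical audit field, with the source column names it might appear under.
ALIASES = {
    "number": ["name", "invoice_number", "ref"],
    "date": ["invoice_date", "date"],
    "base": ["amount_untaxed", "base_amount"],
    "vat": ["amount_tax", "vat_amount"],
}


def detect_mapping(conn, table):
    """Read the table's columns at runtime and map each canonical field to one of them."""
    # Identifiers are interpolated because SQL placeholders cannot bind table or
    # column names; only pass trusted names, never user input.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    mapping = {}
    for field, candidates in ALIASES.items():
        match = next((c for c in candidates if c in columns), None)
        if match is None:
            raise ValueError(f"no column for {field!r} in table {table!r}")
        mapping[field] = match
    return mapping


def fetch_canonical(conn, table):
    """Return every row as a dict keyed by canonical field names."""
    mapping = detect_mapping(conn, table)
    select = ", ".join(f'"{col}" AS "{field}"' for field, col in mapping.items())
    cursor = conn.execute(f'SELECT {select} FROM "{table}"')
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, row)) for row in cursor]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When a canonical field has no matching column, the sketch fails loudly rather than guessing. For an audit, a missing VAT column should stop the run instead of producing a report that silently skips VAT checks.&lt;/p&gt;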

&lt;p&gt;&lt;strong&gt;Multi-Agent Coordination:&lt;/strong&gt; Getting three agents to work together without stepping on each other's state was the hardest part. LangGraph's checkpointing was essential here.&lt;/p&gt;
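
&lt;p&gt;LangGraph's checkpointers save the graph state at each step under a thread ID, so an interrupted run can resume instead of starting over. The following is a hedged, dependency-free analogy that shows the idea without the library. &lt;code&gt;CheckpointStore&lt;/code&gt; and &lt;code&gt;run_with_checkpoints&lt;/code&gt; are hypothetical names, not LangGraph's checkpointer API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Dependency-free analogy of checkpoint-and-resume; hypothetical names, not
# LangGraph's checkpointer API.
import copy


class CheckpointStore:
    """Keeps, per run id, the index of the next node and a snapshot of the state."""

    def __init__(self):
        self._saved = {}

    def save(self, run_id, next_index, state):
        # Store a deep copy so later in-place changes cannot alter the checkpoint.
        self._saved[run_id] = (next_index, copy.deepcopy(state))

    def load(self, run_id):
        next_index, state = self._saved.get(run_id, (0, None))
        return next_index, copy.deepcopy(state)


def run_with_checkpoints(nodes, run_id, store, initial_state):
    """Run nodes in order, resuming after the last node that finished for run_id."""
    start, state = store.load(run_id)
    if state is None:
        state = dict(initial_state)  # first run: nothing saved yet
    for i in range(start, len(nodes)):
        state.update(nodes[i](state))  # merge the node's partial update
        store.save(run_id, i + 1, state)  # checkpoint after every completed node
    return state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Snapshots are deep copies on both save and load. As a result, a node that mutates the state in place cannot corrupt an earlier checkpoint.&lt;/p&gt;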

&lt;p&gt;&lt;strong&gt;Regional Tax Rules:&lt;/strong&gt; Tunisian tax law isn't well-documented in English. Building the rules engine meant working directly with Arabic and French regulatory texts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Real-time invoice validation&lt;/li&gt;
&lt;li&gt;Multi-company support&lt;/li&gt;
&lt;li&gt;A dashboard for non-technical accountants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/HENI-MOHAMED/Audit-Agent" rel="noopener noreferrer"&gt;github.com/HENI-MOHAMED/Audit-Agent&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Python, FastAPI, LangGraph, Odoo, and a lot of coffee.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>langgraph</category>
      <category>odoo</category>
    </item>
  </channel>
</rss>
