<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: zkaria gamal</title>
    <description>The latest articles on DEV Community by zkaria gamal (@zkaria_gamal_3cddbbff21c8).</description>
    <link>https://dev.to/zkaria_gamal_3cddbbff21c8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3769631%2F3d68bd01-7a2c-4665-9e8b-5c879b3811e5.jpg</url>
      <title>DEV Community: zkaria gamal</title>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zkaria_gamal_3cddbbff21c8"/>
    <language>en</language>
    <item>
      <title>The auth problem nobody talks about when running AI microservices locally</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Thu, 18 Jun 2026 17:24:23 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/the-auth-problem-nobody-talks-about-when-running-ai-microservices-locally-2g7j</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/the-auth-problem-nobody-talks-about-when-running-ai-microservices-locally-2g7j</guid>
      <description>&lt;p&gt;Most voice AI tutorials assume you have an API key and a cloud endpoint. Mine had to run on the machine in front of me — 4GB GPU, no cloud, no managed auth layer.&lt;/p&gt;

&lt;p&gt;That constraint forced me to solve a problem I hadn't seen written about anywhere: how do you authenticate requests between two local Python processes without a database, a shared secret sitting in your repo, or a session manager?&lt;/p&gt;

&lt;p&gt;This is the story of what I built and the auth protocol I had to design from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;AI-RTC-Agent is a fully local real-time voice agent. The architecture has four isolated layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;React client&lt;/strong&gt; — captures mic audio, streams it over WebRTC using native &lt;code&gt;RTCPeerConnection&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python WebRTC server&lt;/strong&gt; — receives 48kHz PCM frames, runs VAD, segments utterances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastMCP server&lt;/strong&gt; — runs Whisper small for STT, plus email, calendar, and search tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent layer&lt;/strong&gt; — LLM intent routing with adapters for OpenAI, Gemini, and local Ollama&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data flow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser mic → WebRTC (48kHz PCM) → VAD segmentation → FastMCP (Whisper STT) → transcript back over WebRTC DataChannel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No HTTP round-trip on the return path. The transcript is pushed directly over a WebRTC DataChannel, which keeps latency tight.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Audio Pipeline
&lt;/h2&gt;

&lt;p&gt;Before we get to auth, the VAD pipeline is worth explaining because it drives the whole segmentation design.&lt;/p&gt;

&lt;p&gt;The browser streams 48kHz mono PCM. The server runs &lt;code&gt;webrtcvad&lt;/code&gt; at 16kHz (the library accepts 8, 16, 32, or 48kHz input), so every incoming frame is decimated in lockstep. The decimated copy is only used to decide where speech starts and stops, and transcription works from the untouched original audio. So the system maintains two separate buffers from the same stream:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;16kHz buffer&lt;/strong&gt; evaluated by &lt;code&gt;webrtcvad&lt;/code&gt; on a 300ms sliding window at aggressiveness mode 3 (the strictest setting for filtering out non-speech)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;raw 48kHz buffer&lt;/strong&gt; that accumulates the actual speech frames for Whisper&lt;/li&gt;
&lt;/ul&gt;
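&lt;p&gt;A dependency-free sketch of the two-buffer idea follows. It is an illustration, not the project's code. It assumes 20ms frames of 16-bit mono PCM (webrtcvad accepts 10, 20, or 30ms frames). It decimates by averaging each group of three samples, which is a crude filter; a production pipeline would use a proper low-pass filter such as &lt;code&gt;scipy.signal.resample_poly&lt;/code&gt;. The names &lt;code&gt;decimate_48k_to_16k&lt;/code&gt; and &lt;code&gt;DualBuffer&lt;/code&gt; are hypothetical, and the VAD call itself is left to the caller.&lt;/p&gt;

```python
from array import array

SAMPLES_PER_FRAME_48K = 960  # 20 ms at 48 kHz (assumed frame size)


def decimate_48k_to_16k(frame_48k):
    """Crude 3:1 decimation: average each group of three int16 samples.

    A production pipeline would low-pass filter first to avoid aliasing.
    """
    samples = array("h", frame_48k)
    out = array("h")
    for i in range(0, len(samples) - 2, 3):
        out.append(sum(samples[i:i + 3]) // 3)
    return out.tobytes()


class DualBuffer:
    """Keeps the raw 48 kHz audio for Whisper and a 16 kHz copy for the VAD."""

    def __init__(self):
        self.raw_48k = bytearray()  # what eventually gets transcribed
        self.vad_16k = bytearray()  # what webrtcvad inspects

    def push(self, frame_48k):
        frame_16k = decimate_48k_to_16k(frame_48k)
        self.raw_48k.extend(frame_48k)
        self.vad_16k.extend(frame_16k)
        # The caller hands frame_16k to the VAD, e.g. webrtcvad.Vad(3).is_speech(frame_16k, 16000)
        return frame_16k
```

&lt;p&gt;A 20ms frame at 48kHz (1920 bytes) becomes a 20ms frame at 16kHz (640 bytes), which is a valid frame length for webrtcvad.&lt;/p&gt;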

&lt;p&gt;When the VAD reports 2 seconds of continuous silence, the utterance is considered complete. The raw 48kHz buffer is wrapped in a WAV header, base64-encoded, and sent to the FastMCP server for transcription.&lt;/p&gt;
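&lt;p&gt;The end-of-utterance rule and the WAV/base64 step can be sketched with the standard library alone. This simplified version assumes one VAD verdict per 20ms frame rather than the 300ms window described above. Under that assumption, 2 seconds of silence means 100 consecutive non-speech frames. &lt;code&gt;UtteranceEndpointer&lt;/code&gt; and &lt;code&gt;pcm_to_wav_base64&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import base64
import io
import wave

FRAME_MS = 20      # assumed frame size (webrtcvad accepts 10, 20 or 30 ms)
SILENCE_MS = 2000  # end-of-utterance threshold from the article


class UtteranceEndpointer:
    """Collects raw 48 kHz frames and closes the utterance after SILENCE_MS of silence."""

    def __init__(self, frame_ms=FRAME_MS, silence_ms=SILENCE_MS):
        self.frames_needed = silence_ms // frame_ms  # 100 frames of 20 ms
        self.raw = bytearray()
        self.silent_frames = 0
        self.heard_speech = False

    def push(self, frame_48k, is_speech):
        """Feed one raw frame plus the VAD verdict for it; return the utterance when complete."""
        if is_speech:
            self.heard_speech = True
            self.silent_frames = 0
        elif self.heard_speech:
            self.silent_frames += 1
        if not self.heard_speech:
            return None  # still in leading silence: nothing to keep
        self.raw.extend(frame_48k)
        if self.silent_frames != self.frames_needed:
            return None
        utterance = bytes(self.raw)
        self.raw.clear()
        self.silent_frames = 0
        self.heard_speech = False
        return utterance


def pcm_to_wav_base64(pcm, sample_rate=48000):
    """Wrap 16-bit mono PCM in a WAV header and base64-encode it for an HTTP payload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return base64.b64encode(buf.getvalue()).decode("ascii")
```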

&lt;p&gt;Whisper is preloaded as a module-level singleton at server boot via a &lt;code&gt;LoadModelService&lt;/code&gt; — so there's no cold-start penalty on the first utterance.&lt;/p&gt;
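&lt;p&gt;The preload pattern itself is small. In this sketch the class name comes from the project, but its body is an assumption. &lt;code&gt;_load_whisper&lt;/code&gt; is a stand-in so the snippet runs without a model; with the &lt;code&gt;openai-whisper&lt;/code&gt; package it would return &lt;code&gt;whisper.load_model("small")&lt;/code&gt;. Calling &lt;code&gt;get()&lt;/code&gt; once at import time moves the load cost from the first utterance to server boot.&lt;/p&gt;

```python
import threading


def _load_whisper():
    """Stand-in for the expensive load; the real call would be whisper.load_model("small")."""
    return object()


class LoadModelService:
    """Hypothetical body: load the STT model once and share it across the process."""

    _model = None
    _lock = threading.Lock()

    @classmethod
    def get(cls):
        with cls._lock:  # guard against two threads loading concurrently
            if cls._model is None:
                cls._model = _load_whisper()
            return cls._model


# Module-level singleton: runs once when the server imports this module at boot.
WHISPER_MODEL = LoadModelService.get()
```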




&lt;h2&gt;
  
  
  The Auth Problem
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting.&lt;/p&gt;

&lt;p&gt;The WebRTC server and the FastMCP server are two separate processes communicating over localhost HTTP. In production you'd put a reverse proxy in front, use mTLS, or drop a secret into a secrets manager. But this is a local developer workspace — no infrastructure, no ops, no database.&lt;/p&gt;

&lt;p&gt;The naive solution is a static API key in &lt;code&gt;.env&lt;/code&gt;. The problem: static keys sit in config files, get committed to repos, and never rotate. Even locally, it's a bad habit to build into an open-source blueprint.&lt;/p&gt;

&lt;p&gt;I needed something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Required &lt;strong&gt;zero database&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Left &lt;strong&gt;zero static credentials&lt;/strong&gt; in the source files&lt;/li&gt;
&lt;li&gt;Was &lt;strong&gt;stateless&lt;/strong&gt; — no session syncing between processes&lt;/li&gt;
&lt;li&gt;Was &lt;strong&gt;time-limited&lt;/strong&gt; — a captured key shouldn't stay valid for more than a few seconds&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Solution: Deterministic Timestamp-Based Auth
&lt;/h2&gt;

&lt;p&gt;Both processes independently run the same algorithm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;api_key_generator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expire_time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expire_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;expire_time&lt;/span&gt;  &lt;span class="c1"&gt;# 5-second sliding epoch window
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;now_utc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ZoneInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UTC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now_utc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expire_time&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_api_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_value&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_suffix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



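&lt;p&gt;Note that &lt;code&gt;timestamp&lt;/code&gt; in this code is not a raw Unix time but a window counter. &lt;code&gt;create_value()&lt;/code&gt; integer-divides the Unix time by &lt;code&gt;expire_time&lt;/code&gt;, so the value only changes once every 5 seconds. The prefix and suffix are deterministic functions of that counter (built on &lt;code&gt;math.sqrt(math.log10(timestamp))&lt;/code&gt;), so both sides can independently compute the expected key for any given 5-second window.&lt;/p&gt;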

&lt;p&gt;&lt;strong&gt;How validation works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The WebRTC server generates a key, attaches it as the &lt;code&gt;X-API-Key&lt;/code&gt; header, and sends the audio payload&lt;/li&gt;
&lt;li&gt;The FastMCP middleware intercepts the request, extracts the header, and parses the embedded timestamp&lt;/li&gt;
&lt;li&gt;The middleware independently generates the expected key for that timestamp window&lt;/li&gt;
&lt;li&gt;If the strings match and the timestamp is within one grace interval (5 seconds), the request is authenticated&lt;/li&gt;
&lt;li&gt;A key replayed outside that window is rejected&lt;/li&gt;
&lt;/ol&gt;
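&lt;p&gt;One caveat is worth spelling out. Because the prefix and suffix depend only on the timestamp, anyone who has read the (MIT-licensed, public) source can compute a valid key for the current window. There is a common way to keep the same stateless, database-free shape while closing that gap. Mix in a random secret generated at launch, and use an HMAC instead of the custom prefix/suffix math, similar in spirit to TOTP (RFC 6238).&lt;/p&gt;

&lt;p&gt;The sketch below swaps in that technique; it is not the project's implementation. It assumes both services are started by one launcher script that hands them the secret through an environment variable. &lt;code&gt;RTC_AGENT_SECRET&lt;/code&gt; is a hypothetical name, and all function names are illustrative.&lt;/p&gt;

```python
import hashlib
import hmac
import os
import secrets
import time

WINDOW_SECONDS = 5  # same 5-second window as the article
SECRET_ENV = "RTC_AGENT_SECRET"  # hypothetical variable name


def window_index(now=None):
    """Counter that increments once per WINDOW_SECONDS, like create_value()."""
    t = time.time() if now is None else now
    return int(t) // WINDOW_SECONDS


def load_or_create_secret():
    """Reuse the per-launch secret if a parent process set one, else create it.

    A launcher that calls this before spawning both services passes the same
    random value to each of them; nothing is written to disk or committed.
    """
    value = os.environ.get(SECRET_ENV)
    if value is None:
        value = secrets.token_hex(32)
        os.environ[SECRET_ENV] = value  # inherited by child processes
    return value.encode()


def make_key(secret, window):
    """Key = window counter plus HMAC-SHA256(secret, counter)."""
    mac = hmac.new(secret, str(window).encode(), hashlib.sha256).hexdigest()
    return f"{window}_{mac}"


def verify_key(secret, key, now=None):
    """Accept keys from the current window or the one just before it."""
    try:
        window_str, _ = key.split("_", 1)
        window = int(window_str)
    except ValueError:
        return False
    if window_index(now) - window not in (0, 1):
        return False  # stale, or from the future
    # Constant-time comparison on bytes (also safe for non-ASCII input)
    return hmac.compare_digest(key.encode(), make_key(secret, window).encode())
```

&lt;p&gt;A key is still replayable within its window (up to about 10 seconds with the grace interval). If that matters, the receiver can additionally remember recently seen keys for that short period.&lt;/p&gt;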

&lt;p&gt;&lt;strong&gt;What this gives you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No database, no credential storage&lt;/li&gt;
&lt;li&gt;Keys rotate every 5 seconds and are rejected once they fall outside the grace interval&lt;/li&gt;
&lt;li&gt;A captured key is useless after the window closes&lt;/li&gt;
&lt;li&gt;Both processes stay fully stateless&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The MCP Layer
&lt;/h2&gt;

&lt;p&gt;The FastMCP server is the heavy-lifting microservice. Beyond Whisper STT it exposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mail tools&lt;/strong&gt; — SMTP send/reply with thread headers (&lt;code&gt;In-Reply-To&lt;/code&gt;, &lt;code&gt;References&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calendar tools&lt;/strong&gt; — Google Calendar API with &lt;code&gt;.ics&lt;/code&gt; fallback if OAuth isn't configured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search tools&lt;/strong&gt; — DuckDuckGo with a token-bucket rate limiter (1.0 req/sec)&lt;/li&gt;
&lt;/ul&gt;
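&lt;p&gt;For reference, a token bucket at 1.0 request per second can be as small as this. The class name, the injectable &lt;code&gt;clock&lt;/code&gt; parameter (handy for tests), and the capacity of 1 are assumptions for illustration; the project's limiter may differ. With a capacity of 1, the bucket allows one search call immediately and then one per second.&lt;/p&gt;

```python
import time


class TokenBucket:
    """Refills rate tokens per second, up to capacity; each request spends one token."""

    def __init__(self, rate=1.0, capacity=1.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self):
        """Spend a token if one is available and return 0.0; otherwise return seconds to wait."""
        self._refill()
        wait = max(0.0, (1.0 - self.tokens) / self.rate)
        if wait == 0.0:
            self.tokens -= 1.0
        return wait

    def acquire(self):
        """Block until a token is available, then spend it."""
        while True:
            wait = self.try_acquire()
            if wait == 0.0:
                return
            time.sleep(wait)


search_limiter = TokenBucket(rate=1.0, capacity=1.0)
# search_limiter.acquire() would go right before each DuckDuckGo request.
```

&lt;p&gt;In an asyncio server, &lt;code&gt;time.sleep&lt;/code&gt; in &lt;code&gt;acquire&lt;/code&gt; would be replaced with &lt;code&gt;await asyncio.sleep&lt;/code&gt; so waiting for a token does not block the event loop.&lt;/p&gt;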

&lt;p&gt;Every tool response is built by one of three shared helpers: &lt;code&gt;ok(data, message)&lt;/code&gt;, &lt;code&gt;err(message, code)&lt;/code&gt;, and &lt;code&gt;paginated(items, total)&lt;/code&gt;. Responses have the same shape across the whole layer, which keeps tests simple.&lt;/p&gt;
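&lt;p&gt;A minimal sketch of what such helpers could look like follows. Only the three function signatures come from the project. The dictionary keys (&lt;code&gt;success&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt;, &lt;code&gt;message&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;) are assumptions. The point is that every tool returns the same top-level keys, so a test can assert on the shape without caring which tool produced it.&lt;/p&gt;

```python
def ok(data=None, message="ok"):
    """Successful tool result."""
    return {"success": True, "data": data, "message": message, "error": None}


def err(message, code="error"):
    """Failed tool result with a machine-readable error code."""
    return {"success": False, "data": None, "message": message, "error": {"code": code}}


def paginated(items, total):
    """Successful result for list-returning tools (search hits, calendar events)."""
    items = list(items)
    return ok({"items": items, "count": len(items), "total": total})
```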




&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;Both the VAD server and the MCP tool layer have full pytest suites:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Test the FastMCP tools&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;mcp &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pytest tests/ &lt;span class="nt"&gt;-v&lt;/span&gt;

&lt;span class="c"&gt;# Test the WebRTC VAD server&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;server &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pytest tests/ &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP tests cover transcription accuracy, SMTP reply threading, calendar parsing, and rate limiter behavior.&lt;/p&gt;
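&lt;p&gt;To illustrate what a reply-threading test exercises, here is a hypothetical &lt;code&gt;build_reply&lt;/code&gt; helper built on Python's standard &lt;code&gt;email&lt;/code&gt; package. It is not the project's mail tool. It follows the usual convention (RFC 5322, section 3.6.4):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;In-Reply-To&lt;/code&gt; carries the parent's &lt;code&gt;Message-ID&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;References&lt;/code&gt; is the parent's &lt;code&gt;References&lt;/code&gt; chain with the parent's &lt;code&gt;Message-ID&lt;/code&gt; appended.&lt;/li&gt;
&lt;/ul&gt;

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_reply(original, body, sender):
    """Build a reply that mail clients will thread under the original message."""
    reply = EmailMessage()
    subject = str(original.get("Subject", ""))
    reply["Subject"] = subject if subject.lower().startswith("re:") else "Re: " + subject
    reply["From"] = sender
    reply["To"] = str(original.get("Reply-To") or original["From"])
    parent_id = original["Message-ID"]
    if parent_id:
        parent_id = str(parent_id)
        reply["In-Reply-To"] = parent_id
        # References = the parent's own References chain + the parent's Message-ID
        chain = str(original.get("References", ""))
        reply["References"] = (chain + " " + parent_id).strip()
    reply.set_content(body)
    return reply


# Example
parent = EmailMessage()
parent["From"] = "alice@example.com"
parent["Subject"] = "Lunch?"
parent["Message-ID"] = make_msgid(domain="example.com")
parent.set_content("Are you free?")
reply = build_reply(parent, "Yes!", "bot@example.com")
print(reply["In-Reply-To"], reply["References"])
```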




&lt;h2&gt;
  
  
  Running It Locally
&lt;/h2&gt;

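&lt;p&gt;Requirements: Python 3.10+, Node.js 18+, ffmpeg, and a GPU with 4GB of VRAM (CPU also works, just more slowly). Run each service in its own terminal, starting from the repository root:&lt;br&gt;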
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install Python deps&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# 2. Start FastMCP server&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;mcp &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; python main.py        &lt;span class="c"&gt;# localhost:8005&lt;/span&gt;

&lt;span class="c"&gt;# 3. Start WebRTC backend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;server &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; python main.py     &lt;span class="c"&gt;# localhost:8080&lt;/span&gt;

&lt;span class="c"&gt;# 4. Start React client&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;client &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm run dev   &lt;span class="c"&gt;# localhost:5173&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Speak into the mic, pause for 2 seconds, and the transcript appears in the dashboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

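&lt;p&gt;&lt;a href="https://github.com/zkzkGamal/AI-RTC-Agent" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/AI-RTC-Agent&lt;/a&gt;&lt;/p&gt;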

&lt;p&gt;MIT licensed. Issues, PRs, and questions on the auth protocol or VAD pipeline are all welcome — especially curious if anyone has seen a cleaner approach to the zero-database local auth problem.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building a real-time voice AI assistant with WebRTC and LangGraph</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Sun, 14 Jun 2026 11:44:19 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/building-a-real-time-voice-ai-assistant-with-webrtc-and-langgraph-225d</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/building-a-real-time-voice-ai-assistant-with-webrtc-and-langgraph-225d</guid>
      <description>&lt;p&gt;I recently finished building AI-RTC-Agent, an open-source real-time voice assistant workspace. It handles low-latency audio streaming, voice activity segmentation, and executes local tools (like search, email, and calendar) while maintaining a steady voice stream.&lt;/p&gt;

&lt;p&gt;Here is the GitHub repository if you want to check out the code or run it locally:&lt;br&gt;
&lt;a href="https://github.com/zkzkGamal/AI-RTC-Agent" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/AI-RTC-Agent&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;

&lt;p&gt;The project is split into four decoupled services to keep CPU-heavy tasks from blocking the audio processing loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;React Client: A Vite frontend that manages the microphone with the browser's RTCPeerConnection API and handles half-duplex turn control to prevent audio feedback.&lt;/li&gt;
&lt;li&gt;WebRTC Audio Processor: An asynchronous Python backend using aiortc and webrtcvad. It downsamples 48kHz audio to 16kHz for voice activity detection and segments user speech.&lt;/li&gt;
&lt;li&gt;FastAPI Orchestrator: Powered by LangGraph to manage intent routing and conversation state.&lt;/li&gt;
&lt;li&gt;FastMCP Server: Runs a warm-booted Whisper model locally for speech-to-text (STT) and exposes search and Google API tools.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Decoupling the WebRTC connection from the transcription and tool execution was critical. If the thread running the audio ingestion gets blocked by a transcription job or a web search, the audio stream drops frames. Offloading these to the FastMCP instance solves this.&lt;/p&gt;
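&lt;p&gt;The project gets this isolation by moving the heavy work into a separate FastMCP process. The same principle can be shown inside one process with a small, self-contained asyncio sketch (all names are hypothetical). A fake audio loop keeps ticking every 20ms while a blocking job runs via &lt;code&gt;asyncio.to_thread&lt;/code&gt;. Calling the blocking function directly inside the coroutine would instead stall the loop for the whole job.&lt;/p&gt;

```python
import asyncio
import time


def blocking_transcribe(seconds):
    """Stand-in for a heavy job such as Whisper inference or a web search."""
    time.sleep(seconds)
    return "transcript"


async def audio_loop(stop, ticks):
    """Pretend to ingest one 20 ms audio frame per iteration until told to stop."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main():
    stop = asyncio.Event()
    ticks = []
    loop_task = asyncio.create_task(audio_loop(stop, ticks))
    # Offloaded: the event loop keeps servicing audio_loop while this runs in a worker thread.
    # Calling blocking_transcribe(0.3) directly here would freeze audio_loop for 300 ms.
    text = await asyncio.to_thread(blocking_transcribe, 0.3)
    stop.set()
    await loop_task
    return text, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

&lt;p&gt;Threads only help here because &lt;code&gt;time.sleep&lt;/code&gt;, like most I/O, releases the GIL. Pure-Python CPU-bound work would still contend for it, which is one reason a separate process is the sturdier choice.&lt;/p&gt;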

&lt;h3&gt;
  
  
  Dynamic Model Switching
&lt;/h3&gt;

&lt;p&gt;You can configure the system to use different LLMs and STT models by modifying the .env file. The orchestrator supports swapping the main language model between Ollama (for running local models like Qwen), OpenAI, or Google Gemini. &lt;/p&gt;

&lt;h3&gt;
  
  
  Service-to-Service Security
&lt;/h3&gt;

&lt;p&gt;To secure communication between local microservices without the overhead of a database, I implemented a custom time-based authentication middleware. The calling service computes a short-lived token from the current Unix time, bucketed into 5-second windows. The receiving service recomputes the expected token for that window and checks it against its own clock. This keeps auth stateless, but it relies on the services' system clocks staying in sync.&lt;/p&gt;

&lt;h3&gt;
  
  
  UI Feedback
&lt;/h3&gt;

&lt;p&gt;To keep the UX responsive while tools are running, the FastAPI agent broadcasts Socket.IO events (like tool_start and tool_finished). The React frontend immediately displays indicators showing what the agent is doing (such as calling the DuckDuckGo search tool) before streaming the voice response back.&lt;/p&gt;
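&lt;p&gt;One way to get those events without touching every tool is a decorator. In this hypothetical sketch, &lt;code&gt;emit&lt;/code&gt; is any async callable that takes an event name and a payload; with python-socketio it could wrap &lt;code&gt;AsyncServer.emit&lt;/code&gt;. The payload fields are assumptions. Only the event names &lt;code&gt;tool_start&lt;/code&gt; and &lt;code&gt;tool_finished&lt;/code&gt; come from the project.&lt;/p&gt;

```python
import asyncio
import functools


def with_tool_events(emit):
    """Decorate an async tool so it announces tool_start / tool_finished via emit()."""
    def decorator(tool):
        @functools.wraps(tool)
        async def wrapper(*args, **kwargs):
            name = tool.__name__
            await emit("tool_start", {"tool": name})
            try:
                result = await tool(*args, **kwargs)
            except Exception as exc:
                await emit("tool_finished", {"tool": name, "ok": False, "error": str(exc)})
                raise
            await emit("tool_finished", {"tool": name, "ok": True})
            return result
        return wrapper
    return decorator


# Example: record events in a list instead of sending them over Socket.IO.
events = []


async def record(event, payload):
    events.append((event, payload))  # with python-socketio: await sio.emit(event, payload)


@with_tool_events(record)
async def web_search(query):
    await asyncio.sleep(0)  # stand-in for the real DuckDuckGo call
    return ["result for " + query]


print(asyncio.run(web_search("webrtc")))
print([name for name, _ in events])  # ['tool_start', 'tool_finished']
```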

&lt;p&gt;Feel free to check out the setup instructions and run start.sh to test it out. I would love to get your feedback on the architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zkzkGamal/AI-RTC-Agent" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/AI-RTC-Agent&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webrtc</category>
      <category>python</category>
      <category>mcp</category>
    </item>
    <item>
      <title>How to Convert Binary Segmentation Masks to YOLO Bounding Boxes (Python &amp; OpenCV)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Thu, 11 Jun 2026 09:35:38 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/how-to-convert-binary-segmentation-masks-to-yolo-bounding-boxes-python-opencv-34f9</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/how-to-convert-binary-segmentation-masks-to-yolo-bounding-boxes-python-opencv-34f9</guid>
      <description>&lt;p&gt;Have you ever found a perfect dataset for your object detection project, only to realize the ground truth is in the form of &lt;strong&gt;binary segmentation masks&lt;/strong&gt; (black-and-white images) instead of &lt;strong&gt;YOLO bounding boxes&lt;/strong&gt; (&lt;code&gt;.txt&lt;/code&gt; files)? &lt;/p&gt;

&lt;p&gt;If you are training a YOLO model (v5, v8, v10, v11, etc.), you need coordinates in the format:&lt;br&gt;
&lt;code&gt;&amp;lt;class_id&amp;gt; &amp;lt;x_center&amp;gt; &amp;lt;y_center&amp;gt; &amp;lt;width&amp;gt; &amp;lt;height&amp;gt;&lt;/code&gt; (normalized between 0 and 1).&lt;/p&gt;

&lt;p&gt;Converting pixel-level masks into YOLO coordinates manually is a nightmare. Luckily, you can automate this using &lt;strong&gt;OpenCV&lt;/strong&gt; and &lt;strong&gt;Python&lt;/strong&gt; in just a few lines of code.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will walk through the exact steps and math required to build a robust conversion pipeline.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Core Concept: Contours to Bounding Boxes
&lt;/h2&gt;

&lt;p&gt;To convert a binary mask into a bounding box, we need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find the boundaries of the white pixels (foreground object).&lt;/li&gt;
&lt;li&gt;Compute the minimum enclosing rectangle around those boundaries.&lt;/li&gt;
&lt;li&gt;Normalize the pixel coordinates into YOLO's standard format.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here is what the visual flow looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1aeaxjzf0dzb4iduv7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg1aeaxjzf0dzb4iduv7a.png" alt="Mask to Bounding Box Visual" width="640" height="640"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;(Derived from the ISIC skin lesion dataset: left is the mask region, right is the calculated bounding box).&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Extract the Contours using OpenCV
&lt;/h2&gt;

&lt;p&gt;OpenCV provides a powerful function called &lt;code&gt;cv2.findContours&lt;/code&gt; that detects boundaries of binary shapes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Load mask in grayscale
&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path_to_mask.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IMREAD_GRAYSCALE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Find all external boundaries
&lt;/span&gt;&lt;span class="n"&gt;contours&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findContours&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RETR_EXTERNAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CHAIN_APPROX_SIMPLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;contours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No objects found in the mask!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Select the largest contour (assuming a single primary object)
&lt;/span&gt;    &lt;span class="n"&gt;largest_contour&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contours&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contourArea&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
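&lt;p&gt;&lt;code&gt;cv2.findContours&lt;/code&gt; treats every non-zero pixel as foreground. Masks saved as JPEG or resized with interpolation leave grey edge pixels, so they are worth binarizing first. In OpenCV that is &lt;code&gt;_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The NumPy-only sketch below does the same thresholding. It also computes a box around all foreground pixels, which is handy as a sanity check on the contour result. Both helper names are hypothetical. Unlike the largest-contour approach, this box covers every blob in the mask.&lt;/p&gt;

```python
import numpy as np


def binarize(mask, threshold=127):
    """Force a grayscale mask to strict 0/255, like cv2.THRESH_BINARY."""
    return np.where(np.greater(mask, threshold), 255, 0).astype(np.uint8)


def foreground_bbox(mask):
    """Axis-aligned box (x_min, y_min, w, h) around ALL foreground pixels, or None if empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    x_min, x_max = int(xs.min()), int(xs.max())
    y_min, y_max = int(ys.min()), int(ys.max())
    # +1 because both ends are inclusive pixel indices, matching cv2.boundingRect sizes
    return x_min, y_min, x_max - x_min + 1, y_max - y_min + 1
```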



&lt;h2&gt;
  
  
  Step 2: Calculate Bounding Box Coordinates
&lt;/h2&gt;

&lt;p&gt;Once we have the contour points, we can compute two types of bounding boxes:&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Standard Axis-Aligned Bounding Box
&lt;/h3&gt;

&lt;p&gt;This is the standard box used by most YOLO models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# x, y are top-left coordinates; w, h are width and height in pixels
&lt;/span&gt;&lt;span class="n"&gt;x_min&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_min&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w_pixel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h_pixel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boundingRect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;largest_contour&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate center coordinates in pixel space
&lt;/span&gt;&lt;span class="n"&gt;x_center&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_min&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w_pixel&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_center&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_min&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h_pixel&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option B: Rotated Bounding Box (Minimum Area)
&lt;/h3&gt;

&lt;p&gt;If your object is tilted or elongated and you want to use oriented/rotated bounding boxes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Returns ((x_center, y_center), (width, height), angle_of_rotation)
&lt;/span&gt;&lt;span class="n"&gt;rect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minAreaRect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;largest_contour&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;box_points&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boxPoints&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Coordinates of the 4 corners
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
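&lt;p&gt;To turn those four corners into a label, Ultralytics' oriented-box (OBB) format lists the class followed by the four corner points, each normalized by the image size: &lt;code&gt;class_index x1 y1 x2 y2 x3 y3 x4 y4&lt;/code&gt;. Here is a small sketch that assumes that format. The helper name is hypothetical. Clipping is included because a rotated rectangle can extend slightly past the image border.&lt;/p&gt;

```python
import numpy as np


def obb_to_yolo_line(box_points, img_w, img_h, class_id=0):
    """Format four corner points (e.g. from cv2.boxPoints) as a YOLO-OBB label line."""
    pts = np.asarray(box_points, dtype=np.float64).reshape(4, 2)
    pts = pts / np.array([img_w, img_h], dtype=np.float64)  # normalize x by width, y by height
    pts = np.clip(pts, 0.0, 1.0)  # rotated boxes may poke outside the frame
    coords = " ".join(f"{v:.6f}" for v in pts.flatten())
    return f"{class_id} {coords}"
```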



&lt;h2&gt;
  
  
  Step 3: Normalize Coordinates to YOLO Format
&lt;/h2&gt;

&lt;p&gt;YOLO labels must be normalized relative to the overall image dimensions so that they scale correctly regardless of image resolution.&lt;/p&gt;

&lt;p&gt;$$x_{\text{norm}} = \frac{x_{\text{center}}}{\text{img\_width}}, \quad y_{\text{norm}} = \frac{y_{\text{center}}}{\text{img\_height}}$$&lt;br&gt;
$$w_{\text{norm}} = \frac{w_{\text{pixel}}}{\text{img\_width}}, \quad h_{\text{norm}} = \frac{h_{\text{pixel}}}{\text{img\_height}}$$&lt;/p&gt;

&lt;p&gt;Here is a helper function that calculates these values and prints the resulting YOLO label line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_to_yolo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_center&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_center&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w_pixel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h_pixel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;img_w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;img_h&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;x_norm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_center&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;img_w&lt;/span&gt;
    &lt;span class="n"&gt;y_norm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_center&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;img_h&lt;/span&gt;
    &lt;span class="n"&gt;w_norm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w_pixel&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;img_w&lt;/span&gt;
    &lt;span class="n"&gt;h_norm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;h_pixel&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;img_h&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;x_norm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_norm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w_norm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h_norm&lt;/span&gt;

&lt;span class="c1"&gt;# Normalize coordinates (assuming image is 640x640)
&lt;/span&gt;&lt;span class="n"&gt;x_n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w_n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h_n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;normalize_to_yolo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_center&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_center&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w_pixel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h_pixel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Print or write to a YOLO label file (e.g. class 0)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;x_n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;y_n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;w_n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;h_n&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Reverse: YOLO Bounding Boxes back to Masks
&lt;/h2&gt;

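&lt;p&gt;What if you want to reconstruct binary masks from your bounding boxes for validation or visualization? You can draw filled rectangles onto a black canvas of the original image dimensions. Keep in mind that the result is a rectangular approximation: the original mask's shape is lost once it has been reduced to a box.&lt;br&gt;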
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Initialize a black canvas (grayscale)
&lt;/span&gt;&lt;span class="n"&gt;reconstructed_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;img_height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;img_width&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Denormalize coordinates
&lt;/span&gt;&lt;span class="n"&gt;x_min&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;w_n&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;img_width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_min&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;h_n&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;img_height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x_n&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;w_n&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;img_width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_n&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;h_n&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;img_height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Draw a filled white rectangle on the canvas
&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rectangle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reconstructed_mask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_min&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_min&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_max&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_max&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thickness&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save mask
&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reconstructed_mask.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reconstructed_mask&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
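&lt;p&gt;A YOLO label file usually holds one line per object, so in practice you rasterize every line rather than a single box. Here is a minimal sketch of that, assuming the standard five-field format (&lt;code&gt;class x_center y_center width height&lt;/code&gt;, normalized to [0, 1]). The function name &lt;code&gt;yolo_labels_to_mask&lt;/code&gt; is hypothetical. It accepts any iterable of lines (an open label file works), swaps &lt;code&gt;cv2.rectangle&lt;/code&gt; for NumPy slicing so it runs without OpenCV, and clips coordinates, since boxes touching the image border can denormalize slightly outside it.&lt;/p&gt;

```python
import numpy as np


def yolo_labels_to_mask(label_lines, img_width, img_height):
    """Rasterize every box in a YOLO label file into one binary mask.

    Each line is "class x_center y_center width height", normalized to [0, 1].
    Box pixels are set to 255; everything else stays 0.
    """
    mask = np.zeros((img_height, img_width), dtype=np.uint8)
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        _, x_n, y_n, w_n, h_n = (float(p) for p in parts)
        # Denormalize to pixel corners, clipping boxes that spill past the border
        x_min = max(0, int((x_n - w_n / 2) * img_width))
        y_min = max(0, int((y_n - h_n / 2) * img_height))
        x_max = min(img_width, int((x_n + w_n / 2) * img_width))
        y_max = min(img_height, int((y_n + h_n / 2) * img_height))
        # Slicing is end-exclusive: fills rows y_min..y_max-1, cols x_min..x_max-1
        mask[y_min:y_max, x_min:x_max] = 255
    return mask


mask = yolo_labels_to_mask(["0 0.5 0.5 0.5 0.5"], img_width=8, img_height=8)
print(int((mask == 255).sum()))  # 16
```

&lt;p&gt;One difference from the OpenCV version above: slicing is end-exclusive, while &lt;code&gt;cv2.rectangle&lt;/code&gt; also fills the &lt;code&gt;(x_max, y_max)&lt;/code&gt; corner pixel, so the two masks can differ by a one-pixel strip on the right and bottom edges.&lt;/p&gt;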



&lt;h2&gt;
  
  
  Streamlining the Workflow
&lt;/h2&gt;

&lt;p&gt;While writing this manually is great for one-off tasks, managing it across hundreds of images, handling missing metadata, splitting datasets into train/test folders, and generating the &lt;code&gt;data.yaml&lt;/code&gt; config file can take hours. &lt;/p&gt;

&lt;p&gt;If you want a lightweight tool that bundles all of this (including batch conversion, CSV/JSON metadata parsing, and visualization), I've open-sourced a helper package called &lt;code&gt;segment-toolkit&lt;/code&gt; that does it in a few commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to use it:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install it via pip:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   pip &lt;span class="nb"&gt;install &lt;/span&gt;segment-toolkit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Convert a directory of masks to YOLO labels:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   segment-toolkit mask-to-yolo &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--image-dir&lt;/span&gt; datasets/images/ &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--mask-dir&lt;/span&gt; datasets/masks/ &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--output-dir&lt;/span&gt; datasets/labels/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Split into standard train/test structures for YOLO training:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   segment-toolkit &lt;span class="nb"&gt;split&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--images&lt;/span&gt; datasets/images/ &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--labels&lt;/span&gt; datasets/labels/ &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--output&lt;/span&gt; final_dataset/ &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--ratio&lt;/span&gt; 0.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whether you write your own script using the OpenCV math above or use the open-source toolkit, automating this step saves a lot of repetitive dataset-preparation work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Source Code: &lt;a href="https://github.com/zkzkGamal/mask-to-yolo-toolkit" rel="noopener noreferrer"&gt;GitHub - mask-to-yolo-toolkit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  PyPI Package: &lt;a href="https://pypi.org/project/segment-toolkit/" rel="noopener noreferrer"&gt;segment-toolkit&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;How do you handle dataset conversions in your machine learning workflows? Let me know in the comments below!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opencv</category>
      <category>python</category>
      <category>computervision</category>
      <category>ai</category>
    </item>
    <item>
      <title>Agent vs Multi-Agent Systems: A Practical Guide with LangGraph &amp; LangChain</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Tue, 09 Jun 2026 19:09:53 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/agent-vs-multi-agent-systems-a-practical-guide-with-langgraph-langchain-3g8b</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/agent-vs-multi-agent-systems-a-practical-guide-with-langgraph-langchain-3g8b</guid>
      <description>&lt;p&gt;Hey everyone! 👋&lt;br&gt;
I've been deep in &lt;strong&gt;Agentic AI Engineering&lt;/strong&gt; lately, and one of the most common questions I get is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What's the real difference between a &lt;strong&gt;single Agent&lt;/strong&gt; and a &lt;strong&gt;Multi-Agent&lt;/strong&gt; system? And how do you actually build them in production?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today I'll break it down clearly and show you &lt;strong&gt;exactly how I implemented both&lt;/strong&gt; in my repo:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://github.com/zkzkGamal/agentic-ai-engineering" rel="noopener noreferrer"&gt;zkzkGamal/agentic-ai-engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Single Agent (including ReAct)
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Single Agent&lt;/strong&gt; is one LLM-powered entity that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reason&lt;/li&gt;
&lt;li&gt;Call tools&lt;/li&gt;
&lt;li&gt;Maintain memory&lt;/li&gt;
&lt;li&gt;Loop until the task is done&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ReAct Agent&lt;/strong&gt; is a popular implementation pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Re&lt;/strong&gt;ason → &lt;strong&gt;Act&lt;/strong&gt; (tool call) → &lt;strong&gt;Observe&lt;/strong&gt; (tool result) → Repeat&lt;/li&gt;
&lt;/ul&gt;
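&lt;p&gt;To make the loop concrete, here is a framework-free sketch of Reason, Act, Observe. It is not LangGraph or LangChain code: &lt;code&gt;react_loop&lt;/code&gt;, &lt;code&gt;toy_policy&lt;/code&gt;, and the &lt;code&gt;multiply&lt;/code&gt; tool are hypothetical, and &lt;code&gt;toy_policy&lt;/code&gt; stands in for the LLM call that would normally decide the next step.&lt;/p&gt;

```python
def react_loop(question, policy, tools, max_steps=5):
    """Minimal ReAct loop: Reason, Act, Observe, repeat.

    policy(question, history) stands in for the LLM and returns either
    ("final", answer) or ("act", tool_name, tool_input).
    tools maps tool names to plain Python callables.
    """
    history = []  # (tool_name, tool_input, observation) tuples seen so far
    for _ in range(max_steps):
        decision = policy(question, history)  # Reason: decide the next step
        if decision[0] == "final":
            return decision[1]
        _, tool_name, tool_input = decision
        observation = tools[tool_name](tool_input)  # Act: call the chosen tool
        history.append((tool_name, tool_input, observation))  # Observe: feed the result back
    return None  # stopped after max_steps without a final answer


# Hypothetical stand-in for an LLM: call the tool once, then answer from the observation
def toy_policy(question, history):
    if not history:
        return ("act", "multiply", (6, 7))
    return ("final", f"The answer is {history[-1][2]}")


tools = {"multiply": lambda args: args[0] * args[1]}

print(react_loop("What is 6 times 7?", toy_policy, tools))  # The answer is 42
```

&lt;p&gt;The &lt;code&gt;max_steps&lt;/code&gt; cap matters in practice: without it, a model that keeps choosing tools never terminates.&lt;/p&gt;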

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Simple, easy to debug, great for focused tasks.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: One agent has to be good at &lt;em&gt;everything&lt;/em&gt; → can become bloated, slow, or less specialized.&lt;/p&gt;

&lt;p&gt;In my repo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 4&lt;/strong&gt; teaches classic &lt;strong&gt;ReAct agents&lt;/strong&gt; with LangGraph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chapter 5&lt;/strong&gt; uses a &lt;strong&gt;ReAct-style tool-calling agent&lt;/strong&gt; inside the &lt;code&gt;Execute&lt;/code&gt; node (via &lt;code&gt;create_tool_calling_agent&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Multi-Agent System
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Multi-Agent&lt;/strong&gt; system = multiple specialized agents (or nodes) working together like a team.&lt;/p&gt;

&lt;p&gt;Instead of one giant agent, you break the work into roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Router / Supervisor&lt;/li&gt;
&lt;li&gt;Researcher&lt;/li&gt;
&lt;li&gt;Executor / Tool User&lt;/li&gt;
&lt;li&gt;Critic / Reviewer&lt;/li&gt;
&lt;li&gt;Summarizer&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better specialization&lt;/li&gt;
&lt;li&gt;More efficient (cheap model for routing, powerful model only when needed)&lt;/li&gt;
&lt;li&gt;Easier to maintain and scale&lt;/li&gt;
&lt;li&gt;Clear separation of concerns&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  How My Repo Uses Both (Real Architecture)
&lt;/h3&gt;

&lt;p&gt;The crown jewel is in &lt;strong&gt;Chapter 5: Multi-Node LangGraph Agent with MCP Tools&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Core Architecture (Single Graph, Multi-Node = Multi-Agent flavor)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Key Nodes (Specialized "Agents")&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Router Node&lt;/strong&gt; (&lt;code&gt;agent/nodes/router.py&lt;/code&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast intent classification (Math? Email? Just chat?)&lt;/li&gt;
&lt;li&gt;Acts as a &lt;strong&gt;Supervisor&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Execute Node&lt;/strong&gt; (&lt;code&gt;agent/nodes/execute.py&lt;/code&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs a full &lt;strong&gt;ReAct tool-calling agent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Connects to tools via &lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; server (math + email tools)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Summarize Node&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes raw tool output (JSON, etc.) and makes it human-friendly&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Conversation Node&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lightweight chat fallback (avoids heavy tool path)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
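&lt;p&gt;Stripped of the framework, the control flow of these four nodes boils down to a branch after the router. The sketch below is plain Python, not the repo's code: the function names mirror the node names, but the keyword rules in &lt;code&gt;router&lt;/code&gt; and the canned outputs of &lt;code&gt;execute&lt;/code&gt; and &lt;code&gt;conversation&lt;/code&gt; are hypothetical placeholders for the model and MCP tool calls. In LangGraph, the &lt;code&gt;if&lt;/code&gt; in &lt;code&gt;run_graph&lt;/code&gt; is what a conditional edge out of the router node expresses.&lt;/p&gt;

```python
def router(message):
    """Cheap intent classifier; keyword rules stand in for a small, fast model."""
    text = message.lower()
    if any(word in text for word in ("add", "sum", "multiply", "calculate")):
        return "math"
    if "mail" in text:
        return "email"
    return "chat"


def execute(intent, message):
    """Stand-in for the ReAct tool-calling agent; returns raw, machine-shaped output."""
    return {"intent": intent, "tool_result": f"handled {intent} request"}


def summarize(raw):
    """Turns raw tool output into a human-friendly reply."""
    return f"Done: {raw['tool_result']}."


def conversation(message):
    """Lightweight chat fallback that skips the tool path entirely."""
    return "Happy to chat! What can I help with?"


def run_graph(message):
    intent = router(message)  # the Router node acts as the supervisor
    if intent == "chat":
        return conversation(message)
    return summarize(execute(intent, message))


print(run_graph("Please add 2 and 3"))  # Done: handled math request.
```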

&lt;p&gt;This is &lt;strong&gt;not&lt;/strong&gt; a pure peer-to-peer multi-agent system (like Researcher → Writer → Critic) but a &lt;strong&gt;hierarchical multi-node LangGraph&lt;/strong&gt;, which is one of the most practical and production-friendly multi-agent patterns today.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;pure multi-agent collaboration&lt;/strong&gt;, see the examples in &lt;strong&gt;Chapter 4&lt;/strong&gt; (Researcher + Writer + Critic working together).&lt;/p&gt;




&lt;h3&gt;
  
  
  Why This Architecture Rocks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Separation of concerns&lt;/strong&gt; → easier debugging and testing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt; → cheap routing + targeted execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; → Tools run in an isolated &lt;strong&gt;MCP server&lt;/strong&gt; (not directly inside the LLM agent)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; → Each node is a clear step you can log/monitor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility&lt;/strong&gt; → Want to add a "Research" intent? Just add a new node and update the router.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is built with &lt;strong&gt;LangGraph + LangChain&lt;/strong&gt; (no heavy LlamaIndex or basic OpenAI-only examples).&lt;/p&gt;




&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Single Agent (ReAct)&lt;/th&gt;
&lt;th&gt;Multi-Node / Multi-Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Complexity&lt;/td&gt;
&lt;td&gt;Simpler&lt;/td&gt;
&lt;td&gt;More structured&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialization&lt;/td&gt;
&lt;td&gt;One agent does everything&lt;/td&gt;
&lt;td&gt;Each node/agent has a clear role&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Efficiency&lt;/td&gt;
&lt;td&gt;Can be wasteful&lt;/td&gt;
&lt;td&gt;Optimized routing &amp;amp; execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;Easier at first&lt;/td&gt;
&lt;td&gt;Better long-term traceability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best For&lt;/td&gt;
&lt;td&gt;Focused tasks&lt;/td&gt;
&lt;td&gt;Complex, real-world workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Modern reality (2026)&lt;/strong&gt;: A common production pattern is a &lt;strong&gt;multi-node LangGraph&lt;/strong&gt; setup that runs ReAct inside specialized nodes.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Want to see it in action?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;→ Clone the repo:&lt;br&gt;&lt;br&gt;
&lt;code&gt;git clone https://github.com/zkzkGamal/agentic-ai-engineering.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Start with &lt;strong&gt;Chapter 4&lt;/strong&gt; for fundamentals, then run the full &lt;strong&gt;Chapter 5&lt;/strong&gt; system (LangGraph assistant + separate MCP tool server).&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pytest + GitHub Actions CI&lt;/li&gt;
&lt;li&gt;Local Ollama support&lt;/li&gt;
&lt;li&gt;Memory, streaming, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Would love your feedback or contributions! ⭐&lt;/p&gt;

</description>
      <category>agentaichallenge</category>
      <category>agents</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Got Tired of Rewriting the Same Mask-to-YOLO Script So I Shipped a PyPI Package</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Mon, 08 Jun 2026 12:09:26 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-got-tired-of-rewriting-the-same-mask-to-yolo-script-so-i-shipped-a-pypi-package-1akh</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-got-tired-of-rewriting-the-same-mask-to-yolo-script-so-i-shipped-a-pypi-package-1akh</guid>
      <description>&lt;p&gt;I got tired of writing the same 50 lines of OpenCV boilerplate every new project.&lt;/p&gt;

&lt;p&gt;Every pipeline that uses SAM, U-Net, or any segmentation model hands you binary masks. YOLO training wants bounding box labels. The standard move is a custom script — cv2.findContours, normalize coordinates, handle edge cases, repeat. No reusable package existed that did this cleanly end-to-end.&lt;/p&gt;

&lt;p&gt;So I built one and shipped it to PyPI.&lt;/p&gt;

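&lt;h2&gt;
  
  
  The install
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install segment-toolkit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;segment-toolkit is a bidirectional pipeline between binary segmentation masks and YOLO bounding box labels.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Forward:&lt;/strong&gt; binary mask to YOLO label (axis-aligned, or a rotated minimum-area box via &lt;code&gt;cv2.minAreaRect&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reverse:&lt;/strong&gt; YOLO label back to a binary mask&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Visualizer:&lt;/strong&gt; overlay bounding boxes on source images&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dataset split:&lt;/strong&gt; automatic train/test split with &lt;code&gt;data.yaml&lt;/code&gt; output&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Class mapping:&lt;/strong&gt; batch conversion with CSV or JSON ground truth&lt;/li&gt;
&lt;/ul&gt;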
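&lt;p&gt;For the axis-aligned case, the forward direction comes down to finding the foreground's pixel extent and normalizing it. Here is a minimal NumPy-only sketch under two assumptions: every non-zero pixel counts as foreground, and all of it is merged into a single box (a contour-based approach with &lt;code&gt;cv2.findContours&lt;/code&gt; can instead emit one box per connected region). The name &lt;code&gt;mask_to_yolo_bbox&lt;/code&gt; is hypothetical and is not part of segment-toolkit's API.&lt;/p&gt;

```python
import numpy as np


def mask_to_yolo_bbox(mask, class_id=0):
    """Axis-aligned YOLO label covering all foreground pixels of a binary mask.

    Returns "class x_center y_center width height" normalized to [0, 1],
    or None when the mask has no foreground.
    """
    ys, xs = np.nonzero(mask)  # row and column indices of non-zero pixels
    if xs.size == 0:
        return None
    img_h, img_w = mask.shape[:2]
    # Pixel extent of the foreground; +1 so the box covers the last pixel
    x_min, x_max = xs.min(), xs.max() + 1
    y_min, y_max = ys.min(), ys.max() + 1
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


demo = np.zeros((10, 10), dtype=np.uint8)
demo[2:6, 4:8] = 255
print(mask_to_yolo_bbox(demo, class_id=3))  # 3 0.600000 0.400000 0.400000 0.400000
```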

&lt;h2&gt;
  
  
  CLI usage
&lt;/h2&gt;

&lt;p&gt;Convert a single mask to a YOLO label:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;segment-toolkit mask-to-yolo \
  --image images/ISIC_0024310.jpg \
  --mask mask/ISIC_0024310_segmentation.png \
  --output-txt labels/ISIC_0024310.txt \
  --class-id 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reconstruct the mask from the label:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;segment-toolkit yolo-to-mask \
  --label labels/ISIC_0024310.txt \
  --output-mask masks_reconstructed/ISIC_0024310_segmentation.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Visualize the bounding box overlay:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;segment-toolkit visualize \
  --image images/ISIC_0024310.jpg \
  --label labels/ISIC_0024310.txt \
  --output visualization.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Split into train/test with &lt;code&gt;data.yaml&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;segment-toolkit split \
  --images images/ \
  --labels labels/ \
  --output dataset/ \
  --ratio 0.8 \
  --seed 42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
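&lt;p&gt;The &lt;code&gt;--ratio&lt;/code&gt; and &lt;code&gt;--seed&lt;/code&gt; flags point to a seeded shuffle. The sketch below shows one way to get a reproducible split; it is an assumption about the approach, not the package's code, and &lt;code&gt;split_stems&lt;/code&gt; is a hypothetical helper that works on file stems (names without extensions) so each image stays paired with its label.&lt;/p&gt;

```python
import random


def split_stems(stems, ratio=0.8, seed=42):
    """Deterministically shuffle file stems and cut them into train/test lists."""
    ordered = sorted(stems)    # sort first so input order does not affect the result
    rng = random.Random(seed)  # local RNG: same seed, same split, no global state touched
    rng.shuffle(ordered)
    cut = int(len(ordered) * ratio)
    return ordered[:cut], ordered[cut:]


train, test = split_stems([f"ISIC_{i:07d}" for i in range(10)])
print(len(train), len(test))  # 8 2
```

&lt;p&gt;Sorting before shuffling makes the split independent of directory-listing order, which is not guaranteed to be stable across filesystems.&lt;/p&gt;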

&lt;h2&gt;
  
  
  Python API
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from segment_toolkit import MaskToYoloConverter, YoloToMaskConverter

conv = MaskToYoloConverter(target_size=(640, 640), bbox_type="standard")
conv.convert_single(
    image_path="images/ISIC_0024310.jpg",
    mask_path="mask/ISIC_0024310_segmentation.png",
    output_txt_path="labels/ISIC_0024310.txt",
    class_id=4
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Validated on
&lt;/h2&gt;

&lt;p&gt;The ISIC melanoma skin lesion segmentation and PlantVillage leaf disease datasets. Both were tested end to end: mask in, YOLO label out, mask reconstructed, overlay rendered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/zkzkGamal/mask-to-yolo-toolkit" rel="noopener noreferrer"&gt;github.com/zkzkGamal/mask-to-yolo-toolkit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you hit a bug or want a feature, open an issue.&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>python</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>From Segmentation Masks to YOLO Labels: My Dataset Prep Pipeline</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Thu, 04 Jun 2026 14:32:56 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/from-segmentation-masks-to-yolo-labels-my-dataset-prep-pipeline-3ph8</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/from-segmentation-masks-to-yolo-labels-my-dataset-prep-pipeline-3ph8</guid>
      <description>&lt;p&gt;I just finished a small but useful pipeline for skin lesion dataset preparation and annotation validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The project handles two workflows:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Converting binary segmentation masks into YOLO labels&lt;/li&gt;
&lt;li&gt;Converting YOLO labels back into masks for validation and visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was built around ISIC-style skin lesion data with 7 classes:&lt;br&gt;
AKIEC, BCC, BKL, DF, MEL, NV, and VASC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned from this project:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean annotation pipelines save a lot of debugging time&lt;/li&gt;
&lt;li&gt;A quick visual validation step catches label issues early&lt;/li&gt;
&lt;li&gt;Even simple format conversions can reveal bad labels or inconsistent data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project helped me better understand the full path from segmentation masks to training-ready YOLO annotations.&lt;/p&gt;

&lt;p&gt;In the next phase, I plan to turn it into a reusable Python package with a cleaner structure, better error handling, and a more maintainable workflow, so it is easier to use and adapt for future datasets.&lt;/p&gt;

&lt;p&gt;If you work with medical imaging or dataset preparation, I’d love to hear how you validate your labels before training.&lt;br&gt;
&lt;a href="https://github.com/zkzkGamal/mask-to-yolo-toolkit" rel="noopener noreferrer"&gt;project repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98sev1vajobh0f1884pk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98sev1vajobh0f1884pk.png" alt=" " width="800" height="652"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70iyc6ppr488muhrsmsl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70iyc6ppr488muhrsmsl.png" alt=" " width="800" height="652"&gt;&lt;/a&gt;&lt;/p&gt;


</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>python</category>
    </item>
    <item>
      <title>Building Zero-Shared-State Auth Middleware and Real-Time Whisper STT Pipeline for Voice AI</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Sat, 30 May 2026 09:41:38 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/building-zero-shared-state-auth-middleware-and-real-time-whisper-stt-pipeline-for-voice-ai-158j</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/building-zero-shared-state-auth-middleware-and-real-time-whisper-stt-pipeline-for-voice-ai-158j</guid>
      <description>&lt;p&gt;I recently built a &lt;strong&gt;production-grade real-time Voice AI workspace&lt;/strong&gt; from scratch. While the whole system has many moving parts, two components required the most careful engineering: the &lt;strong&gt;authentication middleware&lt;/strong&gt; between services and the &lt;strong&gt;Speech-to-Text (STT) pipeline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here’s exactly how I approached and solved both.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Middleware Problem
&lt;/h4&gt;

&lt;p&gt;I needed two local microservices — a &lt;strong&gt;WebRTC audio server&lt;/strong&gt; and a &lt;strong&gt;FastMCP server&lt;/strong&gt; — to communicate securely. &lt;/p&gt;

&lt;p&gt;I didn’t want to introduce a database, Redis, or any hardcoded secrets. The solution had to be lightweight, stateless, and still reasonably secure for internal communication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So I built a dynamic time-locked API key generator.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both services independently derive the same key from the current UTC timestamp, so nothing has to be stored or exchanged ahead of time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Take the current timestamp&lt;/li&gt;
&lt;li&gt;Divide it by a 5-second epoch window&lt;/li&gt;
&lt;li&gt;Generate a deterministic key from that value&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a request arrives outside the valid 5-second window, it is immediately rejected.&lt;/p&gt;

&lt;p&gt;This approach gives me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No shared state&lt;/li&gt;
&lt;li&gt;No persistent storage&lt;/li&gt;
&lt;li&gt;No single point of failure&lt;/li&gt;
&lt;li&gt;Automatic key rotation every 5 seconds&lt;/li&gt;
&lt;/ul&gt;
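&lt;p&gt;Here is a minimal sketch of the windowing logic, including a tolerance for a request that is created just before a window boundary and checked just after it. It is not the author's exact derivation: &lt;code&gt;window_key&lt;/code&gt;, &lt;code&gt;current_key&lt;/code&gt;, and &lt;code&gt;verify&lt;/code&gt; are hypothetical names, and the key is computed with HMAC-SHA256 over the window number, the same time-step idea TOTP (RFC 6238) is built on. One caveat worth stating plainly: if the key depends only on the clock, any process that knows the scheme can compute it, so the optional &lt;code&gt;secret&lt;/code&gt; argument is what turns it into an actual authenticator. It defaults to empty to match the secret-free design described above.&lt;/p&gt;

```python
import hashlib
import hmac
import time

WINDOW_SECONDS = 5


def window_key(window, secret=b""):
    """Key for one epoch window: HMAC-SHA256 over the window number.

    With an empty secret this is a deterministic function of the clock alone;
    a non-empty secret is what stops outsiders from computing it.
    """
    return hmac.new(secret, str(window).encode(), hashlib.sha256).hexdigest()


def current_key(secret=b"", now=None):
    now = time.time() if now is None else now
    return window_key(int(now) // WINDOW_SECONDS, secret)


def verify(presented, secret=b"", now=None, tolerance=1):
    """Accept a key from the current window or up to `tolerance` windows back."""
    now = time.time() if now is None else now
    window = int(now) // WINDOW_SECONDS
    for offset in range(tolerance + 1):
        expected = window_key(window - offset, secret)
        # Constant-time comparison avoids leaking how many characters matched
        if hmac.compare_digest(presented, expected):
            return True
    return False
```

&lt;p&gt;With &lt;code&gt;tolerance=1&lt;/code&gt;, a key stays valid for between five and ten seconds depending on where in its window it was issued, which is the practical meaning of "rejected outside the 5-second window".&lt;/p&gt;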

&lt;h4&gt;
  
  
  The Real-Time STT Pipeline
&lt;/h4&gt;

&lt;p&gt;I wanted &lt;strong&gt;low-latency transcription&lt;/strong&gt; with zero cold starts and no HTTP polling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here’s the exact flow I created:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser captures audio at &lt;strong&gt;48kHz&lt;/strong&gt; via WebRTC&lt;/li&gt;
&lt;li&gt;Audio is downsampled to &lt;strong&gt;16kHz&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Voice Activity Detection (VAD) runs on &lt;strong&gt;30ms&lt;/strong&gt; frames, in lockstep with the audio stream&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2.0 seconds&lt;/strong&gt; of continuous silence = speech boundary&lt;/li&gt;
&lt;li&gt;Audio segment is sent to the FastMCP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Whisper "small"&lt;/strong&gt; model is preloaded at boot (zero cold starts)&lt;/li&gt;
&lt;li&gt;Transcription result is pushed back to the React frontend over &lt;strong&gt;WebRTC DataChannel&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
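&lt;p&gt;The post doesn't say which resampler or VAD the pipeline uses, so the following is a sketch under stated assumptions: a crude 3:1 averaging decimator in place of a proper low-pass resampler, and a hypothetical energy threshold (&lt;code&gt;is_speech&lt;/code&gt;) in place of a real VAD such as WebRTC's, which accepts 10, 20, or 30 ms frames. What it does show faithfully is the arithmetic: a 30 ms frame at 16 kHz is 480 samples, and 2.0 s of silence rounds up to 67 consecutive silent frames.&lt;/p&gt;

```python
import numpy as np

IN_RATE, OUT_RATE = 48_000, 16_000
FRAME_MS = 30
FRAME_SAMPLES = OUT_RATE * FRAME_MS // 1000            # 480 samples per 30 ms frame
SILENCE_FRAMES = int(np.ceil(2.0 * 1000 / FRAME_MS))   # 67 frames, about 2.0 s


def downsample_48k_to_16k(audio):
    """Crude 3:1 decimation: average each group of 3 samples (a rough low-pass)."""
    usable = len(audio) - len(audio) % 3
    return audio[:usable].reshape(-1, 3).mean(axis=1)


def is_speech(frame, threshold=0.01):
    """Hypothetical energy-based stand-in for a real VAD."""
    return float(np.sqrt(np.mean(frame ** 2))) >= threshold


def segment(audio_16k):
    """Yield speech segments, closing one after SILENCE_FRAMES silent frames in a row."""
    current, silent = [], 0
    for start in range(0, len(audio_16k) - FRAME_SAMPLES + 1, FRAME_SAMPLES):
        frame = audio_16k[start:start + FRAME_SAMPLES]
        if is_speech(frame):
            current.append(frame)
            silent = 0
        elif current:
            current.append(frame)
            silent += 1
            if silent >= SILENCE_FRAMES:
                # Boundary reached: emit the segment without its trailing silence
                yield np.concatenate(current[:len(current) - silent])
                current, silent = [], 0
    if current:
        yield np.concatenate(current[:len(current) - silent])
```

&lt;p&gt;Swapping in a real VAD only means replacing &lt;code&gt;is_speech&lt;/code&gt;; the frame size and silence counter stay the same.&lt;/p&gt;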

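&lt;p&gt;This feels close to real time: once a speech boundary is detected, the transcript comes back in under a second in most cases. Note that the 2.0-second silence threshold adds a fixed wait after the speaker stops, so latency measured from the last spoken word is at least two seconds.&lt;/p&gt;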

&lt;h4&gt;
  
  
  Architecture Flow
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z2myq3570ef9r9oghbk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z2myq3570ef9r9oghbk.jpg" alt="Architecture Flow" width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Why This Design Works Well
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Completely stateless middleware removes infrastructure complexity.&lt;/li&gt;
&lt;li&gt;Preloading Whisper eliminates cold start delays.&lt;/li&gt;
&lt;li&gt;Using WebRTC DataChannel for transcription delivery removes polling overhead.&lt;/li&gt;
&lt;li&gt;Clear separation of concerns with VAD segmentation and MCP tooling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full project is open source and meant to serve as an educational blueprint for developers working with WebRTC, MCP, and real-time AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://lnkd.in/dFbE44e3" rel="noopener noreferrer"&gt;https://lnkd.in/dFbE44e3&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Contributions are welcome — especially on the agent routing and LLM orchestration layer that’s currently in progress.&lt;/p&gt;

&lt;p&gt;Let me know in the comments if you’d like me to dive deeper into any specific part (VAD tuning, Whisper post-processing, rate limiting, etc.).&lt;/p&gt;

</description>
      <category>webrtc</category>
      <category>fastmcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>How I Built a Zero-Shared-State Auth Middleware for a Real-Time Voice AI Agent (WebRTC + FastMCP + Whisper)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Mon, 25 May 2026 19:17:58 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/how-i-built-a-zero-shared-state-auth-middleware-for-a-real-time-voice-ai-agent-webrtc-fastmcp--56do</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/how-i-built-a-zero-shared-state-auth-middleware-for-a-real-time-voice-ai-agent-webrtc-fastmcp--56do</guid>
      <description>&lt;p&gt;I've been building an open-source real-time voice AI workspace for the past few weeks and I want to walk through the architecture decisions that were actually hard — not the happy-path stuff you see in tutorials.&lt;/p&gt;

&lt;p&gt;The stack: React client → WebRTC Python backend → FastMCP server (Whisper STT, Mail, Calendar) → transcript delivered back over a WebRTC DataChannel. The LLM orchestration layer is still in progress, but the pipeline underneath it is fully live and tested.&lt;/p&gt;

&lt;p&gt;Here's what I want to focus on: three engineering decisions that weren't obvious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem With Securing Local Microservices&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When two services run on the same machine — in this case the WebRTC server and the MCP server — the standard advice is to put them behind a shared secret or an API key stored in an environment variable. That works, but it has failure modes: leaked &lt;code&gt;.env&lt;/code&gt; files, rotation pain, and the cognitive overhead of managing secrets across services that should be able to trust each other without a database call.&lt;/p&gt;

&lt;p&gt;I wanted something stateless and self-expiring.&lt;/p&gt;

&lt;p&gt;The solution I landed on is a time-locked hash generator. Both servers independently compute the same key by applying deterministic math to the current UTC timestamp divided by a 5-second epoch window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import math
import time

def generate_api_key() -&amp;gt; str:
    # Index of the current 5-second window since the Unix epoch.
    epoch_window = int(time.time()) // 5
    # Deterministic transform of the window index; both services get the same value.
    raw = math.sqrt(math.log10(epoch_window))
    # Hash the float's string form into a 64-character hex key.
    return hashlib.sha256(str(raw).encode()).hexdigest()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Starlette middleware on the MCP server recomputes this hash on every incoming request and compares it to the key in the request header. If the caller's timestamp window is off by more than one epoch (five seconds), the request is rejected. No database lookup. No token storage. No rotation script. The key rotates itself every five seconds, and because both processes read the same system clock, both sides agree on what it should be.&lt;/p&gt;

&lt;p&gt;This is not production-grade for internet-facing services: the key is derived from the clock alone, so anyone who knows the formula can reproduce it (TOTP with a proper shared seed is better suited there). For securing local inter-service communication during development and staging, though, it is clean, auditable, and has zero ops overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Dual-Rate Audio Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;WebRTC gives you audio at 48kHz. Whisper is happiest at 16kHz. &lt;code&gt;webrtcvad&lt;/code&gt; only accepts 16-bit mono PCM in 10, 20, or 30 ms frames at a small set of fixed sample rates. Feeding everything through one sample rate loses either fidelity for transcription or compatibility for VAD.&lt;/p&gt;

&lt;p&gt;The backend handles both independently in the same 30 ms processing loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The full 48kHz PCM buffer accumulates separately for Whisper.&lt;/li&gt;
&lt;li&gt;A parallel, downsampled 16kHz frame array feeds &lt;code&gt;webrtcvad&lt;/code&gt; at aggressiveness level 3.&lt;/li&gt;
&lt;li&gt;A sliding window tracks the fraction of frames the VAD flags as speech.&lt;/li&gt;
&lt;li&gt;When fewer than 1 in 10 frames in the last 2.0 seconds are active, that marks the utterance boundary.&lt;/li&gt;
&lt;/ul&gt;

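&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import deque

FRAME_MS = 30                   # VAD frame length used by the processing loop
SILENCE_RATIO_THRESHOLD = 0.1   # fewer than 1 in 10 active frames counts as silence
SILENCE_DURATION_SECONDS = 2.0

# One bool per frame (True = speech); maxlen keeps only the last 2.0 s (66 frames).
vad_window = deque(maxlen=int(SILENCE_DURATION_SECONDS * 1000 / FRAME_MS))

def is_utterance_boundary(window) -&amp;gt; bool:
    # Don't decide until a full 2.0 s of frames has been seen (also avoids dividing by zero).
    if len(window) &amp;lt; window.maxlen:
        return False
    active_frames = sum(window)
    return active_frames / len(window) &amp;lt; SILENCE_RATIO_THRESHOLD

# In the 30 ms loop: vad_window.append(is_speech), then
# if is_utterance_boundary(vad_window): trigger_pipeline()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;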

&lt;p&gt;Splitting the buffers means you get high-quality STT input and accurate VAD detection without either compromising the other.&lt;/p&gt;
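&lt;p&gt;The buffer split described above can be sketched with the standard library alone. This is a hypothetical illustration, not the repo's code. It assumes 16-bit mono PCM arriving in 30 ms frames of 1,440 samples at 48kHz, and it reduces each frame to 480 samples by averaging groups of three. That averaging is a crude low-pass chosen here as an assumption; a proper polyphase resampler would alias less. The names &lt;code&gt;decimate_48k_to_16k&lt;/code&gt; and &lt;code&gt;DualRateBuffers&lt;/code&gt; are invented for the example.&lt;/p&gt;

```python
from array import array

IN_RATE = 48_000                                  # WebRTC capture rate
VAD_RATE = 16_000                                 # rate fed to the VAD
FRAME_MS = 30                                     # frame length in the processing loop
FACTOR = IN_RATE // VAD_RATE                      # 3
IN_FRAME_SAMPLES = IN_RATE * FRAME_MS // 1000     # 1440 samples per 30 ms at 48kHz
VAD_FRAME_SAMPLES = VAD_RATE * FRAME_MS // 1000   # 480 samples per 30 ms at 16kHz


def decimate_48k_to_16k(frame_48k: array) -> array:
    """Average each group of 3 samples (a crude low-pass) and keep one value per group."""
    if len(frame_48k) != IN_FRAME_SAMPLES:
        raise ValueError(f"expected {IN_FRAME_SAMPLES} samples, got {len(frame_48k)}")
    out = array("h")
    for i in range(0, IN_FRAME_SAMPLES, FACTOR):
        out.append(sum(frame_48k[i:i + FACTOR]) // FACTOR)
    return out


class DualRateBuffers:
    """Hypothetical container: full-rate PCM for STT, 16kHz frames for the VAD."""

    def __init__(self) -> None:
        self.stt_pcm_48k = array("h")    # untouched 48kHz samples for transcription
        self.vad_frames_16k = []         # 16-bit PCM byte strings, one per 30 ms frame

    def push(self, frame_48k: array) -> bytes:
        self.stt_pcm_48k.extend(frame_48k)
        payload = decimate_48k_to_16k(frame_48k).tobytes()  # 960 bytes per frame
        self.vad_frames_16k.append(payload)
        return payload
```

&lt;p&gt;Each returned payload is what you would pass to &lt;code&gt;webrtcvad.Vad(3).is_speech(payload, 16000)&lt;/code&gt;. Note that &lt;code&gt;array.tobytes()&lt;/code&gt; emits native-endian 16-bit samples. The full-rate buffer is never touched by the decimation, so transcription quality does not depend on how crude the VAD path is.&lt;/p&gt;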

&lt;p&gt;&lt;strong&gt;Service Singletons and the Cold Start Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whisper is not fast to load. If you initialize the model on the first request, that first transcription takes 3–6 seconds depending on hardware, so whoever speaks first gets a noticeably laggy experience.&lt;/p&gt;

&lt;p&gt;The fix is a &lt;code&gt;LoadModelService&lt;/code&gt; singleton that runs at server startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import whisper  # openai-whisper


class LoadModelService:
    # Process-wide cache: the model is loaded once and shared by every request.
    _model = None

    @classmethod
    def get_model(cls):
        # Load on the first call; later calls return the cached instance.
        if cls._model is None:
            cls._model = whisper.load_model("small")
        return cls._model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;get_model()&lt;/code&gt; is called inside the FastMCP lifespan hook, so by the time the first WebSocket connection arrives, the model is already in memory. Every subsequent transcription call hits a warm model.&lt;/p&gt;

&lt;p&gt;The same pattern applies to the mail and calendar services: singletons initialized once and reused across tool calls, with a token-bucket rate limiter (0.5 req/s for Gmail) sitting in front of anything that touches an external API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Pytest Suite&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't calibrate a VAD pipeline without tests. The suite covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frame decimation accuracy at different sample rates&lt;/li&gt;
&lt;li&gt;Speech onset boundary detection under various silence patterns&lt;/li&gt;
&lt;li&gt;SMTP integration with a mock SMTP server&lt;/li&gt;
&lt;li&gt;Calendar tool with automatic &lt;code&gt;.ics&lt;/code&gt; fallback when no calendar service is configured&lt;/li&gt;
&lt;/ul&gt;
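&lt;p&gt;The &lt;code&gt;.ics&lt;/code&gt; fallback is easy to picture as code. Below is a hypothetical stdlib sketch, not the repo's calendar tool, and &lt;code&gt;build_ics_event&lt;/code&gt; is an invented name. It emits a single RFC 5545 event with the properties the spec requires: VERSION and PRODID on the calendar, and UID and DTSTAMP on the event. It also uses CRLF line endings and TEXT escaping. It expects timezone-aware datetimes and omits the spec's 75-octet line folding, so very long summaries would need extra handling.&lt;/p&gt;

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional


def _ics_escape(text: str) -> str:
    # RFC 5545 TEXT escaping; the backslash must be escaped first.
    return (text.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n"))


def _ics_utc(dt: datetime) -> str:
    # Expects an aware datetime; renders it as UTC, e.g. 20260620T090000Z.
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_ics_event(summary: str, start: datetime,
                    duration: timedelta = timedelta(minutes=30),
                    uid: Optional[str] = None,
                    now: Optional[datetime] = None) -> str:
    """Hypothetical fallback: return a one-event iCalendar document as a string."""
    now = now or datetime.now(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//ics-fallback//EN",  # placeholder product identifier
        "BEGIN:VEVENT",
        f"UID:{uid or uuid.uuid4()}",
        f"DTSTAMP:{_ics_utc(now)}",
        f"DTSTART:{_ics_utc(start)}",
        f"DTEND:{_ics_utc(start + duration)}",
        f"SUMMARY:{_ics_escape(summary)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"
```

&lt;p&gt;When saving the result, opening the file with &lt;code&gt;newline=""&lt;/code&gt; keeps the CRLF endings intact. Because the output is a plain string, a test can assert on it directly without any calendar service configured.&lt;/p&gt;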



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ RUN  ] test_frame_decimation_48k_to_16k
[ OK   ] test_frame_decimation_48k_to_16k
[ RUN  ] test_vad_silence_boundary_2s
[ OK   ] test_vad_silence_boundary_2s
[ RUN  ] test_smtp_send_integration
[ OK   ] test_smtp_send_integration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;pytest tests/ -v&lt;/code&gt; from the &lt;code&gt;mcp/&lt;/code&gt; directory gives you live output with real pass/fail visibility, not just a summary at the end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LLM orchestration and conversation routing layer is actively in development. Once that's in, the full loop closes: speech → STT → LLM agent → tool use → response.&lt;/p&gt;

&lt;p&gt;The entire codebase is open source and structured as an educational reference for WebRTC, MCP, and secure microservices. If you're building anything in this space (voice agents, real-time audio pipelines, MCP tool servers), I'd love contributions, issues, or just a look.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dapxxg3ypbj526ci35oh.jpeg" alt="STT flow"&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/zkzkGamal/AI-RTC-Agent" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/AI-RTC-Agent&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webrtc</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I built a fully tested Agentic AI system with LangGraph + MCP and open-sourced the whole thing</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Mon, 18 May 2026 10:24:32 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-a-fully-tested-agentic-ai-system-with-langgraph-mcp-and-open-sourced-the-whole-thing-3gb3</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-a-fully-tested-agentic-ai-system-with-langgraph-mcp-and-open-sourced-the-whole-thing-3gb3</guid>
      <description>&lt;p&gt;Most LLM tutorials stop at "here's how to call the OpenAI API."&lt;/p&gt;

&lt;p&gt;Mine doesn't.&lt;/p&gt;

&lt;p&gt;I just shipped &lt;strong&gt;v1.1.0&lt;/strong&gt; of &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;Agentic AI Tutorial&lt;/a&gt; — a 5-chapter open-source repo that takes you from your first raw API call all the way to a &lt;strong&gt;production-style multi-node autonomous agent&lt;/strong&gt; with a CI pipeline, pytest suite, and MCP server integration.&lt;/p&gt;

&lt;p&gt;Here's what's inside and why I built it the way I did.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ The Architecture (Chapter 5)
&lt;/h2&gt;

&lt;p&gt;The final agent uses a &lt;strong&gt;LangGraph StateGraph&lt;/strong&gt; with 4 decoupled nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Router&lt;/strong&gt; — classifies user intent with a cheap, fast LLM call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute&lt;/strong&gt; — runs a LangChain ReAct agent bound to a local FastMCP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarize&lt;/strong&gt; — converts raw tool JSON into natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation&lt;/strong&gt; — handles chitchat directly, skipping tool execution entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9a1x8csb5c9qo7uzl31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9a1x8csb5c9qo7uzl31.png" alt="Architecture" width="295" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The MCP server exposes math and email tools over SSE. The agent never touches your credentials directly — it talks to the server, which acts as a secure boundary.&lt;/p&gt;
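&lt;p&gt;The control flow between those four nodes can be sketched in plain Python. This is neither LangGraph's API nor the repo's code. The node functions and the &lt;code&gt;"tool"&lt;/code&gt;/&lt;code&gt;"chat"&lt;/code&gt; intent labels are hypothetical. In LangGraph, the same branch would be expressed with a conditional edge out of the router node (&lt;code&gt;StateGraph.add_conditional_edges&lt;/code&gt;). Passing the classifier in as a function is also what lets the router be tested without a live LLM.&lt;/p&gt;

```python
from typing import Callable, Dict

State = Dict[str, str]
Node = Callable[[State], State]


def make_router(classify: Callable[[str], str]) -> Node:
    # classify stands in for the cheap LLM call and must return "tool" or "chat".
    def router(state: State) -> State:
        return {**state, "intent": classify(state["user_input"])}
    return router


def run_agent(state: State, router: Node, execute: Node,
              summarize: Node, conversation: Node) -> State:
    state = router(state)
    if state["intent"] == "tool":
        # Tool path: run tools, then turn raw tool JSON into prose.
        return summarize(execute(state))
    # Chitchat path: answer directly, never touching the tool server.
    return conversation(state)
```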




&lt;h2&gt;
  
  
  🧪 Why I Added Tests to an AI Project
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth about agentic systems: &lt;strong&gt;they don't fail loudly. They drift.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Change one node prompt, and suddenly the router misclassifies 20% of requests. No exception thrown. No stack trace. Just wrong output that you may not catch until a user reports it.&lt;/p&gt;

&lt;p&gt;So v1.1.0 ships with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;pytest suite&lt;/strong&gt; that validates each node's logic and MCP tool contracts independently — no live API calls needed&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;GitHub Actions CI workflow&lt;/strong&gt; that runs on every push across multiple Python versions&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;custom &lt;code&gt;conftest.py&lt;/code&gt;&lt;/strong&gt; reporter that gives real-time output with zero buffering lag
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pytest Chapter5/SimpleChatAgent/ &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
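&lt;p&gt;A reporter like that can be built from pytest's standard &lt;code&gt;pytest_runtest_logstart&lt;/code&gt; and &lt;code&gt;pytest_runtest_logreport&lt;/code&gt; hooks. The following is a hypothetical sketch of a &lt;code&gt;conftest.py&lt;/code&gt;, not the repo's file, and &lt;code&gt;format_line&lt;/code&gt; is an invented helper. Printing with &lt;code&gt;flush=True&lt;/code&gt; pushes each line out immediately. Depending on your capture settings, you may need &lt;code&gt;-s&lt;/code&gt; for the hook output to appear.&lt;/p&gt;

```python
# conftest.py: hypothetical real-time reporter sketch


def format_line(status: str, nodeid: str) -> str:
    # Render e.g. "[ RUN  ] test_name", using only the function part of the node id.
    name = nodeid.split("::")[-1]
    return f"[ {status.ljust(4)} ] {name}"


def pytest_runtest_logstart(nodeid, location):
    # Pytest calls this as each test begins; flush so the line appears immediately.
    print("\n" + format_line("RUN", nodeid), flush=True)


def pytest_runtest_logreport(report):
    # Report the outcome once: after the call phase, or at setup if setup didn't pass.
    if report.when == "call" or (report.when == "setup" and not report.passed):
        if report.passed:
            status = "OK"
        elif report.skipped:
            status = "SKIP"
        else:
            status = "FAIL"
        print(format_line(status, report.nodeid), flush=True)
```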






&lt;h2&gt;
  
  
  📚 Full Roadmap (All 5 Chapters)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chapter&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;LLM fundamentals — OpenAI, Gemini, Ollama, streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;LangChain, LCEL, chains, tool binding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Memory, entity tracking, RAG with Chroma/FAISS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;LangGraph agents — ReAct, Router, Multi-Agent, Human-in-the-Loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Multi-node agent + FastMCP Server + CI/pytest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Get Started in 3 Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/zkzkGamal/Agentic-AI-Tutorial.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Agentic-AI-Tutorial
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each chapter has its own &lt;code&gt;.env.example&lt;/code&gt;. Ollama users can run everything &lt;strong&gt;100% locally, no API keys needed&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;If this saves you time or teaches you something new, a ⭐ on the repo helps others find it.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;github.com/zkzkGamal/Agentic-AI-Tutorial&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions in the comments — what agentic patterns are you building?&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>langchain</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building Strong ML Foundations: Chapter 2 - Classification is Now Live</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Sun, 10 May 2026 08:58:15 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/building-strong-ml-foundations-chapter-2-classification-is-now-live-2j5</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/building-strong-ml-foundations-chapter-2-classification-is-now-live-2j5</guid>
      <description>&lt;p&gt;A few weeks ago I published Chapter 1 of my hands-on AI tutorial series, focused on Regression. Today, I'm excited to share that &lt;strong&gt;Chapter 2: Classification&lt;/strong&gt; is complete.&lt;/p&gt;

&lt;p&gt;This series isn't just another collection of notebook tutorials. I'm building it to truly understand how these algorithms work under the hood — implementing them from scratch where it makes sense, comparing them properly, and focusing on concepts that actually matter in interviews and real projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s in Chapter 2
&lt;/h3&gt;

&lt;p&gt;I implemented and analyzed five core classification algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logistic Regression&lt;/strong&gt; (implemented from scratch with NumPy, plus scikit-learn version)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;K-Nearest Neighbors (KNN) Classifier&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Random Forest Classifier&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;XGBoost Classifier&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Vector Classifier (SVC)&lt;/strong&gt; with different kernels&lt;/li&gt;
&lt;/ul&gt;
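&lt;p&gt;To give a sense of what "from scratch with NumPy" involves for logistic regression, here is a minimal batch gradient-descent sketch. It is not the repo's implementation; the class name, learning rate, and iteration count are assumptions. It minimizes the mean binary cross-entropy, whose gradient with respect to the weights is Xᵀ(p - y) / n, where p holds the predicted probabilities.&lt;/p&gt;

```python
import numpy as np


def sigmoid(z):
    # Clip to keep np.exp from overflowing on extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


class LogisticRegressionGD:
    """Hypothetical from-scratch logistic regression trained by batch gradient descent."""

    def __init__(self, lr=0.1, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n, d = X.shape
        self.w = np.zeros(d)
        self.b = 0.0
        for _ in range(self.n_iters):
            p = sigmoid(X @ self.w + self.b)     # predicted probabilities
            self.w -= self.lr * (X.T @ (p - y) / n)  # cross-entropy gradient wrt w
            self.b -= self.lr * np.mean(p - y)       # cross-entropy gradient wrt b
        return self

    def predict_proba(self, X):
        return sigmoid(np.asarray(X, dtype=float) @ self.w + self.b)

    def predict(self, X, threshold=0.5):
        # Moving the threshold is where the precision/recall trade-off shows up.
        return (self.predict_proba(X) >= threshold).astype(int)
```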

&lt;h3&gt;
  
  
  Key Focus Areas
&lt;/h3&gt;

&lt;p&gt;This chapter goes deeper than just training models. I spent a lot of time on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visualizing decision boundaries for each algorithm&lt;/li&gt;
&lt;li&gt;Understanding probability estimates and calibration&lt;/li&gt;
&lt;li&gt;Bias-variance tradeoff in classification problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision vs Recall&lt;/strong&gt;: one of the most important topics for ML interviews. I dedicated a good portion of the chapter to explaining when to optimize for precision, when to prioritize recall, and how to use the F1-score effectively depending on the problem.&lt;/li&gt;
&lt;li&gt;Confusion matrices, ROC-AUC, and proper model evaluation&lt;/li&gt;
&lt;li&gt;Why ensemble methods (Random Forest and XGBoost) often outperform single models&lt;/li&gt;
&lt;/ul&gt;
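&lt;p&gt;To make the precision/recall trade-off concrete, here is a minimal stdlib sketch that computes both from binary labels, plus F1 as their harmonic mean. The function name is hypothetical, not taken from the repo. The definitions are precision = TP / (TP + FP) and recall = TP / (TP + FN).&lt;/p&gt;

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for binary labels; 0.0 where a ratio is undefined."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1      # predicted positive, actually positive
        elif p == positive:
            fp += 1      # false alarm: hurts precision
        elif t == positive:
            fn += 1      # missed positive: hurts recall
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

&lt;p&gt;For example, with &lt;code&gt;y_true = [1, 1, 1, 0, 0, 0, 1, 0]&lt;/code&gt; and &lt;code&gt;y_pred = [1, 0, 1, 1, 1, 0, 1, 0]&lt;/code&gt; there are 3 true positives, 2 false positives, and 1 false negative. That gives precision 0.6, recall 0.75, and F1 of about 0.667. Lowering a classifier's decision threshold generally raises recall at the cost of precision.&lt;/p&gt;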

&lt;p&gt;Everything is implemented cleanly using NumPy, scikit-learn, and XGBoost, with real datasets and detailed explanations.&lt;/p&gt;

&lt;p&gt;You can check out the full chapter here:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;→&lt;/strong&gt; &lt;a href="https://github.com/zkzkGamal/hands-on-ai-tutorial/tree/main/ml_fundamentals/chapter2" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/hands-on-ai-tutorial/tree/main/ml_fundamentals/chapter2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Chapter 1 (Regression) is available in the same repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I’m Doing This Publicly
&lt;/h3&gt;

&lt;p&gt;I got tired of only knowing how to call &lt;code&gt;model.fit()&lt;/code&gt; without understanding what was happening inside. This project is my way of forcing myself to learn deeply while creating a resource that can help others who want the same.&lt;/p&gt;

&lt;p&gt;If you're a developer transitioning into ML, preparing for machine learning interviews, or simply want stronger fundamentals, I believe this series can be useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Next?
&lt;/h3&gt;

&lt;p&gt;I'm planning Chapter 3 soon. I'm thinking about &lt;strong&gt;Dimensionality Reduction (PCA, t-SNE, UMAP)&lt;/strong&gt; or &lt;strong&gt;Advanced Model Evaluation &amp;amp; Hyperparameter Tuning&lt;/strong&gt;. Let me know in the comments what you'd like to see next.&lt;/p&gt;

&lt;p&gt;Feedback is always welcome — whether it's about the code, explanations, or structure.&lt;/p&gt;

&lt;p&gt;Happy to connect if you're on a similar learning journey.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvv26811iumef90gvz58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvv26811iumef90gvz58.png" alt=" " width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>classification</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Built My Own Hands-on AI Tutorial – Chapter 1: Regression (From Scratch + XGBoost)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Tue, 05 May 2026 13:41:41 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-my-own-hands-on-ai-tutorial-chapter-1-regression-from-scratch-xgboost-273h</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-my-own-hands-on-ai-tutorial-chapter-1-regression-from-scratch-xgboost-273h</guid>
      <description>&lt;p&gt;&lt;strong&gt;A few weeks ago, I revisited my old AI/ML projects.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As I looked through the code, I felt something was missing. I was using models like &lt;code&gt;RandomForestRegressor&lt;/code&gt; and &lt;code&gt;XGBRegressor&lt;/code&gt;, getting decent results… but I didn’t feel I &lt;em&gt;truly understood&lt;/em&gt; what was happening under the hood.&lt;/p&gt;

&lt;p&gt;So I made a decision:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of consuming more tutorials, I would &lt;strong&gt;build my own comprehensive Hands-on AI Tutorial&lt;/strong&gt; — first for myself, and then for the community.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today, I’m happy to announce that &lt;strong&gt;Chapter 1: Regression is complete&lt;/strong&gt;! 🎉&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s Inside Chapter 1
&lt;/h3&gt;

&lt;p&gt;I implemented and compared &lt;strong&gt;5 different regression techniques&lt;/strong&gt; on real-world datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linear Regression&lt;/strong&gt; — Implemented from scratch using the Normal Equation (NumPy only)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Decision Tree Regression&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Random Forest Regression&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XGBoost Regression&lt;/strong&gt; — This one consistently delivered impressive performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Vector Regression (SVR)&lt;/strong&gt; with linear, RBF, and polynomial kernels&lt;/li&gt;
&lt;/ul&gt;
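&lt;p&gt;The Normal Equation solves for all parameters in one step, θ = (XᵀX)⁻¹Xᵀy, where X carries a leading column of ones for the intercept. Below is a minimal NumPy sketch with hypothetical names, not the repo's code. It assumes X is 2-D (n_samples × n_features) and calls &lt;code&gt;np.linalg.solve&lt;/code&gt; on XᵀXθ = Xᵀy instead of inverting XᵀX. This evaluates the same formula in a numerically safer way.&lt;/p&gt;

```python
import numpy as np


def fit_normal_equation(X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.column_stack([np.ones(len(X)), X])      # prepend a bias column of ones
    # Solve (Xb^T Xb) theta = Xb^T y rather than forming the inverse explicitly.
    theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
    return theta                                    # theta[0] = intercept, theta[1:] = weights


def predict(theta, X):
    X = np.asarray(X, dtype=float)
    return theta[0] + X @ theta[1:]
```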

&lt;p&gt;For every algorithm, I did the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built a from-scratch version (where applicable)&lt;/li&gt;
&lt;li&gt;Compared it with the industry library version (scikit-learn / XGBoost)&lt;/li&gt;
&lt;li&gt;Explained the math intuitively&lt;/li&gt;
&lt;li&gt;Ran experiments on multiple datasets (House Prices, Life Expectancy, Advertising, Student Performance, etc.)&lt;/li&gt;
&lt;li&gt;Evaluated using MSE, RMSE, R², and residual plots&lt;/li&gt;
&lt;li&gt;Generated visualizations and saved models&lt;/li&gt;
&lt;/ul&gt;
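&lt;p&gt;The three evaluation metrics are short enough to write out by hand. In this stdlib sketch (the function name is hypothetical), MSE is the mean squared residual and RMSE is its square root, which puts it back in the target's units. R² = 1 - SS_res / SS_tot compares the model's squared error with that of always predicting the mean.&lt;/p&gt;

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    residual_ss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_t = sum(y_true) / n
    total_ss = sum((t - mean_t) ** 2 for t in y_true)
    mse = residual_ss / n
    rmse = math.sqrt(mse)                 # same units as the target
    # R² is undefined when every target value is identical (total_ss == 0).
    r2 = 1 - residual_ss / total_ss if total_ss else float("nan")
    return mse, rmse, r2
```

&lt;p&gt;For &lt;code&gt;y_true = [3, -0.5, 2, 7]&lt;/code&gt; and &lt;code&gt;y_pred = [2.5, 0.0, 2, 8]&lt;/code&gt;, the squared residuals sum to 1.5, so MSE is 0.375, and R² is 1 - 1.5 / 29.1875 ≈ 0.949.&lt;/p&gt;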

&lt;h3&gt;
  
  
  Key Learnings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Why simple Linear Regression is still a powerful baseline&lt;/li&gt;
&lt;li&gt;How Decision Trees can overfit and why ensembles (Random Forest &amp;amp; XGBoost) fix many of those issues&lt;/li&gt;
&lt;li&gt;The real power of &lt;strong&gt;boosting&lt;/strong&gt; vs &lt;strong&gt;bagging&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The importance of hyperparameter tuning and model evaluation&lt;/li&gt;
&lt;li&gt;How kernels work in SVR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most satisfying moment was watching &lt;strong&gt;XGBoost and Random Forest&lt;/strong&gt; outperform everything else — and finally understanding &lt;em&gt;why&lt;/em&gt; that happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Structure (Clean &amp;amp; Practical)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ml_fundamentals/chapter1/
├── notebooks/          &lt;span class="c"&gt;# Interactive Jupyter Notebook&lt;/span&gt;
├── src/                &lt;span class="c"&gt;# From-scratch implementations&lt;/span&gt;
├── docs/               &lt;span class="c"&gt;# Deep math explanations&lt;/span&gt;
├── configs/            &lt;span class="c"&gt;# Easy-to-modify YAML configs&lt;/span&gt;
├── data/               &lt;span class="c"&gt;# Real datasets&lt;/span&gt;
├── results/            &lt;span class="c"&gt;# Plots + reports&lt;/span&gt;
└── models/             &lt;span class="c"&gt;# Saved models&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Who Is This For?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Beginners who know Python and want to start ML properly&lt;/li&gt;
&lt;li&gt;Junior developers who want to move from copy-paste code to deep understanding&lt;/li&gt;
&lt;li&gt;Anyone who wants both theory and practical code in one place&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Try It Yourself
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/zkzkGamal/hands-on-ai-tutorial" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/hands-on-ai-tutorial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just clone, install the dependencies, and start with the Chapter 1 notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/zkzkGamal/hands-on-ai-tutorial.git
&lt;span class="nb"&gt;cd &lt;/span&gt;hands-on-ai-tutorial
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’m already working on &lt;strong&gt;Chapter 2: Classification&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrsb5v0zy7gcttvu6oy0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrsb5v0zy7gcttvu6oy0.png" alt="Model Comparison" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>I Spent 4 Hours Fixing Broken Imports – So I Built a Complete Agentic AI Tutorial</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:06:23 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-spent-4-hours-fixing-broken-imports-so-i-built-a-complete-agentic-ai-tutorial-5eee</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-spent-4-hours-fixing-broken-imports-so-i-built-a-complete-agentic-ai-tutorial-5eee</guid>
      <description>&lt;p&gt;*&lt;em&gt;A true story from last month: *&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I was building an intelligent agent using &lt;strong&gt;LangGraph + MCP&lt;/strong&gt;, and I asked Claude for the latest code to implement a Multi-Node Agent.&lt;/p&gt;

&lt;p&gt;It gave me clean-looking code. I copied it, ran it… &lt;strong&gt;Import error.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I went to GPT-4o. Different code, but still outdated imports.&lt;br&gt;&lt;br&gt;
Tried Gemini. Same problem.&lt;/p&gt;

&lt;p&gt;I lost &lt;strong&gt;over 4 hours&lt;/strong&gt; tweaking imports, updating StateGraph, and digging through LangChain's changelog… until I was completely frustrated.&lt;/p&gt;

&lt;p&gt;That's when I said: &lt;em&gt;Enough.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I decided to do something completely different.&lt;/p&gt;

&lt;p&gt;I started collecting &lt;strong&gt;only the modern code that actually works in 2026&lt;/strong&gt;, tested it myself, fixed what was broken, and organized everything in one place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
🔥 &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;&lt;strong&gt;Agentic AI Tutorial&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
A complete, up-to-date reference for building Agentic AI using the latest versions of &lt;strong&gt;LangChain + LangGraph + MCP&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why I built this repo
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;To end the daily struggle with outdated, copy-pasted, broken code&lt;/li&gt;
&lt;li&gt;To help you build powerful agents without wasting hours on debugging&lt;/li&gt;
&lt;li&gt;To give you &lt;strong&gt;one trusted, updated reference&lt;/strong&gt; where everything actually runs&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What's inside?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chapter&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;LLM basics + Streaming + Advanced Prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;LangChain LCEL + Tools + Chains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Advanced Memory + Full RAG (Chroma &amp;amp; FAISS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Advanced LangGraph (ReAct, Router, Multi-Agent, Self-Refine, Human-in-the-Loop)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Complete MCP + FastMCP Server + Multi-Node Agent System (Router → Execution → Summary)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
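&lt;p&gt;The Router → Execution → Summary flow in Chapter 5 comes down to passing a shared state through three nodes. Below is a library-free sketch of that control flow. It assumes a keyword rule instead of an LLM router and two hypothetical tools (&lt;code&gt;math&lt;/code&gt;, &lt;code&gt;echo&lt;/code&gt;). It is neither the repository's code nor LangGraph's API, just the shape of the idea.&lt;/p&gt;

```python
from typing import Callable, Dict, TypedDict


class AgentState(TypedDict, total=False):
    """Shared state passed between nodes (the same idea as a LangGraph state schema)."""
    question: str
    route: str
    result: str
    summary: str


# Hypothetical tools; in the real project these would be MCP tools exposed by a server.
TOOLS: Dict[str, Callable[[str], str]] = {
    "math": lambda q: str(sum(int(tok) for tok in q.split() if tok.isdigit())),
    "echo": lambda q: q.upper(),
}


def router_node(state: AgentState) -> AgentState:
    """Pick a tool for the request (a keyword rule stands in for an LLM call)."""
    route = "math" if any(tok.isdigit() for tok in state["question"].split()) else "echo"
    return {**state, "route": route}


def execution_node(state: AgentState) -> AgentState:
    """Run the tool chosen by the router."""
    return {**state, "result": TOOLS[state["route"]](state["question"])}


def summary_node(state: AgentState) -> AgentState:
    """Turn the raw tool output into a final answer (an LLM would normally write this)."""
    return {**state, "summary": f"[{state['route']}] {state['result']}"}


def run_agent(question: str) -> AgentState:
    """Run the three nodes in order: Router -> Execution -> Summary."""
    state: AgentState = {"question": question}
    for node in (router_node, execution_node, summary_node):
        state = node(state)
    return state
```

&lt;p&gt;In LangGraph, roughly speaking, each of these functions would be registered as a node on a &lt;code&gt;StateGraph&lt;/code&gt;, the fixed loop would be replaced by edges, and the router's choice would become a conditional edge. Check the repository's Chapter 5 for the exact, version-specific imports.&lt;/p&gt;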

&lt;p&gt;Everything is available as &lt;strong&gt;Jupyter Notebooks + Python files&lt;/strong&gt;, ready to run.&lt;br&gt;&lt;br&gt;
Works both &lt;strong&gt;locally&lt;/strong&gt; (Ollama) and in the &lt;strong&gt;cloud&lt;/strong&gt; (GPT-4o, Gemini).&lt;/p&gt;




&lt;h3&gt;
  
  
  A question for you:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Are you already at Chapter 5, or still at the beginning?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Drop a comment below 👇&lt;/p&gt;

&lt;p&gt;#AgenticAI #LangChain #LangGraph #MCP #Python #LLM #GenerativeAI #Opensource&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
