<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: zkaria gamal</title>
    <description>The latest articles on DEV Community by zkaria gamal (@zkaria_gamal_3cddbbff21c8).</description>
    <link>https://dev.to/zkaria_gamal_3cddbbff21c8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3769631%2F3d68bd01-7a2c-4665-9e8b-5c879b3811e5.jpg</url>
      <title>DEV Community: zkaria gamal</title>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zkaria_gamal_3cddbbff21c8"/>
    <language>en</language>
    <item>
      <title>How I Built a Zero-Shared-State Auth Middleware for a Real-Time Voice AI Agent (WebRTC + FastMCP + Whisper)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Mon, 25 May 2026 19:17:58 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/how-i-built-a-zero-shared-state-auth-middleware-for-a-real-time-voice-ai-agent-webrtc-fastmcp--56do</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/how-i-built-a-zero-shared-state-auth-middleware-for-a-real-time-voice-ai-agent-webrtc-fastmcp--56do</guid>
      <description>&lt;p&gt;I've been building an open-source real-time voice AI workspace for the past few weeks and I want to walk through the architecture decisions that were actually hard — not the happy-path stuff you see in tutorials.&lt;/p&gt;

&lt;p&gt;The stack: React client → WebRTC Python backend → FastMCP server (Whisper STT, Mail, Calendar) → transcript delivered back over a WebRTC DataChannel. The LLM orchestration layer is still in progress, but the pipeline underneath it is fully live and tested.&lt;/p&gt;

&lt;p&gt;Here's what I want to focus on: three engineering decisions that weren't obvious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem With Securing Local Microservices&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When two services run on the same machine — in this case the WebRTC server and the MCP server — the standard advice is to put them behind a shared secret or an API key stored in an environment variable. That works, but it has failure modes: leaked &lt;code&gt;.env&lt;/code&gt; files, rotation pain, and the cognitive overhead of managing secrets across services that should be able to trust each other without a database call.&lt;/p&gt;

&lt;p&gt;I wanted something stateless and self-expiring.&lt;/p&gt;

&lt;p&gt;The solution I landed on is a time-locked hash generator. Both servers independently compute the same key by applying deterministic math to the current UTC timestamp divided by a 5-second epoch window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_api_key&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;epoch_window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log10&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epoch_window&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;Starlette&lt;/span&gt; &lt;span class="n"&gt;middleware&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;MCP&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;recomputes&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;every&lt;/span&gt; &lt;span class="n"&gt;incoming&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;compares&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;If&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;off&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="n"&gt;more&lt;/span&gt; &lt;span class="n"&gt;than&lt;/span&gt; &lt;span class="n"&gt;one&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="n"&gt;five&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;rejected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;No&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt; &lt;span class="n"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;No&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;No&lt;/span&gt; &lt;span class="n"&gt;rotation&lt;/span&gt; &lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="n"&gt;rotates&lt;/span&gt; &lt;span class="n"&gt;itself&lt;/span&gt; &lt;span class="n"&gt;every&lt;/span&gt; &lt;span class="n"&gt;five&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;both&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt; &lt;span class="n"&gt;agree&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;what&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="n"&gt;should&lt;/span&gt; &lt;span class="n"&gt;be&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;production&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;grade&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;internet&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;facing&lt;/span&gt; &lt;span class="nf"&gt;services &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TOTP&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;proper&lt;/span&gt; &lt;span class="n"&gt;shared&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;better&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;but&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;securing&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt; &lt;span class="n"&gt;inter&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="n"&gt;communication&lt;/span&gt; &lt;span class="n"&gt;during&lt;/span&gt; &lt;span class="n"&gt;development&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;staging&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auditable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;has&lt;/span&gt; &lt;span class="n"&gt;zero&lt;/span&gt; &lt;span class="n"&gt;ops&lt;/span&gt; &lt;span class="n"&gt;overhead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;Dual&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Rate&lt;/span&gt; &lt;span class="n"&gt;Audio&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;

&lt;span class="n"&gt;WebRTC&lt;/span&gt; &lt;span class="n"&gt;gives&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="n"&gt;kHz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Whisper&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;happiest&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="n"&gt;kHz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="sb"&gt;`webrtcvad`&lt;/span&gt; &lt;span class="n"&gt;only&lt;/span&gt; &lt;span class="n"&gt;accepts&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="n"&gt;kHz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Feeding&lt;/span&gt; &lt;span class="n"&gt;everything&lt;/span&gt; &lt;span class="n"&gt;through&lt;/span&gt; &lt;span class="n"&gt;one&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="n"&gt;loses&lt;/span&gt; &lt;span class="n"&gt;either&lt;/span&gt; &lt;span class="n"&gt;fidelity&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;transcription&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;compatibility&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;VAD&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="n"&gt;handles&lt;/span&gt; &lt;span class="n"&gt;both&lt;/span&gt; &lt;span class="n"&gt;independently&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;same&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="n"&gt;processing&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;full&lt;/span&gt; &lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="n"&gt;kHz&lt;/span&gt; &lt;span class="n"&gt;PCM&lt;/span&gt; &lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="n"&gt;accumulates&lt;/span&gt; &lt;span class="n"&gt;separately&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;Whisper&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="n"&gt;parallel&lt;/span&gt; &lt;span class="n"&gt;downsampled&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="n"&gt;kHz&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="n"&gt;feeds&lt;/span&gt; &lt;span class="sb"&gt;`webrtcvad`&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="n"&gt;aggressiveness&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt; &lt;span class="n"&gt;sliding&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="n"&gt;tracks&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;ratio&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;silent&lt;/span&gt; &lt;span class="n"&gt;frames&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;When&lt;/span&gt; &lt;span class="n"&gt;fewer&lt;/span&gt; &lt;span class="n"&gt;than&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="n"&gt;frames&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;last&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt; &lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the boundary

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
python&lt;br&gt;
SILENCE_RATIO_THRESHOLD = 0.1&lt;br&gt;
SILENCE_DURATION_SECONDS = 2.0&lt;/p&gt;

&lt;p&gt;active_frames = sum(vad_window)&lt;br&gt;
total_frames = len(vad_window)&lt;br&gt;
if active_frames / total_frames &amp;lt; SILENCE_RATIO_THRESHOLD:&lt;br&gt;
    trigger_pipeline()&lt;/p&gt;

&lt;p&gt;Splitting the buffers means you get high-quality STT input and accurate VAD detection without either compromising the other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service Singletons and the Cold Start Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whisper is not fast to load. If you initialize the model on the first request, your first transcription takes 3–6 seconds depending on hardware. Every user who speaks first gets a broken experience.&lt;/p&gt;

&lt;p&gt;The fix is a &lt;code&gt;LoadModelService&lt;/code&gt; singleton that runs at server startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoadModelService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_model&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;whisper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_model&lt;/span&gt;

&lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="n"&gt;gets&lt;/span&gt; &lt;span class="n"&gt;called&lt;/span&gt; &lt;span class="n"&gt;inside&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt; &lt;span class="n"&gt;lifespan&lt;/span&gt; &lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;so&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt; &lt;span class="n"&gt;arrives&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;already&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Every&lt;/span&gt; &lt;span class="n"&gt;subsequent&lt;/span&gt; &lt;span class="n"&gt;transcription&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="n"&gt;hits&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;warm&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;same&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="n"&gt;applies&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;mail&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;calendar&lt;/span&gt; &lt;span class="n"&gt;services&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="n"&gt;singletons&lt;/span&gt; &lt;span class="n"&gt;initialized&lt;/span&gt; &lt;span class="n"&gt;once&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reused&lt;/span&gt; &lt;span class="n"&gt;across&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="nf"&gt;limiter &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;Gmail&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sitting&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;front&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;anything&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;touches&lt;/span&gt; &lt;span class="n"&gt;an&lt;/span&gt; &lt;span class="n"&gt;external&lt;/span&gt; &lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;Pytest&lt;/span&gt; &lt;span class="n"&gt;Suite&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;

&lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t calibrate a VAD pipeline without tests. The suite covers:

- Frame decimation accuracy at different sample rates
- Speech onset boundary detection under various silence patterns
- SMTP integration with mock SMTP server
- Calendar tool with automatic `.ics` fallback when no calendar service is configured

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;[ RUN  ] test_frame_decimation_48k_to_16k&lt;br&gt;
[ OK   ] test_frame_decimation_48k_to_16k&lt;br&gt;
[ RUN  ] test_vad_silence_boundary_2s&lt;br&gt;
[ OK   ] test_vad_silence_boundary_2s&lt;br&gt;
[ RUN  ] test_smtp_send_integration&lt;br&gt;
[ OK   ] test_smtp_send_integration&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


Running `pytest tests/ -v` from the `mcp/` directory gives you live output with real pass/fail visibility — not just a summary at the end.

**What's Next**

The LLM orchestration and conversation routing layer is actively in development. Once that's in, the full loop closes: speech → STT → LLM agent → tool use → response.

The entire codebase is open source and structured as an educational reference for WebRTC, MCP, and secure microservices. If you're building anything in this space — voice agents, real-time audio pipelines, MCP tool servers — I'd love contributions, issues, or just a look.


![Stt Flow](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dapxxg3ypbj526ci35oh.jpeg)

GitHub: https://github.com/zkzkGamal/AI-RTC-Agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>webrtc</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I built a fully tested Agentic AI system with LangGraph + MCP and open-sourced the whole thing</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Mon, 18 May 2026 10:24:32 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-a-fully-tested-agentic-ai-system-with-langgraph-mcp-and-open-sourced-the-whole-thing-3gb3</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-a-fully-tested-agentic-ai-system-with-langgraph-mcp-and-open-sourced-the-whole-thing-3gb3</guid>
      <description>&lt;p&gt;Most LLM tutorials stop at "here's how to call the OpenAI API."&lt;/p&gt;

&lt;p&gt;Mine doesn't.&lt;/p&gt;

&lt;p&gt;I just shipped &lt;strong&gt;v1.1.0&lt;/strong&gt; of &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;Agentic AI Tutorial&lt;/a&gt; — a 5-chapter open-source repo that takes you from your first raw API call all the way to a &lt;strong&gt;production-style multi-node autonomous agent&lt;/strong&gt; with a CI pipeline, pytest suite, and MCP server integration.&lt;/p&gt;

&lt;p&gt;Here's what's inside and why I built it the way I did.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ The Architecture (Chapter 5)
&lt;/h2&gt;

&lt;p&gt;The final agent uses a &lt;strong&gt;LangGraph StateGraph&lt;/strong&gt; with 4 decoupled nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Router&lt;/strong&gt; — classifies user intent with a cheap, fast LLM call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute&lt;/strong&gt; — runs a LangChain ReAct agent bound to a local FastMCP server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summarize&lt;/strong&gt; — converts raw tool JSON into natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation&lt;/strong&gt; — handles chitchat directly, skipping tool execution entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9a1x8csb5c9qo7uzl31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9a1x8csb5c9qo7uzl31.png" alt="Architecture" width="295" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The MCP server exposes math and email tools over SSE. The agent never touches your credentials directly — it talks to the server, which acts as a secure boundary.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 Why I Added Tests to an AI Project
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth about agentic systems: &lt;strong&gt;they don't fail loudly. They drift.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Change one node prompt, and suddenly the router misclassifies 20% of requests. No exception thrown. No stack trace. Just wrong output that you may not catch until a user reports it.&lt;/p&gt;

&lt;p&gt;So v1.1.0 ships with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;pytest suite&lt;/strong&gt; that validates each node's logic and MCP tool contracts independently — no live API calls needed&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;GitHub Actions CI workflow&lt;/strong&gt; that runs on every push across multiple Python versions&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;custom &lt;code&gt;conftest.py&lt;/code&gt;&lt;/strong&gt; reporter that gives real-time output with zero buffering lag
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pytest Chapter5/SimpleChatAgent/ &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  📚 Full Roadmap (All 5 Chapters)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chapter&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;LLM fundamentals — OpenAI, Gemini, Ollama, streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;LangChain, LCEL, chains, tool binding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Memory, entity tracking, RAG with Chroma/FAISS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;LangGraph agents — ReAct, Router, Multi-Agent, Human-in-the-Loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Multi-node agent + FastMCP Server + CI/pytest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Get Started in 3 Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/zkzkGamal/Agentic-AI-Tutorial.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Agentic-AI-Tutorial
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each chapter has its own &lt;code&gt;.env.example&lt;/code&gt;. Ollama users can run everything &lt;strong&gt;100% locally, no API keys needed&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;If this saves you time or teaches you something new, a ⭐ on the repo helps others find it.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;github.com/zkzkGamal/Agentic-AI-Tutorial&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions in the comments — what agentic patterns are you building?&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>langchain</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building Strong ML Foundations: Chapter 2 - Classification is Now Live</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Sun, 10 May 2026 08:58:15 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/building-strong-ml-foundations-chapter-2-classification-is-now-live-2j5</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/building-strong-ml-foundations-chapter-2-classification-is-now-live-2j5</guid>
      <description>&lt;p&gt;A few weeks ago I published Chapter 1 of my hands-on AI tutorial series, focused on Regression. Today, I'm excited to share that &lt;strong&gt;Chapter 2: Classification&lt;/strong&gt; is complete.&lt;/p&gt;

&lt;p&gt;This series isn't just another collection of notebook tutorials. I'm building it to truly understand how these algorithms work under the hood — implementing them from scratch where it makes sense, comparing them properly, and focusing on concepts that actually matter in interviews and real projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s in Chapter 2
&lt;/h3&gt;

&lt;p&gt;I implemented and analyzed five core classification algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logistic Regression&lt;/strong&gt; (implemented from scratch with NumPy, plus scikit-learn version)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;K-Nearest Neighbors (KNN) Classifier&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Random Forest Classifier&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;XGBoost Classifier&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Vector Classifier (SVC)&lt;/strong&gt; with different kernels&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Focus Areas
&lt;/h3&gt;

&lt;p&gt;This chapter goes deeper than just training models. I spent a lot of time on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visualizing decision boundaries for each algorithm&lt;/li&gt;
&lt;li&gt;Understanding probability estimates and calibration&lt;/li&gt;
&lt;li&gt;Bias-variance tradeoff in classification problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision vs Recall&lt;/strong&gt; — one of the most important topics for ML interviews. I dedicated a good portion explaining when to optimize for precision, when to prioritize recall, and how to use F1-score effectively depending on the problem.&lt;/li&gt;
&lt;li&gt;Confusion matrices, ROC-AUC, and proper model evaluation&lt;/li&gt;
&lt;li&gt;Why ensemble methods (Random Forest and XGBoost) consistently outperform single models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything is implemented cleanly using NumPy, scikit-learn, and XGBoost, with real datasets and detailed explanations.&lt;/p&gt;

&lt;p&gt;You can check out the full chapter here:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;→&lt;/strong&gt; &lt;a href="https://github.com/zkzkGamal/hands-on-ai-tutorial/tree/main/ml_fundamentals/chapter2" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/hands-on-ai-tutorial/tree/main/ml_fundamentals/chapter2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Chapter 1 (Regression) is available in the same repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I’m Doing This Publicly
&lt;/h3&gt;

&lt;p&gt;I got tired of only knowing how to call &lt;code&gt;model.fit()&lt;/code&gt; without understanding what was happening inside. This project is my way of forcing myself to learn deeply while creating a resource that can help others who want the same.&lt;/p&gt;

&lt;p&gt;If you're a developer transitioning into ML, preparing for machine learning interviews, or simply want stronger fundamentals, I believe this series can be useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Next?
&lt;/h3&gt;

&lt;p&gt;I'm planning Chapter 3 soon. I'm thinking about &lt;strong&gt;Dimensionality Reduction (PCA, t-SNE, UMAP)&lt;/strong&gt; or &lt;strong&gt;Advanced Model Evaluation &amp;amp; Hyperparameter Tuning&lt;/strong&gt;. Let me know in the comments what you'd like to see next.&lt;/p&gt;

&lt;p&gt;Feedback is always welcome — whether it's about the code, explanations, or structure.&lt;/p&gt;

&lt;p&gt;Happy to connect if you're on a similar learning journey.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvv26811iumef90gvz58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvv26811iumef90gvz58.png" alt=" " width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>classification</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Built My Own Hands-on AI Tutorial – Chapter 1: Regression (From Scratch + XGBoost)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Tue, 05 May 2026 13:41:41 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-my-own-hands-on-ai-tutorial-chapter-1-regression-from-scratch-xgboost-273h</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-my-own-hands-on-ai-tutorial-chapter-1-regression-from-scratch-xgboost-273h</guid>
      <description>&lt;p&gt;&lt;strong&gt;A few weeks ago, I revisited my old AI/ML projects.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As I looked through the code, I felt something was missing. I was using models like &lt;code&gt;RandomForestRegressor&lt;/code&gt; and &lt;code&gt;XGBRegressor&lt;/code&gt;, getting decent results… but I didn’t feel I &lt;em&gt;truly understood&lt;/em&gt; what was happening under the hood.&lt;/p&gt;

&lt;p&gt;So I made a decision:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of consuming more tutorials, I would &lt;strong&gt;build my own comprehensive Hands-on AI Tutorial&lt;/strong&gt; — first for myself, and then for the community.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today, I’m happy to announce that &lt;strong&gt;Chapter 1: Regression is complete&lt;/strong&gt;! 🎉&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s Inside Chapter 1
&lt;/h3&gt;

&lt;p&gt;I implemented and compared &lt;strong&gt;5 different regression techniques&lt;/strong&gt; on real-world datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linear Regression&lt;/strong&gt; — Implemented from scratch using the Normal Equation (NumPy only)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Decision Tree Regression&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Random Forest Regression&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XGBoost Regression&lt;/strong&gt; — This one consistently delivered impressive performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support Vector Regression (SVR)&lt;/strong&gt; with linear, RBF, and polynomial kernels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For every algorithm, I did the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built a from-scratch version (where applicable)&lt;/li&gt;
&lt;li&gt;Compared it with the industry library version (scikit-learn / XGBoost)&lt;/li&gt;
&lt;li&gt;Explained the math intuitively&lt;/li&gt;
&lt;li&gt;Ran experiments on multiple datasets (House Prices, Life Expectancy, Advertising, Student Performance, etc.)&lt;/li&gt;
&lt;li&gt;Evaluated using MSE, RMSE, R², and residual plots&lt;/li&gt;
&lt;li&gt;Generated visualizations and saved models&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Learnings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Why simple Linear Regression is still a powerful baseline&lt;/li&gt;
&lt;li&gt;How Decision Trees can overfit and why ensembles (Random Forest &amp;amp; XGBoost) fix many of those issues&lt;/li&gt;
&lt;li&gt;The real power of &lt;strong&gt;boosting&lt;/strong&gt; vs &lt;strong&gt;bagging&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The importance of hyperparameter tuning and model evaluation&lt;/li&gt;
&lt;li&gt;How kernels work in SVR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most satisfying moment was watching &lt;strong&gt;XGBoost and Random Forest&lt;/strong&gt; outperform everything else — and finally understanding &lt;em&gt;why&lt;/em&gt; that happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Structure (Clean &amp;amp; Practical)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ml_fundamentals/chapter1/
├── notebooks/          &lt;span class="c"&gt;# Interactive Jupyter Notebook&lt;/span&gt;
├── src/                &lt;span class="c"&gt;# From-scratch implementations&lt;/span&gt;
├── docs/               &lt;span class="c"&gt;# Deep math explanations&lt;/span&gt;
├── configs/            &lt;span class="c"&gt;# Easy-to-modify YAML configs&lt;/span&gt;
├── data/               &lt;span class="c"&gt;# Real datasets&lt;/span&gt;
├── results/            &lt;span class="c"&gt;# Plots + reports&lt;/span&gt;
└── models/             &lt;span class="c"&gt;# Saved models&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Who Is This For?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Beginners who know Python and want to start ML properly&lt;/li&gt;
&lt;li&gt;Juniors who want to move from “copy-paste” to deep understanding&lt;/li&gt;
&lt;li&gt;Anyone who wants both theory and practical code in one place&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Try It Yourself
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/zkzkGamal/hands-on-ai-tutorial" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/hands-on-ai-tutorial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just clone, install the dependencies, and start with the Chapter 1 notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/zkzkGamal/hands-on-ai-tutorial.git
&lt;span class="nb"&gt;cd &lt;/span&gt;hands-on-ai-tutorial
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’m already working on &lt;strong&gt;Chapter 2: Classification&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrsb5v0zy7gcttvu6oy0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrsb5v0zy7gcttvu6oy0.png" alt="Model Comparetion" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>I Spent 4 Hours Fixing Broken Imports – So I Built a Complete Agentic AI Tutorial</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Mon, 06 Apr 2026 08:06:23 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-spent-4-hours-fixing-broken-imports-so-i-built-a-complete-agentic-ai-tutorial-5eee</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-spent-4-hours-fixing-broken-imports-so-i-built-a-complete-agentic-ai-tutorial-5eee</guid>
      <description>&lt;p&gt;*&lt;em&gt;A true story from last month: *&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I was building an intelligent agent using &lt;strong&gt;LangGraph + MCP&lt;/strong&gt;, and I asked Claude for the latest code to implement a Multi-Node Agent.&lt;/p&gt;

&lt;p&gt;It gave me a clean-looking code. I copied it, ran it… &lt;strong&gt;Import error.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I went to GPT-4o. Different code, but still outdated imports.&lt;br&gt;&lt;br&gt;
Tried Gemini. Same problem.&lt;/p&gt;

&lt;p&gt;I lost &lt;strong&gt;over 4 hours&lt;/strong&gt; tweaking imports, updating StateGraph, digging through LangChain's changelog… until I hit complete frustration.&lt;/p&gt;

&lt;p&gt;That's when I said: &lt;em&gt;Enough.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I decided to do something completely different.&lt;/p&gt;

&lt;p&gt;I started collecting &lt;strong&gt;only the modern code that actually works in 2026&lt;/strong&gt;, tested it myself, fixed what was broken, and organized everything in one place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
🔥 &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;&lt;strong&gt;Agentic AI Tutorial&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
A complete, up-to-date reference for building Agentic AI using the latest versions of &lt;strong&gt;LangChain + LangGraph + MCP&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why I built this repo
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;To end the daily struggle of "outdated/copied-pasted-broken code"&lt;/li&gt;
&lt;li&gt;To help you build powerful agents without wasting hours on debugging&lt;/li&gt;
&lt;li&gt;To give you &lt;strong&gt;one trusted, updated reference&lt;/strong&gt; where everything actually runs&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What's inside?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chapter&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;LLM basics + Streaming + Advanced Prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;LangChain LCEL + Tools + Chains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Advanced Memory + Full RAG (Chroma &amp;amp; FAISS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Advanced LangGraph (ReAct, Router, Multi-Agent, Self-Refine, Human-in-the-Loop)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Complete MCP + FastMCP Server + Multi-Node Agent System (Router → Execution → Summary)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Everything is available as &lt;strong&gt;Jupyter Notebooks + Python files&lt;/strong&gt;, ready to run.&lt;br&gt;&lt;br&gt;
Works &lt;strong&gt;locally&lt;/strong&gt; (Ollama) and &lt;strong&gt;cloud&lt;/strong&gt; (GPT-4o, Gemini).&lt;/p&gt;




&lt;h3&gt;
  
  
  A question for you:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Are you already at Chapter 5, or still at the beginning?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Drop a comment below 👇&lt;/p&gt;

&lt;h1&gt;
  
  
  AgenticAI&lt;code&gt;&lt;/code&gt;#LangChain&lt;code&gt;&lt;/code&gt;#LangGraph&lt;code&gt;&lt;/code&gt;#MCP&lt;code&gt;&lt;/code&gt;#Python&lt;code&gt;&lt;/code&gt;#LLM&lt;code&gt;&lt;/code&gt;#GenerativeAI&lt;code&gt;&lt;/code&gt;#Opensource
&lt;/h1&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building Production-Ready Agentic AI: From Tutorial to High-Performance Serving (vLLM vs SGLang Benchmark)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Sun, 29 Mar 2026 11:14:24 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/building-production-ready-agentic-ai-from-tutorial-to-high-performance-serving-vllm-vs-sglang-58nh</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/building-production-ready-agentic-ai-from-tutorial-to-high-performance-serving-vllm-vs-sglang-58nh</guid>
      <description>&lt;h1&gt;
  
  
  Building Production-Ready Agentic AI: From Tutorial to Real-World Serving Benchmark
&lt;/h1&gt;

&lt;p&gt;Hey devs 👋&lt;/p&gt;

&lt;p&gt;If you’ve been building &lt;strong&gt;ReAct agents&lt;/strong&gt; with LangGraph, you’ve probably faced the same question I did:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I can build a cool agent in a tutorial… but which serving engine should I actually use in production?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s why I connected my two repositories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;Agentic-AI-Tutorial&lt;/a&gt;&lt;/strong&gt; → Learn how to build a full ReAct agent from scratch  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/zkzkGamal/concurrent-llm-serving" rel="noopener noreferrer"&gt;concurrent-llm-serving&lt;/a&gt;&lt;/strong&gt; → Benchmark &lt;strong&gt;vLLM vs SGLang&lt;/strong&gt; under heavy agent load&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now the two repos are linked: the &lt;strong&gt;exact same agent&lt;/strong&gt; from the tutorial is included as &lt;code&gt;simpleagent/&lt;/code&gt; inside the benchmark repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Inside the Agentic AI Tutorial
&lt;/h2&gt;

&lt;p&gt;You start with a clean, production-style &lt;strong&gt;LangGraph ReAct Agent&lt;/strong&gt; that has three nodes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conversation&lt;/strong&gt; – Handles multi-turn dialogue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Act&lt;/strong&gt; – Calls real tools (DuckDuckGo Search + Calculator)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Summarize&lt;/strong&gt; – Processes long document context (10k+ tokens)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything is explained step-by-step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Tool calling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Structured outputs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Memory management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Error handling&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Repo → &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/Agentic-AI-Tutorial&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Piece: Which Engine Should You Serve It With?
&lt;/h2&gt;

&lt;p&gt;Tutorials usually stop at “run it locally.”  &lt;/p&gt;

&lt;p&gt;I wanted to go further.&lt;/p&gt;

&lt;p&gt;So I took the &lt;strong&gt;exact same agent&lt;/strong&gt; and stress-tested it under &lt;strong&gt;3 concurrent sessions&lt;/strong&gt; (5 turns each, up to ~25,000 tokens total context) using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Model: &lt;strong&gt;Qwen3.5-0.8B&lt;/strong&gt; (single GPU)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Engines: &lt;strong&gt;vLLM&lt;/strong&gt; vs &lt;strong&gt;SGLang&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full benchmark report is here:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zkzkGamal/concurrent-llm-serving/blob/main/README_agent_benchmark.md" rel="noopener noreferrer"&gt;README_agent_benchmark.md&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  High-Level Results
&lt;/h3&gt;

&lt;p&gt;| Metric                        | vLLM          | SGLang         | Winner      |&lt;/p&gt;

&lt;p&gt;|-------------------------------|---------------|----------------|-------------|&lt;/p&gt;

&lt;p&gt;| Total Wall Time (3 sessions)  | &lt;strong&gt;229.8s&lt;/strong&gt;    | 255.8s         | &lt;strong&gt;vLLM&lt;/strong&gt; (-11%) |&lt;/p&gt;

&lt;p&gt;| Context Limit Errors          | &lt;strong&gt;0&lt;/strong&gt;         | 2              | &lt;strong&gt;vLLM&lt;/strong&gt;    |&lt;/p&gt;

&lt;p&gt;| Successful Sessions           | 3/3           | 3/3            | Tie         |&lt;/p&gt;

&lt;h3&gt;
  
  
  Node-Level Breakdown (this is where it gets interesting)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Act Node (Tool Calling)&lt;/strong&gt; → &lt;strong&gt;SGLang wins by 71%&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to &lt;strong&gt;RadixAttention&lt;/strong&gt; prefix caching — perfect for repeated tool calls.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Summarize Node (Long Context)&lt;/strong&gt; → &lt;strong&gt;vLLM wins&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Much more stable when context balloons to 10k+ tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use &lt;strong&gt;SGLang&lt;/strong&gt; if your agents do a lot of tool calling in loops.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use &lt;strong&gt;vLLM&lt;/strong&gt; if your agents handle heavy RAG or summarization workloads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Use Both Repos Together (The Full Flow)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Clone the tutorial and build your agent
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
   git clone https://github.com/zkzkGamal/Agentic-AI-Tutorial.git

   &lt;span class="nb"&gt;cd &lt;/span&gt;Agentic-AI-Tutorial

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Move to the serving benchmark repo (now includes &lt;code&gt;simpleagent/&lt;/code&gt;)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
   git clone https://github.com/zkzkGamal/concurrent-llm-serving.git

   &lt;span class="nb"&gt;cd &lt;/span&gt;concurrent-llm-serving/simpleagent

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Run the exact same agent with either engine using the provided launch scripts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything is documented — you can literally go from learning the agent pattern to benchmarking production serving in minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Most agent tutorials leave you with a notebook.  &lt;/p&gt;

&lt;p&gt;This project gives you the &lt;strong&gt;complete pipeline&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Build the agent ✅&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Understand the serving trade-offs ✅&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose the right engine for your workload ✅&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy it at scale ✅&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Tutorial repo: &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;Agentic-AI-Tutorial&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Benchmark repo (with integrated simpleagent): &lt;a href="https://github.com/zkzkGamal/concurrent-llm-serving" rel="noopener noreferrer"&gt;concurrent-llm-serving&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpfvsbt9o4g5ttj8illf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpfvsbt9o4g5ttj8illf.png" alt="log demo for sglang" width="800" height="243"&gt;&lt;/a&gt;&lt;br&gt;
Would love to hear from you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What serving engine are you using for your agents today?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Have you noticed the same trade-offs between vLLM and SGLang?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Want me to add more models / workloads / frameworks (CrewAI, AutoGen, etc.)?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop your thoughts below 👇&lt;/p&gt;

&lt;p&gt;Happy building!  &lt;/p&gt;

&lt;p&gt;— zkzkGamal&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Concurrent LLM Serving: Benchmarking vLLM vs SGLang vs Ollama</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Mon, 16 Mar 2026 08:47:59 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/concurrent-llm-serving-benchmarking-vllm-vs-sglang-vs-ollama-1cpn</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/concurrent-llm-serving-benchmarking-vllm-vs-sglang-vs-ollama-1cpn</guid>
      <description>&lt;p&gt;I wanted to know exactly how the three most popular open-source LLM serving engines perform when &lt;strong&gt;real users hit your server at the same time&lt;/strong&gt;. So I built this educational repo and ran identical tests on a single GPU.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/zkzkGamal/concurrent-llm-serving" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/concurrent-llm-serving&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model:&lt;/strong&gt; Qwen/Qwen3.5-0.8B  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware:&lt;/strong&gt; Single GPU  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrency:&lt;/strong&gt; 16 simultaneous requests (only 4 for Ollama)  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; Diverse AI &amp;amp; programming questions (max_tokens=150)&lt;/p&gt;

&lt;h2&gt;
  
  
  The Results (spoiler: one engine destroys the others)
&lt;/h2&gt;

&lt;p&gt;| Engine   | Requests | Total Time | Avg per Request       | Concurrency Model          |&lt;/p&gt;

&lt;p&gt;|----------|----------|------------|-----------------------|----------------------------|&lt;/p&gt;

&lt;p&gt;| &lt;strong&gt;SGLang&lt;/strong&gt; | 16       | &lt;strong&gt;2.47s&lt;/strong&gt;  | 0.68–2.46s            | True parallel batching + RadixAttention |&lt;/p&gt;

&lt;p&gt;| &lt;strong&gt;vLLM&lt;/strong&gt;   | 16       | &lt;strong&gt;11.26s&lt;/strong&gt; | ~10.25–11.26s         | PagedAttention + continuous batching |&lt;/p&gt;

&lt;p&gt;| &lt;strong&gt;Ollama&lt;/strong&gt; | 4        | &lt;strong&gt;134.72s&lt;/strong&gt;| 26–134s               | Sequential (time-sliced)   |&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SGLang was 4.6× faster&lt;/strong&gt; than vLLM and completely smoked Ollama.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the huge difference? The Core Algorithms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. KV-Cache &amp;amp; Memory Management
&lt;/h3&gt;

&lt;p&gt;Every LLM needs to store Key/Value vectors for previous tokens. Without smart caching, you waste VRAM and kill concurrency.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;vLLM&lt;/strong&gt; → &lt;strong&gt;PagedAttention&lt;/strong&gt; (treats KV cache like OS virtual memory pages → no fragmentation)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SGLang&lt;/strong&gt; → &lt;strong&gt;RadixAttention&lt;/strong&gt; (trie-based prefix tree → shares any common prefix across requests automatically)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Continuous Batching
&lt;/h3&gt;

&lt;p&gt;Instead of waiting for a full batch, new requests join the GPU forward pass instantly. Both vLLM and SGLang do this. Ollama does &lt;strong&gt;not&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Other Tricks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SGLang: Chunked prefill + custom Triton kernels + zero warm-up&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;vLLM: Broad model support + CUDA graph warm-up&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ollama: GGUF quantization + llama.cpp (great for single-user, terrible for concurrency)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Should You Use Each Engine?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SGLang&lt;/strong&gt; → Maximum throughput, structured JSON/regex output, production serving&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;vLLM&lt;/strong&gt; → Stability, 50+ model architectures, when you need reliability over raw speed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ollama&lt;/strong&gt; → Quick prototyping, local development, zero-config experience&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Reproduce the Tests Yourself
&lt;/h2&gt;

&lt;p&gt;The repo includes everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;install.sh&lt;/code&gt; (one-click setup)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;sglang_concurrent_test.py&lt;/code&gt; / &lt;code&gt;vllm_concurrent_test.py&lt;/code&gt; / &lt;code&gt;ollama_concurrent_test.py&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Raw logs + result markdowns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Video demos (&lt;code&gt;test_sglang.mkv&lt;/code&gt;, &lt;code&gt;ollama_test.mkv&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just clone and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
git clone https://github.com/zkzkGamal/concurrent-llm-serving

&lt;span class="nb"&gt;cd &lt;/span&gt;concurrent-llm-serving

bash install.sh

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Full startup commands and API compatibility notes are in the README.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
├── sglang_concurrent_test.py     # 16 concurrent requests

├── vllm_concurrent_test.py

├── ollama_concurrent_test.py

├── install.sh

├── *_results.md                  # Formatted benchmark outputs

└── README.md                     # Full deep-dive guide

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Concurrent serving is no longer optional — it's table stakes for any serious LLM application. The difference between "works on my machine" and "handles 16 users at once" is huge, and the right engine choice can save you GPUs (and money).&lt;/p&gt;

&lt;p&gt;If you're building anything with local LLMs — agents, RAG, chat apps, etc. — I highly recommend trying SGLang first.&lt;/p&gt;

&lt;p&gt;⭐ Star the repo if you found it useful!  &lt;/p&gt;

&lt;p&gt;Feedback, PRs, and questions are all welcome.&lt;/p&gt;

&lt;p&gt;What engine are you using right now? Have you hit concurrency limits yet? Drop a comment below 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq67n5on61tf9xudgnrnj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq67n5on61tf9xudgnrnj.png" alt="Output demo for sglang execution" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  llm #vllm #sglang #ollama #aiserving #machinelearning #opensource #gpu #inference
&lt;/h1&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Building a Production-Ready Agentic AI System with LangGraph and MCP</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Tue, 10 Mar 2026 12:57:31 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/building-a-production-ready-agentic-ai-system-with-langgraph-and-mcp-4kfh</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/building-a-production-ready-agentic-ai-system-with-langgraph-and-mcp-4kfh</guid>
      <description>&lt;p&gt;As AI engineers, we often start by giving a massive LLM (like GPT-4) a giant prompt and a long list of Python functions (tools) it can call. This monolithic approach works for simple scripts, but it quickly becomes expensive, slow, and a &lt;strong&gt;security risk&lt;/strong&gt; when the LLM has direct access to sensitive resources like email passwords or file systems.&lt;/p&gt;

&lt;p&gt;To move beyond prototypes, we need to &lt;strong&gt;separate the "thinking" from the "doing."&lt;/strong&gt; In this chapter of my open‑source tutorial, we rebuild an agentic assistant using two powerful technologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt; – for routing logic across specialized nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; – for secure, decoupled tool execution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a production‑ready, decoupled AI system that is safer, faster, and more reusable.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;GitHub Repository: Agentic-AI-Tutorial&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ Architecture: Brain vs. Hands
&lt;/h2&gt;

&lt;p&gt;Instead of a single monolithic agent, our application consists of two distinct parts communicating over &lt;strong&gt;Server‑Sent Events (SSE)&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Hands: FastMCP Server
&lt;/h3&gt;

&lt;p&gt;All physical tools (math logic, Python SMTP/IMAP email operations) live in a standalone ASGI server running on port 8000. Using the &lt;strong&gt;Model Context Protocol&lt;/strong&gt; (an open standard), we expose these functions securely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# McpServer/tools/weather.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_temperature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get the current temperature for a given city.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is 72°F.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thanks to MCP, the server automatically reads the docstrings and type hints, generating a standardized JSON schema. Crucially, the LangGraph agent &lt;strong&gt;never possesses your passwords&lt;/strong&gt; nor directly executes this code – it only sends requests via the protocol.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Brain: LangGraph Orchestrator
&lt;/h3&gt;

&lt;p&gt;On the other side, we built a &lt;code&gt;StateGraph&lt;/code&gt; with distinct nodes to handle user requests efficiently.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;router.py&lt;/code&gt; (The Fast Path)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Uses a fast, cheap LLM to classify the user's intent: &lt;code&gt;math&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt;, or &lt;code&gt;conversation&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;execute.py&lt;/code&gt; (The Heavy Lifter)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If the router chooses a tool‑based intent, this node takes over. It uses LangChain’s &lt;code&gt;create_tool_calling_agent&lt;/code&gt; and dynamically binds to the remote MCP server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;summarize.py&lt;/code&gt; (The Formatter)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Takes the raw JSON output from the MCP server and uses an LLM to synthesize a polite, conversational response for the user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;conversation.py&lt;/code&gt; (The Chitchat Fallback)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If the user just says “Hello,” we skip all heavy execution. This node feeds the conversation history directly to the LLM, saving tokens and time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚀 Why This Matters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Security &amp;amp; Isolation
&lt;/h3&gt;

&lt;p&gt;The MCP server acts as a secure boundary. You can host the LangGraph agent in the cloud while running the MCP server on your local corporate intranet to access private databases – all without exposing credentials.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reusability
&lt;/h3&gt;

&lt;p&gt;Once you build an MCP server (like our Mail/Math server), you can plug it into &lt;strong&gt;any&lt;/strong&gt; MCP‑compatible client. The same tools work in LangChain, Claude Desktop, Cursor, or your own custom UIs – no rewriting needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run It 100% Offline with Ollama
&lt;/h3&gt;

&lt;p&gt;Don’t want to pay for OpenAI API keys? Standardising with LangChain and MCP makes swapping LLMs trivial. You can run the entire workflow locally and for free using Ollama!&lt;/p&gt;

&lt;p&gt;Simply update the agent nodes (e.g., &lt;code&gt;router.py&lt;/code&gt;, &lt;code&gt;execute.py&lt;/code&gt;) from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_ollama&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOllama&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOllama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because LangChain provides unified interfaces, the node routing and tool calling continue to work seamlessly with your local Llama 3 model.&lt;/p&gt;




&lt;h2&gt;
  
  
  💻 Try It Yourself!
&lt;/h2&gt;

&lt;p&gt;All code, diagrams, and step‑by‑step instructions are available in &lt;strong&gt;Chapter 5&lt;/strong&gt; of my open‑source tutorial repository:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;GitHub: Agentic-AI-Tutorial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clone it, spin up the FastMCP server, and watch the LangGraph nodes gracefully orchestrate tool execution!&lt;/p&gt;




&lt;h2&gt;
  
  
  📸 Demo: From User Request to Delivered Email
&lt;/h2&gt;

&lt;p&gt;Here’s a real example of the agent in action, showing the entire flow from a user asking to send an email to the confirmation that it actually arrived.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The User Request &amp;amp; Agent's Execution
&lt;/h3&gt;

&lt;p&gt;In the screenshot below, you can see the user submitting a request: "send email to Gamal saying hello from the agent". The LangGraph router correctly classifies this as an &lt;code&gt;email&lt;/code&gt; intent, and the execute node invokes the remote MCP server's email tool. The agent then reports back that the email was sent successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6w69fz79h88wg5917kay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6w69fz79h88wg5917kay.png" alt="User requesting an email and agent confirming it was sent" width="663" height="716"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Proof of Execution
&lt;/h3&gt;

&lt;p&gt;To verify that the MCP server actually performed the task—and that the agent wasn't just "hallucinating" success—we can check the real destination. The following screenshot shows the actual "Gamal" inbox with the email delivered exactly as requested. This confirms the secure, end-to-end functionality of the decoupled architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flohf7lhgumaz3rvqnw1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flohf7lhgumaz3rvqnw1t.png" alt="The actual email appearing in the recipient's inbox" width="686" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This walkthrough visually confirms that the &lt;strong&gt;brain&lt;/strong&gt; (LangGraph agent) correctly interprets the request and delegates to the &lt;strong&gt;hands&lt;/strong&gt; (MCP server), which securely performs the action without exposing any credentials or internal logic to the agent itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 What’s Next?
&lt;/h2&gt;

&lt;p&gt;What tools are you planning to build for your MCP servers? Let me know in the comments below – I’d love to hear about your ideas and see what you create!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow me for more tutorials on production‑ready AI systems.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>I Built a Local AI Agent That Plans Before Executing Linux Commands (Now Fully Dockerized)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Tue, 03 Mar 2026 11:05:29 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-a-local-ai-agent-that-plans-before-executing-linux-commands-now-fully-dockerized-3dkj</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-built-a-local-ai-agent-that-plans-before-executing-linux-commands-now-fully-dockerized-3dkj</guid>
      <description>&lt;h2&gt;
  
  
  I Built a Local AI Agent That Plans Before Executing Linux Commands (Now Fully Dockerized)
&lt;/h2&gt;

&lt;p&gt;Most “AI agents” that run shell commands follow a simple flow:&lt;/p&gt;

&lt;p&gt;User prompt → LLM → Execute command&lt;/p&gt;

&lt;p&gt;That’s powerful.&lt;/p&gt;

&lt;p&gt;It’s also dangerous.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;ZkzkAgent&lt;/strong&gt;, a fully local Linux AI assistant that &lt;strong&gt;thinks and routes before it acts&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚨 The Problem with Most Terminal AI Wrappers
&lt;/h2&gt;

&lt;p&gt;A lot of open-source agents do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send user prompt to an LLM&lt;/li&gt;
&lt;li&gt;Generate shell command&lt;/li&gt;
&lt;li&gt;Execute immediately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No routing logic&lt;/li&gt;
&lt;li&gt;No conditional branching&lt;/li&gt;
&lt;li&gt;No confirmation flow&lt;/li&gt;
&lt;li&gt;No safety model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For real system environments, that’s risky.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 What Makes ZkzkAgent Different
&lt;/h2&gt;

&lt;p&gt;ZkzkAgent introduces a structured agent architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User
  ↓
Router Node
  ├── Conversation Node
  ├── Retrieval Node
  └── Tool Execution Node
        ↓
  Confirmation (if needed)
        ↓
  Execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of blindly executing:&lt;/p&gt;

&lt;p&gt;✔ It decides &lt;em&gt;what type of task this is&lt;/em&gt;&lt;br&gt;
✔ It branches based on context&lt;br&gt;
✔ It enforces confirmation for dangerous actions&lt;br&gt;
✔ It logs and returns results back into the conversation loop&lt;/p&gt;

&lt;p&gt;Built with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LangGraph (stateful agent flow)&lt;/li&gt;
&lt;li&gt;Ollama (local LLM execution)&lt;/li&gt;
&lt;li&gt;Explicit tool safety filters&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  🔐 Safety Design Principles
&lt;/h2&gt;

&lt;p&gt;I designed ZkzkAgent with 5 rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;No hidden execution&lt;/li&gt;
&lt;li&gt;Human confirmation for destructive commands&lt;/li&gt;
&lt;li&gt;Deterministic routing&lt;/li&gt;
&lt;li&gt;Full local-first architecture&lt;/li&gt;
&lt;li&gt;Transparent tool layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This makes it suitable for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers&lt;/li&gt;
&lt;li&gt;Linux power users&lt;/li&gt;
&lt;li&gt;Self-hosted environments&lt;/li&gt;
&lt;li&gt;AI experimentation&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  🐳 New: Docker Support
&lt;/h2&gt;

&lt;p&gt;One of the biggest barriers to adoption was setup complexity.&lt;/p&gt;

&lt;p&gt;Now ZkzkAgent includes official Docker support.&lt;/p&gt;

&lt;p&gt;You can spin it up in a clean, isolated environment without touching your base system.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/zkzkGamal/zkzkAgent
cd zkzkAgent
docker build -t zkzkagent .
docker run -it zkzkagent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reproducible.&lt;br&gt;
Isolated.&lt;br&gt;
Clean.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Why I Built This
&lt;/h2&gt;

&lt;p&gt;I’m deeply interested in agentic AI systems — not just chatbots.&lt;/p&gt;

&lt;p&gt;I wanted to experiment with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Router-based architectures&lt;/li&gt;
&lt;li&gt;Branching decision logic&lt;/li&gt;
&lt;li&gt;Human-in-the-loop safety&lt;/li&gt;
&lt;li&gt;Local execution models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of building another “AI assistant,” I focused on &lt;strong&gt;architecture control&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔮 What’s Next
&lt;/h2&gt;

&lt;p&gt;Planned improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More granular permission layers&lt;/li&gt;
&lt;li&gt;Plugin-style tool system&lt;/li&gt;
&lt;li&gt;Sandboxed execution modes&lt;/li&gt;
&lt;li&gt;Better observability dashboard&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💬 Feedback Welcome
&lt;/h2&gt;

&lt;p&gt;If you’re experimenting with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agents&lt;/li&gt;
&lt;li&gt;LangGraph workflows&lt;/li&gt;
&lt;li&gt;Local-first LLM systems&lt;/li&gt;
&lt;li&gt;OS-level automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’d love your thoughts.&lt;/p&gt;

&lt;p&gt;Repository:&lt;br&gt;
&lt;a href="https://github.com/zkzkGamal/zkzkAgent" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/zkzkAgent&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Gave an Open-Source AI Full Access to My Linux Terminal (And Lived to Tell the Tale)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Thu, 26 Feb 2026 11:16:55 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/i-gave-an-open-source-ai-full-access-to-my-linux-terminal-and-lived-to-tell-the-tale-jf1</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/i-gave-an-open-source-ai-full-access-to-my-linux-terminal-and-lived-to-tell-the-tale-jf1</guid>
      <description>&lt;h1&gt;
  
  
  The Problem with Cloud AI
&lt;/h1&gt;

&lt;p&gt;We all love ChatGPT and Claude, but there is a fundamental disconnect for developers: &lt;strong&gt;they can't actually &lt;em&gt;do&lt;/em&gt; things on your machine&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If I want an AI to install a package, troubleshoot my system logs, run a script, or clean up my temporary files, I have to constantly copy-paste commands back and forth.&lt;/p&gt;

&lt;p&gt;That felt incredibly outdated for 2026.&lt;/p&gt;

&lt;p&gt;So, I decided to build &lt;strong&gt;zkzkAgent&lt;/strong&gt;: a fully autonomous, local AI assistant designed specifically for Linux System Management. No expensive API keys, no data leaving my machine, and full terminal execution capabilities.&lt;/p&gt;

&lt;p&gt;Here is how I built it—and how you can build one too.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Meet zkzkAgent v3
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/zkzkGamal/zkzkAgent" rel="noopener noreferrer"&gt;zkzkAgent&lt;/a&gt; (now in v3!) is a human-in-the-loop autonomous agent that runs entirely on your local machine using &lt;strong&gt;Ollama&lt;/strong&gt;, &lt;strong&gt;LangChain&lt;/strong&gt;, and &lt;strong&gt;LangGraph&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It doesn't just answer questions. It &lt;em&gt;takes action&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🛠️ Autonomous System Management:&lt;/strong&gt; It can find files, read logs, check running processes, and kill rogue nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📦 Safe Package Management:&lt;/strong&gt; It features conflict-aware package management, meaning it knows how to install dependencies without breaking your Debian/Ubuntu setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🗣️ Voice Interactive:&lt;/strong&gt; Integrated with Kokoro TTS and Whisper for seamless voice interactions. I built it so we can just talk to our systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🛡️ Human-in-the-Loop Security:&lt;/strong&gt; It can read files and search the web by itself, but for dangerous actions (like &lt;code&gt;rm&lt;/code&gt; or installing packages), it strictly requires your explicit confirmation before executing.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🏗️ How It Works (The Architecture)
&lt;/h2&gt;

&lt;p&gt;To make it reliable, I used a &lt;strong&gt;Router-Planner-Executor architecture&lt;/strong&gt; built with LangGraph:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Router:&lt;/strong&gt; When you prompt the agent, a sophisticated classification node decides if the request is "Conversational", needs "Direct Execution" (like a simple ping), or requires "Planning" (multi-step complex tasks).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Planner:&lt;/strong&gt; For complex tasks, the agent drafts a step-by-step execution plan natively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Executor:&lt;/strong&gt; The agent loops through the plan, sequentially using specific tools (like &lt;code&gt;run_command&lt;/code&gt;, &lt;code&gt;check_internet&lt;/code&gt;, &lt;code&gt;find_file&lt;/code&gt;) one at a time. It observes the output of each command before deciding to proceed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because it runs entirely locally, it's fast, free, and completely private. It feels like having a junior sysadmin sitting in your terminal.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎓 Want to Build Your Own? (Free Tutorial!)
&lt;/h2&gt;

&lt;p&gt;see this post&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://dev.to/zkaria_gamal_3cddbbff21c8/build-autonomous-ai-agents-step-by-step-my-free-langchain-langgraph-tutorial-2026-edition-1b6"&gt;Learn Agent AI&lt;/a&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  💬 Let's Discuss
&lt;/h2&gt;

&lt;p&gt;I'm currently working on improving its contextual awareness and adding a sleek web UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd love to hear your thoughts!&lt;/strong&gt;&lt;br&gt;
Would you trust an autonomous agent to run commands on your system? What safety checks would you consider absolutely mandatory?&lt;/p&gt;

&lt;p&gt;Let me know in the comments below! 👇&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you found this interesting, I'd appreciate a 💖 or 🦄 to help others find it!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>linux</category>
    </item>
    <item>
      <title>Build Autonomous AI Agents Step-by-Step – My Free LangChain + LangGraph Tutorial (2026 Edition)</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Tue, 24 Feb 2026 13:20:05 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/build-autonomous-ai-agents-step-by-step-my-free-langchain-langgraph-tutorial-2026-edition-1b6</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/build-autonomous-ai-agents-step-by-step-my-free-langchain-langgraph-tutorial-2026-edition-1b6</guid>
      <description>&lt;p&gt;&lt;strong&gt;Hey dev community! 👋  **&lt;br&gt;
I'm Zkzk from Cairo, and I've been deep in agentic AI lately. After struggling to find a clear, hands-on path from "hello LLM" to "production-ready autonomous agent," I built my own tutorial repo: **&lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;Agentic-AI-Tutorial&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It's a 5-chapter progression (4 complete, Chapter 5 launching soon) with &lt;strong&gt;executed Jupyter notebooks&lt;/strong&gt;, local Ollama support, OpenAI/Gemini options, and a real-world capstone: a &lt;strong&gt;Personal Finance Tracker Agent&lt;/strong&gt; that tracks expenses, analyzes patterns with RAG, and gives smart advice via FastAPI + ChromaDB.&lt;/p&gt;

&lt;p&gt;If you're in Cairo or anywhere grinding on AI agents in 2026, this is for you. Let's break it down!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Agentic AI Matters Right Now
&lt;/h2&gt;

&lt;p&gt;LLMs are powerful, but they're stateless chatbots without memory or tools.&lt;br&gt;&lt;br&gt;
Agentic AI changes that: agents reason, use tools, remember context, self-refine, and even collaborate.&lt;/p&gt;

&lt;p&gt;My tutorial shows you how — practically, reproducibly, and &lt;strong&gt;locally-first&lt;/strong&gt; (privacy + zero cost with Ollama).&lt;/p&gt;

&lt;h2&gt;
  
  
  Chapter Breakdown: The Journey
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Chapter 1: LLM Foundations
&lt;/h3&gt;

&lt;p&gt;Call models from Ollama (local Gemma/Qwen), OpenAI (gpt-4o-mini), Gemini.&lt;br&gt;&lt;br&gt;
Learn streaming, system prompts, token counting.&lt;/p&gt;

&lt;p&gt;Notebook highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple chat&lt;/li&gt;
&lt;li&gt;Emoji-styled responses&lt;/li&gt;
&lt;li&gt;Multi-provider switching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Great for beginners testing local vs. cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chapter 2: LangChain Chains &amp;amp; Tools
&lt;/h3&gt;

&lt;p&gt;Build interactive workflows with LCEL.&lt;br&gt;&lt;br&gt;
Add memory (RunnableWithMessageHistory), bind tools, create routers/parallel chains.&lt;/p&gt;

&lt;p&gt;Demo: Stateful chat that remembers your name + routes queries dynamically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chapter 3: Memory + RAG Pipelines
&lt;/h3&gt;

&lt;p&gt;Persistent memory types + local embeddings (sentence-transformers).&lt;br&gt;&lt;br&gt;
RAG setup with ChromaDB prep (full vector store coming in Ch5).&lt;/p&gt;

&lt;p&gt;Focus: Cost/privacy with everything local.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chapter 4: LangGraph Orchestration (My Favorite!)
&lt;/h3&gt;

&lt;p&gt;Move from chains to graphs.&lt;br&gt;&lt;br&gt;
Build ReAct agents, multi-agent collab, self-refinement loops.&lt;/p&gt;

&lt;p&gt;Notebook magic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Graph visualization with Graphviz (nodes/edges appear live!)&lt;/li&gt;
&lt;li&gt;State updates in real-time&lt;/li&gt;
&lt;li&gt;Tool calls (e.g., math/search)&lt;/li&gt;
&lt;li&gt;Conditional routing + human-in-the-loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This chapter feels "alive" — agents actually think!&lt;/p&gt;

&lt;h3&gt;
  
  
  Chapter 5: Real-World Capstone – Personal Finance Tracker Agent
&lt;/h3&gt;

&lt;p&gt;Tying it all together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt; backend for API endpoints (/track_expense, /query_budget)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChromaDB&lt;/strong&gt; for embedding/storing transactions + finance tips (local RAG)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangChain&lt;/strong&gt; chains for parsing/categorizing/calculating&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt; for orchestration: Parse → Retrieve → Analyze → Advise (ReAct + refinement)&lt;/li&gt;
&lt;li&gt;Stateful memory per user/session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example flow:&lt;br&gt;
User: "Groceries at Carrefour, 500 EGP"&lt;br&gt;
Agent: Queries patterns → "Your food spending up 20% this month – try these local deals!"&lt;/p&gt;

&lt;p&gt;Deploy locally with uvicorn, ready for extensions (Streamlit UI, EGP exchange rates).&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack (All Reproducible)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;LangChain, LangGraph&lt;/li&gt;
&lt;li&gt;Ollama (local), OpenAI, Google Gemini&lt;/li&gt;
&lt;li&gt;ChromaDB + sentence-transformers&lt;/li&gt;
&lt;li&gt;FastAPI + uvicorn&lt;/li&gt;
&lt;li&gt;.env for keys, venv for isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No paid walls — run everything locally!&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Clone: &lt;code&gt;git clone https://github.com/zkzkGamal/Agentic-AI-Tutorial&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cd&lt;/code&gt; into folder → &lt;code&gt;python -m venv venv&lt;/code&gt; → activate&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pip install -r requirements.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Set up .env (API keys if using cloud)&lt;/li&gt;
&lt;li&gt;Run notebooks in order (Chapter1 → Chapter5)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Chapter 5 notebook + deployment guide dropping soon.&lt;br&gt;&lt;br&gt;
Planning extensions: Streamlit frontend, more tools (e.g., currency API), multi-user persistence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action
&lt;/h2&gt;

&lt;p&gt;If this resonates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ Star the repo – it really helps!&lt;/li&gt;
&lt;li&gt;🍴 Fork &amp;amp; build your own agent (maybe a Cairo event planner next?)&lt;/li&gt;
&lt;li&gt;Comment below: What real-world agent would YOU build with these tools?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/zkzkGamal/Agentic-AI-Tutorial" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/Agentic-AI-Tutorial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's build agentic AI together&lt;/p&gt;

&lt;h1&gt;
  
  
  AIAgents #LangGraph #LangChain #OpenSource #Python #Tutorial
&lt;/h1&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>langgraph</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>zkzkAgent v3 – now with safe, conflict-aware package management</title>
      <dc:creator>zkaria gamal</dc:creator>
      <pubDate>Sun, 22 Feb 2026 15:54:02 +0000</pubDate>
      <link>https://dev.to/zkaria_gamal_3cddbbff21c8/zkzkagent-v3-now-with-safe-conflict-aware-package-management-4l4p</link>
      <guid>https://dev.to/zkaria_gamal_3cddbbff21c8/zkzkagent-v3-now-with-safe-conflict-aware-package-management-4l4p</guid>
      <description>&lt;p&gt;Managing Linux is powerful but exhausting. Remembering exact commands, hunting down files, killing rogue processes, checking network, deploying scripts — it adds up fast.&lt;/p&gt;

&lt;p&gt;What if you had a local AI that doesn’t just tell you what to do — it actually executes safely, with your approval every time anything dangerous happens?&lt;/p&gt;

&lt;p&gt;That’s why I built zkzkAgent — fully offline, no cloud, no telemetry, privacy-first Linux system manager.&lt;/p&gt;

&lt;p&gt;Repo → &lt;a href="https://github.com/zkzkGamal/zkzkAgent" rel="noopener noreferrer"&gt;https://github.com/zkzkGamal/zkzkAgent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🚀 What’s Working Really Well&lt;/p&gt;

&lt;p&gt;🛠 1. Real System Control&lt;br&gt;
zkzkAgent can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search files &amp;amp; folders intelligently&lt;/li&gt;
&lt;li&gt;Find &amp;amp; kill processes&lt;/li&gt;
&lt;li&gt;Run safe shell commands (ls, date, whoami, etc.)&lt;/li&gt;
&lt;li&gt;Deploy scripts with AI help choosing options&lt;/li&gt;
&lt;li&gt;Handle all destructive actions (rm, install, remove) with explicit human confirmation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It reads your intent, plans the safest path, and streams results live.&lt;/p&gt;

&lt;p&gt;🌐 2. Network Smarts&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-checks internet before any web/search/browser task&lt;/li&gt;
&lt;li&gt;Reconnects Wi-Fi via nmcli if dropped&lt;/li&gt;
&lt;li&gt;Searches DuckDuckGo or grabs images straight to your media folder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Makes remote work, quick lookups, and downloads seamless.&lt;/p&gt;

&lt;p&gt;🎤 3. Optional Voice Mode&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whisper for speech-to-text&lt;/li&gt;
&lt;li&gt;Coqui TTS (working on XTTS-v2 cloning for better anime-ish voices)&lt;/li&gt;
&lt;li&gt;Hands-free control — talk to your system like a real assistant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Type or speak — it responds either way.&lt;/p&gt;

&lt;p&gt;🔥 New: Smart Package Management (the part I’m most excited about)&lt;br&gt;
Just added a package tool that actually feels safe &amp;amp; useful for developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Human-in-the-loop on every install/remove/upgrade/clean&lt;/li&gt;
&lt;li&gt;Detects OS once → uses safe priority to avoid dependency hell:

&lt;ol&gt;
&lt;li&gt;Special cases (no search needed):

&lt;ul&gt;
&lt;li&gt;postman → sudo snap install postman&lt;/li&gt;
&lt;li&gt;code/vscode → sudo snap install --classic code&lt;/li&gt;
&lt;li&gt;discord/slack → snap or flatpak&lt;/li&gt;
&lt;li&gt;zoom → wget .deb + dpkg&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Flatpak first for GUI/dev tools (sandbox + shared runtimes = fewer fights)&lt;/li&gt;
&lt;li&gt;Snap next&lt;/li&gt;
&lt;li&gt;apt only for CLI/system utils&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Checks if already installed (command -v / snap list / flatpak list) before suggesting&lt;/li&gt;
&lt;li&gt;Shows full command preview + explanation (dry-run style)&lt;/li&gt;
&lt;li&gt;Logs every command + your approval&lt;/li&gt;
&lt;li&gt;No blind runs — always "yes/no" for anything that touches the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example flow — "install postman":&lt;/p&gt;

&lt;p&gt;→ planner sees snap path&lt;br&gt;
→ executor: check_internet() once → propose "sudo snap install postman"&lt;br&gt;
→ shows preview → waits for "yes"&lt;br&gt;
→ runs → verifies with postman --version&lt;/p&gt;

&lt;p&gt;Why this matters for devs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoids the classic "apt install nodejs → wrong/old version" nightmare&lt;/li&gt;
&lt;li&gt;Reduces snap vs flatpak vs apt version conflicts (huge pain on Ubuntu)&lt;/li&gt;
&lt;li&gt;Everything is auditable — you see exactly what ran&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧠 How I Built It&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LangGraph → stateful agent graph (planner → executor → tools → human interrupt)&lt;/li&gt;
&lt;li&gt;Ollama → local inference (Llama 3.1 8B or whatever model you run)&lt;/li&gt;
&lt;li&gt;Tools → simple Python wrappers (subprocess for shell, etc.)&lt;/li&gt;
&lt;li&gt;Safety → interrupt + input() for confirmation on dangerous actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything 100% local — no data leaves your machine.&lt;/p&gt;

&lt;p&gt;🧪 Honest Limitations Right Now&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only tested on Ubuntu/Debian (apt/snap/flatpak paths)&lt;/li&gt;
&lt;li&gt;No automatic rollback if install fails (yet)&lt;/li&gt;
&lt;li&gt;Voice TTS prosody still needs work (switching to XTTS-v2 cloning soon)&lt;/li&gt;
&lt;li&gt;Package tool is smart but not perfect — edge cases exist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quick Start (try it in ~2 minutes)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/zkzkGamal/zkzkAgent.git
&lt;span class="nb"&gt;cd &lt;/span&gt;zkzkAgent
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x install.sh
./install.sh
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then just type:&lt;/p&gt;

&lt;p&gt;"install postman"&lt;br&gt;
"install discord"&lt;br&gt;
"check node version"&lt;br&gt;
"find all python files in my project"&lt;br&gt;
"kill firefox"&lt;/p&gt;

&lt;p&gt;🎯 Looking for Dev Feedback&lt;/p&gt;

&lt;p&gt;Better conflict detection? (e.g. dpkg -s / snap list before proposing apt?)&lt;br&gt;
Should it auto-suggest nvm/pyenv/volta for node/python/go runtimes?&lt;br&gt;
Flatpak vs snap — which should be default for GUI apps in 2026?&lt;br&gt;
Any scary package edge cases I missed? (PPA hell, broken deps, etc.)&lt;/p&gt;

&lt;p&gt;PRs, issues, forks, brutal roasts — all welcome.&lt;br&gt;
Built in Cairo with ❤️ by zkaria (@zkzkgamal11)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This version keeps your voice: honest, technical enough for devs, shows real value, admits limitations, and invites collaboration — just like your original example.

Let me know if you want it shorter, more code-heavy, or tweaked for a specific platform (X thread, Reddit, DEV.to, etc.). Ready to ship! 🚀
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>agentaichallenge</category>
      <category>langchain</category>
      <category>langgrapgh</category>
    </item>
  </channel>
</rss>
