<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abdul Hakkeem P A</title>
    <description>The latest articles on DEV Community by Abdul Hakkeem P A (@abdulhakkeempa).</description>
    <link>https://dev.to/abdulhakkeempa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F822600%2F4af1e965-fe7d-4178-984a-a4fb1c96f7dd.jpeg</url>
      <title>DEV Community: Abdul Hakkeem P A</title>
      <link>https://dev.to/abdulhakkeempa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abdulhakkeempa"/>
    <language>en</language>
    <item>
      <title>Building a Multimodal Local AI Stack: Gemma 4 E2B, vLLM, and Hermes Agent</title>
      <dc:creator>Abdul Hakkeem P A</dc:creator>
      <pubDate>Sat, 04 Apr 2026 17:11:49 +0000</pubDate>
      <link>https://dev.to/abdulhakkeempa/building-a-multimodal-local-ai-stack-gemma-4-e2b-vllm-and-hermes-agent-k8l</link>
      <guid>https://dev.to/abdulhakkeempa/building-a-multimodal-local-ai-stack-gemma-4-e2b-vllm-and-hermes-agent-k8l</guid>
      <description>&lt;p&gt;The Local AI movement just hit a massive milestone. With the release of Google's Gemma 4, 2-billion parameter models are no longer toys for simple chat. They're multimodal powerhouses purpose-built for advanced reasoning and agentic workflows.&lt;/p&gt;

&lt;p&gt;In this guide, we'll break down how to harness the Gemma 4 E2B (Effective 2B) model using &lt;strong&gt;vLLM&lt;/strong&gt; and integrate it with the &lt;strong&gt;Hermes Agent&lt;/strong&gt; for a fully local, multimodal stack.&lt;/p&gt;

&lt;h2&gt;What is Gemma 4?&lt;/h2&gt;

&lt;p&gt;Google released Gemma 4 in four sizes: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts, and 31B Dense. We'll focus on the E2B, the size that fits comfortably on consumer hardware.&lt;/p&gt;

&lt;p&gt;Key capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal from day one&lt;/strong&gt; - all models natively process text, images, and video. The E2B and E4B edge models also support audio input for speech recognition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long context&lt;/strong&gt; - edge models like E2B feature a 128K context window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0 licensed&lt;/strong&gt; - commercially permissive, no strings attached.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why E2B + vLLM for a local agent stack?&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instruction tuning&lt;/strong&gt; — Gemma 4 excels at following system prompts, critical for an agent managing many skills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native tool calling&lt;/strong&gt; — function calling, structured JSON output, and native system instructions are built in, letting the agent reliably interact with tools and APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt; — at 2B effective parameters, it leaves plenty of VRAM for the agent's KV cache, keeping response times fast even on an RTX 3060/4060.&lt;/li&gt;
&lt;/ul&gt;
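
&lt;p&gt;That efficiency claim is easy to sanity-check with back-of-envelope math (assuming bf16 weights at roughly 2 bytes per parameter and a 12 GB card; real usage also depends on quantization, activations, and runtime overhead):&lt;/p&gt;

```shell
# Back-of-envelope VRAM budget: bf16 weights take ~2 bytes per parameter,
# so ~2B effective params need roughly 4 GB, leaving headroom on a 12 GB card.
python3 -c "
params_billion = 2.0             # effective parameter count
weights_gb = params_billion * 2  # bf16: ~2 bytes per parameter
total_vram_gb = 12.0             # e.g. RTX 3060 12 GB
print(f'weights ~{weights_gb:.0f} GB, ~{total_vram_gb - weights_gb:.0f} GB left for KV cache')
"
# prints: weights ~4 GB, ~8 GB left for KV cache
```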




&lt;h3&gt;Step 1: Install vLLM&lt;/h3&gt;

&lt;p&gt;You'll need a Hugging Face account and an access token, since Gemma 4 is gated: you must accept Google's license on Hugging Face before downloading. Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;uv
uv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 2: Serve Gemma 4 E2B&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hf download google/gemma-4-e2b-it &lt;span class="nt"&gt;--local-dir&lt;/span&gt; ~/models/gemma-4-e2b-it

vllm serve google/gemma-4-e2b-it &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 32768 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; Hermes &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.85

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your OpenAI-compatible endpoint is now live at &lt;code&gt;http://localhost:8000/v1&lt;/code&gt;.&lt;/p&gt;
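
&lt;p&gt;Before wiring up the agent, it's worth a quick smoke test. Here's a minimal sketch using curl; the JSON check runs locally, while the request itself assumes the server from Step 2 is still up:&lt;/p&gt;

```shell
# Minimal chat-completion request; the model name must match what vLLM serves.
BODY='{"model": "google/gemma-4-e2b-it", "messages": [{"role": "user", "content": "Say hello in one word."}], "max_tokens": 16}'

# Check the payload is valid JSON before sending.
if echo "$BODY" | python3 -m json.tool > /dev/null; then echo "payload ok"; fi

# Send it -- this only succeeds while the vLLM server from Step 2 is running.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$BODY" || echo "server not reachable on :8000"
```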

&lt;h3&gt;Step 3: Install Hermes Agent&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 4: Point Hermes at your local model&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This walks you through selecting a provider. Choose &lt;strong&gt;Custom Endpoint&lt;/strong&gt; and enter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Base URL:&lt;/strong&gt; &lt;code&gt;http://localhost:8000/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; &lt;code&gt;google/gemma-4-e2b-it&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
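
&lt;p&gt;To confirm tool calling works end-to-end, you can send the endpoint a request carrying a hand-rolled tool definition. Note that &lt;code&gt;get_weather&lt;/code&gt; here is a hypothetical example tool defined inline for testing, not one of Hermes' built-ins:&lt;/p&gt;

```shell
# Define one example tool (get_weather, made up for this test) and ask a
# question the model can only answer by calling it.
BODY='{"model": "google/gemma-4-e2b-it", "messages": [{"role": "user", "content": "What is the weather in Kochi?"}], "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get current weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}]}'

if echo "$BODY" | python3 -m json.tool > /dev/null; then echo "tool payload ok"; fi

# With --enable-auto-tool-choice, the response should contain a tool_calls
# entry naming get_weather rather than a plain text answer.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$BODY" || echo "server not reachable on :8000"
```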

&lt;p&gt;That's it. You now have a fully local, multimodal agent with 40+ built-in tools: web search, file operations, terminal access, browser automation, and more, all with zero cloud dependency.&lt;/p&gt;




&lt;p&gt;Have you tried Gemma 4 yet? Share your thoughts in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemma</category>
      <category>hermes</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
