<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gabe Dev</title>
    <description>The latest articles on DEV Community by Gabe Dev (@gabe_mishra_5e5eae1be4db0).</description>
    <link>https://dev.to/gabe_mishra_5e5eae1be4db0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936604%2F3a9129c9-7573-4876-8506-216ef74ccd92.png</url>
      <title>DEV Community: Gabe Dev</title>
      <link>https://dev.to/gabe_mishra_5e5eae1be4db0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gabe_mishra_5e5eae1be4db0"/>
    <language>en</language>
    <item>
      <title>Why your local agent keeps dropping brackets (and how Hermes fixes it)</title>
      <dc:creator>Gabe Dev</dc:creator>
      <pubDate>Mon, 18 May 2026 05:10:53 +0000</pubDate>
      <link>https://dev.to/gabe_mishra_5e5eae1be4db0/why-your-local-agent-keeps-dropping-brackets-and-how-hermes-fixes-it-362h</link>
      <guid>https://dev.to/gabe_mishra_5e5eae1be4db0/why-your-local-agent-keeps-dropping-brackets-and-how-hermes-fixes-it-362h</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuo5t72xtnjb73t6q95l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkuo5t72xtnjb73t6q95l.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’ve tried running a local agent loop using standard frameworks, you know exactly when it breaks: loop 3 or 4, right when the model needs to call a tool.&lt;/p&gt;

&lt;p&gt;Most frameworks force you to dump massive JSON schemas of your entire tool repository directly into the system prompt. While this works fine when you're burning OpenAI credits on a massive cloud model, the moment you drop down to local hardware (like a quantized 8B or 32B model running on a consumer GPU), two things happen: your context window gets eaten alive by structural boilerplate, and the model eventually drops a closing brace } mid-generation, completely crashing your regex parser.&lt;/p&gt;

&lt;p&gt;While hacking on Hermes Agent for the DEV challenge, I realized its biggest architectural win isn't the slick integration list—it's how it completely bypasses this JSON parsing tax.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The JSON Schema Tax on Local VRAM&lt;/strong&gt;&lt;br&gt;
When an agent framework uses passive prompt coercion, it passes your Python functions through a serializer to generate something like this in your system prompt:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"query_db"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Lookup user records"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



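&lt;p&gt;Multiply that by five or ten tools, and you’re spending hundreds of tokens (thousands once real schemas carry longer descriptions and nested parameters) just explaining how to format a response. On local hardware, this context bloat dilutes the model’s attention on the actual task and increases Time-to-First-Token (TTFT) latency, because the whole prompt has to be processed before the first output token is generated.&lt;/p&gt;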
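
&lt;p&gt;To put a rough number on that overhead, here is a minimal Python sketch. The &lt;code&gt;query_db&lt;/code&gt; stub, the &lt;code&gt;function_to_schema&lt;/code&gt; helper, and the four-characters-per-token rule of thumb are all assumptions for illustration, not part of Hermes or any specific framework; an accurate count needs the model's own tokenizer.&lt;/p&gt;

```python
import inspect
import json

# Hypothetical mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def query_db(user_id: int) -> dict:
    """Lookup user records"""
    return {"user_id": user_id}  # stub body, for illustration only


def function_to_schema(func) -> dict:
    """Build a JSON-schema-style tool description from a function signature."""
    properties = {
        name: {"type": JSON_TYPES.get(param.annotation, "string")}
        for name, param in inspect.signature(func).parameters.items()
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties},
    }


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate; assumes about four characters per token."""
    return round(len(text) / chars_per_token)


schema = function_to_schema(query_db)
schema_text = json.dumps(schema, indent=2)
per_tool = estimate_tokens(schema_text)

print(schema_text)
print(f"~{per_tool} tokens for one tool, ~{per_tool * 10} for ten similar tools")
```

&lt;p&gt;This toy schema is about as small as a tool definition gets. Longer descriptions, enums, and nested objects would likely push the per-tool cost well above what this prints.&lt;/p&gt;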

&lt;p&gt;&lt;strong&gt;How Hermes Shifts to Native Token Steering&lt;/strong&gt;&lt;br&gt;
Hermes doesn't try to bully a raw text model into outputting valid JSON via heavy system prompting. Instead, it leverages the fact that the underlying Nous Hermes models are natively fine-tuned to treat tool execution as a structural token sequence using hardcoded XML tags.&lt;/p&gt;

&lt;p&gt;Instead of relying on massive prompt blocks, the framework expects, and guides the model into, a predictable output structure:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;scratchpad&amp;gt;&lt;/span&gt;
Dependencies resolved. Need to check user status before updating the record.
&lt;span class="nt"&gt;&amp;lt;/scratchpad&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;tool_call&amp;gt;&lt;/span&gt;
{"name": "query_db", "arguments": {"user_id": 4022}}
&lt;span class="nt"&gt;&amp;lt;/tool_call&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The engineering win here is subtle but massive: &lt;strong&gt;State Isolation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &amp;lt;scratchpad&amp;gt; block is a sandbox: the model finishes its internal reasoning tokens before it reaches the tool execution tokens. This stops the "thinking" process from bleeding into the actual execution syntax.&lt;/p&gt;

&lt;p&gt;Zero Prompt Bloat: Because the model's weight distribution naturally favors these tags for tool routing, you don't need a 400-token system prompt lecturing the model on bracket placement.&lt;/p&gt;
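
&lt;p&gt;On the framework side, that separation keeps the parsing step small. The sketch below is a hypothetical parser, not Hermes Agent's actual code: it assumes the model emits the &amp;lt;scratchpad&amp;gt; and &amp;lt;tool_call&amp;gt; tags shown earlier, and the &lt;code&gt;parse_turn&lt;/code&gt; name and its return shape are made up for illustration.&lt;/p&gt;

<![CDATA[
```python
import json
import re

# Non-greedy matches so several blocks in one completion stay separate.
SCRATCHPAD_RE = re.compile(r"<scratchpad>(.*?)</scratchpad>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_turn(completion: str) -> dict:
    """Split one model completion into reasoning text and parsed tool calls."""
    thoughts = [m.strip() for m in SCRATCHPAD_RE.findall(completion)]
    calls, errors = [], []
    for raw in TOOL_CALL_RE.findall(completion):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError as exc:
            # A dropped brace ends up here instead of crashing the agent loop.
            errors.append(f"invalid tool_call JSON: {exc.msg}")
            continue
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
        else:
            errors.append("tool_call is missing a 'name' field")
    return {"thoughts": thoughts, "calls": calls, "errors": errors}


completion = (
    "<scratchpad>Need to check user status before updating the record.</scratchpad>\n"
    '<tool_call>\n{"name": "query_db", "arguments": {"user_id": 4022}}\n</tool_call>'
)
result = parse_turn(completion)
print(result["calls"])
```
]]>

&lt;p&gt;A malformed payload lands in &lt;code&gt;errors&lt;/code&gt; rather than raising, so the loop can feed the message back to the model and ask for a retry.&lt;/p&gt;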

&lt;p&gt;&lt;strong&gt;The Bottom Line&lt;/strong&gt;&lt;br&gt;
When you move tool routing from the prompt layer down to the token-generation layer, you save significant VRAM and eliminate the syntax degradation that plagues small models. It’s the reason you can run highly reliable, multi-step tool pipelines on a local 8B model inside Hermes that would normally require a massive, unquantized cloud API just to keep the JSON valid.&lt;/p&gt;

&lt;p&gt;If you’re building an entry for the challenge, skip the massive prompt-engineering wrappers. Lean into the native XML schema, keep your system prompts minimalist, and let the model’s structural tuning do the heavy lifting.&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
