<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: George Larson</title>
    <description>The latest articles on DEV Community by George Larson (@george_larson_3cc4a57b08b).</description>
    <link>https://dev.to/george_larson_3cc4a57b08b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2194433%2F5a531220-012f-4369-8a1e-d8540450a2b5.jpg</url>
      <title>DEV Community: George Larson</title>
      <link>https://dev.to/george_larson_3cc4a57b08b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/george_larson_3cc4a57b08b"/>
    <language>en</language>
    <item>
      <title>I Took a 2012 Mozilla Demo and Turned It Into a Production MMO With AI</title>
      <dc:creator>George Larson</dc:creator>
      <pubDate>Tue, 24 Mar 2026 20:02:56 +0000</pubDate>
      <link>https://dev.to/george_larson_3cc4a57b08b/i-took-a-2012-mozilla-demo-and-turned-it-into-a-production-mmo-with-ai-3eka</link>
      <guid>https://dev.to/george_larson_3cc4a57b08b/i-took-a-2012-mozilla-demo-and-turned-it-into-a-production-mmo-with-ai-3eka</guid>
      <description>&lt;p&gt;In 2012, Mozilla and Little Workshop released &lt;a href="https://github.com/mozilla/BrowserQuest" rel="noopener noreferrer"&gt;BrowserQuest&lt;/a&gt;, an HTML5 multiplayer demo that proved browsers could handle real-time games. It was a tech demo. No types. No tests. No persistence beyond &lt;code&gt;localStorage&lt;/code&gt;. No separation of concerns. One massive &lt;code&gt;Player&lt;/code&gt; class doing everything from combat to inventory to chat.&lt;/p&gt;

&lt;p&gt;It served its purpose and was abandoned.&lt;/p&gt;

&lt;p&gt;I picked it up and asked a simple question: &lt;strong&gt;what would it take to turn this into something you'd actually ship?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer is &lt;a href="https://github.com/georgeglarson/Fracture" rel="noopener noreferrer"&gt;Fracture&lt;/a&gt;. You can &lt;a href="https://fracture.georgelarson.me" rel="noopener noreferrer"&gt;play it live&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9mj45eoerknwytete8r3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9mj45eoerknwytete8r3.png" alt="Fracture gameplay — AI-powered thought bubbles and real-time combat" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this project
&lt;/h2&gt;

&lt;p&gt;I've spent 25 years modernizing legacy systems: manufacturing execution platforms, enterprise infrastructure, database tuning, security hardening. Python, Go, Rust, C#, PHP, TypeScript, Bash, Perl. Whatever the system was written in, the work was the same: understand what exists, establish contracts, decompose responsibilities, add observability, write tests, and ship without breaking what already works.&lt;/p&gt;

&lt;p&gt;I wanted to demonstrate that the methodology I've applied to manufacturing automation and enterprise platforms translates directly to any domain, and that AI as a development agent changes the economics of what one engineer can ship.&lt;/p&gt;

&lt;p&gt;Not "AI wrote my code." More like "AI enables one engineer do what used to take a team."&lt;/p&gt;

&lt;p&gt;A game is the perfect vehicle. It has real-time networking, state management, persistence, external API integrations, and enough complexity that architecture actually matters. And unlike yet another CRUD app, people can play it and immediately understand what it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I started with
&lt;/h2&gt;

&lt;p&gt;BrowserQuest's server was an untyped JavaScript codebase with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No type safety.&lt;/strong&gt; Everything was &lt;code&gt;any&lt;/code&gt;, passed through string-keyed message arrays.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No tests.&lt;/strong&gt; Zero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No persistence.&lt;/strong&gt; Die, refresh, start over.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No observability.&lt;/strong&gt; &lt;code&gt;console.log&lt;/code&gt; or nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;God classes.&lt;/strong&gt; &lt;code&gt;Player&lt;/code&gt; handled auth, combat, movement, inventory, chat, and serialization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No security.&lt;/strong&gt; Client-trusted positions, no rate limiting, no input validation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The methodology
&lt;/h2&gt;

&lt;p&gt;Same playbook as enterprise, just applied to a game server:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Type safety first
&lt;/h3&gt;

&lt;p&gt;Migrated everything to TypeScript strict mode. Every &lt;code&gt;any&lt;/code&gt; replaced. Every message format typed.&lt;br&gt;
This is the unsexy work that makes everything else possible: it's far easier to refactor code you can actually reason about.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Type Safety - Replace any types in message.ts
Phase 1: Type Safety - Replace any types in player.ts
Phase 1: Type Safety - Fix ws.ts, character.ts, map.ts
Phase 1: Type Safety - Add typed imports to world.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These were the first commits. No features. Just contracts.&lt;/p&gt;
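
&lt;p&gt;A minimal sketch of what that contract work looks like, with hypothetical message names standing in for Fracture's actual types: a discriminated union replaces a string-keyed array, and the compiler forces every handler to account for every variant.&lt;/p&gt;

```typescript
// Illustrative only -- these are not Fracture's real message types.
// The old style was untyped arrays like ["move", x, y]; the new style
// is a discriminated union the compiler can check exhaustively.
type MoveMessage = { kind: "move"; x: number; y: number };
type ChatMessage = { kind: "chat"; text: string };
type Message = MoveMessage | ChatMessage;

function formatMessage(msg: Message): string {
  switch (msg.kind) {
    case "move":
      // msg is narrowed to MoveMessage here, so x and y are known numbers
      return `move to ${msg.x},${msg.y}`;
    case "chat":
      return `chat: ${msg.text}`;
  }
}

console.log(formatMessage({ kind: "move", x: 3, y: 7 }));
```

&lt;p&gt;Adding a new message variant now produces a compile error in every switch that doesn't handle it, which is exactly the safety net the refactor needed.&lt;/p&gt;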

&lt;h3&gt;
  
  
  2. Decompose by responsibility
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Player&lt;/code&gt; class had grown to 1,742 lines handling over a dozen concerns. I extracted each into its own module:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;combat/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Aggro policy, combat tracker, kill streaks, nemesis system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;player/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;MessageRouter + 14 handler modules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;world/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Spatial manager, spawn manager, game loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;storage/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SQLite persistence layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inventory/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Item management and serialization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;zones/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Zone boundaries, bonuses, level scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;party/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Invite, XP sharing, proximity tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rifts/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Endgame dungeons with stacking modifiers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Post-refactor, &lt;code&gt;Player&lt;/code&gt; is 726 lines. Every module has one job.&lt;/p&gt;
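
&lt;p&gt;The dispatch side of that decomposition can be sketched roughly like this (names are illustrative, not Fracture's actual &lt;code&gt;MessageRouter&lt;/code&gt;): the router maps message types to handler modules and does nothing else.&lt;/p&gt;

```typescript
// Hypothetical sketch of a message-router pattern: each handler module
// owns one concern; the router only registers and dispatches.
type Handler = (payload: unknown) => void;

class MessageRouter {
  private handlers: { [type: string]: Handler } = {};

  register(type: string, handler: Handler): void {
    this.handlers[type] = handler;
  }

  dispatch(type: string, payload: unknown): boolean {
    const handler = this.handlers[type];
    if (!handler) return false; // unknown message types are rejected, not guessed at
    handler(payload);
    return true;
  }
}

const router = new MessageRouter();
const seen: string[] = [];
router.register("chat", () => seen.push("chat"));
router.dispatch("chat", { text: "hello" });
```

&lt;p&gt;Each concern from the table above becomes one registered handler, so the 1,742-line god class shrinks to routing plus a handful of focused modules.&lt;/p&gt;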

&lt;h3&gt;
  
  
  3. Event-driven architecture
&lt;/h3&gt;

&lt;p&gt;Systems communicate through a typed &lt;code&gt;EventBus&lt;/code&gt;, not direct method calls. Combat doesn't know about achievements. The narrator doesn't know about inventory. They publish events; interested systems subscribe.&lt;/p&gt;

&lt;p&gt;This is the same pattern you'd use in a microservice architecture, applied at the module level. It makes testing trivial; you can verify each system in isolation.&lt;/p&gt;
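
&lt;p&gt;A minimal sketch of that publish/subscribe shape (Fracture's actual &lt;code&gt;EventBus&lt;/code&gt; additionally types each event name to its payload; this stripped-down version omits that):&lt;/p&gt;

```typescript
// Illustrative publish/subscribe bus: publishers and subscribers never
// import each other, only the bus.
type Listener = (payload: unknown) => void;

class EventBus {
  private listeners: { [event: string]: Listener[] } = {};

  subscribe(event: string, listener: Listener): void {
    if (!this.listeners[event]) this.listeners[event] = [];
    this.listeners[event].push(listener);
  }

  publish(event: string, payload: unknown): void {
    const subs = this.listeners[event] || [];
    for (const listener of subs) listener(payload);
  }
}

// Combat publishes; achievements subscribe. Neither knows the other exists.
const bus = new EventBus();
const unlocked: string[] = [];
bus.subscribe("mob.killed", () => unlocked.push("first-blood"));
bus.publish("mob.killed", { mobId: 42 });
```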

&lt;h3&gt;
  
  
  4. Test everything
&lt;/h3&gt;

&lt;p&gt;The codebase has &lt;strong&gt;3,161 tests across 65 test files&lt;/strong&gt; with zero failures. Coverage by module (statement coverage):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Party, Shop, Zones, Events: &lt;strong&gt;100%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Rifts: &lt;strong&gt;98%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Equipment, Inventory: &lt;strong&gt;97-100%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Storage: &lt;strong&gt;76%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Player (aggregate): &lt;strong&gt;75%&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storage tests use in-memory SQLite, no mocks pretending to be a database. Coverage thresholds are enforced in CI. Tests run on Node 20 and 22.&lt;/p&gt;

&lt;p&gt;I wrote tests &lt;em&gt;before&lt;/em&gt; refactors, not after. When you're decomposing a class that handles 13 responsibilities, you need to know immediately when you break something.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Production-grade observability
&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;console.log&lt;/code&gt; call across the codebase was replaced with Pino structured logging, and OpenTelemetry distributed tracing was wired in on top:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Every message handler is traced&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startSpan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`player.message.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Every database call&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startSpan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;storage.saveCharacter&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Every AI call with latency tracking&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startSpan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai.venice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Logs and traces are correlated. Every log line carries a &lt;code&gt;trace_id&lt;/code&gt; and &lt;code&gt;span_id&lt;/code&gt;. The whole stack ships to a self-hosted SigNoz instance backed by ClickHouse, with public Grafana dashboards for portfolio visitors.&lt;/p&gt;

&lt;p&gt;This is the same OTel + Pino + SigNoz stack used in production microservices. I just applied it to a game server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r3c3c2op9bxuozuxl26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r3c3c2op9bxuozuxl26.png" alt="Fracture Grafana dashboard — production observability on a game server" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where AI changed the game
&lt;/h2&gt;

&lt;p&gt;Claude was a development partner throughout this project.&lt;br&gt;
Here's what that actually means in practice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI as force multiplier, not replacement.&lt;/strong&gt; I made every architectural decision. I chose SRP decomposition. I chose event-driven communication and OpenTelemetry over custom metrics.&lt;br&gt;
AI didn't make those calls; 25 years of experience did. But AI let me &lt;em&gt;execute&lt;/em&gt; those decisions at a pace that was previously unthinkable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI for the tedious but important work.&lt;/strong&gt; Migrating every &lt;code&gt;console.log&lt;/code&gt; call to structured logging with proper context? Writing 3,161 tests? Extracting 14 handler modules from a monolithic class? This is work that matters but takes forever when you're doing it alone. AI compressed weeks into days.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-powered game features.&lt;/strong&gt; AI also ships as part of the product:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NPC dialogue:&lt;/strong&gt; Every NPC generates contextual responses via Venice AI (llama-3.3-70b). The village priest talks about "the time before the sky broke." The guard asks where you came from. Conversations have memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity thoughts:&lt;/strong&gt; Mobs display visible AI-generated thought bubbles. Rats think about cheese. Skeletons scheme about revenge. 25% AI-generated, 75% template-based, refreshed on a 5-minute cycle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Narrator system:&lt;/strong&gt; Zone-specific narrative voices describe events with unique vocabularies. Deaths are mourned. Achievements are celebrated. Voice synthesis via Fish Audio TTS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graceful degradation:&lt;/strong&gt; If Venice goes down, the game keeps running with static fallbacks. AI enhances the experience; it never blocks it. Circuit breaker opens after 5 failures, recovers automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdewmfm6wo9e0iuitpvcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdewmfm6wo9e0iuitpvcz.png" alt="NPC dialogue generated by Venice AI — every conversation is unique" width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Client&lt;/td&gt;
&lt;td&gt;HTML5 Canvas, TypeScript 5.8, Webpack 5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Node.js, TypeScript 5.8, Socket.IO 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;SQLite (better-sqlite3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;Venice AI (llama-3.3-70b), Fish Audio TTS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;OpenTelemetry, Pino, SigNoz, Grafana, ClickHouse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing&lt;/td&gt;
&lt;td&gt;Vitest 4, v8 coverage, CI on Node 20 + 22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production&lt;/td&gt;
&lt;td&gt;nginx, Let's Encrypt SSL, Docker Compose&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;~215 TypeScript source files. 280 including tests. 105 WebSocket message types. 50 levels, 7 zones, 6 roaming bosses, a nemesis system where mobs track grudges and power up against players who've killed them before.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fida05bwv9cugde0t38zk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fida05bwv9cugde0t38zk.png" alt="Nethack-style TUI debugger for live server introspection" width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd tell someone doing this at work
&lt;/h2&gt;

&lt;p&gt;Legacy modernization with AI follows the same rules as legacy modernization without it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Type safety is not optional.&lt;/strong&gt; You cannot safely refactor code you cannot reason about. This is always step one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test before you refactor.&lt;/strong&gt; Decomposing a 1,700-line class is not the time to discover you broke chat because it was coupled to combat through a shared mutable array.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI amplifies your judgment; it doesn't replace it.&lt;/strong&gt; If you don't know you want SRP decomposition, AI may never suggest it. If you do know what you want, AI will help you implement it 10x faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability is not a luxury.&lt;/strong&gt; Structured logging and tracing aren't just for microservices; they're for anything you plan to operate.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Play:&lt;/strong&gt; &lt;a href="https://fracture.georgelarson.me" rel="noopener noreferrer"&gt;fracture.georgelarson.me&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/georgeglarson/Fracture" rel="noopener noreferrer"&gt;github.com/georgeglarson/Fracture&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The game runs 24/7 on a real server with real observability. Walk around, fight mobs, read their thoughts, talk to NPCs. Everything you see (the combat, the AI dialogue, the persistence, the spatial partitioning) is the result of applying boring enterprise modernization patterns to a fun problem.&lt;/p&gt;

&lt;p&gt;That's the whole point. The strategies transfer. The methodology scales.&lt;br&gt;
AI just makes it possible to do it alone.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;George Larson, 25 years in software engineering, infrastructure, manufacturing systems, and cybersecurity. Currently looking for Director/VP or senior engineering roles. More at &lt;a href="https://georgelarson.me" rel="noopener noreferrer"&gt;georgelarson.me&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gamedev</category>
      <category>typescript</category>
      <category>legacy</category>
    </item>
    <item>
      <title>Hermes Agent: Honest Review</title>
      <dc:creator>George Larson</dc:creator>
      <pubDate>Fri, 20 Mar 2026 19:11:05 +0000</pubDate>
      <link>https://dev.to/george_larson_3cc4a57b08b/hermes-agent-honest-review-1557</link>
      <guid>https://dev.to/george_larson_3cc4a57b08b/hermes-agent-honest-review-1557</guid>
      <description>&lt;p&gt;Hermes Agent. An agent that grows with you.&lt;/p&gt;

&lt;p&gt;Here is what's actually under the bonnet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;Hermes is an autonomous agent framework with genuine multi-platform integration: Telegram, Discord, WhatsApp, Slack, Signal, Home Assistant, and more. If you need an AI agent that lives on messaging platforms, Hermes is the most complete option available.&lt;/p&gt;

&lt;p&gt;If you're a software engineer working in a terminal, the coding tools will overlap with what you already use. The gateway is where the real value is.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model
&lt;/h2&gt;

&lt;p&gt;Hermes-4-405B is a supervised fine-tune of Meta's Llama 3.1 405B. The &lt;a href="https://huggingface.co/NousResearch/Hermes-4-405B" rel="noopener noreferrer"&gt;HuggingFace model card&lt;/a&gt; lists the base model explicitly. Every Hermes model since version 1 has been a Llama fine-tune. NousResearch is fundamentally a Llama fine-tuning shop.&lt;/p&gt;

&lt;p&gt;The fine-tuning is competent: ~5 million training samples, ~60 billion tokens, tool-calling format baked in. But the moment you interact with it, you feel Llama. If you've used Llama 3.1 405B through any other provider, you already know what Hermes-4 feels like.&lt;/p&gt;

&lt;p&gt;The agent itself is model-agnostic. You can point it at Claude, GPT, Gemini, or anything via OpenRouter. Oddly, the default configuration ships pointed at Claude Opus via OpenRouter, not their own model. Getting Hermes-4 running on their own inference portal took some troubleshooting. The portal is the actual business model (free agent, paid inference), but the onboarding doesn't make it easy.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Grows with you"
&lt;/h2&gt;

&lt;p&gt;The marketing implies something approaching learning. The reality: Hermes writes markdown files to &lt;code&gt;~/.hermes/memories/&lt;/code&gt;. A &lt;code&gt;MEMORY.md&lt;/code&gt; (and optionally a &lt;code&gt;USER.md&lt;/code&gt;) with section delimiters, loaded into context at the start of each session.&lt;/p&gt;

&lt;p&gt;This is the same pattern used by Claude Code, OpenCode, and every other tool with a config file. The implementation is well-engineered: atomic writes via temp files, file locking, injection scanning, character budgets, frozen snapshots for cache stability. But "grows with you" is a stretch for what amounts to structured note-taking.&lt;/p&gt;
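
&lt;p&gt;The atomic-write piece of that engineering, sketched with Node's standard &lt;code&gt;fs&lt;/code&gt; calls (illustrative, not Hermes's actual code): write the new contents to a temp file, then rename it over the target, so a concurrent reader never observes a half-written &lt;code&gt;MEMORY.md&lt;/code&gt;.&lt;/p&gt;

```typescript
// Sketch of the write-temp-then-rename pattern. On POSIX filesystems,
// rename() over an existing file on the same volume is atomic, so readers
// see either the old contents or the new contents, never a partial file.
import { writeFileSync, renameSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

function atomicWrite(path: string, contents: string): void {
  const tmp = path + ".tmp." + process.pid; // temp file beside the target
  writeFileSync(tmp, contents, "utf8");
  renameSync(tmp, path); // the atomic swap
}

const target = join(tmpdir(), "MEMORY.md");
atomicWrite(target, "## User preferences\nPrefers concise answers.\n");
console.log(readFileSync(target, "utf8"));
```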

&lt;h2&gt;
  
  
  Skills are system prompts
&lt;/h2&gt;

&lt;p&gt;Hermes has a skills system. Skills are markdown files with YAML frontmatter. When activated, their content is injected into the model's context. That's it.&lt;/p&gt;

&lt;p&gt;I asked Hermes to critique my resume. It created a "portfolio analysis skill," which was a markdown file describing how to analyze portfolios. This is structured prompt injection with a CRUD layer, not a capability. The progressive disclosure design (metadata loaded first, full content on demand) is genuinely good token management.&lt;/p&gt;

&lt;p&gt;To be fair, calling these "skills" is an industry-wide convention, not something Hermes invented. Claude Code, OpenAI's custom GPTs, and most agent frameworks use similar language for what amounts to structured context injection. Hermes's implementation is actually better-engineered than most.&lt;/p&gt;
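
&lt;p&gt;For concreteness, a hypothetical skill file in that shape; the frontmatter field names are my guess at the convention, not Hermes's documented schema:&lt;/p&gt;

```markdown
---
name: portfolio-analysis
description: How to review a portfolio site and resume
---

When asked to review a portfolio, check the information hierarchy,
page load performance, and whether each project description states
a concrete outcome rather than a list of technologies.
```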

&lt;h2&gt;
  
  
  What's real vs. what's a wrapper
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Real engineering
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Multi-platform gateway.&lt;/strong&gt; 12 messaging platform integrations, each with hundreds to thousands of lines of adapter code. Discord alone is 2,085 lines. Telegram, Slack, Signal, WhatsApp, Matrix, Home Assistant, email, SMS. These are real, substantial integrations with media handling, threading, and typing indicators. This is the genuinely unique thing Hermes offers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terminal tool.&lt;/strong&gt; Six execution backends: local subprocess, Docker, Singularity, Modal (cloud), SSH, and Daytona. Persistent shell that preserves state across calls. Dangerous command approval system. Environment variable isolation to prevent API key leakage. Real engineering on top of subprocess.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory system.&lt;/strong&gt; Flat files with atomic writes, file locking, injection/exfiltration scanning, and frozen snapshots for prefix cache stability. Well-thought-out engineering for what is fundamentally markdown on disk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrapper layer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Web tools.&lt;/strong&gt; Configurable wrapper around Firecrawl, Parallel, or Tavily. The value-add is an LLM post-processing layer that summarizes results to reduce token usage. Functional but not novel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mixture of Agents.&lt;/strong&gt; Sends the same prompt to four frontier models (Claude, Gemini, GPT, DeepSeek) in parallel, then aggregates with a fifth. ~550 lines implementing a &lt;a href="https://arxiv.org/abs/2406.04692" rel="noopener noreferrer"&gt;published paper&lt;/a&gt;. Works, but expensive: five frontier model calls per query.&lt;/p&gt;
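
&lt;p&gt;The shape of that pattern, sketched with placeholder model names and a stand-in &lt;code&gt;callModel&lt;/code&gt; function (not Hermes's actual API): the proposers run in parallel, so wall-clock latency is roughly one round trip, but you still pay for all five calls.&lt;/p&gt;

```typescript
// Stand-in for a real inference call (e.g. via OpenRouter); in practice
// this would hit an API and return the model's completion.
async function callModel(model: string, prompt: string) {
  return `[${model}] draft answer to: ${prompt}`;
}

async function mixtureOfAgents(prompt: string) {
  const proposers = ["claude", "gemini", "gpt", "deepseek"];
  // Fan the same prompt out to every proposer concurrently.
  const drafts = await Promise.all(
    proposers.map((m) => callModel(m, prompt)),
  );
  // A fifth call aggregates the four drafts into one answer.
  const aggregatorPrompt =
    "Synthesize the best answer from these drafts:\n" + drafts.join("\n");
  return callModel("aggregator", aggregatorPrompt);
}

mixtureOfAgents("What is a monad?").then((answer) => console.log(answer));
```

&lt;p&gt;Five frontier-model invocations per user query is where the cost comes from; the parallelism only saves latency, not money.&lt;/p&gt;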

&lt;p&gt;&lt;strong&gt;Browser tool.&lt;/strong&gt; Uses accessibility tree snapshots for text-based page interaction, a better pattern than DOM selectors for LLM agents. Supports local Chromium, Browserbase, and Browser Use as backends. Solid design, but the same approach is available via Vercel's &lt;a href="https://github.com/vercel-labs/agent-browser" rel="noopener noreferrer"&gt;agent-browser&lt;/a&gt; as a standalone tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;If you manage a community, run a Discord server, coordinate a team on Slack, or want an AI assistant on Signal/WhatsApp/Telegram, Hermes is the most complete agent framework for that. I haven't found anything else with this level of multi-platform gateway support. The engineering is real.&lt;/p&gt;

&lt;p&gt;If you're a software engineer working in a terminal, the coding tools probably overlap with what you already use. But if you coordinate across messaging platforms, this is worth a serious look.&lt;/p&gt;

&lt;h2&gt;
  
  
  The business model
&lt;/h2&gt;

&lt;p&gt;The agent is MIT-licensed and free. You bring your own API keys: OpenRouter, Anthropic, OpenAI, whatever you prefer. The monetization is &lt;a href="https://portal.nousresearch.com" rel="noopener noreferrer"&gt;Nous Portal&lt;/a&gt;, their inference service that hosts Hermes-4. You get $5 in free credits and the agent has first-class OAuth integration with Nous as a provider.&lt;/p&gt;

&lt;p&gt;The strategy: give away the agent, sell the inference. Smart model, and the free tier makes it easy to evaluate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Hermes Agent is real software with real engineering effort: 40 tool modules, 12 platform adapters, active development. The multi-platform gateway is genuinely impressive and has no equivalent in the ecosystem.&lt;/p&gt;

&lt;p&gt;The "grows with you" and "gets more capable" framing is a stretch for what amounts to structured note-taking, but the underlying implementation is solid. The naming conventions are the same ones the whole industry uses.&lt;/p&gt;

&lt;p&gt;If your use case is "AI agent accessible on messaging platforms," Hermes is the best option I've found. If you primarily work in a terminal, the coding tools aren't bringing anything novel.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;canonical_url: &lt;a href="https://georgelarson.me/writing/2026-03-19-hermes-review/" rel="noopener noreferrer"&gt;https://georgelarson.me/writing/2026-03-19-hermes-review/&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;George Larson, 25 years in software engineering, infrastructure, manufacturing systems, and cybersecurity. Currently looking for Director/VP or senior engineering roles. More at &lt;a href="https://georgelarson.me" rel="noopener noreferrer"&gt;georgelarson.me&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>review</category>
      <category>llm</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
