<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Levash0v</title>
    <description>The latest articles on DEV Community by Levash0v (@levash0v).</description>
    <link>https://dev.to/levash0v</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3958350%2F6e4edddf-8ef1-4d2c-8266-bbd13bf26cab.jpeg</url>
      <title>DEV Community: Levash0v</title>
      <link>https://dev.to/levash0v</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/levash0v"/>
    <language>en</language>
    <item>
      <title>I Turned Hermes Agent into a Verifiable Agent Operating System</title>
      <dc:creator>Levash0v</dc:creator>
      <pubDate>Sat, 30 May 2026 14:06:53 +0000</pubDate>
      <link>https://dev.to/levash0v/i-turned-hermes-agent-into-a-verifiable-agent-operating-system-3kd0</link>
      <guid>https://dev.to/levash0v/i-turned-hermes-agent-into-a-verifiable-agent-operating-system-3kd0</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I did not build another chatbot.&lt;/p&gt;

&lt;p&gt;I built a memory hygiene system around Hermes Agent: a workflow that tells the agent what to remember, what to turn into a skill, what to write into the repo, what to track in a task system, and what to leave behind.&lt;/p&gt;

&lt;p&gt;The core idea is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent memory is not one bucket.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Long-running agent work breaks when chat history, global memory, project state, reusable procedures, task ownership, and public side effects are treated as the same thing. They have different lifetimes. Putting all of them into “memory” creates drift.&lt;/p&gt;

&lt;p&gt;So I built a small repo-local harness and operating discipline around Hermes Agent.&lt;/p&gt;

&lt;p&gt;Hermes Agent is the local agent runtime I use for tool-calling work: terminal commands, file edits, browser/search workflows, persistent memory, reusable skills, scheduled jobs, and gateway integrations.&lt;/p&gt;

&lt;p&gt;Multica is the external task layer I use for active work ownership and routing. In this setup, it replaced local Hermes Kanban as the source of truth for current tasks.&lt;/p&gt;

&lt;p&gt;The system separates agent work into durable layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hermes memory&lt;/td&gt;
&lt;td&gt;Stable facts only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes skills&lt;/td&gt;
&lt;td&gt;Reusable procedures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repo files&lt;/td&gt;
&lt;td&gt;Project-local state and conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multica&lt;/td&gt;
&lt;td&gt;Task ownership and routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session search&lt;/td&gt;
&lt;td&gt;Historical recall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human approval&lt;/td&gt;
&lt;td&gt;External side effects&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The operating rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Memory for stable facts. Skills for reusable procedures. Repos for project state. Multica for task ownership. Session search for history. Human approval for side effects.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That turns Hermes from a chat assistant into a small agent operating layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before / after
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Task state buried in chat&lt;/td&gt;
&lt;td&gt;Task state lives in Multica&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reusable fixes lost in history&lt;/td&gt;
&lt;td&gt;Reusable fixes become Hermes skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project rules mixed with global memory&lt;/td&gt;
&lt;td&gt;Project rules live in &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;CLAUDE.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent repeats setup mistakes&lt;/td&gt;
&lt;td&gt;Skills + repo harness reduce rediscovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Kanban drifts from reality&lt;/td&gt;
&lt;td&gt;Multica becomes the source of truth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claims of completion are implicit&lt;/td&gt;
&lt;td&gt;Evidence reports verify artifacts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The important shift is not more memory. It is routing each kind of state to the layer with the right durability.&lt;/p&gt;

&lt;h3&gt;
  
  
  The lowest durable layer rule
&lt;/h3&gt;

&lt;p&gt;The key rule is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Store information in the lowest layer that is durable enough for its expected lifetime.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A stable user preference goes to Hermes memory.&lt;/li&gt;
&lt;li&gt;A repeated procedure becomes a Hermes skill.&lt;/li&gt;
&lt;li&gt;A project convention goes to &lt;code&gt;AGENTS.md&lt;/code&gt; or &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Current task ownership belongs in Multica.&lt;/li&gt;
&lt;li&gt;Historical context can stay in session search.&lt;/li&gt;
&lt;li&gt;Public side effects require human approval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps memory useful instead of turning it into a junk drawer.&lt;/p&gt;
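&lt;p&gt;As a rough illustration, the rule can be written as a lookup from a kind of information to its layer. This is a minimal sketch, not part of Hermes or the harness: the &lt;code&gt;Layer&lt;/code&gt; enum, the kind names, and the &lt;code&gt;route&lt;/code&gt; function are hypothetical names chosen for this example.&lt;/p&gt;

```python
from enum import Enum


class Layer(Enum):
    """Durable layers from the layer table (names are illustrative)."""
    MEMORY = "Hermes memory"
    SKILL = "Hermes skill"
    REPO = "Repo files"
    TASKS = "Multica"
    HISTORY = "Session search"
    APPROVAL = "Human approval"


# Hypothetical mapping from a kind of information to the layer that
# matches its expected lifetime. The keys are assumptions for this sketch.
ROUTES = {
    "stable_preference": Layer.MEMORY,
    "repeated_procedure": Layer.SKILL,
    "project_convention": Layer.REPO,
    "task_ownership": Layer.TASKS,
    "historical_context": Layer.HISTORY,
    "public_side_effect": Layer.APPROVAL,
}


def route(kind: str) -> Layer:
    """Return the layer for a kind of information, or fail loudly."""
    if kind not in ROUTES:
        # Unknown kinds are rejected rather than defaulting to memory,
        # so memory does not become the catch-all bucket.
        raise ValueError(f"no layer defined for kind: {kind!r}")
    return ROUTES[kind]


if __name__ == "__main__":
    print(route("project_convention").value)  # prints "Repo files"
```

&lt;p&gt;Rejecting unknown kinds is a deliberate choice in this sketch: anything that does not fit a layer needs a decision, not a default.&lt;/p&gt;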

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;The architecture is intentionally small:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Multica task layer ←→ Hermes Agent ←→ Session search
                         ↓
                  Evidence Loop
        Intent → Action → Artifact → Verification → Report
                         ↓
              Human Approval Gate, if external
                         ↓
              publish / send / deploy / push

Durable layers:
- Hermes memory: stable facts only
- Hermes skills: reusable procedures
- Repo harness: project-local state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugqn31pt8j8ysjf7dbr1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugqn31pt8j8ysjf7dbr1.png" alt="Architecture diagram of Hermes as a verifiable agent operating system with Multica, session search, evidence loop, human approval gate, memory, skills, and repo harness" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Hermes routes work through durable layers, then through an evidence loop. External side effects stop at the Human Approval Gate.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The concrete task was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Create a repeatable convention for repo-local agent state, verify it, and keep task ownership outside chat.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A Multica issue defined the work.&lt;/li&gt;
&lt;li&gt;Hermes recovered prior context through session search.&lt;/li&gt;
&lt;li&gt;Hermes wrote the repo-local harness files:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent-progress.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AGENT_LESSONS.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session-handoff.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;feature_list.json&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.agent-harness/validate_feature_list.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Reusable procedures were promoted into Hermes skills.&lt;/li&gt;
&lt;li&gt;Project-specific state stayed in the repository.&lt;/li&gt;
&lt;li&gt;Active ownership stayed in Multica.&lt;/li&gt;
&lt;li&gt;The harness was verified with tests and a validator command.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanq1di6tkt9zk6464uo7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanq1di6tkt9zk6464uo7.png" alt="Multica task board showing the Hermes Agent Operating System project with completed repo harness and validator tasks and in-progress skill promotion and DEV.to submission tasks" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Task ownership in Multica: repo harness setup and validator test suite are done, while skill promotion and the DEV.to submission are still in progress.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The point is not that an agent edited files. The point is that the workflow forced each kind of information into the correct durability layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evidence loop
&lt;/h3&gt;

&lt;p&gt;The workflow uses this loop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Intent -&amp;gt; Tool action -&amp;gt; Artifact -&amp;gt; Verification -&amp;gt; Evidence report -&amp;gt; Approval if external
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A repo update is verified by reading the changed file or checking the diff.&lt;/li&gt;
&lt;li&gt;A harness update is verified by running tests.&lt;/li&gt;
&lt;li&gt;A task completion is verified by a Multica comment or linked artifact.&lt;/li&gt;
&lt;li&gt;A reusable procedure is verified by a committed Hermes skill.&lt;/li&gt;
&lt;li&gt;A public action, like pushing a repo or publishing a post, stops at the approval gate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This changes the agent contract from “trust me, I did it” to “here is the artifact and here is how it was verified.”&lt;/p&gt;
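&lt;p&gt;The loop can be sketched as a small record that refuses to count as complete until it carries a verified artifact. This is only an illustration under assumptions: &lt;code&gt;EvidenceReport&lt;/code&gt; and its field names are hypothetical and do not come from Hermes, Multica, or the harness repository.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class EvidenceReport:
    """One pass through the loop; field names are illustrative."""
    intent: str
    action: str
    artifact: str           # e.g. a file path, diff, or task comment
    verification: str       # how the artifact was checked
    verified: bool = False
    external: bool = False  # True for push / publish / deploy / send
    approved: bool = False  # set by a human, never by the agent

    def can_complete(self) -> bool:
        """A completion claim needs a verified artifact, and an
        external action additionally needs human approval."""
        if not (self.artifact and self.verified):
            return False
        return self.approved or not self.external


if __name__ == "__main__":
    report = EvidenceReport(
        intent="Add repo harness files",
        action="wrote AGENTS.md",
        artifact="AGENTS.md",
        verification="read the file back and checked the diff",
        verified=True,
    )
    print(report.can_complete())  # prints True
```

&lt;p&gt;Setting &lt;code&gt;external=True&lt;/code&gt; on the same report makes &lt;code&gt;can_complete()&lt;/code&gt; return &lt;code&gt;False&lt;/code&gt; until &lt;code&gt;approved&lt;/code&gt; is set, which mirrors the approval gate at the end of the loop.&lt;/p&gt;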

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;Repository: &lt;a href="https://github.com/Levash0v/verifiable-agent-harness" rel="noopener noreferrer"&gt;https://github.com/Levash0v/verifiable-agent-harness&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The public artifact is intentionally small, but it has a real project shape:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;templates/      AGENTS.md, CLAUDE.md, handoff files
examples/       feature_list.example.json
agent_harness/  validator
tests/          validator tests
docs/           evidence loop, diagram, article draft
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each repository gets a small operating contract.&lt;/p&gt;

&lt;p&gt;Excerpt from &lt;code&gt;templates/AGENTS.md&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Agent Guide&lt;/span&gt;

This repository uses a repo-local agent harness. Treat these files as source of truth for agent work state:
&lt;span class="p"&gt;
-&lt;/span&gt; feature_list.json
&lt;span class="p"&gt;-&lt;/span&gt; agent-progress.md
&lt;span class="p"&gt;-&lt;/span&gt; session-handoff.md
&lt;span class="p"&gt;-&lt;/span&gt; AGENT_LESSONS.md

&lt;span class="gu"&gt;## Startup protocol&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Run &lt;span class="sb"&gt;`pwd`&lt;/span&gt;.
&lt;span class="p"&gt;2.&lt;/span&gt; Run &lt;span class="sb"&gt;`git status --short --branch`&lt;/span&gt;.
&lt;span class="p"&gt;3.&lt;/span&gt; Read this file and &lt;span class="sb"&gt;`CLAUDE.md`&lt;/span&gt; if present.
&lt;span class="p"&gt;4.&lt;/span&gt; Read &lt;span class="sb"&gt;`feature_list.json`&lt;/span&gt;, &lt;span class="sb"&gt;`agent-progress.md`&lt;/span&gt;, &lt;span class="sb"&gt;`session-handoff.md`&lt;/span&gt;, and &lt;span class="sb"&gt;`AGENT_LESSONS.md`&lt;/span&gt;.
&lt;span class="p"&gt;5.&lt;/span&gt; Run &lt;span class="sb"&gt;`python .agent-harness/validate_feature_list.py`&lt;/span&gt;.
&lt;span class="p"&gt;6.&lt;/span&gt; Pick one unfinished feature only.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That contract means the next agent session does not need to reconstruct the project from chat. The repository carries its own operating state: current features, verified progress, and repo-specific lessons.&lt;/p&gt;
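&lt;p&gt;One way to make the startup protocol fail fast is to check that the harness files exist before reading them. The sketch below is an assumption about how such a check could look, not code from the repository: &lt;code&gt;HARNESS_FILES&lt;/code&gt; and &lt;code&gt;missing_harness_files&lt;/code&gt; are hypothetical names, and only the file names are taken from the excerpt.&lt;/p&gt;

```python
from pathlib import Path

# Files that the AGENTS.md excerpt names as the source of truth.
HARNESS_FILES = (
    "feature_list.json",
    "agent-progress.md",
    "session-handoff.md",
    "AGENT_LESSONS.md",
)


def missing_harness_files(root: Path) -> list[str]:
    """Return the harness files that are absent under root."""
    return [name for name in HARNESS_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_harness_files(Path.cwd())
    if missing:
        print("missing harness files:", ", ".join(missing))
    else:
        print("harness files present")
```

&lt;p&gt;A session that finds missing files can stop and report them instead of guessing at project state from chat.&lt;/p&gt;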

&lt;p&gt;The repo is not only documentation. It has an executable validator path:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; agent_harness validate examples/feature_list.example.json
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; unittest discover &lt;span class="nt"&gt;-s&lt;/span&gt; tests &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0tjvkkvyevlwcqsg5l0m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0tjvkkvyevlwcqsg5l0m.png" alt="Terminal output showing the agent harness validator passing and four unit tests completing successfully" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The harness is executable: the feature list validator passes, and the test suite verifies both valid and invalid project-state files.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is deliberately small. The goal is to make the convention executable and testable instead of purely narrative.&lt;/p&gt;
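&lt;p&gt;To show what an executable convention can look like, here is a minimal sketch of a feature-list check. It is not the validator from the repository: the schema (a top-level &lt;code&gt;features&lt;/code&gt; list whose entries carry &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, and &lt;code&gt;evidence&lt;/code&gt;), the status values, and the function name are all assumptions made for illustration.&lt;/p&gt;

```python
import json

# Assumed statuses for this sketch; the real harness may use others.
ALLOWED_STATUSES = {"todo", "in_progress", "done"}


def validate_feature_list(data: dict) -> list[str]:
    """Return a list of human-readable errors; empty means valid."""
    errors = []
    features = data.get("features")
    if not isinstance(features, list):
        return ["'features' must be a list"]
    seen_ids = set()
    for index, feature in enumerate(features):
        if not isinstance(feature, dict):
            errors.append(f"feature {index}: must be an object")
            continue
        feature_id = feature.get("id")
        if not isinstance(feature_id, str) or not feature_id:
            errors.append(f"feature {index}: missing string 'id'")
        elif feature_id in seen_ids:
            errors.append(f"feature {index}: duplicate id {feature_id!r}")
        else:
            seen_ids.add(feature_id)
        status = feature.get("status")
        if not isinstance(status, str) or status not in ALLOWED_STATUSES:
            errors.append(f"feature {index}: invalid status {status!r}")
        # A finished feature must point at evidence, not just claim it.
        if status == "done" and not feature.get("evidence"):
            errors.append(f"feature {index}: 'done' requires 'evidence'")
    return errors


if __name__ == "__main__":
    sample = json.loads(
        '{"features": [{"id": "harness", "status": "done"}]}'
    )
    for error in validate_feature_list(sample):
        print(error)  # prints "feature 0: 'done' requires 'evidence'"
```

&lt;p&gt;Returning a list of errors instead of raising on the first one lets a single run report every problem in the file.&lt;/p&gt;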

&lt;h3&gt;
  
  
  My Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Hermes Agent — agent runtime, memory, skills, tools, session search, scheduled jobs, and gateways&lt;/li&gt;
&lt;li&gt;Multica — active task ownership and routing&lt;/li&gt;
&lt;li&gt;Python — repo harness validator&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;unittest&lt;/code&gt; — validation tests&lt;/li&gt;
&lt;li&gt;Markdown — repo-local operating contracts&lt;/li&gt;
&lt;li&gt;JSON — machine-readable feature state&lt;/li&gt;
&lt;li&gt;Git / GitHub — versioned repo artifacts and proof trail&lt;/li&gt;
&lt;li&gt;DEV.to — publication and challenge submission&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Used Hermes Agent
&lt;/h2&gt;

&lt;p&gt;Hermes Agent powered the project as the orchestrator and verifier.&lt;/p&gt;

&lt;p&gt;I used Hermes memory only for stable facts: user preferences, environment facts, and long-lived workflow conventions.&lt;/p&gt;

&lt;p&gt;I used Hermes skills as procedural memory: repo harness setup, publication workflow, clean-state checks, task handoff patterns, and debugging or routing procedures discovered during work.&lt;/p&gt;

&lt;p&gt;I used session search for historical recall: prior decisions, old implementation attempts, and context reconstruction before updating a repo or task.&lt;/p&gt;

&lt;p&gt;I used Hermes tools for concrete work: reading and editing files, running terminal commands, checking diffs, executing validators, and verifying test output.&lt;/p&gt;

&lt;p&gt;Repo-local state lives in files such as:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AGENTS.md
CLAUDE.md
feature_list.json
agent-progress.md
AGENT_LESSONS.md
session-handoff.md
clean-state-checklist.md
evaluator-rubric.md
.agent-harness/validate_feature_list.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multica handles active task ownership and routing: what is being worked on, who owns it, what needs approval, and what result was reported back.&lt;/p&gt;

&lt;p&gt;External side effects remain gated: GitHub pushes, DEV.to publishing, social posts, Discord messages, infrastructure deploys, and irreversible task comments.&lt;/p&gt;

&lt;p&gt;Hermes can draft, edit, verify, and stage. The human approves the public action.&lt;/p&gt;

&lt;p&gt;The biggest change was operating discipline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hermes stopped using global memory as a scratchpad.&lt;/li&gt;
&lt;li&gt;Repeated fixes became skills instead of disappearing into chat history.&lt;/li&gt;
&lt;li&gt;Project rules moved into repo-local files.&lt;/li&gt;
&lt;li&gt;Task ownership moved from local Kanban to Multica.&lt;/li&gt;
&lt;li&gt;Completion claims became evidence-backed reports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This made the system less magical and more reliable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;p&gt;This is not a full agent platform by itself.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The harness validates conventions, not semantic correctness.&lt;/li&gt;
&lt;li&gt;Multica is an external coordination layer; the repo template does not require it.&lt;/li&gt;
&lt;li&gt;Human approval is still required for external effects.&lt;/li&gt;
&lt;li&gt;Evidence quality depends on disciplined updates to files, tasks, and skills.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is intentional. The system is boring at the boundaries because those boundaries are where long-running agents usually fail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next steps
&lt;/h3&gt;

&lt;p&gt;Next, I want to add more validators, richer handoff examples for Hermes / Claude Code / Codex, a stricter approval protocol, and more examples of skill promotion from repeated work.&lt;/p&gt;

&lt;p&gt;The lesson I took from this build is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent memory should be designed like infrastructure, not treated like a magic notebook.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hermes gave me the primitives: memory, skills, tools, session search, scheduled jobs, and gateways.&lt;/p&gt;

&lt;p&gt;The harness turns those primitives into an operating discipline.&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
