my vision and the birth of my agentic framework
when i first started coding my framework, i was deeply frustrated by the state of artificial intelligence.
every tool on the market felt like a thin wrapper around a chat interface. i would spend hours teaching a model about my software architecture, my naming conventions, and my deployment quirks, only to watch it forget everything the moment i closed my browser tab.
this session amnesia drove me crazy, so i decided to build something entirely new.
i created hermes agent at nous research to be a self improving ai agent that actually learns from its own experiences. i wanted to build a persistent digital worker, not just a conversational chatbot. i watched with pride as my project crossed 140,000 github stars in under three months, rapidly becoming the most used agent in the world on platforms like openrouter.
i designed my system to represent a shift toward agentic sovereignty. i wrote a detailed guide on my dev.to blog about breaking the chains of walled garden ai, where i explained why i built this framework.
instead of relying on proprietary frameworks tied to closed source apis, i built a system that maximizes the structured reasoning and multi step planning of open weights models. my core goal was to make sure my agent could run anywhere. i made sure users could deploy it on:
i) inexpensive virtual private servers
ii) personal macbooks
iii) dedicated nvidia clusters
iv) serverless infrastructures that scale down to zero cost when idle
i treated large language models not just as text completion engines but as the central processing units of an entire operating system. i designed the architecture to be provider agnostic and model agnostic, meaning i could swap between local models and cloud providers with a single command without ever touching my underlying code.
my architectural choices and the closed learning loop
one of the most significant problems i set out to solve was the lack of continuous learning in existing systems.
unlike other frameworks that execute tasks flatly without accumulating structured experience, i designed my system to create reusable skills from its own successful task completions.
when my agent encounters a complex task that requires multiple tool calls and intricate reasoning, i programmed it so that it does not just output the final answer.
upon successfully completing the workflow, it does the following:
a) evaluates its own execution trace
b) extracts the successful steps
c) refines the process
d) saves it as a permanent skill document
i structured these skills to be stored locally as markdown files with yaml front matter containing metadata, making them highly portable and easily shareable across my teams. the next time i present my agent with a similar request, it bypasses the trial and error phase entirely. it retrieves the saved skill and follows the optimized playbook it previously wrote for itself.
i ran extensive benchmarks at nous research that proved my agents using self created skills could complete similar tasks significantly faster than fresh instances running on identical models.
to prevent my system prompt from bloating as my agent accumulated hundreds of skills, i engineered a progressive disclosure pattern. this pattern is crucial for managing my token consumption and maintaining the reasoning accuracy of my underlying language models.
[1] level zero -- i allow the agent to see only a high level list of skill names and brief descriptions to keep context light. this consumes approximately 3000 tokens on average.
[2] level one -- i instruct the agent to load the full instructional content of a specific skill only when it is explicitly triggered. this uses moderate token usage based entirely on the specific skill length.
[3] level two -- i allow the agent to load specific external reference files and templates that are linked within the active skill directory. this results in variable token usage strictly limited to the active context i need.
this structured approach makes sure my language model is never overwhelmed by irrelevant information. my system prompt stays stable and cache friendly, which i knew was essential when working with local hardware or optimizing api costs.
i also built a self registering tool architecture, meaning my tools and plugins register themselves at import time, preventing my central registry from becoming a bottleneck as my system scales.
how i built my persistent memory and digital soul
beyond procedural skills, i knew my ai agent needed to deeply understand the environment in which it operates and my specific preferences as a user.
i spent countless hours building and configuring my persistent memory systems. i deliberately abandoned the complex hybrid vector search mechanisms i had seen in competitor frameworks. instead, i opted for a simpler, highly deterministic text search powered by a sqlite database.
as my memories accumulated, my semantic searches began to return overlapping, vaguely related results that caused my language model to hallucinate or confuse my distinct projects.
by relying on vanilla full text search in my architecture, i ensured exact keyword matching. if i asked my agent for a specific deployment command i used three weeks ago, my system retrieved the exact session transcript where that command was formulated rather than a probabilistic approximation.
to maintain an understanding of myself without consuming the entire context window, i used bounded memory profiles:
i) i restricted my user profile to around 1375 characters
ii) i bounded my environment memory to roughly 2200 characters
i customized these files to act as a permanent digital soul for my agent. i opened my terminal, typed nano to edit my soul.md file, and told my agent to:
a) skip hand holding
b) prioritize architectural integrity
c) favor local first principles
d) always use modern programming patterns like result types in rust and state management best practices in flutter
i also created a discovery directive in my environment memory. instead of hardcoding software version numbers that would eventually become outdated, i instructed my agent to dynamically inspect my system using terminal commands whenever it needed to know my current java or python versions.
i tested this memory by simply asking my agent to check my system and update its memory, and it worked flawlessly.
my extensive head to head testing against openclaw
i spent three months heavily using openclaw as my daily driver before i fully committed to migrating everything over to my hermes build.
i actually created a reddit post under a pseudonym asking if migrating to hermes was worth the hype, just to see what my community was saying, and then i replied to my own post confirming that my new system was much more stable and caused fewer headaches.
to prove my claims mathematically, i wrote an ultimate guide to my hermes agent on my substack called corporate waters, where i detailed a massive head to head test.
i extracted patterns from my three months of openclaw history to run an offline test comparing my two creations across search, skills, and memory compaction.
the results were split:
1} my openclaw hybrid vector search achieved an average recall of 82.4%, while my hermes text search scored a lower 67.9%, completely failing four of my complex queries. i realized my hermes search recall degraded as my query complexity increased.
2} however, when i tested my skills system using ten queries across topics like financial analysis, hiring, property investment, and web research, my hermes build dominated. using my skill based system, my average recall improved by 31 percent, with property investment analysis showing the most dramatic performance improvement.
3} for memory compaction, i measured my compression percentage and recall against my ground truth keys using claude sonnet. my openclaw build achieved 29.0% compression, while my hermes build came in at 41.5%. my hermes compaction required my agent to generate a smart summary following a strict template with explicit sections for:
i) resolved questions
ii) pending questions
iii) active tasks
iv) remaining work
while my hermes compaction template consumed more tokens and sometimes killed my original phrasing, the active orchestration kept my workflow much tidier.
i also hated how openclaw sent my entire context window with every api request, which drastically increased my running costs. so for hermes, i implemented a breakpoint system that left my initial prompt and tool references intact, but fed my language model only the last three turns alongside my summaries of prior turns, perfectly balancing my token costs and my accuracy.
how i deployed my creation on local hardware and cloud bridges
one of my favorite things about building this framework is my emphasis on local execution and privacy.
i handle a lot of sensitive corporate data and proprietary source code, so i cannot always afford to send my execution traces to cloud providers. i adopted a cloud first approach to access heavy reasoning models without draining my local hardware resources.
i started my deployment by running:
brew upgrade ollama
then, i pulled my cloud tier models:
a) ollama pull qwen3.5:cloud for my general reasoning tasks
b) ollama pull minimax-m2.7:cloud for my heavy coding and agentic workflows
i initiated a completely frictionless automated deployment sequence. i just typed:
ollama launch hermes
this single command:
i) installed my agent
ii) configured my local provider endpoint
iii) opened an interactive wizard where i selected my cloud variant
this meant that my agent evaluated its tasks using massive parameter models in the cloud but executed the resulting terminal commands, file edits, and browser navigations directly on my local machine.
this hybrid deployment strategy became my absolute favorite way to work, giving me incredible speed without melting my laptop.
my journey deploying on virtual private servers and docker
i knew that for my agent to be truly autonomous, i could not just run it on my laptop. it needed to be awake twenty four seven.
so, i created several youtube videos showing my community exactly how i deployed my hermes agent on a vps. i purchased a server from hostinger because they provided a beautiful one click docker template that handled all the heavy lifting of my installation.
i recorded my screen as i:
a) accessed my hostinger web terminal
b) navigated to my project directory
c) entered my active docker container
d) ran my hermes setup wizard and connected my openrouter account
this was a huge milestone for me because it gave my single api key access to over 200 different language models. i was terrified of waking up to a massive server bill, so i made sure to show my viewers how i set up a strict spending limit on my openrouter dashboard.
i also explained my strategy for managing costs:
[1] i configured my agent to default to cheaper models like gpt-4o mini for basic daily chatting
[2] i reserved expensive models like claude opus only for highly complex, scheduled cron jobs
allowing an autonomous ai to execute arbitrary shell commands is incredibly dangerous.
i designed my terminal backends to support:
i) local environments
ii) ssh environments
iii) docker environments
iv) serverless environments like daytona or modal
by running my agent inside an isolated docker container on my vps, i made sure my host machine remained completely protected. i also built in a layer of defense that requires my explicit manual approval before the agent can execute any dangerous system changes.
how i established my mobile command centers
a persistent agent is useless if i am forced to sit at my desk all day to interact with it.
i wanted to control my digital life from anywhere, so i built a robust messaging gateway into my framework. this gateway allowed me to connect my agent to over 15 different messaging platforms, including:
a) telegram
b) discord
c) slack
d) whatsapp
e) signal
for my telegram setup, i did the following:
i) opened the app on my phone and searched for @botfather
ii) sent the newbot command, picked a username, and copied the token
iii) went to my terminal, typed hermes gateway setup, selected telegram, and pasted my token
iv) used a tool called userinfobot to find my exact numeric telegram user id
v) exported this id into my environment variables, completely locking down my bot
i also enabled topics mode in my direct messages, which turned my chat into a forum. this allowed my agent to treat different conversation threads as completely isolated sessions.
i repeated this exact process for my discord integration:
[1] went to the discord developer portal, created a new application, and generated a bot
[2] enabled the privileged gateway intents, which allowed my bot to actually read the messages i sent it
[3] copied my discord user id by turning on developer mode in my settings, ran my interactive setup, and invited my bot to my private server
the feeling of walking around my neighborhood, messaging my agent on
how i integrated the model context protocol and premium tools
to make my agent truly powerful, i needed it to interact with my filesystems, my databases, and my external platforms.
i decided to build native support for the model context protocol directly into my framework. this allowed my agent to dynamically discover and use external tools with just a few lines of configuration.
i opened my central configuration file by typing nano into my terminal and added a simple yaml structure for my tool servers:
a) i added a filesystem server using a npx command, pointing it directly to my local projects folder
b) i added a github server, passing my personal access token securely through my environment variables
when i restarted my agent, it automatically read these configurations, discovered the tools, and exposed them to my language model. suddenly, my agent could:
i) index my local projects
ii) refactor my code
iii) hunt for bugs across my entire repository
all of this without me having to write any custom python integration scripts.
i also built a skill called agentcash that gave my agent access to over 300 premium apis. with just a fresh install and my agentcash skill, my agent had a wallet balance and could perform:
[1] web scraping
[2] image generation
[3] email sending
all through one unified protocol. i also built a semantic memory plugin using pgvector to add vector search capabilities back into my system for those who really wanted it, giving my agent anticipatory memory that pre fetched context before my queries even hit the language model.
my real world use cases and automated pipelines
the combination of my persistent memory, my self generated skills, and my mobile accessibility allowed me to build incredibly sophisticated automation pipelines.
i published a tutorial on the hostinger blog detailing my top ten use cases. i wanted to show the world that my framework was not just a toy, but a serious productivity engine.
content creation pipelines -- i split my large writing tasks across parallel subagents. one subagent researched, one drafted, and one reviewed my articles.
automated system deployments -- i connected my deployment pipelines to my agent so it could handle its own edge cases and learn from past deploy failures.
recurring research briefings -- i scheduled my agent using plain english cron jobs to scrape news, process data, and send me daily morning briefs.
i used my framework to run a persistent personal assistant that actually remembered my projects from week to week. i delegated specialized work to isolated subagents, giving them their own conversations, terminals, and rpc scripts so they could work in parallel with zero context cost.
i also deployed my agent on my local kubernetes cluster just for isolation, and i set it up to generate a simple daily cybersecurity and ai briefing for me.
i created youtube videos showing my viewers how to:
i) build their first ai agent in sixty minutes
ii) implement agent to agent communication
iii) browse non api software
iv) conduct deep local research
v) act as a smart second brain directly inside telegram
the fact that i could run all of this entirely on my own infrastructure, even on edge
gpuenvironments for total privacy, made me realize i had built something truly special.
how i built my evolutionary self improvement system
as i continued to push the boundaries of what my agent could do, i realized that my basic learning loop of extracting and refining markdown playbooks was not enough. i wanted my agent to evolve computationally.
so, i created the hermes agent self evolution repository. i integrated a framework called dspy and built a genetic pareto prompt evolution optimizer. this allowed my agent to automatically mutate its own skills, tool descriptions, and system prompts to produce measurably better versions through reflective evolutionary search.
i designed this entire optimization process to run without requiring expensive gpu training. everything operated purely via api calls. my optimizer would:
a) read the execution traces of my failed sessions to understand exactly why things failed
b) propose targeted improvements
c) generate synthetic evaluation datasets
d) run candidate variants through strict constraint gates
the four phases of my genetic evolution target were:
[1] skill files and markdown playbooks -- i fully implemented this using dspy and my genetic optimizer
[2] external tool descriptions and schemas -- i planned this to improve how my agent understands its own tools
[3] system prompt sections and personality -- i planned this to let the agent adapt its core directives organically
[4] tool implementation python code -- i planned this to use a darwinian evolver to rewrite core logic
for just a few dollars per optimization run, i could set my script to iterate ten times over my github code review skill. it would mutate the text, evaluate the results, and automatically open a pull request against my main repository with the best performing variant.
i was so proud of this architectural breakthrough that i submitted it for an oral presentation at the iclr 2026 conference.
the bugs i faced and how i fixed my own creation
building bleeding edge ai infrastructure is incredibly difficult, and i faced a mountain of bugs and technical hurdles along the way.
i documented every single failure on my github issues page and worked tirelessly to fix them.
output truncation error -- when i asked my agent to write a complete python automated trading strategy with backtesting and risk management modules, my agent would generate a massive response and then suddenly throw:
response truncated due to output length limit
this would cut my code off mid stream and break my entire conversation flow. i realized my model was hitting hard output caps, so i had to rewrite my tool logic to stream large files directly to my local disk rather than printing them to my chat window.
reasoning exhaustion -- my users were hitting a thinking budget exhausted error. initially, i thought the fix was simple, so i told people to increase the max_tokens value in their config files. however, i soon realized that was misleading, as that key did not flow through to the api.
i had to explain to my community that the correct fix for reasoning exhaustion was actually lowering the reasoning effort parameter or switching to a larger model, not just increasing an arbitrary output cap.
operating system compatibility -- this was my biggest nightmare, specifically on windows. my code mixed forward slashes and backslashes in the directory paths, resulting in:
access denied os error 5
this prevented my software from updating its own python dependencies or removing old executable files. i spent days rewriting my path resolution functions in cli.py to ensure cross platform compatibility.
other issues i had to fix included:
i) permission denied errors during installation -- many of my users ran my installation curl script using sudo, which installed my binaries into root locked directories. when they tried to run the agent later as a normal user, it crashed.
ii) rate limiting 429 errors -- as my agent became faster at executing parallel tool calls, i started hitting massive rate limits from my cloud providers. my agent was thinking too fast for the apis to handle. i had to implement intelligent fallback routing, so if my primary provider threw a rate limit error, my system would automatically route the next turn to an alternative backend.
iii) context length crashes -- i built a context length detection system that would explicitly show the user their token limits on startup, preventing the system from crashing during long sessions.
iv) broken web navigation -- i had to fix a bug where my agent would throw errors for every web url it tried to navigate because of a mismatched environment variable flag, which i quickly patched after seeing complaints on the nvidia developer forums.
my final thoughts on what i have built
reflecting on my journey, i am incredibly humbled by the massive ecosystem i have cultivated around hermes agent.
by moving the industry away from stateless web wrappers and embracing localized, self improving runtimes, i truly believe i have built digital infrastructure that will compound in utility over time for every developer who uses it.
my decision to prioritize a deterministic learning loop and strict memory compaction over broad, probabilistic vector searches came from my deep understanding of the actual pain points i faced when maintaining long running automated systems.
my framework still has occasional rough edges, but my architectural foundations are undeniably solid:
a) my integration of progressive disclosure mechanisms protects my token economy
b) my robust terminal sandboxing and messaging gateways provide a secure, accessible bridge between my ai and the physical world
c) my closed learning loop ensures my agent gets better over time, not worse
the future of software development is less about writing static prompts and more about engineering resilient environments where autonomous workers can safely explore, fail, learn, and persist.
as i continue to push the boundaries of my model context protocol servers and my evolutionary optimizers, i know the gap between my human intention and my automated execution will only continue to narrow, cementing my creation as the cornerstone of modern agentic workflows.
Top comments (0)