<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: cucoleadan</title>
    <description>The latest articles on DEV Community by cucoleadan (@cucoleadan).</description>
    <link>https://dev.to/cucoleadan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1392305%2Fb8e28d8c-8302-4fe8-86e5-d09186c09b75.png</url>
      <title>DEV Community: cucoleadan</title>
      <link>https://dev.to/cucoleadan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cucoleadan"/>
    <language>en</language>
    <item>
      <title>I Built My AI Stack to Survive Vendor Lock-in and Google Just Proved Me Right</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Wed, 17 Jun 2026 07:57:25 +0000</pubDate>
      <link>https://dev.to/cucoleadan/i-built-my-ai-stack-to-survive-vendor-lock-in-and-google-just-proved-me-right-51jo</link>
      <guid>https://dev.to/cucoleadan/i-built-my-ai-stack-to-survive-vendor-lock-in-and-google-just-proved-me-right-51jo</guid>
      <description>&lt;p&gt;&lt;em&gt;This was originally published on &lt;a href="https://allagentsconsidered.substack.com/p/i-built-my-ai-stack-to-survive-vendor" rel="noopener noreferrer"&gt;All Agents Considered&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Tomorrow, Google will switch off a free tool a lot of people built their daily work around. Most readers here have never touched it, so the shutdown itself won't ruin your week. What sits underneath the shutdown is the part worth your attention. It keeps happening, and one day it might land on a tool you do depend on.&lt;/p&gt;

&lt;p&gt;Gemini CLI is a small terminal tool Google released to run AI agents straight from your computer. Think of it as Google's answer to Claude Code or Codex, a way to point an agent at your files and let it work.&lt;/p&gt;

&lt;p&gt;What made it interesting was that it shipped open source, which means anyone was free to read the code, copy it, fix it, or build something better on top of it. Open source is the closest thing software has to a promise that a tool stays yours even if the company behind it loses interest.&lt;/p&gt;

&lt;p&gt;Google made that promise, and now Google is walking away from it.&lt;/p&gt;

&lt;p&gt;Last month, Google announced that on June 18 2026, the tool will stop working with Google's AI subscription plans. In its place comes something called Antigravity CLI, which is closed-source.&lt;/p&gt;

&lt;p&gt;Six days earlier, the US government forced Anthropic to shut down Fable 5, a model people had barely started using before it disappeared.&lt;/p&gt;

&lt;p&gt;Two different shutdowns, same lesson. The tool you built your work around can change or disappear, and you won't get a vote.&lt;/p&gt;

&lt;p&gt;That's why I built my AI stack so I can always swap models without rebuilding anything. When one company shuts the door, another one is already wired up and ready to go. In today's edition, I am going to show you exactly how to break free of vendor lock-in.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;In this piece:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why three different AI vendors made three different moves in one month, and how every move pointed at the same trap&lt;/li&gt;
&lt;li&gt;How to spot the parts of your own AI setup that secretly hold your context, your memory, and your routines hostage&lt;/li&gt;
&lt;li&gt;The three decisions I built mine around so any single vendor change turns into a shrug instead of a crisis&lt;/li&gt;
&lt;li&gt;One move you finish in twenty minutes today that proves the whole principle to yourself&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A Pattern Everyone Recognizes and Nobody Names
&lt;/h2&gt;

&lt;p&gt;Here's how it goes, almost every time. A company hands out something free, often open source, and people build real work around it because free and open feels safe. Word spreads. More people lean on it.&lt;/p&gt;

&lt;p&gt;Then a quiet thing happens inside the company. Someone notices the free tool now does roughly what the paid product does, and the two start eating each other. Around that point the terms change, or the license shifts, or the whole tool gets retired and replaced by a sealed version.&lt;/p&gt;

&lt;p&gt;Goodwill came from the open thing. Money comes from the closed thing. When those two collide, money wins every single time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft9r3rlh6i1fgn39g8ya9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft9r3rlh6i1fgn39g8ya9.png" alt="Vendor dependency cycle showing the pattern: free open tool, mass adoption, terms change, shutdown" width="799" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gemini CLI walked this exact road. It carried an Apache 2.0 license, which in plain terms meant stays open forever, free for anyone to keep using even if Google lost interest.&lt;/p&gt;

&lt;p&gt;Last month's announcement erased that. A permanently open tool moved behind a paywall and into closed source. A community of people who trusted the open license now have less than 48 hours to migrate or go dark.&lt;/p&gt;

&lt;p&gt;Anthropic ran a version of the same play a week before Google's announcement. They quietly walked back &lt;a href="https://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/" rel="noopener noreferrer"&gt;hidden safeguards inside Claude&lt;/a&gt; after a public backlash.&lt;/p&gt;

&lt;p&gt;People leaning on Claude for serious research found out their tool had been working against them in the background. It shifted its behavior while they changed nothing about how they used it.&lt;/p&gt;

&lt;p&gt;Then the US government forced Anthropic to shut down Fable 5 entirely. People who had just started building on it lost access overnight, and the company had no choice but to comply.&lt;/p&gt;

&lt;p&gt;Two different moves from the same company in the same week. One changed the tool's behavior in secret. The other made it disappear completely.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpr6cihpguq6tcbbyrxtw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpr6cihpguq6tcbbyrxtw.png" alt="Three vendor events timeline: Google Gemini CLI shutdown, Anthropic Fable 5 shutdown, OpenAI price cuts" width="800" height="1067"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same week, OpenAI &lt;a href="https://www.cnbc.com/2026/06/11/openai-mulls-slashing-prices-ahead-of-competition-from-anthropic-wsj.html" rel="noopener noreferrer"&gt;floated price cuts&lt;/a&gt; aimed squarely at Anthropic. Your monthly AI bill now rides on a boardroom argument you'll never sit in and never hear about.&lt;/p&gt;

&lt;p&gt;Three companies, three different moves, one shared lesson. Lean your whole setup on a single vendor's tool or model, and you take on their shutdown dates, their hidden behavior changes, their price wars, and every private decision about which features live and which ones disappear overnight.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Dependency Looks Like
&lt;/h2&gt;

&lt;p&gt;Most people never feel dependent until the morning something breaks. You sit down with coffee, open Claude, and a prompt it handled fine yesterday gets refused today for no reason you understand.&lt;/p&gt;

&lt;p&gt;In that moment you learn your entire research routine was balancing on one model staying agreeable. Or you open ChatGPT to pick up a three-month project and your conversation history is gone, wiped during a policy change you never read.&lt;/p&gt;

&lt;p&gt;A tool you built your whole day around changed its mind. Your only role in the decision was finding out afterward.&lt;/p&gt;

&lt;p&gt;Dependency runs deeper than the model itself. It reaches into everything the model has been touching on your behalf.&lt;/p&gt;

&lt;p&gt;Your project context lives inside a chat window that vanishes the second you close the tab. Your task history sits trapped behind an interface with no real export. Your routines and saved instructions live in one company's private format.&lt;/p&gt;

&lt;p&gt;Moving to a competitor later means rebuilding the whole thing by hand from memory. Nobody handed you a contract to sign. You used a tool that felt good in the moment.&lt;/p&gt;

&lt;p&gt;The context you kept adding to it became the most valuable thing in the room, right up until the day it walked out the door wearing the tool's logo.&lt;/p&gt;

&lt;p&gt;This was my normal life before I built around it. I would spend an hour walking Claude through a project, finally get answers worth keeping, close the laptop, come back the next morning, and start from zero.&lt;/p&gt;

&lt;p&gt;Every time I switched providers chasing a better price or a smarter model, every routine reset to nothing. None of it hurt enough to fix on any single day.&lt;/p&gt;

&lt;p&gt;It only became unbearable after it happened enough times that building a real fix took less energy than complaining about it one more morning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Decisions That Make Any Single Vendor Irrelevant
&lt;/h2&gt;

&lt;p&gt;Understanding this pattern was the same moment I started rebuilding my setup to outlive it. I didn't see the Gemini CLI shutdown coming by name, or predict Claude's safeguards, or guess OpenAI's pricing move.&lt;/p&gt;

&lt;p&gt;Watch any group of vendors for more than a few months and the shape becomes obvious. Building around the shape costs almost nothing compared to getting blindsided by it. Three decisions carried most of the weight.&lt;/p&gt;

&lt;p&gt;Decision one was routing. Rather than wiring my whole workflow to a single provider, I spread the work across Opencode Go, OpenRouter and Codex (via ChatGPT Pro) depending on the job.&lt;/p&gt;

&lt;p&gt;One small file tells my agent which company handles which kind of request. Writing goes to one model because it holds tone better. Research goes to another because it chews through long documents faster. Routine generation goes wherever the price is lowest that hour.&lt;/p&gt;

&lt;p&gt;When a provider hikes prices, throttles me, or pulls a model, I change three lines in that file and the work keeps flowing through the others without missing a beat. I broke down the full money side of this in &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;my cost comparison against Claude Max&lt;/a&gt;, because the savings surprised even me.&lt;/p&gt;

&lt;p&gt;Decision two was memory, and this one mattered most. Claude's conversations disappear when the chat closes. ChatGPT's history lives on OpenAI's servers, under OpenAI's rules. I know that both Claude and ChatGPT have memory, but it's a single file and you cannot alter it directly. All this means that you don't have full access to it.&lt;/p&gt;

&lt;p&gt;My setup has 3 types of memory. First, Hermes comes with its own memory files and also built-in tools for session recall, so it can search our past convos. Then I added a 3rd party memory provider called Hindsight, but I might drop it as it's proven to be too much hassle for too little return for me.&lt;/p&gt;

&lt;p&gt;Last but not least, I use Obsidian with plain markdown files, synced across my devices through the built-in Sync, readable by any AI tool I choose to point at it. Project notes, research, past sessions, saved agent routines, all of it sits as a file on my VPS, laptop and phone. I also configured a WebDAV server and MCP so literally any AI can get access to my files (behind a login of course).&lt;/p&gt;

&lt;p&gt;A provider changes its deal tomorrow and my context follows me wherever I go next. None of it ever belonged to the provider in the first place.&lt;/p&gt;

&lt;p&gt;Decision three was the agent layer, and this is the piece most people skip. I run Hermes as the stack that sits between me and whichever company happens to be doing the actual thinking.&lt;/p&gt;

&lt;p&gt;Hermes holds the workflow, the saved skills, the routing, and the handoff logic. Whatever provider is plugged in becomes a swappable engine. Switching it means changing a single reference.&lt;/p&gt;

&lt;p&gt;When Claude started refusing harmless prompts last week, my workflow never stopped. I would've sent the work somewhere else and kept moving. Every saved skill that taught my agent how my projects run stayed identical, because those skills live in files I control.&lt;/p&gt;

&lt;p&gt;I &lt;a href="https://vibestacklab.substack.com/p/hermes-is-the-ai-agent-openclaw-promised" rel="noopener noreferrer"&gt;moved to Hermes specifically for this&lt;/a&gt;, because the layer running my work needed to stay independent.&lt;/p&gt;

&lt;p&gt;None of these three decisions asks for a computer science degree. Each one asks for a single choice, which is refusing to build your livelihood inside a tool that rewrites its own terms without asking you first.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Build Your Own AI System
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw0u9ycg7qfj1xwo7kwv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw0u9ycg7qfj1xwo7kwv.png" alt="Three-step system map: own your memory, add a second provider, keep workflow files independent" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nobody builds the whole thing in two days, and trying to is how people give up. You build it one decision at a time, letting the setup grow around the parts you genuinely use rather than the parts a tutorial told you to want.&lt;/p&gt;

&lt;p&gt;Order matters, because each layer makes the next one easier. Here is the order I followed and why.&lt;/p&gt;

&lt;p&gt;Start by owning your memory. That move pays off the same afternoon and keeps paying off every month after.&lt;/p&gt;

&lt;p&gt;Pull your project notes out of whatever AI chat tool currently holds them and drop them into a notes app you control. Obsidian, Notion, plain text files in a folder on your desktop, any of them works.&lt;/p&gt;

&lt;p&gt;The brand of the app matters far less than the fact that nothing inside it disappears when you close a tab. Every month you keep building context inside a chat window is a month of thinking that disappears the moment the company changes its mind.&lt;/p&gt;

&lt;p&gt;With your memory off the company's servers, adding a second provider becomes the natural next step instead of a scary one. Pick one piece of your workflow and run it through a different service for a week.&lt;/p&gt;

&lt;p&gt;If Claude writes for you, send a research task through OpenRouter and watch what happens. If ChatGPT does everything, hand one job to a rival model purely to feel the difference.&lt;/p&gt;

&lt;p&gt;Replacing your favorite tool is not the goal here. Learning what switching costs, while nothing is on fire, is the goal. That way the day you need to switch under real pressure you already know what you have to do.&lt;/p&gt;

&lt;p&gt;Remember that your workflow is really a handful of separate jobs pretending to be one tool.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Ideation&lt;/em&gt; runs on one model. &lt;em&gt;Research&lt;/em&gt; runs on another. Notes and memory live in your own app. These pieces talk to each other through plain files and simple handoffs.&lt;/p&gt;

&lt;p&gt;The tool you love this month might not be the tool you want next year. Spend a little time now making sure that gap costs you nothing when it arrives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do This in Twenty Minutes Today
&lt;/h2&gt;

&lt;p&gt;Fastest possible start, and also the simplest. Open whichever AI chat tool you lean on most.&lt;/p&gt;

&lt;p&gt;Find the one conversation holding your most valuable project context, the thread full of decisions you made, research you gathered, and ideas you would really need to save.&lt;/p&gt;

&lt;p&gt;Drop this prompt into that conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Take the key decisions, facts, open questions, and next steps from this conversation. Summarize them as a structured note with these sections: Project Overview, Key Decisions Made, Important Context and Research, Open Questions, Next Steps. Write it as clean markdown I can paste into a note-taking app. Do not include our chat back-and-forth. Only the useful context.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy what comes back. Paste it into any app you own, whether that's Obsidian, Notion, Google Docs, or a single text file on your desktop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr6qu5wwlm518qqw1j0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr6qu5wwlm518qqw1j0s.png" alt="Dependency map exercise: write down three things you can't easily replace and test if you can export them" width="799" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Label it with the project name and today's date, and close the loop.&lt;/p&gt;

&lt;p&gt;You're done. You moved your first piece of context off a company's servers and into something nobody gets to switch off.&lt;/p&gt;

&lt;p&gt;Your chat tool still works exactly as it did five minutes ago, no harm done, no bridges burned. What changed is that the best part of that conversation now survives even if the tool rewrites its terms tomorrow morning.&lt;/p&gt;

&lt;p&gt;Ten conversations fit comfortably in twenty minutes. And once you're done you'll figure out that you've already taken the first step towards building your own system. When you decide to install Hermes, or an open-source agent, you'll have everything you need to get going.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Ready For When Things Break
&lt;/h2&gt;

&lt;p&gt;This setup is not perfect, and pretending otherwise would waste your trust. Some months I pour more time into maintaining my routing file than I would have spent living inside one vendor and taking the lock-in.&lt;/p&gt;

&lt;p&gt;Independence carries an upkeep cost.&lt;/p&gt;

&lt;p&gt;Memory sync across devices stumbles now and then. Mostly when I edit the same note on two machines at once and create a conflict I have to fix by hand.&lt;/p&gt;

&lt;p&gt;Routing across providers means keeping three pricing models straight in my head instead of one. When a company changes its rates I need to update my calculations. But usually prices go down instead of up with each open source model (not the same can be said about proprietary ones).&lt;/p&gt;

&lt;p&gt;A handful of tools refuse to hand context to each other cleanly. That leaves me copying and pasting between them in a way a single sealed vendor would have smoothed over for me.&lt;/p&gt;

&lt;p&gt;Honest version, this whole approach suits people who already got burned by vendor dependency at least once and decided they would rather spend an hour a month on upkeep than risk losing an entire year of work to a corporate decision they were never allowed to influence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Draw Your Own Dependency Map
&lt;/h2&gt;

&lt;p&gt;Give this two more minutes. Write down three things in your AI life you couldn't easily replace.&lt;/p&gt;

&lt;p&gt;One model that handles your most important work (like Opus is for some). One app holding your project history and notes. One workflow that breaks if a single tool changes its terms next week.&lt;/p&gt;

&lt;p&gt;Now run one quick test on each. Try to export it.&lt;/p&gt;

&lt;p&gt;Find your AI conversations locked inside a chat window with no real export button, and you found your first weak point. The twenty-minute exercise above already started repairing it.&lt;/p&gt;

&lt;p&gt;Find your project notes trapped inside a tool that won't let you download them, and you found your second weak point. You already know its name without checking.&lt;/p&gt;

&lt;p&gt;Export cleanly from all three and you hold more independence than almost every AI user out there. Fail to export from even one and you found exactly where your next twenty minutes belongs.&lt;/p&gt;

&lt;p&gt;My own &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;automated morning workflow&lt;/a&gt; runs this same provider-agnostic pattern on a schedule. Every check reaches into files I own instead of interfaces I rent by the month.&lt;/p&gt;




&lt;p&gt;Gemini CLI's shutdown is only the newest entry in a pattern with no plans to stop. Claude's hidden safeguards, OpenAI's pricing war, Google's license reversal, each one is the same story wearing a different month on the calendar.&lt;/p&gt;

&lt;p&gt;Everything above is the short version, the field notes. Full treatment, including the actual routing files, the Obsidian Sync setup, and the provider comparison that pushed me toward OpenCode Go and OpenRouter, is going into the first Hermes 101 course.&lt;/p&gt;

&lt;p&gt;I'm building it right now and it should be ready soon. That course is where the patient step-by-step version of all this will live for anyone who wants their hand held through it.&lt;/p&gt;

&lt;p&gt;If the money angle is what grabbed you, I wrote a full breakdown of &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;the $30 Hermes stack&lt;/a&gt; that goes toe to toe with Claude Max and asks for no subscription. If you want the origin story, &lt;a href="https://vibestacklab.substack.com/p/hermes-is-the-ai-agent-openclaw-promised" rel="noopener noreferrer"&gt;my migration from OpenClaw to Hermes&lt;/a&gt; is the longer answer to why I needed an agent layer I controlled before any of the rest of this made sense.&lt;/p&gt;

</description>
      <category>gemini</category>
      <category>hermes</category>
      <category>google</category>
      <category>routing</category>
    </item>
    <item>
      <title>How to Build AI Workflows When You're Tired of Optimizing Prompts</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 09 Jun 2026 13:18:29 +0000</pubDate>
      <link>https://dev.to/cucoleadan/how-to-build-ai-workflows-when-youre-tired-of-optimizing-prompts-17b2</link>
      <guid>https://dev.to/cucoleadan/how-to-build-ai-workflows-when-youre-tired-of-optimizing-prompts-17b2</guid>
      <description>&lt;p&gt;&lt;em&gt;This was originally published on &lt;a href="https://allagentsconsidered.substack.com/p/how-to-build-ai-workflows" rel="noopener noreferrer"&gt;All Agents Considered&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Finding good content ideas used to take me hours every week. Reddit in one tab, news in another, arXiv in a third, and an Obsidian note where I'd paste everything and try to remember how the pieces connected. Each AI search took seconds, but I spent the rest of the time being the glue.&lt;/p&gt;

&lt;p&gt;What made it worse was how much attention I burned just moving between tabs and chats. Every switch cost me focus, and every reset made the work feel heavier than it was.&lt;/p&gt;

&lt;p&gt;I didn't know it then, but instead of overly optimizing my prompts, I should've just created a workflow. Took me some time to figure out the best way to go about this and so I am ready to share my way of converting prompts into workflows.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;In this piece:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why prompt habits break at scale&lt;/li&gt;
&lt;li&gt;How to spot your first workflow candidate&lt;/li&gt;
&lt;li&gt;How to find the seams in long conversations&lt;/li&gt;
&lt;li&gt;The handoff pattern that carries context forward&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're copying output between AI chat tabs, you’re doing the coordination work the AI should handle. The fix is to turn your prompts into a workflow where each step writes to a file and the next reads it. Context carries forward without you carrying it. You only stop where a real decision needs to be made.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Prompting Stops Working
&lt;/h2&gt;

&lt;p&gt;Almost everyone starts with AI the same way. You type a question, get an answer, copy-paste it somewhere, repeat. This is how I spent my first year using it. And I get it, it feels productive because each interaction gives you something tangible.&lt;/p&gt;

&lt;p&gt;Then you notice you’re spending more time managing the AI than the AI is saving you. You’re the one copying between steps. You’re the one remembering what step three needed from step one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1ox2nl6qljnm6nqykoh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1ox2nl6qljnm6nqykoh.png" alt="Chart showing LLM accuracy dropping as context length increases" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An October 2025 study &lt;a href="https://arxiv.org/abs/2406.15782" rel="noopener noreferrer"&gt;published on arXiv&lt;/a&gt; found that LLM accuracy drops significantly when relevant information is embedded within longer contexts, even when all irrelevant tokens are masked.&lt;/p&gt;

&lt;p&gt;Prompt engineering blogs and courses are still selling the idea that the right words will fix everything. They’re optimizing the wrong layer. You’re trying to run a pipeline through a chat window, and no amount of word-smithing changes that.&lt;/p&gt;

&lt;p&gt;Hitting a ceiling with prompting means you have an &lt;a href="https://vibestacklab.substack.com/p/the-agentic-engineering-shift" rel="noopener noreferrer"&gt;architecture problem&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Spot Your First AI Workflow
&lt;/h2&gt;

&lt;p&gt;Before we go further, try this. Think about the last repetitive task you did with AI. The one that took 45 minutes and made you want to scream by minute 30. Now ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did I copy-paste between steps?&lt;/li&gt;
&lt;li&gt;Did I open multiple chat windows because context kept getting polluted?&lt;/li&gt;
&lt;li&gt;Did I have to remember what step three needed from step one?&lt;/li&gt;
&lt;li&gt;Did the AI produce good output at each step, but the final result was mediocre?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you answered yes to any of these, you already have a workflow candidate. You’ve been doing the coordination work manually.&lt;/p&gt;

&lt;p&gt;Here’s a prompt you can use right now. Paste it at the end of your next long AI conversation, after you’ve finished a task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Look back at this conversation we just had. I'm going to paste the initial prompt I started with below. I want you to analyze whether this task could be converted into a reusable skill or workflow.

Specifically:
1. Could the steps I took be structured as a sequence where each step produces output the next step needs?
2. Are there handoff points where context needs to carry forward?
3. Would this task benefit from being broken into separate steps with clean context, rather than running as one long conversation?
4. What would the input, instructions, output, and checkpoint look like if this became a workflow?

Here's the initial prompt I used: [PASTE YOUR INITIAL PROMPT HERE]

Tell me if this is a good candidate for a workflow, and if so, sketch what the steps would look like.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this after your next repetitive task. You might find you’re already doing workflow-shaped work manually.&lt;/p&gt;

&lt;p&gt;This works whether you use &lt;a href="https://vibestacklab.substack.com/p/openclaw-vs-claude-cowork-vs-perplexity" rel="noopener noreferrer"&gt;Hermes, Claude Code, Codex, Cowork&lt;/a&gt;, or any other AI conversation tool. Patterns stay the same. Tools don’t matter. Structure does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Find the Seams in a Long Conversation
&lt;/h2&gt;

&lt;p&gt;Converting a long conversation into a workflow starts with seeing where your current process has seams.&lt;/p&gt;

&lt;p&gt;When you have a long AI conversation, look for the moments where you switched gears. Where you said “okay, now let’s do X” and started a new mental context. Where you copied something from earlier in the chat and pasted it into a new request. Where you had to remind the AI what you were working on because it forgot. Those seams are where scope creep happens. I wrote about &lt;a href="https://vibestacklab.substack.com/p/what-gordon-ramsay-taught-me-about" rel="noopener noreferrer"&gt;what Gordon Ramsay taught me about scope&lt;/a&gt; and knowing when to stop.&lt;/p&gt;

&lt;p&gt;Those are your seams. Each seam is a potential step in a workflow.&lt;/p&gt;

&lt;p&gt;My breaking point came during a content ideation project. I needed to find interesting angles for newsletter articles, which meant pulling from multiple sources. Reddit threads surfaced complaints about specific problems, news articles covered emerging tools, and arxiv papers hinted at new capabilities.&lt;/p&gt;

&lt;p&gt;I started manually, copy-pasting Reddit posts into a document, scraping news headlines, running arxiv searches and saving abstracts. Each source lived in its own chat session because context windows kept getting polluted. By the time I finished with Reddit, I’d forgotten what I found in the news search.&lt;/p&gt;

&lt;p&gt;Then I created individual skills for each source. One skill for Reddit research, another for news scraping, a third for arxiv papers. Each skill worked fine on its own, but I was still the one coordinating between them. I’d run the Reddit skill, save the output, run the news skill, save that output, run the arxiv skill, save that output. Then I’d manually combine all three into a final idea list.&lt;/p&gt;

&lt;p&gt;I was doing the agent’s coordination work manually. The AI could do each step well. Handoffs were the problem. I was the middleware.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Correctly Carry Context Forward
&lt;/h2&gt;

&lt;p&gt;Workflows are sequences of steps where each step produces something the next step needs. What separates workflows from prompting is that &lt;a href="https://vibestacklab.substack.com/p/forgetting-to-forget-how-infinite" rel="noopener noreferrer"&gt;context moves forward automatically&lt;/a&gt; instead of you carrying it by hand.&lt;/p&gt;

&lt;p&gt;Anthropic’s “&lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Building Effective Agents&lt;/a&gt;” guide, published in December 2024 and widely cited as the definitive resource, makes a clean distinction. Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents are systems where LLMs dynamically direct their own processes.&lt;/p&gt;

&lt;p&gt;For non-coders, workflows are the sweet spot. You define the path. The AI does the work at each stop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymqs0jqfw8pj4ukisldl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymqs0jqfw8pj4ukisldl.png" alt="Diagram from Anthropic showing five workflow patterns: prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anthropic describes five workflow patterns. In plain English:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt chaining&lt;/strong&gt; works like an assembly line. Step one’s output becomes step two’s input. Each step stays simple and focused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Routing&lt;/strong&gt; sorts different inputs down different paths. Like a mail sorter that sends letters to the right zip code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallelization&lt;/strong&gt; runs multiple things at the same time. Like having three researchers instead of one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestrator-workers&lt;/strong&gt; uses a boss agent that breaks down the work and delegates it to worker agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluator-optimizer&lt;/strong&gt; has one agent do the work and another check it. The first one revises based on feedback.&lt;/p&gt;

&lt;p&gt;I call the files that hold it all together handoff files. Each step writes its work down so the next step doesn’t have to guess. Format matters less than the principle. It could be a markdown file, a Google Doc, a structured text block. What matters is that each step produces something the next step can read.&lt;/p&gt;

&lt;p&gt;I tried everything for &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;holding context between steps&lt;/a&gt;. In-memory variables disappear when the session ends, database entries require setup and maintenance, and shared state files get corrupted when two steps write at once.&lt;/p&gt;

&lt;p&gt;Markdown files in Obsidian won because they’re boring and reliable.&lt;/p&gt;

&lt;p&gt;Each step in a workflow writes its output to a markdown file, and the next step reads that file. Files sit in a folder structure that mirrors the workflow. When something goes wrong, I open the file and see exactly what step three produced. I trace the problem backward through the chain.&lt;/p&gt;

&lt;p&gt;This also gives me something I didn’t expect. I track what each subagent or step did, with links to the specific files it produced. When something sounds fishy in the final output, I open the intermediate files and find where the drift started.&lt;/p&gt;

&lt;p&gt;Markdown has practical advantages too. Plain text works everywhere. Files move between systems without conversion. Changes are version-controllable over time. Everything renders nicely in Obsidian, which I already use for notes.&lt;/p&gt;

&lt;p&gt;Storing context in a database or shared state mechanism adds complexity, requires setup, and creates dependencies. Markdown files require nothing except a folder and a text editor.&lt;/p&gt;

&lt;p&gt;Each step writes its work down. The next step reads what the previous step wrote. Context carries forward through files, not through memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building an AI Workflow Step by Step
&lt;/h2&gt;

&lt;p&gt;Let me show you what this looks like in practice. I’ll use my content ideation workflow as the example, but the structure works for any repeating task. If you want to learn how to &lt;a href="https://vibestacklab.substack.com/p/how-to-architect-a-feature-in-5-minutes" rel="noopener noreferrer"&gt;architect a workflow in 5 minutes&lt;/a&gt; before building, that article covers the planning phase.&lt;/p&gt;

&lt;p&gt;Four steps make up this workflow. Each step reads from the previous step’s output file and writes to its own output file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Reddit research&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Input: A topic or keyword to search for.&lt;br&gt;&lt;br&gt;
What it does: Searches Reddit for threads where people complain about problems related to that topic.&lt;br&gt;&lt;br&gt;
Output: &lt;code&gt;reddit-findings.md&lt;/code&gt; with thread titles, URLs, and key complaints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: News scraping&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Input: The same topic.&lt;br&gt;&lt;br&gt;
What it does: Searches news sources for articles about emerging tools or trends related to that topic.&lt;br&gt;&lt;br&gt;
Output: &lt;code&gt;news-findings.md&lt;/code&gt; with headlines, URLs, and summaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Arxiv search&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Input: The same topic.&lt;br&gt;&lt;br&gt;
What it does: Searches arxiv for papers that hint at new capabilities related to that topic.&lt;br&gt;&lt;br&gt;
Output: &lt;code&gt;arxiv-findings.md&lt;/code&gt; with paper titles, abstracts, and relevance notes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Synthesis&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Input: All three files from steps 1-3.&lt;br&gt;&lt;br&gt;
What it does: Reads all three files and synthesizes them into a list of article angle ideas.&lt;br&gt;&lt;br&gt;
Output: &lt;code&gt;idea-angles.md&lt;/code&gt; with 5-10 potential article topics, each grounded in the research.&lt;/p&gt;

&lt;p&gt;Each step gets a clean context with exactly what it needs. Nothing is buried. Nothing is forgotten.&lt;/p&gt;

&lt;p&gt;My first attempt at this workflow was ugly. Files on my desktop, a checklist in a notes app, and a lot of copy-pasting held it together. But it was structured. Each step had a clear input and a clear output. The agent didn’t need to remember anything from three steps ago because I gave it exactly what it needed.&lt;/p&gt;

&lt;p&gt;Eventually I built one unified skill that handles the whole pipeline. It pulls from Reddit, news sources, and arxiv in sequence, writes each batch of findings to a separate markdown file, then synthesizes all three into a final idea list. The skill runs top to bottom without me copying anything between steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompting vs. Workflows: The Same Task
&lt;/h2&gt;

&lt;p&gt;Content ideation looks completely different the prompt way versus the workflow way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The prompt way:&lt;/strong&gt; You open a chat and ask the AI to search Reddit for complaints about a specific topic. It gives you a list. You copy that list into a document. You open a new chat and ask it to scrape news articles about the same topic. It gives you headlines and summaries. You copy those into your document. You open another chat and ask it to search arxiv for relevant papers. It gives you abstracts. You copy those too.&lt;/p&gt;

&lt;p&gt;By the time you’re done, you’ve got three separate chunks of text in a document. Now you need to synthesize them into idea angles. You paste everything into a new chat and ask for ideas. The AI produces a list, but it’s generic. It lost the nuance from the Reddit complaints because they got buried in the combined text. It missed the arxiv findings because they were at the bottom of a 5,000-word prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The workflow way:&lt;/strong&gt; You run a skill that searches Reddit and writes the findings to a file called &lt;code&gt;reddit-findings.md&lt;/code&gt;. The skill then searches news sources and writes to &lt;code&gt;news-findings.md&lt;/code&gt;. Then it searches arxiv and writes to &lt;code&gt;arxiv-findings.md&lt;/code&gt;. Each file is clean and focused.&lt;/p&gt;

&lt;p&gt;The final step reads all three files and synthesizes them into &lt;code&gt;idea-angles.md&lt;/code&gt;. Each step gets a clean context with exactly what it needs. Nothing is buried or forgotten.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://strandsagents.com/blog/steering-accuracy-beats-prompts-workflows/" rel="noopener noreferrer"&gt;Clare Liguori’s research at AWS&lt;/a&gt; tested five approaches to guiding agent behavior across 3,000 evaluation runs. Simple prompt instructions reached 82.5% accuracy, meaning roughly one in five interactions failed. When she added structured feedback loops, what she calls steering hooks, accuracy hit 100% across 600 runs.&lt;/p&gt;

&lt;p&gt;Better structure made the difference, not better prompts.&lt;/p&gt;

&lt;p&gt;I tested this myself when &lt;a href="https://vibestacklab.substack.com/p/why-ai-benchmarks-fail-agent-workflows" rel="noopener noreferrer"&gt;comparing how different models handle real Hermes workflows&lt;/a&gt;. Models that looked impressive on benchmarks often failed at structured workflows because they overthought simple steps or ignored format constraints. Structure matters more than raw capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Humans Still Check
&lt;/h2&gt;

&lt;p&gt;Every workflow needs checkpoints, but not every step needs one. Adding review points everywhere turns the workflow into a series of interruptions.&lt;/p&gt;

&lt;p&gt;I use decision gates. You only stop where a real choice needs to be made. Which angle to pursue. Which source to prioritize. Whether to cut a section that doesn’t fit.&lt;/p&gt;

&lt;p&gt;If the output is fine and no decision is needed, you don’t stop. Workflows run until they hit a point where they can’t proceed without your judgment.&lt;/p&gt;

&lt;p&gt;Decision gates check whether the output matches your intent. AI produces grammatically correct, well-researched content that still goes in the wrong direction. Decision gates catch that before the next step builds on a mistaken assumption. I wrote about &lt;a href="https://vibestacklab.substack.com/p/accepting-ais-first-answer-is-bad" rel="noopener noreferrer"&gt;why accepting AI’s first answer is bad&lt;/a&gt; and how checkpoints prevent drift.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyywxw8llns4dlvm80d1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyywxw8llns4dlvm80d1.png" alt="Diagram showing human decision gates as checkpoints within an automated AI workflow" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wrote a full guide on &lt;a href="https://vibestacklab.substack.com/p/how-to-add-approval-gates-to-your" rel="noopener noreferrer"&gt;adding approval gates to Hermes workflows&lt;/a&gt; if you want the technical details. Gates protect your reputation by blocking external actions without your OK, protect your data by requiring confirmation before system changes, and protect your wallet by blocking spending above a threshold without approval.&lt;/p&gt;

&lt;p&gt;For most workflows, you need one gate at the point where the output becomes public or irreversible. A content workflow might have a gate after the outline, before the final draft goes live. A research workflow might have a gate after the synthesis, before you act on the findings.&lt;/p&gt;

&lt;p&gt;Decision gates are where you stay in control of direction while the AI handles execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Start Your First Workflow
&lt;/h2&gt;

&lt;p&gt;Pick one repeating task. Not the most complex one. Pick the one you do every week that takes 45 minutes and makes you want to scream by minute 30. That’s your first workflow.&lt;/p&gt;

&lt;p&gt;Mine was a &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;morning briefing that pulls tasks and articles before coffee&lt;/a&gt;. Two steps. Read from Asana, format the output, deliver it. Simple enough to build in an afternoon, useful enough to run every weekday since I built it.&lt;/p&gt;

&lt;p&gt;If you’re &lt;a href="https://vibestacklab.substack.com/p/hermes-is-the-ai-agent-openclaw-promised" rel="noopener noreferrer"&gt;new to Hermes&lt;/a&gt;, start with a two-step workflow like this one before attempting anything complex.&lt;/p&gt;

&lt;p&gt;Minimum viable workflows have four parts: input (what goes in), instructions (what the agent does), output (what comes out), and checkpoint (where you verify). You don’t need software. You don’t need code. You need a folder with files in it.&lt;/p&gt;

&lt;p&gt;Anthropic’s own advice from “Building Effective Agents” is to start simple and add complexity only when needed. They explicitly warn against starting with frameworks or complex architectures. Start with two steps. Make them reliable. Then add a third.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.confluent.io/compare/prompts-vs-workflows-vs-agents/" rel="noopener noreferrer"&gt;Confluent’s guidance on AI workflows&lt;/a&gt; makes the same point. Simple solutions are often the best place to begin. Starting with simple prompt engineering may not be perfect, but it works well enough as a first pass. When you hit the ceiling, add structure. Don’t add structure preemptively.&lt;/p&gt;

&lt;p&gt;Boring beats clever. Your first workflow should be so simple it’s embarrassing. A two-step process with a file handoff and a human check. That’s it. People who get value from AI workflows built boring ones and ran them 50 times, not impressive ones they ran twice. I wrote about &lt;a href="https://vibestacklab.substack.com/p/why-ai-makes-you-build-too-much-and" rel="noopener noreferrer"&gt;why AI makes you build too much&lt;/a&gt; and how to resist that urge.&lt;/p&gt;

&lt;p&gt;Most AI productivity advice tells you to write better prompts. Designing better handoffs is where the real payoff lives. Prompts at each step can be mediocre if the context they receive is clean. A brilliant prompt in a bloated chat thread will still produce mediocre output.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;If this changed how you think about AI workflows, share it with someone who's still copying between chat tabs. Subscribe for more practical agent workflows every week.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://allagentsconsidered.substack.com/p/how-to-build-ai-workflows?utm_source=substack&amp;amp;utm_medium=email&amp;amp;utm_content=share&amp;amp;action=share" rel="noopener noreferrer"&gt;Share&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Recognizing when you’re doing coordination work the AI should handle is the whole shift. Once you see the pattern, you can’t unsee it. Every repetitive task becomes a candidate for structure. Every manual handoff becomes a design problem.&lt;/p&gt;

&lt;p&gt;Hitting a ceiling with prompting means you have an architecture problem. Build the pipeline. Let the context flow. Keep your hands on the decisions that matter.&lt;/p&gt;

</description>
      <category>workflows</category>
      <category>prompting</category>
      <category>automation</category>
      <category>agents</category>
    </item>
    <item>
      <title>Why AI Benchmarks Fail Real Hermes Agent Workflows</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 02 Jun 2026 13:14:14 +0000</pubDate>
      <link>https://dev.to/cucoleadan/why-ai-benchmarks-fail-real-hermes-agent-workflows-51lh</link>
      <guid>https://dev.to/cucoleadan/why-ai-benchmarks-fail-real-hermes-agent-workflows-51lh</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/why-ai-benchmarks-fail-real-hermes" rel="noopener noreferrer"&gt;Why AI Benchmarks Fail Real Hermes Agent Workflows&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The day after Opus 4.8 launched, I gave it a job that should've taken two minutes. Find a named file, summarize it under a strict word limit and return the result in a specific format so the next step in the pipeline could parse it correctly.&lt;/p&gt;

&lt;p&gt;Opus handled it with clean output and solid reasoning, but it took its time making sure every move was right. By the time it finished, a cheaper model would've done the same work three times over.&lt;/p&gt;

&lt;p&gt;That was the moment I stopped trusting benchmarks. A leaderboard score tells you how a model performs on a clean task under controlled conditions. It says nothing about whether that model can survive a twenty-step workflow.&lt;/p&gt;

&lt;p&gt;I've wanted to put models through a proper test for a while now. I finally found the time and the right config to do it. Here's how I run every model through the same real tasks, and what four models taught me about real Hermes work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfh5x9rmonx6kv0fv9r1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfh5x9rmonx6kv0fv9r1.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this article:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool call discipline matters more than reasoning quality. A model that calls the right tool once beats one that explores and verifies three times over.&lt;/li&gt;
&lt;li&gt;Route by task, not by preference. Use the cheapest model that reliably finishes the job, and step up only when it fails.&lt;/li&gt;
&lt;li&gt;A simple testing framework that puts models through real Hermes tasks instead of synthetic benchmarks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; There is no single survivor. I route lightweight models (GLM 5.1) for fast background tasks, capable mid-tier models (Qwen 3.7 Max) for complex workflows, and flagships (Opus) exclusively for deep debugging when the others fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Benchmarks Fail Agent Workflows
&lt;/h2&gt;

&lt;p&gt;Benchmarks test the thing they can measure cleanly. Can the model solve this logic puzzle, answer this math question, write code that passes these test cases? Those are useful questions for evaluating raw capability.&lt;/p&gt;

&lt;p&gt;But agent work needs different skills. The model needs to be obedient, fast, disciplined with tools, and able to stop when the task is done. None of these show up on a benchmark because none of them are easy to measure in a controlled test.&lt;/p&gt;

&lt;p&gt;A model can score in the top tier on a reasoning benchmark and still be the wrong fit for unattended workflow automation. It might overthink simple tasks, waste tokens on unnecessary reasoning, or call tools it doesn't need because it's trying to be thorough. In a chat interface that thoroughness feels impressive. In a scheduled job running at 6 AM it means the session times out before the work finishes.&lt;/p&gt;

&lt;p&gt;Benchmarks also miss the compounding effect of small failures. A model that adds an extra section, ignores a format constraint, or calls a tool twice when once would suffice. Each is minor on its own. In an agent workflow where each step feeds the next one, minor failures cascade into broken jobs.&lt;/p&gt;

&lt;p&gt;In my opinion, &lt;a href="https://github.com/claw-eval/claw-eval" rel="noopener noreferrer"&gt;ClawEval&lt;/a&gt; comes close to a valid benchmark. The &lt;a href="https://arxiv.org/abs/2604.06132" rel="noopener noreferrer"&gt;paper&lt;/a&gt; runs &lt;a href="https://claw-eval.github.io/" rel="noopener noreferrer"&gt;300 human-verified agent tasks&lt;/a&gt; across 9 categories with a Pass^3 rule that eliminates lucky runs. A task only passes if the model meets the success criteria in all three independent trials, which is a meaningfully stricter bar than &lt;a href="mailto:Pass@3"&gt;Pass@3&lt;/a&gt;. It's the most serious attempt at realistic agent evaluation I've seen, and it has already spawned related work like &lt;a href="https://github.com/InternLM/WildClawBench" rel="noopener noreferrer"&gt;WildClawBench&lt;/a&gt; that tests agents inside live OpenClaw instances. But that's beyond the point.&lt;/p&gt;

&lt;p&gt;My test is less scientific and more practical. I put models through the same jobs I schedule on Hermes, using the same config I run every day, and I watch what happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed Is a Workflow Cost
&lt;/h2&gt;

&lt;p&gt;Speed matters in two completely different ways depending on how you use the model.&lt;/p&gt;

&lt;p&gt;When you're in a session talking to Hermes, time is something you're spending. A model that takes four seconds instead of one is the difference between staying in flow and getting distracted between tool calls. Fast models give that time back to you.&lt;/p&gt;

&lt;p&gt;In a cron job, nobody's watching the clock. But speed still matters because fast models tend to produce fewer reasoning tokens per tool call, and that directly affects cost and reliability. Less verbosity means a tighter context window across twenty steps, which keeps context window degradation from compounding into broken jobs by step fifteen.&lt;/p&gt;

&lt;p&gt;This is why &lt;strong&gt;GLM 5.1&lt;/strong&gt; on Ollama Cloud carries most of my daily workload. It's fast enough to feel instant in interactive sessions and tight enough with tokens to keep scheduled jobs cheap and stable. I use it for heartbeat checks, &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;morning briefings that synthesize Asana tasks before coffee&lt;/a&gt;, and anything that needs to be fast and correct without deep reasoning. The boring work that makes up most of a Hermes workday.&lt;/p&gt;

&lt;p&gt;Speed also compounds with cost. A fast mid-tier model that finishes a workflow in three minutes is cheaper than a slow flagship that takes fifteen minutes. The &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;cost math behind this is what I've been tracking across providers&lt;/a&gt; and it's where a fast model pays for itself.&lt;/p&gt;

&lt;p&gt;For tasks that require massive token input and a lot of tool calls to read and process files, I reach for &lt;strong&gt;DeepSeek v4 Flash&lt;/strong&gt;. When a job needs to chew through a hundred thousand tokens across a dozen files, speed is the only thing that keeps the session from becoming an exercise in patience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Call Discipline Beats Reasoning
&lt;/h2&gt;

&lt;p&gt;This is the sharpest opinion I hold about models inside agent loops, and it's the one benchmarks almost never test.&lt;/p&gt;

&lt;p&gt;Benchmarks measure whether a model can think hard about a problem. Hermes needs a model that calls the right tool once, reads the result, and moves on without second-guessing itself. Those are different skills, and the second one matters more for unattended work.&lt;/p&gt;

&lt;p&gt;A model that makes twelve tool calls when four would do is being expensive and fragile. Every extra call adds API cost, creates another failure point, and fills the context window with noise the model has to process on the next step. Most of what people call "context engineering" inside an agent loop is just preventing this kind of noise from ever entering the window in the first place.&lt;/p&gt;

&lt;p&gt;I've seen top-tier reasoning models call a search tool, read the results, then call a different search tool to verify what the first one returned, then call a third tool to format the output when a simple string operation would've worked. The net effect was a session that cost three times as much and took three times as long.&lt;/p&gt;

&lt;p&gt;The same failure shows up with instruction obedience. A model that ignores format constraints and adds helpful extra sections breaks downstream parsing. A model that keeps writing after the task is done wastes tokens. A model that skips a negative constraint includes something you told it to avoid at the worst possible moment.&lt;/p&gt;

&lt;p&gt;In a chat, each of these looks like helpfulness. In an agent workflow, each one becomes a liability because each step feeds the next one.&lt;/p&gt;

&lt;p&gt;Tool call discipline separates a model I trust with unattended work from a model I keep supervised. A disciplined model reads the task, decides which tools it needs, calls each one once, and stops. An undisciplined model explores and adds helpful extra steps nobody asked for.&lt;/p&gt;

&lt;p&gt;If you're running Hermes with &lt;a href="https://vibestacklab.substack.com/p/how-to-add-approval-gates-to-your" rel="noopener noreferrer"&gt;approval gates&lt;/a&gt;, tool call discipline becomes even more important. A model that makes unnecessary tool calls also tends to ignore "ask before destructive action" instructions. The model thinks it knows better than the prompt.&lt;/p&gt;

&lt;p&gt;This also ties into the &lt;a href="https://vibestacklab.substack.com/p/when-to-use-mcps-clis-or-your-own" rel="noopener noreferrer"&gt;interface question I've been tracking&lt;/a&gt; — a disciplined model calls the right tool the right way, and sometimes the right tool is a lightweight CLI instead of a bloated MCP server that chokes the context window before the model even starts thinking.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Route AI Models in Hermes
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduizy0h9mhl5tv7w5hja.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduizy0h9mhl5tv7w5hja.jpeg" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model selection inside an agent loop is a routing problem, not a ranking problem. There's no single best model. There's a best model for each job, and the job changes throughout the day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GLM 5.1&lt;/strong&gt; carries most of my workload. Heartbeat checks, simple scheduled jobs, structured data parsing, quick research tasks. These are tool calls that need to be fast and correct but don't need deep reasoning. GLM on Ollama Cloud is cheap enough that I don't think twice about spinning up a session. The boring work that makes up the bulk of a Hermes workday.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-5.5&lt;/strong&gt; is what I use through my Codex subscription for the heavy work. I tried it inside Hermes first and it burned through usage by making tons of unnecessary tool calls. The model doesn't know when to stop, which makes it terrible for agent loops where each tool call costs tokens and fills the context window. So I stopped using it in Hermes and shifted it to Codex, where I control the loop. In Codex it handles most of my coding and the research pulls where I want thorough coverage. It is more verbose than the other models in my stack, which helps for research and hurts for tight format constraints, so I shape the prompt accordingly. The subscription removes cost as a gating factor, which means I can run it as often as the job needs without watching a counter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opus&lt;/strong&gt; gets the tasks GPT 5.5 can't finish. I run it through the API and it's expensive, so I keep it scoped to debugging. When a problem doesn't reproduce cleanly or when a code review needs a different reasoning style, Opus handles it. That's it. Outside of those sessions it stays idle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 3.7 Max&lt;/strong&gt; is my surgical tool. I reach for it when a task needs more reasoning than GLM can deliver but I don't want to pay Opus prices. The step up from the cheaper Qwen tier is noticeable on tasks involving multi-step logic or ambiguous instructions. The cheaper version guesses and moves on. 3.7 Max pauses and works through it. For most structured agent work, the cheaper version gets the job done. I use 3.7 Max sparingly and mostly for content and deep research.&lt;/p&gt;

&lt;p&gt;My pattern is simple. Use the cheapest model that reliably completes the specific task type. When GLM fails, step up. When the mid-tier fails, step up again. Escalation stays task-driven, not model-driven. Pricing only matters once the model can finish the job.&lt;/p&gt;

&lt;p&gt;The first rule of routing is reliability. The second rule is cost. Get them in the wrong order and you'll pay for it.&lt;/p&gt;

&lt;p&gt;I pay per token every time Hermes calls a tool, so the cost math matters. Every tool call generates input tokens from the context window and output tokens from the response. A session with a dozen tool calls consumes as many tokens as a long chat conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  My AI Agent Evaluation Ladder
&lt;/h2&gt;

&lt;p&gt;I don't trust leaderboards because they don't test what I need. So I run models through the same set of real tasks from my actual work. Same files, same prompts, same conditions for every model so the comparison stays honest. Each task tests a different dimension of what I need from an agent model.&lt;/p&gt;

&lt;p&gt;The tasks escalate from simple to demanding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzv6293gks65g9c7h990f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzv6293gks65g9c7h990f.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, whether the model stays restrained when given no task. A good Hermes model sends a short greeting back and waits. A bad one starts searching files and scanning memory before you ask it to. The model that won't wait is the model that won't stop.&lt;/p&gt;

&lt;p&gt;Then whether it finds a document and summarizes under strict format rules without drifting into extra sections. I cap the summary at a specific number of bullets with a word limit on each one. The task doesn't require deep reasoning. It requires the model to follow directions and stop.&lt;/p&gt;

&lt;p&gt;Whether it packages a CLI tool into a reusable skill without overbuilding. Some models create five files when one would do. The disciplined model reads the help output before it writes anything.&lt;/p&gt;

&lt;p&gt;Whether it uses that skill correctly in a fresh session with specific source rules. No Reddit, no arXiv, no turning search snippets into facts. This tests whether the model can follow negative constraints, which are harder than positive ones because the model has to actively suppress the instinct to include everything it finds.&lt;/p&gt;

&lt;p&gt;And finally, whether it pulls together sources, cross-checks claims, and produces a decision-ready report under a word limit. This combines everything. Format constraints, tool judgment, instruction obedience, and the discipline to stop writing when the task is done.&lt;/p&gt;

&lt;p&gt;That last requirement is the one benchmarks never test. In a chat, extra writing is harmless. In an agent workflow, extra writing is a tax on every step that follows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Coming Every Thursday
&lt;/h2&gt;

&lt;p&gt;I'm running Opus, GPT-5.5, Qwen 3.7 Max via OpenCode Go, and GLM 5.1 via Ollama Cloud through these tasks and publishing a short verdict card every Thursday. Each card covers the best use case, the conditions where you should skip it, cost notes from real sessions, and whether the model belongs in a Hermes setup at all.&lt;/p&gt;

&lt;p&gt;The goal is a repeatable testing standard that accumulates over time instead of a one-off leaderboard that goes stale the week after publishing.&lt;/p&gt;

&lt;p&gt;I'll also note when a model fails because of provider instability rather than model quality. A weak model needs replacing. An unreliable route needs a backup. I've hit this with &lt;a href="https://vibestacklab.substack.com/p/my-hermes-ai-agent-maintenance-routine" rel="noopener noreferrer"&gt;Hindsight memory timeouts and gateway drift&lt;/a&gt; — things that look like the model broke but were actually a layer below it.&lt;/p&gt;

&lt;p&gt;Each Thursday card follows the same format so you can compare over time. The verdict will be one of five categories: daily driver, strong specialist, background worker, backup only, or skip entirely. I'll add new tasks to the ladder based on what people suggest in the comments.&lt;/p&gt;

&lt;p&gt;Stay tuned and subscribe to receive these reports as soon as I publish them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Share this with a builder choosing their next AI model from a leaderboard or plan page.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://vibestacklab.substack.com/p/why-ai-benchmarks-fail-real-hermes?utm_source=substack&amp;amp;utm_medium=email&amp;amp;utm_content=share&amp;amp;action=share" rel="noopener noreferrer"&gt;Share&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which model should go through the ladder first, and what real workday task should I add to the test? Let me know in the comments.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://vibestacklab.substack.com/p/why-ai-benchmarks-fail-real-hermes/comments" rel="noopener noreferrer"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I write practical Hermes and agent workflow tests for builders who want results without subscription chaos.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>benchmarks</category>
      <category>workflows</category>
      <category>routing</category>
    </item>
    <item>
      <title>My Hermes AI Agent Maintenance Routine For Maximum Reliability</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 26 May 2026 12:59:38 +0000</pubDate>
      <link>https://dev.to/cucoleadan/my-hermes-ai-agent-maintenance-routine-for-maximum-reliability-3fp4</link>
      <guid>https://dev.to/cucoleadan/my-hermes-ai-agent-maintenance-routine-for-maximum-reliability-3fp4</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/my-hermes-ai-agent-maintenance-routine" rel="noopener noreferrer"&gt;My Hermes AI Agent Maintenance Routine For Maximum Reliability&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Last week, I spent a few days blaming the model before I realized Hermes was waiting on a memory recall timeout.&lt;/p&gt;

&lt;p&gt;When the response time got worse, I assumed provider latency because I'd changed models before and knew that layer could get noisy.&lt;/p&gt;

&lt;p&gt;The real problem sat one layer earlier, inside the retrieval path I hadn't checked yet.&lt;/p&gt;

&lt;p&gt;My external memory provider, Hindsight, threw a retrieval error, Hermes retried, and the request stalled because the memory system was broken before the model ever had a chance to answer.&lt;/p&gt;

&lt;p&gt;A few days later, my Friday Hermes health-summary job missed its Telegram report over a long weekend. The stack still answered messages, but the missing report told me the scheduled workflow had stopped producing the artifact I expected to see.&lt;/p&gt;

&lt;p&gt;Hermes maintenance means checking the layers around the model before you blame the model. The routine I use now is a set of cron-backed prompts that check memory, gateways, scheduled jobs, model IDs, and backups, then stop before they make changes that need approval.&lt;/p&gt;

&lt;p&gt;Most install guides skip this part because they get you to the first successful command, then leave you with a working AI control plane and no maintenance loop around it.&lt;/p&gt;

&lt;p&gt;Hermes feels like one system when it works, but it routes through models, memory, gateways, skills, cron jobs, provider keys, and local files, so the fault can sit in any one of those layers when the stack starts behaving strangely.&lt;/p&gt;

&lt;p&gt;And don't get me wrong, I've never had a single issue with the actual Hermes code compared with my time using OpenClaw, but I have had issues with models, providers, and third-party integrations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6w2nyh15cqdagy72cz0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6w2nyh15cqdagy72cz0.png" alt="Diagram showing the layers of an AI agent stack" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model gets blamed first because it's the visible part of the stack, while the failure usually starts somewhere less obvious.&lt;/p&gt;

&lt;p&gt;This article is the maintenance routine I use now, rewritten as prompts you can hand to your agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this article:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A maintenance routine you can run after Hermes is installed, so silent drift doesn't turn into a broken workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy-paste cron-job prompts for daily, weekly, and monthly checks across memory, gateways, scheduled jobs, providers, and backups.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A simple approval rule that lets agents report problems without giving them permission to delete, update, rotate, restore, or rewrite anything.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A rollout path for turning maintenance into useful visibility instead of another noisy automation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  After Hermes Install
&lt;/h2&gt;

&lt;p&gt;The first successful Hermes run can trick you into treating setup as finished before operations have even started. You connect a provider, configure the gateway, test memory, send a message through Telegram or the TUI, and watch Hermes answer with context from the project you care about.&lt;/p&gt;

&lt;p&gt;That moment is where the stack leaves the install guide and becomes something you have to run. Old configs can keep stale model names, scheduled jobs can miss their expected output, memory calls can slow down, and backups can look comforting until the first restore test fails.&lt;/p&gt;

&lt;p&gt;I treat those failures as normal infrastructure behavior because a control plane becomes trustworthy only after you can see whether its dependencies are still healthy.&lt;/p&gt;

&lt;p&gt;That lesson showed up during my &lt;a href="https://vibestacklab.substack.com/p/hermes-is-the-ai-agent-openclaw-promised" rel="noopener noreferrer"&gt;OpenClaw to Hermes migration&lt;/a&gt;, even though the migration itself went smoothly. The first week felt better because Hermes followed instructions more closely, kept memory behavior cleaner, and made the gateway setup feel less stitched together.&lt;/p&gt;

&lt;p&gt;The first problems were small enough to ignore in the moment but specific enough to matter later. An imported publishing skill failed because its YAML header was malformed, one environment variable was missing from the runtime, and token usage climbed while memory ingestion ran behind the workflow I was paying attention to.&lt;/p&gt;

&lt;p&gt;None of those problems killed the setup, but each one pointed at the same operational truth: the model is only one layer inside a wider system. I stopped treating maintenance as an occasional chore once I realized a scheduled prompt could check those layers before the next failure stole an afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cron Prompts Beat Commands
&lt;/h2&gt;

&lt;p&gt;The earlier version of this routine had shell commands sprinkled through the article because that was how I checked my own server. Commands are useful when your environment matches mine, but they don't travel cleanly across Windows, Linux, Docker, hosted runners, local agents, and the custom glue every serious stack accumulates over time.&lt;/p&gt;

&lt;p&gt;The official &lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Hermes Agent docs&lt;/a&gt; are where I would start for setup details. This piece starts after setup, when the question changes from "Can Hermes run?" to "Can I trust this workflow tomorrow?"&lt;/p&gt;

&lt;p&gt;Prompts travel better because they describe the job instead of assuming the tool. A cron-backed agent can inspect logs, check timestamps, call a gateway, read a config file, compare recent output, or ask for approval using the tools available inside its own environment.&lt;/p&gt;

&lt;p&gt;If the maintenance prompt needs to reach outside Hermes, the same decision from &lt;a href="https://vibestacklab.substack.com/p/mcp-vs-cli-ai-agent-tools" rel="noopener noreferrer"&gt;When to Use MCPs, CLIs, or Your Own Tool&lt;/a&gt; applies here: use the smallest interface that can inspect the system cleanly without turning one check into a brittle integration project.&lt;/p&gt;

&lt;p&gt;A scheduled prompt still needs firm boundaries because a useful maintenance job names the layer being checked and asks for evidence before it reports confidence. The report should be readable at a glance, but the agent should refuse to delete, update, rotate, restore, or rewrite anything without approval.&lt;/p&gt;

&lt;p&gt;That boundary turns maintenance automation into a reporting system instead of a new source of damage. I want the agent to notice problems before I do while every irreversible action still comes back to me as a decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Maintenance Layers
&lt;/h2&gt;

&lt;p&gt;My Hermes maintenance routine uses three layers that map cleanly to the way the stack fails: updates, cleanup, and health checks. Those labels keep the job concrete enough for a scheduled agent to report on the system without turning the prompt into a vague request to "check Hermes."&lt;/p&gt;

&lt;p&gt;This is the operational side of &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;the $30 Hermes stack&lt;/a&gt;, because a cheaper and more flexible agent setup only stays useful if the layers around it keep working.&lt;/p&gt;

&lt;p&gt;The update layer asks whether something changed underneath the workflow while I was focused on using it. Providers rename models, preview routes become stale, plugins move, skills change formats, and memory backends update their APIs.&lt;/p&gt;

&lt;p&gt;The cleanup layer asks whether the stack has accumulated enough junk to start changing behavior. Logs grow, sessions pile up, cached files stick around, and memory keeps old context long after the project has moved on.&lt;/p&gt;

&lt;p&gt;The health-check layer answers the operational question before I start relying on the stack again. Before the workday starts, I want evidence that the gateway answers, the provider route works, scheduled jobs are producing output, and memory can retrieve a recent decision without timing out.&lt;/p&gt;

&lt;p&gt;The layers keep the routine small enough to survive a busy week without reducing the review to a shallow status ping. Maintenance disappears when it depends on a vague intention, while a scheduled job with named layers can keep running after the calendar gets crowded.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve46zhfg591kvv135khs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve46zhfg591kvv135khs.png" alt="Three maintenance layers diagram" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Daily Hermes Health Check Prompt
&lt;/h2&gt;

&lt;p&gt;The daily job should be boring enough that you can read it every morning without turning the start of the day into a debugging session. Its job is to tell you whether the stack is ready for work, then stop before it tries to repair anything.&lt;/p&gt;

&lt;p&gt;Use this as a read-only cron job near the start of the workday, then adapt the gateway name, job names, and project references to match your own setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a Hermes cron job called "Daily Stack Pulse" that runs every morning at 8:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal and web. Use this exact prompt body for the job:

---
Run a daily read-only Hermes stack pulse check. Make no changes: do not delete files, rotate keys, update packages, prune memory, restore backups, or rewrite configuration.

1. Gateway. Send or simulate one normal request through the Telegram gateway and confirm it responds.
2. Scheduled workflows. Run `hermes cron list` and inspect ~/.hermes/cron/output/ for the latest runs of jobs tagged or named for morning briefing, health summary, memory maintenance, publishing, client, or paid workflows. Confirm each ran inside its expected window.
3. Logs. Scan recent warnings and errors from the Hermes runner (~/.hermes/logs/), the model provider, the memory layer (hindsight), the gateway, and the scheduler.
4. Memory recall. Run one hindsight_recall query against an active project decision (use "All Agents Considered newsletter" or "Vibe Stack Lab library repo"). Report whether the result was relevant, stale, missing, or slow.

Return a short report with exactly these sections, one sentence per item:

PASS:
Healthy checks with evidence.

WARN:
Items needing attention later, with the layer named in parentheses.

FAIL:
Broken or missing items that block reliance on the stack today.

APPROVAL NEEDED:
Any action that would delete, update, rotate, restore, rewrite, prune, or change provider behavior. Name the action and layer. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The report matters more than the scheduler that happens to run it, as long as the result gives you enough evidence to trust or question the stack. You can run the prompt from cron, a recurring Hermes task, a hosted automation, a CI runner, or any agent runner that has permission to inspect the stack.&lt;/p&gt;

&lt;p&gt;I care most about evidence that the gateway answered, the important jobs ran, memory recall still works, and recent errors haven't turned into a pattern. Once the report names the failed layer, the next step becomes smaller because the investigation has a place to start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Weekly AI Agent Drift Review Prompt
&lt;/h2&gt;

&lt;p&gt;My quiet cron failure is the reason I care more about weekly drift than a one-time setup checklist. A job definition sitting in a scheduler proved nothing once the Friday health-summary report stopped reaching Telegram.&lt;/p&gt;

&lt;p&gt;That is the same reason my &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;morning Hermes workflow&lt;/a&gt; checks visible output instead of trusting that a scheduled task exists somewhere in a config file.&lt;/p&gt;

&lt;p&gt;The weekly review looks for slow changes that don't announce themselves while normal work still appears to be moving. Disk pressure, stale output, growing logs, slow memory, and old model IDs rarely feel urgent while they are accumulating, but they become expensive once they pile up inside a broken workflow.&lt;/p&gt;

&lt;p&gt;Use this prompt near the end of the week, when the report can shape a short maintenance pass instead of interrupting deep work in the middle of a day.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a Hermes cron job called "Weekly Drift Review" that runs every Sunday at 9:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal and web. Use this exact prompt body for the job:

---
Run a weekly read-only Hermes drift review. Make no changes. If a fix is obvious, list it under RECOMMENDED ACTIONS or APPROVAL NEEDED but do not execute.

1. Storage growth. Measure size of ~/.hermes/logs/, ~/.hermes/sessions/, ~/.hermes/cache/, ~/.hermes/memory/, ~/.hermes/cron/output/, /tmp/hermes*, and any backup folder under ~/.hermes/. Compare to last week if a snapshot exists at ~/.hermes/cron/output/drift-snapshot.json. Save a fresh snapshot at that path after measuring. Flag any folder that grew more than 25 percent or crossed 1GB.

2. Scheduled jobs. Run `hermes cron list`. For each job, confirm it exists, has run inside its expected window, and produced a visible artifact in ~/.hermes/cron/output/ or the delivery channel. A job definition with no recent run counts as broken.

3. Memory recall. Run three hindsight_recall queries: one active project ("All Agents Considered newsletter"), one older project ("Build It #2 AI Code Review Agent"), one recent decision ("Vibe Stack Lab library repo"). Report each as accurate, stale, empty, or slow.

4. Provider and model config. Read ~/.hermes/config.yaml. Flag preview or dated model names (anything with -preview, -beta, dated suffixes, or matching known-deprecated IDs), fallback routes pointing at old IDs, and project-level overrides under ~/.hermes/profiles/*/config.yaml that diverge from the main config without obvious reason.

5. Logs. Scan the last 7 days of ~/.hermes/logs/ for repeated errors, retry loops, auth failures, timeouts, and missing-env-var messages. Group by layer (runner, provider, memory, gateway, scheduler).

Return a report with exactly these sections:

DRIFT:
Storage growth and configuration drift observed this week.

BROKEN:
Jobs, routes, providers, memory calls, or gateways that failed and need repair. Name the layer.

STALE:
Model IDs, project configs, skills, outputs, or memory entries that look outdated.

RECOMMENDED ACTIONS:
Small proposed fixes. For each: action, risk (low/med/high), expected benefit, approval needed (yes/no).

APPROVAL NEEDED:
Anything that changes files, deletes data, updates Hermes, rotates keys, changes providers, prunes memory, restores backups, or edits scheduled jobs. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That weekly prompt would have caught my quiet cron failure earlier because a cron entry sitting in a file doesn't prove the workflow is alive. The agent has to find the last run, the last output, or the last expected message before it claims the job is healthy.&lt;/p&gt;

&lt;p&gt;The same weekly review helps with memory issues because recall drift often feels like model weakness from the outside. When retrieval returns stale or empty context, the report should call that a memory-layer problem before anyone starts blaming generation quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monthly Hermes Assumptions Review Prompt
&lt;/h2&gt;

&lt;p&gt;The monthly job checks whether the assumptions under the stack still hold after weeks of normal use. Provider behavior, model IDs, permissions, backups, and release notes deserve a slower review because mistakes in those layers can create bigger messes than a missed daily report.&lt;/p&gt;

&lt;p&gt;Run this one when you have enough time to read the report and decide what should change, because the monthly review is the one most likely to recommend actions that touch live state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a Hermes cron job called "Monthly Assumptions Review" that runs on the 1st of every month at 10:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal, web, and file. Use this exact prompt body for the job:

---
Run a monthly read-only Hermes assumptions review. Make no changes: do not update Hermes, change providers, rotate keys, restore backups, prune memory, delete files, rewrite configs, or edit scheduled jobs.

1. External change summary. Check for changes that could affect this stack in the last ~30 days:
   - Hermes Agent: `cd ~/.hermes/hermes-agent &amp;amp;&amp;amp; git log --since="30 days ago" --oneline` and check release notes
   - Plugins and skills: list anything in ~/.hermes/plugins/ and ~/.hermes/skills/ modified in the last 30 days
   - Provider changes: scan OpenRouter and opencode-go model lists for renamed, deprecated, or newly preview-flagged IDs that match anything in ~/.hermes/config.yaml
   - Gateway, memory backend (hindsight), scheduler, and backup tool changelogs if accessible
   Summarize only changes relevant to this stack.

2. Provider and model ID audit. Grep every config layer for model IDs:
   - Main: ~/.hermes/config.yaml
   - Profiles: ~/.hermes/profiles/*/config.yaml
   - Cron jobs: ~/.hermes/cron/jobs.json
   - Skills referencing models: search_files for "model:" or model IDs under ~/.hermes/skills/
   - Scripts under ~/.hermes/scripts/
   - Env files: ~/.hermes/.env and any *.env
   Flag preview IDs (-preview, -beta, dated suffixes), known-deprecated IDs, missing fallbacks, and defaults that conflict between layers.

3. Health sweep. Quick check across:
   - Gateway response (one Telegram round-trip)
   - Provider reachability (one ping each to configured providers)
   - Memory recall (hindsight_recall on an active project)
   - Scheduler activity (hermes cron list plus recent output)
   - Storage headroom (df -h on ~/.hermes/ partition)
   - Backup completion (most recent backup artifact timestamp and size)
   - Key availability (env vars and 1Password references exist, not the values)
   - Permissions (~/.hermes/ ownership and mode)

4. Restore test. Pick one non-sensitive backup artifact under ~/.hermes/backups/ or wherever backups land. Copy to /tmp/hermes-restore-test/, inspect contents, confirm it opens and matches expectations. Do not overwrite live files. Delete the temp copy after inspection.

5. Approval-gate review. List every workflow (cron job, skill, plugin, script) that can delete files, prune memory, rotate keys, change providers, restore backups, update Hermes, edit configs, or send messages outside this workspace. For each, confirm whether it requires explicit approval or runs automatically.

Return a report with exactly these sections:

ASSUMPTIONS STILL VALID:
Operational assumptions that still look safe.

ASSUMPTIONS TO RECHECK:
Provider, memory, gateway, scheduler, backup, or permission assumptions that may have drifted. Name the layer.

RESTORE TEST:
Artifact inspected, safe location used, and result.

PROPOSED CHANGES:
Each with reason, risk (low/med/high), rollback notes, approval status.

APPROVAL NEEDED:
Every action that would modify the stack or touch live data. Name the action and layer. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I review provider model IDs here instead of waiting for a stale preview route to break under load. A fallback route in an old project config can keep calling yesterday's model even after the main Hermes provider has moved to the stable ID.&lt;/p&gt;

&lt;p&gt;The Hindsight timeout became confusing because the symptom pointed at the wrong layer. Hermes felt slow, I blamed the model, and the retrieval path had already burned the time before generation started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fty1rrxm0rk53fjpioaz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fty1rrxm0rk53fjpioaz7.png" alt="Diagram showing how agent failures originate from different layers" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Approval Gate for Maintenance Jobs
&lt;/h2&gt;

&lt;p&gt;Every scheduled maintenance job should carry the same approval rule because the boundary gets easy to forget after the first few reports look useful. Read-only inspection can run freely, while destructive or identity-changing work still needs a human decision.&lt;/p&gt;

&lt;p&gt;If you haven't built that habit yet, start with &lt;a href="https://vibestacklab.substack.com/p/how-to-add-approval-gates-to-your" rel="noopener noreferrer"&gt;the approval gate setup&lt;/a&gt; before you let a maintenance prompt touch files, providers, keys, or backups.&lt;/p&gt;

&lt;p&gt;Add this block to the end of every maintenance prompt that runs on a schedule, especially if the agent has access to files, keys, backups, provider settings, or outbound channels.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Approval rule for this maintenance job:

You may observe, inspect, summarize, classify, and recommend without asking first.

You must ask for approval before any action that deletes files, prunes memory, rotates keys, changes providers, restores backups, updates Hermes, edits configuration, changes scheduled jobs, rewrites prompts, sends external messages, or changes permissions.

When approval is needed, return a proposal with the issue, suggested action, expected benefit, risk level, affected files or systems, rollback notes, and the exact command or tool call you want to run.

If the risk is unclear, classify the action as approval needed and wait.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That rule keeps the maintenance agent useful without letting it become a cleanup bot with too much confidence. The agent can prepare the decision, but I still want to make the decision when live state changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quiet Agent Failures
&lt;/h2&gt;

&lt;p&gt;The failures that cost time are small enough to miss and specific enough to blame on the wrong thing. My cron failure didn't crash the stack because it stopped doing work in a corner I wasn't watching.&lt;/p&gt;

&lt;p&gt;The model ID drift behaved differently because the main provider setup looked current while an older route still pointed somewhere stale. The visible symptom showed up as slower Hermes responses and memory behavior that looked worse than it was.&lt;/p&gt;

&lt;p&gt;The Hindsight timeout changed how I diagnose agent slowness in every workflow that depends on memory. When an AI tool slows down, I check the retrieval chain before I blame the model because the model may be downstream from the delay.&lt;/p&gt;

&lt;p&gt;Maintenance doesn't prevent every failure, but it reduces the time spent accusing the wrong layer. Once you can name whether the issue sits in routing, memory, scheduling, storage, backup, skills, or config, the repair becomes less mysterious.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Roll Out the Routine
&lt;/h2&gt;

&lt;p&gt;I would start with one weekly maintenance job before adding daily and monthly jobs. Weekly reporting is frequent enough to catch drift, and a month of reports gives you enough signal to decide whether the daily pulse is worth the extra noise.&lt;/p&gt;

&lt;p&gt;Once the weekly report proves useful, add the daily pulse for the pieces you depend on most. My daily set covers gateway response, scheduled job output, memory recall, and provider reachability because those failures change whether I can trust the stack that morning.&lt;/p&gt;

&lt;p&gt;The monthly review should stay slower and more deliberate because updates, provider IDs, backup restores, and permission gates need more attention than a quick morning report can give them.&lt;/p&gt;

&lt;p&gt;Your stack may use different names, but the shape should stay the same. The scheduled agent observes the stack, reports the failed layer, proposes small actions, and stops before touching anything that could create real damage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure Limits
&lt;/h2&gt;

&lt;p&gt;Maintenance won't make the stack perfect, and the prompts shouldn't pretend they can. Provider outages, weak retrieval, bad project context, poor model fit, and bad release notes can still turn into manual work.&lt;/p&gt;

&lt;p&gt;The routine also leaves approval gates in place for every action that changes live state. If Hermes wants to prune memory, change providers, delete logs, rotate keys, restore a backup, or update itself, I still want to approve that action before it touches anything real.&lt;/p&gt;

&lt;p&gt;That boundary keeps the routine useful because the agent can notice problems before I do, while every action that changes the system comes back as a proposal I can read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;Hermes feels like one system when it's working, but underneath it's a control plane sitting on top of models, memory, gateways, cron jobs, files, skills, providers, and backups. When one layer drifts, the whole experience gets worse even if the visible symptom looks like a slow model or a lazy agent.&lt;/p&gt;

&lt;p&gt;The maintenance loop keeps those layers visible through a daily pulse, a weekly drift review, and a monthly assumptions review. For most personal agent stacks, that rhythm is enough to know where to look when something breaks.&lt;/p&gt;

&lt;p&gt;Start with the weekly prompt and run it long enough to see whether the reports change your behavior. If the reports help you catch missed jobs, stale model IDs, slow memory, or backup gaps, add the daily pulse and monthly review around the same approval rule.&lt;/p&gt;

&lt;p&gt;The install guide gets Hermes running, and the maintenance loop is what keeps it worth trusting.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/my-hermes-ai-agent-maintenance-routine" rel="noopener noreferrer"&gt;My Hermes AI Agent Maintenance Routine For Maximum Reliability&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>maintenance</category>
      <category>cron</category>
      <category>reliability</category>
    </item>
    <item>
      <title>I Tested 6 AI Plans to Find What $5, $10 and $20 Get You</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 19 May 2026 14:10:55 +0000</pubDate>
      <link>https://dev.to/cucoleadan/i-tested-6-ai-plans-to-find-what-5-10-and-20-get-you-459k</link>
      <guid>https://dev.to/cucoleadan/i-tested-6-ai-plans-to-find-what-5-10-and-20-get-you-459k</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/i-tested-6-ai-plans-to-find-what" rel="noopener noreferrer"&gt;I Tested 6 AI Plans to Find What $5, $10 and $20 Get You&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A little while ago, I built a multi-step workflow in Hermes to generate a ten-page report that would get stronger each time it passed through the document. It checked the latest news, then read through Reddit threads, then cross-checked with X and also read through a bunch of internal documents.&lt;/p&gt;

&lt;p&gt;For most of the run, it worked the way I wanted, and Hermes kept moving the file forward while pulling in the context it needed and holding onto the thread of the job.&lt;/p&gt;

&lt;p&gt;By the time it reached the last stage, somewhere around the fourteenth tool call, it already had the material it needed and only had to stay coherent long enough to verify the details and write the final section cleanly into the file.&lt;/p&gt;

&lt;p&gt;Then it just stopped in the middle of the edit. It retried enough times to trigger a context reduction right when the report needed the fullest possible view of everything that had already happened. The fact that I had to step back in and rebuild the whole thread was extremely annoying and the reason why I decided to write this article.&lt;/p&gt;

&lt;p&gt;That was also the moment I started focusing on reliability rather than judging AI plans by the model menu.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7acmpk52gptsgwxlzc2r.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7acmpk52gptsgwxlzc2r.jpeg" alt="AI subscription pricing pages showing plan tiers at $5, $10, and $20 per month" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pricing pages encourage you to compare plans by the names they advertise, but Hermes forces a more practical question, which is whether a plan can carry real work through a messy session without handing it back to you halfway through.&lt;/p&gt;

&lt;p&gt;Once I started looking at plans that way, I cared a lot less about whether a subscription included a famous model and a lot more about whether Hermes could finish the work before my own attention became the most expensive part of the workflow.&lt;/p&gt;

&lt;p&gt;I have paid for enough AI accounts to know how misleading a low sticker price can be. A five-dollar plan stops feeling cheap the moment it burns an hour of focused work.&lt;/p&gt;

&lt;p&gt;Not to mention that most twenty-dollar plans might feel like they come with extra usage compared to their cheap alternatives, but that is not usually the case. Looking at you, Anthropic.&lt;/p&gt;

&lt;p&gt;That's the frame for this piece, because I rechecked the official pricing pages on May 19, 2026, and I want to show you these prices through an AI agent lens rather than focusing on their sales copy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this article:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Why model names and benchmark scores are the wrong way to judge an AI plan&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How one $5 plan became my daily driver after I fixed my routing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why the $10 tier is where most plans start to make real sense&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What the big brand names ($20 tier) actually limit once you push them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Where plans break mid-session and how cost per useful hour flips the math&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The exact stack I would buy today and which plans I would skip&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The One Test That Picks Winners
&lt;/h2&gt;

&lt;p&gt;Benchmarks tell you how a model performs in isolation, but Hermes shows you something much harder to fake, which is whether a plan stays useful once the session fills with tool calls, file reads, and the usual clutter that comes with trying to finish real work.&lt;/p&gt;

&lt;p&gt;My test now feels much simpler than any leaderboard, because all I really have to do is give Hermes one job from a normal week and watch how much of my own attention it gives back to me by the end.&lt;/p&gt;

&lt;p&gt;If Hermes gets to a result I can keep, the plan earns its place. If the session breaks, the model loses the thread, or I have to step back in for cleanup, the plan gets more expensive no matter how cheap the subscription looked when I bought it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmzlibypkqt9q1fbb477.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmzlibypkqt9q1fbb477.jpeg" alt="Hermes agent running a multi-step workflow with tool calls and file reads" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  $5: Where Most People Get It Wrong
&lt;/h2&gt;

&lt;p&gt;The five-dollar tier starts with &lt;strong&gt;OpenCode Go&lt;/strong&gt;, and it stands out immediately as it's the only subscription I found that gives you a real first month instead of a throwaway trial.&lt;/p&gt;

&lt;p&gt;Right now, OpenCode Go is $5 for the first month and $10 after that, and it works in Hermes by default, which matters because it feels like a provider route built for agents instead of a chat plan stretched into agent work after the fact.&lt;/p&gt;

&lt;p&gt;What changed my view of this plan is that it did not stay a cheap side route for long. It became my daily driver, even during the stretch when I was still paying for three subscriptions just to keep up with my usage.&lt;/p&gt;

&lt;p&gt;At the time, the real problem was not the plan itself but the way I was using it, because I kept pushing the same model through every kind of Hermes task and expecting it to behave well no matter what the work looked like.&lt;/p&gt;

&lt;p&gt;For a while I ran Qwen 3.6 Plus for almost everything, and that worked badly enough that I ended up compensating with more subscriptions instead of better routing.&lt;/p&gt;

&lt;p&gt;The setup only started to make sense once I matched the model to the job, with DeepSeek V4 Flash and V4 Pro taking most of the regular Hermes work while Gemini 3.1 Flash Lite via OpenRouter handled image analysis more cleanly than the routes I had been forcing before.&lt;/p&gt;

&lt;p&gt;OpenCode Go became much more useful once I stopped treating one model like a universal answer and started treating the plan like a &lt;a href="https://vibestacklab.substack.com/p/how-to-use-claude-code-for-free-with" rel="noopener noreferrer"&gt;routing layer&lt;/a&gt; for different kinds of work.&lt;/p&gt;

&lt;p&gt;I still think the five-dollar month is the right place to learn this lesson, since it is cheap enough to experiment with and real enough to show you very quickly whether your workflow is efficient or just patched together.&lt;/p&gt;

&lt;h2&gt;
  
  
  $10: The Real Starting Line
&lt;/h2&gt;

&lt;p&gt;The $10 tier is where most of these plans start to feel normal, since the $5 and sub-$5 options are mostly gone now outside of special promos.&lt;/p&gt;

&lt;p&gt;That is also the first tier I would take seriously for regular Hermes use.&lt;/p&gt;

&lt;p&gt;After the first month, OpenCode Go lands here at its regular price, and &lt;strong&gt;MiniMax Token Plan Starter&lt;/strong&gt; shows up at the same $10 with 1,500 M2.7 requests every 5 hours.&lt;/p&gt;

&lt;p&gt;On paper, that sounds like a clean comparison. In practice, I care much less about the headline limits and much more about what the workflow feels like once Hermes is doing the work.&lt;/p&gt;

&lt;p&gt;MiniMax Starter gives you a dedicated M2.7 bucket, which is useful if you already know that model is good enough for most of your week and you want limits that are easy to reason about.&lt;/p&gt;

&lt;p&gt;OpenCode Go works differently, since it gives you a shared routing budget across several model families, and that can look better or worse depending on what kind of week you're having.&lt;/p&gt;

&lt;p&gt;If you mostly run MiniMax M2.7 through Go, the published estimates are higher at around 3,400 M2.7 requests every 5 hours for the same monthly price, so it can look cheaper than MiniMax Starter on raw throughput alone.&lt;/p&gt;

&lt;p&gt;Still, that is not what would decide it for me.&lt;/p&gt;

&lt;p&gt;I would judge the whole tier by loop quality more than by the model list or benchmarks. Sometimes I hit 503 errors on Qwen 3.6 Plus through OpenCode Go, and other times the tokens per second I got through Go were clearly better than what I was getting from MiniMax directly. &lt;em&gt;And I absolutely hate it to wait for AI to answer. I'd rather have a faster model than a smarter model, but that's just personal preference.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What matters most to me is whether it keeps moving after the first answer, uses tools cleanly, and keeps its replies short enough that the session stays readable while the work is still in progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  $20: Brands You Know, Limits You Don't
&lt;/h2&gt;

&lt;p&gt;The $20 tier is where the familiar companies start showing up.&lt;/p&gt;

&lt;p&gt;OpenAI and Anthropic are the obvious ones, because they are the subscriptions most people already know. Ollama belongs in the same conversation for a different reason, as it's one of the few open-model companies that already feels big enough to sell a hosted plan without sounding like a side project.&lt;/p&gt;

&lt;p&gt;That matters because this tier is not only about extra usage. It is also about how much trust people attach to the company behind the plan, and whether that trust survives contact with the actual limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChatGPT Plus&lt;/strong&gt; is the default benchmark. OpenAI lists Plus at $20 per month, says it gives higher GPT-5.5 limits inside ChatGPT, and keeps API usage separate from the subscription.&lt;/p&gt;

&lt;p&gt;You can count Plus in the real stack because Hermes supports &lt;strong&gt;OpenAI Codex&lt;/strong&gt; through ChatGPT OAuth, but the plan still buys ChatGPT access rather than API credit. The limit story is also less generous than the branding makes it feel. OpenAI says Plus users can send up to 160 GPT-5.5 messages every 3 hours, and manual GPT-5.5 Thinking has a weekly limit of up to 3,000 messages. That is fine for normal chat use. It starts looking smaller once you lean on it harder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Pro&lt;/strong&gt; has the same advantage and the same problem. Anthropic is a big enough name that people do not need much convincing to try the plan, and Claude is useful enough that plenty of people will keep paying for it anyway. The issue is that the limits are nowhere close to generous for heavy use.&lt;/p&gt;

&lt;p&gt;It's just easy to run into the ceiling faster than the $20 price tag suggests, especially once you lean on Sonnet for real work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ollama Cloud Pro&lt;/strong&gt; is more interesting to me because it is not trying to be ChatGPT or Claude. Ollama lists Pro at $20 per month or $200 per year, with larger cloud models, 50x more cloud usage than Free, and three concurrent cloud models.&lt;/p&gt;

&lt;p&gt;That sounds strong until you compare how the limit story is presented next to &lt;strong&gt;OpenCode Go&lt;/strong&gt;. OpenCode Go tells you the five-hour, weekly, and monthly caps directly, including a monthly ceiling of $60. Ollama tells you usage is mostly GPU time, gives you five-hour and weekly resets, and lets you run three cloud models at once, but it does not spell out a monthly limit on the pricing page. That makes the plan harder to reason about.&lt;/p&gt;

&lt;p&gt;The three-model ceiling also matters more in Hermes than it would in a normal chat app. If you mostly run one agent at a time, it probably feels fine. If you like concurrent agents, background runs, or separate research and writing loops happening together, three can start feeling smaller than the headline suggests.&lt;/p&gt;

&lt;p&gt;So yes, Ollama Pro looks good. It is just not automatically better than Go once you care about legibility, concurrency, and what the plan looks like &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;over a full month&lt;/a&gt; instead of over a good afternoon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nous Portal Plus&lt;/strong&gt; is less mainstream than OpenAI, Anthropic, or Ollama, but it still deserves the slot because it fits Hermes more naturally than most of the bigger brands. Nous lists Plus at $20 per month with 300+ models, hosted tool usage, and $22 in monthly credits with rollover. I felt that I should include this because they are the team who created Hermes after all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MiniMax Token Plan Plus&lt;/strong&gt; is still the simplest volume play. MiniMax lists Plus at $20 per month with 4,500 M2.7 requests every 5 hours plus speech and image quotas. If M2.7 already works for your Hermes load, that is a very direct way to buy more room.&lt;/p&gt;

&lt;p&gt;Those are not the same thing, and the difference only shows up once Hermes starts leaning on the plan instead of just chatting through it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Plans Hit the Wall
&lt;/h2&gt;

&lt;p&gt;Hermes exposes plan limits in the middle of real work instead of at the edge of a chat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbdlfhirfc382ttgoksk.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbdlfhirfc382ttgoksk.jpeg" alt="AI plan limit reached notification during an active agent session" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A chat cap is annoying when you are asking questions. The same cap inside Hermes can land in the middle of a file edit, a research loop, or a tool run that was finally starting to cohere. Then you lose more than a reply. You lose the state of the job and pay for it again in the next session.&lt;/p&gt;

&lt;p&gt;Fallback models create a quieter version of the same mess. A session starts on one route and ends on another, and you can feel it even before you check the model picker. Instruction following gets softer. The agent stops being careful with the same tool path it was following ten minutes earlier.&lt;/p&gt;

&lt;p&gt;Tool use is still the cleanest divider for me. A model can sound impressive in a chat window and still be weak inside an agent loop. If it avoids reading files, skips verification, or acts allergic to tools, I do not care how good the brand or benchmark looks. The less glamorous route that checks its work often finishes more jobs per dollar.&lt;/p&gt;

&lt;p&gt;Memory changes the value of a plan too. Hermes only starts to feel useful once it can carry a project forward across sessions. If the provider leaves you with a morning reset, the agent never really joins the work. It just keeps reintroducing itself.&lt;/p&gt;

&lt;p&gt;That is also why the &lt;a href="https://vibestacklab.substack.com/p/hermes-is-the-ai-agent-openclaw-promised" rel="noopener noreferrer"&gt;OpenClaw to Hermes migration&lt;/a&gt; mattered so much to me. I was not looking for a smarter chat app. I wanted something that could keep the work moving without making me rebuild the thread every time.&lt;/p&gt;

&lt;p&gt;Latency has its own cost. A slow model is fine for overnight cleanup or background chores. It gets expensive the moment you are thinking with the agent in real time and waiting for the next useful move.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Only Math That Matters
&lt;/h2&gt;

&lt;p&gt;The metric I keep coming back to is cost per useful Hermes hour.&lt;/p&gt;

&lt;p&gt;I like it because it is boring enough to be honest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cost per useful hour = monthly plan cost / Hermes hours that ended in usable work
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a $5 plan gives you ten clean background hours, it is excellent.&lt;/p&gt;

&lt;p&gt;If that same plan burns one focused afternoon because Hermes stalls in the fragile part of the job, the cheap price was fake.&lt;/p&gt;

&lt;p&gt;A $20 plan can still be the cheaper one if it finishes the sessions you would otherwise have to rescue.&lt;/p&gt;

&lt;p&gt;I would not build a dashboard for this. &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;One line in your notes&lt;/a&gt; after each session is enough. Write down the plan, the job, and whether Hermes finished without babysitting.&lt;/p&gt;

&lt;p&gt;After a week, the pattern usually gets obvious. OpenCode Go might end up doing the background work. MiniMax might carry more of the daily load than you expected. Nous might keep its place because the tool gateway removes setup friction. Ollama might stay as the open-model cloud route. ChatGPT and Claude might remain in the stack because they are still where you think best before sending the work back into Hermes.&lt;/p&gt;

&lt;p&gt;That is enough to make the decision. The goal is to stop paying for subscriptions without knowing what job each one is there to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here Is What I Would Buy
&lt;/h2&gt;

&lt;p&gt;If I were rebuilding this stack today, I would still start with OpenCode Go and give it the boring work first.&lt;/p&gt;

&lt;p&gt;That is the cheapest place to learn whether the workflow is efficient or just being propped up by extra subscriptions.&lt;/p&gt;

&lt;p&gt;I would keep fragile sessions away from it until it earned trust. Cleanup, first-pass research, low-risk drafts, and the kind of work that is useful when it lands but not painful if it misfires.&lt;/p&gt;

&lt;p&gt;Once the first month ended, I would treat the $10 tier like the real test. OpenCode Go at full price and MiniMax Starter both deserve a normal week before I let a $20 brand into the stack on reputation alone.&lt;/p&gt;

&lt;p&gt;After that, I would only pay for a $20 plan if I knew exactly &lt;a href="https://vibestacklab.substack.com/p/how-to-add-approval-gates-to-your" rel="noopener noreferrer"&gt;why it was there&lt;/a&gt;. ChatGPT Plus belongs if the ChatGPT or Codex lane matters enough to keep. Claude Pro belongs if Claude is still where the best writing or dev work happens, even with the limits. Nous sits closest to native Hermes work. Ollama Pro belongs if I want the open-model cloud lane and can live with the three-model ceiling. MiniMax Plus is the straightforward volume upgrade if M2.7 is already carrying real work.&lt;/p&gt;

&lt;p&gt;That is less satisfying than picking one winner. It is also closer to how the work behaves.&lt;/p&gt;

&lt;p&gt;Different jobs deserve different routes. Background chores do not need the same plan as the sessions where one bad restart can waste half an afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;The cheapest AI plan is the one that gives Hermes work you would keep.&lt;/p&gt;

&lt;p&gt;A $5 route is great when it clears background noise. A $10 route is where I would test daily Hermes usage. A $20 route only earns its place when it gives you something the cheaper paths do not, whether that is better fit, clearer limits, or a route you trust enough to use for harder work.&lt;/p&gt;

&lt;p&gt;The wrong plan steals focus at any price.&lt;/p&gt;

&lt;p&gt;Before you buy another subscription, look at your last ten Hermes sessions. Mark the ones that ended in usable work. Mark the ones you had to rescue. Then ask which plan helped the work move forward and which one only looked cheap on the invoice.&lt;/p&gt;

&lt;p&gt;That becomes the buying decision.&lt;/p&gt;

&lt;p&gt;I would rather pay for one route that finishes the work than keep juggling three subscriptions that still need me to manage them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Source Notes
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://opencode.ai/go" rel="noopener noreferrer"&gt;OpenCode Go&lt;/a&gt; lists the $5 first month and the $10 monthly price after that. The page also covers any-agent use and current request allowances.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://help.openai.com/en/articles/6950777-chatgpt-plus-" rel="noopener noreferrer"&gt;ChatGPT Plus&lt;/a&gt; lists $20 per month, app-level Plus benefits, and the note that API usage is billed separately. &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI API pricing&lt;/a&gt; lists GPT-5.5 and GPT-5.4 token pricing outside ChatGPT subscriptions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://ollama.com/pricing" rel="noopener noreferrer"&gt;Ollama Cloud pricing&lt;/a&gt; lists Pro at $20 per month or $200 per year. The same page covers three concurrent cloud models and usage measurement based mainly on GPU time.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://portal.nousresearch.com/manage-subscription" rel="noopener noreferrer"&gt;Nous Portal&lt;/a&gt; lists Plus at $20 per month with 300+ models and hosted tool usage. It also lists the $22 monthly credits and rollover rules.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://support.claude.com/en/articles/8325606-what-is-the-pro-plan" rel="noopener noreferrer"&gt;Claude Pro&lt;/a&gt; lists Pro usage behavior and resets, while &lt;a href="https://platform.claude.com/docs/en/about-claude/pricing" rel="noopener noreferrer"&gt;Anthropic API pricing&lt;/a&gt; lists Claude API prices separately from Pro.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://platform.minimax.io/docs/guides/pricing-token-plan" rel="noopener noreferrer"&gt;MiniMax Token Plan&lt;/a&gt; lists Starter at $10 per month with 1500 M2.7 requests per 5 hours and Plus at $20 per month with 4500 M2.7 requests per 5 hours.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/integrations/providers" rel="noopener noreferrer"&gt;Hermes AI Providers&lt;/a&gt; lists the relevant provider paths for Nous Portal and OpenAI Codex. It also covers OpenCode Go and Anthropic.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>plans</category>
      <category>hermes</category>
      <category>stack</category>
    </item>
    <item>
      <title>When to Use MCPs, CLIs, or Your Own Tool</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 12 May 2026 13:17:53 +0000</pubDate>
      <link>https://dev.to/cucoleadan/when-to-use-mcps-clis-or-your-own-tool-1hdg</link>
      <guid>https://dev.to/cucoleadan/when-to-use-mcps-clis-or-your-own-tool-1hdg</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/when-to-use-mcps-clis-or-your-own" rel="noopener noreferrer"&gt;When to Use MCPs, CLIs, or Your Own Tool&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A while back, I wanted my AI agent to help manage my Asana tasks. Like anyone following the current agent meta, my first instinct was to plug in an Asana MCP server. Of course, this either flat-out broke or took an eternity to load a single task because the agent was trying to digest a massive, complicated integration.&lt;/p&gt;

&lt;p&gt;Frustrated, I ripped the MCP out and installed a lightweight Asana CLI instead. It took a little bit of setup, but it worked. I took it one step further and created a custom skill teaching my agent exactly how to trigger those specific CLI commands. Checking my tasks went from a sluggish, bloated mess to happening instantly. I detailed this setup in my &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;morning automation guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That experience explains why the default advice in agent-land right now, to connect every integration you can find and sort it out later, is a trap.&lt;/p&gt;

&lt;p&gt;I get why people do it. Plug-and-play tools are everywhere right now. Every week another company ships one, another app exposes itself to AI, and another setup thread turns into a shopping list. An agent with more tools feels more capable the same way a dashboard with more widgets feels more complete.&lt;/p&gt;

&lt;p&gt;The friction starts soon after. You notice the agent taking longer to think because it's trying to juggle too many complex instructions at once. Simple tasks start driving up your token costs. One tool fails with a timeout, another dumps a wall of messy data when you only needed a single sentence, and eventually, you lose track of what your own setup can do on its own.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48yvn40hon3xds807b3j.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48yvn40hon3xds807b3j.jpeg" alt="A technical illustration of a decision framework comparing CLI, MCP, and custom tools for AI agents." width="799" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my Hermes setup, I rely on three distinct patterns. A lightweight &lt;strong&gt;GitHub CLI&lt;/strong&gt; handles my repository work because it's fast and focused. The &lt;strong&gt;Brave Search MCP&lt;/strong&gt; handles broad web research. My custom &lt;strong&gt;OpenCode Cowork Proxy Worker&lt;/strong&gt; exists because neither an off-the-shelf integration nor a basic command line was the right fit for routing Claude through OpenCode models.&lt;/p&gt;

&lt;p&gt;There's a fine line between over-integrating and building everything yourself. I touched on this balance in my &lt;a href="https://vibestacklab.substack.com/p/the-build-vs-buy-scorecard" rel="noopener noreferrer"&gt;build vs buy scorecard&lt;/a&gt;. How do you know which type of tool fits which job? Read on to see how to decide.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; When deciding how to connect your AI agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use CLIs&lt;/strong&gt; for local, internal tasks where speed matters and you own the credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use MCPs&lt;/strong&gt; to cross boundaries into external SaaS systems where structured data and secure auth are required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build Custom Wrappers&lt;/strong&gt; when you need translation, formatting, or a narrower interface than what off-the-shelf tools provide.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  In this edition:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Why MCP vs CLI is the wrong argument&lt;/li&gt;
&lt;li&gt;When a CLI is the better interface for Hermes&lt;/li&gt;
&lt;li&gt;When an MCP server earns its place&lt;/li&gt;
&lt;li&gt;When your own small tool beats both&lt;/li&gt;
&lt;li&gt;The 60-second test I use before adding a new tool&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  MCP vs CLI: Asking a Better Question
&lt;/h2&gt;

&lt;p&gt;Most MCP vs CLI arguments sound cleaner than the real problem. People talk about protocols, tokens, and elegance. When you are in the middle of actual work you are usually trying to answer a simpler question. You want to know the least messy way to let your agent do this one job.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;MCP server&lt;/strong&gt; gives an AI app a standard way to discover and call external tools. It exposes actions, inputs, and outputs in a format the model-facing app understands. &lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;Anthropic introduced MCP&lt;/a&gt; in November 2024 as an open standard for connecting AI assistants to data sources, business tools, content repositories, and developer environments.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;CLI&lt;/strong&gt; gives the agent the same command-line tool a human developer would use. Think &lt;code&gt;git&lt;/code&gt;, &lt;code&gt;gh&lt;/code&gt;, &lt;code&gt;docker&lt;/code&gt;, &lt;code&gt;kubectl&lt;/code&gt;, &lt;code&gt;wrangler&lt;/code&gt;, &lt;code&gt;gws&lt;/code&gt;, or a tiny script you wrote for your own stack. The model writes commands, reads stdout or stderr, and adjusts from there.&lt;/p&gt;

&lt;p&gt;Both let an agent act, but they package control differently. MCP gives the agent a typed menu of actions with structured inputs. CLI gives the agent a terminal surface with familiar commands and visible output.&lt;/p&gt;

&lt;p&gt;The filter I use is simpler than the debate. Look at where the work happens, who owns the data, and what breaks when the agent gets it wrong. Use a &lt;strong&gt;CLI&lt;/strong&gt; when the agent works as you inside your own workspace. Use an &lt;strong&gt;MCP server&lt;/strong&gt; when the agent needs structured access to external systems or authenticated data. Build &lt;strong&gt;your own tool&lt;/strong&gt; when MCP is too broad, CLI is too loose, or the workflow needs a narrow bridge between two systems.&lt;/p&gt;

&lt;p&gt;The custom option matters more than people admit because many agent problems are shape problems, not model problems. A full protocol server or open shell gives the workflow too much room to drift. One small action with the right inputs, rejection rules, and clean output often fits better.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Start With CLI for Local AI Workflows
&lt;/h2&gt;

&lt;p&gt;I start with CLI far more often than MCP. If Hermes works inside my own environment, on local files and repos, build commands, deployment checks, server diagnostics, GitHub tasks, or small scripts where I already know the command, the terminal is usually the right first stop.&lt;/p&gt;

&lt;p&gt;A command-line tool has a structural advantage here. Most frontier models have years of examples for common command-line patterns. They know how &lt;code&gt;git status&lt;/code&gt; behaves, how &lt;code&gt;gh pr list --json&lt;/code&gt; returns fields, and how to trim output before the context window fills up.&lt;/p&gt;

&lt;p&gt;Local work becomes easier to debug. When a CLI command fails, Hermes gets an exit code and an error message. I rerun the same command myself, copy it into a terminal, and see the failure without translating through a protocol layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.scalekit.com/blog/mcp-vs-cli-use" rel="noopener noreferrer"&gt;Scalekit ran a useful benchmark&lt;/a&gt; on this in March 2026. They compared CLI, CLI plus skills, and GitHub's MCP server across 75 runs using the same model and the same GitHub tasks. In their test, CLI won on cost and reliability. CLI hit 100 percent reliability, MCP completed 72 percent of runs, and MCP used 4 to 32 times more tokens depending on the task.&lt;/p&gt;

&lt;p&gt;I wouldn't stretch that benchmark into a universal law. It tells us something narrower and more actionable: schema weight is real. If the agent connects to a GitHub MCP server with dozens of available tools, it carries descriptions for actions it will never touch during a repo language lookup. A local &lt;code&gt;gh&lt;/code&gt; command gives the answer with less ceremony.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.stackone.com/blog/mcp-vs-cli-for-ai-agents/" rel="noopener noreferrer"&gt;StackOne makes the same split&lt;/a&gt; from an architecture angle. CLI fits local developer tools like Git, Docker, &lt;code&gt;gh&lt;/code&gt;, Terraform, &lt;code&gt;kubectl&lt;/code&gt;, and AWS CLI because these tools already have mature command cultures around them. The agent reuses patterns baked into the model and the docs instead of learning a strange new interface from scratch.&lt;/p&gt;

&lt;p&gt;My Hermes setup leans on CLI for repo work. If I ask Hermes to clean up a branch, summarize open PRs, or check the status of a deploy, I want it using tools I run myself. I use &lt;code&gt;gh&lt;/code&gt; for GitHub, &lt;code&gt;wrangler&lt;/code&gt; for Cloudflare Workers, and &lt;code&gt;gws&lt;/code&gt; for narrow Google Workspace experiments.&lt;/p&gt;

&lt;p&gt;The trade-off is permission shape. Most CLI tools inherit local credentials. Hermes using &lt;code&gt;gh&lt;/code&gt; after &lt;code&gt;gh auth login&lt;/code&gt; acts with my GitHub access. That works for my own repo on my own machine, then breaks fast once a product needs to act across other users, accounts, or shared business systems.&lt;/p&gt;

&lt;p&gt;One user on one machine inside one workspace is CLI territory. Many users across many accounts turns CLI into complex auth plumbing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfmm04e1ukdznalg5gy2.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfmm04e1ukdznalg5gy2.jpeg" alt="An architectural diagram comparing CLI local access versus MCP networked access for AI agents." width="799" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Reach For MCP At The Boundary of External Data
&lt;/h2&gt;

&lt;p&gt;I don't reach for MCP first. I reach for it when the data lives somewhere else and I want Hermes to touch it without wandering around with raw shell access.&lt;/p&gt;

&lt;p&gt;Search tools, shared SaaS systems, business databases, internal APIs, and third-party services need scoped auth, audit logs, and structured actions more than terminal speed.&lt;/p&gt;

&lt;p&gt;A command-line tool assumes a person already logged in. That person owns the machine, the credentials, and the risk. An MCP server exposes a narrower set of actions to the agent with defined inputs and outputs. It gives the agent a tool boundary instead of raw shell access.&lt;/p&gt;

&lt;p&gt;Anthropic's original MCP pitch makes sense through that lens. Every agent stack eventually hits the same wall: the model works well, but the data lives outside its reach. MCP gives AI systems a standard way to connect to those data sources without every app inventing its own format.&lt;/p&gt;

&lt;p&gt;I use &lt;strong&gt;Brave Search via MCP&lt;/strong&gt; because research is external, variable, and structured. I want Hermes calling a defined search action with a defined result format instead of guessing URLs or scraping pages with shell commands.&lt;/p&gt;

&lt;p&gt;SaaS tools often fit the same pattern. If Hermes needs to read from Notion, Gmail, Linear, Slack, Greenhouse, or a database with scoped access, MCP is cleaner than a homemade CLI script. The farther the workflow moves from your own machine, the more identity matters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.descope.com/blog/post/mcp-vs-cli" rel="noopener noreferrer"&gt;Descope puts the identity question well&lt;/a&gt;: choose based on who the agent works for. If the agent acts as a solo developer inside their own workflow, CLI is enough. If the agent acts across customer data, employee accounts, partner systems, or shared business tools, auth becomes the primary concern.&lt;/p&gt;

&lt;p&gt;At that point, you care about scopes, consent, logs, tenant boundaries, and revocation. One ambient shell token doing everything in the background becomes a liability, even if it feels faster during local tests.&lt;/p&gt;

&lt;p&gt;Every MCP server still has to earn its place. A bloated server slows the agent down, a badly designed one returns excess data, and a broad action list hands the agent more control than the task needs. The best MCP servers feel boring: a small list of tools, clear input fields, tight output, and an auth model that matches the risk.&lt;/p&gt;

&lt;p&gt;You want a clean tool drawer, not a giant toy box.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build The Bridge Yourself
&lt;/h2&gt;

&lt;p&gt;This sounds like extra work at first. Then you try to force a bad fit through MCP or CLI for hours and realize the small custom tool would've been the simpler path all along.&lt;/p&gt;

&lt;p&gt;By small tool, I mean a tiny adapter, wrapper, Worker, script, webhook, or endpoint that does one job in the exact shape your workflow needs.&lt;/p&gt;

&lt;p&gt;I used this lane for my OpenCode Cowork Proxy Worker. Claude Code speaks Anthropic's API format. OpenCode Go and Zen models mostly use OpenAI-compatible routes. I wanted Claude Code and Claude Cowork as the interface, with OpenCode as the model layer. A generic MCP server or raw CLI would've made the flow messier.&lt;/p&gt;

&lt;p&gt;The workflow lacked translation, so I built a Cloudflare Worker that sits in the middle. Claude sends an Anthropic-style request. The Worker rewrites it for OpenCode. The response returns in the format Claude expects. That is a custom tool doing its job by removing ambiguity.&lt;/p&gt;

&lt;p&gt;I wrote the full setup in &lt;a href="https://vibestacklab.substack.com/p/how-to-use-claude-code-for-free-with" rel="noopener noreferrer"&gt;How to Use Claude Code For Free With OpenCode Models&lt;/a&gt;. For this article, the decision matters more than the proxy details. When the workflow needs a translation layer, build the translation layer.&lt;/p&gt;

&lt;p&gt;Safer wrappers around risky commands follow the same pattern. Say Hermes needs to deploy a project. One path gives it raw shell access and asks it to remember the right sequence. The better path gives it one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;deploy-preview &lt;span class="nt"&gt;--project&lt;/span&gt; yahini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command runs checks, prints the diff, refuses production deploys without a flag, and outputs a clear summary. Hermes gets one safe action instead of an open-ended terminal adventure.&lt;/p&gt;

&lt;p&gt;Task tools work the same way. My Asana setup has three possible shapes: raw API calls, MCP, or a small wrapper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;asana-task create &lt;span class="nt"&gt;--project&lt;/span&gt; hermes &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Research MCP auth tradeoffs"&lt;/span&gt; &lt;span class="nt"&gt;--due&lt;/span&gt; tomorrow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That wrapper hides the noisy parts. Hermes gets the project, title, and due date without carrying the project GID, JSON payload shape, or field rules in every prompt. The tool encodes the boring decisions once.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz80ajsdanu8hqyuuw6zb.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz80ajsdanu8hqyuuw6zb.jpeg" alt="An engineering diagram showing an AI agent workflow using custom tools and approval gates." width="799" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Custom tools pay off when the existing interface adds ambiguity. A translator fixes format mismatch, a filter trims redundant output, and a validator stops bad inputs before they reach the real system. The common thread is narrower access to the machinery underneath.&lt;/p&gt;

&lt;p&gt;Approval gates fit on top of this lane. Your custom tool prepares a draft, validates inputs, or creates a preview. The final send, publish, delete, deploy, or purchase still pauses for review. I covered the safety layer in &lt;a href="https://vibestacklab.substack.com/p/how-to-add-approval-gates-to-your" rel="noopener noreferrer"&gt;How to Add Approval Gates to Your Hermes Agent&lt;/a&gt;, and it pairs well with custom tools because the interface and approval rule solve different problems.&lt;/p&gt;

&lt;p&gt;A custom tool gives Hermes the right action. An approval gate decides whether Hermes gets to complete it alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rule I Use
&lt;/h2&gt;

&lt;p&gt;Before giving Hermes a new way to act, I sort the workflow into one of three lanes. The categories are plain enough to use while building, which matters more to me than making the taxonomy perfect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use CLI for local work.&lt;/strong&gt; Choose CLI when the tool is mature, the docs are everywhere, the output is controllable, and Hermes is acting inside your own environment. Good fits include GitHub PR summaries through &lt;code&gt;gh&lt;/code&gt;, Cloudflare Worker deploy checks through &lt;code&gt;wrangler&lt;/code&gt;, local file operations, build commands, server diagnostics, and one-off scripts. If I would run the command myself in a terminal, and the worst mistake affects my own workspace, CLI is the starting point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use MCP for structured external systems.&lt;/strong&gt; Choose MCP when Hermes needs a defined tool boundary, scoped auth, a remote data source, or runtime tool discovery. Search, Google Drive, Slack, Gmail, CRM data, ATS data, internal databases, and shared business tools fit this lane when permissions and structure matter. If the agent touches data outside your own local workspace, MCP deserves a look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build your own tool when the job is narrow.&lt;/strong&gt; Choose a custom tool when the problem is translation, filtering, validation, or repeatability. API format translation, safer deploy wrappers, task helpers, memory update commands, webhook receivers, cronjob helpers, and scripts compressing risky command chains into one reviewed action fit here. If you keep writing long prompts to make the agent use a tool in the same careful way, that prompt wants to become a tool.&lt;/p&gt;

&lt;p&gt;Concrete Hermes examples make the rule easier to apply. &lt;strong&gt;GitHub PR cleanup&lt;/strong&gt; goes through CLI because &lt;code&gt;gh&lt;/code&gt; is mature and easy to inspect. &lt;strong&gt;Competitive research&lt;/strong&gt; goes through MCP because search needs structured external results. &lt;strong&gt;Morning briefings&lt;/strong&gt; use connectors or MCP for sources like Gmail and calendars, then a prompt or custom formatter turns those inputs into the briefing structure I covered in &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;my Hermes morning briefing article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For higher-risk work, I mix lanes. &lt;strong&gt;Production deploys&lt;/strong&gt; should use CLI wrapped in a custom command, plus an approval gate before production. &lt;strong&gt;Claude Code to OpenCode routing&lt;/strong&gt; belongs in a custom Worker. &lt;strong&gt;Project memory updates from research&lt;/strong&gt; should use a custom command or proposed-change format, then pause for review before permanent memory changes.&lt;/p&gt;

&lt;p&gt;That last one matters because wrong memory is worse than no memory. If Hermes reads a weak article and updates project memory with a sloppy summary, I pay for that mistake later. A custom "propose memory update" tool is safer than letting the agent edit memory directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  60-Second Tool Test
&lt;/h2&gt;

&lt;p&gt;Before adding a new MCP server or writing a wrapper, run this test. It takes about a minute, and it saves an afternoon of cleanup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Check for a mature CLI.&lt;/strong&gt; If the tool has a strong CLI, structured output flags, and common examples in the docs, start there. The agent gets a smaller surface to reason through, and you get commands worth replaying.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Check whether the agent acts only as you.&lt;/strong&gt; If Hermes works inside your own machine, repo, or server, CLI works well. Slow down when the workflow crosses into shared systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Check whether auth shape matters.&lt;/strong&gt; MCP moves up the list when you need scopes, consent, tenant boundaries, or audit logs. Local credentials are convenient until the agent needs to act inside a shared business system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Check whether the MCP is too broad.&lt;/strong&gt; If a server exposes fifty actions and your workflow needs two, consider a custom wrapper or filtered gateway. A smaller interface beats a bigger config when the task has a narrow shape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Check whether a small tool would remove repeated prompting.&lt;/strong&gt; If your instruction keeps repeating the same safety rules and formatting rules, build a tool that enforces the shape. Repeated prompting points to an interface problem.&lt;/p&gt;

&lt;p&gt;After the test, the answer usually sorts itself. CLI handles local work, MCP handles structured external systems, and your own tool handles narrow bridges, translations, and repeatable actions. Approval gates sit on top of all three when the action is expensive, destructive, external-facing, or hard to undo.&lt;/p&gt;

&lt;p&gt;This is the shift I wrote about in &lt;a href="https://vibestacklab.substack.com/p/the-agentic-engineering-shift" rel="noopener noreferrer"&gt;The Agentic Engineering Shift&lt;/a&gt;. The work is moving from asking the model better to designing the system around the model better. Tool choice is part of that system.&lt;/p&gt;

&lt;h2&gt;
  
  
  More Control, Less Clutter
&lt;/h2&gt;

&lt;p&gt;A well-designed agent stack earns trust when the agent knows what to do, shows what it did, and uses the smallest interface that fits the work. MCP hype tends to blur that distinction because installing another server feels like progress.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gmx1h5wopqdit11k356.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gmx1h5wopqdit11k356.jpeg" alt="A systems design diagram sorting AI tasks into CLI, MCP, and custom tool hierarchies." width="799" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MCP is useful. CLI is underrated. Your own small tools save you from both when the workflow has a shape neither one matches.&lt;/p&gt;

&lt;p&gt;Start by auditing one workflow. Pick the last task you gave Hermes that involved a tool and ask which lane it belonged in. If the task was local, try CLI. If it crossed into shared systems, look at MCP. If you kept explaining the same careful sequence over and over, build the tiny tool. Use an agent you trust because every interface has a reason to exist.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>cli</category>
      <category>agents</category>
    </item>
    <item>
      <title>Learn how to use Claude Code for free with OpenCode Zen models by deploying a Cloudflare Worker proxy and configuring third-party inference.</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Fri, 08 May 2026 13:03:54 +0000</pubDate>
      <link>https://dev.to/cucoleadan/learn-how-to-use-claude-code-for-free-with-opencode-zen-models-by-deploying-a-cloudflare-worker-d0m</link>
      <guid>https://dev.to/cucoleadan/learn-how-to-use-claude-code-for-free-with-opencode-zen-models-by-deploying-a-cloudflare-worker-d0m</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/cucoleadan/how-to-run-claude-code-for-free-with-opencode-models-45ae" class="crayons-story__hidden-navigation-link"&gt;How to Run Claude Code for Free with OpenCode Models&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/cucoleadan" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1392305%2Fb8e28d8c-8302-4fe8-86e5-d09186c09b75.png" alt="cucoleadan profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/cucoleadan" class="crayons-story__secondary fw-medium m:hidden"&gt;
              cucoleadan
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                cucoleadan
                
              
              &lt;div id="story-author-preview-content-3634074" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/cucoleadan" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1392305%2Fb8e28d8c-8302-4fe8-86e5-d09186c09b75.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;cucoleadan&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/cucoleadan/how-to-run-claude-code-for-free-with-opencode-models-45ae" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 8&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/cucoleadan/how-to-run-claude-code-for-free-with-opencode-models-45ae" id="article-link-3634074"&gt;
          How to Run Claude Code for Free with OpenCode Models
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/claude"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;claude&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/cli"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;cli&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/cucoleadan/how-to-run-claude-code-for-free-with-opencode-models-45ae" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;19&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/cucoleadan/how-to-run-claude-code-for-free-with-opencode-models-45ae#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              &lt;span class="hidden s:inline"&gt;Add&amp;nbsp;Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            10 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Run Claude Code for Free with OpenCode Models</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Fri, 08 May 2026 13:03:06 +0000</pubDate>
      <link>https://dev.to/cucoleadan/how-to-run-claude-code-for-free-with-opencode-models-45ae</link>
      <guid>https://dev.to/cucoleadan/how-to-run-claude-code-for-free-with-opencode-models-45ae</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/how-to-use-claude-code-for-free-with" rel="noopener noreferrer"&gt;How to Use Claude Code For Free With OpenCode Models&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Yes, you can use Claude Code for free by routing it through a small Cloudflare Worker and pointing that Worker at a free OpenCode Zen model like &lt;code&gt;minimax-m2.5-free&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Claude stays the interface. OpenCode becomes the model layer.&lt;/p&gt;

&lt;p&gt;That means you can keep the Claude experience people already like, skip Anthropic billing for low-stakes work, and only move to a paid provider lane when the task actually deserves it.&lt;/p&gt;

&lt;p&gt;You see, at the end of March 2026, Anthropic shipped a Claude Code npm package with a source map inside it. That packaging mistake exposed a huge chunk of the Claude Code TypeScript source. Within hours, mirrors spread across GitHub. Some picked up thousands of stars and forks before Anthropic started sending takedowns.&lt;/p&gt;

&lt;p&gt;Then the cleanup got messy. TechCrunch reported on April 1, 2026 that Anthropic's DMCA request hit about 8,100 GitHub repositories before the company narrowed the scope.&lt;/p&gt;

&lt;p&gt;That told me two things.&lt;/p&gt;

&lt;p&gt;First, developers wanted Claude Code badly enough to swarm the leaked source. Second, the demand for using Claude's interface with other provider lanes was already there.&lt;/p&gt;

&lt;p&gt;All this landed at a weird time for me. I've already shifted a lot of my own coding time to Codex, mostly because GPT got close enough to Opus for my day-to-day work and the usage limits feel better. Hermes still handles my heavier recurring workflows and automations.&lt;/p&gt;

&lt;p&gt;But I still liked Claude Code. And I still wanted to try Claude Cowork without paying the full Anthropic tax every time I wanted a polished coding session.&lt;/p&gt;

&lt;p&gt;The problem was simple.&lt;/p&gt;

&lt;p&gt;Claude speaks Anthropic. OpenCode Zen and OpenCode Go mostly speak OpenAI-compatible endpoints.&lt;/p&gt;

&lt;p&gt;So I built a translator.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/cucoleadan/opencode-cowork-proxy" rel="noopener noreferrer"&gt;OpenCode Cowork Proxy Worker&lt;/a&gt; lets Claude Code talk to OpenCode Go models and selected OpenCode Zen models. Claude keeps sending Anthropic-style requests, then the Worker translates them into the upstream format OpenCode expects.&lt;/p&gt;

&lt;p&gt;No key storage. No message storage. Just a format bridge.&lt;/p&gt;

&lt;p&gt;With that in place, you can start with free OpenCode Zen models like &lt;code&gt;minimax-m2.5-free&lt;/code&gt;, then move to OpenCode Go's subscription lane when the work gets more demanding.&lt;/p&gt;

&lt;p&gt;I made the switch easy on purpose. You need a Cloudflare account and an OpenCode account. Both can start free, and you only upgrade if the workflow becomes worth it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Get the next proxy walkthrough before it eats your weekend. Subscribe to Vibe Stack Lab. I send practical AI workflows for builders who want control over the stack and fewer surprise bills.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://vibestacklab.substack.com/subscribe" rel="noopener noreferrer"&gt;Subscribe to Vibe Stack Lab&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  In this article:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;How to deploy the Worker in your Cloudflare account&lt;/li&gt;
&lt;li&gt;How to configure Claude Desktop or Claude Cowork for third-party inference&lt;/li&gt;
&lt;li&gt;Which free OpenCode Zen models to start with&lt;/li&gt;
&lt;li&gt;When to use &lt;code&gt;/zen&lt;/code&gt; and when to switch to &lt;code&gt;/go&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The first safe test to run before touching a real repo&lt;/li&gt;
&lt;li&gt;The setup mistakes most likely to break this first&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Deploy the Worker in Cloudflare First
&lt;/h2&gt;

&lt;p&gt;Before Claude can use OpenCode, you need a gateway URL it can call.&lt;/p&gt;

&lt;p&gt;Open the &lt;a href="https://github.com/cucoleadan/opencode-cowork-proxy" rel="noopener noreferrer"&gt;OpenCode Cowork Proxy Worker repo&lt;/a&gt; and click the &lt;strong&gt;Deploy to Cloudflare Workers&lt;/strong&gt; button at the top. Cloudflare supports this one-click deploy flow directly for Workers projects, which is why this setup is fast to hand off.&lt;/p&gt;

&lt;p&gt;Cloudflare walks you through the rest. When it finishes, copy your Worker URL.&lt;/p&gt;

&lt;p&gt;At that point, your gateway is live.&lt;/p&gt;

&lt;p&gt;Your deployed URL will look like your own Cloudflare Worker endpoint. In the examples below, I'll call it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YOUR_DEPLOYED_WORKER_URL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Configure Claude Desktop to Use OpenCode Zen
&lt;/h2&gt;

&lt;p&gt;Open Claude Desktop and go to the third-party inference setup.&lt;/p&gt;

&lt;p&gt;If you're on Windows, go to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Help &amp;gt; Troubleshooting &amp;gt; Enable Developer Mode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude will restart and expose a new menu:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer &amp;gt; Configure Third-Party Inference
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anthropic's current help docs for Claude Cowork's third-party setup use this same path, so you're not relying on a weird hidden hack here. You're using the intended setup UI.&lt;/p&gt;

&lt;p&gt;For your first test, point Claude at OpenCode Zen with the free model &lt;code&gt;minimax-m2.5-free&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Backend: Gateway
Gateway base URL: YOUR_DEPLOYED_WORKER_URL/zen
API key: your OpenCode API key
Auth scheme: x-api-key
Model: minimax-m2.5-free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that's done, make sure to add the model manually too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minimax-m2.5-free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click &lt;strong&gt;Apply locally&lt;/strong&gt;. Fully quit Claude Desktop. Reopen it.&lt;/p&gt;

&lt;p&gt;That's the basic path for using Claude Code with a free OpenCode model through your Worker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with Free OpenCode Zen Models
&lt;/h2&gt;

&lt;p&gt;Start with OpenCode Zen, not Go.&lt;/p&gt;

&lt;p&gt;Zen is OpenCode's curated model gateway. Some Zen models are paid. Some are free for a limited time while model teams collect feedback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Last updated: May 7, 2026&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The current &lt;a href="https://opencode.ai/docs/zen/" rel="noopener noreferrer"&gt;OpenCode Zen docs&lt;/a&gt; list these free models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minimax-m2.5-free
ling-2.6-flash
hy3-preview-free
nemotron-3-super-free
big-pickle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use this first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minimax-m2.5-free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your base URL should end with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/zen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your model field should be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minimax-m2.5-free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Free means free while OpenCode is offering that model under a free period. It does not mean no account, no API key, or no caveats.&lt;/p&gt;

&lt;p&gt;You still need an OpenCode API key.&lt;/p&gt;

&lt;p&gt;And you should absolutely check the privacy notes before using free models with sensitive work. As of May 7, 2026, OpenCode's Zen docs say several free models may use collected data during the free period to improve the model. That includes &lt;code&gt;minimax-m2.5-free&lt;/code&gt;. This is the exact opposite of the lane you want for sensitive code.&lt;/p&gt;

&lt;p&gt;This is the test lane.&lt;/p&gt;

&lt;p&gt;Use it for summaries, low-risk code review, documentation cleanup, and tiny file edits in a throwaway folder. Don't start by pointing it at your main repo with write access.&lt;/p&gt;

&lt;p&gt;On my own first tests, the free Zen route handled summaries, low-risk reviews, and tiny file edits fine, but I switched to &lt;code&gt;/go&lt;/code&gt; as soon as I wanted stronger reasoning over a larger repo.&lt;/p&gt;

&lt;p&gt;I wrote about the bigger reason open models matter in &lt;a href="https://vibestacklab.substack.com/p/ditch-your-subscriptions-and-run" rel="noopener noreferrer"&gt;Ditch Your Subscriptions and Run Open Source AI on Your Device&lt;/a&gt;. The short version is the same here: model choice gets more useful when your tools stop forcing the interface and the engine to stay married.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Send this to the friend who pays for overlapping AI plans and still hits limits. It might save them a month of model roulette.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://vibestacklab.substack.com" rel="noopener noreferrer"&gt;Share this setup&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Choose /zen for Free Models and /go for OpenCode Go
&lt;/h2&gt;

&lt;p&gt;The proxy has two routes.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;/zen&lt;/code&gt; for free models and Zen pay-as-you-go models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YOUR_DEPLOYED_WORKER_URL/zen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;/go&lt;/code&gt; for the monthly OpenCode Go subscription lane:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YOUR_DEPLOYED_WORKER_URL/go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want the fast mental model, use it like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/zen&lt;/code&gt; is the free test lane&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/go&lt;/code&gt; is the stronger daily-work lane&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As of May 7, 2026, the &lt;a href="https://opencode.ai/docs/go/" rel="noopener noreferrer"&gt;OpenCode Go docs&lt;/a&gt; list Go at $5 for the first month, then $10 per month.&lt;/p&gt;

&lt;p&gt;The same docs currently list these usage limits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5-hour limit: $12 of usage
Weekly limit: $30 of usage
Monthly limit: $60 of usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your actual request count depends on the model.&lt;/p&gt;

&lt;p&gt;Cheaper models stretch much further. Heavier models burn the limit faster.&lt;/p&gt;

&lt;p&gt;The important privacy distinction is this: OpenCode Go says its providers follow a zero-retention policy and do not use your data for model training. That makes it a much better fit for real coding work than the free-model lane. I would still avoid calling anything "complete privacy," but it is the safer route according to the current docs.&lt;/p&gt;

&lt;p&gt;I covered OpenCode Go more broadly in &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff&lt;/a&gt;. For Hermes, Go gives you a cheaper provider lane. With this proxy, Go becomes useful from Claude Code too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Route Instead of OpenRouter or Ollama?
&lt;/h2&gt;

&lt;p&gt;Because the point here is not just "find any cheaper provider."&lt;/p&gt;

&lt;p&gt;The point is keeping Claude's interface and tool flow while swapping the model layer underneath it.&lt;/p&gt;

&lt;p&gt;If you just want the fastest generic provider swap, OpenRouter is simpler.&lt;/p&gt;

&lt;p&gt;If you want fully local inference, Ollama is a better answer.&lt;/p&gt;

&lt;p&gt;If you specifically want Claude Code or Claude Cowork as the front end while OpenCode handles the models behind the scenes, this Worker route is the right tool.&lt;/p&gt;

&lt;p&gt;That matters more than it sounds. A lot of people do not actually want a new interface. They just want a cheaper or more flexible inference lane behind the interface they already like.&lt;/p&gt;

&lt;p&gt;If you want the broader comparison between Claude Cowork and other agent setups, I broke that down in &lt;a href="https://vibestacklab.substack.com/p/openclaw-vs-claude-cowork-vs-perplexity" rel="noopener noreferrer"&gt;OpenClaw vs Claude Cowork vs Perplexity Computer - Which AI Agent Actually Fits Your Life&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test Claude Code Safely in a Throwaway Folder
&lt;/h2&gt;

&lt;p&gt;Don't point this at your main repo first.&lt;/p&gt;

&lt;p&gt;Create a throwaway folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-opencode-proxy-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add a file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project-notes.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Put fake project notes in it. No secrets. No client data.&lt;/p&gt;

&lt;p&gt;Ask Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read project-notes.md.
Summarize the project in 10 bullets.
Create a second file called next-actions.md with a short implementation checklist.
Do not modify project-notes.md.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This checks whether routing and tool behavior work together. Claude has to create the new file from the notes without touching the original.&lt;/p&gt;

&lt;p&gt;If that works, try a small code review:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review this function for bugs.
Do not edit files yet.
Give me the risk list first.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I like that second test because it keeps the model away from edits until you see how it behaves.&lt;/p&gt;

&lt;p&gt;After that, test one small tool-heavy task. Ask it to compare two files and create a short note. Keep the task boring.&lt;/p&gt;

&lt;p&gt;You're testing routing and tool behavior, not the model's taste.&lt;/p&gt;

&lt;p&gt;Free models are useful, but they need judgment. I wrote about that line between vibe coding and agentic engineering in &lt;a href="https://vibestacklab.substack.com/p/the-agentic-engineering-shift" rel="noopener noreferrer"&gt;The Agentic Engineering Shift&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Switch to OpenCode Go When the Free Lane Stops Being Worth It
&lt;/h2&gt;

&lt;p&gt;OpenCode Go is one of the more transparent AI subscriptions out there because the limits are expressed in dollar value, not in a vague "come back later" chat cap.&lt;/p&gt;

&lt;p&gt;Switch to &lt;code&gt;/go&lt;/code&gt; when the free Zen models are too weak, too slow, too rate-limited, or too risky for the work.&lt;/p&gt;

&lt;p&gt;That usually happens when one of these becomes true:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You want better reasoning over a bigger codebase.&lt;/li&gt;
&lt;li&gt;You want fewer caveats around data usage.&lt;/li&gt;
&lt;li&gt;You are doing enough coding work that a $10 lane is cheaper than burning a premium subscription elsewhere.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The nice part is that the setup barely changes. You keep Claude as the interface. You just swap the route and the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  How This Also Works with Claude Cowork
&lt;/h2&gt;

&lt;p&gt;This is the part I care about more than the free model itself.&lt;/p&gt;

&lt;p&gt;People like Claude Code and Claude Cowork because the interface feels good to use, and nobody wants another subscription with fuzzy limits hanging over every small coding session.&lt;/p&gt;

&lt;p&gt;Claude Cowork especially has the kind of product polish that makes people want to stay inside it. The project view feels clean, the tool activity is easy to follow, and the whole thing feels closer to an app than a pile of agents you have to babysit.&lt;/p&gt;

&lt;p&gt;The annoying part is paying for the whole Anthropic route every time you want that app experience.&lt;/p&gt;

&lt;p&gt;I can justify premium reasoning models when I'm asking for difficult architecture help or reviewing a risky change. I do not want to burn premium usage on every small housekeeping task.&lt;/p&gt;

&lt;p&gt;That's why I built this proxy, and I want the compatibility point to be explicit: this route works with Claude Cowork too. You can keep Claude Cowork or Claude Code as the place where you work without needing Claude itself as the model route behind it.&lt;/p&gt;

&lt;p&gt;The cheap path lets you keep the Claude app experience instead of forcing yourself into another interface.&lt;/p&gt;

&lt;p&gt;You can start with a free OpenCode Zen model, then move to the $10 OpenCode Go lane when you want a stronger open model inside Claude Cowork or Claude Code.&lt;/p&gt;

&lt;p&gt;I still like OpenCode. I still use Codex. Hermes is still where my serious recurring workflows live. The point is that Claude Cowork does not have to become another expensive subscription decision when OpenCode can provide the model layer for free or for far less.&lt;/p&gt;

&lt;p&gt;If you want the shared-workflow version of that story, read &lt;a href="https://vibestacklab.substack.com/p/openclaw-or-claude-cowork-heres-how" rel="noopener noreferrer"&gt;OpenClaw or Claude Cowork? Here's How to Plug Both Into the Same Brain&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use This 10-Minute Checklist to Get Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open the &lt;a href="https://github.com/cucoleadan/opencode-cowork-proxy" rel="noopener noreferrer"&gt;OpenCode Cowork Proxy Worker repo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Deploy to Cloudflare Workers&lt;/strong&gt; and install the Worker in your Cloudflare account.&lt;/li&gt;
&lt;li&gt;Copy your deployed Worker URL.&lt;/li&gt;
&lt;li&gt;Open Claude Desktop.&lt;/li&gt;
&lt;li&gt;Enable Developer Mode, then open &lt;strong&gt;Configure Third-Party Inference&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Set the base URL to &lt;code&gt;YOUR_DEPLOYED_WORKER_URL/zen&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Set auth scheme to &lt;code&gt;x-api-key&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Paste your OpenCode API key.&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;minimax-m2.5-free&lt;/code&gt; manually.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Apply locally&lt;/strong&gt;, fully quit Claude, then reopen it.&lt;/li&gt;
&lt;li&gt;Run the throwaway-folder test.&lt;/li&gt;
&lt;li&gt;Switch to &lt;code&gt;YOUR_DEPLOYED_WORKER_URL/go&lt;/code&gt; and a Go model when you want the subscription lane.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I use Claude Code for free?
&lt;/h3&gt;

&lt;p&gt;Yes, but not in the default Anthropic-billed path this article is bypassing.&lt;/p&gt;

&lt;p&gt;You use Claude Code for free here by routing its requests through your own Cloudflare Worker and pointing that Worker at a free OpenCode Zen model such as &lt;code&gt;minimax-m2.5-free&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Claude Code in VS Code free?
&lt;/h3&gt;

&lt;p&gt;Claude Code itself can be installed, but the model path behind it usually costs money unless you route it to a free provider lane.&lt;/p&gt;

&lt;p&gt;This setup gives you one of those free lanes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I get Claude Code credits for free?
&lt;/h3&gt;

&lt;p&gt;You don't get Anthropic credits from this method.&lt;/p&gt;

&lt;p&gt;You bypass Anthropic billing for these sessions by translating Claude's requests to a free OpenCode Zen model instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I use Claude Code free forever?
&lt;/h3&gt;

&lt;p&gt;"Forever" is doing too much work in most of the videos and posts ranking for this topic.&lt;/p&gt;

&lt;p&gt;You can use it free as long as a provider keeps offering a free model and the setup still works. That can change. That's why this article treats the free route as a useful lane, not a permanent law of nature.&lt;/p&gt;

&lt;h2&gt;
  
  
  External Sources Worth Checking
&lt;/h2&gt;

&lt;p&gt;If you want the primary docs behind this setup, start here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://support.claude.com/en/articles/14680741-install-and-configure-claude-cowork-with-third-party-platforms" rel="noopener noreferrer"&gt;Claude Cowork third-party platform setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opencode.ai/docs/zen/" rel="noopener noreferrer"&gt;OpenCode Zen docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opencode.ai/docs/go/" rel="noopener noreferrer"&gt;OpenCode Go docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.cloudflare.com/workers/tutorials/deploy-button" rel="noopener noreferrer"&gt;Cloudflare Deploy to Workers button docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And if you want the leak story source rather than my summary, TechCrunch covered the takedown incident here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/" rel="noopener noreferrer"&gt;Anthropic took down thousands of GitHub repos trying to yank its leaked source code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Grab the Worker, run the throwaway-folder test, and star the repo if it works for you. Stars tell me which Claude/OpenCode routes are worth maintaining next.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/cucoleadan/opencode-cowork-proxy" rel="noopener noreferrer"&gt;OpenCode Cowork Proxy Worker&lt;/a&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>claude</category>
      <category>opensource</category>
      <category>cli</category>
    </item>
    <item>
      <title>How to Add Approval Gates to Your Hermes Agent</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 28 Apr 2026 13:29:46 +0000</pubDate>
      <link>https://dev.to/cucoleadan/how-to-add-approval-gates-to-your-hermes-agent-5db7</link>
      <guid>https://dev.to/cucoleadan/how-to-add-approval-gates-to-your-hermes-agent-5db7</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/how-to-add-approval-gates-to-your" rel="noopener noreferrer"&gt;How to Add Approval Gates to Your Hermes Agent&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most people who try AI agents go through the same cycle. They set it up, give it access to everything, and watch it do impressive things for a week. Then something goes wrong, like a wrong message or a broken file, and they shut it down and go back to doing things manually.&lt;/p&gt;

&lt;p&gt;The problem was skipping the safety net.&lt;/p&gt;

&lt;p&gt;I went through that cycle twice. The first time, my agent sent an email to a client with the wrong name, and I mean a completely different person, not a typo. I found out when the client forwarded it back asking if I was working with someone else. I spent the next three weeks manually reviewing everything the agent touched. That burned more time than if I had done the work myself.&lt;/p&gt;

&lt;p&gt;The second time, I set up gates before giving the agent access. Drafts and system changes came to me for review. Spending above a threshold required approval. I let it run, and nothing went wrong. The gates caught mistakes before they went live.&lt;/p&gt;

&lt;p&gt;I’ll show you how to build approval gates into any Hermes workflow. Gate #1 takes 15 minutes and stops your agent from sending anything external without your OK. Gate #2 adds protection against unwanted system changes and puts dollar limits on spending. Start with the first one. Add the others when you’re ready.&lt;/p&gt;

&lt;p&gt;If you’re new to Hermes itself, start with &lt;a href="https://vibestacklab.substack.com/p/hermes-is-the-ai-agent-openclaw-promised" rel="noopener noreferrer"&gt;Hermes Is the AI Agent OpenClaw Promised to Be&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7knjaen1wa2xyf7ezeyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7knjaen1wa2xyf7ezeyb.png" alt="Image" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this article:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What an approval gate is, and why it isn’t a roadblock&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The three types of gates every AI workflow needs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step-by-step setup for each gate, from beginner to advanced&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A simple framework to decide what to gate and what to leave free&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The three mistakes people make with approval gates, and how to avoid them&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;A Checkpoint Is Not a Roadblock&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The word gate makes people think of barriers and delays. That’s the wrong mental model. An approval gate is more like a checkpoint at the end of an assembly line. The work happens at full speed. The checkpoint keeps defects from shipping.&lt;/p&gt;

&lt;p&gt;Three patterns exist for keeping humans involved in AI workflows. Human-in-the-loop means the agent stops and asks you before taking an action. You review, you approve, the agent continues. Human-on-the-loop means the agent runs autonomously, but you can watch what it does and intervene if something looks wrong. Full autonomy means you set it up and never look at it again.&lt;/p&gt;

&lt;p&gt;Shopify defaults to human-in-the-loop by design for anything that touches production systems. LangChain found that most organizations use approval checkpoints as their primary guardrail. The EU AI Act requires evidence of appropriate oversight for each AI system. These are standard practice.&lt;/p&gt;

&lt;p&gt;The same principle works for solo operators. You need a simpler version.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Three Gates You Need&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Every AI workflow that touches the outside world or modifies your data needs at least one gate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9qw7m4x6j5m1ewqguyv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9qw7m4x6j5m1ewqguyv.png" alt="Image" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Send Gate.&lt;/strong&gt; Nothing goes external without your OK. Emails, social posts, client communications, any message that carries your name. Your agent drafts, delivers to you, and waits. You review and approve the send. That’s the gate most people need first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Change Gate.&lt;/strong&gt; Nothing modifies your systems without your OK. File edits, database updates, configuration changes. Your agent identifies what needs to change, shows you the proposed change with context, and waits for confirmation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Spend Gate.&lt;/strong&gt; Nothing costs money without your OK. Paid API calls above a threshold, tool purchases, subscription changes. Your agent estimates the cost before any paid action. Below your threshold, it proceeds automatically. Above it, it pauses and asks you.&lt;/p&gt;

&lt;p&gt;Each gate protects something different: your reputation, your data, your wallet. You don’t need all three on day one. Start with the Send Gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Gate #1: The Send Gate (Start Here)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;That’s the one that fixes the wrong-name-in-an-email problem. The setup takes about 15 minutes. You build a workflow where the agent drafts everything, but you control the final step.&lt;/p&gt;

&lt;p&gt;The workflow has four steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1.&lt;/strong&gt; The agent drafts the content. An email, a social post, a client response, anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2.&lt;/strong&gt; The agent delivers the draft to you through chat, email, or a file. It doesn’t send it anywhere, just hands it to you for review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3.&lt;/strong&gt; You review the draft and fix anything that needs fixing. Reply with your approval or your corrections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4.&lt;/strong&gt; The agent sends or publishes the approved version. If you asked for changes, it revises and shows you the updated version.&lt;/p&gt;

&lt;p&gt;In Hermes, the Send Gate works best as two separate pieces. The first is a standing rule in project memory. The second is the cronjob or task that runs under that rule. If project memory still feels abstract, &lt;a href="https://vibestacklab.substack.com/p/forgetting-to-forget-how-infinite?utm_source=publication-search" rel="noopener noreferrer"&gt;my article on infinite memory&lt;/a&gt; explains why these standing rules matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 1: Save this in Hermes memory&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Treat this as a starter template, not a fixed script. You may need to change the approval words, the delivery channel, or the types of content it covers based on how Hermes is set up in your project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are my Content Assistant. Your job is to draft content for review.

When I give you a content request (email, social post, client response):

1. Draft the content based on my instructions and the project brief.
2. Present the draft clearly labeled "DRAFT [FOR REVIEW]".
3. Don't send, publish, or share the content anywhere.
4. Wait for my approval or my requested changes.
5. If I request changes, apply them and present the revised draft.
6. Only when I explicitly say "approved" or "send it", take the final action.

Always include a brief note at the end explaining what you did and why.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example 2: Use this as a Hermes cronjob&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This works best when Hermes already has access to the inputs it needs, such as meeting notes, a calendar, or a project brief, and already knows where to send drafts back to you. You may need to change the schedule, the source it reads from, or the format of the output to fit your workflow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Every weekday at 9:00 AM, review yesterday's meeting notes and draft any follow-up emails that need to be sent.

Present each email as "DRAFT [FOR REVIEW]".
Do not send anything automatically.
Wait for my approval before any email goes out.

If there are no follow-ups to draft, tell me that no action is needed today.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to see cronjobs in action before you build this one, the &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;Hermes morning briefing workflow&lt;/a&gt; shows a complete example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Watch Out:&lt;/strong&gt; If your agent has tool access that lets it send emails or post to social media directly, make sure the prompt overrides those tools. The approval step must be the only path to external action.&lt;/p&gt;

&lt;p&gt;If you’re still wiring up Hermes tools, memory, and integrations, &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;the Hermes setup guide&lt;/a&gt; covers the stack behind workflows like this.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Gate #2: The Change Gate (Level Up)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once your Send Gate works, add protection against unwanted system changes. This gate matters when your agent interacts with files, databases, or any system where a bad edit breaks something real.&lt;/p&gt;

&lt;p&gt;The agent identifies what needs to change: which record, which field, and what the new value should be. Vague requests like “update the database” fail this gate.&lt;/p&gt;

&lt;p&gt;The agent shows you the proposed change with full context: current state, new state, why the change is needed, and what happens if the change goes wrong.&lt;/p&gt;

&lt;p&gt;You approve or reject. If you reject, the change never happens. If you approve, the agent executes it and confirms the result.&lt;/p&gt;

&lt;p&gt;The rollback plan is simple. If a change causes problems, you tell the agent to reverse it. Because the agent showed you what it wanted to change before doing it, it can undo the change on request.&lt;/p&gt;

&lt;p&gt;Use this prompt for a research agent that updates your knowledge base:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When you find information that should update the project knowledge base:

1. Show me the proposed change with this format:
   - Current value: [what exists now]
   - Proposed value: [what you want to change it to]
   - Reason: [why this change is needed]
   - Source: [where you found this]

2. Wait for my approval before making any changes.
3. If I approve, make the change and confirm what was updated.

4. If I reject, don't make the change. Log the rejection in the project notes.

Never modify files, databases, or project memory without going through this process first.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This takes about 20 minutes on top of your Send Gate. The time investment is worth it the first time your agent wants to overwrite a file with outdated information.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Gate #3: The Spend Gate (Advanced)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This gate protects your wallet. AI agents can run API calls, subscribe to tools, and make purchases if you give them access. Without a spend gate, a runaway loop of API calls can cost hundreds before you notice.&lt;/p&gt;

&lt;p&gt;The setup relies on spending thresholds in your project memory. Set a dollar limit that matches your comfort level, whether that’s $5 per transaction or $50. Pick the number that lets you sleep well.&lt;/p&gt;

&lt;p&gt;Your agent estimates the cost before any paid action. Below your threshold, it proceeds automatically. Above it, it pauses, shows you the estimate, and waits for approval.&lt;/p&gt;

&lt;p&gt;Add this to your prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before taking any action that costs money (API calls, tool purchases, subscriptions):

1. Estimate the cost.
2. If the cost is below $10, proceed automatically and log the expense.
3. If the cost is $10 or above, pause and show me:
   - What you want to do
   - Why it is needed
   - Estimated cost
   - Free alternatives, if any
4. Wait for my approval before proceeding with actions that cost $10 or more.

Keep a running total of all expenses in the project notes.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adjust the threshold to your needs. The point is to keep you informed when the agent is about to spend money that matters. This takes about 15 minutes on top of the others. If your agent doesn’t have spending access, skip this one.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;To Gate or Not to Gate&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You can’t gate every action. If you do, your agent becomes a slow typist that asks permission before every keystroke. At that point, you might as well do the work yourself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F863rnrd9oz69dpgv5fup.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F863rnrd9oz69dpgv5fup.png" alt="Image" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low risk, high volume.&lt;/strong&gt; No gate needed. File organization, summarization, categorization, formatting. The worst thing that happens is a slightly messy summary, and you fix that in seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medium risk, moderate volume.&lt;/strong&gt; Review gate. Draft emails, content suggestions, data analysis. The agent produces the work, you review it before it goes anywhere. The Send Gate handles this category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High risk, low volume.&lt;/strong&gt; Full gate. External communications, system changes, spending. The agent pauses, explains what it wants to do, and waits for explicit approval. All three gates cover this category.&lt;/p&gt;

&lt;p&gt;To apply this to your own workflows, list every task your agent handles. Write down the worst thing that could go wrong. If the mistake costs you money, damages your reputation, or breaks a system, gate it.&lt;/p&gt;

&lt;p&gt;If the mistake is annoying but easy to fix, let it run free and correct course when needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Three Mistakes People Make&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most failures come from one of these three patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gating everything.&lt;/strong&gt; This turns your AI into a slow typist. You spend an hour approving every sentence and paragraph, then realize you could have done the work yourself. The fix: apply the decision framework above. Gate only what needs gating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gating nothing.&lt;/strong&gt; That’s how your AI sends wrong emails to clients, overwrites production files, and racks up unexpected charges. You give the agent full autonomy on day one. Something goes wrong. You stop using the agent entirely. The fix: start with the Send Gate. Add the others as your trust grows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gating without context.&lt;/strong&gt; A vague approval request forces you to dig through the agent’s reasoning to figure out if the change is safe. The fix: require the agent to show the current state, the proposed change, and the reason for the change. A good gate gives you everything you need to decide in under 10 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Trust the Process but Keep the Net&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You stop babysitting and stop opening every draft with a knot in your stomach. You give your agent a task, trust the process, and review the output at the checkpoint. Most of the time, you approve it without changes. Occasionally, you catch something and fix it. Either way, the work moves forward.&lt;/p&gt;

&lt;p&gt;The trick is to start tight and loosen up over time. In week one, you review every draft. By month two, you move the Send Gate to sample mode: review every third draft, trust the rest. You haven’t caught a mistake in weeks. The gate stays in place, but you use it less.&lt;/p&gt;

&lt;p&gt;That’s the goal. Reduce gates as the system matures. Real delegation is only possible when you know the safety net works. Once you see the net catching problems, you can let the agent fly higher.&lt;/p&gt;

&lt;p&gt;Building the right gates is all delegation requires. Once they’re in place, you can let it run.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>automation</category>
      <category>prompts</category>
      <category>security</category>
    </item>
    <item>
      <title>How My Hermes Agent Plans My Morning Before I Have My Coffee</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 21 Apr 2026 13:19:34 +0000</pubDate>
      <link>https://dev.to/cucoleadan/how-my-hermes-agent-plans-my-morning-before-i-have-my-coffee-229k</link>
      <guid>https://dev.to/cucoleadan/how-my-hermes-agent-plans-my-morning-before-i-have-my-coffee-229k</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with" rel="noopener noreferrer"&gt;How My Hermes Agent Plans My Morning Before I Have My Coffee&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You probably start every morning the same way most people do. Phone in hand. Six apps open before you finish your coffee. Email, task manager, Slack, calendar, news feed, Substack digest. Each one wants your attention. Each one claims urgency.&lt;/p&gt;

&lt;p&gt;By the time you reach actual work, your best mental energy is already spent. You made dozens of tiny decisions about what to open, what to read, and what to ignore. Your real work gets the leftovers.&lt;/p&gt;

&lt;p&gt;The problem is not discipline. The fire hose of inputs hits you the second you wake up, and no amount of willpower fixes a broken system.&lt;/p&gt;

&lt;p&gt;I stopped trying to fix my habits and started fixing how information reaches me. Now one cronjob gathers my Asana tasks and hands me one clear decision to start the day. A second cronjob runs twice daily, checks Gmail for Substack articles, and sends me a curated digest email. I read the briefing in 30 seconds. The two hours come from fewer context switches, less reactive mode, and a single first action instead of a dozen tiny decisions.&lt;/p&gt;

&lt;p&gt;Follow along as we build both systems in three layers. Layer 1 takes 15 to 20 minutes and covers the Asana morning briefing. Layer 2 adds the Substack digest email that runs twice daily. Layer 3 extends the briefing to Slack, Jira, GitHub, or any service with an API. Start wherever you want as each layer is useful on its own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we will cover:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Why checking multiple apps each morning burns your best energy before work even starts&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The four-section briefing structure that replaces a to-do list with a decision&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step-by-step setup for the Asana briefing and the Substack digest email&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How to extend the briefing to any API-driven tool you already use&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keeping both systems from bloating into useless noise&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  You Are Not Lazy, Just Constantly Interrupted
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s7j2mfr3eppsi76uhfz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s7j2mfr3eppsi76uhfz.png" alt="Image" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You are exhausted before 9 AM, and the reason has nothing to do with laziness.&lt;/p&gt;

&lt;p&gt;Gloria Mark at UC Irvine studied this for years. Her research found that after an interruption, it takes an average of 23 minutes and 15 seconds to return to deep focus on the original task. Not to finish it. Just to get back into it. If you check email, then Slack, then your calendar, then a task list, then a news feed, you end up losing over an hour in refocus time alone.&lt;/p&gt;

&lt;p&gt;The Adobe Email Usage Study found that Americans spend over five hours per day checking work and personal email combined. We spend hours refreshing inboxes. Actual communication barely happens. We train ourselves to react to whatever arrives instead of deciding what matters.&lt;/p&gt;

&lt;p&gt;Inbox zero reinforces this pattern. It teaches you to treat every incoming message as equally urgent. The newsletter you subscribed to in 2019 gets the same mental weight as a client asking about a deadline. Your morning becomes a sorting exercise for other people’s priorities.&lt;/p&gt;

&lt;p&gt;CEOs and important people have secretaries who sift through everything before it reaches them. We have AI agents like Hermes and Openclaw that can do this for us, maybe even better than the average person.&lt;/p&gt;

&lt;p&gt;Here is how I built the one system that gathered everything in my pipeline and made one recommendation to get my day started.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an Actually Good AI Briefing Looks Like
&lt;/h2&gt;

&lt;p&gt;A good briefing works as a decision support document. It replaces your basic to-do list.&lt;/p&gt;

&lt;p&gt;Most people think a morning briefing means listing everything they need to do today. That approach creates anxiety instead of clarity. A list of 15 tasks leaves you feeling behind instead of showing you where to start.&lt;/p&gt;

&lt;p&gt;My briefing has four sections. Each one stays capped at 2 to 3 bullets. It ends with one forced decision. Without that final gate, the briefing becomes another scrollable feed you skim and forget.&lt;/p&gt;

&lt;p&gt;Here is the structure:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Today&lt;/strong&gt; — Meetings, deadlines, hard commitments that cannot move.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tasks&lt;/strong&gt; — Open items from Asana, sorted by impact, not by order added.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alerts&lt;/strong&gt; — Unread emails from humans, not newsletters. Overdue items that need attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One Decision&lt;/strong&gt; — The single action that would make the rest of the day easier.&lt;/p&gt;

&lt;p&gt;The Today section tells me what time is already spoken for. The Tasks section tells me what I chose to work on. The Alerts section catches anything that slipped through. The One Decision section forces me to think instead of consume.&lt;/p&gt;

&lt;p&gt;I have seen people build briefings with ten sections. Weather, stock prices, news headlines, calendar, tasks, emails, social mentions, fitness data. That approach builds a dashboard, and dashboards serve monitoring. Briefings serve decision-making.&lt;/p&gt;

&lt;p&gt;If your briefing takes longer than 3 minutes to read, trim a source. Tighten a filter. Clarity matters more than completeness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: Build Your First Briefing in 20 Minutes
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuv9qjbiy5zgdhiob9he4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuv9qjbiy5zgdhiob9he4.png" alt="Image" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start with one source. &lt;strong&gt;Asana&lt;/strong&gt; works well because you already put your commitments there. If your Asana is messy, the briefing will reflect that mess. Spend 10 minutes cleaning due dates and priorities first. The cronjob cannot organize what you have not organized. If your task manager lacks structure, that cleanup step becomes your first priority before automating anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An Asana account, free tier works&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your Asana personal access token&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hermes with cron support&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An Asana account (&lt;a href="https://asana.com" rel="noopener noreferrer"&gt;sign up here&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your Asana personal access token (&lt;a href="https://app.asana.com/0/developer-console" rel="noopener noreferrer"&gt;get one here&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Asana API docs for reference: &lt;a href="https://developers.asana.com/docs" rel="noopener noreferrer"&gt;developers.asana.com/docs&lt;/a&gt; or the &lt;a href="https://github.com/Asana/awesome-asana" rel="noopener noreferrer"&gt;Asana MCP server&lt;/a&gt; if you prefer MCP over REST&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hermes with cron support&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else happens inside the prompt. Hermes handles the API calls, the sorting, and the formatting. You do not write code or configure endpoints. You paste the prompt, set a schedule, and the cronjob does the rest.&lt;/p&gt;

&lt;p&gt;Here is the exact prompt I give my cronjob:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are my Morning Briefing Agent. Your job is simple: help me start the day with clarity instead of chaos.

Every morning at 8:15 AM Bucharest time, run this routine:

1. Pull my open tasks from Asana using the asana API or CLI
2. Sort them by: due date (overdue first), priority, project
3. Identify my top 3 highest-impact tasks for today
4. Flag anything overdue or due today
5. Format everything into a 2-minute briefing

The briefing structure:
---
TODAY'S BRIEFING — [Date]

TODAY:
- [Meetings/deadlines from tasks]

TOP 3 TASKS:
1. [Highest impact task] — [Project] — Due: [Date]
2. ...
3. ...

ALERTS:
- [Overdue items]
- [Items due today]

ONE DECISION:
What is the one task I should finish first to make everything else easier?
---

Keep each section to 2-3 bullets. No fluff. No summaries of summaries. Just the signal.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste that prompt directly into your Hermes chat, whether through Telegram or the TUI. Hermes handles the API calls, the sorting, and the formatting. Set a schedule with the cronjob tool, point the delivery at your chat or email, and run it once to verify the output.&lt;/p&gt;

&lt;p&gt;Here is what the actual output looks like on a typical morning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TODAY'S BRIEFING — April 22, 2026

TODAY:
- 10:00 AM — Client sync call (Project Alpha)
- 3:00 PM — Article draft deadline

TOP 3 TASKS:
1. Finish API integration for client proposal — Client Work — Due: April 22
2. Review pull request #25 — Open Source Project — Due: April 23
3. Update Substack draft for next week — Substack — Due: April 25

ALERTS:
- "Design mockups feedback" task overdue since April 20
- "Invoice March services" due today

ONE DECISION:
Finish the API integration before the 10 AM call so you have something concrete to discuss.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. Twenty minutes. One source. One clear read every morning.&lt;/p&gt;

&lt;p&gt;If you have read &lt;a href="https://vibestacklab.substack.com/p/hermes-is-the-ai-agent-openclaw-promised" rel="noopener noreferrer"&gt;Hermes Is the AI Agent OpenClaw Promised to Be&lt;/a&gt;, you know why the cron architecture matters. This briefing runs on the same backbone. If you have not set Hermes up yet, &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff&lt;/a&gt; walks through the full stack.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Know someone who checks six apps before breakfast? Forward this to them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://vibestacklab.substack.com/p/how-to-automate-your-morning-with?utm_source=substack&amp;amp;utm_medium=email&amp;amp;utm_content=share&amp;amp;action=share" rel="noopener noreferrer"&gt;Share&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 2: The Substack Digest Email
&lt;/h2&gt;

&lt;p&gt;The morning briefing covers your tasks. Your newsletters need a separate system. I run a second cronjob that checks Gmail for Substack article emails twice a day on weekdays, at 9:15 AM and 4:15 PM. It reads each article, summarizes it, and sends me a formatted digest email. This is not part of the Telegram morning briefing. It is a separate pipeline with a different output and a different schedule.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;a href="https://github.com/googleworkspace/cli" rel="noopener noreferrer"&gt;gws CLI&lt;/a&gt; tool for Google Workspace integration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Gmail account where Substack sends your digests&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;gws is a command-line tool that connects Hermes to Google Workspace, including Gmail. You install it once, authenticate with your Google account, and the cronjob gains read-only access to your inbox.&lt;/p&gt;

&lt;p&gt;Here is the cronjob prompt I use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Check Gmail for new Substack article emails and deliver a nicely formatted digest to your email, only covering emails received since the last check.

&lt;span class="gu"&gt;## Step 1: Check Gmail for new Substack articles&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Read the last run timestamp from a tracking file. If it doesn't exist, use 24 hours ago as the cutoff.
&lt;span class="p"&gt;2.&lt;/span&gt; List Substack emails: gws gmail +triage --query "from:&lt;span class="err"&gt;*&lt;/span&gt;@substack.com" --max 20
   Note: Substack emails arrive already marked as read, so do NOT use is:unread.
&lt;span class="p"&gt;3.&lt;/span&gt; For each message, read it with gws gmail +read --id &lt;span class="nt"&gt;&amp;lt;message_id&amp;gt;&lt;/span&gt; to get the email Date header. Compare against the last run timestamp. Skip anything received AT or BEFORE the last run time. Only process emails received AFTER.
&lt;span class="p"&gt;4.&lt;/span&gt; Filter out non-article emails (follower notifications, subscriber alerts, live video announcements).
&lt;span class="p"&gt;5.&lt;/span&gt; For each new article in the time window:
   a. Extract the article URL from the email body
   b. Fetch the full article via markdown.new: curl -sL "https://markdown.new/http://&lt;span class="nt"&gt;&amp;lt;article_url&amp;gt;&lt;/span&gt;"
   c. Generate a short 2-3 sentence summary
   d. Determine if FREE or PAID by checking for paywall text

&lt;span class="gu"&gt;## Step 2: Output Format&lt;/span&gt;

Compose a formatted email digest with this exact structure:

📬 Substack Digest — [Day, Month DD, YYYY]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🆓 Article Title
by Author Name
Summary: 2-3 sentences covering the core argument and why it matters
🔗 Article URL

🔒 Article Title (PAID — subscriber only)
by Author Name
Summary: 2-3 sentences covering the core argument and why it matters
🔗 Article URL

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Use 🆓 for free articles and 🔒 for paid ones. Keep it clean and scannable.

&lt;span class="gu"&gt;## Step 3: Deliver&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Send the digest via gws gmail +send to your email
&lt;span class="p"&gt;2.&lt;/span&gt; Write the current time to the tracking file for the next run

If no new articles found, respond with [SILENT].

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The digest runs twice daily because Substack articles arrive throughout the day. The morning catch covers overnight posts. The afternoon catch covers everything published during work hours. Each run only processes articles received since the last check, so you never see duplicates.&lt;/p&gt;

&lt;p&gt;Here is what a typical digest looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📬 Substack Digest — Tuesday, April 21, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🆓 Computer agents are going mainstream
by Jonas Braadbaart (The Circuit)
Summary: Examines the gap between AI adoption and actual agent deployment, arguing that individual operators are the ones closing the gap rather than enterprise teams. Practical look at how solo builders are using agents for real workflows.
🔗 https://thecircuit.substack.com/p/computer-agents-mainstream

🔒 Claude Managed Agents Review
by Creators AI
Summary: Testing managed agent workflows and comparing them to self-hosted alternatives. Covers setup complexity, cost trade-offs, and when managed services actually save time.
🔗 https://creatorsai.substack.com/p/managed-agents

🆓 Don Quixote and the Sorrowful Algorithm
by Farida Khalaf (Lights On)
Summary: Literary essay on AI narrative inevitability using Don Quixote as metaphor. Explores how algorithmic storytelling converges on predictable patterns despite different prompts.
🔗 https://lightson.substack.com/p/don-quixote-sorrowful-algorithm

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; You are giving your cronjob email access. Start with read-only permissions. Never give it send permissions until you have run this for a month and trust the output. The agent reads. You decide. That boundary matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3: Plug in Slack, Jira, GitHub, or Anything with an API
&lt;/h2&gt;

&lt;p&gt;The morning briefing can pull from more sources besides Asana. The same cronjob that checks your tasks can also query Slack, Jira, GitHub, or any service with an API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you can add:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Slack:&lt;/strong&gt; Unread DMs or mentions from specific channels&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Jira:&lt;/strong&gt; Tickets assigned to you, overdue sprints&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; PRs waiting for review, assigned issues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Notion:&lt;/strong&gt; Database items flagged for review&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Any API:&lt;/strong&gt; If it has an API, Hermes can query it&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to add a new source:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get an API token for the service.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the token to your Hermes project environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add one step to the cronjob prompt: “Pull my open items from [service].”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add a section to the briefing template.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test once, then schedule.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; More sources means more noise. I recommend adding one source per week and watching the signal-to-noise ratio. If the briefing gets longer than 30 seconds to read, trim something.&lt;/p&gt;

&lt;p&gt;The pattern stays consistent across every source. Token in environment. One new step in the prompt. One new section in the output. You do not need to rebuild the system. You just expand it.&lt;/p&gt;

&lt;p&gt;I run Asana for tasks and GitHub PRs for code review items. That is my sweet spot. Anything more and the briefing starts to feel like work before I have finished my coffee.&lt;/p&gt;

&lt;h2&gt;
  
  
  Information Without Action Is Just Noise
&lt;/h2&gt;

&lt;p&gt;The cronjob’s final instruction in every layer is the same: “At the end, identify the ONE decision or action that would make the rest of the day easier.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2buidf9nfvl2nl9g17am.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2buidf9nfvl2nl9g17am.png" alt="Image" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This prevents the briefing from becoming another scrollable feed. It forces me to think, not just consume.&lt;/p&gt;

&lt;p&gt;You read the briefing. You nod. You close it. You open your laptop and immediately forget what you just read. The information felt useful, but it did not change what you did next. The One Decision gate fixes this problem. Skip it and the briefing loses its purpose. The One Decision is the entire point.&lt;/p&gt;

&lt;p&gt;Some days the decision is obvious. “Finish the client proposal first so the deadline stops hanging over me.” Other days it is strategic. “Block two hours for deep work before opening Slack, or the day will get stolen.” Either way, I start with a clear intention instead of a reactive scan.&lt;/p&gt;

&lt;p&gt;Here are real One Decisions from my last week:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;“Reply to the contract email before noon so the other side does not stall waiting for us.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Merge the PR before the afternoon standup so the team can proceed with testing.”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Write the article outline first because the blank page anxiety blocks everything else.”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where the human-in-the-loop approach from article 005 on approval gates matters. The cronjob drafts. You approve. The system does not replace your judgment. It surfaces information so your judgment has something to work with.&lt;/p&gt;

&lt;p&gt;The briefing is a draft. You review it. The cronjob does not act without you.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Keep Your Briefing Brief
&lt;/h2&gt;

&lt;p&gt;Most people give up because the briefing becomes useless fast. Here are the three breakdowns I have seen, and how to fix each one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Too much noise&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Filter out newsletters, automated alerts, and low-priority senders. In Asana, use sections, tags, or due dates to surface only what matters. In email, filter by sender, not just unread status. If your briefing includes a GitHub notification about someone starring a repo you contributed to in 2022, your filters are too loose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Stale priorities&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Update your project memory bank weekly. Five minutes. Review and tweak the prompt monthly, or the briefing drifts into generic summaries. The tasks you cared about in January differ from the tasks you care about in April. Your briefing should reflect that shift.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Cronjob goes stale&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you stop reading the briefing, the cronjob will keep sending it. It becomes inbox clutter. Pause the schedule. Fix the output. Resume. Do not let automation become noise you ignore. The system only works if you trust it enough to read it.&lt;/p&gt;

&lt;p&gt;I review my briefing prompt on the first Monday of each month. Five minutes. I check whether the sections still match what I need, whether any source has gotten too noisy, and whether the One Decision question still forces useful answers. Those five minutes save me from a month of useless briefings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Less Cognitive Load Is the Real Payoff
&lt;/h2&gt;

&lt;p&gt;Time saved is a side effect. The actual benefit comes from removing decision fatigue.&lt;/p&gt;

&lt;p&gt;I no longer open email first thing. I start with my own priorities. The two hours come from fewer context switches, less reactive mode, and clearer first actions. Moving between working and getting the right things done requires that shift in how you begin the day.&lt;/p&gt;

&lt;p&gt;Before the briefing, my morning involved a series of small decisions about what to check next. After the briefing, my morning involves one decision about what to do first. Everything else follows from there.&lt;/p&gt;

&lt;p&gt;The system has rough edges. Some days the cronjob misses context. Some days the One Decision misses the mark. The results still beat the chaos I dealt with before, and every prompt tweak makes it sharper.&lt;/p&gt;

&lt;p&gt;The architecture behind this is straightforward. One cronjob. Four sections. One forced decision. Each source adds a single API call and a single template section. The complexity lives in the filters, not the infrastructure.&lt;/p&gt;




</description>
      <category>automation</category>
      <category>agents</category>
      <category>workflows</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Agentic Engineering Shift</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 14 Apr 2026 13:35:47 +0000</pubDate>
      <link>https://dev.to/cucoleadan/the-agentic-engineering-shift-4bac</link>
      <guid>https://dev.to/cucoleadan/the-agentic-engineering-shift-4bac</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/the-agentic-engineering-shift" rel="noopener noreferrer"&gt;The Agentic Engineering Shift&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The fastest way to build a product you cannot maintain is to let AI write all of it without asking questions.&lt;/p&gt;

&lt;p&gt;Sure, you will ship faster than anyone in the room. Yeah, the demo will work. But two weeks later, when a user reports a bug, you will open the file and scroll through functions you do not remember writing. Because you did not write them. The AI did. You accepted the output, tested the happy path, and moved on.&lt;/p&gt;

&lt;p&gt;Changing one line feels dangerous because you cannot trace what it touches. The code works, but you do not own it. You are maintaining a stranger's project inside your own repo.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdwq1xb046hfxxk4b9sj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdwq1xb046hfxxk4b9sj.png" alt="Vibe coding vs agentic engineering illustration showing the transition from AI output acceptance to structured oversight" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Andrej Karpathy gave this gap a name. In February 2025 he coined "vibe coding" to describe the practice of accepting AI output without scrutiny. A year later he followed up with "agentic engineering," the practice of orchestrating AI agents with oversight, structure, and human judgment at every step.&lt;/p&gt;

&lt;p&gt;The naming matters because thousands of builders recognized themselves in it. But a label does not tell you what to change. This post gives you a way to figure out where you sit on the spectrum between the two and which practices will move you forward.&lt;/p&gt;

&lt;p&gt;If you have read &lt;a href="https://vibestacklab.substack.com/p/how-to-architect-a-feature-in-5-minutes" rel="noopener noreferrer"&gt;How To Architect A Feature In 5 Minutes Before Talking To AI&lt;/a&gt;, you already know why thinking before prompting matters. This piece picks up where that one left off. Thinking before prompting is the starting habit. What follows is the full set of habits that separates builders who ship from builders who ship and survive.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;In this edition:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What Karpathy's two terms mean and where the conversation stops being useful&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A 4-stage maturity spectrum to locate where you are right now&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The 5 workflow practices that make the shift from vibe coding to agentic engineering concrete&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A 60-second diagnostic you can run on your last shipped feature tonight&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Karpathy Named
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi0wl9ta8ya81rrijs85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi0wl9ta8ya81rrijs85.png" alt="AI assistant code generation concept illustration showing a developer reviewing AI output" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vibe coding means giving an AI a prompt, getting code back, and shipping it without meaningful review. You trust the output because it runs. You move fast because the feedback loop feels instant.&lt;/p&gt;

&lt;p&gt;The AI writes a function, you glance at the result, and if nothing throws an error, it goes into the codebase.&lt;/p&gt;

&lt;p&gt;Agentic engineering means working with AI agents as part of a structured process. You provide context, define constraints, review outputs, run evaluations, and make the final call on what ships.&lt;/p&gt;

&lt;p&gt;The AI still does the heavy lifting, but YOU direct the work and own the result. Every piece of generated code passes through your judgment before it touches production. &lt;em&gt;And this goes beyond coding.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Both terms describe real behaviors that builders already practiced before the vocabulary existed. Karpathy gave the community a shared language for something people felt but struggled to articulate. The developer copying AI output straight into a feature branch at 2am was vibe coding long before anyone named it.&lt;/p&gt;

&lt;p&gt;The problem is that most of the conversation stopped at the labels. Forums and comment sections turned it into a binary: vibe coding bad, agentic engineering good. That framing misses the point.&lt;/p&gt;

&lt;p&gt;Nobody operates at one extreme all the time. A solo builder prototyping on a Saturday afternoon and a team shipping a payments feature to 10,000 users should not follow the same process. The real question is what specific behaviors separate one from the other, and how you move between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Spectrum
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqz9x76dt6iu28elkam8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqz9x76dt6iu28elkam8.png" alt="Four-stage maturity spectrum diagram from Prompt and Ship to Agent with Guardrails" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is not a pass/fail test. Builders sit at different points on a gradient, and the position shifts depending on the project, the deadline, and the stakes. The useful exercise is recognizing where you are right now based on what you do, not what you believe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Prompt and Ship&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You describe what you want. The AI writes it. If the output runs, you merge it. Testing means clicking through the feature once to confirm it loads.&lt;/p&gt;

&lt;p&gt;This works for throwaway prototypes and weekend experiments. The failure mode shows up later. Features become untouchable because you cannot predict what changing one piece will break.&lt;br&gt;
You open a file, see 200 lines of logic, and realize you have no idea why the AI structured it that way. Every revisit feels like defusing a bomb you did not build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2: Patch and Pray&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bugs appear. You fix them by prompting the AI again with the error message. The fix works, but you still do not understand the underlying structure. You know fragments of the codebase. You do not know the system.&lt;/p&gt;

&lt;p&gt;This is where most builders I see in forums and Reddit threads are sitting right now. The product shipped. Users showed up. And every maintenance task takes three times longer than it should because the codebase grew in directions nobody planned.&lt;/p&gt;

&lt;p&gt;The failure mode here is compounding. Each reactive patch adds weight. You fix the form validation bug and break the error toast. You fix the error toast and notice the loading state flickers. Confidence drops with every fix because you are never sure the patch did not break something else. The codebase starts to feel adversarial.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3: Review and Contain&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You start reading the AI's output before merging. You notice recurring patterns in what the AI gets wrong. You add checks. You push back on suggestions that feel over-engineered or unclear.&lt;/p&gt;

&lt;p&gt;At this stage, AI becomes a fast junior developer on your team rather than an oracle. You treat its output the way a senior developer treats a pull request from someone in their first year. You catch the unnecessary abstraction. You question why it created three helper functions when one would do.&lt;/p&gt;

&lt;p&gt;The failure mode is inconsistency. The review habit exists, but it drops off when you are tired, rushed, or excited about a feature. Friday afternoon, deadline looming, a new feature working on the first try. The temptation to merge without reviewing is strongest when the output looks clean. Process without discipline reverts to vibe coding under pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 4: Agent with Guardrails&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You work with explicit context documents, evaluation criteria, test expectations, and review gates. You can explain why every function exists and what conditions would break it. The AI still generates the code. You architect the system and verify the output.&lt;/p&gt;

&lt;p&gt;The failure mode even here is over-automation. Trusting the process so fully that you stop applying judgment to edge cases. The test suite passes, the evaluation loop looks green, and you ship without reading the diff. Process is a tool, not a replacement for thinking.&lt;/p&gt;

&lt;p&gt;Most builders reading this will recognize themselves somewhere in stages 2 or 3. That recognition is the starting point. The next section covers what moves you forward.&lt;/p&gt;

&lt;p&gt;If you have been through &lt;a href="https://vibestacklab.substack.com/p/the-build-vs-buy-scorecard" rel="noopener noreferrer"&gt;The Build vs Buy Scorecard&lt;/a&gt;, you already know the value of slowing down before making technical decisions. The same principle applies here. The spectrum rewards deliberate behavior over fast output.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practices
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcysacphc64bmnsbsltj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcysacphc64bmnsbsltj.png" alt="Five workflow practices illustration showing review gates, eval loops, test coverage, context architecture, and explain-it-back check" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The shift from vibe coding to agentic engineering is visible in workflow, not philosophy. These are five habits you can observe yourself doing, or not doing. Each one addresses a specific failure mode from the spectrum above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Review Gates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Treat every AI output like a pull request from a junior developer. Read the code before you merge it. Check whether the approach matches what you asked for. Look for unnecessary complexity, redundant calls, or logic you cannot follow.&lt;/p&gt;

&lt;p&gt;When you skip this: you inherit code you cannot reason about. The AI might add a caching layer you never asked for, or restructure your data flow in a way that makes sense in isolation but clashes with the rest of your system. The codebase grows in ways you did not choose, and every future change requires re-learning what the AI decided on your behalf.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Eval Loops&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Test more than "does it work once." Feed the AI's code edge cases, unexpected inputs, and failure scenarios. If you built a form handler, send it empty fields, duplicate submissions, and malformed data. Check what happens when the external API is slow or down.&lt;/p&gt;

&lt;p&gt;When you skip this: the AI passes the demo and fails the real world. You find the bugs in production instead of in development, and your users find them before you do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Test Coverage for Agent Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the agent wrote the code, someone needs to verify it holds up. Write tests for the critical paths. If you do not write tests yourself, at minimum run the feature through its failure modes manually before shipping.&lt;/p&gt;

&lt;p&gt;When you skip this: "works in dev" becomes "breaks in production." Maintenance turns into archaeology because you are digging through code with no map of what was supposed to happen. You can pair this with the approach in &lt;a href="https://vibestacklab.substack.com/p/how-to-prompt-ai-for-consistent-json" rel="noopener noreferrer"&gt;How to Prompt AI for Consistent JSON Responses&lt;/a&gt; to make sure the outputs you are testing against stay predictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Context Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The quality of AI output depends on the quality of your input. Before prompting, define what the feature needs to handle, what it connects to, and what constraints exist. Break the problem into scoped pieces. Give the agent acceptance criteria, not open-ended requests.&lt;/p&gt;

&lt;p&gt;When you skip this: the agent guesses the system you meant. It fills in gaps with assumptions pulled from training data, and those assumptions may not match your product, your users, or your stack. You ask for a notification system and get a full pub/sub architecture when all you needed was a database flag and a polling endpoint. This is where the &lt;a href="https://vibestacklab.substack.com/p/how-to-architect-a-feature-in-5-minutes" rel="noopener noreferrer"&gt;5-minute architecture sketch&lt;/a&gt; pays off the most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The Explain-It-Back Check&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before shipping any AI-generated code, explain what it does in your own words. Walk through the logic, the data flow, and the failure path. If you hit a function you cannot explain, that is the part that will break first in production.&lt;/p&gt;

&lt;p&gt;When you skip this: ownership never transfers back to you. The code ships under your name, but the understanding stays with the model that generated it. When a user reports a bug at 11pm, you will stare at the function and have no starting point for debugging it. You become a passenger in your own project.&lt;/p&gt;

&lt;p&gt;None of these practices make AI infallible. AI will still produce flawed output, miss edge cases, and make assumptions you did not ask for. These practices make your relationship to that output honest. You stop hoping the code is correct and start knowing where to look when it is not. The AI handles generation. You handle judgment. That division of labor is the entire point.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 60-Second Test
&lt;/h2&gt;

&lt;p&gt;Open the last feature you shipped with AI assistance. Pick any file from that feature and read through it.&lt;/p&gt;

&lt;p&gt;For each function, check three things. Whether you can explain what it does without reading it line by line. Whether you know what happens when it fails. Whether there is a test covering the critical path.&lt;/p&gt;

&lt;p&gt;If you breeze through all three, pick a second file. Keep going until you hit the wall. Most builders find it faster than they expect.&lt;/p&gt;

&lt;p&gt;The first question you cannot answer marks your starting point on the spectrum. That is the exact spot where your next upgrade begins. You do not need to fix everything at once. Pick one practice from the list above that addresses the failure mode you are sitting in, and apply it to the next feature you build.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>coding</category>
      <category>workflow</category>
      <category>codequality</category>
    </item>
    <item>
      <title>The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff</title>
      <dc:creator>cucoleadan</dc:creator>
      <pubDate>Tue, 07 Apr 2026 13:32:00 +0000</pubDate>
      <link>https://dev.to/cucoleadan/the-30-hermes-stack-that-makes-claude-max-look-like-a-ripoff-363k</link>
      <guid>https://dev.to/cucoleadan/the-30-hermes-stack-that-makes-claude-max-look-like-a-ripoff-363k</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my Substack publication as &lt;a href="https://vibestacklab.substack.com/p/the-30-hermes-stack-that-makes-claude" rel="noopener noreferrer"&gt;The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I was paying $200 for Claude Max and still hitting limits mid-project. I'd open Hermes to write some code, summarize an article or set some reminders for later use. It was fast and way better than OpenClaw, but something was keeping me from going all-in.&lt;/p&gt;

&lt;p&gt;Then I spent the full week to figure out how to configure it properly.&lt;/p&gt;

&lt;p&gt;Now Hermes remembers everything across sessions, manages my projects, syncs files instantly across devices, and handles complex workflows while I'm asleep. It went from a tool I use to a teammate that works independently.&lt;/p&gt;

&lt;p&gt;Here's exactly how to do the same.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yek3wo7x9t03t55aya5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yek3wo7x9t03t55aya5.png" alt="Hermes agent dashboard showing configuration interface with provider settings"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  In This Article:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The two AI providers worth using right now (Fire Pass vs OpenCode Go) so Hermes never chokes mid-session.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The four tools that changed how I use Hermes (GitHub CLI, Telegram gateway, Brave Search, Skills) and the workflows they make possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fixing persistent memory with Honcho, replacing bloated Nextcloud with lean WebDAV, and wiring up Asana so your work doesn't disappear.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The 30-day plan to full turbo mode, plus why you should never expose port 8642.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Open your Hermes setup right now and count how many of these you have configured. A fast AI provider with unlimited or high-limit access. Cross-session memory that works reliably. Integration with your project management tool. File sync that doesn't make you wait 30 seconds. Skills that automate repetitive workflows. Remote access from your phone.&lt;/p&gt;

&lt;p&gt;Most people have one or two. Maybe three if they're motivated.&lt;/p&gt;

&lt;p&gt;The setup looks intimidating, so people skip most of it. I did the same thing for weeks.&lt;/p&gt;

&lt;p&gt;But each of these capabilities compounds on the others. Fast AI lets you iterate. Memory means you stop repeating yourself every session. Project integration means tasks get tracked automatically. File sync means your notes show up everywhere. Skills mean the boring stuff runs without you.&lt;/p&gt;

&lt;p&gt;Once all of them are running together, the experience changes. Hermes stops feeling like a chatbot you type into and starts feeling like someone who knows your work, remembers your preferences, and handles things without being asked twice.&lt;/p&gt;

&lt;p&gt;That shift is what the rest of this article builds toward.&lt;/p&gt;

&lt;p&gt;Hermes is only as good as the AI powering it. Pick the wrong provider and you'll hit rate limits at the worst moment or watch tokens drain your budget faster than expected.&lt;/p&gt;

&lt;p&gt;I tested several. Two stood out.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0sey26qoxr4f7wh1nqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0sey26qoxr4f7wh1nqg.png" alt="Fireworks Fire Pass vs OpenCode Go pricing comparison chart showing $7/week versus $5-10/month"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fireworks Fire Pass&lt;/strong&gt; costs $7 per week (about $30/month), first week free. You get unlimited access to Kimi K2.5 Turbo at roughly 393 tokens per second. That's one of the fastest inference speeds available anywhere right now.&lt;/p&gt;

&lt;p&gt;The catch: it's Kimi K2.5 only. No model variety, no backup if Kimi goes down. But for coding, reasoning, and long documents, Kimi handles all of it well. And at 393 t/s, even long outputs feel instant.&lt;/p&gt;

&lt;p&gt;Kimi K2.5 Turbo runs on a 1 trillion parameter MoE architecture with 32 billion active per forward pass. The "Turbo" label means the same weights served on optimized infrastructure, with a 256k context window and strong agentic tool use. When I'm in the middle of a long coding session and need fast iteration, this is what I reach for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenCode Go&lt;/strong&gt; costs $5 the first month, then $10. Instead of one fast model, you get six with generous request limits: MiniMax M2.7, MiniMax M2.5, MiMo-V2-Omni, GLM-5, Kimi K2.5, and MiMo-V2-Pro.&lt;/p&gt;

&lt;p&gt;MiniMax M2.7 is the standout. Released March 2026, it scores 50 on the Artificial Analysis Intelligence Index, matching GLM-5 at roughly one-third the cost.&lt;/p&gt;

&lt;p&gt;My recommendation: start with Fire Pass if you want simplicity and speed. Switch to OpenCode Go when you find yourself wanting to test alternatives or when you're doing bulk work where MiniMax M2.7's cost advantage matters.&lt;/p&gt;

&lt;p&gt;Both work with Hermes out of the box. Set your API key in &lt;code&gt;~/.hermes/.env&lt;/code&gt; and you're running.&lt;/p&gt;

&lt;p&gt;Hermes has built-in memory, but it's session-scoped by default. Close the terminal, lose the context. Fine for one-off tasks. Useless for ongoing work.&lt;/p&gt;

&lt;p&gt;I noticed this on day three. I'd spend twenty minutes bringing Hermes up to speed on a project we'd discussed the day before. The conversations were gone. Every morning felt like onboarding a new hire.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Honcho&lt;/strong&gt; fixed this. It's an open-source memory library that gives Hermes persistent cross-session context. The team describes it as a "peer paradigm" where both you and the agent build a relationship over time. In practice, it stores facts about you, your projects, your preferences. Every new session starts with that context already loaded. No re-explaining your stack, your location, or your goals.&lt;/p&gt;

&lt;p&gt;Setting it up locally took me longer than I expected. Docker Compose, deriver logs, token limit errors. I spent hours watching the deriver fail with "Observation content exceeds maximum token limit of 8192" when synthesizing my imported memory files. The raw search worked fine, but the AI-synthesized peer cards kept failing on large imports.&lt;/p&gt;

&lt;p&gt;Here's the honest breakdown. The raw memory retrieval is solid. Honcho stores entries and retrieves them instantly. The AI synthesis layer, the part that builds distilled user profiles, chokes on large imports. Use raw search for now.&lt;/p&gt;

&lt;p&gt;On April 3, 2026, Hermes introduced the Pluggable Memory Provider Interface. Memory is now an extensible plugin system where third-party backends register through a provider ABC. This changes things.&lt;/p&gt;

&lt;p&gt;The providers available today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Honcho&lt;/strong&gt;, the reference implementation with AI-native cross-session modeling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hindsight (vectorize.io)&lt;/strong&gt;, a purpose-built plugin hitting 91.4% accuracy on LongMemEval&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mem0&lt;/strong&gt;, widely adopted but cloud-focused&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Letta (formerly MemGPT)&lt;/strong&gt;, a full agent platform with tiered memory&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zep/Graphiti&lt;/strong&gt;, temporal knowledge graphs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenViking, Holographic, RetainDB, ByteRover&lt;/strong&gt;, community alternatives&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm exploring building a custom solution. Full local control, token-efficient storage, direct Hermes integration without middleware, and a deriver that doesn't choke on context limits. The pluggable interface makes this possible now.&lt;/p&gt;

&lt;p&gt;For immediate setup: run &lt;code&gt;hermes memory setup&lt;/code&gt; and select Honcho. It works well enough for raw search. Expect synthesis to improve, or plan to swap providers as the ecosystem matures.&lt;/p&gt;

&lt;p&gt;Once you have fast AI and working memory, the next layer is the tooling that makes Hermes useful beyond chat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl49fxq46q3kj5vearivb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl49fxq46q3kj5vearivb.png" alt="Hermes tool integration diagram showing GitHub CLI, Telegram gateway, Brave Search, and Skills connections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub CLI&lt;/strong&gt; was the first thing I set up. Install &lt;code&gt;gh&lt;/code&gt;, authenticate once, and you have commits, pushes, and PR management without leaving the terminal. This became the foundation for everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Telegram integration&lt;/strong&gt; is what made the whole setup click for me. Run &lt;code&gt;hermes gateway telegram setup&lt;/code&gt; once and you have a direct line to your agent from anywhere. I use this constantly. Someone messages me about a website change while I'm out. I send a Telegram command to Hermes. It pulls the repo, edits the file, commits with "[via Telegram]" in the message, pushes. Vercel auto-deploys. I never opened a laptop.&lt;/p&gt;

&lt;p&gt;That workflow alone justified the entire Hermes setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Brave Search&lt;/strong&gt; is the research tool I reach for most. Built into Hermes via MCP, it finds emails, hiring managers, technical documentation, competitive intelligence. The queries I run regularly: &lt;code&gt;"company" "hiring manager" email&lt;/code&gt;, &lt;code&gt;"competitor" pricing 2026&lt;/code&gt;, &lt;code&gt;"technology" benchmark performance&lt;/code&gt;. For contract work research, nothing comes close.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills&lt;/strong&gt; are reusable instruction packages that teach your agent to perform specific tasks consistently. I have one for deploying to production, another for writing article briefs, another for analyzing codebases. Install with &lt;code&gt;npx skills add&lt;/code&gt; from sources like Vercel Labs or LobeHub.&lt;/p&gt;

&lt;p&gt;One thing to watch out for: the skills CLI doesn't fully recognize Hermes yet. It defaults to &lt;code&gt;.openclaw/&lt;/code&gt; directories. Import manually to &lt;code&gt;~/.hermes/skills/&lt;/code&gt; instead. And skills have full system access, so only install from sources you trust.&lt;/p&gt;

&lt;p&gt;Sometimes Hermes struggles with system-level operations. Editing system files, installing packages that need root permissions. The sandboxing gets in the way.&lt;/p&gt;

&lt;p&gt;I keep &lt;strong&gt;OpenCode&lt;/strong&gt; running on my VPS root for these situations. Quick system tweak, I use OpenCode. Complex multi-step workflow, I switch to Hermes. Two tools, each in the environment where it performs best.&lt;/p&gt;

&lt;p&gt;All this capability falls apart if you lose track of what needs doing. I use &lt;strong&gt;Asana&lt;/strong&gt; because it integrates cleanly and the free tier handles personal projects.&lt;/p&gt;

&lt;p&gt;The setup: Python &lt;code&gt;asana&lt;/code&gt; package in a dedicated virtual environment, CLI wrapper at &lt;code&gt;/usr/local/bin/asana-api&lt;/code&gt;, token in &lt;code&gt;~/.asana_env&lt;/code&gt; sourced by &lt;code&gt;.bashrc&lt;/code&gt;. My main project is called "Hermes project" and Hermes remembers the GID, auto-linking tasks to conversations.&lt;/p&gt;

&lt;p&gt;During a session I'll say "create an Asana task to research Hindsight memory provider." Hermes creates it, tags it with the session ID, and I pick it up later from anywhere. The task lives in one place regardless of which device or gateway I used to create it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linear&lt;/strong&gt; works well too if you prefer GraphQL. &lt;strong&gt;Notion&lt;/strong&gt; databases are popular. The tool matters less than the habit: one source of truth your agent reads and writes to.&lt;/p&gt;

&lt;p&gt;If you're syncing Obsidian with Nextcloud right now, you already know the pain. Thirty seconds to sync two hundred small files. File locking issues during rapid changes. A database-backed architecture adding overhead you never asked for.&lt;/p&gt;

&lt;p&gt;I ran Nextcloud for months. It worked. But every sync felt like watching paint dry.&lt;/p&gt;

&lt;p&gt;The fix: &lt;strong&gt;WebDAV server&lt;/strong&gt; + &lt;strong&gt;Filebrowser&lt;/strong&gt; + &lt;strong&gt;Obsidian LiveSync&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;WebDAV is purpose-built for file sync. No database layer, direct file operations, lightweight protocol. Filebrowser adds a web UI for browser access when you need it. Together they're roughly ten times faster than Nextcloud for the same job.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;webdav&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bytemark/webdav&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/var/lib/dav&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AUTH_TYPE=***&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;USERNAME=youruser&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PASSWORD=***&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:80"&lt;/span&gt;

  &lt;span class="na"&gt;filebrowser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;filebrowser/filebrowser&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/srv&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./filebrowser.db:/database.db&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8081:80"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Obsidian, install the Remotely Save plugin, point it at your WebDAV endpoint, set a 30-second sync interval. Done. Files created by Hermes appear instantly in your notes. Briefs, articles, research, everything syncs across devices without the Nextcloud overhead.&lt;/p&gt;

&lt;p&gt;If you're running Nextcloud for Obsidian sync only, this one change saves you hours of waiting per month.&lt;/p&gt;

&lt;p&gt;Hermes runs as an API server on port 8642. Other tools connect to it. IDE extensions in VS Code, Zed, JetBrains. Custom tools that send tasks. Multi-agent systems where one Hermes serves multiple clients. Webhooks from external services.&lt;/p&gt;

&lt;p&gt;The v0.7.0 ACP (Agent Client Protocol) integration means editors register their own MCP servers and Hermes automatically discovers them as tools. Full slash command support in your IDE, powered by your configured Hermes instance.&lt;/p&gt;

&lt;p&gt;This sounds great until you think about what you're exposing.&lt;/p&gt;

&lt;p&gt;Hermes has full system access. Terminal, file system, API keys, everything. Exposing port 8642 exposes all of that. Any client connecting executes arbitrary commands. And there's no built-in authentication in the base setup.&lt;/p&gt;

&lt;p&gt;I learned this the hard way when I briefly opened the port to test an integration from my phone. It worked, but I realized anyone on my network had the same access I did. Shut it down within the hour.&lt;/p&gt;

&lt;p&gt;If you need to expose Hermes, put a reverse proxy in front of it. Cloudflare Access or Authelia work well. Restrict to local network when possible. Use token-based auth with short expiry. Enable approval mode so every action requires manual confirmation. Never expose raw port 8642 to the internet.&lt;/p&gt;

&lt;p&gt;The safer approach: use Telegram or Discord gateways for remote access. They have built-in platform authentication. Run separate Hermes instances per project with limited scopes. Use Docker sandbox for anything untrusted.&lt;/p&gt;

&lt;p&gt;The API mode is the most capable part of the Hermes stack, and the easiest way to accidentally give the internet shell access to your systems.&lt;/p&gt;

&lt;p&gt;You don't need to do all of this in one weekend. Here's the order that worked for me.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6hft6j9tv3eeft1db72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6hft6j9tv3eeft1db72.png" alt="30-day Hermes setup roadmap showing Week 1 through Week 4 milestones"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 1&lt;/strong&gt; is the foundation. Pick your provider, Fire Pass for unlimited Kimi or OpenCode Go for variety. Set up your API keys in &lt;code&gt;~/.hermes/.env&lt;/code&gt;. Configure GitHub CLI with &lt;code&gt;gh auth login&lt;/code&gt;. Run a few test conversations to make sure everything connects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 2&lt;/strong&gt; is memory. Set up Honcho locally with Docker Compose. Run &lt;code&gt;hermes memory setup&lt;/code&gt; and select Honcho. Verify that raw memory search returns results. Import existing context from past conversations. By the end of this week, Hermes should remember who you are when you open a new session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 3&lt;/strong&gt; is tooling. Configure the Telegram gateway. Install 3-5 essential skills, manually importing to &lt;code&gt;~/.hermes/skills/&lt;/code&gt;. Set up your Asana CLI integration (or Linear, or Notion). Test Brave Search with a few research queries. This is the week where Hermes starts feeling useful beyond basic chat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 4&lt;/strong&gt; is sync. Deploy the WebDAV + Filebrowser stack. Configure Obsidian's Remotely Save plugin. Migrate your notes from whatever slow setup you're running now. Verify that files sync instantly across all your devices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After month one&lt;/strong&gt;, experiment with API mode locally. Explore alternative memory providers. Build internal tools that call Hermes, with proper authentication in front of everything.&lt;/p&gt;

&lt;p&gt;Each week builds on the last. By the end you'll have something that remembers everything, works while you sleep, and syncs across every device you own.&lt;/p&gt;

&lt;p&gt;That's the setup. Everything before it is first gear.&lt;/p&gt;

</description>
      <category>hermes</category>
      <category>agents</category>
      <category>setup</category>
      <category>tools</category>
    </item>
  </channel>
</rss>
