DEV Community: cucoleadan

My Hermes AI Agent Maintenance Routine For Maximum Reliability

cucoleadan — Tue, 26 May 2026 12:59:38 +0000

This post was originally published on my Substack publication as My Hermes AI Agent Maintenance Routine For Maximum Reliability.

Last week, I spent a few days blaming the model before I realized Hermes was waiting on a memory recall timeout.

When the response time got worse, I assumed provider latency because I'd changed models before and knew that layer could get noisy.

The real problem sat one layer earlier, inside the retrieval path I hadn't checked yet.

My external memory provider, Hindsight, threw a retrieval error, Hermes retried, and the request stalled because the memory system was broken before the model ever had a chance to answer.

A few days later, my Friday Hermes health-summary job missed its Telegram report over a long weekend. The stack still answered messages, but the missing report told me the scheduled workflow had stopped producing the artifact I expected to see.

Hermes maintenance means checking the layers around the model before you blame the model. The routine I use now is a set of cron-backed prompts that check memory, gateways, scheduled jobs, model IDs, and backups, then stop before they make changes that need approval.

Most install guides skip this part because they get you to the first successful command, then leave you with a working AI control plane and no maintenance loop around it.

Hermes feels like one system when it works, but it routes through models, memory, gateways, skills, cron jobs, provider keys, and local files, so the fault can sit in any one of those layers when the stack starts behaving strangely.

And don't get me wrong, I've never had a single issue with the actual Hermes code compared with my time using OpenClaw, but I have had issues with models, providers, and third-party integrations.

The model gets blamed first because it's the visible part of the stack, while the failure usually starts somewhere less obvious.

This article is the maintenance routine I use now, rewritten as prompts you can hand to your agent.

In this article:

A maintenance routine you can run after Hermes is installed, so silent drift doesn't turn into a broken workflow.
Copy-paste cron-job prompts for daily, weekly, and monthly checks across memory, gateways, scheduled jobs, providers, and backups.
A simple approval rule that lets agents report problems without giving them permission to delete, update, rotate, restore, or rewrite anything.
A rollout path for turning maintenance into useful visibility instead of another noisy automation.

After Hermes Install

The first successful Hermes run can trick you into treating setup as finished before operations have even started. You connect a provider, configure the gateway, test memory, send a message through Telegram or the TUI, and watch Hermes answer with context from the project you care about.

That moment is where the stack leaves the install guide and becomes something you have to run. Old configs can keep stale model names, scheduled jobs can miss their expected output, memory calls can slow down, and backups can look comforting until the first restore test fails.

I treat those failures as normal infrastructure behavior because a control plane becomes trustworthy only after you can see whether its dependencies are still healthy.

That lesson showed up during my OpenClaw to Hermes migration, even though the migration itself went smoothly. The first week felt better because Hermes followed instructions more closely, kept memory behavior cleaner, and made the gateway setup feel less stitched together.

The first problems were small enough to ignore in the moment but specific enough to matter later. An imported publishing skill failed because its YAML header was malformed, one environment variable was missing from the runtime, and token usage climbed while memory ingestion ran behind the workflow I was paying attention to.

None of those problems killed the setup, but each one pointed at the same operational truth: the model is only one layer inside a wider system. I stopped treating maintenance as an occasional chore once I realized a scheduled prompt could check those layers before the next failure stole an afternoon.

Cron Prompts Beat Commands

The earlier version of this routine had shell commands sprinkled through the article because that was how I checked my own server. Commands are useful when your environment matches mine, but they don't travel cleanly across Windows, Linux, Docker, hosted runners, local agents, and the custom glue every serious stack accumulates over time.

The official Hermes Agent docs are where I would start for setup details. This piece starts after setup, when the question changes from "Can Hermes run?" to "Can I trust this workflow tomorrow?"

Prompts travel better because they describe the job instead of assuming the tool. A cron-backed agent can inspect logs, check timestamps, call a gateway, read a config file, compare recent output, or ask for approval using the tools available inside its own environment.

If the maintenance prompt needs to reach outside Hermes, the same decision from When to Use MCPs, CLIs, or Your Own Tool applies here: use the smallest interface that can inspect the system cleanly without turning one check into a brittle integration project.

A scheduled prompt still needs firm boundaries because a useful maintenance job names the layer being checked and asks for evidence before it reports confidence. The report should be readable at a glance, but the agent should refuse to delete, update, rotate, restore, or rewrite anything without approval.

That boundary turns maintenance automation into a reporting system instead of a new source of damage. I want the agent to notice problems before I do while every irreversible action still comes back to me as a decision.

Three Maintenance Layers

My Hermes maintenance routine uses three layers that map cleanly to the way the stack fails: updates, cleanup, and health checks. Those labels keep the job concrete enough for a scheduled agent to report on the system without turning the prompt into a vague request to "check Hermes."

This is the operational side of the $30 Hermes stack, because a cheaper and more flexible agent setup only stays useful if the layers around it keep working.

The update layer asks whether something changed underneath the workflow while I was focused on using it. Providers rename models, preview routes become stale, plugins move, skills change formats, and memory backends update their APIs.

The cleanup layer asks whether the stack has accumulated enough junk to start changing behavior. Logs grow, sessions pile up, cached files stick around, and memory keeps old context long after the project has moved on.

The health-check layer answers the operational question before I start relying on the stack again. Before the workday starts, I want evidence that the gateway answers, the provider route works, scheduled jobs are producing output, and memory can retrieve a recent decision without timing out.

The layers keep the routine small enough to survive a busy week without reducing the review to a shallow status ping. Maintenance disappears when it depends on a vague intention, while a scheduled job with named layers can keep running after the calendar gets crowded.

Daily Hermes Health Check Prompt

The daily job should be boring enough that you can read it every morning without turning the start of the day into a debugging session. Its job is to tell you whether the stack is ready for work, then stop before it tries to repair anything.

Use this as a read-only cron job near the start of the workday, then adapt the gateway name, job names, and project references to match your own setup.

Create a Hermes cron job called "Daily Stack Pulse" that runs every morning at 8:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal and web. Use this exact prompt body for the job:

---
Run a daily read-only Hermes stack pulse check. Make no changes: do not delete files, rotate keys, update packages, prune memory, restore backups, or rewrite configuration.

1. Gateway. Send or simulate one normal request through the Telegram gateway and confirm it responds.
2. Scheduled workflows. Run `hermes cron list` and inspect ~/.hermes/cron/output/ for the latest runs of jobs tagged or named for morning briefing, health summary, memory maintenance, publishing, client, or paid workflows. Confirm each ran inside its expected window.
3. Logs. Scan recent warnings and errors from the Hermes runner (~/.hermes/logs/), the model provider, the memory layer (hindsight), the gateway, and the scheduler.
4. Memory recall. Run one hindsight_recall query against an active project decision (use "All Agents Considered newsletter" or "Vibe Stack Lab library repo"). Report whether the result was relevant, stale, missing, or slow.

Return a short report with exactly these sections, one sentence per item:

PASS:
Healthy checks with evidence.

WARN:
Items needing attention later, with the layer named in parentheses.

FAIL:
Broken or missing items that block reliance on the stack today.

APPROVAL NEEDED:
Any action that would delete, update, rotate, restore, rewrite, prune, or change provider behavior. Name the action and layer. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.

The report matters more than the scheduler that happens to run it, as long as the result gives you enough evidence to trust or question the stack. You can run the prompt from cron, a recurring Hermes task, a hosted automation, a CI runner, or any agent runner that has permission to inspect the stack.

I care most about evidence that the gateway answered, the important jobs ran, memory recall still works, and recent errors haven't turned into a pattern. Once the report names the failed layer, the next step becomes smaller because the investigation has a place to start.

Weekly AI Agent Drift Review Prompt

My quiet cron failure is the reason I care more about weekly drift than a one-time setup checklist. A job definition sitting in a scheduler proved nothing once the Friday health-summary report stopped reaching Telegram.

That is the same reason my morning Hermes workflow checks visible output instead of trusting that a scheduled task exists somewhere in a config file.

The weekly review looks for slow changes that don't announce themselves while normal work still appears to be moving. Disk pressure, stale output, growing logs, slow memory, and old model IDs rarely feel urgent while they are accumulating, but they become expensive once they pile up inside a broken workflow.

Use this prompt near the end of the week, when the report can shape a short maintenance pass instead of interrupting deep work in the middle of a day.

Create a Hermes cron job called "Weekly Drift Review" that runs every Sunday at 9:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal and web. Use this exact prompt body for the job:

---
Run a weekly read-only Hermes drift review. Make no changes. If a fix is obvious, list it under RECOMMENDED ACTIONS or APPROVAL NEEDED but do not execute.

1. Storage growth. Measure size of ~/.hermes/logs/, ~/.hermes/sessions/, ~/.hermes/cache/, ~/.hermes/memory/, ~/.hermes/cron/output/, /tmp/hermes*, and any backup folder under ~/.hermes/. Compare to last week if a snapshot exists at ~/.hermes/cron/output/drift-snapshot.json. Save a fresh snapshot at that path after measuring. Flag any folder that grew more than 25 percent or crossed 1GB.

2. Scheduled jobs. Run `hermes cron list`. For each job, confirm it exists, has run inside its expected window, and produced a visible artifact in ~/.hermes/cron/output/ or the delivery channel. A job definition with no recent run counts as broken.

3. Memory recall. Run three hindsight_recall queries: one active project ("All Agents Considered newsletter"), one older project ("Build It #2 AI Code Review Agent"), one recent decision ("Vibe Stack Lab library repo"). Report each as accurate, stale, empty, or slow.

4. Provider and model config. Read ~/.hermes/config.yaml. Flag preview or dated model names (anything with -preview, -beta, dated suffixes, or matching known-deprecated IDs), fallback routes pointing at old IDs, and project-level overrides under ~/.hermes/profiles/*/config.yaml that diverge from the main config without obvious reason.

5. Logs. Scan the last 7 days of ~/.hermes/logs/ for repeated errors, retry loops, auth failures, timeouts, and missing-env-var messages. Group by layer (runner, provider, memory, gateway, scheduler).

Return a report with exactly these sections:

DRIFT:
Storage growth and configuration drift observed this week.

BROKEN:
Jobs, routes, providers, memory calls, or gateways that failed and need repair. Name the layer.

STALE:
Model IDs, project configs, skills, outputs, or memory entries that look outdated.

RECOMMENDED ACTIONS:
Small proposed fixes. For each: action, risk (low/med/high), expected benefit, approval needed (yes/no).

APPROVAL NEEDED:
Anything that changes files, deletes data, updates Hermes, rotates keys, changes providers, prunes memory, restores backups, or edits scheduled jobs. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.

That weekly prompt would have caught my quiet cron failure earlier because a cron entry sitting in a file doesn't prove the workflow is alive. The agent has to find the last run, the last output, or the last expected message before it claims the job is healthy.

The same weekly review helps with memory issues because recall drift often feels like model weakness from the outside. When retrieval returns stale or empty context, the report should call that a memory-layer problem before anyone starts blaming generation quality.

Monthly Hermes Assumptions Review Prompt

The monthly job checks whether the assumptions under the stack still hold after weeks of normal use. Provider behavior, model IDs, permissions, backups, and release notes deserve a slower review because mistakes in those layers can create bigger messes than a missed daily report.

Run this one when you have enough time to read the report and decide what should change, because the monthly review is the one most likely to recommend actions that touch live state.

Create a Hermes cron job called "Monthly Assumptions Review" that runs on the 1st of every month at 10:00 local time, delivers to origin, and uses a cheap model (gemini-3.1-flash-lite via openrouter, or deepseek-v4-flash via opencode-go — pick whichever is configured). Restrict toolsets to terminal, web, and file. Use this exact prompt body for the job:

---
Run a monthly read-only Hermes assumptions review. Make no changes: do not update Hermes, change providers, rotate keys, restore backups, prune memory, delete files, rewrite configs, or edit scheduled jobs.

1. External change summary. Check for changes that could affect this stack in the last ~30 days:
   - Hermes Agent: `cd ~/.hermes/hermes-agent && git log --since="30 days ago" --oneline` and check release notes
   - Plugins and skills: list anything in ~/.hermes/plugins/ and ~/.hermes/skills/ modified in the last 30 days
   - Provider changes: scan OpenRouter and opencode-go model lists for renamed, deprecated, or newly preview-flagged IDs that match anything in ~/.hermes/config.yaml
   - Gateway, memory backend (hindsight), scheduler, and backup tool changelogs if accessible
   Summarize only changes relevant to this stack.

2. Provider and model ID audit. Grep every config layer for model IDs:
   - Main: ~/.hermes/config.yaml
   - Profiles: ~/.hermes/profiles/*/config.yaml
   - Cron jobs: ~/.hermes/cron/jobs.json
   - Skills referencing models: search_files for "model:" or model IDs under ~/.hermes/skills/
   - Scripts under ~/.hermes/scripts/
   - Env files: ~/.hermes/.env and any *.env
   Flag preview IDs (-preview, -beta, dated suffixes), known-deprecated IDs, missing fallbacks, and defaults that conflict between layers.

3. Health sweep. Quick check across:
   - Gateway response (one Telegram round-trip)
   - Provider reachability (one ping each to configured providers)
   - Memory recall (hindsight_recall on an active project)
   - Scheduler activity (hermes cron list plus recent output)
   - Storage headroom (df -h on ~/.hermes/ partition)
   - Backup completion (most recent backup artifact timestamp and size)
   - Key availability (env vars and 1Password references exist, not the values)
   - Permissions (~/.hermes/ ownership and mode)

4. Restore test. Pick one non-sensitive backup artifact under ~/.hermes/backups/ or wherever backups land. Copy to /tmp/hermes-restore-test/, inspect contents, confirm it opens and matches expectations. Do not overwrite live files. Delete the temp copy after inspection.

5. Approval-gate review. List every workflow (cron job, skill, plugin, script) that can delete files, prune memory, rotate keys, change providers, restore backups, update Hermes, edit configs, or send messages outside this workspace. For each, confirm whether it requires explicit approval or runs automatically.

Return a report with exactly these sections:

ASSUMPTIONS STILL VALID:
Operational assumptions that still look safe.

ASSUMPTIONS TO RECHECK:
Provider, memory, gateway, scheduler, backup, or permission assumptions that may have drifted. Name the layer.

RESTORE TEST:
Artifact inspected, safe location used, and result.

PROPOSED CHANGES:
Each with reason, risk (low/med/high), rollback notes, approval status.

APPROVAL NEEDED:
Every action that would modify the stack or touch live data. Name the action and layer. Do not execute.
---

After creating the job, run it once immediately so we can see the first report, then confirm the job ID and schedule.

I review provider model IDs here instead of waiting for a stale preview route to break under load. A fallback route in an old project config can keep calling yesterday's model even after the main Hermes provider has moved to the stable ID.

The Hindsight timeout became confusing because the symptom pointed at the wrong layer. Hermes felt slow, I blamed the model, and the retrieval path had already burned the time before generation started.

Approval Gate for Maintenance Jobs

Every scheduled maintenance job should carry the same approval rule because the boundary gets easy to forget after the first few reports look useful. Read-only inspection can run freely, while destructive or identity-changing work still needs a human decision.

If you haven't built that habit yet, start with the approval gate setup before you let a maintenance prompt touch files, providers, keys, or backups.

Add this block to the end of every maintenance prompt that runs on a schedule, especially if the agent has access to files, keys, backups, provider settings, or outbound channels.

Approval rule for this maintenance job:

You may observe, inspect, summarize, classify, and recommend without asking first.

You must ask for approval before any action that deletes files, prunes memory, rotates keys, changes providers, restores backups, updates Hermes, edits configuration, changes scheduled jobs, rewrites prompts, sends external messages, or changes permissions.

When approval is needed, return a proposal with the issue, suggested action, expected benefit, risk level, affected files or systems, rollback notes, and the exact command or tool call you want to run.

If the risk is unclear, classify the action as approval needed and wait.

That rule keeps the maintenance agent useful without letting it become a cleanup bot with too much confidence. The agent can prepare the decision, but I still want to make the decision when live state changes.

Quiet Agent Failures

The failures that cost time are small enough to miss and specific enough to blame on the wrong thing. My cron failure didn't crash the stack because it stopped doing work in a corner I wasn't watching.

The model ID drift behaved differently because the main provider setup looked current while an older route still pointed somewhere stale. The visible symptom showed up as slower Hermes responses and memory behavior that looked worse than it was.

The Hindsight timeout changed how I diagnose agent slowness in every workflow that depends on memory. When an AI tool slows down, I check the retrieval chain before I blame the model because the model may be downstream from the delay.

Maintenance doesn't prevent every failure, but it reduces the time spent accusing the wrong layer. Once you can name whether the issue sits in routing, memory, scheduling, storage, backup, skills, or config, the repair becomes less mysterious.

How to Roll Out the Routine

I would start with one weekly maintenance job before adding daily and monthly jobs. Weekly reporting is frequent enough to catch drift, and a month of reports gives you enough signal to decide whether the daily pulse is worth the extra noise.

Once the weekly report proves useful, add the daily pulse for the pieces you depend on most. My daily set covers gateway response, scheduled job output, memory recall, and provider reachability because those failures change whether I can trust the stack that morning.

The monthly review should stay slower and more deliberate because updates, provider IDs, backup restores, and permission gates need more attention than a quick morning report can give them.

Your stack may use different names, but the shape should stay the same. The scheduled agent observes the stack, reports the failed layer, proposes small actions, and stops before touching anything that could create real damage.

Failure Limits

Maintenance won't make the stack perfect, and the prompts shouldn't pretend they can. Provider outages, weak retrieval, bad project context, poor model fit, and bad release notes can still turn into manual work.

The routine also leaves approval gates in place for every action that changes live state. If Hermes wants to prune memory, change providers, delete logs, rotate keys, restore a backup, or update itself, I still want to approve that action before it touches anything real.

That boundary keeps the routine useful because the agent can notice problems before I do, while every action that changes the system comes back as a proposal I can read.

Bottom Line

Hermes feels like one system when it's working, but underneath it's a control plane sitting on top of models, memory, gateways, cron jobs, files, skills, providers, and backups. When one layer drifts, the whole experience gets worse even if the visible symptom looks like a slow model or a lazy agent.

The maintenance loop keeps those layers visible through a daily pulse, a weekly drift review, and a monthly assumptions review. For most personal agent stacks, that rhythm is enough to know where to look when something breaks.

Start with the weekly prompt and run it long enough to see whether the reports change your behavior. If the reports help you catch missed jobs, stale model IDs, slow memory, or backup gaps, add the daily pulse and monthly review around the same approval rule.

The install guide gets Hermes running, and the maintenance loop is what keeps it worth trusting.

This post was originally published on my Substack publication as My Hermes AI Agent Maintenance Routine For Maximum Reliability.

I Tested 6 AI Plans to Find What $5, $10 and $20 Get You

cucoleadan — Tue, 19 May 2026 14:10:55 +0000

This post was originally published on my Substack publication as I Tested 6 AI Plans to Find What $5, $10 and $20 Get You.

A little while ago, I built a multi-step workflow in Hermes to generate a ten-page report that would get stronger each time it passed through the document. It checked the latest news, then read through Reddit threads, then cross-checked with X and also read through a bunch of internal documents.

For most of the run, it worked the way I wanted, and Hermes kept moving the file forward while pulling in the context it needed and holding onto the thread of the job.

By the time it reached the last stage, somewhere around the fourteenth tool call, it already had the material it needed and only had to stay coherent long enough to verify the details and write the final section cleanly into the file.

Then it just stopped in the middle of the edit. It retried enough times to trigger a context reduction right when the report needed the fullest possible view of everything that had already happened. The fact that I had to step back in and rebuild the whole thread was extremely annoying and the reason why I decided to write this article.

That was also the moment I started focusing on reliability rather than judging AI plans by the model menu.

Pricing pages encourage you to compare plans by the names they advertise, but Hermes forces a more practical question, which is whether a plan can carry real work through a messy session without handing it back to you halfway through.

Once I started looking at plans that way, I cared a lot less about whether a subscription included a famous model and a lot more about whether Hermes could finish the work before my own attention became the most expensive part of the workflow.

I have paid for enough AI accounts to know how misleading a low sticker price can be. A five-dollar plan stops feeling cheap the moment it burns an hour of focused work.

Not to mention that most twenty-dollar plans might feel like they come with extra usage compared to their cheap alternatives, but that is not usually the case. Looking at you, Anthropic.

That's the frame for this piece, because I rechecked the official pricing pages on May 19, 2026, and I want to show you these prices through an AI agent lens rather than focusing on their sales copy.

In this article:

Why model names and benchmark scores are the wrong way to judge an AI plan
How one $5 plan became my daily driver after I fixed my routing
Why the $10 tier is where most plans start to make real sense
What the big brand names ($20 tier) actually limit once you push them
Where plans break mid-session and how cost per useful hour flips the math
The exact stack I would buy today and which plans I would skip

The One Test That Picks Winners

Benchmarks tell you how a model performs in isolation, but Hermes shows you something much harder to fake, which is whether a plan stays useful once the session fills with tool calls, file reads, and the usual clutter that comes with trying to finish real work.

My test now feels much simpler than any leaderboard, because all I really have to do is give Hermes one job from a normal week and watch how much of my own attention it gives back to me by the end.

If Hermes gets to a result I can keep, the plan earns its place. If the session breaks, the model loses the thread, or I have to step back in for cleanup, the plan gets more expensive no matter how cheap the subscription looked when I bought it.

$5: Where Most People Get It Wrong

The five-dollar tier starts with OpenCode Go, and it stands out immediately as it's the only subscription I found that gives you a real first month instead of a throwaway trial.

Right now, OpenCode Go is $5 for the first month and $10 after that, and it works in Hermes by default, which matters because it feels like a provider route built for agents instead of a chat plan stretched into agent work after the fact.

What changed my view of this plan is that it did not stay a cheap side route for long. It became my daily driver, even during the stretch when I was still paying for three subscriptions just to keep up with my usage.

At the time, the real problem was not the plan itself but the way I was using it, because I kept pushing the same model through every kind of Hermes task and expecting it to behave well no matter what the work looked like.

For a while I ran Qwen 3.6 Plus for almost everything, and that worked badly enough that I ended up compensating with more subscriptions instead of better routing.

The setup only started to make sense once I matched the model to the job, with DeepSeek V4 Flash and V4 Pro taking most of the regular Hermes work while Gemini 3.1 Flash Lite via OpenRouter handled image analysis more cleanly than the routes I had been forcing before.

OpenCode Go became much more useful once I stopped treating one model like a universal answer and started treating the plan like a routing layer for different kinds of work.

I still think the five-dollar month is the right place to learn this lesson, since it is cheap enough to experiment with and real enough to show you very quickly whether your workflow is efficient or just patched together.

$10: The Real Starting Line

The $10 tier is where most of these plans start to feel normal, since the $5 and sub-$5 options are mostly gone now outside of special promos.

That is also the first tier I would take seriously for regular Hermes use.

After the first month, OpenCode Go lands here at its regular price, and MiniMax Token Plan Starter shows up at the same $10 with 1,500 M2.7 requests every 5 hours.

On paper, that sounds like a clean comparison. In practice, I care much less about the headline limits and much more about what the workflow feels like once Hermes is doing the work.

MiniMax Starter gives you a dedicated M2.7 bucket, which is useful if you already know that model is good enough for most of your week and you want limits that are easy to reason about.

OpenCode Go works differently, since it gives you a shared routing budget across several model families, and that can look better or worse depending on what kind of week you're having.

If you mostly run MiniMax M2.7 through Go, the published estimates are higher at around 3,400 M2.7 requests every 5 hours for the same monthly price, so it can look cheaper than MiniMax Starter on raw throughput alone.

Still, that is not what would decide it for me.

I would judge the whole tier by loop quality more than by the model list or benchmarks. Sometimes I hit 503 errors on Qwen 3.6 Plus through OpenCode Go, and other times the tokens per second I got through Go were clearly better than what I was getting from MiniMax directly. And I absolutely hate it to wait for AI to answer. I'd rather have a faster model than a smarter model, but that's just personal preference.

What matters most to me is whether it keeps moving after the first answer, uses tools cleanly, and keeps its replies short enough that the session stays readable while the work is still in progress.

$20: Brands You Know, Limits You Don't

The $20 tier is where the familiar companies start showing up.

OpenAI and Anthropic are the obvious ones, because they are the subscriptions most people already know. Ollama belongs in the same conversation for a different reason, as it's one of the few open-model companies that already feels big enough to sell a hosted plan without sounding like a side project.

That matters because this tier is not only about extra usage. It is also about how much trust people attach to the company behind the plan, and whether that trust survives contact with the actual limits.

ChatGPT Plus is the default benchmark. OpenAI lists Plus at $20 per month, says it gives higher GPT-5.5 limits inside ChatGPT, and keeps API usage separate from the subscription.

You can count Plus in the real stack because Hermes supports OpenAI Codex through ChatGPT OAuth, but the plan still buys ChatGPT access rather than API credit. The limit story is also less generous than the branding makes it feel. OpenAI says Plus users can send up to 160 GPT-5.5 messages every 3 hours, and manual GPT-5.5 Thinking has a weekly limit of up to 3,000 messages. That is fine for normal chat use. It starts looking smaller once you lean on it harder.

Claude Pro has the same advantage and the same problem. Anthropic is a big enough name that people do not need much convincing to try the plan, and Claude is useful enough that plenty of people will keep paying for it anyway. The issue is that the limits are nowhere close to generous for heavy use.

It's just easy to run into the ceiling faster than the $20 price tag suggests, especially once you lean on Sonnet for real work.

Ollama Cloud Pro is more interesting to me because it is not trying to be ChatGPT or Claude. Ollama lists Pro at $20 per month or $200 per year, with larger cloud models, 50x more cloud usage than Free, and three concurrent cloud models.

That sounds strong until you compare how the limit story is presented next to OpenCode Go. OpenCode Go tells you the five-hour, weekly, and monthly caps directly, including a monthly ceiling of $60. Ollama tells you usage is mostly GPU time, gives you five-hour and weekly resets, and lets you run three cloud models at once, but it does not spell out a monthly limit on the pricing page. That makes the plan harder to reason about.

The three-model ceiling also matters more in Hermes than it would in a normal chat app. If you mostly run one agent at a time, it probably feels fine. If you like concurrent agents, background runs, or separate research and writing loops happening together, three can start feeling smaller than the headline suggests.

So yes, Ollama Pro looks good. It is just not automatically better than Go once you care about legibility, concurrency, and what the plan looks like over a full month instead of over a good afternoon.

Nous Portal Plus is less mainstream than OpenAI, Anthropic, or Ollama, but it still deserves the slot because it fits Hermes more naturally than most of the bigger brands. Nous lists Plus at $20 per month with 300+ models, hosted tool usage, and $22 in monthly credits with rollover. I felt that I should include this because they are the team who created Hermes after all.

MiniMax Token Plan Plus is still the simplest volume play. MiniMax lists Plus at $20 per month with 4,500 M2.7 requests every 5 hours plus speech and image quotas. If M2.7 already works for your Hermes load, that is a very direct way to buy more room.

Those are not the same thing, and the difference only shows up once Hermes starts leaning on the plan instead of just chatting through it.

Where Plans Hit the Wall

Hermes exposes plan limits in the middle of real work instead of at the edge of a chat.

A chat cap is annoying when you are asking questions. The same cap inside Hermes can land in the middle of a file edit, a research loop, or a tool run that was finally starting to cohere. Then you lose more than a reply. You lose the state of the job and pay for it again in the next session.

Fallback models create a quieter version of the same mess. A session starts on one route and ends on another, and you can feel it even before you check the model picker. Instruction following gets softer. The agent stops being careful with the same tool path it was following ten minutes earlier.

Tool use is still the cleanest divider for me. A model can sound impressive in a chat window and still be weak inside an agent loop. If it avoids reading files, skips verification, or acts allergic to tools, I do not care how good the brand or benchmark looks. The less glamorous route that checks its work often finishes more jobs per dollar.

Memory changes the value of a plan too. Hermes only starts to feel useful once it can carry a project forward across sessions. If the provider leaves you with a morning reset, the agent never really joins the work. It just keeps reintroducing itself.

That is also why the OpenClaw to Hermes migration mattered so much to me. I was not looking for a smarter chat app. I wanted something that could keep the work moving without making me rebuild the thread every time.

Latency has its own cost. A slow model is fine for overnight cleanup or background chores. It gets expensive the moment you are thinking with the agent in real time and waiting for the next useful move.

The Only Math That Matters

The metric I keep coming back to is cost per useful Hermes hour.

I like it because it is boring enough to be honest.

cost per useful hour = monthly plan cost / Hermes hours that ended in usable work

If a $5 plan gives you ten clean background hours, it is excellent.

If that same plan burns one focused afternoon because Hermes stalls in the fragile part of the job, the cheap price was fake.

A $20 plan can still be the cheaper one if it finishes the sessions you would otherwise have to rescue.

I would not build a dashboard for this. One line in your notes after each session is enough. Write down the plan, the job, and whether Hermes finished without babysitting.

After a week, the pattern usually gets obvious. OpenCode Go might end up doing the background work. MiniMax might carry more of the daily load than you expected. Nous might keep its place because the tool gateway removes setup friction. Ollama might stay as the open-model cloud route. ChatGPT and Claude might remain in the stack because they are still where you think best before sending the work back into Hermes.

That is enough to make the decision. The goal is to stop paying for subscriptions without knowing what job each one is there to do.

Here Is What I Would Buy

If I were rebuilding this stack today, I would still start with OpenCode Go and give it the boring work first.

That is the cheapest place to learn whether the workflow is efficient or just being propped up by extra subscriptions.

I would keep fragile sessions away from it until it earned trust. Cleanup, first-pass research, low-risk drafts, and the kind of work that is useful when it lands but not painful if it misfires.

Once the first month ended, I would treat the $10 tier like the real test. OpenCode Go at full price and MiniMax Starter both deserve a normal week before I let a $20 brand into the stack on reputation alone.

After that, I would only pay for a $20 plan if I knew exactly why it was there. ChatGPT Plus belongs if the ChatGPT or Codex lane matters enough to keep. Claude Pro belongs if Claude is still where the best writing or dev work happens, even with the limits. Nous sits closest to native Hermes work. Ollama Pro belongs if I want the open-model cloud lane and can live with the three-model ceiling. MiniMax Plus is the straightforward volume upgrade if M2.7 is already carrying real work.

That is less satisfying than picking one winner. It is also closer to how the work behaves.

Different jobs deserve different routes. Background chores do not need the same plan as the sessions where one bad restart can waste half an afternoon.

Bottom Line

The cheapest AI plan is the one that gives Hermes work you would keep.

A $5 route is great when it clears background noise. A $10 route is where I would test daily Hermes usage. A $20 route only earns its place when it gives you something the cheaper paths do not, whether that is better fit, clearer limits, or a route you trust enough to use for harder work.

The wrong plan steals focus at any price.

Before you buy another subscription, look at your last ten Hermes sessions. Mark the ones that ended in usable work. Mark the ones you had to rescue. Then ask which plan helped the work move forward and which one only looked cheap on the invoice.

That becomes the buying decision.

I would rather pay for one route that finishes the work than keep juggling three subscriptions that still need me to manage them.

Source Notes

OpenCode Go lists the $5 first month and the $10 monthly price after that. The page also covers any-agent use and current request allowances.

ChatGPT Plus lists $20 per month, app-level Plus benefits, and the note that API usage is billed separately. OpenAI API pricing lists GPT-5.5 and GPT-5.4 token pricing outside ChatGPT subscriptions.

Ollama Cloud pricing lists Pro at $20 per month or $200 per year. The same page covers three concurrent cloud models and usage measurement based mainly on GPU time.

Nous Portal lists Plus at $20 per month with 300+ models and hosted tool usage. It also lists the $22 monthly credits and rollover rules.

Claude Pro lists Pro usage behavior and resets, while Anthropic API pricing lists Claude API prices separately from Pro.

MiniMax Token Plan lists Starter at $10 per month with 1500 M2.7 requests per 5 hours and Plus at $20 per month with 4500 M2.7 requests per 5 hours.

Hermes AI Providers lists the relevant provider paths for Nous Portal and OpenAI Codex. It also covers OpenCode Go and Anthropic.

When to Use MCPs, CLIs, or Your Own Tool

cucoleadan — Tue, 12 May 2026 13:17:53 +0000

This post was originally published on my Substack publication as When to Use MCPs, CLIs, or Your Own Tool.

A while back, I wanted my AI agent to help manage my Asana tasks. Like anyone following the current agent meta, my first instinct was to plug in an Asana MCP server. Of course, this either flat-out broke or took an eternity to load a single task because the agent was trying to digest a massive, complicated integration.

Frustrated, I ripped the MCP out and installed a lightweight Asana CLI instead. It took a little bit of setup, but it worked. I took it one step further and created a custom skill teaching my agent exactly how to trigger those specific CLI commands. Checking my tasks went from a sluggish, bloated mess to happening instantly. I detailed this setup in my morning automation guide.

That experience explains why the default advice in agent-land right now, to connect every integration you can find and sort it out later, is a trap.

I get why people do it. Plug-and-play tools are everywhere right now. Every week another company ships one, another app exposes itself to AI, and another setup thread turns into a shopping list. An agent with more tools feels more capable the same way a dashboard with more widgets feels more complete.

The friction starts soon after. You notice the agent taking longer to think because it's trying to juggle too many complex instructions at once. Simple tasks start driving up your token costs. One tool fails with a timeout, another dumps a wall of messy data when you only needed a single sentence, and eventually, you lose track of what your own setup can do on its own.

In my Hermes setup, I rely on three distinct patterns. A lightweight GitHub CLI handles my repository work because it's fast and focused. The Brave Search MCP handles broad web research. My custom OpenCode Cowork Proxy Worker exists because neither an off-the-shelf integration nor a basic command line was the right fit for routing Claude through OpenCode models.

There's a fine line between over-integrating and building everything yourself. I touched on this balance in my build vs buy scorecard. How do you know which type of tool fits which job? Read on to see how to decide.

TL;DR: When deciding how to connect your AI agent:

Use CLIs for local, internal tasks where speed matters and you own the credentials.

Use MCPs to cross boundaries into external SaaS systems where structured data and secure auth are required.

Build Custom Wrappers when you need translation, formatting, or a narrower interface than what off-the-shelf tools provide.

In this edition:

Why MCP vs CLI is the wrong argument
When a CLI is the better interface for Hermes
When an MCP server earns its place
When your own small tool beats both
The 60-second test I use before adding a new tool

MCP vs CLI: Asking a Better Question

Most MCP vs CLI arguments sound cleaner than the real problem. People talk about protocols, tokens, and elegance. When you are in the middle of actual work you are usually trying to answer a simpler question. You want to know the least messy way to let your agent do this one job.

An MCP server gives an AI app a standard way to discover and call external tools. It exposes actions, inputs, and outputs in a format the model-facing app understands. Anthropic introduced MCP in November 2024 as an open standard for connecting AI assistants to data sources, business tools, content repositories, and developer environments.

A CLI gives the agent the same command-line tool a human developer would use. Think git, gh, docker, kubectl, wrangler, gws, or a tiny script you wrote for your own stack. The model writes commands, reads stdout or stderr, and adjusts from there.

Both let an agent act, but they package control differently. MCP gives the agent a typed menu of actions with structured inputs. CLI gives the agent a terminal surface with familiar commands and visible output.

The filter I use is simpler than the debate. Look at where the work happens, who owns the data, and what breaks when the agent gets it wrong. Use a CLI when the agent works as you inside your own workspace. Use an MCP server when the agent needs structured access to external systems or authenticated data. Build your own tool when MCP is too broad, CLI is too loose, or the workflow needs a narrow bridge between two systems.

The custom option matters more than people admit because many agent problems are shape problems, not model problems. A full protocol server or open shell gives the workflow too much room to drift. One small action with the right inputs, rejection rules, and clean output often fits better.

When to Start With CLI for Local AI Workflows

I start with CLI far more often than MCP. If Hermes works inside my own environment, on local files and repos, build commands, deployment checks, server diagnostics, GitHub tasks, or small scripts where I already know the command, the terminal is usually the right first stop.

A command-line tool has a structural advantage here. Most frontier models have years of examples for common command-line patterns. They know how git status behaves, how gh pr list --json returns fields, and how to trim output before the context window fills up.

Local work becomes easier to debug. When a CLI command fails, Hermes gets an exit code and an error message. I rerun the same command myself, copy it into a terminal, and see the failure without translating through a protocol layer.

Scalekit ran a useful benchmark on this in March 2026. They compared CLI, CLI plus skills, and GitHub's MCP server across 75 runs using the same model and the same GitHub tasks. In their test, CLI won on cost and reliability. CLI hit 100 percent reliability, MCP completed 72 percent of runs, and MCP used 4 to 32 times more tokens depending on the task.

I wouldn't stretch that benchmark into a universal law. It tells us something narrower and more actionable: schema weight is real. If the agent connects to a GitHub MCP server with dozens of available tools, it carries descriptions for actions it will never touch during a repo language lookup. A local gh command gives the answer with less ceremony.

StackOne makes the same split from an architecture angle. CLI fits local developer tools like Git, Docker, gh, Terraform, kubectl, and AWS CLI because these tools already have mature command cultures around them. The agent reuses patterns baked into the model and the docs instead of learning a strange new interface from scratch.

My Hermes setup leans on CLI for repo work. If I ask Hermes to clean up a branch, summarize open PRs, or check the status of a deploy, I want it using tools I run myself. I use gh for GitHub, wrangler for Cloudflare Workers, and gws for narrow Google Workspace experiments.

The trade-off is permission shape. Most CLI tools inherit local credentials. Hermes using gh after gh auth login acts with my GitHub access. That works for my own repo on my own machine, then breaks fast once a product needs to act across other users, accounts, or shared business systems.

One user on one machine inside one workspace is CLI territory. Many users across many accounts turns CLI into complex auth plumbing.

When to Reach For MCP At The Boundary of External Data

I don't reach for MCP first. I reach for it when the data lives somewhere else and I want Hermes to touch it without wandering around with raw shell access.

Search tools, shared SaaS systems, business databases, internal APIs, and third-party services need scoped auth, audit logs, and structured actions more than terminal speed.

A command-line tool assumes a person already logged in. That person owns the machine, the credentials, and the risk. An MCP server exposes a narrower set of actions to the agent with defined inputs and outputs. It gives the agent a tool boundary instead of raw shell access.

Anthropic's original MCP pitch makes sense through that lens. Every agent stack eventually hits the same wall: the model works well, but the data lives outside its reach. MCP gives AI systems a standard way to connect to those data sources without every app inventing its own format.

I use Brave Search via MCP because research is external, variable, and structured. I want Hermes calling a defined search action with a defined result format instead of guessing URLs or scraping pages with shell commands.

SaaS tools often fit the same pattern. If Hermes needs to read from Notion, Gmail, Linear, Slack, Greenhouse, or a database with scoped access, MCP is cleaner than a homemade CLI script. The farther the workflow moves from your own machine, the more identity matters.

Descope puts the identity question well: choose based on who the agent works for. If the agent acts as a solo developer inside their own workflow, CLI is enough. If the agent acts across customer data, employee accounts, partner systems, or shared business tools, auth becomes the primary concern.

At that point, you care about scopes, consent, logs, tenant boundaries, and revocation. One ambient shell token doing everything in the background becomes a liability, even if it feels faster during local tests.

Every MCP server still has to earn its place. A bloated server slows the agent down, a badly designed one returns excess data, and a broad action list hands the agent more control than the task needs. The best MCP servers feel boring: a small list of tools, clear input fields, tight output, and an auth model that matches the risk.

You want a clean tool drawer, not a giant toy box.

Build The Bridge Yourself

This sounds like extra work at first. Then you try to force a bad fit through MCP or CLI for hours and realize the small custom tool would've been the simpler path all along.

By small tool, I mean a tiny adapter, wrapper, Worker, script, webhook, or endpoint that does one job in the exact shape your workflow needs.

I used this lane for my OpenCode Cowork Proxy Worker. Claude Code speaks Anthropic's API format. OpenCode Go and Zen models mostly use OpenAI-compatible routes. I wanted Claude Code and Claude Cowork as the interface, with OpenCode as the model layer. A generic MCP server or raw CLI would've made the flow messier.

The workflow lacked translation, so I built a Cloudflare Worker that sits in the middle. Claude sends an Anthropic-style request. The Worker rewrites it for OpenCode. The response returns in the format Claude expects. That is a custom tool doing its job by removing ambiguity.

I wrote the full setup in How to Use Claude Code For Free With OpenCode Models. For this article, the decision matters more than the proxy details. When the workflow needs a translation layer, build the translation layer.

Safer wrappers around risky commands follow the same pattern. Say Hermes needs to deploy a project. One path gives it raw shell access and asks it to remember the right sequence. The better path gives it one command:

deploy-preview --project yahini

That command runs checks, prints the diff, refuses production deploys without a flag, and outputs a clear summary. Hermes gets one safe action instead of an open-ended terminal adventure.

Task tools work the same way. My Asana setup has three possible shapes: raw API calls, MCP, or a small wrapper:

asana-task create --project hermes --title "Research MCP auth tradeoffs" --due tomorrow

That wrapper hides the noisy parts. Hermes gets the project, title, and due date without carrying the project GID, JSON payload shape, or field rules in every prompt. The tool encodes the boring decisions once.

Custom tools pay off when the existing interface adds ambiguity. A translator fixes format mismatch, a filter trims redundant output, and a validator stops bad inputs before they reach the real system. The common thread is narrower access to the machinery underneath.

Approval gates fit on top of this lane. Your custom tool prepares a draft, validates inputs, or creates a preview. The final send, publish, delete, deploy, or purchase still pauses for review. I covered the safety layer in How to Add Approval Gates to Your Hermes Agent, and it pairs well with custom tools because the interface and approval rule solve different problems.

A custom tool gives Hermes the right action. An approval gate decides whether Hermes gets to complete it alone.

The Rule I Use

Before giving Hermes a new way to act, I sort the workflow into one of three lanes. The categories are plain enough to use while building, which matters more to me than making the taxonomy perfect.

Use CLI for local work. Choose CLI when the tool is mature, the docs are everywhere, the output is controllable, and Hermes is acting inside your own environment. Good fits include GitHub PR summaries through gh, Cloudflare Worker deploy checks through wrangler, local file operations, build commands, server diagnostics, and one-off scripts. If I would run the command myself in a terminal, and the worst mistake affects my own workspace, CLI is the starting point.

Use MCP for structured external systems. Choose MCP when Hermes needs a defined tool boundary, scoped auth, a remote data source, or runtime tool discovery. Search, Google Drive, Slack, Gmail, CRM data, ATS data, internal databases, and shared business tools fit this lane when permissions and structure matter. If the agent touches data outside your own local workspace, MCP deserves a look.

Build your own tool when the job is narrow. Choose a custom tool when the problem is translation, filtering, validation, or repeatability. API format translation, safer deploy wrappers, task helpers, memory update commands, webhook receivers, cronjob helpers, and scripts compressing risky command chains into one reviewed action fit here. If you keep writing long prompts to make the agent use a tool in the same careful way, that prompt wants to become a tool.

Concrete Hermes examples make the rule easier to apply. GitHub PR cleanup goes through CLI because gh is mature and easy to inspect. Competitive research goes through MCP because search needs structured external results. Morning briefings use connectors or MCP for sources like Gmail and calendars, then a prompt or custom formatter turns those inputs into the briefing structure I covered in my Hermes morning briefing article.

For higher-risk work, I mix lanes. Production deploys should use CLI wrapped in a custom command, plus an approval gate before production. Claude Code to OpenCode routing belongs in a custom Worker. Project memory updates from research should use a custom command or proposed-change format, then pause for review before permanent memory changes.

That last one matters because wrong memory is worse than no memory. If Hermes reads a weak article and updates project memory with a sloppy summary, I pay for that mistake later. A custom "propose memory update" tool is safer than letting the agent edit memory directly.

60-Second Tool Test

Before adding a new MCP server or writing a wrapper, run this test. It takes about a minute, and it saves an afternoon of cleanup.

1. Check for a mature CLI. If the tool has a strong CLI, structured output flags, and common examples in the docs, start there. The agent gets a smaller surface to reason through, and you get commands worth replaying.

2. Check whether the agent acts only as you. If Hermes works inside your own machine, repo, or server, CLI works well. Slow down when the workflow crosses into shared systems.

3. Check whether auth shape matters. MCP moves up the list when you need scopes, consent, tenant boundaries, or audit logs. Local credentials are convenient until the agent needs to act inside a shared business system.

4. Check whether the MCP is too broad. If a server exposes fifty actions and your workflow needs two, consider a custom wrapper or filtered gateway. A smaller interface beats a bigger config when the task has a narrow shape.

5. Check whether a small tool would remove repeated prompting. If your instruction keeps repeating the same safety rules and formatting rules, build a tool that enforces the shape. Repeated prompting points to an interface problem.

After the test, the answer usually sorts itself. CLI handles local work, MCP handles structured external systems, and your own tool handles narrow bridges, translations, and repeatable actions. Approval gates sit on top of all three when the action is expensive, destructive, external-facing, or hard to undo.

This is the shift I wrote about in The Agentic Engineering Shift. The work is moving from asking the model better to designing the system around the model better. Tool choice is part of that system.

More Control, Less Clutter

A well-designed agent stack earns trust when the agent knows what to do, shows what it did, and uses the smallest interface that fits the work. MCP hype tends to blur that distinction because installing another server feels like progress.

MCP is useful. CLI is underrated. Your own small tools save you from both when the workflow has a shape neither one matches.

Start by auditing one workflow. Pick the last task you gave Hermes that involved a tool and ask which lane it belonged in. If the task was local, try CLI. If it crossed into shared systems, look at MCP. If you kept explaining the same careful sequence over and over, build the tiny tool. Use an agent you trust because every interface has a reason to exist.

Learn how to use Claude Code for free with OpenCode Zen models by deploying a Cloudflare Worker proxy and configuring third-party inference.

cucoleadan — Fri, 08 May 2026 13:03:54 +0000

cucoleadan

May 8

How to Run Claude Code for Free with OpenCode Models

#ai #claude #opensource #cli

Comments

10 min read

How to Run Claude Code for Free with OpenCode Models

cucoleadan — Fri, 08 May 2026 13:03:06 +0000

This post was originally published on my Substack publication as How to Use Claude Code For Free With OpenCode Models.

Yes, you can use Claude Code for free by routing it through a small Cloudflare Worker and pointing that Worker at a free OpenCode Zen model like minimax-m2.5-free.

Claude stays the interface. OpenCode becomes the model layer.

That means you can keep the Claude experience people already like, skip Anthropic billing for low-stakes work, and only move to a paid provider lane when the task actually deserves it.

You see, at the end of March 2026, Anthropic shipped a Claude Code npm package with a source map inside it. That packaging mistake exposed a huge chunk of the Claude Code TypeScript source. Within hours, mirrors spread across GitHub. Some picked up thousands of stars and forks before Anthropic started sending takedowns.

Then the cleanup got messy. TechCrunch reported on April 1, 2026 that Anthropic's DMCA request hit about 8,100 GitHub repositories before the company narrowed the scope.

That told me two things.

First, developers wanted Claude Code badly enough to swarm the leaked source. Second, the demand for using Claude's interface with other provider lanes was already there.

All this landed at a weird time for me. I've already shifted a lot of my own coding time to Codex, mostly because GPT got close enough to Opus for my day-to-day work and the usage limits feel better. Hermes still handles my heavier recurring workflows and automations.

But I still liked Claude Code. And I still wanted to try Claude Cowork without paying the full Anthropic tax every time I wanted a polished coding session.

The problem was simple.

Claude speaks Anthropic. OpenCode Zen and OpenCode Go mostly speak OpenAI-compatible endpoints.

So I built a translator.

The OpenCode Cowork Proxy Worker lets Claude Code talk to OpenCode Go models and selected OpenCode Zen models. Claude keeps sending Anthropic-style requests, then the Worker translates them into the upstream format OpenCode expects.

No key storage. No message storage. Just a format bridge.

With that in place, you can start with free OpenCode Zen models like minimax-m2.5-free, then move to OpenCode Go's subscription lane when the work gets more demanding.

I made the switch easy on purpose. You need a Cloudflare account and an OpenCode account. Both can start free, and you only upgrade if the workflow becomes worth it.

Get the next proxy walkthrough before it eats your weekend. Subscribe to Vibe Stack Lab. I send practical AI workflows for builders who want control over the stack and fewer surprise bills.

Subscribe to Vibe Stack Lab

How to deploy the Worker in your Cloudflare account
How to configure Claude Desktop or Claude Cowork for third-party inference
Which free OpenCode Zen models to start with
When to use /zen and when to switch to /go
The first safe test to run before touching a real repo
The setup mistakes most likely to break this first

Deploy the Worker in Cloudflare First

Before Claude can use OpenCode, you need a gateway URL it can call.

Open the OpenCode Cowork Proxy Worker repo and click the Deploy to Cloudflare Workers button at the top. Cloudflare supports this one-click deploy flow directly for Workers projects, which is why this setup is fast to hand off.

Cloudflare walks you through the rest. When it finishes, copy your Worker URL.

At that point, your gateway is live.

Your deployed URL will look like your own Cloudflare Worker endpoint. In the examples below, I'll call it:

YOUR_DEPLOYED_WORKER_URL

Configure Claude Desktop to Use OpenCode Zen

Open Claude Desktop and go to the third-party inference setup.

If you're on Windows, go to:

Help > Troubleshooting > Enable Developer Mode

Claude will restart and expose a new menu:

Developer > Configure Third-Party Inference

Anthropic's current help docs for Claude Cowork's third-party setup use this same path, so you're not relying on a weird hidden hack here. You're using the intended setup UI.

For your first test, point Claude at OpenCode Zen with the free model minimax-m2.5-free:

Backend: Gateway
Gateway base URL: YOUR_DEPLOYED_WORKER_URL/zen
API key: your OpenCode API key
Auth scheme: x-api-key
Model: minimax-m2.5-free

Once that's done, make sure to add the model manually too:

minimax-m2.5-free

Click Apply locally. Fully quit Claude Desktop. Reopen it.

That's the basic path for using Claude Code with a free OpenCode model through your Worker.

Start with Free OpenCode Zen Models

Start with OpenCode Zen, not Go.

Zen is OpenCode's curated model gateway. Some Zen models are paid. Some are free for a limited time while model teams collect feedback.

Last updated: May 7, 2026

The current OpenCode Zen docs list these free models:

minimax-m2.5-free
ling-2.6-flash
hy3-preview-free
nemotron-3-super-free
big-pickle

Use this first:

minimax-m2.5-free

Your base URL should end with:

/zen

Your model field should be:

minimax-m2.5-free

Free means free while OpenCode is offering that model under a free period. It does not mean no account, no API key, or no caveats.

You still need an OpenCode API key.

And you should absolutely check the privacy notes before using free models with sensitive work. As of May 7, 2026, OpenCode's Zen docs say several free models may use collected data during the free period to improve the model. That includes minimax-m2.5-free. This is the exact opposite of the lane you want for sensitive code.

This is the test lane.

Use it for summaries, low-risk code review, documentation cleanup, and tiny file edits in a throwaway folder. Don't start by pointing it at your main repo with write access.

On my own first tests, the free Zen route handled summaries, low-risk reviews, and tiny file edits fine, but I switched to /go as soon as I wanted stronger reasoning over a larger repo.

I wrote about the bigger reason open models matter in Ditch Your Subscriptions and Run Open Source AI on Your Device. The short version is the same here: model choice gets more useful when your tools stop forcing the interface and the engine to stay married.

Send this to the friend who pays for overlapping AI plans and still hits limits. It might save them a month of model roulette.

Share this setup

Choose /zen for Free Models and /go for OpenCode Go

The proxy has two routes.

Use /zen for free models and Zen pay-as-you-go models:

YOUR_DEPLOYED_WORKER_URL/zen

Use /go for the monthly OpenCode Go subscription lane:

YOUR_DEPLOYED_WORKER_URL/go

If you want the fast mental model, use it like this:

/zen is the free test lane
/go is the stronger daily-work lane

As of May 7, 2026, the OpenCode Go docs list Go at $5 for the first month, then $10 per month.

The same docs currently list these usage limits:

5-hour limit: $12 of usage
Weekly limit: $30 of usage
Monthly limit: $60 of usage

Your actual request count depends on the model.

Cheaper models stretch much further. Heavier models burn the limit faster.

The important privacy distinction is this: OpenCode Go says its providers follow a zero-retention policy and do not use your data for model training. That makes it a much better fit for real coding work than the free-model lane. I would still avoid calling anything "complete privacy," but it is the safer route according to the current docs.

I covered OpenCode Go more broadly in The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff. For Hermes, Go gives you a cheaper provider lane. With this proxy, Go becomes useful from Claude Code too.

Why This Route Instead of OpenRouter or Ollama?

Because the point here is not just "find any cheaper provider."

The point is keeping Claude's interface and tool flow while swapping the model layer underneath it.

If you just want the fastest generic provider swap, OpenRouter is simpler.

If you want fully local inference, Ollama is a better answer.

If you specifically want Claude Code or Claude Cowork as the front end while OpenCode handles the models behind the scenes, this Worker route is the right tool.

That matters more than it sounds. A lot of people do not actually want a new interface. They just want a cheaper or more flexible inference lane behind the interface they already like.

If you want the broader comparison between Claude Cowork and other agent setups, I broke that down in OpenClaw vs Claude Cowork vs Perplexity Computer - Which AI Agent Actually Fits Your Life.

Test Claude Code Safely in a Throwaway Folder

Don't point this at your main repo first.

Create a throwaway folder:

claude-opencode-proxy-test

Add a file:

project-notes.md

Put fake project notes in it. No secrets. No client data.

Ask Claude Code:

Read project-notes.md.
Summarize the project in 10 bullets.
Create a second file called next-actions.md with a short implementation checklist.
Do not modify project-notes.md.

This checks whether routing and tool behavior work together. Claude has to create the new file from the notes without touching the original.

If that works, try a small code review:

Review this function for bugs.
Do not edit files yet.
Give me the risk list first.

I like that second test because it keeps the model away from edits until you see how it behaves.

After that, test one small tool-heavy task. Ask it to compare two files and create a short note. Keep the task boring.

You're testing routing and tool behavior, not the model's taste.

Free models are useful, but they need judgment. I wrote about that line between vibe coding and agentic engineering in The Agentic Engineering Shift.

Switch to OpenCode Go When the Free Lane Stops Being Worth It

OpenCode Go is one of the more transparent AI subscriptions out there because the limits are expressed in dollar value, not in a vague "come back later" chat cap.

Switch to /go when the free Zen models are too weak, too slow, too rate-limited, or too risky for the work.

That usually happens when one of these becomes true:

You want better reasoning over a bigger codebase.
You want fewer caveats around data usage.
You are doing enough coding work that a $10 lane is cheaper than burning a premium subscription elsewhere.

The nice part is that the setup barely changes. You keep Claude as the interface. You just swap the route and the model.

How This Also Works with Claude Cowork

This is the part I care about more than the free model itself.

People like Claude Code and Claude Cowork because the interface feels good to use, and nobody wants another subscription with fuzzy limits hanging over every small coding session.

Claude Cowork especially has the kind of product polish that makes people want to stay inside it. The project view feels clean, the tool activity is easy to follow, and the whole thing feels closer to an app than a pile of agents you have to babysit.

The annoying part is paying for the whole Anthropic route every time you want that app experience.

I can justify premium reasoning models when I'm asking for difficult architecture help or reviewing a risky change. I do not want to burn premium usage on every small housekeeping task.

That's why I built this proxy, and I want the compatibility point to be explicit: this route works with Claude Cowork too. You can keep Claude Cowork or Claude Code as the place where you work without needing Claude itself as the model route behind it.

The cheap path lets you keep the Claude app experience instead of forcing yourself into another interface.

You can start with a free OpenCode Zen model, then move to the $10 OpenCode Go lane when you want a stronger open model inside Claude Cowork or Claude Code.

I still like OpenCode. I still use Codex. Hermes is still where my serious recurring workflows live. The point is that Claude Cowork does not have to become another expensive subscription decision when OpenCode can provide the model layer for free or for far less.

If you want the shared-workflow version of that story, read OpenClaw or Claude Cowork? Here's How to Plug Both Into the Same Brain.

Use This 10-Minute Checklist to Get Started

Open the OpenCode Cowork Proxy Worker repo.
Click Deploy to Cloudflare Workers and install the Worker in your Cloudflare account.
Copy your deployed Worker URL.
Open Claude Desktop.
Enable Developer Mode, then open Configure Third-Party Inference.
Set the base URL to YOUR_DEPLOYED_WORKER_URL/zen.
Set auth scheme to x-api-key.
Paste your OpenCode API key.
Add minimax-m2.5-free manually.
Click Apply locally, fully quit Claude, then reopen it.
Run the throwaway-folder test.
Switch to YOUR_DEPLOYED_WORKER_URL/go and a Go model when you want the subscription lane.

FAQ

Can I use Claude Code for free?

Yes, but not in the default Anthropic-billed path this article is bypassing.

You use Claude Code for free here by routing its requests through your own Cloudflare Worker and pointing that Worker at a free OpenCode Zen model such as minimax-m2.5-free.

Is Claude Code in VS Code free?

Claude Code itself can be installed, but the model path behind it usually costs money unless you route it to a free provider lane.

This setup gives you one of those free lanes.

How do I get Claude Code credits for free?

You don't get Anthropic credits from this method.

You bypass Anthropic billing for these sessions by translating Claude's requests to a free OpenCode Zen model instead.

How do I use Claude Code free forever?

"Forever" is doing too much work in most of the videos and posts ranking for this topic.

You can use it free as long as a provider keeps offering a free model and the setup still works. That can change. That's why this article treats the free route as a useful lane, not a permanent law of nature.

External Sources Worth Checking

If you want the primary docs behind this setup, start here:

And if you want the leak story source rather than my summary, TechCrunch covered the takedown incident here:

Anthropic took down thousands of GitHub repos trying to yank its leaked source code

Grab the Worker, run the throwaway-folder test, and star the repo if it works for you. Stars tell me which Claude/OpenCode routes are worth maintaining next.

OpenCode Cowork Proxy Worker

How to Add Approval Gates to Your Hermes Agent

cucoleadan — Tue, 28 Apr 2026 13:29:46 +0000

This post was originally published on my Substack publication as How to Add Approval Gates to Your Hermes Agent.

Most people who try AI agents go through the same cycle. They set it up, give it access to everything, and watch it do impressive things for a week. Then something goes wrong, like a wrong message or a broken file, and they shut it down and go back to doing things manually.

The problem was skipping the safety net.

I went through that cycle twice. The first time, my agent sent an email to a client with the wrong name, and I mean a completely different person, not a typo. I found out when the client forwarded it back asking if I was working with someone else. I spent the next three weeks manually reviewing everything the agent touched. That burned more time than if I had done the work myself.

The second time, I set up gates before giving the agent access. Drafts and system changes came to me for review. Spending above a threshold required approval. I let it run, and nothing went wrong. The gates caught mistakes before they went live.

I’ll show you how to build approval gates into any Hermes workflow. Gate #1 takes 15 minutes and stops your agent from sending anything external without your OK. Gate #2 adds protection against unwanted system changes and puts dollar limits on spending. Start with the first one. Add the others when you’re ready.

If you’re new to Hermes itself, start with Hermes Is the AI Agent OpenClaw Promised to Be.

In this article:

What an approval gate is, and why it isn’t a roadblock
The three types of gates every AI workflow needs
Step-by-step setup for each gate, from beginner to advanced
A simple framework to decide what to gate and what to leave free
The three mistakes people make with approval gates, and how to avoid them

A Checkpoint Is Not a Roadblock

The word gate makes people think of barriers and delays. That’s the wrong mental model. An approval gate is more like a checkpoint at the end of an assembly line. The work happens at full speed. The checkpoint keeps defects from shipping.

Three patterns exist for keeping humans involved in AI workflows. Human-in-the-loop means the agent stops and asks you before taking an action. You review, you approve, the agent continues. Human-on-the-loop means the agent runs autonomously, but you can watch what it does and intervene if something looks wrong. Full autonomy means you set it up and never look at it again.

Shopify defaults to human-in-the-loop by design for anything that touches production systems. LangChain found that most organizations use approval checkpoints as their primary guardrail. The EU AI Act requires evidence of appropriate oversight for each AI system. These are standard practice.

The same principle works for solo operators. You need a simpler version.

The Three Gates You Need

Every AI workflow that touches the outside world or modifies your data needs at least one gate.

The Send Gate. Nothing goes external without your OK. Emails, social posts, client communications, any message that carries your name. Your agent drafts, delivers to you, and waits. You review and approve the send. That’s the gate most people need first.

The Change Gate. Nothing modifies your systems without your OK. File edits, database updates, configuration changes. Your agent identifies what needs to change, shows you the proposed change with context, and waits for confirmation.

The Spend Gate. Nothing costs money without your OK. Paid API calls above a threshold, tool purchases, subscription changes. Your agent estimates the cost before any paid action. Below your threshold, it proceeds automatically. Above it, it pauses and asks you.

Each gate protects something different: your reputation, your data, your wallet. You don’t need all three on day one. Start with the Send Gate.

Gate #1: The Send Gate (Start Here)

That’s the one that fixes the wrong-name-in-an-email problem. The setup takes about 15 minutes. You build a workflow where the agent drafts everything, but you control the final step.

The workflow has four steps:

Step 1. The agent drafts the content. An email, a social post, a client response, anything.

Step 2. The agent delivers the draft to you through chat, email, or a file. It doesn’t send it anywhere, just hands it to you for review.

Step 3. You review the draft and fix anything that needs fixing. Reply with your approval or your corrections.

Step 4. The agent sends or publishes the approved version. If you asked for changes, it revises and shows you the updated version.

In Hermes, the Send Gate works best as two separate pieces. The first is a standing rule in project memory. The second is the cronjob or task that runs under that rule. If project memory still feels abstract, my article on infinite memory explains why these standing rules matter.

Example 1: Save this in Hermes memory

Treat this as a starter template, not a fixed script. You may need to change the approval words, the delivery channel, or the types of content it covers based on how Hermes is set up in your project.

You are my Content Assistant. Your job is to draft content for review.

When I give you a content request (email, social post, client response):

1. Draft the content based on my instructions and the project brief.
2. Present the draft clearly labeled "DRAFT [FOR REVIEW]".
3. Don't send, publish, or share the content anywhere.
4. Wait for my approval or my requested changes.
5. If I request changes, apply them and present the revised draft.
6. Only when I explicitly say "approved" or "send it", take the final action.

Always include a brief note at the end explaining what you did and why.

Example 2: Use this as a Hermes cronjob

This works best when Hermes already has access to the inputs it needs, such as meeting notes, a calendar, or a project brief, and already knows where to send drafts back to you. You may need to change the schedule, the source it reads from, or the format of the output to fit your workflow.

Every weekday at 9:00 AM, review yesterday's meeting notes and draft any follow-up emails that need to be sent.

Present each email as "DRAFT [FOR REVIEW]".
Do not send anything automatically.
Wait for my approval before any email goes out.

If there are no follow-ups to draft, tell me that no action is needed today.

If you want to see cronjobs in action before you build this one, the Hermes morning briefing workflow shows a complete example.

Watch Out: If your agent has tool access that lets it send emails or post to social media directly, make sure the prompt overrides those tools. The approval step must be the only path to external action.

If you’re still wiring up Hermes tools, memory, and integrations, the Hermes setup guide covers the stack behind workflows like this.

Gate #2: The Change Gate (Level Up)

Once your Send Gate works, add protection against unwanted system changes. This gate matters when your agent interacts with files, databases, or any system where a bad edit breaks something real.

The agent identifies what needs to change: which record, which field, and what the new value should be. Vague requests like “update the database” fail this gate.

The agent shows you the proposed change with full context: current state, new state, why the change is needed, and what happens if the change goes wrong.

You approve or reject. If you reject, the change never happens. If you approve, the agent executes it and confirms the result.

The rollback plan is simple. If a change causes problems, you tell the agent to reverse it. Because the agent showed you what it wanted to change before doing it, it can undo the change on request.

Use this prompt for a research agent that updates your knowledge base:

When you find information that should update the project knowledge base:

1. Show me the proposed change with this format:
   - Current value: [what exists now]
   - Proposed value: [what you want to change it to]
   - Reason: [why this change is needed]
   - Source: [where you found this]

2. Wait for my approval before making any changes.
3. If I approve, make the change and confirm what was updated.

4. If I reject, don't make the change. Log the rejection in the project notes.

Never modify files, databases, or project memory without going through this process first.

This takes about 20 minutes on top of your Send Gate. The time investment is worth it the first time your agent wants to overwrite a file with outdated information.

Gate #3: The Spend Gate (Advanced)

This gate protects your wallet. AI agents can run API calls, subscribe to tools, and make purchases if you give them access. Without a spend gate, a runaway loop of API calls can cost hundreds before you notice.

The setup relies on spending thresholds in your project memory. Set a dollar limit that matches your comfort level, whether that’s $5 per transaction or $50. Pick the number that lets you sleep well.

Your agent estimates the cost before any paid action. Below your threshold, it proceeds automatically. Above it, it pauses, shows you the estimate, and waits for approval.

Add this to your prompt:

Before taking any action that costs money (API calls, tool purchases, subscriptions):

1. Estimate the cost.
2. If the cost is below $10, proceed automatically and log the expense.
3. If the cost is $10 or above, pause and show me:
   - What you want to do
   - Why it is needed
   - Estimated cost
   - Free alternatives, if any
4. Wait for my approval before proceeding with actions that cost $10 or more.

Keep a running total of all expenses in the project notes.

Adjust the threshold to your needs. The point is to keep you informed when the agent is about to spend money that matters. This takes about 15 minutes on top of the others. If your agent doesn’t have spending access, skip this one.

To Gate or Not to Gate

You can’t gate every action. If you do, your agent becomes a slow typist that asks permission before every keystroke. At that point, you might as well do the work yourself.

Low risk, high volume. No gate needed. File organization, summarization, categorization, formatting. The worst thing that happens is a slightly messy summary, and you fix that in seconds.

Medium risk, moderate volume. Review gate. Draft emails, content suggestions, data analysis. The agent produces the work, you review it before it goes anywhere. The Send Gate handles this category.

High risk, low volume. Full gate. External communications, system changes, spending. The agent pauses, explains what it wants to do, and waits for explicit approval. All three gates cover this category.

To apply this to your own workflows, list every task your agent handles. Write down the worst thing that could go wrong. If the mistake costs you money, damages your reputation, or breaks a system, gate it.

If the mistake is annoying but easy to fix, let it run free and correct course when needed.

The Three Mistakes People Make

Most failures come from one of these three patterns.

Gating everything. This turns your AI into a slow typist. You spend an hour approving every sentence and paragraph, then realize you could have done the work yourself. The fix: apply the decision framework above. Gate only what needs gating.

Gating nothing. That’s how your AI sends wrong emails to clients, overwrites production files, and racks up unexpected charges. You give the agent full autonomy on day one. Something goes wrong. You stop using the agent entirely. The fix: start with the Send Gate. Add the others as your trust grows.

Gating without context. A vague approval request forces you to dig through the agent’s reasoning to figure out if the change is safe. The fix: require the agent to show the current state, the proposed change, and the reason for the change. A good gate gives you everything you need to decide in under 10 seconds.

Trust the Process but Keep the Net

You stop babysitting and stop opening every draft with a knot in your stomach. You give your agent a task, trust the process, and review the output at the checkpoint. Most of the time, you approve it without changes. Occasionally, you catch something and fix it. Either way, the work moves forward.

The trick is to start tight and loosen up over time. In week one, you review every draft. By month two, you move the Send Gate to sample mode: review every third draft, trust the rest. You haven’t caught a mistake in weeks. The gate stays in place, but you use it less.

That’s the goal. Reduce gates as the system matures. Real delegation is only possible when you know the safety net works. Once you see the net catching problems, you can let the agent fly higher.

Building the right gates is all delegation requires. Once they’re in place, you can let it run.

How My Hermes Agent Plans My Morning Before I Have My Coffee

cucoleadan — Tue, 21 Apr 2026 13:19:34 +0000

This post was originally published on my Substack publication as How My Hermes Agent Plans My Morning Before I Have My Coffee.

You probably start every morning the same way most people do. Phone in hand. Six apps open before you finish your coffee. Email, task manager, Slack, calendar, news feed, Substack digest. Each one wants your attention. Each one claims urgency.

By the time you reach actual work, your best mental energy is already spent. You made dozens of tiny decisions about what to open, what to read, and what to ignore. Your real work gets the leftovers.

The problem is not discipline. The fire hose of inputs hits you the second you wake up, and no amount of willpower fixes a broken system.

I stopped trying to fix my habits and started fixing how information reaches me. Now one cronjob gathers my Asana tasks and hands me one clear decision to start the day. A second cronjob runs twice daily, checks Gmail for Substack articles, and sends me a curated digest email. I read the briefing in 30 seconds. The two hours come from fewer context switches, less reactive mode, and a single first action instead of a dozen tiny decisions.

Follow along as we build both systems in three layers. Layer 1 takes 15 to 20 minutes and covers the Asana morning briefing. Layer 2 adds the Substack digest email that runs twice daily. Layer 3 extends the briefing to Slack, Jira, GitHub, or any service with an API. Start wherever you want as each layer is useful on its own.

What we will cover:

Why checking multiple apps each morning burns your best energy before work even starts
The four-section briefing structure that replaces a to-do list with a decision
Step-by-step setup for the Asana briefing and the Substack digest email
How to extend the briefing to any API-driven tool you already use
Keeping both systems from bloating into useless noise

You Are Not Lazy, Just Constantly Interrupted

You are exhausted before 9 AM, and the reason has nothing to do with laziness.

Gloria Mark at UC Irvine studied this for years. Her research found that after an interruption, it takes an average of 23 minutes and 15 seconds to return to deep focus on the original task. Not to finish it. Just to get back into it. If you check email, then Slack, then your calendar, then a task list, then a news feed, you end up losing over an hour in refocus time alone.

The Adobe Email Usage Study found that Americans spend over five hours per day checking work and personal email combined. We spend hours refreshing inboxes. Actual communication barely happens. We train ourselves to react to whatever arrives instead of deciding what matters.

Inbox zero reinforces this pattern. It teaches you to treat every incoming message as equally urgent. The newsletter you subscribed to in 2019 gets the same mental weight as a client asking about a deadline. Your morning becomes a sorting exercise for other people’s priorities.

CEOs and important people have secretaries who sift through everything before it reaches them. We have AI agents like Hermes and Openclaw that can do this for us, maybe even better than the average person.

Here is how I built the one system that gathered everything in my pipeline and made one recommendation to get my day started.

What an Actually Good AI Briefing Looks Like

A good briefing works as a decision support document. It replaces your basic to-do list.

Most people think a morning briefing means listing everything they need to do today. That approach creates anxiety instead of clarity. A list of 15 tasks leaves you feeling behind instead of showing you where to start.

My briefing has four sections. Each one stays capped at 2 to 3 bullets. It ends with one forced decision. Without that final gate, the briefing becomes another scrollable feed you skim and forget.

Here is the structure:

Today — Meetings, deadlines, hard commitments that cannot move.

Tasks — Open items from Asana, sorted by impact, not by order added.

Alerts — Unread emails from humans, not newsletters. Overdue items that need attention.

One Decision — The single action that would make the rest of the day easier.

The Today section tells me what time is already spoken for. The Tasks section tells me what I chose to work on. The Alerts section catches anything that slipped through. The One Decision section forces me to think instead of consume.

I have seen people build briefings with ten sections. Weather, stock prices, news headlines, calendar, tasks, emails, social mentions, fitness data. That approach builds a dashboard, and dashboards serve monitoring. Briefings serve decision-making.

If your briefing takes longer than 3 minutes to read, trim a source. Tighten a filter. Clarity matters more than completeness.

Layer 1: Build Your First Briefing in 20 Minutes

Start with one source. Asana works well because you already put your commitments there. If your Asana is messy, the briefing will reflect that mess. Spend 10 minutes cleaning due dates and priorities first. The cronjob cannot organize what you have not organized. If your task manager lacks structure, that cleanup step becomes your first priority before automating anything.

What you need:

An Asana account, free tier works
Your Asana personal access token
Hermes with cron support

Prerequisites:

An Asana account (sign up here)
Your Asana personal access token (get one here)
Asana API docs for reference: developers.asana.com/docs or the Asana MCP server if you prefer MCP over REST
Hermes with cron support

Everything else happens inside the prompt. Hermes handles the API calls, the sorting, and the formatting. You do not write code or configure endpoints. You paste the prompt, set a schedule, and the cronjob does the rest.

Here is the exact prompt I give my cronjob:

You are my Morning Briefing Agent. Your job is simple: help me start the day with clarity instead of chaos.

Every morning at 8:15 AM Bucharest time, run this routine:

1. Pull my open tasks from Asana using the asana API or CLI
2. Sort them by: due date (overdue first), priority, project
3. Identify my top 3 highest-impact tasks for today
4. Flag anything overdue or due today
5. Format everything into a 2-minute briefing

The briefing structure:
---
TODAY'S BRIEFING — [Date]

TODAY:
- [Meetings/deadlines from tasks]

TOP 3 TASKS:
1. [Highest impact task] — [Project] — Due: [Date]
2. ...
3. ...

ALERTS:
- [Overdue items]
- [Items due today]

ONE DECISION:
What is the one task I should finish first to make everything else easier?
---

Keep each section to 2-3 bullets. No fluff. No summaries of summaries. Just the signal.

Paste that prompt directly into your Hermes chat, whether through Telegram or the TUI. Hermes handles the API calls, the sorting, and the formatting. Set a schedule with the cronjob tool, point the delivery at your chat or email, and run it once to verify the output.

Here is what the actual output looks like on a typical morning:

TODAY'S BRIEFING — April 22, 2026

TODAY:
- 10:00 AM — Client sync call (Project Alpha)
- 3:00 PM — Article draft deadline

TOP 3 TASKS:
1. Finish API integration for client proposal — Client Work — Due: April 22
2. Review pull request #25 — Open Source Project — Due: April 23
3. Update Substack draft for next week — Substack — Due: April 25

ALERTS:
- "Design mockups feedback" task overdue since April 20
- "Invoice March services" due today

ONE DECISION:
Finish the API integration before the 10 AM call so you have something concrete to discuss.

That is it. Twenty minutes. One source. One clear read every morning.

If you have read Hermes Is the AI Agent OpenClaw Promised to Be, you know why the cron architecture matters. This briefing runs on the same backbone. If you have not set Hermes up yet, The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff walks through the full stack.

Know someone who checks six apps before breakfast? Forward this to them.

Layer 2: The Substack Digest Email

The morning briefing covers your tasks. Your newsletters need a separate system. I run a second cronjob that checks Gmail for Substack article emails twice a day on weekdays, at 9:15 AM and 4:15 PM. It reads each article, summarizes it, and sends me a formatted digest email. This is not part of the Telegram morning briefing. It is a separate pipeline with a different output and a different schedule.

What you need:

The gws CLI tool for Google Workspace integration
A Gmail account where Substack sends your digests

gws is a command-line tool that connects Hermes to Google Workspace, including Gmail. You install it once, authenticate with your Google account, and the cronjob gains read-only access to your inbox.

Here is the cronjob prompt I use:

Check Gmail for new Substack article emails and deliver a nicely formatted digest to your email, only covering emails received since the last check.

## Step 1: Check Gmail for new Substack articles

1. Read the last run timestamp from a tracking file. If it doesn't exist, use 24 hours ago as the cutoff.
2. List Substack emails: gws gmail +triage --query "from:*@substack.com" --max 20
   Note: Substack emails arrive already marked as read, so do NOT use is:unread.
3. For each message, read it with gws gmail +read --id <message_id> to get the email Date header. Compare against the last run timestamp. Skip anything received AT or BEFORE the last run time. Only process emails received AFTER.
4. Filter out non-article emails (follower notifications, subscriber alerts, live video announcements).
5. For each new article in the time window:
   a. Extract the article URL from the email body
   b. Fetch the full article via markdown.new: curl -sL "https://markdown.new/http://<article_url>"
   c. Generate a short 2-3 sentence summary
   d. Determine if FREE or PAID by checking for paywall text

## Step 2: Output Format

Compose a formatted email digest with this exact structure:

📬 Substack Digest — [Day, Month DD, YYYY]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🆓 Article Title
by Author Name
Summary: 2-3 sentences covering the core argument and why it matters
🔗 Article URL

🔒 Article Title (PAID — subscriber only)
by Author Name
Summary: 2-3 sentences covering the core argument and why it matters
🔗 Article URL

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Use 🆓 for free articles and 🔒 for paid ones. Keep it clean and scannable.

## Step 3: Deliver

1. Send the digest via gws gmail +send to your email
2. Write the current time to the tracking file for the next run

If no new articles found, respond with [SILENT].

The digest runs twice daily because Substack articles arrive throughout the day. The morning catch covers overnight posts. The afternoon catch covers everything published during work hours. Each run only processes articles received since the last check, so you never see duplicates.

Here is what a typical digest looks like:

📬 Substack Digest — Tuesday, April 21, 2026

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🆓 Computer agents are going mainstream
by Jonas Braadbaart (The Circuit)
Summary: Examines the gap between AI adoption and actual agent deployment, arguing that individual operators are the ones closing the gap rather than enterprise teams. Practical look at how solo builders are using agents for real workflows.
🔗 https://thecircuit.substack.com/p/computer-agents-mainstream

🔒 Claude Managed Agents Review
by Creators AI
Summary: Testing managed agent workflows and comparing them to self-hosted alternatives. Covers setup complexity, cost trade-offs, and when managed services actually save time.
🔗 https://creatorsai.substack.com/p/managed-agents

🆓 Don Quixote and the Sorrowful Algorithm
by Farida Khalaf (Lights On)
Summary: Literary essay on AI narrative inevitability using Don Quixote as metaphor. Explores how algorithmic storytelling converges on predictable patterns despite different prompts.
🔗 https://lightson.substack.com/p/don-quixote-sorrowful-algorithm

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Trade-off: You are giving your cronjob email access. Start with read-only permissions. Never give it send permissions until you have run this for a month and trust the output. The agent reads. You decide. That boundary matters.

Layer 3: Plug in Slack, Jira, GitHub, or Anything with an API

The morning briefing can pull from more sources besides Asana. The same cronjob that checks your tasks can also query Slack, Jira, GitHub, or any service with an API.

What you can add:

Slack: Unread DMs or mentions from specific channels
Jira: Tickets assigned to you, overdue sprints
GitHub: PRs waiting for review, assigned issues
Notion: Database items flagged for review
Any API: If it has an API, Hermes can query it

How to add a new source:

Get an API token for the service.
Add the token to your Hermes project environment.
Add one step to the cronjob prompt: “Pull my open items from [service].”
Add a section to the briefing template.
Test once, then schedule.

Trade-off: More sources means more noise. I recommend adding one source per week and watching the signal-to-noise ratio. If the briefing gets longer than 30 seconds to read, trim something.

The pattern stays consistent across every source. Token in environment. One new step in the prompt. One new section in the output. You do not need to rebuild the system. You just expand it.

I run Asana for tasks and GitHub PRs for code review items. That is my sweet spot. Anything more and the briefing starts to feel like work before I have finished my coffee.

Information Without Action Is Just Noise

The cronjob’s final instruction in every layer is the same: “At the end, identify the ONE decision or action that would make the rest of the day easier.”

This prevents the briefing from becoming another scrollable feed. It forces me to think, not just consume.

You read the briefing. You nod. You close it. You open your laptop and immediately forget what you just read. The information felt useful, but it did not change what you did next. The One Decision gate fixes this problem. Skip it and the briefing loses its purpose. The One Decision is the entire point.

Some days the decision is obvious. “Finish the client proposal first so the deadline stops hanging over me.” Other days it is strategic. “Block two hours for deep work before opening Slack, or the day will get stolen.” Either way, I start with a clear intention instead of a reactive scan.

Here are real One Decisions from my last week:

“Reply to the contract email before noon so the other side does not stall waiting for us.”
“Merge the PR before the afternoon standup so the team can proceed with testing.”
“Write the article outline first because the blank page anxiety blocks everything else.”

This is where the human-in-the-loop approach from article 005 on approval gates matters. The cronjob drafts. You approve. The system does not replace your judgment. It surfaces information so your judgment has something to work with.

The briefing is a draft. You review it. The cronjob does not act without you.

How to Keep Your Briefing Brief

Most people give up because the briefing becomes useless fast. Here are the three breakdowns I have seen, and how to fix each one.

1. Too much noise

Filter out newsletters, automated alerts, and low-priority senders. In Asana, use sections, tags, or due dates to surface only what matters. In email, filter by sender, not just unread status. If your briefing includes a GitHub notification about someone starring a repo you contributed to in 2022, your filters are too loose.

2. Stale priorities

Update your project memory bank weekly. Five minutes. Review and tweak the prompt monthly, or the briefing drifts into generic summaries. The tasks you cared about in January differ from the tasks you care about in April. Your briefing should reflect that shift.

3. Cronjob goes stale

If you stop reading the briefing, the cronjob will keep sending it. It becomes inbox clutter. Pause the schedule. Fix the output. Resume. Do not let automation become noise you ignore. The system only works if you trust it enough to read it.

I review my briefing prompt on the first Monday of each month. Five minutes. I check whether the sections still match what I need, whether any source has gotten too noisy, and whether the One Decision question still forces useful answers. Those five minutes save me from a month of useless briefings.

Less Cognitive Load Is the Real Payoff

Time saved is a side effect. The actual benefit comes from removing decision fatigue.

I no longer open email first thing. I start with my own priorities. The two hours come from fewer context switches, less reactive mode, and clearer first actions. Moving between working and getting the right things done requires that shift in how you begin the day.

Before the briefing, my morning involved a series of small decisions about what to check next. After the briefing, my morning involves one decision about what to do first. Everything else follows from there.

The system has rough edges. Some days the cronjob misses context. Some days the One Decision misses the mark. The results still beat the chaos I dealt with before, and every prompt tweak makes it sharper.

The architecture behind this is straightforward. One cronjob. Four sections. One forced decision. Each source adds a single API call and a single template section. The complexity lives in the filters, not the infrastructure.

The Agentic Engineering Shift

cucoleadan — Tue, 14 Apr 2026 13:35:47 +0000

This post was originally published on my Substack publication as The Agentic Engineering Shift.

The fastest way to build a product you cannot maintain is to let AI write all of it without asking questions.

Sure, you will ship faster than anyone in the room. Yeah, the demo will work. But two weeks later, when a user reports a bug, you will open the file and scroll through functions you do not remember writing. Because you did not write them. The AI did. You accepted the output, tested the happy path, and moved on.

Changing one line feels dangerous because you cannot trace what it touches. The code works, but you do not own it. You are maintaining a stranger's project inside your own repo.

Andrej Karpathy gave this gap a name. In February 2025 he coined "vibe coding" to describe the practice of accepting AI output without scrutiny. A year later he followed up with "agentic engineering," the practice of orchestrating AI agents with oversight, structure, and human judgment at every step.

The naming matters because thousands of builders recognized themselves in it. But a label does not tell you what to change. This post gives you a way to figure out where you sit on the spectrum between the two and which practices will move you forward.

If you have read How To Architect A Feature In 5 Minutes Before Talking To AI, you already know why thinking before prompting matters. This piece picks up where that one left off. Thinking before prompting is the starting habit. What follows is the full set of habits that separates builders who ship from builders who ship and survive.

In this edition:

What Karpathy's two terms mean and where the conversation stops being useful
A 4-stage maturity spectrum to locate where you are right now
The 5 workflow practices that make the shift from vibe coding to agentic engineering concrete
A 60-second diagnostic you can run on your last shipped feature tonight

What Karpathy Named

Vibe coding means giving an AI a prompt, getting code back, and shipping it without meaningful review. You trust the output because it runs. You move fast because the feedback loop feels instant.

The AI writes a function, you glance at the result, and if nothing throws an error, it goes into the codebase.

Agentic engineering means working with AI agents as part of a structured process. You provide context, define constraints, review outputs, run evaluations, and make the final call on what ships.

The AI still does the heavy lifting, but YOU direct the work and own the result. Every piece of generated code passes through your judgment before it touches production. And this goes beyond coding.

Both terms describe real behaviors that builders already practiced before the vocabulary existed. Karpathy gave the community a shared language for something people felt but struggled to articulate. The developer copying AI output straight into a feature branch at 2am was vibe coding long before anyone named it.

The problem is that most of the conversation stopped at the labels. Forums and comment sections turned it into a binary: vibe coding bad, agentic engineering good. That framing misses the point.

Nobody operates at one extreme all the time. A solo builder prototyping on a Saturday afternoon and a team shipping a payments feature to 10,000 users should not follow the same process. The real question is what specific behaviors separate one from the other, and how you move between them.

The Spectrum

This is not a pass/fail test. Builders sit at different points on a gradient, and the position shifts depending on the project, the deadline, and the stakes. The useful exercise is recognizing where you are right now based on what you do, not what you believe.

Stage 1: Prompt and Ship

You describe what you want. The AI writes it. If the output runs, you merge it. Testing means clicking through the feature once to confirm it loads.

This works for throwaway prototypes and weekend experiments. The failure mode shows up later. Features become untouchable because you cannot predict what changing one piece will break.
You open a file, see 200 lines of logic, and realize you have no idea why the AI structured it that way. Every revisit feels like defusing a bomb you did not build.

Stage 2: Patch and Pray

Bugs appear. You fix them by prompting the AI again with the error message. The fix works, but you still do not understand the underlying structure. You know fragments of the codebase. You do not know the system.

This is where most builders I see in forums and Reddit threads are sitting right now. The product shipped. Users showed up. And every maintenance task takes three times longer than it should because the codebase grew in directions nobody planned.

The failure mode here is compounding. Each reactive patch adds weight. You fix the form validation bug and break the error toast. You fix the error toast and notice the loading state flickers. Confidence drops with every fix because you are never sure the patch did not break something else. The codebase starts to feel adversarial.

Stage 3: Review and Contain

You start reading the AI's output before merging. You notice recurring patterns in what the AI gets wrong. You add checks. You push back on suggestions that feel over-engineered or unclear.

At this stage, AI becomes a fast junior developer on your team rather than an oracle. You treat its output the way a senior developer treats a pull request from someone in their first year. You catch the unnecessary abstraction. You question why it created three helper functions when one would do.

The failure mode is inconsistency. The review habit exists, but it drops off when you are tired, rushed, or excited about a feature. Friday afternoon, deadline looming, a new feature working on the first try. The temptation to merge without reviewing is strongest when the output looks clean. Process without discipline reverts to vibe coding under pressure.

Stage 4: Agent with Guardrails

You work with explicit context documents, evaluation criteria, test expectations, and review gates. You can explain why every function exists and what conditions would break it. The AI still generates the code. You architect the system and verify the output.

The failure mode even here is over-automation. Trusting the process so fully that you stop applying judgment to edge cases. The test suite passes, the evaluation loop looks green, and you ship without reading the diff. Process is a tool, not a replacement for thinking.

Most builders reading this will recognize themselves somewhere in stages 2 or 3. That recognition is the starting point. The next section covers what moves you forward.

If you have been through The Build vs Buy Scorecard, you already know the value of slowing down before making technical decisions. The same principle applies here. The spectrum rewards deliberate behavior over fast output.

The Practices

The shift from vibe coding to agentic engineering is visible in workflow, not philosophy. These are five habits you can observe yourself doing, or not doing. Each one addresses a specific failure mode from the spectrum above.

1. Review Gates

Treat every AI output like a pull request from a junior developer. Read the code before you merge it. Check whether the approach matches what you asked for. Look for unnecessary complexity, redundant calls, or logic you cannot follow.

When you skip this: you inherit code you cannot reason about. The AI might add a caching layer you never asked for, or restructure your data flow in a way that makes sense in isolation but clashes with the rest of your system. The codebase grows in ways you did not choose, and every future change requires re-learning what the AI decided on your behalf.

2. Eval Loops

Test more than "does it work once." Feed the AI's code edge cases, unexpected inputs, and failure scenarios. If you built a form handler, send it empty fields, duplicate submissions, and malformed data. Check what happens when the external API is slow or down.

When you skip this: the AI passes the demo and fails the real world. You find the bugs in production instead of in development, and your users find them before you do.

3. Test Coverage for Agent Output

If the agent wrote the code, someone needs to verify it holds up. Write tests for the critical paths. If you do not write tests yourself, at minimum run the feature through its failure modes manually before shipping.

When you skip this: "works in dev" becomes "breaks in production." Maintenance turns into archaeology because you are digging through code with no map of what was supposed to happen. You can pair this with the approach in How to Prompt AI for Consistent JSON Responses to make sure the outputs you are testing against stay predictable.

4. Context Architecture

The quality of AI output depends on the quality of your input. Before prompting, define what the feature needs to handle, what it connects to, and what constraints exist. Break the problem into scoped pieces. Give the agent acceptance criteria, not open-ended requests.

When you skip this: the agent guesses the system you meant. It fills in gaps with assumptions pulled from training data, and those assumptions may not match your product, your users, or your stack. You ask for a notification system and get a full pub/sub architecture when all you needed was a database flag and a polling endpoint. This is where the 5-minute architecture sketch pays off the most.

5. The Explain-It-Back Check

Before shipping any AI-generated code, explain what it does in your own words. Walk through the logic, the data flow, and the failure path. If you hit a function you cannot explain, that is the part that will break first in production.

When you skip this: ownership never transfers back to you. The code ships under your name, but the understanding stays with the model that generated it. When a user reports a bug at 11pm, you will stare at the function and have no starting point for debugging it. You become a passenger in your own project.

None of these practices make AI infallible. AI will still produce flawed output, miss edge cases, and make assumptions you did not ask for. These practices make your relationship to that output honest. You stop hoping the code is correct and start knowing where to look when it is not. The AI handles generation. You handle judgment. That division of labor is the entire point.

The 60-Second Test

Open the last feature you shipped with AI assistance. Pick any file from that feature and read through it.

For each function, check three things. Whether you can explain what it does without reading it line by line. Whether you know what happens when it fails. Whether there is a test covering the critical path.

If you breeze through all three, pick a second file. Keep going until you hit the wall. Most builders find it faster than they expect.

The first question you cannot answer marks your starting point on the spectrum. That is the exact spot where your next upgrade begins. You do not need to fix everything at once. Pick one practice from the list above that addresses the failure mode you are sitting in, and apply it to the next feature you build.

The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff

cucoleadan — Tue, 07 Apr 2026 13:32:00 +0000

This post was originally published on my Substack publication as The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff.

I was paying $200 for Claude Max and still hitting limits mid-project. I'd open Hermes to write some code, summarize an article or set some reminders for later use. It was fast and way better than OpenClaw, but something was keeping me from going all-in.

Then I spent the full week to figure out how to configure it properly.

Now Hermes remembers everything across sessions, manages my projects, syncs files instantly across devices, and handles complex workflows while I'm asleep. It went from a tool I use to a teammate that works independently.

Here's exactly how to do the same.

The two AI providers worth using right now (Fire Pass vs OpenCode Go) so Hermes never chokes mid-session.
The four tools that changed how I use Hermes (GitHub CLI, Telegram gateway, Brave Search, Skills) and the workflows they make possible.
Fixing persistent memory with Honcho, replacing bloated Nextcloud with lean WebDAV, and wiring up Asana so your work doesn't disappear.
The 30-day plan to full turbo mode, plus why you should never expose port 8642.

Open your Hermes setup right now and count how many of these you have configured. A fast AI provider with unlimited or high-limit access. Cross-session memory that works reliably. Integration with your project management tool. File sync that doesn't make you wait 30 seconds. Skills that automate repetitive workflows. Remote access from your phone.

Most people have one or two. Maybe three if they're motivated.

The setup looks intimidating, so people skip most of it. I did the same thing for weeks.

But each of these capabilities compounds on the others. Fast AI lets you iterate. Memory means you stop repeating yourself every session. Project integration means tasks get tracked automatically. File sync means your notes show up everywhere. Skills mean the boring stuff runs without you.

Once all of them are running together, the experience changes. Hermes stops feeling like a chatbot you type into and starts feeling like someone who knows your work, remembers your preferences, and handles things without being asked twice.

That shift is what the rest of this article builds toward.

Hermes is only as good as the AI powering it. Pick the wrong provider and you'll hit rate limits at the worst moment or watch tokens drain your budget faster than expected.

I tested several. Two stood out.

Fireworks Fire Pass costs $7 per week (about $30/month), first week free. You get unlimited access to Kimi K2.5 Turbo at roughly 393 tokens per second. That's one of the fastest inference speeds available anywhere right now.

The catch: it's Kimi K2.5 only. No model variety, no backup if Kimi goes down. But for coding, reasoning, and long documents, Kimi handles all of it well. And at 393 t/s, even long outputs feel instant.

Kimi K2.5 Turbo runs on a 1 trillion parameter MoE architecture with 32 billion active per forward pass. The "Turbo" label means the same weights served on optimized infrastructure, with a 256k context window and strong agentic tool use. When I'm in the middle of a long coding session and need fast iteration, this is what I reach for.

OpenCode Go costs $5 the first month, then $10. Instead of one fast model, you get six with generous request limits: MiniMax M2.7, MiniMax M2.5, MiMo-V2-Omni, GLM-5, Kimi K2.5, and MiMo-V2-Pro.

MiniMax M2.7 is the standout. Released March 2026, it scores 50 on the Artificial Analysis Intelligence Index, matching GLM-5 at roughly one-third the cost.

My recommendation: start with Fire Pass if you want simplicity and speed. Switch to OpenCode Go when you find yourself wanting to test alternatives or when you're doing bulk work where MiniMax M2.7's cost advantage matters.

Both work with Hermes out of the box. Set your API key in ~/.hermes/.env and you're running.

Hermes has built-in memory, but it's session-scoped by default. Close the terminal, lose the context. Fine for one-off tasks. Useless for ongoing work.

I noticed this on day three. I'd spend twenty minutes bringing Hermes up to speed on a project we'd discussed the day before. The conversations were gone. Every morning felt like onboarding a new hire.

Honcho fixed this. It's an open-source memory library that gives Hermes persistent cross-session context. The team describes it as a "peer paradigm" where both you and the agent build a relationship over time. In practice, it stores facts about you, your projects, your preferences. Every new session starts with that context already loaded. No re-explaining your stack, your location, or your goals.

Setting it up locally took me longer than I expected. Docker Compose, deriver logs, token limit errors. I spent hours watching the deriver fail with "Observation content exceeds maximum token limit of 8192" when synthesizing my imported memory files. The raw search worked fine, but the AI-synthesized peer cards kept failing on large imports.

Here's the honest breakdown. The raw memory retrieval is solid. Honcho stores entries and retrieves them instantly. The AI synthesis layer, the part that builds distilled user profiles, chokes on large imports. Use raw search for now.

On April 3, 2026, Hermes introduced the Pluggable Memory Provider Interface. Memory is now an extensible plugin system where third-party backends register through a provider ABC. This changes things.

The providers available today:

Honcho, the reference implementation with AI-native cross-session modeling
Hindsight (vectorize.io), a purpose-built plugin hitting 91.4% accuracy on LongMemEval
Mem0, widely adopted but cloud-focused
Letta (formerly MemGPT), a full agent platform with tiered memory
Zep/Graphiti, temporal knowledge graphs
OpenViking, Holographic, RetainDB, ByteRover, community alternatives

I'm exploring building a custom solution. Full local control, token-efficient storage, direct Hermes integration without middleware, and a deriver that doesn't choke on context limits. The pluggable interface makes this possible now.

For immediate setup: run hermes memory setup and select Honcho. It works well enough for raw search. Expect synthesis to improve, or plan to swap providers as the ecosystem matures.

Once you have fast AI and working memory, the next layer is the tooling that makes Hermes useful beyond chat.

GitHub CLI was the first thing I set up. Install gh, authenticate once, and you have commits, pushes, and PR management without leaving the terminal. This became the foundation for everything else.

Telegram integration is what made the whole setup click for me. Run hermes gateway telegram setup once and you have a direct line to your agent from anywhere. I use this constantly. Someone messages me about a website change while I'm out. I send a Telegram command to Hermes. It pulls the repo, edits the file, commits with "[via Telegram]" in the message, pushes. Vercel auto-deploys. I never opened a laptop.

That workflow alone justified the entire Hermes setup.

Brave Search is the research tool I reach for most. Built into Hermes via MCP, it finds emails, hiring managers, technical documentation, competitive intelligence. The queries I run regularly: "company" "hiring manager" email, "competitor" pricing 2026, "technology" benchmark performance. For contract work research, nothing comes close.

Skills are reusable instruction packages that teach your agent to perform specific tasks consistently. I have one for deploying to production, another for writing article briefs, another for analyzing codebases. Install with npx skills add from sources like Vercel Labs or LobeHub.

One thing to watch out for: the skills CLI doesn't fully recognize Hermes yet. It defaults to .openclaw/ directories. Import manually to ~/.hermes/skills/ instead. And skills have full system access, so only install from sources you trust.

Sometimes Hermes struggles with system-level operations. Editing system files, installing packages that need root permissions. The sandboxing gets in the way.

I keep OpenCode running on my VPS root for these situations. Quick system tweak, I use OpenCode. Complex multi-step workflow, I switch to Hermes. Two tools, each in the environment where it performs best.

All this capability falls apart if you lose track of what needs doing. I use Asana because it integrates cleanly and the free tier handles personal projects.

The setup: Python asana package in a dedicated virtual environment, CLI wrapper at /usr/local/bin/asana-api, token in ~/.asana_env sourced by .bashrc. My main project is called "Hermes project" and Hermes remembers the GID, auto-linking tasks to conversations.

During a session I'll say "create an Asana task to research Hindsight memory provider." Hermes creates it, tags it with the session ID, and I pick it up later from anywhere. The task lives in one place regardless of which device or gateway I used to create it.

Linear works well too if you prefer GraphQL. Notion databases are popular. The tool matters less than the habit: one source of truth your agent reads and writes to.

If you're syncing Obsidian with Nextcloud right now, you already know the pain. Thirty seconds to sync two hundred small files. File locking issues during rapid changes. A database-backed architecture adding overhead you never asked for.

I ran Nextcloud for months. It worked. But every sync felt like watching paint dry.

The fix: WebDAV server + Filebrowser + Obsidian LiveSync.

WebDAV is purpose-built for file sync. No database layer, direct file operations, lightweight protocol. Filebrowser adds a web UI for browser access when you need it. Together they're roughly ten times faster than Nextcloud for the same job.

services:
  webdav:
    image: bytemark/webdav
    volumes:
      - ./data:/var/lib/dav
    environment:
      - AUTH_TYPE=***
      - USERNAME=youruser
      - PASSWORD=***
    ports:
      - "8080:80"

  filebrowser:
    image: filebrowser/filebrowser
    volumes:
      - ./data:/srv
      - ./filebrowser.db:/database.db
    ports:
      - "8081:80"

In Obsidian, install the Remotely Save plugin, point it at your WebDAV endpoint, set a 30-second sync interval. Done. Files created by Hermes appear instantly in your notes. Briefs, articles, research, everything syncs across devices without the Nextcloud overhead.

If you're running Nextcloud for Obsidian sync only, this one change saves you hours of waiting per month.

Hermes runs as an API server on port 8642. Other tools connect to it. IDE extensions in VS Code, Zed, JetBrains. Custom tools that send tasks. Multi-agent systems where one Hermes serves multiple clients. Webhooks from external services.

The v0.7.0 ACP (Agent Client Protocol) integration means editors register their own MCP servers and Hermes automatically discovers them as tools. Full slash command support in your IDE, powered by your configured Hermes instance.

This sounds great until you think about what you're exposing.

Hermes has full system access. Terminal, file system, API keys, everything. Exposing port 8642 exposes all of that. Any client connecting executes arbitrary commands. And there's no built-in authentication in the base setup.

I learned this the hard way when I briefly opened the port to test an integration from my phone. It worked, but I realized anyone on my network had the same access I did. Shut it down within the hour.

If you need to expose Hermes, put a reverse proxy in front of it. Cloudflare Access or Authelia work well. Restrict to local network when possible. Use token-based auth with short expiry. Enable approval mode so every action requires manual confirmation. Never expose raw port 8642 to the internet.

The safer approach: use Telegram or Discord gateways for remote access. They have built-in platform authentication. Run separate Hermes instances per project with limited scopes. Use Docker sandbox for anything untrusted.

The API mode is the most capable part of the Hermes stack, and the easiest way to accidentally give the internet shell access to your systems.

You don't need to do all of this in one weekend. Here's the order that worked for me.

Week 1 is the foundation. Pick your provider, Fire Pass for unlimited Kimi or OpenCode Go for variety. Set up your API keys in ~/.hermes/.env. Configure GitHub CLI with gh auth login. Run a few test conversations to make sure everything connects.

Week 2 is memory. Set up Honcho locally with Docker Compose. Run hermes memory setup and select Honcho. Verify that raw memory search returns results. Import existing context from past conversations. By the end of this week, Hermes should remember who you are when you open a new session.

Week 3 is tooling. Configure the Telegram gateway. Install 3-5 essential skills, manually importing to ~/.hermes/skills/. Set up your Asana CLI integration (or Linear, or Notion). Test Brave Search with a few research queries. This is the week where Hermes starts feeling useful beyond basic chat.

Week 4 is sync. Deploy the WebDAV + Filebrowser stack. Configure Obsidian's Remotely Save plugin. Migrate your notes from whatever slow setup you're running now. Verify that files sync instantly across all your devices.

After month one, experiment with API mode locally. Explore alternative memory providers. Build internal tools that call Hermes, with proper authentication in front of everything.

Each week builds on the last. By the end you'll have something that remembers everything, works while you sleep, and syncs across every device you own.

That's the setup. Everything before it is first gear.

Hermes Is the AI Agent OpenClaw Promised to Be

cucoleadan — Tue, 31 Mar 2026 13:58:03 +0000

This post was originally published on my Substack publication as Hermes Is the AI Agent OpenClaw Promised to Be.

The first time my agent forgot who I was, I blamed the config. The twelfth time, I blamed the architecture.

I'd been running OpenClaw for weeks now. Building workflows, storing preferences, training my agent to work the way I work. And every few weeks, something would slip. A project decision I'd explained twice already. A formatting rule I'd set three conversations ago. Context that should have been obvious, gone.

A week after I stopped blaming myself, I installed Hermes. A month later, going back never crossed my mind.

This is the migration guide I wish someone had written for me.

In this article:

Why I Moved
The Tradeoff
Before You Start
The Migration
The Gateway
Your First Week
The Honcho Difference
First Bumps
The Bottom Line

OpenClaw carried me for a few weeks. I built workflows, connected it to Nextcloud, ran it alongside Claude Cowork, and wrote about the whole setup on this newsletter. The tool works.

But the longer I used it, the more I noticed where it fought me.

Instructions. I'd write a detailed prompt telling my agent exactly how to handle a task. OpenClaw would get 70% of it right and improvise the rest. I'd rewrite. It'd still miss the same parts. After enough rounds of this, I realized the problem wasn't my prompting. The framework itself was cutting corners on how it passed instructions to the model.

Hermes fixed this on day one. The same prompts that produced mediocre results in OpenClaw produced exactly what I asked for in Hermes. It does use more tokens per interaction, so the API bill goes up. But the outputs land closer to what you asked for, every time.

Memory. I wrote an entire article about OpenClaw's memory problem. Lossless Claw patched it. The patch worked, mostly. But it was still a patch on a system that wasn't designed for persistent memory from the ground up.

Security. In January 2026, a published security audit reported 512 vulnerabilities in OpenClaw, including one allowing remote code execution through a single malicious link. Researchers at Cisco found 335 malicious skills on ClawHub. I'd been running this on my server with access to my files, API keys, and personal documents.

Reading those reports accelerated my timeline. I was already leaning toward Hermes.

Hermes is better for me. It might not be better for you. Here's both sides.

What got better:

Instruction following. Hermes reads your full prompt and executes it. The token cost is higher because it runs more tools, creates skills to standardize processes and basically processes more of your instructions instead of skipping them. It likes to be super thorough.

Setup. Installing Hermes took less time than any OpenClaw update I've done. The setup wizard walks you through everything, detects your existing OpenClaw install, and offers to bring your data over automatically.

Memory. Honcho runs as a separate memory layer. You can just sign up and get $100 for it. Your conversations persist across restarts, model switches, and gateway reboots. I am playing with the local install to use my own models and keep my memory safe and hosted locally.

Self-improving skills. When Hermes solves a hard problem for you, it writes a reusable skill document. Next time a similar task comes up, it's faster and will actually know what to do. OpenClaw never did this.

What got worse:

Model providers. OpenClaw works with a massive list of providers. Hermes doesn't. If you depend on a specific model through a specific provider, check compatibility before you commit.

Token usage. Your API bill will go up. Mine increased by roughly 20%. The outputs are better, so the cost per useful result is about the same. But the raw number on your invoice will be higher. This is a price I am willing to pay just for the model to actually listen to my instructions.

Ecosystem. OpenClaw has ClawHub with thousands of community skills. Hermes has a smaller library. The self-improving skills system offsets this over time, but on day one you'll have fewer pre-built options.

Two things to do before you install Hermes.

Back up your OpenClaw config. Copy your ~/.openclaw/ directory somewhere safe. Your .env files, your skill definitions, your SOUL.md. Hermes will import most of this automatically, but having a backup costs you thirty seconds and saves you from a bad day.

cp -r ~/.openclaw/ ~/openclaw-backup/

Check your model providers. Open your OpenClaw .env file and look at which providers you're using. Visit the Hermes docs and confirm they're supported. If your primary model works, you're good. If it doesn't, figure out your alternative before you start.

Hermes handles everything else.

Install Hermes on your server. The first time you run hermes setup, it detects your OpenClaw installation and asks if you want to import your data.

Say yes.

The setup wizard pulls in your memories, your SOUL.md (your agent's personality and system prompt) and your preferred channels (I use Telegram).

What you'll need to redo by hand:

Any API keys or secrets that weren't on the allowlist get skipped. The setup tells you exactly which ones it skipped and why. Add them to ~/.hermes/.env manually.

If you were using ClawHub marketplace skills, those don't transfer. Recreate the ones you need in the Hermes skill format, or check if the Hermes community has equivalents.

The entire process took me about fifteen minutes. Most of it was copying over API keys and recreating some skills.

Hermes centralizes your gateway config in one file: ~/.hermes/config.yaml.

Run Hermes with hermes and ask it to configure the messaging channels for you, or just do it manually.

If you were on Telegram with OpenClaw, the setup looks like this:

gateway:
  platform: telegram
  telegram:
    bot_token: ${TELEGRAM_BOT_TOKEN}
  port: 8082

Add your bot token to ~/.hermes/.env, point the config at it, and start the gateway. Open Telegram and send your agent a message. If it responds, you're live.

For Discord, Slack, WhatsApp, or Signal, the pattern is the same. One platform block in the config file, one token in the env file. The Hermes docs list every supported platform with copy-paste examples.

I switched my Telegram bot to Hermes and sent it a test message within two minutes of finishing the migration. It responded with context from my last OpenClaw conversation. The memory import worked.

The first thing you'll notice is that your agent listens better.

I gave Hermes a content brief with twelve specific formatting rules. It followed all twelve. The same brief in OpenClaw would produce something that hit eight or nine, with two or three "creative interpretations" I didn't ask for.

The second thing you'll notice is the token counter. My daily usage went up noticeably. Hermes processes more of the conversation context on every turn, and it doesn't shy away from calling tools, which is why the instruction following is better. You're paying for the model to read more and skip less.

In a single conversation Hermes had created two skill documents on its own. One for how I like my research summaries formatted. Another for the file naming convention I use in my Nextcloud docs folder. I didn't ask it to learn these things. It picked them up from our conversations and wrote reusable procedures.

If you built the Nextcloud bridge from my earlier article, it works with Hermes too. Point Hermes at the same synced folder and your shared brain carries over. Nothing changes on the file sync side.

The real memory system in Hermes is called Honcho, and it's a 3rd party tool.

It stores your conversations, builds a profile of who you are and how you work, and serves that context back to the agent at the start of every interaction.

I chanted with Hermes using the TUI and then asked it something on Telegram. It picked up exactly where we left off. The context from Friday's conversation was also there. The decisions we made on Monday were referenced correctly.

I tested the same scenario in OpenClaw before I migrated. The agent remembered some fragments through vector search. It missed the thread connecting them. I spent ten minutes re-explaining what we'd already decided.

According to Honcho's published pricing, it costs $2 per million tokens ingested. Every context retrieval call is free with no limits. Based on my usage patterns, it adds a few dollars a month to my total cost. For the amount of time it saves me re-explaining context, it's the cheapest upgrade in my entire stack.

Nevertheless, I am a die-hard fan of self hosting and you can actually self host Honcho. I wanted to give it a try first before committing but now I am sold.

The migration went smoothly, but the first few days had a couple of bumps. Here's what I ran into and how I fixed it.

Skills not loading. One of my imported skills had a formatting issue in its SKILL.md file. Hermes skipped it silently. I checked ~/.hermes/skills/openclaw-imports/, opened each file, and found a broken YAML header. Fixed the formatting, restarted, done.

Token spikes. My first full day of usage ran higher than my OpenClaw average. By day three it settled to about 20% above baseline. The spike on day one was Honcho ingesting my conversation history and building the initial context graph.

Missing env variables. I forgot to move one API key from my backup into ~/.hermes/.env. The error message told me exactly which key was missing and which skill needed it. Added the key, restarted, fixed.

For anything else, the same pattern from my Nextcloud article applies. Copy the error message, paste it to your agent, and let it diagnose the problem. Hermes is better at debugging itself than OpenClaw because it retains the context of what went wrong and what was already tried.

If you're building long-term workflows where context matters, where your agent needs to remember what you decided last week and why, migrate now. Hermes was built for this from the ground up, and the migration wizard makes the switch painless.

If you're running simple one-off automations and OpenClaw handles them fine, stay put. There's no reason to move if your current setup does what you need.

For everyone in between, it takes fifteen minutes and a slightly higher API bill. You get an agent that follows your instructions, remembers your preferences, and gets better at your specific workflows every day it runs.

I moved a week ago. The only thing I regret is not moving sooner.

Ditch Your Subscriptions and Run Open Source AI on Your Device

cucoleadan — Tue, 24 Mar 2026 13:35:30 +0000

This post was originally published on my Substack publication as Ditch Your Subscriptions and Run Open Source AI on Your Device.

Open-source AI models are beating the paid ones. A year ago that sentence would have been ridiculous. Not anymore.

Qwen 3.5 is outscoring GPT-5.2 on key benchmarks. MiniMax M2.5 is running on people's Mac Studios at 20 words per second, trading blows with frontier models like Opus 4.5 and Gemini 3 Pro. The gap between a $20/month cloud subscription and running that same intelligence on your own hardware has never been thinner.

The models are free. The tools are ready. The part most people get stuck on is figuring out which model their specific hardware can actually handle without crawling.

I spent weeks digging through benchmarks, community reports, and real-world results across every hardware tier for two of the most relevant open-source model families. What follows is exactly what runs where, how fast, and which model deserves a spot on your machine.

In this article:

Two Families
Efficiency Tier
GPU and Mac Mini Tier
The Final Boss
The Cheat Sheet
Your Machine, Your Model
What Comes Next

Two model families cover the entire spectrum from "runs on a phone" to "runs on a workstation" better than anything else available today.

Qwen 3.5 (by Alibaba) is the Swiss Army knife of open-source AI. Eight sizes from 0.8B to 397B parameters. Specialized variants for coding, vision, and reasoning. All Apache 2.0 licensed. Every local AI tool worth mentioning, Ollama, LM Studio, llama.cpp, Jan.ai, supports it out of the box. The latest generation dropped between February and March 2026 with a new Gated DeltaNet architecture, 262K-token context windows, and 201 languages. This is where most people should start.

MiniMax M2.5 is the ambitious one. 230 billion total parameters, but it only activates 10 billion on every response thanks to an extreme Mixture-of-Experts architecture (more on this soon). 200K native context window. The community at Unsloth compressed it from 457GB down to a 101GB file, making home deployment possible. For those with the hardware, it's frontier-class intelligence on your own desk.

You do not need an expensive GPU to run a language model locally.

You can use your travel laptop with integrated graphics, a base Mac Mini or your aging desktop to run a Qwen model at home.

Qwen3.5-4B (~2.5 GB at Q4 quantization) is the best quality at this size. Drafting emails, summarizing documents, light coding help, translation, private conversations that never leave your machine.

Based on community reviews, it's coherent and helpful in ways you wouldn't expect from a model this small. Qwen3.5-2B (~1.3 GB) is the sweet spot for CPU-only machines. Qwen3.5-0.8B (~0.5 GB) runs on anything with a CPU (like your phone).

These won't write your PhD thesis, but they're fast (40+ tokens per second on CPU), completely private, and the 4B punches well above its weight. Getting started takes one command:

ollama run qwen3.5:4b

Ollama downloads the model and then you're ready to chat. Use LM Studio if you prefer a GUI or, my favorite, Jan.ai if you want something prettier.

No API latency. No rate limits. You hit Enter and the answer starts flowing instantly. This tier is where people have the "wait, this is running on MY computer?" moment.

The hardware range here is wide. On the lower end: an RTX 3060 12GB, an RTX 4060 Ti 16GB, an RX 7800 XT, or a Mac Mini M4 Pro with 24GB. On the upper end: an RTX 3090, an RTX 4090, or a Mac Mini M4 Pro with 48-64GB.

Apple's unified memory works like VRAM for AI inference, so a 24GB Mac Mini sits in this tier right alongside a 24GB GPU. One rule applies across the board: the bigger the GPU and the more memory you have, the faster your tokens generate and the larger the model you fit.

Qwen3-8B (~5 GB) is a solid all-rounder that leaves tons of headroom on a 12GB card. Good for quick tasks and light conversations.

Qwen3-14B (~9 GB) is the Goldilocks model. Fits comfortably on 12-16 GB, and delivers top notch quality when you take its size into account. It's does a great job at coding, reasoning and creative writing. If you have the memory for it, this is where I'd recommend most people start.

Qwen3.5-35B-A3B (~18.6 GB) is the model that inspired me to write this article. It has 35 billion total parameters, but only 3 billion activate on every response. This is a Mixture-of-Experts model.

MoE models are built differently. Instead of one massive brain firing every neuron, think of it as a team of specialists. Ask a coding question and the coding experts light up. Switch to creative writing and a different set activates. The result: you get 35B-level intelligence at 3B speed and memory cost. Fits on 16GB with CPU offloading.

This MoE architecture is the same idea behind MiniMax M2.5, so keep that concept in mind as we move along.

Qwen3.5-27B (~17 GB) is the dense powerhouse at the top of this tier. Built for 24GB cards and 48-64GB Macs. All 27 billion parameters fire on every response, it supports 262K context across 201 languages, and it wins on reasoning and coding benchmarks against every model at this size. With 24GB of VRAM you still have plenty of headroom left for long conversations.

Qwen3-Coder-30B-A3B also deserves a mention here as its a dedicated coding model (also MoE, 3B active), rivaling Claude Sonnet 4 on SWE-Bench.

Speed across this tier ranges from 15 to 40+ tokens per second, depending on model size and your hardware. A 64GB Mac Mini M4 Pro runs the 27B at 15-25 tok/s and the 35B-A3B even faster thanks to MoE efficiency. A 24GB GPU pushes the smaller models past 40 tok/s. For reference, average human reading speed is roughly 250 words per minute, or about 5-6 tokens per second.

Worth noting for anyone planning to run models around the clock: the Mac draws about 30W under load compared to 300W+ for a GPU rig. Over months of use, the electricity savings add up.

Everything above was the warm-up. This is the final boss.

You need a Mac Studio with 128GB unified memory or a multi-GPU PC with 96GB+ RAM. The Mac Mini caps at 64GB, so it tops out at the GPU tier above.

MiniMax M2.5 takes the MoE concept to the extreme: 230 billion total parameters, 10 billion active per response. A 200K native context window that handles entire codebases, full novels, or months of transcripts in one conversation.

Mac Studio 128GB is the ideal setup. No bottleneck between GPU and CPU since it's all one memory pool. Community benchmarks: 20-25 tok/s. PC with dual GPUs + 96GB RAM works through CPU offloading. Slower (12-25 tok/s) but functional.

The key number: 101GB. Unsloth's 3-bit GGUF (UD-Q3_K_XL) compresses the model from 457GB to 101GB with minimal quality loss.

Start with 16K-32K context and scale up. Enable flash attention and CPU-MOE offloading.

Do all that and you'll get frontier-class intelligence and massive context entirely on your hardware. No API costs, no data leaving your machine, no rate limits. For lawyers, researchers, or developers handling sensitive work, this is the endgame of private AI.

For developers at this tier: Qwen3-Coder-480B-A35B is the most capable open-source coding model available (that you can run at home). 480B total parameters, 35B active, 69.6% on SWE-bench Verified, comparable to Claude Sonnet. It needs 240GB+ at Q4, so a Mac Studio with 192GB or a multi-GPU server setup is the minimum. If you write code for a living and have the hardware, this is the local Copilot replacement to end all replacements.

Looking ahead: MiniMax M2.7 launched March 18 with strong coding benchmarks (56.2% SWE-Pro, 97% skill adherence across 40+ tasks), but weights are proprietary. You can't run it locally yet. MiniMax M3 is expected to add multimodal capabilities (text, images, video). M2.5 is text-only, which is its biggest gap compared to Gemini Flash or GPT-5.4 Mini. If M3 ships open-weight, it becomes a direct competitor to those cloud-only models on home hardware.

Find your hardware, grab your model, go.

The cheat sheet above gets you close, but VRAM estimates are estimates. Your exact hardware, OS, and background apps all matter. Reddit gives conflicting advice. YouTube benchmarks were run on different machines.

This is why I'm building the AI Hardware Checker. It's a website where you plug in your hardware details, your GPU, your RAM, and it tells you exactly which AI model fits your setup, what settings to use, and what speed to expect.

It's not live yet. I'm actively building it right now. And I want to build it around real hardware owned by real people.

What needed a datacenter two years ago runs on a gaming PC today. What runs on a gaming PC will run on a phone tomorrow. MoE architectures, Gated DeltaNet, aggressive quantization. The field is sprinting toward "run anywhere."

Qwen and MiniMax are the beginning. MiniMax M2.7 is already here (API-only), M3 with multimodal is on the horizon. The walls between cloud AI and local AI are dissolving.

The best time to start was last year. The second best time is right now.

OpenClaw vs Claude Cowork vs Perplexity Computer - Which AI Agent Actually Fits Your Life

cucoleadan — Tue, 17 Mar 2026 13:11:58 +0000

You have a rare opportunity.

Both Anthropic and OpenAI are running 2x usage limits on their paid plans right now. Claude Pro subscribers get double the capacity across Claude AI, Cowork, and Claude Code. ChatGPT Plus subscribers get the same bump across ChatGPT and Codex, which is awesome since you can use your ChatGPT Pro account in OpenClaw.

Same price, twice the output. These promos won't last, and they've given me the perfect excuse to push all of these tools harder than I normally would.

And I needed that, because three weeks ago the AI agent space exploded.

In late February, the AI agent space stacked up fast. OpenClaw shipped a major security and reliability update. Days later, Anthropic launched scheduled tasks for Claude Cowork and Perplexity dropped their Computer product on the same day. Three different companies, three different visions of what an AI agent should be, all landing within the same few weeks.

I've been running OpenClaw on a cloud server for months now. Claude Cowork is my daily driver for local work. I've spent a week with OpenAI's Codex after it launched its Windows app. And I've done a ton of research on Perplexity Computer, watching head-to-head comparisons, reading reviews, and studying how it stacks up against the tools I use daily.

Based on all of that, I wrote this so you can make an informed decision and pick the right tool for your specific use case.

If this kind of breakdown saves you time, follow me X for more like it.

In this article:

The Lineup
Where It Runs
Head to Head
What It Costs
Who Sees Your Data
Pick Yours
Where This Is Going
One More: Codex

The Lineup

Here's the 30-second version of each tool. They look similar on paper but they're built on fundamentally different philosophies.

OpenClaw is a self-hosted, always-on AI agent. Open-source, 319,000+ GitHub stars, recently "acquired" by OpenAI. You install it on a VPS or a Mac Mini, and it runs 24/7. You talk to it over Telegram, WhatsApp, Discord, or whatever messaging platform you prefer. Think of it as a remote employee who never clocks out. You bring your own API keys, your own models, your own rules.

Claude Cowork is a local co-pilot that lives inside Claude Desktop. $20/month on any paid Claude plan. Best UI of the three by a wide margin. The recently added scheduled tasks feature turned it into a lightweight automation tool, letting you set up daily briefings, weekly reports, and recurring workflows. The catch: it only runs while your computer is awake and the app is open.

Perplexity Computer is a cloud-based multi-model orchestrator. It takes your task, breaks it into subtasks, and dispatches them across 19 specialized AI models: Claude Opus for reasoning, Gemini for research, GPT-5.4 for long-context work, Grok for quick tasks. It was locked behind the $200/month Max plan at launch, but it's now available to Pro users ($20/month) through usage credits. Over 400 app integrations including Gmail, Notion, Slack, and Salesforce.

Where It Runs

Every comparison I've read focuses on features, pricing tiers, model benchmarks. None of those tell you as much as one thing: where the agent runs.

This is the fork in the road. Security, reliability, cost structure, and what the tool does for you all flow from this single architectural decision.

On your machine. That's Claude Cowork. Easiest to start. Most polished experience. You install the desktop app, point it at a folder, and you're working. But it dies when your laptop sleeps. Scheduled tasks only fire while the app is open. If you close your lid at 6 PM, your "daily overnight report" never runs. Great for working-hours automation. Terrible for overnight agent dreams.

On your server. That's OpenClaw. True autonomy. It runs while you sleep, eat, and go on vacation. I've had mine running on a Hetzner VPS for months and it processes tasks at 3 AM without me lifting a finger. But you're now a sysadmin. Updates break things. Security is your responsibility. In independent testing it scored a 4 out of 10 for ease of setup. That number feels generous on a bad day.

In someone else's cloud. That's Perplexity Computer. Always on, zero maintenance, no servers to manage. But you're paying a premium for that convenience, and your data flows through Perplexity's infrastructure. With Pro credits now available, the entry barrier dropped significantly. Heavy users will still feel the cost.

I chose to run OpenClaw and Cowork together because I need both modes. If I had to pick only one, the answer depends entirely on whether I need an agent that works while I don't. If you need something running overnight, it's OpenClaw or Perplexity. If your AI work happens during business hours, Cowork is more than enough.

Head to Head

Enough architecture. Here's how they perform across the three categories that matter most.

Research and Reports

Perplexity Computer wins here, and it's not particularly close. The multi-model orchestration means it cross-references sources across different AI systems, and every output includes clickable citations with links in the footer. In one head-to-head test I studied, both tools were asked to research AI invoice automation tools and compile a one-page PDF comparison. Perplexity came back with precise pricing data, clean formatting, and source URLs you could verify in seconds.

Cowork produced a solid report but broke the one-page constraint, got a team plan price wrong, and listed sources by name without linking them. Still usable, about 90% as good, but that missing 10% is the part that matters when you're sending deliverables to a client.

OpenClaw handles research, but it's manual. You prompt, it fetches, you verify. There's no orchestration layer coordinating multiple models behind the scenes. It works. It's slower and requires more hand-holding.

Verdict: If research is your main job, Perplexity Computer earns its price tag. For everyone else, Cowork is good enough.

Working With Your Own Files

Cowork dominates this category. It reads your local filesystem directly. Point it at a folder and it references, reviews, or builds on anything inside it. No uploads, no API overhead, no friction.

I tested Cowork with messy client intake notes: 10 inconsistent text files with different formats, missing fields, and contradictory information. Cowork parsed them all, identified the real business bottleneck for each client, and output a clean structured spreadsheet. In the same test run by a reviewer, Perplexity handled the task but required manual file uploads for each one. It also made a weaker recommendation, suggesting ChatGPT for a client whose real problem was a bad website.

OpenClaw can access server files natively, and if you've built the Nextcloud bridge I wrote about previously, it can reach anything your local AI can reach too. Different path to the same destination.

Verdict: If your work lives in local files, Cowork. If it lives on a server, OpenClaw. If it lives in cloud apps like Gmail, Notion, or Google Sheets, Perplexity Computer's 400+ integrations give it the edge.

Automation and Scheduled Tasks

This is where the "where does it run" question hits hardest.

OpenClaw was built for this. Heartbeats, cron jobs, always-on triggers, multi-platform messaging. It runs whether you're awake or not. I have mine sending me a Telegram summary of my project pipeline every morning at 9 AM. I don't touch anything. It shows up.

Cowork now has scheduled tasks and they work well when they work. Daily, weekly, hourly, on-demand. The UI for managing them is cleaner than anything OpenClaw offers. But the limitation is real: they only execute while your computer is on and the app is open. If your laptop is closed, the task simply doesn't run.

Perplexity Computer can handle multi-hour and even multi-day workflows autonomously in the cloud. For long-running research or complex multi-step tasks, the results are strong. But you're paying cloud prices for that always-on capability.

Verdict: True 24/7 automation means OpenClaw or Perplexity. For business-hours automation, Cowork handles that well.

Know someone choosing between these tools? Share this with them.

What It Costs

Sticker prices lie. Here's what each tool costs when you're using it daily.

Claude Cowork: $20/month. Simplest math of the bunch. One subscription, everything included. No API keys, no hosting bills, no infrastructure to maintain. This is the "I want it to work, no setup" option. And right now, Anthropic is running 2x usage limits on all paid plans. Double the Cowork capacity for the same $20. If you've been waiting to try it, this is the window.

OpenClaw: $0 software + $6-50/month real cost. The software is free and open-source. But you need somewhere to run it. A VPS costs $6-24/month depending on specs. API calls for models like Opus, Gemini Pro, or GPT-5.4 add another $10-200/month depending on how heavily you use it. Light users land under $20 total. Power users blow past $200 easily.

The hidden cost is your time. Setup, maintenance, debugging when an update breaks something. OpenClaw is free in dollars but expensive in hours. I spent an entire evening once debugging a permissions conflict between OpenClaw and Nextcloud on the same server. That evening had a price even if my credit card didn't see it.

Perplexity Computer: $20/month (Pro with credits) or $200/month (Max). The Pro credits path is new and it changes the equation. You get access to the full multi-model orchestrator without the $200 commitment. But heavy workflows eat through credits fast. Max at $200/month is aimed at professionals whose time is worth more than the subscription: consultants, researchers, analysts working on problems where a single good report saves hours of manual work.

The value verdict. Cowork wins on pure value for most people. OpenClaw wins if you already have a VPS and enjoy tinkering. Perplexity Computer's Pro tier makes it worth trying, but Max is only justified if it saves you measurable hours every week. And with both Claude and Codex running 2x promos simultaneously, mid-March 2026 is the cheapest stress-test window you're going to get. Take advantage before it ends.

Who Sees Your Data

This is the part that nobody wants to talk about but everybody should.

OpenClaw has full system access by design. It can read, write, and execute anything on the host machine. In independent testing it scored a 3 out of 10 on security. The low score comes from the architecture itself.

The tool works precisely because it's unconstrained. But if you misconfigure it, everything on that machine is exposed. Your business plans, client files, API keys, all accessible.

Self-hosting means you own the risk and the control.

This is why I run OpenClaw behind my own server with a Nextcloud layer on top. Full control over every file, every model, every connection. But I wouldn't recommend that setup to someone who isn't comfortable managing a VPS and thinking about access control lists.

Claude Cowork is sandboxed to your working folder. It scored a 9 out of 10 on security in the same evaluation. Anthropic handles the model infrastructure, you handle your files. It's the safest option out of the box by a wide margin. If security keeps you up at night, this is your tool.

Perplexity Computer is cloud-based with 400+ integrations. Your data flows through Perplexity's infrastructure and gets routed across multiple model providers. If you trust them, great. If you've built your workflow around data sovereignty and self-hosting, this is going to feel uncomfortable.

Pick Yours

Find yourself in 10 seconds.

Choose Claude Cowork if:

You're non-technical or want zero setup friction
You already pay for Claude Pro, Max, or Team
Your work is mostly local files, writing, and brainstorming
You want scheduled tasks that run during working hours
Budget matters and $20/month all-in sounds right

Choose OpenClaw if:

You want a 24/7 agent that works while you sleep
You're comfortable with a VPS and basic server management
You need mobile access via Telegram or WhatsApp
You want full control over which models you use, where your data lives, and how everything connects
You like open-source and building your own stack

Choose Perplexity Computer if:

Your work is research-heavy and citation quality matters
You need multi-model orchestration across providers
Your workflow already lives in cloud apps like Gmail, Notion, Sheets, or Slack
You want autonomous multi-day task execution without managing infrastructure
You're a Pro user willing to try it with credits, or a professional where $200/month pays for itself in saved time

Use more than one if:

You have different modes of work. Deep focus, always-on automation, and heavy research are three different jobs. These tools aren't mutually exclusive. I run OpenClaw on my VPS for overnight tasks, Cowork locally for writing and brainstorming, and bridge them through Nextcloud so both AIs share the same files. Different tools for different jobs, one shared brain.

Which combo are you running? Drop a comment, I read every one.

Where This Is Going

Three major platforms shipped competing agent features in the same few weeks. All of them converging at once tells you where the industry is heading.

The "build a SaaS wrapper" era is ending. Scheduled email briefings, automated research reports, CRM workflows, client intake processing. These tools do all of that out of the box now. Dozens of startups built businesses around features included in a $20/month subscription.

The surviving play is learning which agent to deploy for which task, how to connect them, and how to make them share context across platforms. The people who figure out how to wire these tools together will have the real edge.

Pick the right combination of agents, matched to how you work, and you'll move faster than anyone stuck choosing sides.

One More: Codex

I've only been using Codex for a week, so it's not in the main comparison. But it deserves a mention because it's solving a different problem entirely.

Codex is OpenAI's cloud-based coding agent, powered by codex-1, a version of o3 optimized specifically for software engineering. It comes in three flavors. Codex Web is the autonomous cloud version: you give it a task, it spins up a sandboxed environment, works for 1 to 30 minutes, and comes back with a pull request. You can fire off multiple tasks in parallel.

Codex Desktop launched on Windows recently, which is how I've been running it, as a native app on my machine alongside Claude Desktop. Codex CLI is the open-source command-line version, similar in spirit to Claude Code.

Codex stays in one lane: software engineering. Writing code, fixing bugs, answering codebase questions, and proposing pull requests. Where the main three tools try to be general-purpose assistants, orchestrators, or always-on agents, Codex focuses on code and nothing else.

It's included with ChatGPT Plus at $20/month with usage limits. Pro at $200/month gets 6x the capacity.

And right now, OpenAI is running 2x usage limits on Plus, so you get double the Codex tasks for the same $20. Pair that with Anthropic's own 2x promo on Claude, and mid-March 2026 is the best window to trial both ecosystems side by side.

My early impressions after a week on the Windows app: the cloud execution model is the standout feature. Fire off a task and go do something else. No terminal babysitting. I kicked off three bug fixes simultaneously and reviewed them all when they came back.

The sandboxed environment means it can't break your local setup, which is a relief compared to OpenClaw's "full access to everything" philosophy. And it cites terminal logs and test outputs as evidence, so you can trace exactly what it did and why.

But it's early days. I haven't stress-tested it on complex multi-file refactors yet, and I don't know how it handles edge cases in large codebases. For developers, it fills the same focused coding role in your stack. For non-developers, it won't add much.

I'll write a full breakdown once I've spent more time with it. For now: if you're already paying for ChatGPT Plus, you have access. Go try it while the 2x limits last.

If this helped you pick the right tool, follow for more real-world breakdowns like this one.

DEV Community: cucoleadan

My Hermes AI Agent Maintenance Routine For Maximum Reliability

After Hermes Install

Cron Prompts Beat Commands

Three Maintenance Layers

Daily Hermes Health Check Prompt

Weekly AI Agent Drift Review Prompt

Monthly Hermes Assumptions Review Prompt

Approval Gate for Maintenance Jobs

Quiet Agent Failures

How to Roll Out the Routine

Failure Limits

Bottom Line

I Tested 6 AI Plans to Find What $5, $10 and $20 Get You

The One Test That Picks Winners

$5: Where Most People Get It Wrong

$10: The Real Starting Line

$20: Brands You Know, Limits You Don't

Where Plans Hit the Wall

The Only Math That Matters

Here Is What I Would Buy

Bottom Line

Source Notes

When to Use MCPs, CLIs, or Your Own Tool

In this edition:

MCP vs CLI: Asking a Better Question

When to Start With CLI for Local AI Workflows

When to Reach For MCP At The Boundary of External Data

Build The Bridge Yourself

The Rule I Use

60-Second Tool Test

More Control, Less Clutter

Learn how to use Claude Code for free with OpenCode Zen models by deploying a Cloudflare Worker proxy and configuring third-party inference.

How to Run Claude Code for Free with OpenCode Models

How to Run Claude Code for Free with OpenCode Models

In this article:

Deploy the Worker in Cloudflare First

Configure Claude Desktop to Use OpenCode Zen

Start with Free OpenCode Zen Models

Choose /zen for Free Models and /go for OpenCode Go

Why This Route Instead of OpenRouter or Ollama?

Test Claude Code Safely in a Throwaway Folder

Switch to OpenCode Go When the Free Lane Stops Being Worth It

How This Also Works with Claude Cowork

Use This 10-Minute Checklist to Get Started

FAQ

Can I use Claude Code for free?

Is Claude Code in VS Code free?

How do I get Claude Code credits for free?

How do I use Claude Code free forever?

External Sources Worth Checking

How to Add Approval Gates to Your Hermes Agent

A Checkpoint Is Not a Roadblock

The Three Gates You Need

Gate #1: The Send Gate (Start Here)

Gate #2: The Change Gate (Level Up)

Gate #3: The Spend Gate (Advanced)

To Gate or Not to Gate

The Three Mistakes People Make

Trust the Process but Keep the Net

How My Hermes Agent Plans My Morning Before I Have My Coffee

You Are Not Lazy, Just Constantly Interrupted

What an Actually Good AI Briefing Looks Like

Layer 1: Build Your First Briefing in 20 Minutes

Layer 2: The Substack Digest Email

Layer 3: Plug in Slack, Jira, GitHub, or Anything with an API

Information Without Action Is Just Noise

How to Keep Your Briefing Brief

Less Cognitive Load Is the Real Payoff

The Agentic Engineering Shift

What Karpathy Named

The Spectrum

The Practices

The 60-Second Test

The $30 Hermes Stack That Makes Claude Max Look Like a Ripoff

In This Article:

Hermes Is the AI Agent OpenClaw Promised to Be

Ditch Your Subscriptions and Run Open Source AI on Your Device

OpenClaw vs Claude Cowork vs Perplexity Computer - Which AI Agent Actually Fits Your Life

The Lineup