cucoleadan

Posted on May 20 • Originally published at vibestacklab.substack.com on May 19

I Tested 6 AI Plans to Find What $5, $10 and $20 Get You

#ai #plans #hermes #stack

This post was originally published on my Substack publication as I Tested 6 AI Plans to Find What $5, $10 and $20 Get You.

A little while ago, I built a multi-step workflow in Hermes to generate a ten-page report that would get stronger with each pass over the document. It checked the latest news, read through Reddit threads, cross-checked with X, and then read through a bunch of internal documents.

For most of the run, it worked the way I wanted, and Hermes kept moving the file forward while pulling in the context it needed and holding onto the thread of the job.

By the time it reached the last stage, somewhere around the fourteenth tool call, it already had the material it needed and only had to stay coherent long enough to verify the details and write the final section cleanly into the file.

Then it just stopped in the middle of the edit. It retried enough times to trigger a context reduction right when the report needed the fullest possible view of everything that had already happened. The fact that I had to step back in and rebuild the whole thread was extremely annoying and the reason why I decided to write this article.

That was also the moment I started focusing on reliability rather than judging AI plans by the model menu.

Pricing pages encourage you to compare plans by the names they advertise, but Hermes forces a more practical question, which is whether a plan can carry real work through a messy session without handing it back to you halfway through.

Once I started looking at plans that way, I cared a lot less about whether a subscription included a famous model and a lot more about whether Hermes could finish the work before my own attention became the most expensive part of the workflow.

I have paid for enough AI accounts to know how misleading a low sticker price can be. A five-dollar plan stops feeling cheap the moment it burns an hour of focused work.

On top of that, most twenty-dollar plans feel like they should come with far more usage than their cheap alternatives, but that is usually not the case. Looking at you, Anthropic.

That's the frame for this piece. I rechecked the official pricing pages on May 19, 2026, and I want to show you these prices through an AI agent lens rather than through their sales copy.

In this article:

Why model names and benchmark scores are the wrong way to judge an AI plan
How one $5 plan became my daily driver after I fixed my routing
Why the $10 tier is where most plans start to make real sense
What the big brand names ($20 tier) actually limit once you push them
Where plans break mid-session and how cost per useful hour flips the math
The exact stack I would buy today and which plans I would skip

The One Test That Picks Winners

Benchmarks tell you how a model performs in isolation, but Hermes shows you something much harder to fake, which is whether a plan stays useful once the session fills with tool calls, file reads, and the usual clutter that comes with trying to finish real work.

My test now feels much simpler than any leaderboard, because all I really have to do is give Hermes one job from a normal week and watch how much of my own attention it gives back to me by the end.

If Hermes gets to a result I can keep, the plan earns its place. If the session breaks, the model loses the thread, or I have to step back in for cleanup, the plan gets more expensive no matter how cheap the subscription looked when I bought it.

$5: Where Most People Get It Wrong

The five-dollar tier starts with OpenCode Go, and it stands out immediately as it's the only subscription I found that gives you a real first month instead of a throwaway trial.

Right now, OpenCode Go is $5 for the first month and $10 after that, and it works in Hermes by default, which matters because it feels like a provider route built for agents instead of a chat plan stretched into agent work after the fact.

What changed my view of this plan is that it did not stay a cheap side route for long. It became my daily driver, even during the stretch when I was still paying for three subscriptions just to keep up with my usage.

At the time, the real problem was not the plan itself but the way I was using it, because I kept pushing the same model through every kind of Hermes task and expecting it to behave well no matter what the work looked like.

For a while I ran Qwen 3.6 Plus for almost everything, and that worked badly enough that I ended up compensating with more subscriptions instead of better routing.

The setup only started to make sense once I matched the model to the job, with DeepSeek V4 Flash and V4 Pro taking most of the regular Hermes work while Gemini 3.1 Flash Lite via OpenRouter handled image analysis more cleanly than the routes I had been forcing before.

OpenCode Go became much more useful once I stopped treating one model like a universal answer and started treating the plan like a routing layer for different kinds of work.
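To make the routing idea concrete, here is a minimal sketch of what "match the model to the job" can look like as a lookup table. This is not Hermes configuration syntax. The task names, the split between Flash and Pro, and the `pick_model` helper are all assumptions made up for illustration; only the model names come from the setup described above.

```python
# Hypothetical routing table. Task kinds and the Flash/Pro split are
# illustrative assumptions, not Hermes or OpenCode Go settings.
ROUTES = {
    "image_analysis": "Gemini 3.1 Flash Lite (OpenRouter)",
    "quick_edit": "DeepSeek V4 Flash",
    "long_research": "DeepSeek V4 Pro",
}

# Fallback used when a task kind has no explicit route.
DEFAULT_ROUTE = "DeepSeek V4 Flash"


def pick_model(task_kind: str) -> str:
    """Return the model label for a task kind, or the default route."""
    return ROUTES.get(task_kind, DEFAULT_ROUTE)


if __name__ == "__main__":
    for kind in ("image_analysis", "long_research", "weekly_cleanup"):
        print(f"{kind} -> {pick_model(kind)}")
```

The point of writing it down is less the code than the habit: if every task kind maps to the same model, the plan is being used as a single-model subscription rather than a routing layer.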

I still think the five-dollar month is the right place to learn this lesson, since it is cheap enough to experiment with and real enough to show you very quickly whether your workflow is efficient or just patched together.

$10: The Real Starting Line

The $10 tier is where most of these plans start to feel normal, since the $5 and sub-$5 options are mostly gone now outside of special promos.

That is also the first tier I would take seriously for regular Hermes use.

After the first month, OpenCode Go lands here at its regular price, and MiniMax Token Plan Starter shows up at the same $10 with 1,500 M2.7 requests every 5 hours.

On paper, that sounds like a clean comparison. In practice, I care much less about the headline limits and much more about what the workflow feels like once Hermes is doing the work.

MiniMax Starter gives you a dedicated M2.7 bucket, which is useful if you already know that model is good enough for most of your week and you want limits that are easy to reason about.

OpenCode Go works differently, since it gives you a shared routing budget across several model families, and that can look better or worse depending on what kind of week you're having.

If you mostly run MiniMax M2.7 through Go, the published estimates are higher at around 3,400 M2.7 requests every 5 hours for the same monthly price, so it can look cheaper than MiniMax Starter on raw throughput alone.
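For anyone who wants the raw throughput comparison spelled out, a small sketch is below. It uses only the figures quoted above (1,500 requests per 5 hours for MiniMax Starter, roughly 3,400 for M2.7 through OpenCode Go, both at $10 per month). The Go number is a published estimate, not a guarantee, and the function name is my own.

```python
# Figures as quoted in this article: (monthly price in USD, M2.7 requests per 5-hour window).
# The OpenCode Go value is an estimate and assumes the whole budget goes to M2.7.
PLANS = {
    "MiniMax Starter": (10, 1500),
    "OpenCode Go (M2.7 only, estimate)": (10, 3400),
}


def requests_per_dollar(monthly_price: float, requests_per_window: float) -> float:
    """Requests available per 5-hour window for each dollar of monthly price."""
    return requests_per_window / monthly_price


if __name__ == "__main__":
    for name, (price, requests) in PLANS.items():
        print(f"{name}: {requests_per_dollar(price, requests):.0f} requests per window per dollar")
```

On these numbers Starter works out to 150 requests per window per dollar and Go to about 340, which is the "cheaper on raw throughput" claim in arithmetic form.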

Still, that is not what would decide it for me.

I would judge the whole tier by loop quality more than by the model list or benchmarks. Sometimes I hit 503 errors on Qwen 3.6 Plus through OpenCode Go, and other times the tokens per second I got through Go were clearly better than what I was getting from MiniMax directly. And I absolutely hate waiting for AI to answer. I'd rather have a faster model than a smarter model, but that's just personal preference.

What matters most to me is whether it keeps moving after the first answer, uses tools cleanly, and keeps its replies short enough that the session stays readable while the work is still in progress.

$20: Brands You Know, Limits You Don't

The $20 tier is where the familiar companies start showing up.

OpenAI and Anthropic are the obvious ones, because they are the subscriptions most people already know. Ollama belongs in the same conversation for a different reason, as it's one of the few open-model companies that already feels big enough to sell a hosted plan without sounding like a side project.

That matters because this tier is not only about extra usage. It is also about how much trust people attach to the company behind the plan, and whether that trust survives contact with the actual limits.

ChatGPT Plus is the default benchmark. OpenAI lists Plus at $20 per month, says it gives higher GPT-5.5 limits inside ChatGPT, and keeps API usage separate from the subscription.

You can count Plus in the real stack because Hermes supports OpenAI Codex through ChatGPT OAuth, but the plan still buys ChatGPT access rather than API credit. The limit story is also less generous than the branding makes it feel. OpenAI says Plus users can send up to 160 GPT-5.5 messages every 3 hours, and manual GPT-5.5 Thinking has a weekly limit of up to 3,000 messages. That is fine for normal chat use. It starts looking smaller once you lean on it harder.

Claude Pro has the same advantage and the same problem. Anthropic is a big enough name that people do not need much convincing to try the plan, and Claude is useful enough that plenty of people will keep paying for it anyway. The issue is that the limits are nowhere close to generous for heavy use.

It's just easy to run into the ceiling faster than the $20 price tag suggests, especially once you lean on Sonnet for real work.

Ollama Cloud Pro is more interesting to me because it is not trying to be ChatGPT or Claude. Ollama lists Pro at $20 per month or $200 per year, with larger cloud models, 50x more cloud usage than Free, and three concurrent cloud models.

That sounds strong until you compare how the limit story is presented next to OpenCode Go. OpenCode Go tells you the five-hour, weekly, and monthly caps directly, including a monthly ceiling of $60. Ollama tells you usage is mostly GPU time, gives you five-hour and weekly resets, and lets you run three cloud models at once, but it does not spell out a monthly limit on the pricing page. That makes the plan harder to reason about.

The three-model ceiling also matters more in Hermes than it would in a normal chat app. If you mostly run one agent at a time, it probably feels fine. If you like concurrent agents, background runs, or separate research and writing loops happening together, three can start feeling smaller than the headline suggests.

So yes, Ollama Pro looks good. It is just not automatically better than Go once you care about legibility, concurrency, and what the plan looks like over a full month instead of over a good afternoon.

Nous Portal Plus is less mainstream than OpenAI, Anthropic, or Ollama, but it still deserves the slot because it fits Hermes more naturally than most of the bigger brands. Nous lists Plus at $20 per month with 300+ models, hosted tool usage, and $22 in monthly credits with rollover. I felt that I should include this because they are the team who created Hermes after all.

MiniMax Token Plan Plus is still the simplest volume play. MiniMax lists Plus at $20 per month with 4,500 M2.7 requests every 5 hours plus speech and image quotas. If M2.7 already works for your Hermes load, that is a very direct way to buy more room.

These plans do not sell the same thing, and the difference only shows up once Hermes starts leaning on the plan instead of just chatting through it.

Where Plans Hit the Wall

Hermes exposes plan limits in the middle of real work instead of at the edge of a chat.

A chat cap is annoying when you are asking questions. The same cap inside Hermes can land in the middle of a file edit, a research loop, or a tool run that was finally starting to cohere. Then you lose more than a reply. You lose the state of the job and pay for it again in the next session.

Fallback models create a quieter version of the same mess. A session starts on one route and ends on another, and you can feel it even before you check the model picker. Instruction following gets softer. The agent stops being careful with the same tool path it was following ten minutes earlier.

Tool use is still the cleanest divider for me. A model can sound impressive in a chat window and still be weak inside an agent loop. If it avoids reading files, skips verification, or acts allergic to tools, I do not care how good the brand or benchmark looks. The less glamorous route that checks its work often finishes more jobs per dollar.

Memory changes the value of a plan too. Hermes only starts to feel useful once it can carry a project forward across sessions. If the provider leaves you with a morning reset, the agent never really joins the work. It just keeps reintroducing itself.

That is also why the OpenClaw to Hermes migration mattered so much to me. I was not looking for a smarter chat app. I wanted something that could keep the work moving without making me rebuild the thread every time.

Latency has its own cost. A slow model is fine for overnight cleanup or background chores. It gets expensive the moment you are thinking with the agent in real time and waiting for the next useful move.

The Only Math That Matters

The metric I keep coming back to is cost per useful Hermes hour.

I like it because it is boring enough to be honest.

cost per useful hour = monthly plan cost / Hermes hours that ended in usable work

If a $5 plan gives you ten clean background hours, it is excellent.

If that same plan burns one focused afternoon because Hermes stalls in the fragile part of the job, the cheap price was fake.

A $20 plan can still be the cheaper one if it finishes the sessions you would otherwise have to rescue.

I would not build a dashboard for this. One line in your notes after each session is enough. Write down the plan, the job, and whether Hermes finished without babysitting.
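If those one-line notes end up in a file anyway, the formula is small enough to script. The sketch below is one possible shape, not a tool I use: the `Session` record and its `hours` field are assumptions added so the division has something to work with, since the note itself only needs the plan, the job, and whether Hermes finished.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    plan: str
    job: str
    hours: float      # assumed field: rough length of the session
    finished: bool    # True if Hermes finished without babysitting


def cost_per_useful_hour(monthly_cost: float, sessions: List[Session]) -> Optional[float]:
    """Monthly plan cost divided by the hours that ended in usable work.

    Returns None when no session produced usable work, because the
    ratio is undefined in that case.
    """
    useful_hours = sum(s.hours for s in sessions if s.finished)
    if useful_hours == 0:
        return None
    return monthly_cost / useful_hours


if __name__ == "__main__":
    log = [
        Session("OpenCode Go", "background cleanup", 10.0, True),
        Session("OpenCode Go", "final report edit", 4.0, False),
    ]
    print(cost_per_useful_hour(5.0, log))
```

With ten clean hours on a $5 plan the result is $0.50 per useful hour. The stalled four-hour session does not change that figure, which is the metric's blind spot: it counts the subscription cost, not the afternoon you lost, so the rescued sessions are still worth reading on their own.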

After a week, the pattern usually gets obvious. OpenCode Go might end up doing the background work. MiniMax might carry more of the daily load than you expected. Nous might keep its place because the tool gateway removes setup friction. Ollama might stay as the open-model cloud route. ChatGPT and Claude might remain in the stack because they are still where you think best before sending the work back into Hermes.

That is enough to make the decision. The goal is to stop paying for subscriptions without knowing what job each one is there to do.

Here Is What I Would Buy

If I were rebuilding this stack today, I would still start with OpenCode Go and give it the boring work first.

That is the cheapest place to learn whether the workflow is efficient or just being propped up by extra subscriptions.

I would keep fragile sessions away from it until it earned trust. That leaves cleanup, first-pass research, low-risk drafts, and the kind of work that is useful when it lands but not painful if it misfires.

Once the first month ended, I would treat the $10 tier like the real test. OpenCode Go at full price and MiniMax Starter both deserve a normal week before I let a $20 brand into the stack on reputation alone.

After that, I would only pay for a $20 plan if I knew exactly why it was there. ChatGPT Plus belongs if the ChatGPT or Codex lane matters enough to keep. Claude Pro belongs if Claude is still where the best writing or dev work happens, even with the limits. Nous sits closest to native Hermes work. Ollama Pro belongs if I want the open-model cloud lane and can live with the three-model ceiling. MiniMax Plus is the straightforward volume upgrade if M2.7 is already carrying real work.

That is less satisfying than picking one winner. It is also closer to how the work behaves.

Different jobs deserve different routes. Background chores do not need the same plan as the sessions where one bad restart can waste half an afternoon.

Bottom Line

The cheapest AI plan is the one that gives Hermes work you would keep.

A $5 route is great when it clears background noise. A $10 route is where I would test daily Hermes usage. A $20 route only earns its place when it gives you something the cheaper paths do not, whether that is better fit, clearer limits, or a route you trust enough to use for harder work.

The wrong plan steals focus at any price.

Before you buy another subscription, look at your last ten Hermes sessions. Mark the ones that ended in usable work. Mark the ones you had to rescue. Then ask which plan helped the work move forward and which one only looked cheap on the invoice.

That becomes the buying decision.

I would rather pay for one route that finishes the work than keep juggling three subscriptions that still need me to manage them.

Source Notes

OpenCode Go lists the $5 first month and the $10 monthly price after that. The page also covers any-agent use and current request allowances.

ChatGPT Plus lists $20 per month, app-level Plus benefits, and the note that API usage is billed separately. OpenAI API pricing lists GPT-5.5 and GPT-5.4 token pricing outside ChatGPT subscriptions.

Ollama Cloud pricing lists Pro at $20 per month or $200 per year. The same page covers three concurrent cloud models and usage measurement based mainly on GPU time.

Nous Portal lists Plus at $20 per month with 300+ models and hosted tool usage. It also lists the $22 monthly credits and rollover rules.

Claude Pro lists Pro usage behavior and resets, while Anthropic API pricing lists Claude API prices separately from Pro.

MiniMax Token Plan lists Starter at $10 per month with 1,500 M2.7 requests per 5 hours and Plus at $20 per month with 4,500 M2.7 requests per 5 hours.

Hermes AI Providers lists the relevant provider paths for Nous Portal and OpenAI Codex. It also covers OpenCode Go and Anthropic.