Kristiyan Stoyanov

Posted on Jun 23

The AI agent habit that was quietly wasting my time and tokens

#ai #agentskills #agents #hermes

I realized I had been using AI agents in a way that looked smart but was actually pretty wasteful.

The pattern was simple: I would ask the agent for something useful, it would go off and figure it out, and eventually I would get an answer. The problem is that if you keep asking the agent to rediscover the same process over and over, you are paying for repeated reasoning, repeated tool usage, and repeated trial and error. That means more tokens, more latency, and more opportunities for the agent to fumble.

What finally clicked for me was this: use LLM inference for decisions, not for repetition.

If a task has already been figured out once, I do not want the model burning context and tool calls to solve it again every time. I want the model to recognize the task, use a reliable tool, and move on.

That is the pattern I have been using with Hermes, and it has made my local agent setup much more useful.

The Setup I Am Running

Right now I am running Hermes on a DGX Spark. In the video, I show the machine with 128 GB of unified memory, and at that moment I had about 1 GB free because I had a quantized Qwen 3.5 model loaded locally.

Hermes is my current agent framework of choice. I have tried other options, but Hermes has been easy to install and easy to live with. One thing I especially like is that it supports Telegram through a gateway, so I can talk to my agent from my phone instead of only from a terminal window.

On the tool side, the ones that matter most for this workflow are:

Web search and scraping
Terminal access
File operations
Code execution
Sub-agent delegation

For web search, I am using Tavily. In the video, I mention the free tier gives about 1,000 requests per month, which is enough for experimentation but still limited enough that I notice when an agent wastes calls.

That matters, because this whole post is really about reducing unnecessary tool usage.

The Wasteful Version

I started with a normal prompt:

What is the weather going to be like in Sofia this weekend? Help me plan some activities based on it.

That is exactly the kind of thing I would send to an agent from Telegram while I am on the move.

Hermes did eventually answer, but the trace was the important part. It ran a web search, then a web extract, then checked the date and time, stumbled a bit, and searched again because the first search had not returned what it wanted. In the video, I call out the real cost: this simple request filled about 20k tokens of context.

And that is the issue.

The answer was fine. The process was not.

If I ask for weather and activity suggestions regularly, I do not want the model improvising a mini research project every single time.

The Better Version: Research Once, Automate Once, Reuse Forever

Instead of asking the agent the end question again, I switched to building a capability.

First, I asked Hermes to research free weather APIs that did not need a key and were easy to automate:

Research free and open APIs that give you weather forecasts. Look for APIs that do not need an API key and can be easily automated with a Python script. Do not write the script yet. Let me choose the API first.

Hermes went off, searched around, and came back with several options, including Open-Meteo, WeatherAPI, and met.no. It recommended Open-Meteo, and that was good enough for me.

So I moved to the next step and told it to build something concrete:

Let's use Open-Meteo. I want you to spawn an OpenCode sub-agent and create a directory. Inside that directory the sub-agent must implement an Open-Meteo API client wrapped by a CLI. Use Python. Make sure it uses real data. Use mocks only for unit tests. Report back when ready.

That “use real data, mocks only for unit tests” line is one I use a lot. If the agent can run against reality, it can verify its own work much better.
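To make that rule concrete, here is a minimal sketch of where the mock lives. The function `warmest_day` and the payload shape are my own illustration, not code from the generated project. The point is only that the mock replaces the network call inside the unit test, while every normal run passes in a real fetcher.

```python
import unittest
from unittest.mock import Mock


def warmest_day(fetch_json):
    """Return (date, max_temp) for the warmest day in a daily forecast.

    `fetch_json` is any zero-argument callable returning a dict shaped like
    {"daily": {"time": [...], "temperature_2m_max": [...]}}.
    """
    daily = fetch_json()["daily"]
    pairs = zip(daily["time"], daily["temperature_2m_max"])
    return max(pairs, key=lambda pair: pair[1])


class WarmestDayTest(unittest.TestCase):
    def test_picks_highest_temperature(self):
        # The mock stands in for the network call only inside this test.
        fake_fetch = Mock(return_value={
            "daily": {
                "time": ["2025-01-01", "2025-01-02", "2025-01-03"],
                "temperature_2m_max": [3.5, 7.1, 5.0],
            }
        })
        self.assertEqual(warmest_day(fake_fetch), ("2025-01-02", 7.1))
        fake_fetch.assert_called_once()
```

Saved as a test module, this runs with `python -m unittest`. Outside the test, `warmest_day` would receive a function that actually calls the API.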

Hermes delegated the task to a coding-focused sub-agent, created the project, and implemented the CLI.
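For a sense of scale, a client like this can be quite small. The following is a rough sketch using only the standard library, not the code the sub-agent wrote. It assumes Open-Meteo's public geocoding and forecast endpoints and the daily fields named below. The function names are mine, and it assumes the API returns numeric values for every day.

```python
import argparse
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://geocoding-api.open-meteo.com/v1/search"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"
DAILY_FIELDS = "temperature_2m_max,temperature_2m_min,precipitation_sum"


def get_json(url, params):
    """Fetch a URL with query parameters and decode the JSON body."""
    with urlopen(f"{url}?{urlencode(params)}", timeout=10) as response:
        return json.load(response)


def forecast_params(latitude, longitude, days=7):
    """Build the query parameters for a daily forecast request."""
    return {
        "latitude": latitude,
        "longitude": longitude,
        "daily": DAILY_FIELDS,
        "timezone": "auto",
        "forecast_days": days,
    }


def format_daily(payload):
    """Turn the 'daily' block of a forecast response into printable lines."""
    daily = payload["daily"]
    rows = zip(
        daily["time"],
        daily["temperature_2m_min"],
        daily["temperature_2m_max"],
        daily["precipitation_sum"],
    )
    return [
        f"{day}: {low:.1f} to {high:.1f} C, {rain:.1f} mm"
        for day, low, high, rain in rows
    ]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Daily forecast from Open-Meteo")
    parser.add_argument("city")
    parser.add_argument("--days", type=int, default=7)
    args = parser.parse_args(argv)

    # Resolve the city name to coordinates, then request the forecast.
    places = get_json(GEOCODE_URL, {"name": args.city, "count": 1}).get("results")
    if not places:
        print(f"No location found for {args.city!r}")
        return 1
    place = places[0]
    params = forecast_params(place["latitude"], place["longitude"], args.days)
    print("\n".join(format_daily(get_json(FORECAST_URL, params))))
    return 0
```

Calling `main(["Berlin"])` performs the two real requests. Adding the usual `if __name__ == "__main__": raise SystemExit(main())` guard turns the module into a command-line script.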

Then came the part that matters most.

Never Trust the Agent

When the agent said the project was complete, I did not just accept it.

I tested it with a real request:

Now let's test it with the real API. I want you to use the script to give me the weather forecast for Berlin.

This is the rule I keep coming back to: never trust your agent.

Read the code. Run the script. Verify the output. Make sure it is using the real API. Make sure it is not doing anything unexpected. Only after that should it move from “experiment” to “capability.”
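Part of "verify the output" can itself be automated once you know what a sane result looks like. Here is a small sketch of such a check. It assumes the script emits a daily forecast as a dict with `time`, `temperature_2m_min`, and `temperature_2m_max` lists. The function name and the plausibility bounds are my own choices.

```python
from datetime import date, timedelta


def check_forecast(payload, expected_days=7):
    """Return a list of problems found in a daily forecast payload.

    An empty list means the payload passed every check.
    """
    problems = []
    daily = payload.get("daily", {})
    days = daily.get("time", [])
    lows = daily.get("temperature_2m_min", [])
    highs = daily.get("temperature_2m_max", [])

    if len(days) != expected_days:
        problems.append(f"expected {expected_days} days, got {len(days)}")
    if not (len(days) == len(lows) == len(highs)):
        problems.append("date and temperature lists differ in length")
        return problems

    # Dates should be consecutive calendar days.
    parsed = [date.fromisoformat(day) for day in days]
    for earlier, later in zip(parsed, parsed[1:]):
        if later - earlier != timedelta(days=1):
            problems.append(f"gap between {earlier} and {later}")

    # Temperatures should be ordered and physically plausible (Celsius).
    for day, low, high in zip(days, lows, highs):
        if low > high:
            problems.append(f"{day}: min {low} is above max {high}")
        if not -90 <= low <= 60 or not -90 <= high <= 60:
            problems.append(f"{day}: temperature outside plausible range")
    return problems
```

A check like this does not replace reading the code, but it catches a script that quietly returns stale, truncated, or made-up data.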

In the demo, the script returned a 7-day Berlin forecast in about 0.4 seconds. That is the moment where the whole pattern becomes obvious. The slow, token-heavy part was discovering how to do the task. Once that is solved, the best move is to package it.

Turning a One-Off Script into a Permanent Skill

Once the weather CLI worked, I asked Hermes to wrap it as a reusable skill:

Now let's create a skill for you that wraps around this CLI script and uses it whenever I ask you about the weather in future sessions.

Hermes created the skill using its skill management flow, and that became part of its permanent skill set.

Then I started a completely new session.

That is the real test, because a fresh session has fresh context. No hidden memory from the earlier chat. No cheating.

I asked the same question again:

What is the weather going to be like in Sofia this weekend? Help me plan some activities based on it.

This time Hermes checked its skills, found the weather skill, executed the script, and gave me a clean answer with activity suggestions.
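Conceptually, the skill boils down to "run this known command and hand back its output." The sketch below shows that idea in plain Python. It is not how Hermes stores or executes skills internally, and the script name `weather_cli.py` is a placeholder I made up for illustration.

```python
import subprocess
import sys


def run_tool(command, timeout=30):
    """Run a verified CLI and return its stdout, raising if it fails."""
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return result.stdout.strip()


def weather_skill(city, days=7):
    """Hypothetical skill entry point: one call to the already-tested CLI.

    The script path `weather_cli.py` is a placeholder for illustration.
    """
    return run_tool([sys.executable, "weather_cli.py", city, "--days", str(days)])
```

The model still decides when the skill applies and what to do with the result, but the work in between is one deterministic process call instead of a chain of searches.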

The difference was huge.

The first time, it chained several web tool calls, including two Tavily searches, and spent a lot of tokens figuring out how to answer. The second time, it reduced the whole thing to essentially one tool call to the script I had already verified.

That is the pattern in one line:

Explore once. Automate once. Wrap it as a skill. Reuse forever.

This Gets More Interesting Than Weather

Weather is a toy example, but it is useful because the waste is easy to see.

The more interesting example from the video is one I built from my phone over Telegram. I use my agent pretty often for stock-related questions, so I had it create a stock analyzer script that fetches stock or index data from an open API.

Same pattern:

Create a stock analyzer Python script that fetches stock or index data from a popular and open API. Use real data. Use mocks only for unit tests.

When it finished, I verified it with a real run covering one year of Microsoft price history. The output included about 250 trading days of data, the latest price, some moving averages, technical indicators, and a short interpretation.
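The indicator part of a script like that is ordinary arithmetic, which is exactly why it belongs in code rather than in the model's head. As an illustration only, here is a sketch of two common calculations over a list of closing prices. I am not claiming these are the indicators the generated script used, and the RSI here is the plain-average variant rather than Wilder's smoothed version.

```python
def simple_moving_average(closes, window):
    """Average of the last `window` closing prices, or None if too few."""
    if window <= 0 or len(closes) < window:
        return None
    return sum(closes[-window:]) / window


def relative_strength_index(closes, period=14):
    """RSI over the last `period` price changes, using plain averages."""
    if len(closes) < period + 1:
        return None
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat prices: treat as neutral
    if avg_loss == 0:
        return 100.0  # only gains in the window
    return 100 - 100 / (1 + avg_gain / avg_loss)
```

With roughly 250 trading days of closes, `simple_moving_average(closes, 50)` and `simple_moving_average(closes, 200)` would give the two moving averages people most often compare.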

Then I turned that into a skill too.

In a brand-new session, I asked a vague question about the USO ETF. I did not mention the script. I did not explain the workflow again. Hermes picked the stock analyzer skill on its own and returned a useful summary with current data.

That is where this starts to feel less like chatting with a model and more like growing a personal assistant over time.

The Security Rule I Want to Keep

The core safety idea here is simple: verify before you automate.

If an agent writes a script, read it. If it claims something works, test it. If it needs access to real systems, expose only the operations you actually want it to perform.

For anything sensitive, I would keep the agent on the narrowest possible rails. In practice, that means preferring read-only capabilities where possible, using small purpose-built tools instead of broad access, and only promoting a workflow into a permanent skill after I have seen it behave correctly.
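One way to picture "narrow rails" is an allowlist that sits between the agent and the tools. The sketch below is hypothetical: the tool names, subcommands, and the `authorize` function are invented for illustration and are not part of Hermes. It only shows the shape of the idea, where anything not explicitly permitted is refused.

```python
import shlex

# Hypothetical allowlist: each tool maps to the read-only subcommands it may run.
ALLOWED = {
    "weather": {"forecast"},
    "stocks": {"quote", "history"},
}


def authorize(command_line):
    """Split a requested command and return argv if it is allowlisted.

    Raises PermissionError for anything outside the allowlist.
    """
    argv = shlex.split(command_line)
    if len(argv) < 2:
        raise PermissionError("expected '<tool> <subcommand> [args]'")
    tool, subcommand = argv[0], argv[1]
    if subcommand not in ALLOWED.get(tool, set()):
        raise PermissionError(f"{tool} {subcommand} is not allowlisted")
    return argv
```

The returned list should be executed as an argument vector, never through a shell, so that extra text in the arguments cannot turn into a second command.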

The more capable the agent gets, the more important this becomes.

The Bigger Picture

What I like most about this pattern is that it compounds.

Every time I notice a repeated agent task, I have a choice:

Keep paying for the agent to rediscover the solution.
Or turn the solution into a reusable capability.

Over time, that changes the shape of the whole setup.

I stop treating the model like a universal improviser and start treating it like a coordinator that knows when to call reliable tools. The model still provides the intelligence, but the repetitive parts move into code.

That opens the door to more domain-specific assistants too. A natural next step is something like a private realtor assistant that checks listings, pulls mortgage news, summarizes changes, and sends a Telegram update on a schedule. Same principle, just applied to a workflow that actually matters to someone day to day.

That is the part I find exciting. Not AI magic, but a steadily improving assistant that gets more useful because I keep teaching it durable skills.

If you want to see the full walkthrough, including the Hermes session, the weather skill build, and the Telegram-based stock example, watch the YouTube video here: