lilili

Posted on May 31

Stop Burning Tokens on Chat / Agent Loops — Here's What Actually Works

#ai #agents #webdev #llm

You’re Overpaying Every Day: You Just Can’t See It

Think about the last time you asked an AI to clean up your meeting notes.

You probably opened a new chat, pasted in the transcript — maybe 1,500 words — then pasted your usual notes template on top of that, then said something like “format this, bold the action items.”

It worked. Useful, even.

But here’s what actually happened: the model just read ~3,000 words to produce ~300 words of output.

Do that five times a week. Every week. And now think about what’s riding along in that context every single time — your template, your formatting preferences, all the background you’ve already explained before. The model doesn’t remember any of it. It reads it fresh on every call.

Every repeat. Every charge.

This isn’t a flaw in ChatGPT. It’s the fundamental nature of chat as a paradigm.

Chat Is Great — But It Has a Structural Bug
Chat is the most natural way to start with AI. Unclear what you want? Talk it out. Need to change direction? Just say so. The feedback loop is instant, the barrier is zero.

That’s why everyone starts there.

But chat has a structural problem: every single turn carries the entire history.

This is how context windows work: the model keeps no memory between calls, so every API call packages up your full conversation history and sends it to the model again. You pay for every token the model reads. Ten rounds in, round ten doesn’t cost the price of one message. It costs the price of all ten, stacked.
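To put rough numbers on that stacking, here is a minimal sketch. It assumes every turn adds the same number of tokens to the history and that the whole history is re-read on each call; the 400-token figure is made up for illustration, and real tokenizers and pricing will differ.

```python
def cumulative_input_tokens(turn_tokens):
    """Return the tokens the model reads on each turn when the full
    history is resent every time.

    turn_tokens: tokens each turn adds to the history (a simplification
    that lumps the user message and the model's reply together).
    """
    billed = []
    history = 0
    for added in turn_tokens:
        history += added        # the history only ever grows
        billed.append(history)  # the model re-reads all of it
    return billed


turns = [400] * 10              # hypothetical: ten turns, 400 tokens each
per_turn = cumulative_input_tokens(turns)

print(per_turn[-1])             # tokens read on turn ten alone: 4000
print(sum(per_turn))            # tokens read across all ten turns: 22000
```

Under these assumptions the conversation contains 4,000 tokens of actual content, but the model reads 22,000 tokens in total, because the early turns are paid for again on every later call.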

Here’s a concrete version of this. Say you use AI to write your weekly status update. You paste in your bullet points from the week, say “turn this into a proper update,” tweak the tone, go back and forth a couple times. Feels efficient.

But those bullets, plus the AI’s draft, plus your follow-up messages, plus the format you’re implicitly re-explaining each time — the real token cost of one weekly update is probably 5 to 8x what you’d guess.

You’re paying for repeated context. The bill just isn’t obvious enough to feel.

Agents Are Smarter — and Much Harder to Budget

So if chat gets expensive, agents sound like the upgrade. Let the AI break down the task itself, call its own tools, decide its own next steps.

The demos are genuinely impressive. Hand it a goal, walk away, come back to a finished report.

Then you try to ship it.

Agents typically run on a loop: think about the next step → call a tool → observe the result → think again. This is called a ReAct loop (think → act → observe → repeat), and each pass through the loop is one LLM call. The model controls how many loops it takes. You don’t.

Picture this: you’ve set up an agent to triage your inbox and draft replies to routine emails. Most days it’s smooth — it reads the email, matches a template, sends a reply, cost is stable.

Then one day someone sends you a vague message: “Hey, on that proposal from last week — we’re still thinking it over.” The agent isn’t sure which proposal. So it decides to scan your sent folder. Finds two candidates. Still not certain. Checks the thread. Then the attachment.

You didn’t ask it to do any of that. It didn’t tell you it was doing it. You just saw your bill at the end of the month and noticed that one day cost three times the usual.

The cost depends on how the model reasoned in that moment. You have no control over that.
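The inbox story can be sketched as a toy loop. Everything here is hypothetical: `scripted_model` stands in for a real LLM, the tool names are invented, and `max_steps` is an illustrative safety cap rather than a feature of any particular agent framework. The point is only that the number of LLM calls falls out of what the model decides at each step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    thought: str
    action: Optional[str]  # None means the model has decided it is done


def run_agent(model, tools, goal, max_steps=8):
    """Run think -> act -> observe until the model stops or the cap is hit."""
    observations = []
    llm_calls = 0
    for _ in range(max_steps):
        step = model(goal, observations)  # one LLM call per pass
        llm_calls += 1
        if step.action is None:
            break
        observations.append(tools[step.action]())
    return observations, llm_calls


def scripted_model(script):
    """Fake model that walks through a fixed list of actions, then stops."""
    def model(goal, observations):
        i = len(observations)
        action = script[i] if i < len(script) else None
        return Step(thought=f"step {i}", action=action)
    return model


tools = {
    "read_email": lambda: "email body",
    "scan_sent": lambda: "two candidate proposals",
    "check_thread": lambda: "thread history",
    "open_attachment": lambda: "attachment text",
}

routine = scripted_model(["read_email"])
vague = scripted_model(
    ["read_email", "scan_sent", "check_thread", "open_attachment"]
)

_, routine_calls = run_agent(routine, tools, "draft a reply")
_, vague_calls = run_agent(vague, tools, "draft a reply")
print(routine_calls, vague_calls)  # 2 5
```

Same code, same goal, and the vague email costs more than twice as many calls. A cap like `max_steps` limits the worst case, but it does not remove the variation underneath it.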

That’s not a bug — it’s by design. Agents are built for exploration. When the path isn’t clear, they walk further. For research tasks, deep one-off problems, open-ended investigation: that’s exactly the right tool. But for daily recurring work where you need predictable costs and reliable output, that same autonomy becomes a surprise expense waiting to happen.

An agent is like a brilliant contractor who never writes down what they did or why. You trust the output. You can’t audit the process.

Workflow: Only Use AI Where AI Is Actually Needed
At this point you might be thinking: okay, so workflow is just a bunch of if-else logic?

Not quite.

Workflow isn’t “avoid AI.” It’s “only use AI where it genuinely earns its place.”

Here’s an example you can map directly to your work.

Say you produce a weekly sales report: pull last week’s numbers, calculate week-over-week changes, identify top and bottom performers, write a natural-language summary, email it to your manager.

In ChatGPT, every Friday looks the same: paste the data, re-explain what you want, tweak the output, copy it into an email. You’re re-describing your own job to the model every single week.

In a workflow on MorphMind, the same task comes down to this: 4 steps. 1 LLM call. Set it once. It runs every week on its own.

This is the core idea behind workflow: split the task into deterministic steps (rules handle it, no model needed) and LLM nodes (where language understanding actually matters), and only pay for the second category.

Step 3’s input is clean, structured numbers — no extra context, no history, no re-explanation. The token cost is minimal. The output is consistent. And Monday morning, the report is already in your manager’s inbox before they sit down.
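As a rough illustration of that split, here is one way the sales report could be laid out in plain Python. This is a sketch under assumptions, not MorphMind's actual pipeline or API: the four-step breakdown is a guess based on the task description, the sales figures are invented, and `fake_llm` stands in for whatever model call you would really make.

```python
def pull_numbers():
    # Step 1 (deterministic): stand-in for a database or spreadsheet query.
    # Values are (previous week, last week) and are made up.
    return {"Ana": (120, 150), "Ben": (200, 180), "Cy": (90, 99)}


def compute_changes(sales):
    # Step 2 (deterministic): week-over-week change and ranking, no model.
    changes = {
        name: round((last - prev) / prev * 100, 1)
        for name, (prev, last) in sales.items()
    }
    ranked = sorted(changes, key=changes.get, reverse=True)
    return {"changes_pct": changes, "top": ranked[0], "bottom": ranked[-1]}


def summarize(stats, llm):
    # Step 3 (the only LLM node): the prompt holds just the structured numbers.
    prompt = f"Write a two-sentence sales summary from this data: {stats}"
    return llm(prompt)


def send_email(body, outbox):
    # Step 4 (deterministic): stand-in for a real email API call.
    outbox.append(body)


def run_workflow(llm, outbox):
    stats = compute_changes(pull_numbers())
    send_email(summarize(stats, llm), outbox)
    return stats


calls = []

def fake_llm(prompt):
    # Hypothetical model stub that records each prompt it receives.
    calls.append(prompt)
    return "Summary placeholder."


outbox = []
stats = run_workflow(fake_llm, outbox)
print(stats["top"], stats["bottom"], len(calls))  # Ana Ben 1
```

However the input data changes, this layout makes exactly one model call per run, and the prompt never carries chat history or a re-pasted template.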

You didn’t open ChatGPT. You didn’t paste anything. You didn’t say a word.

Three Paradigms. Three Jobs. Know Which Is Which.

The question isn’t “which is best?” That’s the wrong frame. Each paradigm has a job.

Good AI practitioners don’t pick one and stay there. They know which tool fits which situation.

Notion AI polishing your doc is workflow logic: fixed input, fixed format, predictable cost. Asking it to “brainstorm a completely new direction” is chat logic. Both are right. Different jobs.

Why Serious Products Keep Landing on Workflow

“Shippable” isn’t just “it runs.”

It means cost-controlled — you know what each execution costs, and there are no surprises.
It means reproducible — the same input produces reliably similar output.
It means debuggable — when something breaks, you know exactly which step broke, and you can fix just that step.

Chat and agents struggle with all three. You can’t explain to a stakeholder why costs vary 3x day to day. But you can draw a workflow diagram — here’s every step, here’s what it does, here’s how many times it runs per week, here’s the cost per run.

That’s not technical sophistication. That’s delivery discipline.

There’s also an engineering reason workflow wins in production: replay. Every step has logged inputs and outputs. When something goes wrong, you don’t restart the whole pipeline — you re-run the broken step with the same inputs and fix it in isolation. Chat and agents don’t give you that. Every conversation is a black box that starts over from scratch.
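Replay is simple enough to sketch in a few lines. The `StepLog` class below is hypothetical and is not how any specific product implements it; it assumes step inputs are JSON-serializable so a copy can be stored before the step runs.

```python
import json


class StepLog:
    """Records each step's input and output so one step can be re-run alone."""

    def __init__(self):
        self.records = {}

    def run(self, name, fn, data):
        # Store a copy of the input first, so it survives a failing step.
        snapshot = json.loads(json.dumps(data))
        self.records[name] = {"input": snapshot, "output": None}
        out = fn(data)
        self.records[name]["output"] = out
        return out

    def replay(self, name, fn):
        # Re-run a single step against its logged input.
        rec = self.records[name]
        rec["output"] = fn(rec["input"])
        return rec["output"]


log = StepLog()
totals = log.run("totals", lambda rows: {"total": sum(rows)}, [10, 20, 30])


def buggy_format(d):
    return f"Total: {d['sum']}"    # wrong key, raises KeyError


def fixed_format(d):
    return f"Total: {d['total']}"


try:
    result = log.run("format", buggy_format, totals)
except KeyError:
    # Only the broken step is re-run; "totals" is not recomputed.
    result = log.replay("format", fixed_format)

print(result)  # Total: 60
```

The fix touches one function and one step. The upstream step's logged output is reused as is, which is the property the chat and agent paradigms make hard to get.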

One Thing You Can Do Right Now

If you’ve read this far, you probably fit one of these:

You’re a heavy ChatGPT user — handling a lot of recurring work through chat, and it mostly works, but you’ve never actually looked at what it’s costing you or how much repeated context you’re sending every single time. This article is describing your situation.

You’ve tried an agent tool — Manus, OpenAI Operator, Devin, something like that. The demos blew you away. But the reliability and cost unpredictability make you nervous about actually depending on it for anything critical. You want the capability without the chaos.

You’re burning tokens in your dev workflow — using AI for code, and you’ve noticed it goes back and forth more than you expected. Fixing one function turns into four or five rounds. The context balloons fast.

You’ve hit the ceiling on Zapier or Make — your automations work great until they don’t. Every time a step requires actual judgment — a non-standard email, a field that’s missing, a case that doesn’t fit the template — the workflow chokes and you’re back to doing it manually.

In every one of these cases, the underlying problem is the same: you haven’t found a way to use AI without letting it freelance on your dime.

That way exists.

Pick the repetitive task you do most often. Break it apart: which steps follow fixed rules? Which single step actually needs language understanding? Build that as a workflow. Set it once. Let it run.

That’s what MorphMind is built for. It lets you call the LLM only where you need it, while everything else runs automatically. Traditional automation tools like Zapier hit a wall the moment they need AI judgment — there’s no LLM node to slot in. Chat and agents have the intelligence but no structure and no cost control. MorphMind combines both. Every step’s inputs and outputs are logged, so if something breaks you can replay just that step — no starting over from scratch.

Free to try. No credit card required. Go in, find your most familiar recurring task, and build your first workflow.

You’ll do the same work. Spend a fraction of the tokens. And never have to think about it again.

👉 morphmind.ai

Top comments (1)

Harjot Singh • May 31

"Overpaying every day, you just can't see it" is the right frame because token waste is invisible by design, no error, no slowdown, just a slightly bigger bill you never trace back to the habit. The meeting-notes example is perfect: re-pasting a 1,500-word transcript plus a template into a fresh chat every time means you pay full price to re-establish context the system should already hold, and you do it dozens of times a week without noticing. The fixes that actually move the number are the unglamorous ones: stop re-sending context that doesn't change (the template should live once, not in every message), trim the transcript to what the task needs instead of dumping the whole thing, and for repeated tasks codify it so you're not re-typing the same setup. The mental shift is treating context as a metered resource you spend, not a free buffer you fill, the same way you'd never re-upload the same file on every API call. Most people's AI bill is mostly re-sent context, not new work. That make-the-invisible-cost-visible instinct is core to how I think about spend in Moonshift. Of the tactics you found, which cut the most, trimming the input, or moving the static parts out of the per-message payload?