DEV Community

QuoLu

A Journey into Token Optimization for My AI Assistant

I Messed Up

In my previous article, I wrote about giving an AI assistant memory and a personality to serve as my secretary. I was pumped, thinking, "I've got my own dedicated AI secretary, now it's time for full-scale operation."

I hit the weekly limit in three days.

I'm on Claude's Max plan, with the 20x cap. That quota usually lasts me until the weekend, even with heavy development, and I burned through it in three days. I surprised even myself. When the screen popped up saying, "You've used up your limit for this week," I actually said, "Wait, already?"

The culprit was clear. The secretary was running 24/7, reading and making decisions on my emails, calendar, and Discord. Of course it would run out. When you have an AI handling the administrative work of one human being around the clock, there's no way it wouldn't burn through tokens.

But this was a problem. A big problem. So I frantically researched solutions. This article is the record of that journey.

Wandering into the World of Token Conservation

I searched for things like "Claude Code token savings" and landed on three main tools:

  • ECC (Everything Claude Code)
  • RTK (Rust Token Killer)
  • Caveman

At first, I had no idea what these were just by looking at the names. But as I played with them and researched, I realized they all shared one common principle.

The Common Principle I Noticed

Formats designed for humans are full of waste for AI.

Thinking about it, it was obvious. Command outputs, file contents, error messages—everything is created to be "human-readable." Whitespace, borders, colors, helpful explanations, headers. To an AI, most of this is just noise, decorations that eat up tokens.

The essence of token conservation is stripping away human-oriented formatting before handing anything to the AI, and then compressing the output the AI generates as well. These three tools approach that idea from different angles.
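To make the principle concrete, here is a minimal Python sketch of "stripping human-oriented formats." It is my own illustration, not code from any of the three tools: the function name strip_for_ai and the sample table are hypothetical, and character count is only a rough stand-in for token count.

```python
import re

# ANSI escape sequences such as color codes, e.g. "\x1b[32m".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# Column separators: ASCII pipes and Unicode box-drawing characters.
SEPARATOR_RE = re.compile(r"[|\u2500-\u257f]")
# A line made only of these characters is a border or rule, not data.
BORDER_CHARS = set("+-=|_ ") | {chr(c) for c in range(0x2500, 0x2580)}


def strip_for_ai(text: str) -> str:
    """Drop colors, borders, and padding; keep the words (hypothetical helper)."""
    kept = []
    for raw in text.splitlines():
        line = ANSI_RE.sub("", raw)
        if set(line) <= BORDER_CHARS:  # also true for empty lines
            continue
        line = SEPARATOR_RE.sub(" ", line)  # lossy: a literal "|" in data is dropped too
        line = " ".join(line.split())  # collapse runs of whitespace
        if line:
            kept.append(line)
    return "\n".join(kept)


# A made-up table of the kind many CLIs print for humans.
SAMPLE = (
    "\x1b[1m+--------+--------+\x1b[0m\n"
    "| name   | status |\n"
    "+--------+--------+\n"
    "| \x1b[32mapi\x1b[0m    | ok     |\n"
    "+--------+--------+\n"
)

if __name__ == "__main__":
    cleaned = strip_for_ai(SAMPLE)
    print(cleaned)
    print(len(SAMPLE), "characters before,", len(cleaned), "after")
```

The borders, colors, and padding carry nothing the AI needs; the two lines that survive hold all the actual data.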

ECC — Things experts set up well

To be honest, I haven't grasped the full picture yet. But once installed and running, Claude's behavior is clearly more organized. It's packed with skills, hooks, and agent definitions, putting Claude on rails that say "act like this in this situation."

As I understand it, ECC is like a "collection of settings filled with expertise from pros." Even if you don't think things through yourself, Claude ends up following best practices. From a token-saving perspective, it reduces unnecessary exploration and detours, which ultimately lowers consumption. It's the type of saving that comes from not doing unnecessary things rather than from directly cutting output.

RTK — Simplifying command responses for AI

This one is straightforward. When Claude runs a command, it intercepts the output and trims it for AI consumption.

For example, the output of git status or ls normally comes wrapped in human-oriented decoration, but when run through RTK it reaches Claude with the excess stripped away. You just prefix the command with rtk. It also doesn't break anything: commands that have no filter are simply passed through unchanged.

If you write "prefix all commands with rtk" in your global CLAUDE.md, Claude will do it automatically. It's low effort but high impact.
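The "filter if known, otherwise pass through" behavior can be sketched in a few lines of Python. To be clear, this is not RTK's code or its actual filter rules; FILTERS, compact_git_status, and filter_output are hypothetical names, and the sketch works on already-captured output so it runs without git installed.

```python
from typing import Callable, Dict, List

Filter = Callable[[str], str]


def compact_git_status(output: str) -> str:
    """Drop blank lines and parenthesized hints; collapse padding (illustrative rule)."""
    kept = []
    for line in output.splitlines():
        stripped = line.strip()
        # Hint lines in the default `git status` format start with "(".
        if not stripped or stripped.startswith("("):
            continue
        kept.append(" ".join(stripped.split()))
    return "\n".join(kept)


# Hypothetical registry: command prefix -> filter for that command's output.
FILTERS: Dict[str, Filter] = {"git status": compact_git_status}


def filter_output(command: List[str], output: str) -> str:
    """Apply the filter registered for this command, if there is one."""
    for width in (2, 1):  # try "git status" before "git"
        key = " ".join(command[:width])
        if key in FILTERS:
            return FILTERS[key](output)
    return output  # no filter registered: hand the output back untouched


GIT_STATUS_SAMPLE = (
    "On branch main\n"
    "\n"
    "Changes not staged for commit:\n"
    '  (use "git add <file>..." to update what will be committed)\n'
    "\tmodified:   app.py\n"
    "\n"
    "Untracked files:\n"
    '  (use "git add <file>..." to include in what will be committed)\n'
    "\tnotes.txt\n"
)

if __name__ == "__main__":
    print(filter_output(["git", "status"], GIT_STATUS_SAMPLE))
    # A command with no registered filter comes back exactly as it went in.
    print(filter_output(["cargo", "build"], "raw output"))
```

The pass-through default is the part that makes a blanket "prefix everything" rule safe: an unknown command loses nothing, it just doesn't get any shorter.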

Caveman — Summarizing AI output (Skipped for now)

I haven't installed this one yet. While RTK trims the "input" side, my understanding is that Caveman trims the "output" side: it aims to make Claude's own responses shorter.

The reason I didn't include it is simple: I don't want my secretary to sound like a robot. I put a lot of work into the personality and speech patterns of my secretary, so it would be a letdown if it suddenly started responding bluntly. I could have used it selectively—perhaps only enabling it for development sessions—but before I could dig into that, I decided, "I'll skip this for now."

This is just a matter of my personal use case; I think it's probably quite powerful for people who purely use it for development.

Things I might add next in my own development style

I've established a "foundation" for saving tokens with these three tools. From here, I'm thinking about additional ways to trim things based on my own habits. These are things I haven't tried yet, but plan to do next.

Reports full of variable and function names mean nothing to me

My development style involves barely writing any code. All the naming is done by Claude. So, even if Claude tells me, "I fixed handleUserSubmit," it honestly doesn't register.

Put another way, when Claude cites variable or function names in its reports, that carries zero information for me. It may help someone who knows the code, but to a reader like me it's closer to noise.

In that case, I should just have it explain things in words I understand, like "I fixed the processing when the submit button is pressed." If it reduces the number of citations of names, the report becomes shorter, and I understand it faster. Killing two birds with one stone.

Detailed where decisions are needed, light on the preceding explanations

I also realized that Claude's reports are quite long when explaining "what it did." But the part I most want to read is the final "So, what should we do?" section.

I want the parts requiring a decision—that is, the parts where things won't proceed unless I reply—to be written in detail. But the preceding parts—what files it read, what it checked, the sequence of events, etc.—are, frankly, not that necessary for making a decision. I'll ask if I need them later, so the default can be light.

Both points should work if I write them into the rule file and hand it to Claude, so I plan to work on that next.

Summary

My biggest takeaway this time is that token conservation is, in short, not letting the AI read human-oriented formats.

ECC solves this by "not doing unnecessary things," RTK by "trimming inputs," and Caveman by "trimming outputs," each tackling the same principle from different angles. It seems better to choose what fits your needs rather than trying to put everything in. In my case, I skipped Caveman because I want to preserve my secretary's way of speaking.

From here on, it's my turn to trim further according to my own reading habits: have Claude explain by meaning instead of by name, keep the lead-up light, and make the decision parts detailed. I'll write another update once I've refined these points.

It was painful to hit the limit, but it served as an opportunity to face the theme of "designing information to hand to AI." Maybe it was a blessing in disguise.
