DEV Community

Amara Graham

Are you paying attention to your token use?

Can I get some folks in the comments talking about how closely they monitor their token usage?

Or if you don't, do you work at a company that provides you unlimited tokens? To specific tools?

I'm curious to see where people fall on this spectrum.

Photo by Dan Dennis on Unsplash

Top comments (9)

FrancisTRᴅᴇᴠ (っ◔◡◔)っ

I use the Free Tier whenever I can. For example, I use Google Gemini without needing an account, so I don't worry about cost. Another option is running models locally with something like Ollama. If I run locally, I don't need to worry at all.

In other words, no credit card no problem lmao xd

EmberNoGlow

Local LLMs are powerful, but my Ollama setup crashes on anything more complex than "Hello" because it's limited by my hardware. It's a good solution, but not everyone can use it to its full potential.

Ben Halpern

I’m on Gemini Ultra for my day-to-day and it’s been a breath of fresh air to tap into as much token use as I need.

Amara Graham

Interesting! Do you feel like you are getting your money's worth? Or is the subscription worth it for not having to think about it?

Ben Halpern

We do a company budget per engineer, and I have to say: Absolutely.

I can't say I'm a fan of the concept of this development "tax" in general these days, but moving from concerned-about-tokens to feeling effectively unlimited has been worth it (not technically unlimited, but I'm operating completely unconstrained).

I think most companies should do this.

Most of my effective token spend is company stuff. I use the same account for personal stuff too, but that's kind of a rounding error. Maybe different if you do high-volume personal agent stuff.

Vic Chen

Great question. I track token usage religiously — mostly because I'm building a fintech product where we run inference pipelines on institutional filing data, and costs compound fast when you're processing thousands of 13F documents per quarter.

What I've found is that the real cost driver isn't the model choice, it's context window management. Stuffing 128k tokens of context into every call when you could get away with a smarter retrieval strategy saves way more than switching from GPT-4 to a cheaper model.

For personal dev work, I use a mix of local models (Ollama for quick iterations) and API calls only when I need frontier-level reasoning. The hybrid approach keeps my monthly spend under $50 while still having access to the best models when it matters.
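To make the context-window point concrete, here's a back-of-the-envelope sketch. The per-token price, call volume, and token counts below are made-up assumptions for illustration, not real rates:

```python
# Rough cost comparison: stuffing full context vs. retrieval-trimmed context.
# All numbers below are illustrative assumptions, not real pricing.

PRICE_PER_1K_INPUT_TOKENS = 0.005  # hypothetical $/1k input tokens

def monthly_cost(tokens_per_call: int, calls_per_month: int) -> float:
    """Estimated monthly input-token spend in dollars."""
    return tokens_per_call / 1000 * PRICE_PER_1K_INPUT_TOKENS * calls_per_month

# Stuffing ~128k tokens of filings into every call:
full_context = monthly_cost(tokens_per_call=128_000, calls_per_month=5_000)

# Retrieval keeps only the ~8k tokens that actually matter per call:
trimmed = monthly_cost(tokens_per_call=8_000, calls_per_month=5_000)

print(f"full context: ${full_context:,.2f}/mo")
print(f"trimmed:      ${trimmed:,.2f}/mo")
```

Same model, same call volume, a 16x difference in spend. That gap is usually bigger than anything you'd save by downgrading models.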

Amara Graham

Thanks for splitting your response between work and personal dev work! I think that's a good distinction too.

EmberNoGlow

I have several accounts, so tokens... Don't tell anyone!

jidong

token costs hit different when you're running multi-agent setups. single model calls are manageable but once you have 3-4 agents passing context back and forth the bill compounds fast. biggest lever we found wasn't model choice, it was compressing context before it enters the pipeline. a lot of what gets stuffed into prompts is redundant or low-signal, and stripping that out before inference saved us way more than switching to cheaper models. been open-sourcing some of our compression tooling at github.com/jidonglab/contextzip if anyone's dealing with similar token budget headaches.
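a toy sketch of the general idea (this is not the actual contextzip implementation, just an illustration of dropping redundant and low-signal lines before they ever reach the model):

```python
# Toy pre-inference context compression: dedupe repeated lines and drop
# near-empty ones before building the prompt. Illustrative only — not
# the contextzip library.

def compress_context(text: str, min_chars: int = 3) -> str:
    """Keep each meaningful line once, preserving original order."""
    seen = set()
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if len(stripped) < min_chars:  # low-signal: blank/near-blank lines
            continue
        if stripped in seen:           # redundant: exact duplicate lines
            continue
        seen.add(stripped)
        kept.append(stripped)
    return "\n".join(kept)

raw = "header\n\nheader\nactual signal here\n--\nactual signal here\n"
print(compress_context(raw))  # header / actual signal here
```

in a multi-agent pipeline you'd run something like this on every hop, so the savings multiply with each agent that re-reads the context.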