Can I get some folks in the comments talking about how closely they monitor their token usage?
Or if you don't, do you work at a company that provides you unlimited tokens? To specific tools?
I'm curious to see where people fall on this spectrum.
Top comments (9)
I use the free tier whenever I can. For example, I use Google Gemini without needing an account, so I don't worry about cost. The other option is running models locally with Ollama. If I run it locally, I don't need to worry at all.
In other words, no credit card no problem lmao xd
Local LLM is powerful, but my ollama crashes after messages more complex than "Hello" because it's limited by my hardware. It's a good solution, but not everyone can use it to its full potential.
I’m on Gemini Ultra for my day-to-day and it’s been a breath of fresh air to tap into as much token use as I need.
Interesting! Do you feel like you are getting your money's worth? Or is the subscription worth it for not having to think about it?
We do a company budget per engineer, and I have to say: Absolutely.
I can't say I'm a fan of this development "tax" in general these days, but moving from concerned-about-tokens to feeling effectively unlimited has been a huge quality-of-life change (not technically unlimited, but I'm operating completely unconstrained).
I think most companies should do this.
Most of my effective token spend is company stuff. I just use the one account for personal stuff too, but that's kind of a rounding error. Maybe different if you do high-volume personal agent stuff.
Great question. I track token usage religiously — mostly because I'm building a fintech product where we run inference pipelines on institutional filing data, and costs compound fast when you're processing thousands of 13F documents per quarter.
What I've found is that the real cost driver isn't the model choice, it's context window management. Stuffing 128k tokens of context into every call when you could get away with a smarter retrieval strategy saves way more than switching from GPT-4 to a cheaper model.
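The arithmetic behind that claim is easy to sketch. Here's a back-of-envelope comparison in Python, with entirely hypothetical per-token prices (check your provider's current rates) and an assumed 1,000 calls per quarter:

```python
CALLS = 1_000
PRICE_FRONTIER = 10.0  # $/1M input tokens (hypothetical frontier-model rate)
PRICE_CHEAP = 1.0      # $/1M input tokens (hypothetical cheaper-model rate)

def cost(calls: int, tokens_per_call: int, price_per_mtok: float) -> float:
    """Input-token cost in dollars for a batch of calls."""
    return calls * tokens_per_call / 1_000_000 * price_per_mtok

full_ctx = cost(CALLS, 128_000, PRICE_FRONTIER)  # stuff the whole window
retrieval = cost(CALLS, 8_000, PRICE_FRONTIER)   # retrieve ~8k relevant tokens
cheaper = cost(CALLS, 128_000, PRICE_CHEAP)      # same bloat, cheaper model

print(f"128k context, frontier model:   ${full_ctx:,.2f}")
print(f"8k retrieved, same model:       ${retrieval:,.2f}")
print(f"128k context, cheaper model:    ${cheaper:,.2f}")
```

With these made-up numbers, trimming context to 8k cuts the bill 16x, while only switching models cuts it 10x and costs you reasoning quality; the exact ratio depends on real prices, but the shape of the argument holds whenever input tokens dominate.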
For personal dev work, I use a mix of local models (Ollama for quick iterations) and API calls only when I need frontier-level reasoning. The hybrid approach keeps my monthly spend under $50 while still having access to the best models when it matters.
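That hybrid setup boils down to a routing decision. A minimal sketch, assuming a crude chars-to-tokens heuristic and a made-up local-context limit (names and thresholds here are illustrative, not any particular tool's API):

```python
def route(prompt: str, needs_frontier_reasoning: bool,
          max_local_tokens: int = 4_000) -> str:
    """Pick a backend: local Ollama for quick iterations, a paid API
    only when frontier-level reasoning is needed or the prompt is too
    large for local hardware to handle comfortably."""
    approx_tokens = len(prompt) // 4  # rough heuristic: ~4 chars per token
    if needs_frontier_reasoning or approx_tokens > max_local_tokens:
        return "frontier-api"
    return "local-ollama"

print(route("refactor this small function", needs_frontier_reasoning=False))
# quick iteration stays local; hard reasoning or huge prompts go to the API
```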
Thanks for splitting your response between work and personal dev work! I think that's a good distinction too.
I have several accounts, so tokens... Don't tell anyone!
token costs hit different when you're running multi-agent setups. single model calls are manageable but once you have 3-4 agents passing context back and forth the bill compounds fast. biggest lever we found wasn't model choice — it was compressing context before it enters the pipeline. a lot of what gets stuffed into prompts is redundant or low-signal, and stripping that out before inference saved us way more than switching to cheaper models. been open-sourcing some of our compression tooling at github.com/jidonglab/contextzip if anyone's dealing with similar token budget headaches