Juan Torchia

Posted on • Originally published at juanchi.dev

Do AI Tools Spend Your Credits Without Telling You Why?

When's the last time you checked exactly what tokens you paid for in a Claude Code or Cursor session? Not the monthly total — session by session, request by request. Yeah, me neither. And that carelessness cost me more than I want to admit.

A few weeks ago I started digging into Gas Town, a tool that markets itself as a wrapper over LLMs for specific workflows. The question floating around in some forums was blunt: is Gas Town "stealing" user credits to train or improve their own models? Spoiler: I found no evidence of theft. What I found was something more uncomfortable — an opacity so well-designed that the theft question becomes almost irrelevant.

AI Tools That Use Your Credits: The Real Problem Isn't Theft

When you use Claude Code, Cursor, Cline, or any wrapper over an LLM API, you're inside an abstraction chain with layers. Lots of layers. And at each layer there can be tokens you pay for but don't control.

The basic model works like this:

[Your prompt] → [Wrapper/Tool] → [Hidden system prompt] → [LLM API] → [Response]

The problem lives in that "hidden system prompt." Every tool injects additional context before sending your request. Context you don't see, didn't control, and yes — you pay for.

Concrete example: if Cursor injects 2,000 tokens of your codebase context plus 800 tokens of its own system prompt before sending your request, and you think you sent 300 tokens, the real input is 3,100 tokens: more than 10x what you thought.
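The arithmetic is worth making explicit. A tiny sketch, using the same hypothetical 2,000 / 800 / 300 figures from the example above (not measured values from any real tool):

```javascript
// Hypothetical token counts -- illustrative only, not measured from Cursor
const typedTokens = 300; // what you actually wrote
const injected = {
  codebaseContext: 2000, // context the tool pulls from your project
  systemPrompt: 800,     // the tool's hidden system prompt
};

const realInput = typedTokens + injected.codebaseContext + injected.systemPrompt;
const multiplier = realInput / typedTokens;

console.log(`real input: ${realInput} tokens`);            // real input: 3100 tokens
console.log(`cost multiplier: ${multiplier.toFixed(1)}x`); // cost multiplier: 10.3x
```

The exact numbers vary per tool and per project, but the shape is always the same: your prompt is the smallest term in the sum.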

That's not theft. That's the product working as designed. But nobody explains that to you before you hand over your card.

How I Audited My Own Logs (And What I Found)

The Gas Town investigation scratched an itch I'd been ignoring. I went straight to the Anthropic Console and started filtering by date and model.

# If you have direct API access, you can audit with this
# First, get your monthly usage
curl https://api.anthropic.com/v1/usage \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"

# The output gives you totals, but not the breakdown per tool
# For that you need to cross-reference timestamps with your actual sessions

First problem: Anthropic's API doesn't tell you which application made each request. Just timestamp, model, and tokens. If you use Claude Code and Cursor on the same day, you're reconstructing the map yourself.

// Script I put together to correlate usage with work sessions
// I logged timestamps for when I opened/closed each tool

const correlateUsageWithSessions = (usageLogs, workSessions) => {
  return usageLogs.map(log => {
    // Find which work session each request falls into
    const session = workSessions.find(session => 
      log.timestamp >= session.start && 
      log.timestamp <= session.end
    );

    return {
      ...log,
      // If it doesn't match any session, that's suspicious
      tool: session?.activeTool ?? 'UNKNOWN',
      suspicious: !session  // requests outside my active sessions
    };
  });
};

// What I found: 23% of my tokens for the month
// came from requests I couldn't correlate to any active session
// Background sync? Context refresh? Telemetry? I have no idea.

That 23% bugged me. I can't claim it's theft — it's probably legitimate tool processes (codebase indexing, context warming, etc.). But I also can't rule it out, because the documentation for these tools doesn't explain it with the granularity I need to make an informed decision.

When I built my local MCP server a few months back, one of the advantages I don't talk about enough is exactly this: you have total control over what gets sent and when. You can log every request before it goes out. Nothing moves without you seeing it.

The Most Common Mistakes When Assuming You "Used X Tokens"

Mistake 1: Confusing input tokens with what you actually typed

What you write is a fraction of the real input. The tool's system prompt, the project context, the conversation history — it all adds up. In a typical Cursor session on a medium-sized project, the codebase context can be 10–15k tokens before you type a single letter.

Mistake 2: Not accounting for "thinking" or planning requests

Some tools make multiple internal calls for a single visible result. Claude Code, for example, might fire an analysis request before the actual code request. You see one response — you paid for two.

Mistake 3: Assuming "no visible response = no cost"

If a tool does a context refresh in the background when you open a new file, that request exists even though you didn't ask for anything. You paid tokens for the tool to get ready for you. Legitimate, but opaque.

Mistake 4: Ignoring auto-retried network errors

If a request fails and the tool automatically retries, you paid for the failed attempt too. LLMs charge for tokens sent, not for successful responses.
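Mistakes 2 through 4 compound. Here's a sketch of what a single "one visible response" interaction can look like on the billing side. The request log and its numbers are entirely hypothetical, just to show how hidden calls and retries add up:

```javascript
// Hypothetical session log: one visible result, three billed requests.
// Numbers are made up to illustrate mistakes 2-4, not taken from any real tool.
const requests = [
  { kind: 'planning', inputTokens: 1200, outputTokens: 300, failed: false }, // internal "thinking" call you never see
  { kind: 'main',     inputTokens: 3100, outputTokens: 800, failed: true  }, // network error mid-response
  { kind: 'retry',    inputTokens: 3100, outputTokens: 800, failed: false }, // automatic retry
];

// You're billed for tokens sent, not for successful responses
const billed = requests.reduce((sum, r) => sum + r.inputTokens + r.outputTokens, 0);

// What you perceive: only the successful user-facing response
const visible = requests
  .filter(r => r.kind !== 'planning' && !r.failed)
  .reduce((sum, r) => sum + r.inputTokens + r.outputTokens, 0);

console.log(`billed: ${billed} tokens, visible to you: ${visible}`);
// billed: 9300 tokens, visible to you: 3900
```

In this made-up scenario you'd perceive 3,900 tokens of work and pay for 9,300. That ratio is invented, but the mechanism is not.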

This level of granularity is the same thing I had to learn when optimizing Docker images — the gap between what you think is happening and what's actually happening at each layer is exactly where the problem lives.

Mistake 5: Blaming the tool for everything instead of the LLM

Here's the honest counterpoint. A lot of the time, high costs aren't malicious opacity from the tool — it's that the LLM genuinely needs context to work well, and you don't want reduced context because quality tanks. The same thing applies when you're designing agents: the problem isn't always the tool, sometimes you're just asking for more than you need.

The Gas Town Case Specifically

Back to where this started: does Gas Town steal credits? The specific accusation was that the tool was sending additional requests without disclosure to "improve their models."

What I found:

  1. Their terms of service are vague on data usage. They say they can use "usage data" to improve the service, but they don't define whether that includes the content of requests or just metadata.

  2. I found no technical evidence of unsolicited requests in the traffic analyses I saw in specialized forums. What I did find were telemetry requests — usage metadata, not content.

  3. The opacity of system prompts is real. Gas Town doesn't publish their system prompt. Which means you don't know exactly what they inject before your request.

My conclusion: they probably don't steal credits directly. But vague terms plus hidden system prompts plus non-opt-out telemetry create an ecosystem where trust is an act of faith, not an informed decision.

And that bothers me more than the hypothetical theft. Theft you can prove and fight. Well-designed opacity is just... the state of the art in modern SaaS.

The same way Claude Code routines make you more productive but pull you further from understanding what's happening underneath — convenience always costs you visibility.

FAQ: AI Tools and Control Over Your Credits

How do I know exactly how many tokens I'm spending with Claude Code or Cursor?

Short answer: you can't know with total precision without instrumenting it yourself. Anthropic Console gives you the monthly total by model, but not the breakdown by tool or session. To audit properly, you need to manually cross-reference timestamps in your usage log with your work sessions, or intercept traffic with a local proxy that logs every request before it goes out.

Is it legal for a tool to inject tokens into my requests without telling me?

Yes, completely legal. When you accept the terms of service for Cursor, Claude Code, or any wrapper, you're implicitly accepting that the tool can add context to your requests. This is technically necessary for them to function. The issue isn't legality — it's the lack of transparency about how much and why.

How do I audit whether a tool is making requests in the background?

Use a network proxy like Charles Proxy or mitmproxy to intercept HTTPS traffic from your machine. Filter by LLM API domains (api.anthropic.com, api.openai.com, etc.) and watch what requests go out when you're not actively typing. If there are background requests, you'll see them there. You can also check network logs in DevTools if the tool is web-based.

Should I worry about this if I don't have direct API access and use subscription plans?

If you're using Claude.ai on a monthly subscription or Cursor on their own plan, the model is different — you're not paying per token, you're paying a flat rate. There the opacity of usage matters less economically, though it's still relevant for privacy (what context you're sending to the tool's servers). The variable token cost problem applies mainly when the tool is using your own API key.

Which tools are most transparent about token usage?

In my experience, open source tools you can run locally (like some agent implementations with LangChain, or your own MCP server) are the most transparent, because you can read the code that builds the prompts. Among commercial tools, the most transparent are the ones that publish their system prompts or offer a debug mode that shows the full request before sending it. Cline has a mode that shows you the complete context window — that's the bare minimum every tool should offer.

Is the extra token cost worth it for the convenience of these tools?

Generally yes, if you're using them for the right things. The problem isn't the extra cost itself — it's not knowing how much "extra" is. If Cursor injects 5k tokens of context and that makes the response better, those tokens are worth it. If it injects 5k tokens of boilerplate that improves nothing, that's waste. Without visibility, you can't make that evaluation. My recommendation: spend an hour auditing your real usage before renewing any paid tool. It's like the pneumatic display that uses compressed air instead of pixels — sometimes the abstraction is elegant, but when it breaks, you need to understand what's underneath.

What I'd Do Differently (And What I'd Ask From These Tools)

I'm not going to tell you to stop using Claude Code or Cursor. I use both every day. But I did change some habits after this investigation:

  1. Check usage logs once a week, not once a month when the invoice shows up.
  2. Keep a test project with a separate API key to try new tools without polluting my main metrics.
  3. Always ask for a debug mode before adopting any new wrapper. If they don't have a mode that shows the full request, that's a red flag.

What I'd ask from the ecosystem: make the minimum transparency standard showing the real token count (input + injected context) before confirming each request. Not after. Before. With a breakdown of what's yours and what's the tool's.
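To show how small an ask this is, here's a sketch of what that pre-flight breakdown could look like. The `preflightReport` function and its field names are hypothetical; no tool exposes this today, which is the point:

```javascript
// Hypothetical pre-flight report: show the user the full token breakdown
// BEFORE the request is confirmed. All names and numbers are made up.
const preflightReport = (request) => {
  const yours = request.userPrompt.tokens;
  const injectedTotal = request.injected.reduce((sum, part) => sum + part.tokens, 0);
  return {
    yours,                   // tokens you typed
    injectedTotal,           // tokens the tool added
    total: yours + injectedTotal,
    breakdown: request.injected.map(p => `${p.label}: ${p.tokens}`),
  };
};

// Example with made-up numbers
const report = preflightReport({
  userPrompt: { tokens: 300 },
  injected: [
    { label: 'system prompt', tokens: 800 },
    { label: 'codebase context', tokens: 2000 },
  ],
});

console.log(`${report.yours} yours + ${report.injectedTotal} injected = ${report.total} total`);
// 300 yours + 2800 injected = 3100 total
```

A confirmation dialog with those four numbers, before each request goes out, would turn the trust question from faith into arithmetic.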

It's not hard to implement. It's a product decision. And the fact that almost no tool does it says something about what incentives actually drive them.

The question of whether Gas Town steals credits is interesting. The question of why we accept not knowing exactly what we're paying for is more important. Thirty years ago, when I was diagnosing connection drops at a cyber café at 11pm, I learned that the first step to solving any problem is knowing exactly what's happening on the network. That principle hasn't changed. The abstraction layers multiplied — our tolerance for opacity did too, and that's a problem we chose to have.
