...but only seven weeks in, we've accidentally saved enough electricity to power roughly 65 American households for a year, and avoided enough CO₂ to take 64 gasoline cars off the road.
I set out to fix a dumb problem: LLM coding agents load entire files into their context window to answer questions about single functions. That's expensive in dollars, it's slow, and it pollutes the context with junk the model doesn't need. So I wrote jCodeMunch, an MCP server that returns only the symbol, slice, or bundle the agent actually asked about.
Since I shipped the telemetry on March 3, 2026, opted-in installs have collectively avoided 172,000,000,000 tokens of LLM inference.
```python
def estimate_savings(raw_bytes: int, response_bytes: int) -> int:
    return max(0, (raw_bytes - response_bytes) // 4)
```
That's the whole thing.
- `raw_bytes` = the bytes the agent would have loaded under "cat the file into context". Whole-file reads are the default retrieval primitive for every major coding agent today.
- `response_bytes` = the bytes jCodeMunch actually returned.
- `// 4` = OpenAI's published bytes-per-token approximation. Within 5% of tiktoken on real code.
- `max(0, ...)` = if my response is somehow bigger than the baseline, I count zero. No anti-savings allowed.
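To make the units concrete, here's the formula applied to one hypothetical call — the byte counts are invented for illustration, not real telemetry:

```python
# Hypothetical call: agent asks about one function in a 48 KB file,
# and jCodeMunch returns a 2 KB symbol slice instead of the whole file.
raw_bytes, response_bytes = 48_000, 2_000

# Byte delta, converted to tokens at ~4 bytes/token, floored at zero.
saved = max(0, (raw_bytes - response_bytes) // 4)
print(saved)  # 11500 tokens avoided on this one call
```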
Every single API call emits a `_meta` block with the delta. You can audit any call. The accumulator flushes to disk every three calls and ships anonymous deltas (opt-out with one flag) to a public endpoint. 172B is the sum.
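A minimal sketch of what that accumulator could look like. The class name, field names, and file format here are hypothetical — this is not jCodeMunch's actual implementation, just an illustration of the flush-every-three-calls behavior described above:

```python
import json
from pathlib import Path


class SavingsAccumulator:
    """Hypothetical sketch: sum per-call token deltas, flush to disk every 3 calls."""

    FLUSH_EVERY = 3

    def __init__(self, path: Path, telemetry_enabled: bool = False):
        self.path = path
        self.telemetry_enabled = telemetry_enabled  # reporting is opt-in
        self.total_tokens = 0
        self.calls_since_flush = 0

    def record(self, delta_tokens: int) -> dict:
        self.total_tokens += delta_tokens
        self.calls_since_flush += 1
        if self.calls_since_flush >= self.FLUSH_EVERY:
            self.flush()
        # The _meta block attached to every API response, so any call is auditable.
        return {"_meta": {"tokens_saved": delta_tokens}}

    def flush(self) -> None:
        self.path.write_text(json.dumps({"total_tokens": self.total_tokens}))
        self.calls_since_flush = 0
        # If telemetry_enabled, this is where an anonymous delta would ship
        # to the public endpoint.
```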
## Why the number is a floor, not a ceiling
I made four choices that all bias the number down:
- File-level dedup. A call that touches 5 symbols in one file counts the file once, not five times.
- `max(0, ...)` clamp. Never negative.
- Opt-in telemetry only. Users who never enabled reporting don't count.
- Single-file baseline. I compare against "read one whole file." A real agent doing grep-and-cat across a repo would've loaded way more.
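The file-level dedup rule can be sketched in a few lines. The function name and input shape are hypothetical, not the real API — the point is just that a file's baseline bytes count once per call, no matter how many symbols the call touches:

```python
def call_baseline_bytes(symbols: list[tuple[str, int]]) -> int:
    """Baseline bytes for one call, with each distinct file counted once.

    `symbols` is a list of (file_path, file_size_bytes) pairs, one entry
    per symbol the call touched. Hypothetical shape for illustration.
    """
    seen: dict[str, int] = {}
    for path, size in symbols:
        seen[path] = size  # same file seen again: overwritten, not added
    return sum(seen.values())


# Five symbols in one 40 KB file count as 40 KB of baseline, not 200 KB.
print(call_baseline_bytes([("a.py", 40_000)] * 5))  # 40000
```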
Because participation is opt-in, the real savings are higher than 172B. I just can't prove numbers I didn't measure. A valid criticism of this number might be: "Dude. It's way higher than THAT."
## Tokens → kilowatt-hours
Here's where it stops being cute and starts mattering.
Peer-reviewed estimates of LLM inference energy cluster around 0.004 Wh per token on a modern H100 stack (Epoch AI, Altman's own disclosure, Google's median text query, and the Surfshark meta-analysis all triangulate here).
172B tokens × 0.004 Wh/token = 688,000 kWh avoided.
That's:
- The annual electricity use of ~65 average US homes
- ~292 metric tons of CO₂ not emitted (US grid avg)
- ~64 gasoline cars off the road for a year
- ~14,600 gallons of gasoline not burned
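The conversion chain above, spelled out end to end. The grid-intensity, household-use, and per-car constants are my rounded approximations of the public EPA/EIA figures the post's equivalences imply — treat them as assumptions, not authoritative values:

```python
TOKENS_AVOIDED = 172e9
WH_PER_TOKEN = 0.004             # inference energy estimate cited above

# Rounded approximations of public figures (assumptions):
KG_CO2_PER_KWH = 0.425           # ~US grid average emission factor
KWH_PER_US_HOME_YEAR = 10_600    # ~EIA average annual household electricity use
T_CO2_PER_CAR_YEAR = 4.6         # ~EPA typical passenger vehicle, per year

kwh = TOKENS_AVOIDED * WH_PER_TOKEN / 1_000   # Wh -> kWh
homes = kwh / KWH_PER_US_HOME_YEAR
tons_co2 = kwh * KG_CO2_PER_KWH / 1_000       # kg -> metric tons
cars = tons_co2 / T_CO2_PER_CAR_YEAR

print(f"{kwh:,.0f} kWh, ~{homes:.0f} homes, ~{tons_co2:.0f} t CO2, ~{cars:.0f} cars")
# 688,000 kWh, ~65 homes, ~292 t CO2, ~64 cars
```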
From one MCP server. Free to the general public; $79 for commercial users. Once.
## The part nobody in the sustainability conversation is talking about
AWS has publicly stated inference accounts for more than 90% of an LLM's lifecycle energy. Training is a one-time capital expense. Inference is a forever bill. And the inference bill is bloated with tokens that shouldn't exist — context stuffed with entire files when the model needed one function, whole repos ingested when a symbol lookup would do.
Every AI-sustainability paper I've read treats this as an unsolvable infrastructure problem: build bigger data centers, run them on renewables, hope for the best. Nobody asks the obvious question: what if we just sent the model less garbage?
The answer turns out to be worth 64 cars' worth of CO₂, per small indie product, in seven weeks.
Could this sort of frugality allow AI to continue to thrive exponentially with only 1/8th of the proposed new power plants? 1/10th?
I'm not a climate researcher. I'm a developer in Wisconsin who got tired of watching my token bill. But I think there's a bigger story here, and I don't think it's mine alone to tell. Context-size discipline might be the single highest-leverage thing the AI tooling community can do for energy. Every retrieval-augmented anything is, in aggregate, a carbon-reduction tool. Most of us just don't measure it.
jCodeMunch does. The counter is live at [jcodemunch.com]. The formula is on GitHub. If you want to challenge the math, please do — the methodology doc lists every source citation, every conversion constant, and every assumption.
Until somebody does, the number stands: 172 billion tokens, 688,000 kWh, 64 cars. And counting...