DEV Community

Arsen Apostolov


How Much Does It Actually Cost to Run a Local LLM? (€ per Million Tokens, Measured)

"It runs on my own GPU, so it's basically free." I believed that until I put a meter on it. So I ran a controlled benchmark on one box, an openSUSE machine with a single RTX 3090. Three local models ran through ollama under an identical fixed workload (256-token generations in a loop for about 4 minutes each), while my open-source dashboard priced every run by the GPU energy it actually burned: power sampled from nvidia-smi every 10 s, integrated over each run's exact window, and multiplied by my real day/night tariff. The result is one number per model, in euros per million output tokens.
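A minimal sketch of the pricing step described above, assuming power arrives as `(timestamp, watts)` pairs (for example from `nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits -l 10`). The tariff rates, the 22:00 to 06:00 night window, and the names `rate_at` and `run_energy_cost` are hypothetical placeholders, not the author's dashboard code: it integrates power with the trapezoidal rule and prices each slice at the rate in force at its midpoint.

```python
from datetime import datetime, timedelta

# Hypothetical tariff: the article does not publish its rates or switch-over
# hours, so every value below is a placeholder for your own contract.
DAY_RATE_EUR_PER_KWH = 0.30
NIGHT_RATE_EUR_PER_KWH = 0.20
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 6


def rate_at(ts: datetime) -> float:
    """Return the EUR/kWh rate in force at ts under the assumed day/night split."""
    if ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR:
        return NIGHT_RATE_EUR_PER_KWH
    return DAY_RATE_EUR_PER_KWH


def run_energy_cost(samples, start: datetime, end: datetime):
    """Integrate (timestamp, watts) samples over [start, end] (trapezoidal rule).

    Only samples inside the window are used, so the partial intervals at the
    edges (less than one sampling interval per side) are ignored. Each
    trapezoid is priced at the rate in force at its midpoint.
    Returns (energy_kwh, cost_eur).
    """
    window = sorted((t, w) for t, w in samples if start <= t <= end)
    energy_kwh = 0.0
    cost_eur = 0.0
    for (t0, w0), (t1, w1) in zip(window, window[1:]):
        seconds = (t1 - t0).total_seconds()
        # Mean watts times seconds gives joules; 3.6e6 J make one kWh.
        kwh = (w0 + w1) / 2 * seconds / 3.6e6
        energy_kwh += kwh
        cost_eur += kwh * rate_at(t0 + (t1 - t0) / 2)
    return energy_kwh, cost_eur
```

Pricing each slice at its midpoint means a run that straddles the day/night switch is split to within one 10 s sample, rather than being billed entirely at whichever rate applied when it started.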

Here's the part that made me re-run it. The tiny gemma3:1b came out at €0.118 / 1M tokens, about 5× cheaper than a hosted Flash-class API (~€0.55). But gemma3:27b's electricity alone was €0.706 / 1M, more expensive per token than just paying for the cloud, and that's before a single cent of the GPU's purchase price. Going local didn't make it cheaper; it made it cost more, and I still carry the depreciation. The mechanism fits in one line: energy per token is watts ÷ throughput (tokens per second), and a big dense model is both slow and power-hungry. A newer mid-size architecture (gemma4:26b) won back much of that gap, landing at €0.272.
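The one-line mechanism can be written out as arithmetic: watts divided by tokens per second gives joules per token, which scales to kWh per million tokens and then to euros via the tariff. The function name and every input below are hypothetical, chosen only to show the shape; they are not the article's measurements.

```python
def eur_per_million_tokens(avg_watts: float, tokens_per_second: float,
                           eur_per_kwh: float) -> float:
    """Marginal electricity cost of one million output tokens.

    avg_watts / tokens_per_second = joules per token; times 1e6 tokens,
    divided by 3.6e6 J per kWh, gives kWh per million tokens.
    """
    joules_per_token = avg_watts / tokens_per_second
    kwh_per_million = joules_per_token * 1_000_000 / 3.6e6
    return kwh_per_million * eur_per_kwh


# Hypothetical inputs for illustration only, NOT the article's numbers:
# a fast small model versus a slower, slightly hungrier large one.
small = eur_per_million_tokens(avg_watts=300, tokens_per_second=100, eur_per_kwh=0.30)
large = eur_per_million_tokens(avg_watts=350, tokens_per_second=35, eur_per_kwh=0.30)
print(f"small: €{small:.3f} / 1M, large: €{large:.3f} / 1M")
# prints: small: €0.250 / 1M, large: €0.833 / 1M
```

Because throughput sits in the denominator, a model that is three times slower at similar power draw costs roughly three times as much per token, which is why the large dense model can overtake a hosted API price even on electricity alone.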

The full guide is methodology-first and reproducible end to end: minting an ingest key, the stdlib-only client, the exact ollama loop that reads eval_count/eval_duration for real tokens-per-second, reading each run back priced, and the honest caveats. This is marginal GPU energy only (not capex, idle draw, or cooling), and the absolute per-run costs come to fractions of a cent; the shape is the finding.
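A minimal stdlib-only sketch of a fixed-length generation loop, assuming ollama's default local endpoint `http://localhost:11434/api/generate` with `stream` disabled and `num_predict` capping each generation at 256 tokens. In ollama's response, `eval_count` is the number of generated tokens and `eval_duration` is in nanoseconds. The helper names, the prompt, and the returned dict layout are hypothetical, not the guide's actual client.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local port


def tokens_per_second(response: dict) -> float:
    """Generation throughput from ollama's own counters (eval_duration is in ns)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate_once(model: str, prompt: str, num_predict: int = 256) -> dict:
    """Send one non-streaming generate request and return the parsed JSON reply."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": num_predict},
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_workload(model: str, prompt: str, seconds: float = 240) -> dict:
    """Loop fixed-length generations for about `seconds` and total the counters.

    The wall-clock start/end mark the window to price; the last request may
    overrun it slightly. Throughput uses eval time only, so prompt processing
    and model loading are excluded.
    """
    start = time.time()
    total_tokens = 0
    total_eval_ns = 0
    while time.time() - start < seconds:
        r = generate_once(model, prompt)
        total_tokens += r["eval_count"]
        total_eval_ns += r["eval_duration"]
    return {
        "model": model,
        "start": start,
        "end": time.time(),
        "tokens": total_tokens,
        "tokens_per_second": total_tokens / (total_eval_ns / 1e9) if total_eval_ns else 0.0,
    }
```

Dividing the energy cost of the `start` to `end` window by `tokens` and multiplying by 1e6 yields € per million output tokens for that run.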

Read the full guide on Medium → https://medium.com/@arsen.apostolov/how-much-does-it-actually-cost-to-run-a-local-llm-per-million-tokens-measured-4a90a7f31a48
