I Built a Production GPU Energy Optimizer in One Day — From My Phone
Not from a MacBook. Not from a cloud VM. From my Android phone,
using Termux.
Here's what shipped by end of day:
- Real-time GPU energy dashboard
- DESYNC & GHOST power anomaly detection
- Support for 17 cloud providers
- Per-user API keys
- Time-series metrics scaling to 100+ GPUs
- 18/18 smoke tests passing
- 60-second Docker install
The Problem
GPU providers lie. Not intentionally — but telemetry desync is real.
Two failure modes kill your energy budget:
DESYNC — GPU drawing 420W but reporting 8% utilization.
You're paying full price for a GPU doing nothing useful.
GHOST power — GPU reporting 98% utilization at 40W draw.
Physically impossible. Your scheduler is making decisions on
fake data.
We found both in the wild across AWS and Vast.ai during testing.
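The two failure modes above boil down to a simple consistency check between reported power draw and reported utilization. Here is a minimal sketch of such a classifier; the function name and thresholds are illustrative assumptions, not the project's actual CEI spec values:

```python
def classify_reading(power_w: float, util_pct: float) -> str:
    """Classify one (power, utilization) telemetry sample.

    Thresholds below are illustrative assumptions only.
    """
    # DESYNC: heavy power draw while telemetry claims the GPU is idle.
    if power_w > 300.0 and util_pct < 15.0:
        return "DESYNC"
    # GHOST: high reported utilization at a physically implausible draw.
    if util_pct > 90.0 and power_w < 80.0:
        return "GHOST"
    return "OK"
```

Run against the two examples from above: a 420W / 8% sample classifies as DESYNC, and a 40W / 98% sample classifies as GHOST.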
The Solution
An open validation stack that:
- Detects DESYNC and GHOST anomalies automatically
- Works across 17 GPU cloud providers
- Evicts bad workloads via Kubernetes or Run:ai
- Alerts via Slack
- Stores time-series data for 100+ GPUs
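For the Slack alerting path, the agent only needs to POST a JSON payload to an incoming-webhook URL. A minimal sketch of building that payload; the message format here is an assumption, not the stack's actual alert schema:

```python
import json

def build_slack_alert(gpu_id: str, anomaly: str,
                      power_w: float, util_pct: float) -> str:
    """Build a Slack incoming-webhook payload for one anomaly.

    Uses Slack's simple {"text": ...} payload shape; the message
    wording is illustrative.
    """
    text = (f":warning: {anomaly} on {gpu_id}: "
            f"{power_w:.0f}W at {util_pct:.0f}% utilization")
    return json.dumps({"text": text})
```

The resulting string can be POSTed to the webhook URL with any HTTP client.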
What We Built
| Component | Status |
|---|---|
| CEI Formal Specification | ✅ |
| Grafana Dashboard | ✅ |
| GPU Agent Script | ✅ |
| Per-user API Keys | ✅ |
| Time-series DB | ✅ |
| 17-Provider Validator | ✅ |
| Smoke Tests (18/18 passing) | ✅ |
| Docker one-liner | ✅ |
Why Termux
No laptop. No cloud IDE. Just an Android phone with Termux.
This matters because it proves the stack is lightweight enough
to run anywhere. If it builds and runs on a phone, it runs on
any bare metal server, VPS, or edge node.
60-Second Install
```bash
# Install Docker (skip if already installed)
curl -fsSL https://get.docker.com | sh

# Clone and run
git clone https://github.com/mikebains41-debug/ai-gpu-energy-optimizer-
cd ai-gpu-energy-optimizer-
docker-compose up
```