Every time I post about ZSky AI's seven RTX 5090 workstation, I get the same two questions in the replies.
"Why didn't you just use AWS?"
"What does this actually cost?"
I have answered both in pieces across the last year, but never in a single post with every number in it. I am going to do that here. The full hardware bill of materials, the full monthly operating cost, the measured throughput, and the comparison to three different cloud configurations for an identical workload. If you are considering a self-hosted AI buildout in 2026, this is the post I wish had existed when I started.
Disclaimer up front: I run ZSky AI, so I am obviously biased toward the bare-metal answer. I will try to be honest about where cloud still wins, because there are real cases for it.
Why we self-hosted in the first place
The short version is that our workload is the exact shape cloud pricing punishes.
Our workload characteristics:
- Always on. We serve users 24/7. The GPUs are either generating or warm-pooling for the next request.
- Steady-state, not bursty. Usage dips at night but never approaches zero. A cloud "spot instance that spins up on demand" model does not apply.
- High VRAM, short jobs. Each image generation is a few seconds. Each video generation is ~30 seconds. We do not need 80GB cards, but we do need a lot of them to keep the queue short.
- Inference only. We are not training. We are serving. That changes the math significantly.
Cloud providers price GPUs on the assumption that you will use them for spiky, bursty workloads and leave them idle the rest of the time. When your workload actually is always-on, you pay the idle-tax on top of the usage-tax. A workstation you own eliminates both.
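The idle-tax point can be made concrete as a utilization threshold: the fraction of hours you would need to actually use rented GPUs before owning wins. This is a back-of-envelope sketch using this post's own numbers (fully-loaded owned cost, 4 rented GPUs at an assumed $4/hour on-demand rate); swap in yours.

```python
# Break-even utilization: below this, renting per-hour beats owning.
# Rates are this post's illustrative numbers, not quotes from any provider.

OWNED_MONTHLY = 3530.0      # fully loaded owned cost, $/month
CLOUD_RATE = 4.00           # assumed on-demand GPU-hour price, $
GPUS_RENTED = 4             # cloud GPUs needed to match the box
HOURS_PER_MONTH = 24 * 30

def cloud_cost(utilization: float) -> float:
    """Cloud bill if you only pay for the fraction of hours you use."""
    return GPUS_RENTED * CLOUD_RATE * HOURS_PER_MONTH * utilization

# Utilization at which renting starts costing more than owning.
breakeven = OWNED_MONTHLY / (GPUS_RENTED * CLOUD_RATE * HOURS_PER_MONTH)
print(f"break-even utilization: {breakeven:.0%}")  # ~31%
```

Above roughly a third utilization, the rented bill passes the owned one. An always-on workload sits near 100%, which is the whole argument in one number.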
The hardware bill of materials
Here is the actual receipt, assembled and dialed in as of early 2026:
| Component | Spec | Cost |
|---|---|---|
| GPUs | 7x NVIDIA RTX 5090 (32GB VRAM each) | $14,000 × 7 = $98,000 |
| CPU | 32-core / 64-thread | ~$2,500 |
| RAM | 256 GB DDR5 ECC | ~$1,200 |
| Motherboard | Dual-socket, 7x PCIe 5.0 x16 capable | ~$2,800 |
| Storage (primary) | 4 TB NVMe Gen5 (model weights + hot cache) | ~$500 |
| Storage (warm) | 8 TB NVMe (queue + output cache) | ~$600 |
| Storage (cold) | 20 TB HDD array (logs, outputs) | ~$700 |
| PSU | 2× 1600W redundant | ~$1,200 |
| Cooling | Custom loop + 3× 420mm radiators | ~$3,500 |
| Case | Open-air rack-mount chassis | ~$500 |
| Risers, cables, fans, monitoring | — | ~$1,500 |
| Total capex | — | ~$113,000 |
Full disclosure: I got slightly better than list on the GPUs by buying early. Current retail pricing would put this closer to $115K. Call it $113K even.
That is the scary number. Now let me show you why it is not actually scary.
The operating cost
This is the part almost nobody accounts for correctly. Capex is the sticker price; opex is what actually determines whether this move was smart.
Power. The workstation draws ~2.8kW at full load, ~1.6kW at typical load. At my local commercial rate of ~$0.12/kWh:
- Full load: 2.8 × 24 × 30 × 0.12 = ~$241/month
- Typical load: 1.6 × 24 × 30 × 0.12 = ~$138/month
Call it ~$200/month for a realistic average, splitting the difference.
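The power arithmetic above, as a tiny plug-your-own-rate sketch:

```python
# Monthly electricity cost = kW draw x hours in a month x $/kWh.
RATE = 0.12            # local commercial rate, $/kWh
HOURS = 24 * 30        # ~720 hours/month

def monthly_power(kw: float, rate: float = RATE) -> float:
    """Electricity cost in dollars for a constant draw of `kw` kilowatts."""
    return kw * HOURS * rate

print(monthly_power(2.8))  # full load   -> ~241.92
print(monthly_power(1.6))  # typical load -> ~138.24
```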
Cooling. Ambient cooling plus an in-room 18,000 BTU unit adds about $40/month to electricity. Total power+cooling: ~$240/month.
Internet. Business-grade 1 Gbps symmetric, static IP: ~$150/month.
Remote ops. Nothing. The machine is on a KVM in a closet I walk to. If you need colocation instead, budget $200-400/month.
Amortization. I amortize over 36 months because that is how long I expect the hardware to remain competitive for serving. At $113K / 36 = ~$3,140/month.
Total monthly opex including amortization: ~$3,530/month.
Or, if you exclude amortization and just look at "money out the door each month": ~$390/month. Month 37 onward, once the box is paid off, you are running an AI platform at four hundred bucks a month.
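The two ways of looking at opex can be written down so you can rerun them with your own line items. Colocation, if you need it, goes in as another cash entry.

```python
# Opex two ways: cash out the door vs. fully loaded with amortization.
CAPEX = 113_000            # hardware bill of materials, $
AMORT_MONTHS = 36          # expected competitive lifetime for serving

cash_opex = {
    "power+cooling": 240,  # $/month
    "internet": 150,       # $/month
}
monthly_cash = sum(cash_opex.values())       # ~$390
amortization = CAPEX / AMORT_MONTHS          # ~$3,139
fully_loaded = monthly_cash + amortization   # ~$3,529

print(f"cash out the door: ${monthly_cash}/month")
print(f"fully loaded:      ${fully_loaded:,.0f}/month")
```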
What it actually produces
This is the number nobody in the "just use AWS" camp wants to look at.
Our measured throughput on this single workstation, as of March 2026:
- Image generation: ~7,000 images/hour sustained (all 7 GPUs, typical prompt, 1080p)
- Video generation: ~80 videos/hour sustained (all 7 GPUs, 30-second clips, 1080p with audio)
- Mixed workload (real traffic shape): enough to serve several thousand active daily users with sub-30-second median latency
In a month, that is roughly 5 million images or 57,000 videos, or any mix in between. Now divide the ~$390/month operating cost by 5 million images and you get a marginal cost per image of $0.000078. Less than a hundredth of a cent. Including amortization: ~$0.0007 per image. Still less than a tenth of a cent.
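Here is the per-image division spelled out, using the exact monthly image count rather than the rounded 5 million:

```python
# Marginal and fully loaded cost per image at measured throughput.
IMAGES_PER_HOUR = 7_000
HOURS = 24 * 30
images_per_month = IMAGES_PER_HOUR * HOURS   # 5,040,000

marginal = 390 / images_per_month            # cash opex only
fully_loaded = 3_530 / images_per_month      # including amortization

print(f"marginal:     ${marginal:.6f}/image")      # ~$0.000077
print(f"fully loaded: ${fully_loaded:.6f}/image")  # ~$0.000700
```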
The cheapest commercial API for a comparable image generation costs about $0.003 per image. We are roughly 4x cheaper fully loaded, and 40x cheaper at marginal cost. At our volume, the workstation paid for itself in months.
The cloud comparison
Let me run the same workload through three cloud configurations.
Option A: On-demand H100s
An on-demand H100 instance runs about $3-5/hour at the big providers. One H100 roughly matches the throughput of two RTX 5090s at half-precision inference. So to match our workstation I would need about 4 H100s running 24/7.
- 4 × $4/hour × 24 × 30 = ~$11,520/month
That is 3x more than my fully-loaded cost including amortization and 30x more than my marginal cost. To break even on on-demand H100s, I would have to charge roughly 3x more per generation than I do today.
Option B: Reserved H100s (1-year commit)
Reserving H100s for a year brings the price down significantly, but you are now locked into a 12-month bill regardless of usage.
- 4 × ~$2.40/hour × 24 × 30 = ~$6,912/month
Better, but still 2x my fully-loaded cost. And you have just traded the "I own it" capex risk for an "I committed to a bill" risk. It is not obvious that the cloud version is actually lower risk; it is just lower up-front cash.
Option C: Serverless GPU (pay per second)
Serverless GPUs look cheap on paper — some providers advertise $0.00015/second or similar. At very low, intermittent volume, this is clearly the right answer. For an always-on platform, the numbers turn ugly fast.
- At our throughput, serverless bills wall-clock time per request, and container spin-up plus model load count toward it, not just the few seconds of pure GPU compute. Call it ~20 billed seconds per generation at ~7,000 generations/hour, or roughly 140,000 billed GPU-seconds per hour, 24/7:
- 140,000 × $0.00015 × 24 × 30 = ~$15,120/month
Serverless is the worst option for always-on inference. It is the best option for spiky batch jobs. Know which you are.
Where cloud actually wins
I promised I would be honest about this. There are real cases where cloud is the right call:
- You are pre-traffic. If you have no users yet, spending $113K up front on hardware is the wrong move. Start on serverless, prove demand, then migrate when the unit economics flip.
- You have a bursty workload. Batch rendering, training jobs, one-off experiments — cloud is perfect.
- You are a small team without anyone who can debug hardware. When a card throws ECC errors at 2 AM, you want someone else to swap it. If you do not have that person, cloud is insurance.
- You need geographical distribution. One workstation serves users well from one location. Global latency is a real problem cloud regions solve.
- You are doing rapid experimentation on new model architectures. Hardware depreciates, and the right silicon for your workload in 2027 might be different. Cloud abstracts that risk.
If any of those apply to you, do not buy 7 GPUs.
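For the "start on cloud, migrate when the economics flip" path, the question is how many months of cloud savings it takes to cover the capex. A minimal payback sketch, assuming steady traffic:

```python
# Months until cumulative cloud savings cover the hardware capex.
def payback_months(capex: float, owned_monthly: float,
                   cloud_monthly: float) -> float:
    """How long until owning has paid for itself vs. staying on cloud."""
    saving = cloud_monthly - owned_monthly
    if saving <= 0:
        return float("inf")  # cloud is cheaper: do not buy the hardware
    return capex / saving

# This post's numbers: $113K capex, $390/month owned cash opex,
# vs. the ~$6,912/month reserved-H100 bill.
print(round(payback_months(113_000, 390, 6_912), 1))  # ~17.3 months
```

Against the serverless or on-demand bills the payback is even faster; against a workload with real idle time it may never arrive, which is the honest version of the decision.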
The mistakes I made
I am going to save you from the ones that cost me real money.
Mistake 1: Underbuying PSUs. My first build used a single 1600W PSU. Under heavy load, 7 cards plus the CPU plus everything else ran the PSU into the red. I had random reboots for two weeks. Added a second redundant 1600W. Problem gone. Budget for 2x what you think you need.
Mistake 2: Stock cooling. The stock blower coolers on the 5090s work fine in a one- or two-card setup. In a 7-card workstation they cook each other. I moved to a custom water loop and temps dropped ~15°C. Not optional at this density.
Mistake 3: Undersized NVMe for models. I started with a single 2TB drive. Model weights for a serious image + video platform eat more space than you think, and the hot cache grows fast. I now have 4TB Gen5 just for hot weights plus 8TB for the queue and warm cache. Disk is cheap. Do not skimp.
Mistake 4: Not monitoring power at the outlet. I assumed my circuit could handle 2.8kW. It technically could, but it was on a shared breaker with the office. Every time the coffee machine kicked on, the UPS flipped. Put your GPU workstation on its own 20A dedicated circuit. I am not joking.
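On the power-monitoring point: `nvidia-smi` will report per-card board power, which gets you most of the way to knowing what the wall sees (CPU, drives, fans, and PSU losses come on top). The parsing helper below is a hypothetical sketch, shown against sample CSV output from `nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits`:

```python
# Sum board power across all GPUs from nvidia-smi CSV output.
# parse_power() is a hypothetical helper; `sample` is illustrative output,
# not a real capture from the workstation in this post.

def parse_power(csv_text: str) -> float:
    """Total watts across all GPUs, one power.draw value per line."""
    return sum(float(line) for line in csv_text.splitlines() if line.strip())

sample = "350.12\n348.90\n351.00\n349.55\n350.20\n347.80\n352.10\n"
print(f"GPU total: {parse_power(sample):.0f} W")
```

In practice you run the query in a loop (or under a metrics agent) and alarm when the total approaches your circuit's rating, which is exactly the situation the dedicated 20A breaker is there to survive.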
Mistake 5: Treating outputs as ephemeral. I deleted old generations to save disk. Big mistake — users asked for re-downloads, we could not serve them, and support load spiked. Now we keep outputs on the 20TB cold tier for 30 days. Storage is the cheapest part of the stack; stop being clever about it.
What would I change if I built this again
Honestly, not much. The biggest regret is that I didn't start with the water loop on day one. Second biggest regret: I should have bought the ECC RAM from the start instead of rolling non-ECC for the first six months and then being paranoid about memory corruption until I upgraded.
One thing I would not change: the decision to own the hardware. Every month the box runs past the amortization window is pure margin. Every month the cloud alternative runs, you are paying rent. Compounded, the difference is enormous.
The philosophical part
I built ZSky AI because I believe access to creative tools should not be gated by how much cloud credit you can afford. A free tier that works is my main product feature, and a free tier that works requires unit economics that only make sense on owned hardware. The math has to work at zero revenue per user, because that is what our free tier promises.
You can make that math work on bare metal. You cannot make it work on cloud. That is the whole reason this workstation exists.
If you are building something where the free tier is the product — which is an increasingly common pattern in AI — run the numbers I showed here with your own throughput assumptions and see where you land. You may be surprised how quickly the owned hardware payback kicks in.
Further reading
- Self-hosted AI on 7 RTX 5090s (the original post)
- Why we give the AI platform away for free
- The case for self-hosted AI in regulated industries
Questions about the build? Drop them in the comments. I will answer everything I can in public.