"You should just use AWS." — every VC I pitched in 2025.
I didn't. I bought seven RTX 5090s instead. This post is the math, the tradeoffs, and the reason it makes sense for ZSky AI — a free AI image and video generator I built after a traumatic brain injury took away my ability to visualize.
If you're a developer thinking about self-hosting a GPU workload in 2026, here's everything I wish someone had written down for me.
Why I couldn't rent
Quick backstory because it matters for the architecture: I have aphantasia — no mind's-eye imagery — plus a TBI from a bad accident. Photography was the first thing that gave my inner world back to me, because the camera did the "seeing" for me. I could point it at something beautiful and have proof that beauty existed, even when I couldn't picture it.
ZSky is the tool I wish I'd had in the hospital. A prompt box that turns a sentence into an image or a short video, free, no credit card, no watermark on the first generations. Every person on the internet deserves a camera for their imagination — especially the ones whose imagination got broken.
Free tier is the whole thesis. And free tier is exactly what kills you on AWS.
The AWS math that didn't work
Let's start with the rental version of this business, because I actually ran the numbers on a spreadsheet and then ran them again because I didn't believe them.
A100 80GB on-demand (us-west-2): ~$3.06/hr
H100 80GB: ~$4.00/hr (reserved) to ~$12.00/hr (on-demand), depending on provider and commitment
p4d.24xlarge (8x A100): ~$32.77/hr on-demand
My service needs to generate images in under 8 seconds and short videos in under 90 seconds. Realistic capacity per GPU, at the quality bar I'd ship: roughly 400 image generations per hour, or ~40 video generations per hour.
Now plug in the traffic. ZSky has grown to 26,000 users in 4 months, with 3,000+ new creators joining daily. On a normal day that's maybe 35,000 image generations and 4,000 video generations. On a launch-traffic day it's 3–5x that.
At those numbers, on AWS on-demand:
35,000 images / 400 per hour = 87.5 GPU-hours/day
4,000 videos / 40 per hour = 100 GPU-hours/day
-----------------------------------------------------
~188 GPU-hours/day
188 hr * $3.06 = $575/day (A100 on-demand)
188 hr * ~$8.00 = $1,504/day (H100 on-demand)
That's $17,250–$45,120 per month in raw GPU spend. Before storage. Before bandwidth. Before any of the managed-inference markups (SageMaker adds ~25%). Before spiky launch days.
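The spreadsheet above is simple enough to check in a few lines. A sketch of the same arithmetic (rates and throughput are this post's figures, not live AWS pricing; the post rounds 187.5 GPU-hours up to ~188, so the outputs land slightly under the headline numbers):

```python
# Back-of-envelope AWS GPU spend, using the throughput and traffic
# figures from the post. Not current AWS pricing.

IMAGES_PER_DAY = 35_000
VIDEOS_PER_DAY = 4_000
IMAGES_PER_GPU_HOUR = 400
VIDEOS_PER_GPU_HOUR = 40

# 87.5 image GPU-hours + 100 video GPU-hours = 187.5 GPU-hours/day
gpu_hours = (IMAGES_PER_DAY / IMAGES_PER_GPU_HOUR
             + VIDEOS_PER_DAY / VIDEOS_PER_GPU_HOUR)

for name, rate in [("A100 on-demand", 3.06), ("H100 on-demand", 8.00)]:
    daily = gpu_hours * rate
    print(f"{name}: ${daily:,.0f}/day, ${daily * 30:,.0f}/month")
```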
And — critical point — free users generate exactly as much load as paid users. The whole "freemium subsidizes itself" model assumes your marginal cost per request is zero-ish. On rented GPUs it isn't: at H100 rates, roughly $0.02 per image and $0.20 per video. Every free generation is a direct withdrawal from your bank account.
So the freemium math on AWS boils down to: ration the free tier, add a watermark, pop up "upgrade" modals, cap at 10/day. The exact thing I hated about every other tool. I built ZSky to escape that. I wasn't about to rebuild it with a paywall nailed to the front door.
The self-hosted math that did
Here's what 7x RTX 5090 looks like on a spreadsheet. I paid MSRP where I could and a bit above during the shortage for two of them. Real numbers, receipts in a folder.
7x RTX 5090 (32GB, ~1,800 TFLOPS fp8) ~$14,700
EPYC 7773X, 512GB DDR4, dual PSU chassis ~$6,200
2x 4TB NVMe (models + scratch) ~$900
Noctua cooling, custom airflow ~$400
UPS + surge ~$800
10GbE networking ~$350
------------------------------------------------
Total capex ~$23,350
Electricity: Each RTX 5090 draws ~575W under full load, ~30W idle. Call it an average 400W per card under realistic duty cycle. Plus CPU, RAM, fans: ~200W. Total system average: ~3kW.
3 kW * 24 hr * 30 days = 2,160 kWh/month
2,160 kWh * $0.14/kWh = ~$302/month
(I'm on a residential rate that's close to commercial pricing. Your mileage will vary: in Texas or Washington State you'd pay about half that; in California, about double.)
Cooling: Winter I open a window. Summer I run a portable 12,000 BTU AC for ~$45/month of extra power. Honest.
Amortization: GPUs are realistically good for 4 years before resale value tanks. $23,350 / 48 months = $487/month capex amortization. Add electricity and cooling: ~$834/month all-in.
Versus $17,250–$45,120 on AWS.
Self-hosted is 20–55x cheaper at my load. That's not a rounding error. That's the difference between "ZSky exists as a free tool" and "ZSky doesn't exist."
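The self-hosted side of the comparison, same style of sketch (all figures come from the parts list and power math above):

```python
# Self-hosted all-in monthly cost, using the post's numbers, and
# the ratio against the AWS monthly range.

CAPEX = 23_350      # hardware total from the parts list
MONTHS = 48         # 4-year amortization
AVG_KW = 3.0        # average system draw
KWH_RATE = 0.14     # $/kWh
COOLING = 45        # summer AC power, averaged in

amortization = CAPEX / MONTHS                   # ~$487/month
electricity = AVG_KW * 24 * 30 * KWH_RATE       # ~$302/month
all_in = amortization + electricity + COOLING   # ~$834/month

for aws_monthly in (17_250, 45_120):
    print(f"AWS ${aws_monthly:,}/mo is {aws_monthly / all_in:.0f}x self-hosted")
```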
Capacity check: does it actually handle the load?
Math is pretty but hardware either performs or it doesn't. Here's what 7x RTX 5090 actually does for ZSky:
- Images: ~2,800 generations/hour across the fleet at my target quality. That's 67,200/day theoretical. My daily peak is ~35,000. ~50% headroom before I have to do anything clever.
- Video: ~280 generations/hour across the fleet. 6,720/day theoretical. My daily peak is ~4,000. ~40% headroom.
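Those headroom percentages fall out of the same arithmetic. A quick check, with fleet throughput and daily peaks as stated above:

```python
# Headroom check: theoretical daily fleet capacity vs. observed peaks.

FLEET_IMAGES_PER_HOUR = 2_800
FLEET_VIDEOS_PER_HOUR = 280
PEAK_IMAGES, PEAK_VIDEOS = 35_000, 4_000

def headroom(per_hour, daily_peak):
    """Return (daily capacity, fraction of capacity still unused)."""
    capacity = per_hour * 24
    return capacity, (capacity - daily_peak) / capacity

img_cap, img_head = headroom(FLEET_IMAGES_PER_HOUR, PEAK_IMAGES)
vid_cap, vid_head = headroom(FLEET_VIDEOS_PER_HOUR, PEAK_VIDEOS)
print(f"images: {img_cap:,}/day capacity, {img_head:.0%} headroom")
print(f"videos: {vid_cap:,}/day capacity, {vid_head:.0%} headroom")
```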
When traffic spikes past that — and it does, when an article lands or a creator posts — the queue just holds. Users see "your generation will start in ~45 seconds" instead of a 503. Which, honestly, is a better UX than most rented-GPU services give me on their own dashboards.
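That wait estimate is nothing fancy: queue depth divided by fleet throughput. A minimal sketch; the function name and the FIFO assumption are mine for illustration, not ZSky's actual code:

```python
# Hypothetical wait-time estimate: a fixed fleet can't scale out of
# a spike, but it can tell the user exactly how long the queue is.
# Throughput is the fleet figure from above.

FLEET_IMAGES_PER_SEC = 2_800 / 3600   # ~0.78 images/sec

def eta_seconds(queue_depth: int, throughput_per_sec: float) -> int:
    """Seconds until a newly queued job starts, assuming FIFO order."""
    return round(queue_depth / throughput_per_sec)

# 35 images already queued -> roughly 45 seconds until this one starts
print(f"your generation will start in ~{eta_seconds(35, FLEET_IMAGES_PER_SEC)} seconds")
```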
The honest tradeoffs
I'm not going to pretend self-hosted is a free lunch. It isn't. Here's what I pay for it.
1. No auto-scale. If ZSky goes viral and daily load 10x's overnight, I have three options: queue harder, buy two more GPUs and wait 3–5 days for shipping, or spill overflow to a rented cluster. I've tested all three. Spill works but reintroduces per-request cost on the marginal users — acceptable as a spike absorber, not as steady state.
2. Hardware risk. I had one 5090 fail at 6 weeks. RMA took 11 days. During that time I ran 6x instead of 7x and throttled video generations. Nobody noticed. But they would have noticed if most of the fleet had gone down at once. This is why the rack has dual PSUs and why I keep a cold spare 5090 in the closet now — a $2,100 insurance policy.
3. Maintenance time. Driver updates. CUDA pinning. Kernel upgrades that break the NVIDIA module. Network cards dropping at 2am. I probably spend 3–5 hours per week on infra that AWS would abstract. That's real. For a solo founder that's non-trivial. For a team of 3+ it's noise.
4. Physical failure domains. My ISP goes down, ZSky goes down. I mitigated with Cloudflare Tunnels in front — if the last mile dies, Cloudflare caches most of the static surface and the generation API fails gracefully to a retry banner. But "the whole internet" is genuinely more reliable than "my apartment's internet," and any honest self-hosted post should say so.
5. Power. 3kW continuous on residential wiring is the edge of sane. I'm on a 20-amp circuit dedicated to the rack. I had to rewire. If you're renting, your landlord will not love this conversation.
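Tradeoff #1, the spill-to-rented option, reduces to a one-line routing rule. A hypothetical sketch; the wait budget and cost figure are illustrative, not ZSky's production values:

```python
# Hypothetical spike absorber: keep jobs local (marginal cost ~0)
# until the estimated queue wait breaks the UX budget, then spill
# to rented GPUs at per-request cost.

MAX_WAIT_SEC = 120            # worst acceptable queue wait (illustrative)
RENTED_COST_PER_IMAGE = 0.02  # rough H100 on-demand cost per image

def route(queue_wait_sec: float) -> str:
    """Decide where the next job runs."""
    if queue_wait_sec <= MAX_WAIT_SEC:
        return "local"    # fixed-cost hardware, let the queue hold
    return "rented"       # pay per request, only during spikes

print(route(45))    # within wait budget, stays local
print(route(300))   # spike overflow, spills to rented
```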
Why it works for a free tier specifically
This is the part most self-hosting posts miss. The economics aren't just "cheaper per hour." They're structurally different in a way that changes product design.
On AWS, every free generation is a variable cost. So the incentive is to ration: daily limits, watermarks, lower quality for free users, aggressive upsell, captchas to slow down bots. Every one of those decisions erodes trust.
On self-hosted, electricity is a fixed cost. The 3kW is running whether I generate 1 image or 10,000. The marginal cost of the 10,001st image is effectively zero. So the incentive flips: I want the cluster pegged. I want every free user to generate as much as they can. Idle GPUs are wasted money.
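The incentive flip is easiest to see as cost per image at different volumes: rented pricing is flat per request, while the fixed ~$834/month falls toward zero per image as volume grows. A quick illustration using the post's figures:

```python
# Fixed vs. variable cost per image. Rented: every image costs the
# same. Self-hosted: the monthly opex is spent either way, so the
# per-image cost shrinks as volume rises.

RENTED_PER_IMAGE = 0.02   # rough H100 on-demand cost per image
FIXED_MONTHLY = 834       # self-hosted all-in opex from above

for images_per_month in (10_000, 100_000, 1_000_000):
    self_hosted = FIXED_MONTHLY / images_per_month
    print(f"{images_per_month:>9,} images/mo: rented ${RENTED_PER_IMAGE:.3f}/img, "
          f"self-hosted ${self_hosted:.4f}/img")
```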
This is why ZSky has no daily limits on free image generation, no watermark on most formats, and no "upgrade to unlock quality" modal. Not because I'm generous. Because my fixed-cost architecture makes it the profit-maximizing move. The people who love the free tier tell their friends. Their friends are 3,000+ new creators a day right now.
Incentives matter. Build the architecture that makes the right thing the cheap thing.
What I'd do differently
If I were starting over in April 2026:
- Buy 8 not 7. N+1 redundancy matters more than I thought. One down out of 8 is 12.5% capacity loss. One down out of 7 is 14.3%. Feels the same but psychologically the "one spare" mental model is worth the extra $2,100.
- Skip the single big chassis. Two separate machines, 4 GPUs each, is more resilient than one 8-GPU rack. One motherboard fault kills the whole rack, and even dual PSUs leave the chassis as a single failure domain. I'd split it.
- Get a commercial electric rate sooner. My electricity bill is 40% of opex. A 20% rate improvement is $60/month forever. Worth the paperwork.
- Document the driver pinning from day one. I lost a weekend to a CUDA version bump. Pin everything. Use containers. Future-me will thank present-me.
- Don't undersell cooling. Summer is coming. Plan for 95°F ambient in July and make sure the rack can hold temps.
The part that isn't math
I didn't buy 7 GPUs because I ran a spreadsheet and the spreadsheet told me to. I ran the spreadsheet because I'd already decided ZSky had to be free, and I was looking for a configuration that would let me keep that promise without going broke.
There's a kid somewhere right now with a brain injury, or aphantasia, or just no money, who wants to make something beautiful and can't. The only reason I get to hand them a tool that works is that I own the silicon. Every time a VC tells me to "just use AWS and raise a Series A to burn on compute," I think about that kid and I keep the rack running.
Self-hosting isn't a purity play. It's the only architecture where my incentives and my users' interests are the same.
If you're building something that needs to be free to matter, run the numbers on hardware before you run the numbers on rented GPUs. The answer might surprise you.
Cemhan Biricik, Founder, ZSky AI. ZSky is a free AI image and video generator running on 7x RTX 5090s in the United States. 26,000 creators in 4 months, 3,000+ joining daily. No credit card, no watermark, no rationing. Try it at zsky.ai.