kusunoki

Posted on Apr 8 • Edited on Apr 11

I Replaced $150/Month of SaaS With a $24 VPS and a Weekend — Building Your Private AI Infrastructure [1/5]

#ai #security #cloud #selfhosted

series: Building Your Private AI Infrastructure

Free series. No DevOps background required. All open-source. Total cost: ~$18–$79/month depending on team size and AI usage.

There is a question that every engineer eventually asks, and then spends years not answering. You look at the SaaS invoices — $20 here for AI, $20 there for another AI, $18 for cloud storage, $15 for monitoring, $12 for a VPN you don't fully trust — and you think: I could build this. I could own this. I could run this on a single machine I control, behind a security architecture I designed, for a fraction of what I'm paying strangers to hold my data. You have the skills. You have had the thought. What you may not have had is the weekend to research every moving piece, make every configuration decision, and test each integration until it actually works. This guide is that weekend.

This is also the guide for a transition that is no longer optional. The era of trusting your data, your clients' data, and your business operations to SaaS vendors whose terms change without notice, whose security you cannot audit, and whose pricing you cannot negotiate — that era is ending. Not because self-hosting became fashionable. Because the threat landscape changed. The FBI's IC3 documented $12.5 billion in cybercrime losses in 2023, with VPN compromise and stolen credentials as the primary vectors for ransomware targeting small businesses. The perimeter model that every SaaS VPN relies on is structurally broken. Zero-trust is not an upgrade. It is the replacement. And for the first time in the history of computing, you can build it yourself, on commodity hardware, with open-source software, for less than the cost of a single SaaS subscription.

If you follow Parts 1 through 4 in order, you will finish with a production-grade, zero-trust, self-hosted AI environment running on a single $24 VPS — with eight security layers, four AI providers unified behind one portal, agentic automation, collaborative document editing, remote desktop access, live monitoring with alerting, and triple-redundant encrypted backups. Every command tested. Every decision explained. Every mistake I made flagged so you don't have to. The entire series is free.

Who This Is For

If you are a developer who has ever thought about building something like this for a non-technical client — a lawyer, an accountant, a small agency, a consultant — this guide is the deliverable you can hand them, fully documented, with every operational procedure written for someone who has never opened a terminal.

If you are a technical founder or indie hacker who wants enterprise-grade infrastructure without an enterprise budget or an enterprise IT team, this is the blueprint.

If you are a small business owner who is comfortable in a terminal but does not live there, this guide is written to be followed sequentially — every command, every configuration file, every verification step — without requiring you to fill in gaps from Stack Overflow.

The stack: Ubuntu 24.04 LTS on Vultr, Cloudflare Zero Trust (free tier), Nextcloud, Collabora Online, a unified four-AI API proxy (ChatGPT + Claude + Gemini + Perplexity), OpenClaw for agentic automation, Apache Guacamole, Prometheus + Grafana + Alertmanager, and AES-256 encrypted backup to Supabase.

Let's go through what you're actually building before we touch a single command.

What the Stack Does

This is not an overview designed to impress you and then disappoint you in the implementation. Everything listed here is fully built out in Parts 2 through 4, with every command and configuration file provided.

Your Private Cloud

Nextcloud replaces Google Drive and Dropbox. Your files live on your server. They sync across devices. They have version history and a complete audit trail. No third party can access, index, or monetize your data. Nextcloud has over 4,000 contributors and is deployed by governments, universities, and enterprises across more than 50 countries. It is not a weekend experiment. It is production infrastructure used by organizations that cannot afford data leaks.

Four AI Assistants Through One Secure Interface

A self-hosted reverse proxy routes requests to ChatGPT, Claude, Gemini, and Perplexity through a single authenticated interface at ai.yourdomain.com. API keys are stored in an isolated environment file on the server and never reach the browser.

The practical difference between the four matters in daily use. Claude handles nuanced writing, contractual analysis, and tasks requiring sustained register awareness. Perplexity provides source-cited real-time research. ChatGPT covers broad reasoning and coding tasks. Gemini integrates with Google Workspace data. One portal. Four models. Your credentials, secured on your server.
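To make the key handling concrete, here is a minimal sketch of how a server-side proxy might pick the upstream URL and auth header for each provider. It is not the proxy built in Part 3. The environment variable names, the `Upstream` class, and the `upstream_request` function are hypothetical, and the base URLs and header names reflect my understanding of each provider's public API, so verify them against the current provider documentation before relying on them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    base_url: str     # provider API root (assumed; check provider docs)
    key_env: str      # hypothetical name of the env var holding the key
    auth_header: str  # header the provider expects the key in
    bearer: bool      # True if the key is sent as "Bearer <key>"


# Hypothetical routing table for the four providers named in this guide.
UPSTREAMS = {
    "chatgpt": Upstream("https://api.openai.com/v1", "OPENAI_API_KEY", "Authorization", True),
    "claude": Upstream("https://api.anthropic.com/v1", "ANTHROPIC_API_KEY", "x-api-key", False),
    "gemini": Upstream("https://generativelanguage.googleapis.com/v1beta", "GEMINI_API_KEY", "x-goog-api-key", False),
    "perplexity": Upstream("https://api.perplexity.ai", "PERPLEXITY_API_KEY", "Authorization", True),
}


def upstream_request(provider, path, env=os.environ):
    """Return (url, headers) for the server-side call to one provider.

    The key is read from the server's environment and attached here,
    so the browser only ever talks to the proxy and never sees it.
    Providers may require additional headers not shown in this sketch.
    """
    if provider not in UPSTREAMS:
        raise ValueError(f"unknown provider: {provider}")
    up = UPSTREAMS[provider]
    key = env.get(up.key_env)
    if not key:
        raise RuntimeError(f"{up.key_env} is not set on the server")
    value = f"Bearer {key}" if up.bearer else key
    headers = {up.auth_header: value, "Content-Type": "application/json"}
    return f"{up.base_url}/{path.lstrip('/')}", headers
```

The design point is that the browser sends only a provider name and a prompt to ai.yourdomain.com; the credential is attached on the server, after Cloudflare Access has already authenticated the session.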

An Agentic AI Butler

OpenClaw is an open-source agentic AI framework that executes multi-step instructions autonomously — it does not answer and wait, it acts and completes. "Organize last quarter's client invoices by company name and move unpaid items to a dedicated folder" is not a prompt. It is a job order. When you return from a meeting, it will be done.

(There is a security section later in this article specifically about OpenClaw's disclosed vulnerabilities and why the architecture here neutralizes them structurally. If the headlines made you hesitate, read that section before deciding.)

Real-Time Collaborative Document Editing

Collabora Online is a self-hosted office suite — browser-based, simultaneous editing of Word and Excel files, functionally equivalent to Google Docs. One version. Always.

Browser-Based Remote Desktop

Apache Guacamole provides HTML5 remote desktop access to any designated machine through a standard browser — no client software required on the connecting device. RDP, VNC, and SSH sessions are proxied through Guacamole, authenticated through Cloudflare Access, and rendered in the browser.

Automated Monitoring and Alerting

Prometheus scrapes metrics — CPU, memory, disk, network, application health — and Grafana renders them in a live dashboard. Email alerts fire when any threshold is breached. You know about problems before your users do.
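In the real stack, Prometheus and Alertmanager do this work. The Python below only illustrates the threshold comparison behind "alerts fire when any threshold is breached". The metric names and limits are hypothetical placeholders, not values from this series.

```python
# Hypothetical thresholds in percent; tune these to your own server.
THRESHOLDS = {"cpu": 90.0, "memory": 85.0, "disk": 80.0}


def breached(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of metrics whose value exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    )
```

A reading such as `{"cpu": 12.0, "memory": 40.0, "disk": 34.0}` yields an empty list, meaning nothing to alert on.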

Triple-Redundant Encrypted Backups

Daily database dumps to Supabase. Weekly AES-256 encrypted full-system backups. Monthly off-site copies to local storage. The system backs itself up on three independent schedules without your intervention.
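The three schedules can be sketched as a small function that says which jobs fall on a given day. The job names, the choice of Sunday for the weekly run, and the first of the month for the monthly copy are my assumptions for illustration; the actual timing is whatever you configure on the server.

```python
from datetime import date


def backups_due(day: date) -> list[str]:
    """Return the backup jobs that fall on a given calendar day."""
    jobs = ["daily-db-dump"]                # runs every day
    if day.weekday() == 6:                  # Sunday (assumed weekly slot)
        jobs.append("weekly-encrypted-full")
    if day.day == 1:                        # first of the month (assumed)
        jobs.append("monthly-offsite-copy")
    return jobs
```

Because each schedule is evaluated independently, a missed or failed weekly run does not affect the daily or monthly jobs.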

Custom Domain Email

yourname@yourdomain.com, configured with DKIM, SPF, and DMARC authentication records, forwarded through Cloudflare Email Routing to your existing inbox. AI-assisted drafting. Phishing resistance from proper authentication.

The Real Costs

I am going to give you every number, because the first thing that breaks trust in a guide like this is discovering a hidden cost in Part 3.

Vultr VPS

Vultr is a global infrastructure provider with data centers across North America, Europe, and Asia-Pacific. The server you provision is a Cloud Compute instance — dedicated vCPU, dedicated RAM, dedicated SSD, isolated from other tenants.

Plan	vCPU	RAM	SSD	$/mo	Use case
Starter	1	2 GB	55 GB	~$12	Single user / evaluation
Recommended	2	4 GB	80 GB	~$24	3–8 users, full stack (used in this guide)
Growth	4	8 GB	160 GB	~$48	8–30 users, heavier workloads

Domain

$10–$15/year through any registrar. Cloudflare Registrar is used in this guide because it unifies DNS and Zero Trust management in one dashboard.

Software

Zero. Nextcloud, Collabora, Guacamole, OpenClaw, Prometheus, Grafana, fail2ban, Certbot — all open-source, all actively maintained, all free.

Cloudflare Zero Trust

Zero for up to 50 users. WAF, DDoS mitigation, Access authentication, and Tunnel — enterprise-grade capabilities that would cost tens of thousands per year commercially.

AI API Costs — The Honest Part

AI providers charge per token; a token is roughly three-quarters of an English word. API access has no monthly flat fee, and no spending cap applies unless you configure one. If you build this stack and forget to set limits, you will receive a bill that reflects exactly what you used.

You will set hard monthly caps on every provider's dashboard before this stack handles a single production request. With a $20 cap per provider ($80 total), realistic usage:

Light (5–15 interactions/day): $3–$8/mo.
Moderate (20–40/day): $10–$20/mo.
Heavy (50+/day with automated workflows): $25–$50/mo.
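If you want to sanity-check these bands against your own usage, the arithmetic is simple enough to sketch. The tokens-per-interaction figure and the blended price per million tokens in the example are assumptions I picked for illustration, not quotes from any provider; substitute the current rates from each provider's pricing page.

```python
def monthly_ai_cost(interactions_per_day, tokens_per_interaction,
                    usd_per_million_tokens, days=30):
    """Estimate one month's API spend in USD from average usage."""
    tokens = interactions_per_day * tokens_per_interaction * days
    return tokens * usd_per_million_tokens / 1_000_000


def within_cap(cost, cap=20.0):
    """True if the estimate fits under a per-provider monthly cap."""
    return cost <= cap


# Assumed example: 10 interactions/day, 2,000 tokens each, $10 per million tokens.
estimate = monthly_ai_cost(10, 2_000, 10.0)
```

With those assumed inputs the estimate is 600,000 tokens per month, or $6.00, which sits inside the light band and well under a $20 cap.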

Total Monthly Cost

Configuration	Monthly
Sole proprietor / Starter / Light AI	~$18–$28
Small team / Recommended / Moderate AI	~$35–$45
Growing business / Growth plan	~$64–$79
Equivalent SaaS stack (per user)	$131–$175

Your stack replaces the entire SaaS list at a fraction of the cost, and the data stays on hardware you control.
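Because the SaaS figure is per user and the self-hosted figure is per team, the gap widens with headcount. Here is a rough sketch of that comparison using the table's numbers. It assumes the "Small team" row stays flat across the team, which will not hold if AI usage grows with each added user.

```python
SAAS_PER_USER = (131, 175)    # from the table above, USD per user per month
STACK_SMALL_TEAM = (35, 45)   # "Small team" row, USD per month for the whole team


def saas_monthly(users, per_user=SAAS_PER_USER):
    """Low and high monthly SaaS cost for a team of `users`."""
    low, high = per_user
    return users * low, users * high


def savings_monthly(users, stack=STACK_SMALL_TEAM):
    """Low and high monthly saving from self-hosting instead of SaaS."""
    saas_low, saas_high = saas_monthly(users)
    stack_low, stack_high = stack
    # Most conservative saving: cheapest SaaS month vs. most expensive stack month.
    return saas_low - stack_high, saas_high - stack_low
```

For a five-person team this gives a SaaS range of $655–$875 per month and a saving of roughly $610–$840 per month under those assumptions.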

The Security Architecture

I want to explain the zero-trust architecture in detail — both because it is the most important design decision in the stack and because "zero-trust" has been co-opted by marketing to the point where the term alone communicates nothing. Here is what it actually means in this implementation.

A VPN is a perimeter model. Once you are authenticated to the tunnel, you are inside. Every resource inside is reachable. One stolen credential opens the entire castle. The FBI's IC3 documented over $12.5 billion in cybercrime losses in 2023, with VPN vulnerabilities and credential theft as the most common initial access vectors for ransomware targeting SMBs.

Zero-trust abandons the perimeter concept. Nothing inside or outside the network is trusted by default. Every access request is evaluated independently, every time.

This stack implements zero-trust through eight independent layers, each of which would need to be defeated separately for an attacker to reach any sensitive data:

Layer 1: Cloudflare WAF + DDoS mitigation: All inbound traffic filtered at Cloudflare's edge before reaching your server's IP.

Layer 2: Cloudflare Access (email OTP): Every session requires a fresh six-digit code to a verified email address.

Layer 3: Vultr hardware firewall: Permits inbound only on ports 80 and 443. Enforced at Vultr's network layer, before traffic reaches the Ubuntu host.

Layer 4: UFW software firewall: Independent OS-level enforcement of the same port restrictions.

Layer 5: fail2ban: Continuous auth log monitoring. Auto-blocks IPs after repeated failures.

Layer 6: SSH via Cloudflare Tunnel (ED25519): No public SSH port. Administrative access routed through authenticated tunnel only.

Layer 7: Application-level auth: Each service maintains independent credentials. Compromising one grants no access to others.

Layer 8: systemd sandboxing + env isolation: Each service runs as a dedicated low-privilege user. API keys are read from an isolated environment file into process memory and are never exposed through any external interface.

A traditional VPN is one wall. This is eight independent checkpoints. The architecture is consistent with NIST SP 800-207, Zero Trust Architecture, translated to the budget and operational reality of an SMB.
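To show what one of these checkpoints does mechanically, here is a minimal sketch of the sliding-window logic behind Layer 5. It is not fail2ban's code. The parameter names `maxretry` and `findtime` mirror fail2ban's settings of the same names, while the `FailureTracker` class and the values used are illustrative assumptions.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Ban an IP once it fails `maxretry` times within `findtime` seconds."""

    def __init__(self, maxretry=5, findtime=600):
        self.maxretry = maxretry
        self.findtime = findtime
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned = set()

    def record_failure(self, ip, now):
        """Record a failed login at time `now` (seconds); return True if banned."""
        window = self._failures[ip]
        window.append(now)
        # Drop failures that have aged out of the lookback window.
        while window and now - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned.add(ip)
        return ip in self.banned
```

A real deployment also lifts bans after a configured period and enforces them in the firewall; this sketch omits both to keep the counting logic visible.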

OpenClaw Security: CVE-2026-25253 and the ClawJacked Class

This section exists because the disclosure was real, the coverage was extensive, and anyone who read it has a legitimate question about why this guide uses OpenClaw at all. I want to answer that question with the precision it deserves, because the answer matters not just for OpenClaw but for every agentic AI framework that will follow it.

CVE-2026-25253 (CVSS 8.8, High severity) allowed an attacker who could induce an authenticated user to visit a crafted URL to hijack an active OpenClaw session through its WebSocket connection, steal the auth token, and execute arbitrary commands on the host. A second class of vulnerabilities — ClawJacked — included command injection and SSRF flaws in the image processing pipeline. China's Ministry of Industry and Information Technology issued a formal advisory. SecurityScorecard identified over 42,000 internet-exposed instances, of which approximately 15,200 (~36%) were immediately exploitable.

All of that is accurate. Here is what the coverage did not distinguish.

Every one of those 42,000 exposed instances had OpenClaw reachable from the public internet. In this architecture, it is not.

The coverage created a narrative: OpenClaw is dangerous. That narrative is incomplete. OpenClaw exposed directly to the internet without authentication is dangerous — exactly as dangerous as any powerful tool operated without safeguards. The question is not whether the tool has risk. Every tool with real capability has risk. The question is whether the architecture contains that risk. This one does.

Security researchers at Snyk noted that even localhost-bound instances are technically reachable via CVE-2026-25253 because the victim's browser acts as the pivot — the WebSocket originates from the browser, not an external attacker. This is correct. The protection in this architecture does not rest on the localhost binding alone. It rests on what sits in front of it.

OpenClaw is installed on your Vultr server, bound to 127.0.0.1, behind Cloudflare Access with mandatory email OTP. There is no public URL at which an unauthenticated visitor can reach OpenClaw's interface. Before any request arrives, it must pass through Cloudflare Access — verified email + valid time-limited code. An attacker without your email credentials cannot start the chain. An attacker who cannot start the chain cannot reach the WebSocket. An attacker who cannot reach the WebSocket cannot execute CVE-2026-25253.

The ClawJacked SSRF vulnerability is addressed by the systemd sandboxing configuration in Part 3. OpenClaw runs as a dedicated user with no sudo, write permissions restricted to its working directory, and network egress constrained to specific API endpoints.

CVE-2026-25253 was patched in OpenClaw 2026.1.29. This guide specifies 2026.1.29 or later throughout. The patch addresses the root vulnerability. The architecture addresses the class of vulnerabilities — public exposure of agentic AI interfaces — regardless of whether future disclosures emerge in the same codebase.

The 42,000 exposed instances that made headlines were not running zero-trust architectures. They were running OpenClaw the way most people run new software: directly, on a machine with a public IP, without hardening. This guide exists specifically to prevent that.

What a Day Actually Looks Like Running This

This is the part of technical writing that usually gets skipped, which is a mistake, because the gap between "architecturally sound" and "actually useful in daily practice" is where most self-hosted projects die.

Early morning. Coffee shop. Wi-Fi you do not control or trust.

You activate Cloudflare WARP. Every byte leaving your device is encrypted in a tunnel the coffee shop's network cannot inspect. You open Nextcloud — a colleague updated a proposal at 11 PM. You open it in Collabora and begin editing simultaneously. One document. One version.

Three emails need replies. You type one instruction to the AI portal: draft professional responses — first declining a meeting, second confirming a timeline, third following up on an overdue invoice. Thirty seconds later, three drafts. You change four words in the third. Six minutes total.

A client calls needing a file from your office workstation. You open Guacamole, authenticate, and your office desktop materializes in the browser. You locate the file, send it, close the connection. Your client assumes you're at your desk.

End of day. Grafana dashboard: CPU normal, memory stable, disk 34%, two blocked IPs from fail2ban. Backup completed at 3 AM. Every indicator green.

That is what good infrastructure looks like in operation: invisible, quiet, and below the threshold of daily attention.

Your Own Machine Is Not Involved

Everything you build lives on the remote VPS. Your laptop is the browser window through which you reach it. It is not part of the structure.

If you make a configuration error, log into Vultr, click Reinstall, and the server returns to a clean Ubuntu state in under two minutes. Nothing on your laptop is affected. Nothing was ever at risk.

The only local prerequisite is a browser.

The Series

This is Part 1 of five. Each part is free, ungated, and complete.

Part	What It Covers
Part 1 — Architecture Overview (you are here)	Stack, costs, security model, daily usage
Part 2 — Zero-Trust Server	Vultr, Ubuntu, Cloudflare Tunnel, UFW, fail2ban, SSH
Part 3 — The Intelligence Layer	Docker, Nextcloud, Collabora, AI proxy, OpenClaw, CalDAV, backups
Part 4 — Operations & Monitoring	Guacamole, Prometheus, Grafana, Alertmanager, encrypted backups
Part 5 — The Operations Manual	Maintenance, security audits, cost optimization, troubleshooting, runbook

All five parts are published and free. No paywall, no signup, no follow-up sequence.

On AI and Judgment

Every AI in this stack drafts, researches, summarizes, and executes on instruction. No AI in this stack makes decisions. You review the draft before you send it. You verify the organized files before you archive them. You consult qualified professionals before acting on any AI-generated legal, financial, or medical information.

This is not a disclaimer. It is a correct description of what these tools are. They are extraordinarily capable instruments. The outcome depends entirely on the judgment directing them. That distinction matters more as the tools get better, not less.

Legal Disclaimer

The information provided in this series is for educational and informational purposes only. It does not constitute legal, financial, tax, accounting, cybersecurity, or professional advice of any kind. No professional relationship is formed by reading or acting on this content.

The author makes no representation or warranty, express or implied, as to the accuracy, completeness, currentness, fitness for a particular purpose, or non-infringement of any information contained herein. All cost estimates, technical configurations, and third-party service descriptions are based on publicly available information as of April 2026 and are subject to change without notice.

All use of techniques, software, services, configurations, commands, or information described in this series is undertaken at the sole risk of the user. To the maximum extent permitted by applicable law, the author expressly disclaims all liability for any direct, indirect, incidental, special, consequential, exemplary, or punitive damages arising from use of or reliance upon this content.

This disclaimer is intended to be enforceable to the fullest extent permitted by applicable law, including the laws of the State of California (Cal. Civ. Code §§1668, 3513) and the State of New York (N.Y. GOL §5-323), as well as applicable federal law of the United States. Nothing herein limits liability where prohibited by law, including for gross negligence, willful misconduct, or fraud.

References to Vultr, Cloudflare, OpenAI, Anthropic, Google DeepMind, Perplexity AI, Nextcloud, Collabora Online, Apache Guacamole, OpenClaw, Prometheus, Grafana, and all other third-party products are for informational purposes only. The author has no commercial, sponsorship, affiliate, or compensated relationship with any provider mentioned. All trademarks are the property of their respective owners.

Non-commercial sharing and attribution are permitted and encouraged. Commercial reproduction requires the author's explicit written consent.

Part 2 — provisioning the server and building the zero-trust layer — is next.

Drop questions in the comments. I read every one and answer the technical ones in detail.

If this is the kind of thing you build for clients or have been meaning to build for yourself, follow along. Parts 2 through 4 get into the actual commands.

— Kusunoki
International Tax Specialist & Systems Builder
Sapporo, Japan | @kusunoki

DEV Community