In the last 10 days I shipped two production web services on a single $5 Hetzner VPS:
- Funding Finder — a 20-exchange perpetual futures funding rate aggregator. ~6,800 USDT-margined symbols, refreshed every 5 minutes. 5 systemd services, ~80 MB resident memory.
- cronviz — a stdlib-only CLI tool for unified cron + systemd timer observability. Zero dependencies. 48 unit tests.
Both shipped with the same boring stack: Python 3.12 + Flask + SQLite (WAL mode) + systemd + vanilla HTML. Nothing exotic. Nothing trendy. Things that have worked since 2010.
This post is 5 patterns from that work. They're not the only ones I used — there are about 25 more in the full collection — but they're the 5 that delivered the most leverage. If you're a solo dev shipping a side project on weekends and you keep getting told you need Postgres / Redis / Docker / Kubernetes / FastAPI / asyncio to be "production-ready", read these first. You probably don't.
Each pattern follows the same shape: the pain it solves, the code, when not to use it, and a real number from production.
1. The 12-line systemd unit
You wrote a Flask app. It runs locally with python app.py. You SSH into the VPS, git pull, and now you need it to run forever: restart on crash, restart on reboot, log to a place you can tail -f, let you systemctl restart it on deploy.
Half the internet will tell you to use Docker. On a $5 VPS, for one process, Docker is overkill. systemd has been on every Linux box since 2015 and does exactly this in 12 lines.
/etc/systemd/system/myapp.service:
```ini
[Unit]
Description=My Flask app
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/myapp
ExecStart=/usr/bin/python3 /root/myapp/app.py
Restart=on-failure
RestartSec=3
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```
```shell
systemctl daemon-reload
systemctl enable --now myapp.service
journalctl -u myapp.service -f    # logs
systemctl restart myapp.service   # deploy
```
That's it. There's nothing else.
When NOT to use it. If you need rolling deploys, multiple replicas, cross-machine service discovery, or you're already on Kubernetes — skip this. For literally anything below that bar, this is the right shape.
Production number. My Funding Finder API service has been running for 10+ days, ~25 MB resident memory, 12 restarts (all from systemctl restart during deploys, zero from crashes).
2. SQLite in WAL mode is enough for almost any solo project
You picked SQLite because you didn't want to run a separate database process. Then you started writing concurrently from a Flask request handler and a background collector, and SQLite locked up.
The fix is two PRAGMAs:
```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def conn():
    c = sqlite3.connect("/var/data/myapp.db")
    c.row_factory = sqlite3.Row
    c.execute("PRAGMA journal_mode=WAL")
    c.execute("PRAGMA synchronous=NORMAL")
    try:
        yield c
        c.commit()
    finally:
        # closing without commit discards any uncommitted changes
        c.close()
```
WAL means concurrent reads no longer block writes (and vice versa). synchronous=NORMAL means SQLite calls fsync() once per commit instead of once per page write — throughput goes from ~100 commits/sec to ~10,000 commits/sec on a typical SSD, and you only lose the last in-flight transaction in the (rare) case of a hard kernel crash.
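One detail worth knowing: `journal_mode=WAL` is persistent — it's written into the database file itself — while `synchronous=NORMAL` applies per connection, which is why the helper sets both on every connect. A quick sanity-check sketch (throwaway temp file, not the production path):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# First connection switches the file into WAL mode.
c = sqlite3.connect(path)
mode = c.execute("PRAGMA journal_mode=WAL").fetchone()[0]
c.close()

# A brand-new connection sees WAL without setting it again:
# the mode is stored in the database file, not the connection.
c2 = sqlite3.connect(path)
mode2 = c2.execute("PRAGMA journal_mode").fetchone()[0]
c2.close()
# mode == "wal" and mode2 == "wal"
```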
When NOT to use it. Networked filesystems (NFS): WAL is broken there. Use Postgres. Multi-machine deployments: same answer.
Production number. My collector inserts ~6,800 rows in one transaction every 5 minutes. With WAL: ~80 ms per batch, API server reads never block. Without WAL: random 500s during commit windows.
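For reference, the one-transaction batch shape looks like this — a sketch with a made-up `funding` table (the real collector's schema isn't shown in this post):

```python
import sqlite3

c = sqlite3.connect(":memory:")  # in-memory for the sketch; production uses a file
c.execute("CREATE TABLE funding (symbol TEXT, rate REAL, ts INTEGER)")

rows = [
    ("BTC-USDT", 0.0001, 1700000000),
    ("ETH-USDT", -0.0002, 1700000000),
]

# One executemany inside one transaction: a single commit (and a single
# fsync under synchronous=NORMAL) for the whole batch.
with c:  # the connection-as-context-manager commits on success, rolls back on error
    c.executemany("INSERT INTO funding VALUES (?, ?, ?)", rows)

count = c.execute("SELECT COUNT(*) FROM funding").fetchone()[0]  # → 2
```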
3. Per-IP rate limiting in 25 lines, no Redis
You opened your free-tier API to the public. Within hours, one IP starts hammering it 50 req/sec. You need to throttle without (a) blocking entirely, (b) running Redis, (c) installing a "professional" rate limiting library that needs a year of config.
```python
import time
from collections import deque
from threading import Lock

from flask import Flask, request, abort, jsonify

app = Flask(__name__)

_hits: dict[str, deque] = {}
_lock = Lock()
_WINDOW_SECONDS = 60
_MAX_PER_WINDOW = 60  # 60 req/min/IP

def rate_limit():
    # behind a reverse proxy, X-Forwarded-For carries the client IP
    ip = request.headers.get("X-Forwarded-For", request.remote_addr) or "unknown"
    now = time.time()
    cutoff = now - _WINDOW_SECONDS
    with _lock:
        dq = _hits.setdefault(ip, deque())
        while dq and dq[0] < cutoff:
            dq.popleft()
        if len(dq) >= _MAX_PER_WINDOW:
            retry = int(dq[0] + _WINDOW_SECONDS - now) + 1
            abort(429, description=f"rate limit, retry in {retry}s")
        dq.append(now)

@app.route("/api/expensive")
def api_expensive():
    rate_limit()
    return jsonify(...)
```
Each IP gets a deque of recent timestamps. Drop expired ones, count, abort if over the limit, otherwise append. ~25 lines, one global dict, one lock.
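The window logic is easy to sanity-check outside Flask. A minimal sketch with made-up limits — `allowed` is a hypothetical helper I'm introducing here, with a 1-second window and a limit of 3:

```python
from collections import deque

WINDOW = 1.0   # seconds (toy value; production uses 60)
LIMIT = 3      # requests per window (toy value)

def allowed(dq: deque, now: float) -> bool:
    # Same sliding-window shape as rate_limit(): evict expired
    # timestamps, deny if full, otherwise record and allow.
    cutoff = now - WINDOW
    while dq and dq[0] < cutoff:
        dq.popleft()
    if len(dq) >= LIMIT:
        return False
    dq.append(now)
    return True

dq = deque()
results = [allowed(dq, t) for t in [0.0, 0.1, 0.2, 0.3, 1.5]]
# → [True, True, True, False, True]
# the 4th request hits the limit; the 5th lands after the window slid past
```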
When NOT to use it. Multi-process deployments (gunicorn --workers=4) where each worker has its own dict and the limit becomes per-worker. The fix is gunicorn --workers=1 --threads=8 (fine for I/O-bound services), or move to Redis.
Production number. My free tier is 60 req/min/IP, paid tiers are 600 and 3000 req/min. 10 days, 10k+ unique IPs, ~5 MB total memory cost across all the tracked deques. Zero issues.
4. ThreadPoolExecutor for fan-out HTTP fetching, when async is overkill
You need to fetch funding rates for 285 OKX perpetual contracts. The exchange rate-limits public endpoints to ~10 req/s. Your options:
- Sequential. 285 × 100 ms = 28.5 seconds. Too slow.
- `aiohttp` + `asyncio.gather`. Fast, but you're now rewriting the codebase in `async def`, your tests need `pytest-asyncio`, your stack traces are 80% framework noise, and you're debugging "RuntimeError: This event loop is already running" in REPL sessions.
- `concurrent.futures.ThreadPoolExecutor`. 8 worker threads, 8 seconds end-to-end, zero new dependencies, your code stays synchronous.
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

session = requests.Session()
session.headers.update({"User-Agent": "myapp/1.0"})

def fetch_one(inst_id: str) -> dict | None:
    try:
        r = session.get(
            "https://www.okx.com/api/v5/public/funding-rate",
            params={"instId": inst_id},
            timeout=10,
        )
        if r.status_code != 200:
            return None
        rows = r.json().get("data", [])
        return rows[0] if rows else None
    except Exception:
        return None

def fetch_all(inst_ids: list[str], workers: int = 8) -> list[dict]:
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, inst): inst for inst in inst_ids}
        for fut in as_completed(futures):
            r = fut.result()
            if r is not None:
                results.append(r)
    return results
```
Pick worker count just below (rate_limit_per_sec × p95_latency_seconds) × 2. For OKX (10 req/s, 150 ms median): 8 workers, comfortably under the limit.
When NOT to use it. CPU-bound work (the GIL doesn't help — use ProcessPoolExecutor or NumPy). Or millions of concurrent connections (~1 MB stack per thread; 10k threads = 10 GB RAM — use async). For "fan out 100-500 HTTP calls under a rate limit", threads are the boring, working answer.
Production number. My OKX fetcher: 285 instruments, 8 worker threads, ~8 seconds end-to-end, < 0.1% failure rate over 10 days. The async version would shave ~3 seconds off and require rewriting half my codebase. Not worth it.
5. Telegram bot as monitoring transport — €0/month
Your service crashes at 3 AM. PagerDuty exists for this and costs €19/user/month. Telegram exists for this and costs €0.
Setup (90 seconds):
- Open Telegram, search `@BotFather`, send `/newbot`. Get a token.
- Send your new bot a message (you must message it first, before it can message you).
- Visit `https://api.telegram.org/bot<TOKEN>/getUpdates` and find `"chat":{"id":12345678}`.
Code:
```python
import os

import requests

TOKEN = os.environ.get("TELEGRAM_BOT_TOKEN")
CHAT_ID = os.environ.get("TELEGRAM_CHAT_ID")

def send_alert(message: str) -> bool:
    if not (TOKEN and CHAT_ID):
        return False
    try:
        r = requests.post(
            f"https://api.telegram.org/bot{TOKEN}/sendMessage",
            data={"chat_id": CHAT_ID, "text": message, "parse_mode": "Markdown"},
            timeout=10,
        )
        return r.json().get("ok", False)
    except Exception:
        return False
```
Use it from anywhere:
```python
if disk_usage > 0.9:
    send_alert(f"⚠️ disk at {disk_usage:.0%} on `{hostname}`")

if collector_age > 600:
    send_alert(f"❌ collector stale: last fetch {collector_age}s ago")

send_alert(f"✅ deploy of `{commit_sha[:7]}` complete")
```
Your phone buzzes within 2 seconds. No SaaS account, no SDK, no webhook UI, no escalation policy.
When NOT to use it. On-call rotation. Acknowledgment + escalation. Compliance audit trails. > 30 messages/sec sustained.
Production number. I've used this pattern in every service I've shipped for 3 years. Zero missed alerts. Zero subscription cost. Telegram hasn't turned the Bot API into something it monetizes — it has been free and stable since 2015.
The take-away
Shipping a paid web service on a $5 VPS doesn't require 47 config files, a microservices architecture, or a year of devops yak-shaving. It requires:
- One systemd unit per service
- SQLite in WAL mode
- A 25-line rate limiter
- A `ThreadPoolExecutor` instead of asyncio
- A free Telegram bot for alerts
Total dependencies introduced by the 5 patterns above: `requests` (and even that is optional — you could use `urllib.request` from stdlib). That's it.
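To make the dependency claim concrete: the Telegram alert from pattern 5 really can be done with nothing but the stdlib. A hedged sketch — `send_alert_stdlib` and `build_payload` are names I'm introducing here, not from the services above, and I haven't exercised this exact function against the live API:

```python
import json
import urllib.parse
import urllib.request

def build_payload(chat_id, text: str) -> bytes:
    # Same form-encoded body that `requests.post(..., data=...)` would send.
    return urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()

def send_alert_stdlib(token: str, chat_id: str, message: str) -> bool:
    # stdlib-only twin of send_alert(): POST to the Bot API, swallow failures.
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    try:
        with urllib.request.urlopen(
            url, data=build_payload(chat_id, message), timeout=10
        ) as r:
            return json.load(r).get("ok", False)
    except Exception:
        return False
```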
If you're still in "I should learn Kubernetes before I ship anything" mode, please consider that this entire stack costs €5/month, fits on a $5 VPS, has been running two production services for 10 days with zero downtime, and is fundamentally easier to reason about than any "modern" alternative. Use what works. The boring stack works.
Where this comes from
These 5 patterns are extracted from a longer collection I'm putting together: 30 Boring Patterns for Solo Devs Who Ship. The other 25 cover:
- Theme 2 — SQLite at solo-dev scale (migrations, backups, single-writer pattern, cursor pagination)
- Theme 3 — HTTP and Flask (API key auth with revocation, CSV exports, OpenAPI by hand, vanilla HTML/SVG dashboards)
- Theme 4 — More external API patterns (mocked-HTTP testing, the structural sanity check on aggregate output, per-source field normalization)
- Theme 5 — Operations and observability (health endpoints, /metrics in 15 lines, `make` as your only deploy tool)
- Theme 6 — Going from free to paid (per-tier rate limiting, Lemon Squeezy vs Stripe, pricing a niche dev tool)
The full pack (standalone HTML + companion code zip) is €19, one-time, no subscription. Live now at http://178.104.60.252:8083/boring-patterns — instant download after checkout via Stripe. The 5 patterns above are included in the pack in full — no teaser cuts. If they're useful, the other 25 are too. If they're not, you've still got 5 production-tested patterns for free. Patterns are CC BY-SA 4.0, code is MIT, 14-day no-questions refund.
Comments / corrections / "you're wrong about X" replies very welcome.