Part 7 of 7 — Self-hosting Supabase: a learning journey
Also available in French: Partie 7 — Sécurité et test de charge
We have a working two-project cluster. Now two questions: is it actually secure, and what does it take to break it?
The security layers
Security here is defense in depth. Multiple layers, each one adding friction for an attacker. No single layer is sufficient on its own.
ufw and fail2ban
The outer layer. Only ports 22, 80, and 443 are open. fail2ban bans IP addresses after five failed SSH attempts. This stops automated scanners and brute-force attacks.
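In sketch form, this layer looks roughly like the following (the fail2ban ban time is an assumption, not a value confirmed earlier in the series; only the five-attempt limit comes from the text above):

```shell
# Deny everything inbound by default, then open only SSH, HTTP, and HTTPS.
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable

# /etc/fail2ban/jail.local: ban an IP after five failed SSH attempts.
cat > /etc/fail2ban/jail.local <<'EOF'
[sshd]
enabled  = true
maxretry = 5
bantime  = 1h
EOF
systemctl restart fail2ban
```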
Kong: key authentication and rate limiting
Every request to the API must include a valid apikey header. Without it, Kong returns 401 before the request reaches any backend service. GoTrue, PostgREST, Realtime, Storage: none of them sees unauthenticated traffic.
Rate limiting: 30 requests per minute per consumer by default. This limits the damage from credential stuffing attempts and protects GoTrue from being used as a bulk signup platform.
Kong is configured via a YAML file (kong.yml) that lives on the server and is never committed to git. The important sections:
```yaml
plugins:
  - name: key-auth
    config:
      key_names: ["apikey"]
  - name: rate-limiting
    config:
      minute: 30
      policy: local
```
Studio behind basic auth
The Studio dashboard gives full admin access to your database. It should not be publicly accessible. We set this up in Part 4: Traefik's basic auth middleware protects the Studio route, and the password hash is embedded in the service labels. See Part 4 for the htpasswd -nB command and the $$ escaping rule.
Falco: runtime intrusion detection
ufw and Kong handle external threats. Falco watches what happens inside the running containers.
Falco is an eBPF-based security tool that monitors Linux kernel events: file access, process execution, network connections, privilege changes. It runs on the host, so it cannot be disabled by a compromised container. Events that match rules trigger alerts.
Install it:
```shell
curl -fsSL https://falco.org/repo/falcosecurity-packages.asc \
  | gpg --dearmor -o /usr/share/keyrings/falco-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/falco-archive-keyring.gpg] \
https://download.falco.org/packages/deb stable main" \
  | tee /etc/apt/sources.list.d/falcosecurity.list
apt update && apt install falco -y
systemctl enable falco-modern-bpf
systemctl start falco-modern-bpf
```
Custom rules go in /etc/falco/rules.d/. For our cluster, useful rules include alerting on unexpected outbound connections from containers and on direct psql commands containing destructive statements.
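As a minimal illustration of what such rules can look like (these are sketches, not the exact rules from our cluster; `outbound` and `spawned_process` are macros from Falco's stock ruleset, and the CIDR list assumes the private ranges are the only legitimate destinations):

```yaml
# /etc/falco/rules.d/cluster-rules.yaml (illustrative)
- rule: Unexpected outbound connection from container
  desc: A container opened an outbound connection to a non-private address
  condition: >
    container and outbound
    and not fd.sip in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
  output: >
    Unexpected outbound connection
    (container=%container.name command=%proc.cmdline dest=%fd.name)
  priority: WARNING

- rule: Destructive SQL via psql
  desc: psql invoked with a DROP or TRUNCATE statement on the command line
  condition: >
    spawned_process and proc.name = psql
    and (proc.cmdline contains "DROP " or proc.cmdline contains "TRUNCATE ")
  output: >
    Destructive psql command
    (user=%user.name command=%proc.cmdline container=%container.name)
  priority: CRITICAL
```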
The alerting gotcha. The program_output directive pipes alerts to a script. You can verify it works by watching the Falco journal while triggering an event: journalctl -u falco-modern-bpf -f. If you see events there but your handler script is never called, program_output is broken in your version. The safe workaround is a separate systemd service that tails the journal directly:
```ini
# /etc/systemd/system/falco-alerter.service
[Unit]
Description=Falco alert handler
After=falco-modern-bpf.service

[Service]
ExecStart=/bin/bash -c 'journalctl -f -u falco-modern-bpf -o cat | /root/supabase-vps-cluster/scripts/falco-alert.sh'
Restart=always

# The [Install] section is required for systemctl enable to work.
[Install]
WantedBy=multi-user.target
```
The alert handler logs events to /var/log/falco-alerts.log with a 5-minute cooldown per rule.
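The handler script itself is not reproduced in this series, so here is a minimal sketch of what `falco-alert.sh` can look like. The paths are made overridable through environment variables so the sketch runs unprivileged, and the rule-name parsing assumes Falco's default text output format ("timestamp: Priority rule text (fields)"):

```shell
#!/usr/bin/env bash
# Sketch of scripts/falco-alert.sh: append each Falco event to a log,
# but at most once per rule within a cooldown window.

handle_event() {
  local line=$1
  local log="${FALCO_ALERT_LOG:-/var/log/falco-alerts.log}"
  local state_dir="${FALCO_STATE_DIR:-/tmp/falco-cooldown}"
  local cooldown="${FALCO_COOLDOWN:-300}"   # 5 minutes

  mkdir -p "$state_dir"

  # Strip the leading timestamp, the priority word, and the trailing
  # "(...)" field list, leaving a filesystem-safe per-rule cooldown key.
  local key stamp now last
  key=$(printf '%s' "$line" \
    | sed -E 's/^[0-9:.]+ +[A-Za-z]+ +//; s/\(.*//' \
    | tr -cs 'A-Za-z0-9' '_' | cut -c1-64)
  stamp="$state_dir/$key"
  now=$(date +%s)
  last=$(cat "$stamp" 2>/dev/null || echo 0)

  if (( now - last >= cooldown )); then
    printf '%s %s\n' "$(date -Is)" "$line" >> "$log"
    printf '%s' "$now" > "$stamp"
  fi
}

# journalctl pipes events in on stdin, one per line.
while IFS= read -r line; do
  handle_event "$line"
done
```

The cooldown state is one timestamp file per rule, so a burst of identical events produces a single log line until the window expires.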
Expected noise. Kong's nginx health checks run every 10 seconds. Each one spawns a shell subprocess and reads /etc/passwd. Falco reports this as "Shell spawned in container" and "Sensitive file read in container." This is normal behavior, not an attack. The cooldown mechanism keeps the log readable. After 24 hours of observation you can tune the rules to whitelist the specific Kong process.
The security picture
```
Internet
   |
 ufw (ports 22/80/443 only)
   |
 Traefik (TLS, security headers)
   |
   +-- Studio (Traefik basic auth)
   +-- Kong (key-auth + rate limiting)
         |
         +-- internal services (not publicly reachable)
```
Host layer:
- SSH key-only auth, fail2ban
- Falco eBPF watching all container syscalls
- Vault on localhost only, UI disabled
- No published Postgres port
The load test
I wanted a concrete answer to the question: what is the actual limit of this server?
I used Grafana Cloud k6 for the tests. Before running any of them, you need:
- k6 installed locally (`brew install k6` on macOS, or download from k6.io for other platforms)
- A Grafana Cloud account (free at grafana.com/products/cloud). The free tier allows up to 50 concurrent virtual users and tests up to around 10 minutes long.
- Your project's anon key and service role key (from Vault, or from the `.env` file if you have not set up Vault yet)
Three tests.
Test 1: re-authentication on every request
Each virtual user signs in on every iteration. The test ramps to 50 concurrent users.
The server broke at around 30 to 50 VUs.
At 50 VUs, the database CPU hit 100% and stayed there. GoTrue started returning 504 timeouts. The problem is bcrypt. Password hashing is CPU-intensive by design: each login requires a bcrypt verification in GoTrue and a round trip to the database. With 50 users re-authenticating every few seconds, the 2 vCPU server was saturated by cryptographic work alone.
| Service | At rest | During test |
|---|---|---|
| GoTrue | 0% CPU | 51% CPU |
| PostgreSQL | 0% CPU | 100% CPU |
This looks alarming. But no real application works like this. JWT tokens are valid for an hour. You authenticate once, use the token, refresh it when it expires. You do not re-authenticate on every API call.
Test 2: cached JWT, pure CRUD
Each VU logs in once during setup and caches the token for the entire test. Then it runs insert, read, delete in a loop with no re-authentication.
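For reference, a condensed sketch of the Test 2 script. The `notes` table, the environment variable names, and the test credentials are placeholders; the auth and REST paths are the standard Supabase endpoints behind Kong.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },   // ramp to 50 VUs
    { duration: '3m', target: 50 },   // hold
    { duration: '30s', target: 0 },   // ramp down
  ],
};

const BASE = __ENV.SUPABASE_URL;
const ANON = __ENV.ANON_KEY;

// setup() runs once; every VU reuses the returned token for the whole test.
export function setup() {
  const res = http.post(
    `${BASE}/auth/v1/token?grant_type=password`,
    JSON.stringify({ email: __ENV.TEST_EMAIL, password: __ENV.TEST_PASSWORD }),
    { headers: { apikey: ANON, 'Content-Type': 'application/json' } },
  );
  return { token: res.json('access_token') };
}

export default function (data) {
  const headers = {
    apikey: ANON,
    Authorization: `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  // Insert, read, delete in a loop with no re-authentication.
  const ins = http.post(
    `${BASE}/rest/v1/notes`,
    JSON.stringify({ body: 'load test' }),
    { headers: { ...headers, Prefer: 'return=representation' } },
  );
  check(ins, { 'insert 201': (r) => r.status === 201 });

  const read = http.get(`${BASE}/rest/v1/notes?select=id&limit=10`, { headers });
  check(read, { 'read 200': (r) => r.status === 200 });

  const id = ins.json('0.id');
  http.del(`${BASE}/rest/v1/notes?id=eq.${id}`, null, { headers });

  sleep(1);
}
```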
No breaking point at 50 VUs. The Grafana Cloud free tier hit its test duration limit (about 5 minutes) before the server showed any stress.
| Service | At rest | During test |
|---|---|---|
| GoTrue | 0% CPU | 0.02% CPU |
| PostgreSQL | 0% CPU | 9% CPU |
| PostgREST | 0% CPU | 13% CPU |
Database CPU went from 100% to 9%. The only change was caching the JWT. PostgREST is now the highest-CPU service and would eventually become the bottleneck at higher VU counts, but we did not reach that ceiling.
Test 3: realistic sessions
Three user archetypes in parallel, with randomized think time between actions:
- 70% casual users (10 to 30 second think time, mostly reads)
- 20% active users (5 to 15 second think time, mixed)
- 10% power users (2 to 8 second think time, more writes)
Each user logs in once per session. Random idle periods simulate switching tabs or stepping away.
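In k6 terms, the three archetypes map naturally onto scenarios. A sketch of the options block (the VU split follows the percentages above; everything else is illustrative):

```javascript
import { sleep } from 'k6';

export const options = {
  scenarios: {
    // 70% casual, 20% active, 10% power, out of 50 VUs total
    casual: { executor: 'constant-vus', vus: 35, duration: '10m', exec: 'casual' },
    active: { executor: 'constant-vus', vus: 10, duration: '10m', exec: 'active' },
    power:  { executor: 'constant-vus', vus: 5,  duration: '10m', exec: 'power' },
  },
};

// Random think time within a [min, max] second range.
const think = (min, max) => sleep(min + Math.random() * (max - min));

export function casual() { /* mostly reads */ think(10, 30); }
export function active() { /* mixed reads and writes */ think(5, 15); }
export function power()  { /* more writes */ think(2, 8); }
```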
The test ran to full completion: 10 minutes 30 seconds, zero errors.
| Service | At rest | Peak during test |
|---|---|---|
| GoTrue | 0% CPU | 0.02% CPU |
| PostgreSQL | 0% CPU | 0.67% CPU |
| PostgREST | 0% CPU | 1.19% CPU |
The server was essentially idle.
There was eventually a threshold breach: read latency at the 95th percentile exceeded 300ms. This was not the server. The test ran from Grafana Cloud's Ohio region to our server in Germany, where the base round-trip time is 100 to 130ms. The cluster was healthy throughout; the latency came from the network, not from the application.
What the numbers mean
With 50 concurrent users and realistic think time, only 3 to 8 database queries are actually running at any moment. Users are not hammering submit; they are reading something, typing a reply, thinking about what to do next. The arithmetic bears this out: 50 users each acting once every 6 to 15 seconds generate roughly 3 to 8 requests per second, and with queries completing in milliseconds, almost none of them overlap. Think time changes the math completely.
The CX22 is fine for a hobby project with real users. The only scenario that saturates it is a continuous re-authentication hammer test, which no real application does.
What the managed service provides
After going through all of this, I have a much clearer sense of what Supabase gives you on the free tier.
I spent several evenings on setup, configuration, and debugging. I went through the Vault incident. I debugged Traefik routing issues, Realtime crashes, incorrect environment variable scope, and healthcheck behavior in Docker Swarm. I set up monitoring so I know what the server is doing. I am responsible for upgrades, backups, and incidents.
Supabase does all of that for two free projects. The infrastructure behind even one free project is more complex than everything in this series. The team keeps GoTrue, PostgREST, Realtime, and the rest upgraded and running, continuously, at scale.
The Pro plan, with a database that has point-in-time recovery, automated backups, connection pooling via PgBouncer, and uptime guarantees, is a fair price for what it eliminates from your life. I understand that now because I have seen what it eliminates.
Self-host if you want to learn. Use the managed service if you want to build.
The full series
- Why we are building this
- The server
- Traefik and SSL
- The first Supabase instance
- Vault
- Two instances
- Security and the load test (you are here)