Dinh Doan Van Bien

Posted on Mar 6 • Edited on Mar 17 • Originally published at supabase.voieduco.de

Part 7 — Security and the load test

#supabase #docker #selfhosted #devops

Part 7 of 7 — Self-hosting Supabase: a learning journey

Also available in French: Partie 7 — Sécurité et test de charge

We have a working two-project cluster. Now two questions: is it actually secure, and what does it take to break it?

The security layers

Security here is defense in depth. Multiple layers, each one adding friction for an attacker. No single layer is sufficient on its own.

ufw and fail2ban

The outer layer. Only ports 22, 80, and 443 are open. fail2ban bans IP addresses after five failed SSH attempts. This stops automated scanners and brute-force attacks.

Kong: key authentication and rate limiting

Every request to the API must include a valid apikey header. Without it, Kong returns 401 before the request reaches any backend service. GoTrue, PostgREST, Realtime, Storage, none of them see unauthenticated traffic.

Rate limiting: 30 requests per minute per IP address by default (configured with limit_by: ip in the Kong rate-limiting plugin). This limits the damage from credential stuffing attempts and protects GoTrue from being used as a bulk signup platform.

Kong is configured via a YAML file (kong.yml) that lives on the server and is never committed to git. The important sections:

plugins:
  - name: key-auth
    config:
      key_names: ["apikey"]

  - name: rate-limiting
    config:
      minute: 30
      limit_by: ip
      policy: local

Studio behind basic auth

The Studio dashboard gives full admin access to your database. It should not be publicly accessible. We set this up in Part 4: Traefik's basic auth middleware protects the Studio route, and the password hash is embedded in the service labels. See Part 4 for the htpasswd -nB command and the $$ escaping rule.

Falco: runtime intrusion detection

ufw and Kong handle external threats. Falco watches what happens inside the running containers.

Falco is a security tool that hooks into Linux syscalls to monitor container activity: file access, process execution, network connections, privilege changes. We use falco-modern-bpf, which uses eBPF as its capture mechanism. It runs on the host, outside the container's control. Events that match rules trigger alerts.

Install it:

curl -fsSL https://falco.org/repo/falcosecurity-packages.asc \
  | gpg --dearmor -o /usr/share/keyrings/falco-archive-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/falco-archive-keyring.gpg] \
  https://download.falco.org/packages/deb stable main" \
  | tee /etc/apt/sources.list.d/falcosecurity.list

apt update && apt install falco -y
systemctl enable falco-modern-bpf
systemctl start falco-modern-bpf

Custom rules go in /etc/falco/rules.d/. For our cluster, useful rules include alerting on unexpected outbound connections from containers and on direct psql commands containing destructive statements.

The alerting gotcha. The program_output directive pipes alerts to a script. You can verify it works by watching the Falco journal while triggering an event: journalctl -u falco-modern-bpf -f. If you see events there but your handler script is never called, program_output is broken in your version. The safe workaround is a separate systemd service that tails the journal directly:

# /etc/systemd/system/falco-alerter.service
[Unit]
Description=Falco alert handler
After=falco-modern-bpf.service

[Service]
ExecStart=/bin/bash -c 'journalctl -f -u falco-modern-bpf -o cat | /root/supabase-vps-cluster/scripts/falco-alert.sh'
Restart=always

The alert handler logs events to /var/log/falco-alerts.log with a 5-minute cooldown per rule.

Expected noise. Kong's nginx health checks run every 10 seconds. Each one spawns a shell subprocess and reads /etc/passwd. Falco reports this as "Shell spawned in container" and "Sensitive file read in container." This is normal behavior, not an attack. The cooldown mechanism keeps the log readable. After 24 hours of observation you can tune the rules to whitelist the specific Kong process.

The security picture

Internet
    |
  ufw (ports 22/80/443 only)
    |
  Traefik (TLS, security headers)
    |
    +-- Studio (Traefik basic auth)
    +-- Kong (key-auth + rate limiting)
              |
              +-- internal services (not publicly reachable)

Host layer:
  SSH key-only auth, fail2ban
  Falco eBPF watching all container syscalls
  Vault on localhost only, UI disabled
  No published Postgres port

The load test

I wanted a concrete answer to the question: what is the actual limit of this server?

I used Grafana Cloud k6 for the tests. Before running any of them, you need:

k6 installed locally (brew install k6 on macOS, or download from k6.io for other platforms)
A Grafana Cloud account (free at grafana.com/products/cloud). The free tier allows up to 50 concurrent virtual users and tests up to around 10 minutes long.
Your project's anon key and service role key (from Vault, or from the .env file if you have not set up Vault yet)

Three tests.

Test 1: re-authentication on every request

Each virtual user signs in on every iteration. The test ramps to 50 concurrent users.

The server broke at around 30 to 50 VUs.

At 50 VUs, the database CPU hit 100% and stayed there. GoTrue started returning 504 timeouts. The problem is bcrypt, the intentionally slow password-hashing algorithm. Each login requires a bcrypt verification via PostgreSQL's pgcrypto extension. With 50 users re-authenticating every few seconds, the database was saturated by cryptographic work alone.

Service	At rest	During test
GoTrue	0% CPU	51% CPU
PostgreSQL	0% CPU	100% CPU

This looks alarming. But no real application works like this. JWT tokens are valid for an hour. You authenticate once, use the token, refresh it when it expires. You do not re-authenticate on every API call.

Test 2: cached JWT, pure CRUD

Each VU logs in once during setup and caches the token for the entire test. Then it runs insert, read, delete in a loop with no re-authentication.

No breaking point at 50 VUs. The Grafana Cloud free tier hit its test duration limit (about 5 minutes) before the server showed any stress.

Service	At rest	During test
GoTrue	0% CPU	0.02% CPU
PostgreSQL	0% CPU	9% CPU
PostgREST	0% CPU	13% CPU

Database CPU went from 100% to 9%. The only change was caching the JWT. PostgREST is now the highest-CPU service and would eventually become the bottleneck at higher VU counts, but we did not reach that ceiling.

Test 3: realistic sessions

Three user archetypes in parallel, with randomized think time between actions:

70% casual users (10 to 30 second think time, mostly reads)
20% active users (5 to 15 second think time, mixed)
10% power users (2 to 8 second think time, more writes)

Each user logs in once per session. Random idle periods simulate switching tabs or stepping away.

The test ran to full completion: 10 minutes 30 seconds, zero errors.

Service	At rest	Peak during test
GoTrue	0% CPU	0.02% CPU
PostgreSQL	0% CPU	0.67% CPU
PostgREST	0% CPU	1.19% CPU

The server was essentially idle.

There was eventually a threshold breach: read latency at the 95th percentile exceeded 300ms. This was not the server. The test ran from Grafana Cloud's Ohio region to our server in Germany. The base round-trip time is 100 to 130ms. The cluster was healthy throughout, the latency was from the network, not from the application.

What the numbers mean

With 50 concurrent users and realistic think time, there are 3 to 8 database queries actually running at any moment. Users are not hammering submit, they are reading something, typing a reply, thinking about what to do next. Think time changes the math completely.

The CX22 is fine for a hobby project with real users. The only scenario that saturates it is a continuous re-authentication hammer test, which no real application does.

What the managed service provides

After going through all of this, I have a much clearer sense of what Supabase gives you on the free tier.

I spent several evenings on setup, configuration, and debugging. I went through the Vault incident. I debugged Traefik routing issues, Realtime crashes, incorrect environment variable scope, and healthcheck behavior in Docker Swarm. I set up monitoring so I know what the server is doing. I am responsible for upgrades, backups, and incidents.

Supabase does all of that for two free projects. The infrastructure behind even one free project is more complex than everything in this series. The team keeps GoTrue, PostgREST, Realtime, and the rest upgraded and running, continuously, at scale.

The Pro plan, database with point-in-time recovery, automated backups, connection pooling via PgBouncer, uptime guarantees, is a fair price for what it eliminates from your life. I understand that now because I have seen what it eliminates.

Self-host if you want to learn. Use the managed service if you want to build.

The full series

Why we are building this
The server
Traefik and SSL
The first Supabase instance
Vault
Two instances
Security and the load test, you are here

DEV Community