I'm a senior backend tech lead in Paris and I run HostingGuru, a managed PaaS. I'll mention HG exactly once near the end. Everything else in this article works on any platform you ship on.
A founder DMed me last month about a Render Postgres instance that had been humming along for nine months without a hiccup. Stripe charges going through, customers happy, MRR pushing past €4k. He wanted my opinion on something else, but during the call I asked how often his database backed up. He paused, opened the Render dashboard, clicked Postgres, then Backups, and saw exactly one snapshot from the day he provisioned the instance.
This is not rare. I review side projects and small SaaS stacks every week, and "zero working backups" is the single most common operational bug in the stacks of founders who shipped fast and planned to learn operations later. It's also the cheapest bug in the world to fix. You can wire up a working strategy in twenty minutes. The reason most solo founders don't is that the canonical advice (configure AWS RDS, set up S3 lifecycle policies, write a Lambda, install Datadog) is written for a team of four. So we're skipping all of that.
The lie we tell ourselves about "managed" databases
Almost every managed Postgres provider has a "backups" tab. Render does. Supabase does. Railway does. Neon does. AWS RDS does. People glance at this tab once, see the word "automatic," and assume the problem is solved.
It is not solved, for three reasons.
First, the default retention window on free and hobby tiers is much shorter than founders think. Render's hobby Postgres retains daily snapshots, but only for a few days, and the snapshots stop the moment you exceed plan limits. Supabase's free tier retains a single day. Neon has branching, but its free-plan point-in-time restore window only reaches back 24 hours. None of this is hidden, but I have not yet met an indie founder who has read the fine print on the plan they signed up for in 2024.
Second, "managed" backups are usually stored on the same vendor as your live database. If your account is suspended (billing failure, terms-of-service trigger, a misunderstanding on a 2am support ticket), your backups vanish with the rest of the instance. I have watched this happen twice in two years. Both founders had paying customers. Both lost data they would have paid me five figures to recover. There was nothing to recover.
Third, the restore path is almost never tested. A snapshot you cannot restore from in under fifteen minutes is not a backup. It is hope.
Working backups for a solo founder satisfy three properties: they run automatically, they live somewhere your primary vendor cannot touch, and you have personally restored from them at least once.
What a real backup strategy looks like for a 1-person SaaS
You want three things in place. None of them require a DevOps hire.
A nightly logical dump of your production database, pushed to an external store. "Logical" means pg_dump, not a filesystem snapshot. Logical dumps are slower than snapshots, but they are portable: you can restore them onto any Postgres of a compatible major version, on any provider, from any laptop. For a SaaS with a database under 10 GB (which is most indie SaaS in their first two years), this is the right primitive.
A retention policy of at least thirty days, daily. For most products, the urgent question is not "what was the data five minutes ago" (that's what your live database is for). The question is "what did the data look like before the migration I shipped on Tuesday that quietly nuked the users.timezone column." Thirty daily snapshots cover nearly every realistic incident I've seen in fifteen years. If you need point-in-time recovery within seconds, you are past the threshold of this article and you should pay an ops person.
A restore drill, run once, written down. The single most useful operational habit I've ever developed is restoring a backup to a scratch database, running a quick query against it, and timing how long the whole thing took. The first time I did this on a project at koodos labs back in NYC, the restore "worked" but the encoding settings on the receiving instance differed enough that a couple of emoji-heavy columns came back mangled. Better to find that out on a Sunday afternoon than during an incident.
The 4-line cron job that gets you 80% there
Here is the smallest setup that satisfies all three properties. Drop it on any host with cron and pg_dump available, plus credentials to write to one external object store (Backblaze B2, Cloudflare R2, Wasabi, or AWS S3 if you must).
#!/usr/bin/env bash
set -euo pipefail
TS=$(date -u +%Y%m%d-%H%M%S)
FILE="/tmp/backup-${TS}.sql.gz"
# Plain-format logical dump, compressed on the fly
pg_dump "$DATABASE_URL" --format=plain --no-owner --no-privileges \
  | gzip -9 > "$FILE"
# Ship it off-vendor; "b2:my-bucket" is whatever remote you named in your rclone config
rclone copyto "$FILE" "b2:my-bucket/postgres/${TS}.sql.gz"
rm "$FILE"
Four substantive lines. Put it in /etc/cron.daily (or your platform's scheduled-jobs feature), set DATABASE_URL and the rclone config in env vars, and you have a daily off-vendor backup. rclone is one binary, no dependencies, and it talks to virtually every cloud storage provider with the same syntax.
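If you want a pinned hour instead of whenever cron.daily happens to fire, a one-line /etc/cron.d entry does the same job. The script path here is an assumption; point it at wherever you saved yours.
# /etc/cron.d/pg-backup -- runs daily at 03:17 server time
17 3 * * * root /usr/local/bin/pg-backup.sh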
For retention, give the bucket a lifecycle rule: keep objects for 30 days, then delete. Cloudflare R2 and Backblaze B2 both have these in their UI under "Bucket settings." You don't need to write code for the rotation, just configure it once.
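If your store of choice lacks lifecycle rules, or you'd rather keep everything in one script, the rotation is one more rclone line appended to the nightly job. The bucket path matches the script above; --min-age is a standard rclone filter flag.
# prune dumps older than 30 days (belt-and-suspenders if the bucket has no lifecycle rule)
rclone delete --min-age 30d b2:my-bucket/postgres/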
What this setup does not do: it does not protect you against a backup that is silently broken (a dump that says "succeeded" but is missing tables because of a permission issue, or a gzip that truncated because the disk filled up). The simplest defense is a size sanity check. If today's compressed dump is dramatically smaller than yesterday's, something is wrong, and you want a Telegram message before you find out the hard way.
Here is the version I actually use, which adds that check.
#!/usr/bin/env bash
set -euo pipefail
TS=$(date -u +%Y%m%d-%H%M%S)
FILE="/tmp/backup-${TS}.sql.gz"
LAST_SIZE_FILE="/var/lib/backups/last-size"
mkdir -p "$(dirname "$LAST_SIZE_FILE")"
pg_dump "$DATABASE_URL" --format=plain --no-owner --no-privileges \
  | gzip -9 > "$FILE"
# Compare today's compressed size to yesterday's; the first run falls back to today's size
SIZE=$(stat -c%s "$FILE")
LAST=$(cat "$LAST_SIZE_FILE" 2>/dev/null || echo "$SIZE")
RATIO=$(awk -v a="$SIZE" -v b="$LAST" 'BEGIN{print a/b}')
# Alert if the dump shrank more than 20% day over day
if awk "BEGIN {exit !($RATIO < 0.8)}"; then
  curl -s "https://api.telegram.org/bot${TG_TOKEN}/sendMessage" \
    -d chat_id="${TG_CHAT}" \
    -d text="Backup shrank to ${RATIO}x of yesterday. Investigate."
fi
echo "$SIZE" > "$LAST_SIZE_FILE"
rclone copyto "$FILE" "b2:my-bucket/postgres/${TS}.sql.gz"
rm "$FILE"
If you don't have Telegram alerting wired up yet, see article #4 in this series. The Telegram piece takes five minutes and is the thing I'd put on every project before I'd install Sentry.
A note for the Postgres pedants: --format=plain is intentional. Custom format (-Fc) is faster and smaller, and pg_restore is more flexible against it, but plain SQL is human-readable. I have personally done a partial restore by opening a backup in vim and copying out the rows I needed. You will not regret choosing plain text the first time you need it.
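To make "human-readable" concrete: in a plain dump, each table's data sits in a COPY block that ends with a line containing only \. — so you can carve a single table out with standard tools. A sketch, assuming a users table in the public schema and a dump file named backup.sql.gz:
# extract just the users table's data block from a plain-format dump
gunzip -c backup.sql.gz | sed -n '/^COPY public\.users /,/^\\\.$/p' > users-rows.txt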
Restoring is the test you actually need to run
This is the step almost everyone skips. It is the step that turns "I have backups" into "I have working backups." They are not the same thing.
Once a quarter, do this:
# 1. Pull yesterday's backup down
rclone copyto b2:my-bucket/postgres/<yesterday>.sql.gz /tmp/restore.sql.gz
# 2. Spin up a scratch Postgres locally (give it a few seconds to accept connections)
docker run -d --name scratch-pg \
  -e POSTGRES_PASSWORD=test \
  -p 5433:5432 \
  postgres:16
# 3. Restore, and time it
time gunzip -c /tmp/restore.sql.gz | PGPASSWORD=test psql -h localhost -p 5433 -U postgres -d postgres
# 4. Sanity-check a row count
PGPASSWORD=test psql -h localhost -p 5433 -U postgres -d postgres \
  -c "select count(*) from users;"
Time it. Write the elapsed minutes down. The number you want in your head is your recovery time: how long from "database is gone" to "product is back up." For most indie SaaS the answer should be under an hour, and most of that should be data transfer. If it takes you four hours to figure out how to restore, your backups are doing less for you than you think.
The first time you run this drill, you will hit one of these problems. The dump is missing a schema you didn't know about (your ORM or an extension may have created tables outside public, and pg_dump only grabs what its role can see). The Postgres major versions are incompatible because you upgraded the live instance and forgot to upgrade the dump tooling. The gunzip produces a corrupted file because last night's upload timed out and you only stored the truncated piece. The role definitions clash because you used --no-owner but a function depends on a specific role.
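The version mismatch in particular is a sixty-second pre-flight check. Assuming DATABASE_URL is set as in the scripts above:
# client tooling should match the server's major version, or be newer
pg_dump --version
psql "$DATABASE_URL" -tAc "show server_version;"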
Every one of those is easier to debug on a Sunday afternoon than at 3am during an incident.
The three mistakes I see every single week
The first is using only the vendor's built-in backups. We covered this above. If your provider suspends your account, your billing card expires, or the region has a bad day, the backups go with the database. Off-vendor storage is not optional.
The second is backing up only the database. Indie SaaS often has uploaded user files (avatars, generated PDFs, CSV exports, AI-generated images) sitting on the same disk as the app, or in the vendor's local volume. If you are already using S3-compatible object storage for uploads, you are fine: those buckets have their own durability and you can mirror them with rclone sync on the same schedule as your DB. If you are storing uploads on the dyno's local disk, you have unbacked-up state, and the day the dyno is recycled you discover this. Move that to object storage first.
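Mirroring an uploads bucket is one more line on the same nightly schedule. A sketch, assuming your app writes uploads to a remote you've configured as s3:app-uploads and your backup remote is the b2: one from earlier:
# one-way mirror of user uploads into the off-vendor backup store
rclone sync s3:app-uploads b2:my-bucket/uploads-mirror/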
The third is the one I want to spend a paragraph on, because it bites people who think they did everything right. If your app stores PII encrypted at the column level (which it should, especially under GDPR), the database dump is useless without the encryption key. The key lives in an env var or a secrets manager. Back that up too. Store it separately, in a password manager or a dedicated secrets vault, and write down the recovery procedure. I lost an afternoon to this exact configuration drift on a client project a few years ago. The database came back fine and we still couldn't read half the columns.
What I built
I run HostingGuru because I spent fifteen years (Oney, BeReal, Ringover, koodos labs, agency contracts) writing variations of the cron above for every new project. At some point I got tired of rebuilding the same scaffolding and shipped a managed PaaS where daily off-vendor Postgres backups, encrypted env vars, and AI-driven Telegram alerts (including the "your backup shrank" pattern from above) are part of the default experience. EU and US data centers, GDPR, ISO 27001, the routine. The free tier doesn't sleep. That's the only mention you'll get. Everything in this article works on Render, Railway, Fly.io, Supabase, a raw VPS, or your own Kubernetes cluster.
What to do tonight regardless of which platform you use
Five steps, in order. Block off an hour.
Step one. Pull your live database with pg_dump from your laptop right now. Time it. If you can do it at all, you have a baseline. If you cannot (credentials are wrong, network rules block you, you've forgotten the password to the live DB role), fix that first. You need this skill to exist before you automate anything.
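Concretely, something like this, assuming DATABASE_URL holds the production connection string and your laptop can reach the database:
# timed baseline dump from your laptop; keep the file, it's your first off-vendor copy
time pg_dump "$DATABASE_URL" --format=plain --no-owner --no-privileges | gzip -9 > /tmp/baseline.sql.gz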
Step two. Create a bucket on Backblaze B2 or Cloudflare R2. Both have free tiers that cover a 10 GB SaaS for years. Generate an access key, store it in your password manager, and verify you can rclone copyto a test file into it.
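The verification is a quick round trip, with the bucket and remote names standing in for yours:
echo "hello" > /tmp/ping.txt
rclone copyto /tmp/ping.txt b2:my-bucket/ping.txt
rclone ls b2:my-bucket   # ping.txt should appear in the listing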
Step three. Wire up the cron script from this article. Put it on whatever scheduler your platform exposes (Render scheduled jobs, Fly cron, a GitHub Actions schedule trigger, your PaaS's on-demand script feature, or a /etc/cron.daily entry on a VPS). Run it once manually. Confirm the file lands in the bucket.
Step four. Set a lifecycle rule on the bucket: retain for 30 days, delete after. This step takes 90 seconds in the B2 or R2 UI. Without it, you'll pay for storage forever and the bucket will become a haystack.
Step five. Block off ninety minutes next weekend for the restore drill. Restore yesterday's dump to a scratch Postgres, run a row-count query, write down the elapsed time. That number is your recovery time objective. Now you can answer the customer who asks "what happens if you lose my data" without lying.
If you only do step one tonight, you've already moved the needle. Most founders haven't.
One question
I'd be curious from anyone reading: have you ever actually restored from a backup in production, not as a drill but because something went wrong? What broke in the restore that you didn't expect? The interesting failure modes are not in the docs, and I'd love to read them in the comments.
Previous posts in this series:
- Heroku just went into "sustaining engineering mode." Here are 5 alternatives whose free tier actually doesn't sleep.
- I built my MVP with Claude Code. Now I need to deploy it. Here's what nobody tells you.
- Your AI app is silently burning $2,000/month and you don't know it. Here are the 5 patterns that bite founders.
- Telegram alerts for any production app: a 5-minute setup (no SaaS, no signup, just curl)
- How I built a Discord 'ship-tracker' bot in a weekend (and the 3-process architecture that keeps it alive 24/7)
- I migrated 12 client projects off Heroku. Here's the playbook (and the 7 things that bit me every single time).
- The Claude Code → production checklist: 15 things that aren't obvious until they bite you