# Engineering and security approaches used in the open-source PostgreSQL backup tool Databasus
A backup tool is a high-value target. It holds database credentials, it holds full restorable copies of production data, and it usually holds the encryption keys that protect the rest. If any of those slip, the blast radius is the entire database. So the engineering bar for a tool like this is not the same as for an internal admin panel that nobody outside the team will ever talk to.
Databasus is an open-source backup tool for PostgreSQL. The project has crossed 500,000 Docker pulls, around 7,000 GitHub stars and roughly 30 contributors at the time of writing, and the security pipeline below is what supports that scale. None of it is exotic. What's worth showing is how the pieces fit together, because for sensitive software no single check is enough on its own.
## Why one security check is never enough
Every scanner has blind spots. CodeQL catches a class of bugs that secret scanners ignore, and secret scanners catch leaks that semantic analysis would never look for. Container scanners look at compiled layers, dependency scanners look at the supply chain and unit tests look at logic. The layers overlap on purpose so that a false negative in one pass gets caught by the next.
That's the whole argument: defence in depth means you assume each tool will miss something, and you stack tools whose blind spots don't overlap.
## Static analysis on every pull request
Static analysis is the cheapest place to catch a security bug. It runs before any human reviewer reads the diff, it runs on every PR and it doesn't get tired on the fiftieth review of the week. Databasus runs several independent passes so that a miss in one engine has a good chance of getting caught by another. They look at different things, which is the whole point.
- CodeQL — full security-extended query suite over Go, JavaScript / TypeScript and GitHub Actions code. Runs on every PR and on a weekly schedule.
- CodeRabbit — per-PR review pass that flags logic bugs, suspect patterns and style drift before a human reviewer opens the diff.
- gitleaks — secret scanning over the diff. Catches credentials and tokens accidentally committed.
- semgrep — custom security rules over the diff. Cheap to extend when a new pattern needs to be banned project-wide.
- Codex Security from OpenAI — a separate program that runs deeper, periodic audits over the whole codebase. It catches architectural and cross-cutting issues that narrow per-PR scans tend to miss.
Per-PR scans and periodic deep audits answer different questions, so both have to exist.
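The "cheap to extend" claim about semgrep is worth making concrete. A custom rule is a short YAML file that bans a pattern project-wide; the rule below is a hypothetical example of the kind of thing such a rule can express (the rule id, message and pattern are illustrative, not taken from the Databasus repository):

```yaml
# Hypothetical semgrep rule: forbid building shell commands from a
# variable, a common injection foothold in Go code.
rules:
  - id: no-shell-command-from-variable
    languages: [go]
    severity: ERROR
    message: Do not pass a dynamically built string to "sh -c".
    patterns:
      - pattern: exec.Command("sh", "-c", $CMD)
```

Once a rule like this is merged, every future PR is checked against it automatically, which is what makes the pattern ban durable rather than a one-time review comment.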
## Dependency hygiene with a deliberate cooldown
The dependency surface is its own threat model. Many of the supply-chain compromises of the last two years did not go through the victim project's own code: a transitive dependency was hijacked, and the malicious version shipped to production within hours of being published. Databasus treats this as a separate problem and uses Dependabot together with the Dependency Review Action, plus a deliberate cooldown so that newly published versions have time to be inspected by the wider community before the project adopts them.
| Ecosystem | Tracked by | Cooldown |
|---|---|---|
| Go modules (backend, agent) | Dependabot | 3 days patch / 7 days minor / 30 days major |
| npm (frontend) | Dependabot | 3 days patch / 7 days minor / 30 days major |
| Docker base images | Dependabot | 7 days patch / 7 days minor / 30 days major |
| GitHub Actions | Dependabot | 7 days minor / 30 days major |
On top of the cooldown, the Dependency Review Action blocks any pull request that introduces a HIGH or CRITICAL CVE before it can be merged. So a fresh CVE in a transitive dependency doesn't have a path into the codebase even if a contributor opens a PR that pulls it in by accident.
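For readers who want to reproduce the cooldown, here is a sketch of how the Go-modules row of the table can be expressed in `.github/dependabot.yml`, assuming Dependabot's `cooldown` configuration option; the values mirror the table above but the file itself is illustrative, not copied from the repository:

```yaml
# .github/dependabot.yml (illustrative fragment)
version: 2
updates:
  - package-ecosystem: "gomod"
    directory: "/"
    schedule:
      interval: "daily"
    cooldown:
      semver-patch-days: 3    # patch releases wait 3 days
      semver-minor-days: 7    # minor releases wait 7 days
      semver-major-days: 30   # major releases wait 30 days
```

The same block is repeated per ecosystem (`npm`, `docker`, `github-actions`) with the cooldown values from the corresponding table row.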
## Containers and the CI supply chain
The container image and the CI workflows are part of the attack surface even though they're not application code. A poisoned base image, a misconfigured Dockerfile or a malicious GitHub Action all give an attacker code execution inside the build, with the same access as the build itself. The Databasus pipeline locks these down with the same seriousness as the application code.
- Trivy scans the built container image on every build for vulnerable layers and known CVEs in installed packages.
- A separate Trivy pass scans the Dockerfile itself for misconfigurations before the image is even built.
- The `.trivyignore` file is explicit and documented. DS-0002 (the "container runs as root" rule) is suppressed because the entrypoint legitimately starts as root to handle PUID / PGID remapping and volume chown for NAS deployments, then drops to the unprivileged `databasus` user via `gosu` before running the app.
- All third-party GitHub Actions are pinned to full commit SHAs with a `# vX.Y.Z` tag comment. For example: `actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1`. Floating tags like `@v4` or `@main` are forbidden.
- Workflows default to top-level `permissions: contents: read`. Any job that needs more is elevated explicitly and only for that job.
2025 saw several successful attacks against floating Action tags, which is why pinning is a hard requirement in the project rather than a style preference.
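Put together, the pinning and least-privilege rules look like this in a workflow file. This is an illustrative skeleton (the workflow, job and step names are hypothetical), not the project's actual CI:

```yaml
# Illustrative workflow skeleton showing the two hardening rules:
# pinned action SHAs and a read-only default token.
name: ci
on: pull_request

permissions:
  contents: read   # least-privilege default inherited by every job

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      # Pinned to a full commit SHA; the trailing comment records the
      # human-readable tag for reviewers.
      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
```

A job that genuinely needs write access (for example, to publish an image) declares its own `permissions:` block, so the elevation is visible in the diff and scoped to that job alone.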
## Tests that prove a backup can actually be restored
For a backup tool, "the backup completed successfully" and "the backup is actually restorable" are two different statements. Plenty of historical incidents come down to teams that watched green checkmarks for years and then found out the dumps were unreadable on the day they needed them. So Databasus tests the recovery path directly. The `e2e-agent-backup-restore` job runs a full backup-then-restore cycle on every pull request, against real PostgreSQL containers, on a matrix of every supported major version: 15, 16, 17 and 18.
The same approach covers MySQL, MariaDB and MongoDB on their own matrices, against real engine containers rather than mocks. A release ships only if every supported engine version on the matrix can restore a backup cleanly. If a refactor breaks the restore path for PostgreSQL 15 only, the release is blocked even though the other three versions still work.
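The version matrix is the part worth copying. A sketch of what such a job can look like in GitHub Actions is below; the job name matches the one mentioned above, but the service configuration and helper script are illustrative assumptions, not the project's real workflow:

```yaml
# Sketch of a version-matrix restore test against real engine containers.
jobs:
  e2e-agent-backup-restore:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # report every broken version, not just the first
      matrix:
        pg: [15, 16, 17, 18]    # all supported PostgreSQL majors
    services:
      postgres:
        image: postgres:${{ matrix.pg }}
        env:
          POSTGRES_PASSWORD: test
        ports:
          - 5432:5432
    steps:
      # Hypothetical helper: seed data, back up, wipe, restore, diff.
      - run: ./scripts/backup-then-restore.sh
```

Because the matrix runs every version on every PR, a refactor that breaks restore for a single engine version fails that one matrix cell and blocks the release.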
## Runtime hardening in the application itself
CI catches what's wrong with the code before it ships. Runtime hardening is what the code does once it's running, when there's actual data on the line. Databasus encrypts backup contents with AES-256-GCM at rest, which means a stolen backup blob is useless without the key. This is what makes it safe to push encrypted backups to S3, Google Drive or any other shared storage without trusting the storage provider with anything readable.
Secrets follow the same rule. Database passwords and storage credentials are encrypted in the project's own database and are never logged. Redaction happens in the logger layer, not at call sites, because call sites forget and the logger doesn't. The default database user for backup work is read-only, so even a compromised Databasus instance has a hard time mutating the source database. And the encrypted blobs on storage can be decrypted and restored without Databasus itself if you keep the secret key, which means there's no vendor lock-in even to an open-source tool.
## The full picture in one place
The previous sections describe each layer in isolation. The table below puts them next to each other so it's easier to see what each defence is responsible for and when it runs. Reading it in one go makes the overlap visible, and the overlap is what does the actual work.
| Defence | What it catches | When it runs |
|---|---|---|
| CodeQL | Code-level security issues in Go, JS / TS, Actions | Every PR plus weekly schedule |
| CodeRabbit | Review-time issues, style, logic bugs | Every PR |
| gitleaks | Leaked credentials in diffs | Every PR (via CodeRabbit) |
| semgrep | Custom security rule violations | Every PR (via CodeRabbit) |
| Codex Security | Cross-cutting, architectural issues | Periodic deep audits |
| Dependabot | New CVEs in dependencies | On advisory publication (with cooldown) |
| Dependency Review Action | HIGH / CRITICAL CVEs introduced in a PR | Every PR |
| Trivy (image) | Vulnerable layers in the built image | Every image build |
| Trivy (Dockerfile) | Dockerfile misconfigurations | Every PR touching the Dockerfile |
| Backup-restore e2e | Backups that can't actually be restored | Every PR, all supported engine versions |
| Lint, type-check and tests | Regressions, type errors, style drift | Every PR |
No single row would be enough on its own. The overlap between rows is what makes a missed bug recoverable.
## Vulnerability disclosure
Even with all of the above, something will eventually slip. So the disclosure path matters as much as the prevention path. Databasus publishes a `SECURITY.md` file naming GitHub Security Advisories as the primary channel, with an acknowledgement window of 48 to 72 hours and a severity-dependent fix timeline. Security reports sit at the top of the work queue and pre-empt feature work.
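For a project that doesn't yet have one, the minimal shape of such a policy is short. The wording below is an illustrative sketch, not the project's actual file:

```markdown
# Security Policy

Please report vulnerabilities privately via GitHub Security Advisories
("Report a vulnerability" on the repository's Security tab), not in
public issues.

- Acknowledgement: within 48-72 hours
- Fix timeline: depends on severity; security reports pre-empt feature work
```

The important properties are a private channel, a stated acknowledgement window and an explicit priority commitment — everything else is detail.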
## What to take from this if you build something sensitive
The point of writing all this down is not to advertise a feature list. It's that the same playbook applies to any project that handles credentials, secrets or production data, regardless of language or stack. None of these techniques are specific to PostgreSQL or to Go. The takeaways below are the ones that would help most.
- Layer overlapping scanners so that one false negative doesn't reach production. CodeQL plus secret scanning plus dependency scanning plus container scanning is the cheapest layered setup available today.
- Treat the dependency surface as a separate threat model. Add a cooldown on new versions, and block HIGH or CRITICAL CVEs at PR time so that they cannot reach the main branch by mistake.
- Pin every third-party GitHub Action to a full commit SHA with a comment showing the human-readable version. Floating tags were exploited at scale in 2025.
- For anything stateful, test the recovery path against real engines, not mocked ones. Mocks confirm your assumptions about the engine, which is exactly what fails in a real incident.
- Redact at the logger, not at call sites. Call sites get refactored and someone always forgets the one place. The logger is one piece of code that gets updated once.
A backup tool is the kind of software where the engineering bar shows up directly in user trust. That's the same engineering bar that makes Databasus a credible choice for PostgreSQL backup at the scale it operates today.
