# Engineering and security approaches used in the open-source PostgreSQL backup tool Databasus
A backup tool is a high-value target. It holds database credentials, it holds full restorable copies of production data, and it usually holds the encryption keys that protect the rest. If any of those slip, the blast radius is the entire database. So the engineering bar for a tool like this is not the same as for an internal admin panel that nobody outside the team will ever talk to.
Databasus is an open-source backup tool for PostgreSQL. The project has crossed 500,000 Docker pulls, around 7,000 GitHub stars and roughly 30 contributors at the time of writing, and the security pipeline below is what supports that scale. None of it is exotic. What's worth showing is how the pieces fit together, because for sensitive software no single check is enough on its own.
## Why one security check is never enough
Every scanner has blind spots. CodeQL catches a class of bugs that secret scanners ignore, and secret scanners catch leaks that semantic analysis would never look for. Container scanners look at compiled layers, dependency scanners look at the supply chain and unit tests look at logic. The layers overlap on purpose so that a false negative in one pass gets caught by the next.
That's the whole argument: defence in depth means you assume each tool will miss something, and you stack tools whose blind spots don't overlap.
## Static analysis on every pull request
Static analysis is the cheapest place to catch a security bug. It runs before any human reviewer reads the diff, it runs on every PR and it doesn't get tired on the fiftieth review of the week. Databasus runs several independent passes so that a miss in one engine has a good chance of getting caught by another. They look at different things, which is the whole point.
- CodeQL — full security-extended query suite over Go, JavaScript / TypeScript and GitHub Actions code. Runs on every PR and on a weekly schedule.
- CodeRabbit — per-PR review pass that flags logic bugs, suspect patterns and style drift before a human reviewer opens the diff.
- gitleaks — secret scanning over the diff. Catches credentials and tokens accidentally committed.
- semgrep — custom security rules over the diff. Cheap to extend when a new pattern needs to be banned project-wide.
- Codex Security from OpenAI — a separate program that runs deeper, periodic audits over the whole codebase. It catches architectural and cross-cutting issues that narrow per-PR scans tend to miss.
Per-PR scans and periodic deep audits answer different questions, so both have to exist.
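The "cheap to extend" claim about semgrep is worth making concrete. A custom rule is a short YAML file that bans a pattern project-wide; the rule below is a hypothetical example of the kind of thing such a rule can express (the rule id, message and pattern are illustrative, not taken from the Databasus repository):

```yaml
# Hypothetical semgrep rule: forbid building shell commands from a
# variable, a common injection foothold in Go code.
rules:
  - id: no-shell-command-from-variable
    languages: [go]
    severity: ERROR
    message: Do not pass a dynamically built string to "sh -c".
    patterns:
      - pattern: exec.Command("sh", "-c", $CMD)
```

Once a rule like this is merged, every future PR is checked against it automatically, which is what makes the pattern ban durable rather than a one-time review comment.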
## Dependency hygiene with a deliberate cooldown
The dependency surface is its own threat model. Many of the supply-chain compromises of the last two years did not go through the victim project's own code: a transitive dependency was hijacked, and the malicious version shipped to production within hours of being published. Databasus treats this as a separate problem and uses Dependabot together with the Dependency Review Action, plus a deliberate cooldown so that newly published versions have time to be inspected by the wider community before the project adopts them.
| Ecosystem | Tracked by | Cooldown |
|---|---|---|
| Go modules (backend, agent) | Dependabot | 3 days patch / 7 days minor / 30 days major |
| npm (frontend) | Dependabot | 3 days patch / 7 days minor / 30 days major |
| Docker base images | Dependabot | 7 days patch / 7 days minor / 30 days major |
| GitHub Actions | Dependabot | 7 days minor / 30 days major |
On top of the cooldown, the Dependency Review Action blocks any pull request that introduces a HIGH or CRITICAL CVE before it can be merged. So a fresh CVE in a transitive dependency doesn't have a path into the codebase even if a contributor opens a PR that pulls it in by accident.
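For readers who want to reproduce the cooldown, here is a sketch of how the Go-modules row of the table can be expressed in `.github/dependabot.yml`, assuming Dependabot's `cooldown` configuration option; the values mirror the table above but the file itself is illustrative, not copied from the repository:

```yaml
# .github/dependabot.yml (illustrative fragment)
version: 2
updates:
  - package-ecosystem: "gomod"
    directory: "/"
    schedule:
      interval: "daily"
    cooldown:
      semver-patch-days: 3    # patch releases wait 3 days
      semver-minor-days: 7    # minor releases wait 7 days
      semver-major-days: 30   # major releases wait 30 days
```

The same block is repeated per ecosystem (`npm`, `docker`, `github-actions`) with the cooldown values from the corresponding table row.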
## Containers and the CI supply chain
The container image and the CI workflows are part of the attack surface even though they're not application code. A poisoned base image, a misconfigured Dockerfile or a malicious GitHub Action all give an attacker code execution inside the build, with the same access as the build itself. The Databasus pipeline locks these down with the same seriousness as the application code.
- Trivy scans the built container image on every build for vulnerable layers and known CVEs in installed packages.
- A separate Trivy pass scans the Dockerfile itself for misconfigurations before the image is even built.
- The `.trivyignore` file is explicit and documented. DS-0002 (the "container runs as root" rule) is suppressed because the entrypoint legitimately starts as root to handle PUID / PGID remapping and volume chown for NAS deployments, then drops to the unprivileged `databasus` user via `gosu` before running the app.
- All third-party GitHub Actions are pinned to full commit SHAs with a `# vX.Y.Z` tag comment. For example: `actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1`. Floating tags like `@v4` or `@main` are forbidden.
- Workflows default to top-level `permissions: contents: read`. Any job that needs more is elevated explicitly and only for that job.
2025 saw several successful attacks against floating Action tags, which is why pinning is a hard requirement in the project rather than a style preference.
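Put together, the pinning and least-privilege rules look like this in a workflow file. This is an illustrative skeleton (the workflow, job and step names are hypothetical), not the project's actual CI:

```yaml
# Illustrative workflow skeleton showing the two hardening rules:
# pinned action SHAs and a read-only default token.
name: ci
on: pull_request

permissions:
  contents: read   # least-privilege default inherited by every job

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      # Pinned to a full commit SHA; the trailing comment records the
      # human-readable tag for reviewers.
      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
```

A job that genuinely needs write access (for example, to publish an image) declares its own `permissions:` block, so the elevation is visible in the diff and scoped to that job alone.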
## Tests that prove a backup can actually be restored
For a backup tool, "the backup completed successfully" and "the backup is actually restorable" are two different statements. Plenty of historical incidents come down to teams that watched green checkmarks for years and then found out the dumps were unreadable on the day they needed them. So Databasus tests the recovery path directly. The `e2e-agent-backup-restore` job runs a full backup-then-restore cycle on every pull request, against real PostgreSQL containers, on a matrix of every supported major version: 15, 16, 17 and 18.
The same approach covers MySQL, MariaDB and MongoDB on their own matrices, against real engine containers rather than mocks. A release ships only if every supported engine version on the matrix can restore a backup cleanly. If a refactor breaks the restore path for PostgreSQL 15 only, the release is blocked even though the other three versions still work.
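The version matrix is the part worth copying. A sketch of what such a job can look like in GitHub Actions is below; the job name matches the one mentioned above, but the service configuration and helper script are illustrative assumptions, not the project's real workflow:

```yaml
# Sketch of a version-matrix restore test against real engine containers.
jobs:
  e2e-agent-backup-restore:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # report every broken version, not just the first
      matrix:
        pg: [15, 16, 17, 18]    # all supported PostgreSQL majors
    services:
      postgres:
        image: postgres:${{ matrix.pg }}
        env:
          POSTGRES_PASSWORD: test
        ports:
          - 5432:5432
    steps:
      # Hypothetical helper: seed data, back up, wipe, restore, diff.
      - run: ./scripts/backup-then-restore.sh
```

Because the matrix runs every version on every PR, a refactor that breaks restore for a single engine version fails that one matrix cell and blocks the release.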
## Runtime hardening in the application itself
CI catches what's wrong with the code before it ships. Runtime hardening is what the code does once it's running, when there's actual data on the line. Databasus encrypts backup contents with AES-256-GCM at rest, which means a stolen backup blob is useless without the key. This is what makes it safe to push encrypted backups to S3, Google Drive or any other shared storage without trusting the storage provider with anything readable.
Secrets follow the same rule. Database passwords and storage credentials are encrypted in the project's own database and are never logged. Redaction happens in the logger layer, not at call sites, because call sites forget and the logger doesn't. The default database user for backup work is read-only, so even a compromised Databasus instance has a hard time mutating the source database. And the encrypted blobs on storage can be decrypted and restored without Databasus itself if you keep the secret key, which means there's no vendor lock-in even to an open-source tool.
## The full picture in one place
The previous sections describe each layer in isolation. The table below puts them next to each other so it's easier to see what each defence is responsible for and when it runs. Reading it in one go makes the overlap visible, and the overlap is what does the actual work.
| Defence | What it catches | When it runs |
|---|---|---|
| CodeQL | Code-level security issues in Go, JS / TS, Actions | Every PR plus weekly schedule |
| CodeRabbit | Review-time issues, style, logic bugs | Every PR |
| gitleaks | Leaked credentials in diffs | Every PR (via CodeRabbit) |
| semgrep | Custom security rule violations | Every PR (via CodeRabbit) |
| Codex Security | Cross-cutting, architectural issues | Periodic deep audits |
| Dependabot | New CVEs in dependencies | On advisory publication (with cooldown) |
| Dependency Review Action | HIGH / CRITICAL CVEs introduced in a PR | Every PR |
| Trivy (image) | Vulnerable layers in the built image | Every image build |
| Trivy (Dockerfile) | Dockerfile misconfigurations | Every PR touching the Dockerfile |
| Backup-restore e2e | Backups that can't actually be restored | Every PR, all supported engine versions |
| Lint, type-check and tests | Regressions, type errors, style drift | Every PR |
No single row would be enough on its own. The overlap between rows is what makes a missed bug recoverable.
## Vulnerability disclosure
Even with all of the above, something will eventually slip. So the disclosure path matters as much as the prevention path. Databasus publishes a `SECURITY.md` file naming GitHub Security Advisories as the primary channel, with an acknowledgement window of 48 to 72 hours and a severity-dependent fix timeline. Security reports sit at the top of the work queue and pre-empt feature work.
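For a project that doesn't yet have one, the minimal shape of such a policy is short. The wording below is an illustrative sketch, not the project's actual file:

```markdown
# Security Policy

Please report vulnerabilities privately via GitHub Security Advisories
("Report a vulnerability" on the repository's Security tab), not in
public issues.

- Acknowledgement: within 48-72 hours
- Fix timeline: depends on severity; security reports pre-empt feature work
```

The important properties are a private channel, a stated acknowledgement window and an explicit priority commitment — everything else is detail.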
## What to take from this if you build something sensitive
The point of writing all this down is not to advertise a feature list. It's that the same playbook applies to any project that handles credentials, secrets or production data, regardless of language or stack. None of these techniques are specific to PostgreSQL or to Go. The takeaways below are the ones that would help most.
- Layer overlapping scanners so that one false negative doesn't reach production. CodeQL plus secret scanning plus dependency scanning plus container scanning is the cheapest layered setup available today.
- Treat the dependency surface as a separate threat model. Add a cooldown on new versions, and block HIGH or CRITICAL CVEs at PR time so that they cannot reach the main branch by mistake.
- Pin every third-party GitHub Action to a full commit SHA with a comment showing the human-readable version. Floating tags were exploited at scale in 2025.
- For anything stateful, test the recovery path against real engines, not mocked ones. Mocks confirm your assumptions about the engine, which is exactly what fails in a real incident.
- Redact at the logger, not at call sites. Call sites get refactored and someone always forgets the one place. The logger is one piece of code that gets updated once.
A backup tool is the kind of software where the engineering bar shows up directly in user trust. That's the same engineering bar that makes Databasus a credible choice for PostgreSQL backup at the scale it operates today.
