If you are still SSH-ing into four boxes to confirm last night's dump actually
ran, automated database backups across servers are exactly the kind of chore
that should be running on autopilot — not eating your evenings. In this guide we
wire up postgres backup automation for a small fleet (one primary, two
streaming replicas, an offsite backup box) and drive the whole thing from a
single AI interface using the MCP server
remote-agents. No agent
controller, no central scheduler to babysit — every host runs its own cron and
reports back per-host.
In short: every database host runs a lightweight agent connected to an
encrypted relay room. Your AI assistant (Claude or opencode) sees the whole
fleet as one machine and calls tools likeschedule_add,fleet_exec,
file_stat,fleet_git,send_file, andset_mode. Cron jobs live on the
host, so a nightlypg_dumpkeeps running even if the relay link drops.
Payloads are end-to-end encrypted (AES-GCM-256) — the relay forwards
ciphertext blind.
Table of Contents
- Architecture: four hosts, one room
- Installing agents and tagging db hosts
- Quick summary of the seven recipes
- Why and when to use this
- Recipe 1 — Tag db hosts and look around in plan mode
- Recipe 2 — Schedule a nightly pg_dump with retention
- Recipe 3 — Verify backups are fresh across the fleet
- Recipe 4 — Run a schema migration across replicas safely
- Recipe 5 — Replication-lag and health checks
- Recipe 6 — Read a prod .env read-only
- Recipe 7 — Ship a dump host→host, SHA-256 verified
- FAQ
- Wrapping up
Architecture: four hosts, one room
Let's take a realistic small-SaaS database tier: a Postgres primary, two
streaming replicas for read scaling and failover, and an offsite box that only
stores compressed dumps. We'll also nod to a Windows SQL Server host at the end
to show the cross-platform story.
| Host | Role | Tag | Stack |
|---|---|---|---|
db-primary |
Postgres primary (writes) | db,primary |
PostgreSQL 16, Linux |
db-replica-1 |
Streaming replica | db,replica |
PostgreSQL 16, Linux |
db-replica-2 |
Streaming replica | db,replica |
PostgreSQL 16, Linux |
backup-box |
Offsite dump target | backup |
rsync, gzip, Linux |
sql-win |
SQL Server (optional) | db,windows |
SQL Server 2022, Windows |
All agents join one relay room (say dbfleet). Tags let you address a group
in a single call: target="db,replica" hits both replicas, target="os:linux"
hits every Linux box, and target="all" sweeps the whole fleet. Note that
multi-tag targets match a host carrying either tag, so db,primary resolves
to every database host.
┌────────────── AI (Claude / opencode) ──────────────┐
│ remote-agents (MCP, stdio) │
└───────────────────────┬────────────────────────────┘
│ wss:// (E2E-encrypted)
┌───────┴────────┐ relay (CF Worker or self-hosted)
│ room=dbfleet │
┌───────────┬───────┴────┬────────────┬─────────────┐
db-primary db-replica-1 db-replica-2 backup-box sql-win
(db,primary) (db,replica) (db,replica) (backup) (db,windows)
The relay is interchangeable: use the hosted Cloudflare Worker, or run your own
Rust relay with remote-agents-relay --bind 0.0.0.0:8080 and point agents at
ws://your-host:8080. Either way the relay only ever sees encrypted frames.
Installing agents and tagging db hosts
On each host you install the package once and start the agent with the right
tags. For 24/7 database servers you'll want the background service form so the
agent survives reboots:
# once on every machine
npm i -g remote-agents
# the Postgres primary
remote-agents run --relay wss://<relay> --room dbfleet --token <secret> \
--name db-primary --tags db,primary
# the two replicas
remote-agents run ... --name db-replica-1 --tags db,replica
remote-agents run ... --name db-replica-2 --tags db,replica
# the offsite backup target
remote-agents run ... --name backup-box --tags backup
# install as a systemd service for 24/7 hosts (instead of `run`)
remote-agents install --relay wss://<relay> --room dbfleet --token <secret> \
--name db-primary --tags db,primary
Then confirm the whole fleet is online. Ask your AI:
"List the agents in the room."
Under the hood that calls list_agents, which returns each peer's OS family,
distro, kernel, shell, tags, and an update_available flag when a newer agent
version is out. It's a flat peer network — there is no controller node; every
host both executes and dispatches.
Also read: Run a security audit across your whole fleet
Quick summary of the seven recipes
| # Recipe | What it does |
|---|---|
| 1. Plan-mode look | Tag db hosts, set them to read-only plan, and inspect Postgres safely before touching anything. |
| 2. Nightly pg_dump |
schedule_add a 0 3 * * * * host-local cron that dumps + gzips and prunes anything older than 14 days. |
| 3. Verify freshness |
fleet_exec + file_stat to confirm the dump exists, its size, and its mtime on every host at once. |
| 4. Replica migration |
fleet_git pull migrations + fleet_exec across db,replica with per-host results so a failed node is obvious. |
| 5. Lag / health check |
fleet_exec target="db,replica" running pg_stat_replication to catch lag and broken streaming. |
| 6. Read prod config |
read_file a production .env in plan mode — zero write risk. |
| 7. Ship a dump |
send_file a dump from primary to backup-box over a direct UDP channel, SHA-256 verified. |
Why and when to use this
Database fleets are where "I'll just SSH in real quick" goes to die. A single
forgotten retention prune fills a disk; a silent pg_dump failure isn't noticed
until the day you need the dump; a schema migration applied to the primary but
not the replicas causes mysterious read errors hours later. Here's where driving
the fleet through one AI interface pays off:
-
Cron that survives the network.
schedule_addinstalls the cron on the host itself, so your nightly0 3 * * * *dump runs even if the relay link is down, your laptop is closed, or the AI session has long ended. The schedule is not tied to your connection. -
Per-host truth in one call.
fleet_execandfile_statgive you the dump size and mtime on all four boxes in a single round trip instead of four SSH sessions. One failing host does not sink the batch — it just shows up red in the aggregated result. -
Replica maintenance without drift.
fleet_git+fleet_execagainst thedb,replicatag apply the same migration to both replicas and report each one separately, so a half-applied change is impossible to miss. -
Safe-by-default reads.
planmode makes a host read-only, so you can pull a prod.envor runpg_stat_replicationduring an incident with no chance of fat-fingering a write. -
Cross-platform reach.
target="os:linux"hits the Postgres boxes; a Windows SQL Server host joins the same room and answersos:windowstargeting with asqlcmdbackup instead ofpg_dump.
This is not a replacement for a declarative backup orchestrator or a managed
RDS-style service. It shines for self-hosted databases, small-to-mid fleets,
dev/staging tiers, and the operational glue around backups that nobody wants to
maintain in YAML.
Recipe 1 — Tag db hosts and look around in plan mode
Before automating anything, look first and touch nothing. Set every database
host to plan mode — read-only read_file, git_status, and safe exec
only — and get oriented.
- Flip the database hosts to read-only.
- Confirm Postgres is up and check current data directory size.
- Eyeball where existing backups (if any) land.
set_mode target="db,replica" mode=plan
set_mode agent_id=db-primary mode=plan
fleet_exec target="db,primary" command="systemctl is-active postgresql && psql -tAc 'SELECT version();'"
fleet_exec target="db,primary" command="du -sh /var/lib/postgresql/16/main; df -h /var"
list_dir agent_id=backup-box path=/srv/backups
Tip:
planmode still allows a read-onlyexec, sopsql -tAc 'SELECT ...'
anddu -shwork fine. The moment a command would write, the agent rejects it.
Start every incident here — you literally cannot break prod fromplan.
The aggregated reply comes back per host: you'll see db-primary reporting
active and a 240 GB data directory, while a replica that's catching up might
report differently. That per-host shape is the whole point — no guessing which
box you're looking at.
Recipe 2 — Schedule a nightly pg_dump with retention
This is the core of scheduled pg_dump automation. We install a host-local
cron on db-primary that dumps the database, gzips it, and prunes anything
older than 14 days. Because schedule_add writes the cron to the host, it
keeps firing at 03:00 every night whether or not anyone is connected.
Remember the cron is a 6-field spec — sec min hour day month dow — so
"3 AM nightly" is 0 0 3 * * *.
- Switch the primary to
editso the agent may create the dump file and the wrapper script. - Add the nightly dump-and-prune schedule.
- List schedules to confirm it registered.
# the command the cron runs on db-primary (one line, gzip + 14-day prune)
pg_dump -Fc -U postgres app_prod \
| gzip > /srv/backups/app_prod-$(date +\%F).sql.gz \
&& find /srv/backups -name 'app_prod-*.sql.gz' -mtime +14 -delete
set_mode agent_id=db-primary mode=edit
schedule_add agent_id=db-primary name=nightly-pgdump \
cron="0 0 3 * * *" \
command="pg_dump -Fc -U postgres app_prod | gzip > /srv/backups/app_prod-$(date +%F).sql.gz && find /srv/backups -name 'app_prod-*.sql.gz' -mtime +14 -delete"
schedule_list agent_id=db-primary
The -Fc custom format gives you a compressed, pg_restore-friendly archive; a
240 GB cluster typically lands around a 30–45 GB gzipped dump depending on how
much of it is indexes and TOAST. The -mtime +14 -delete clause keeps roughly
two weeks of dumps and nothing more, so the disk doesn't quietly fill.
Note: for a mysqldump cron the only thing that changes is the command —
mysqldump --single-transaction app_prod | gzip > ...— wrapped in the exact
sameschedule_add. And for the Windows SQL Server host you'd schedule a
sqlcmd -Q "BACKUP DATABASE ..."instead. The scheduler, retention, and
verification flow are identical across all of them.
To remove the schedule later (say you've migrated to WAL archiving):
schedule_remove agent_id=db-primary name=nightly-pgdump
Recipe 3 — Verify backups are fresh across the fleet
A backup you never check is a backup you don't have. This recipe answers the only
question that matters at 9 AM: did last night's dump actually run, and is it the
right size, on every host? We combine fleet_exec to find the newest dump with
file_stat to read its exact size and mtime.
- Find the newest dump file on each db host.
-
file_statthat file for an authoritative size and modification time. - Flag anything older than ~26 hours as stale.
fleet_exec target="db,primary" \
command="ls -t /srv/backups/app_prod-*.sql.gz 2>/dev/null | head -n1"
file_stat agent_id=db-primary path=/srv/backups/app_prod-2026-06-18.sql.gz
fleet_exec target="db,primary" \
command="find /srv/backups -name 'app_prod-*.sql.gz' -mmin -1560 -printf '%p %s bytes\n' || echo NO_FRESH_BACKUP"
file_stat returns the size and mtime directly, which is more trustworthy than
parsing ls output. The -mmin -1560 window (26 hours) gives the 03:00 job some
slack — anything that didn't write a fresh file in that window prints
NO_FRESH_BACKUP, and because results are aggregated per host, a replica
that skipped its dump stands out instantly while the healthy boxes report a clean
~42 GB file.
Important: size is your early-warning signal. A dump that suddenly drops
from 42 GB to 200 KB almost always meanspg_dumperrored out mid-run (bad
credentials, a dropped connection, a full disk) and gzipped only the error.
Watching size over time catches silent failures that a green exit code hides.
You can fold this into a once-a-day self-check by wrapping the find in another
schedule_add that appends to a log — same host-local cron pattern as Recipe 2.
Also read: Build a cross-platform CI test farm
Recipe 4 — Run a schema migration across replicas safely
Replicas in database fleet management drift the moment a migration lands on
one box but not another. With fleet_git to pull the migration repo and
fleet_exec to apply it across the db,replica tag, both replicas move in
lockstep and you get a per-host verdict.
On true streaming replicas you don't run DDL directly (they're read-only and
replay WAL from the primary). This recipe fits the common real-world setup where
"replica" hosts also run migration tooling against a logical target, or where you
roll out an out-of-band maintenance script. Adjust to your topology.
- Pull the latest migrations on both replicas at once.
- Apply them, capturing per-host success/failure.
- Re-check that both ended on the same schema version.
fleet_git target="db,replica" op=pull repo=/srv/migrations remote=origin branch=main
fleet_exec target="db,replica" \
command="cd /srv/migrations && ./run_migrations.sh 2>&1 | tail -n 5 || echo MIGRATION_FAILED"
fleet_exec target="db,replica" \
command="psql -tAc 'SELECT max(version) FROM schema_migrations;'"
The payoff is the per-host aggregation: if db-replica-2 returns
MIGRATION_FAILED while db-replica-1 reports the new version, you know exactly
which node to fix — no scrolling through interleaved SSH output trying to figure
out whose error you're reading. One failing replica does not block the batch from
finishing on the healthy one.
Tip: put the hosts in
editmode (notbypass) for migrations. Inedit,
the agent auto-creates a backup before any overwrite, so a botched
write_fileto a config or script leaves you the original. Drop back toplan
the moment you're done.
Recipe 5 — Replication-lag and health checks
The single most useful replica maintenance check is replication lag: how far
behind the primary each replica is. From the primary you query
pg_stat_replication; from each replica you check pg_last_wal_replay_lag.
Both are one fleet_exec away.
- From the primary, list connected replicas and their byte lag.
- From each replica, read its replay delay.
- Alert on anything beyond your threshold.
fleet_exec target="db,primary" \
command="psql -xtAc \"SELECT client_addr, state, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes FROM pg_stat_replication;\""
fleet_exec target="db,replica" \
command="psql -tAc 'SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;'"
fleet_exec target="db,replica" \
command="psql -tAc 'SELECT CASE WHEN pg_is_in_recovery() THEN 1 ELSE 0 END;' | grep -q 1 && echo OK_RECOVERY || echo NOT_A_REPLICA"
Run from plan mode — these are all read-only queries, so there's zero risk even
on a busy primary. The first query, sent to db-primary, shows you both replicas
as rows with their state (streaming is what you want) and lag_bytes. The
second, fanned out across the db,replica tag, returns each replica's replay lag
side by side. A replica that has fallen out of streaming shows up either as a
missing row in the primary's view or a ballooning replay_lag.
Note: a healthy LAN replica usually sits under a few hundred milliseconds
of replay lag. A sudden jump to tens of seconds (or a replica vanishing from
pg_stat_replicationentirely) means streaming broke — often a WAL retention
or network issue. Catching it here, across both replicas in one call, beats
discovering it when a read query returns stale data.
Recipe 6 — Read a prod .env read-only
Sometimes you just need to confirm a connection string or a backup target path
without any chance of editing it. With the host in plan mode, read_file
hands you the file contents and nothing about the session can write.
- Ensure the host is in
planmode. -
read_filethe config. - Optionally grep for one key with a read-only
exec.
set_mode agent_id=db-primary mode=plan
read_file agent_id=db-primary path=/srv/app/.env
exec agent_id=db-primary command="grep -E '^(PGHOST|PGDATABASE|BACKUP_DIR)=' /srv/app/.env"
Because the agent is read-only, even if you (or the AI) followed up with a
write_file, it would be rejected. On top of that, a hard deny-list —
covering paths like /etc/shadow and /boot — applies on every host in
every mode, including bypass, so the truly sensitive system files are never
readable or writable through the agent regardless of how you set things up.
Important:
planmode is the right default for production database hosts.
Keep them inplanday to day and only flip a single host toeditfor the
specific window you're changing something — then flip it straight back. It
turns "I'll be careful" into "I literally can't write right now."
Recipe 7 — Ship a dump host→host, SHA-256 verified
Finally, get the dump off the primary and onto the offsite backup-box.
send_file streams a file host→host over a direct UDP data channel (opened on
demand, with automatic relay fallback) and verifies it end-to-end with a
SHA-256 checksum. Because the receiving side performs a write, the
destination host needs edit or bypass.
- Put
backup-boxineditso it can write the incoming file. -
send_filethe latest dump fromdb-primarytobackup-box. - Confirm arrival and size with
file_staton the receiver.
set_mode agent_id=backup-box mode=edit
send_file agent_id=db-primary \
path=/srv/backups/app_prod-2026-06-18.sql.gz \
to=backup-box \
dest=/srv/offsite/app_prod-2026-06-18.sql.gz
file_stat agent_id=backup-box path=/srv/offsite/app_prod-2026-06-18.sql.gz
The transfer goes peer-to-peer where the network allows, so a 42 GB dump doesn't
have to round-trip through the relay — and if a direct UDP path can't be
established, it transparently falls back to the relay so the copy still
completes. The SHA-256 check means you find out about a corrupted transfer
immediately, not when a restore fails three weeks later. If you'd rather pull
from the receiving side, transfer_get is the mirror-image tool.
Tip: in the browser fleet-chat panel there's a Files view that does the same
move with a live progress bar, plus a binary-safe chunked download of any file
through the relay. Handy when you want to eyeball a dump's size or grab one to
your laptop without dropping to the CLI.
Set backup-box back to plan when the copy is done, so the offsite store is
read-only at rest:
set_mode agent_id=backup-box mode=plan
FAQ
How is this different from a cron job I'd write myself?
It's the same idea, but the cron is installed and managed through one AI
interface across the whole fleet, and schedule_add registers it on the host
so it survives relay outages and reboots. The difference is operational: you add,
list, verify, and remove backup schedules on four boxes from one place, and you
get per-host confirmation that each dump actually ran, instead of trusting four
independent crontabs you never look at.
Will my nightly pg_dump still run if the relay or my laptop is offline?
Yes. schedule_add writes a host-local cron that lives entirely on the database
server. The relay link and your AI session are only needed to create, list, or
remove the schedule — not to run it. Your 0 0 3 * * * dump fires at 03:00
regardless of whether anything is connected.
Is it safe to point this at a production database?
Start every host in plan mode, which is read-only — you can run
pg_stat_replication, read_file a .env, and check dump freshness with zero
write risk. Flip a single host to edit only for the specific change you're
making (edit auto-backs-up before overwriting), and a hard deny-list on paths
like /etc/shadow and /boot applies even in bypass. All command payloads and
results are end-to-end encrypted; the relay forwards ciphertext and never sees
your data or keys.
Can I back up MySQL or SQL Server the same way?
Yes — the workflow is database-agnostic. Swap the command inside schedule_add:
mysqldump --single-transaction ... | gzip for MySQL, or
sqlcmd -Q "BACKUP DATABASE ..." on a Windows host targeted via os:windows.
The scheduling, 14-day retention prune, freshness verification, and SHA-256
transfer all stay identical.
How do I keep the agents themselves up to date across the fleet?
Run fleet_update_check, which tells you which idle hosts have a newer agent
version available, then run npm i -g remote-agents@latest on those hosts. It
won't interrupt a host mid-task, so you can update the fleet without disrupting a
running backup or migration.
Wrapping up
One MCP interface, five hosts, zero manual SSH sessions: a nightly scheduled
pg_dump with 14-day retention runs host-local on the primary, freshness is
verified across the fleet with file_stat, replicas get migrations and lag checks
in lockstep, prod config is read safely in plan mode, and the dump lands offsite
with a SHA-256 guarantee. That's what automated database backups across servers
should feel like in 2026 — boring, observable, and running whether you're watching
or not. Add cross-platform targeting and the same recipes cover your MySQL and SQL
Server hosts too.
Install:
npm i -g remote-agents→
package on npm ·
source and docs
Top comments (0)