I Built a Brain Orchestrator for My 142 Agents. Here's the Bash It's Mostly Made Of.

#systemd #bash #sqlite #devops

I have 142 systemd-managed agents on a single VPS: trading bots, content publishers, lead nurture, web audit, customer journey trackers. For months I had no single source of truth about who's alive and who's silently failing. Manual systemctl status doesn't scale to 142.

So I built Brain — a bash + SQLite + systemd-timer orchestrator. ~500 lines total. No Kubernetes, no Prometheus, no Grafana, no microservices framework.

Here's why bash + SQLite was the right choice over "proper" tooling.

The problem

Agents log to the systemd journal. Some have heartbeat files. Some have their own state DBs. Some are silent until they crash. When something breaks, you find out via Telegram spam: the same "agent X failed" message, 50 times over.

Three sins of the existing setup:

No deduplication — every loop fires the same alert as new
No life-cycle awareness — a disabled unit looks the same as a crashed one
No business signal correlation — "webhook handler down" matters infinitely more than "blog cron skipped one tick"

The fix: single SQLite + bash orchestrator

CREATE TABLE agents (
  name TEXT PRIMARY KEY,
  kind TEXT,                    -- systemd / cron / brain_wrapper
  expected_cadence_sec INTEGER,
  last_seen_ts INTEGER,
  last_status TEXT,             -- ok / warn / crit / dead
  last_note TEXT,
  consec_fails INTEGER
);

CREATE TABLE alerts (
  agent TEXT,
  severity TEXT,
  hash TEXT,                    -- normalized msg for dedup
  count INTEGER,
  escalated INTEGER,            -- 0=new, 1=TG sent, 2=acked
  resolved_at INTEGER,
  UNIQUE(agent, hash)
);
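As a rough illustration of how expected_cadence_sec and last_seen_ts could drive last_status, here is a minimal sketch. The function name classify_staleness and the 2x / 5x thresholds are assumptions made for this example, not values taken from Brain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map "seconds since last heartbeat" to a status.
# The 2x / 5x multipliers are illustrative assumptions, tune to taste.
classify_staleness() {
  local now=$1 last_seen=$2 cadence=$3
  local age=$(( now - last_seen ))
  if   (( age <= cadence * 2 )); then echo ok
  elif (( age <= cadence * 5 )); then echo warn
  else                                echo crit
  fi
}

# Example: an agent expected every 300s that was last seen 4071s ago.
classify_staleness 10000 5929 300   # prints: crit
```

In a real cycle, now would come from date +%s and the other two values from the agents table.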

The hash dedup is the key. Initially I hashed the raw message — but messages contain "no activity for 4071s" where the number changes every cycle. So same problem, new hash, new TG ping. Spam.

Fix:

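# Replace volatile numbers with placeholders so the same problem hashes the same.
hash_msg=$(printf '%s\n' "$note" | sed -E 's/[0-9]+s/Xs/g; s/[0-9]+ errors/N errors/g')
# Key the alert on agent name, status, and the normalized message.
hash=$(printf '%s\n' "$name|$status|$hash_msg" | md5sum | cut -d' ' -f1)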
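To check that the normalization behaves as intended, it can be wrapped in a function and fed two notes that differ only in the number. This is a sketch: alert_hash and the agent name blog-cron are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the normalize-then-hash step.
alert_hash() {
  local name=$1 status=$2 note=$3 hash_msg
  # Replace volatile numbers with placeholders before hashing.
  hash_msg=$(printf '%s\n' "$note" | sed -E 's/[0-9]+s/Xs/g; s/[0-9]+ errors/N errors/g')
  printf '%s\n' "$name|$status|$hash_msg" | md5sum | cut -d' ' -f1
}

a=$(alert_hash blog-cron crit "no activity for 4071s")
b=$(alert_hash blog-cron crit "no activity for 4371s")
c=$(alert_hash blog-cron crit "exit code 1")

[ "$a" = "$b" ] && echo "same problem, same hash"
[ "$a" != "$c" ] && echo "different problem, different hash"
```

Any other volatile token (timestamps, PIDs, request IDs) would need its own sed rule, otherwise it reintroduces the spam.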

One TG ping per new problem. Re-firings only increment count, so a problem that recurs every cycle produces one message instead of one per cycle.
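The "insert once, then just count" behavior maps naturally onto the UNIQUE(agent, hash) constraint and SQLite's upsert clause. The sketch below only generates the SQL text. The function names upsert_alert_sql and sql_quote are hypothetical, and the ON CONFLICT syntax assumes SQLite 3.24 or newer.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not Brain's actual code.

# Wrap a value in single quotes, doubling any embedded single quotes.
sql_quote() {
  local q="'"
  printf "'%s'" "${1//$q/$q$q}"
}

# Emit an upsert: first sighting inserts a row with count=1, escalated=0;
# later sightings of the same (agent, hash) only bump count.
upsert_alert_sql() {
  local agent=$1 severity=$2 hash=$3
  cat <<SQL
INSERT INTO alerts (agent, severity, hash, count, escalated)
VALUES ($(sql_quote "$agent"), $(sql_quote "$severity"), $(sql_quote "$hash"), 1, 0)
ON CONFLICT(agent, hash) DO UPDATE SET count = count + 1;
SQL
}

# Usage (assumes $DB points at the orchestrator database):
#   upsert_alert_sql "$name" "$status" "$hash" | sqlite3 "$DB"
#   sqlite3 "$DB" "SELECT agent, severity FROM alerts WHERE escalated = 0;"
```

Rows still at escalated = 0 are the ones that have not been pinged yet; after sending the Telegram message the row would be flipped to 1.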

Why bash over a Python framework

Bash is already on every Linux box; sqlite3 and jq are one package install away. Near-zero install footprint.
systemd timers > Python schedulers. Free restart-on-failure (Restart=on-failure). Free log rotation (journald).
All state lives in .db and .json files I can read with sqlite3 and cat.
Total code: 6 files, ~500 lines. I'd estimate a Python equivalent at 2-3x that.

The whole orchestrator runs in <10 seconds per 5-min cycle. Across 142 agents.
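For readers who have not wired a script to a systemd timer before, a 5-minute cycle needs a oneshot service plus a matching timer unit. The generator below is a sketch: the unit name brain and the path /opt/brain/brain.sh are placeholders, not the actual layout.

```shell
#!/usr/bin/env bash
# Hypothetical generators for a service + timer pair.

render_service() {
  local script=$1
  cat <<EOF
[Unit]
Description=Brain orchestrator cycle

[Service]
Type=oneshot
ExecStart=$script
EOF
}

render_timer() {
  local interval=$1
  cat <<EOF
[Unit]
Description=Run Brain every $interval

[Timer]
OnBootSec=$interval
OnUnitActiveSec=$interval

[Install]
WantedBy=timers.target
EOF
}

# Usage (as root; paths are placeholders):
#   render_service /opt/brain/brain.sh > /etc/systemd/system/brain.service
#   render_timer 5min > /etc/systemd/system/brain.timer
#   systemctl daemon-reload && systemctl enable --now brain.timer
```

A timer named brain.timer activates brain.service by default, so no explicit Unit= line is needed.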

Lessons

Normalize before hashing. Same problem, same hash.
Disabled ≠ failed. Treat a unit that systemctl is-enabled reports as disabled as dead, not crit.
Auto-resolve. When an agent returns to OK, close its open alerts.
Daily digest > real-time spam. Real-time only for actual CRIT.
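The "disabled ≠ failed" and "auto-resolve" lessons can be sketched together. The helper names and the exact state mapping below are assumptions for illustration. In particular, timer-driven oneshot units sit in inactive between runs, so they need a staleness check rather than this mapping.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not Brain's actual code.

# Inputs are the strings printed by `systemctl is-enabled UNIT` and
# `systemctl is-active UNIT`.
classify_unit() {
  local enabled=$1 active=$2
  case "$enabled" in
    disabled|masked) echo dead; return ;;  # switched off on purpose
  esac
  case "$active" in
    active|activating) echo ok ;;
    failed)            echo crit ;;
    *)                 echo warn ;;        # inactive, deactivating, unknown
  esac
}

# Emit SQL that closes every open alert for an agent that is healthy again.
resolve_alerts_sql() {
  local agent=$1 now=$2
  local q="'"
  local safe=${agent//$q/$q$q}             # double embedded single quotes
  cat <<SQL
UPDATE alerts SET resolved_at = $now
WHERE agent = '$safe' AND resolved_at IS NULL;
SQL
}

# Usage (assumes $unit and $DB are set):
#   status=$(classify_unit "$(systemctl is-enabled "$unit")" "$(systemctl is-active "$unit")")
#   [ "$status" = ok ] && resolve_alerts_sql "$unit" "$(date +%s)" | sqlite3 "$DB"
```

Checking is-enabled first is what keeps a deliberately disabled unit from paging anyone, even if its last run ended in failed.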

Full source: I'll post it as I clean it up. Reach out if you want to discuss systemd-as-orchestrator patterns.

— Stas, https://guardlabs.online
