We don't just talk about automation -- we run on it. This is the system that probes our own
live products three times a day, blocks on real failures, and sends a report it has
fact-checked against live state first. We build what we use.
The problem
Things break quietly. A payment page silently stops loading the checkout. A scheduled job dies
and nobody notices for weeks. An automated report keeps sending -- but with stale, hardcoded
numbers from a template nobody updated. By the time a human catches it, the damage is days old.
Manual spot-checks do not scale and do not run at 3am.
The danger is not loud failure. It is confident, silent failure: a system that keeps reporting
"fine" while it quietly breaks.
The workflow
[ Scheduled checks ] -> [ Pass / fail gate ] -> [ Self-review ] -> [ Report to phone ]
- Scheduled checks: 3x daily
- Pass / fail gate: live HTTP + regression
- Self-review: strip stale/wrong figures
- Report to phone: pass / fail
1. A full check, three times a day
A combined health check runs across multiple live products. Operational checks -- live HTTP
reachability, a payment regression assertion, a render check, and a deployment secret-leak scan --
are pass/fail, and any failure trips the gate. Content-lint findings are collected as a non-blocking
backlog, so lint noise never masks a real outage.
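A minimal Python sketch of that split between blocking checks and a non-blocking backlog, assuming each check is a plain function. The names `CheckResult`, `RunReport`, and `run_health_check` are hypothetical, not the system's actual code. The one design choice worth copying: a check that crashes is recorded as a failure, so an exception can never read as a pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


@dataclass
class RunReport:
    failures: List[CheckResult] = field(default_factory=list)  # blocking
    backlog: List[str] = field(default_factory=list)           # non-blocking lint findings

    @property
    def blocked(self) -> bool:
        # Any operational failure trips the gate.
        return bool(self.failures)


def run_health_check(
    operational: Dict[str, Callable[[], CheckResult]],
    content_lint: Dict[str, Callable[[], List[str]]],
) -> RunReport:
    report = RunReport()
    for name, check in operational.items():
        try:
            result = check()
        except Exception as exc:
            # A crashing check counts as a failure; it must never look like a pass.
            result = CheckResult(name, False, f"check raised {exc!r}")
        if not result.passed:
            report.failures.append(result)
    for name, lint in content_lint.items():
        try:
            report.backlog.extend(f"{name}: {finding}" for finding in lint())
        except Exception as exc:
            # Lint crashes are surfaced too, but still do not block.
            report.backlog.append(f"{name}: lint crashed: {exc!r}")
    return report
```

A scheduler would then alert on `report.blocked` and file `report.backlog` somewhere it can be worked through later.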
2. A regression guard on the thing that makes money
One check specifically asserts that the checkout page still includes the payment-gateway domains
it needs. This is a direct guard against a real incident we had, where a security header silently
broke the live checkout. Now that exact failure is caught automatically on every run instead of
being discovered by a customer.
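One way such a guard could look, assuming the security header in question was a `Content-Security-Policy` (the post doesn't say which header it was) and using made-up gateway origins. It parses the header and reports every required gateway origin that the relevant directive no longer allows. It is deliberately simplified: exact-origin matching only, `default-src` as the only fallback, and no handling of nonces, hashes, wildcard hosts, or a policy delivered via a `<meta>` tag.

```python
import urllib.request
from typing import Dict, List, Optional, Set

# Hypothetical gateway origins; substitute the ones your checkout actually loads.
REQUIRED_SOURCES: Dict[str, Set[str]] = {
    "script-src": {"https://js.example-gateway.com"},
    "frame-src": {"https://pay.example-gateway.com"},
}


def parse_csp(header: str) -> Dict[str, Set[str]]:
    """Split a CSP header into {directive: set of source expressions}."""
    directives: Dict[str, Set[str]] = {}
    for part in header.split(";"):
        tokens = part.strip().split()
        if tokens:
            # Only the first occurrence of a directive is honoured, so keep the first.
            directives.setdefault(tokens[0].lower(), set(tokens[1:]))
    return directives


def missing_gateway_sources(
    header: str, required: Dict[str, Set[str]] = REQUIRED_SOURCES
) -> List[str]:
    directives = parse_csp(header)
    missing: List[str] = []
    for directive, sources in required.items():
        # Simplified fallback: the directive itself, else default-src.
        allowed: Optional[Set[str]] = directives.get(directive, directives.get("default-src"))
        if allowed is None or "*" in allowed:
            continue  # nothing restricts this resource type to specific origins
        missing.extend(f"{directive} {src}" for src in sorted(sources) if src not in allowed)
    return missing


def fetch_csp(url: str) -> str:
    """Fetch the live CSP header (empty string if the page sends none)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.headers.get("Content-Security-Policy", "")
```

In a scheduled run this might be `missing_gateway_sources(fetch_csp(checkout_url))` against the live checkout URL, with any non-empty result treated as a blocking failure.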
3. Hunting silent failures
A dedicated sweep found and removed 19 dead scheduled tasks, and fixed a reporting job that always
reported revenue as zero because of a module-alias mismatch and a promo job that crashed silently on
a byte-order mark (BOM). The shared lesson -- a bare except hides exactly these failures -- is baked
into the checks rather than learned again next quarter.
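The BOM crash and the bare-except lesson fit in a few lines of Python. In this hypothetical promo-config loader (the JSON format and both function names are assumptions), the first version swallows the decode error and quietly returns an empty config. The second decodes with `utf-8-sig`, which drops a leading BOM if present, and re-raises anything it can't parse so a health check sees the failure.

```python
import json
import logging

log = logging.getLogger("promo_job")


def load_promo_config_silently(raw: bytes) -> dict:
    # Anti-pattern, shown on purpose: a leading BOM survives a plain "utf-8"
    # decode, json.loads rejects it, and the bare except turns that crash
    # into an empty config that looks like "no promos today".
    try:
        return json.loads(raw.decode("utf-8"))
    except:  # noqa: E722
        return {}


def load_promo_config(raw: bytes) -> dict:
    # "utf-8-sig" strips a leading byte-order mark and is a no-op without one.
    text = raw.decode("utf-8-sig")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        log.exception("promo config is not valid JSON")
        raise  # fail loudly so the scheduled check records a failure
```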
4. Reports that self-review before sending
A real recurring problem was the auto-report going out with stale template data and leftover
placeholders. We added a final self-check pass that runs against a live measured snapshot and
strips any figure that contradicts reality, then localizes the text and removes placeholders --
all before the message is sent. A leftover hardcoded send-script that had been re-firing old
data was caught and removed in the process.
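The post describes an LLM doing this review. As a deterministic stand-in, not the actual pass, here is a sketch that checks `Metric: number` lines in a draft against a live snapshot dict, drops any figure that disagrees beyond a tolerance, and drops lines with leftover template placeholders. The line format, placeholder syntax, and tolerance are all assumptions, and localization is not covered.

```python
import re
from typing import Dict, List, Tuple

# Assumed placeholder styles: {{moustache}} slots and TODO/TBD/XXX markers.
PLACEHOLDER_RE = re.compile(r"\{\{.*?\}\}|\b(?:TODO|TBD|XXX)\b")
# Assumed figure format: one "Metric name: 123.45" per line.
FIGURE_RE = re.compile(r"^(?P<metric>[A-Za-z][\w ]*?):\s*(?P<value>-?\d+(?:\.\d+)?)\s*$")


def self_review(
    draft: str, snapshot: Dict[str, float], tolerance: float = 0.01
) -> Tuple[str, List[str]]:
    """Return (cleaned report, removed lines with the reason for each)."""
    kept: List[str] = []
    removed: List[str] = []
    for line in draft.splitlines():
        if PLACEHOLDER_RE.search(line):
            removed.append(f"placeholder: {line}")
            continue
        match = FIGURE_RE.match(line.strip())
        if match:
            # Figures for metrics missing from the snapshot are kept unchecked.
            live = snapshot.get(match["metric"].strip().lower())
            value = float(match["value"])
            # Relative tolerance, floored at 1.0 so a live value of 0 compares sanely.
            if live is not None and abs(value - live) > tolerance * max(abs(live), 1.0):
                removed.append(f"contradicts live value {live}: {line}")
                continue
        kept.append(line)
    return "\n".join(kept), removed
```

Returning the removed lines alongside the cleaned text keeps the review auditable: whatever was stripped can be logged rather than vanishing silently.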
The result
- Live products probed 3x daily, unattended, with hard failures gating instead of slipping through.
- A payment-breaking regression class is now caught automatically on every run.
- A batch of silent failures -- 19 dead tasks, a permanently-zero revenue report, a silently-crashing job -- found and fixed in a single health sweep.
- Outgoing reports are accuracy-checked against live data before delivery, so the decision-maker is not fed confident-but-wrong numbers.
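Dead scheduled tasks, the first item above, can also be caught continuously rather than by a one-off sweep. A sketch assuming each task records when it last succeeded (the `ScheduledTask` record and `find_dead_tasks` are hypothetical): any task that has never succeeded, or whose last success is older than a grace multiple of its expected interval, is flagged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ScheduledTask:
    name: str
    interval: timedelta                # expected time between successful runs
    last_success: Optional[datetime]   # None = no success ever recorded


def find_dead_tasks(
    tasks: List[ScheduledTask], now: Optional[datetime] = None, grace: float = 2.0
) -> List[str]:
    """Names of tasks that never succeeded or are overdue by more than `grace` intervals."""
    now = now or datetime.now(timezone.utc)
    return [
        t.name
        for t in tasks
        if t.last_success is None or now - t.last_success > t.interval * grace
    ]
```

Keying on last *success* rather than last *run* matters: a job that fires on schedule but crashes every time still shows up as dead.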
Stack
- A scheduled health-check runner (3x daily)
- Live HTTP probes
- A payment-config regression assertion
- A deployment secret-leak scanner
- An LLM self-review pass against a live snapshot
- Chat delivery
- A task scheduler with a server-side cron mirror
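A deployment secret-leak scanner can start as a handful of regular expressions run over the files a deploy makes public. The sketch below is illustrative only. Its patterns (an AWS access-key-ID prefix, a PEM private-key header, and a generic quoted `api_key`/`secret`/`token` assignment) are common examples, not the rules this stack uses, and dedicated scanners ship far larger, tuned rule sets. Findings report path, line, and rule name but never the matched text, so the scanner's own output can't leak the secret.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "generic-assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}


def scan_for_secrets(files: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (path, 1-based line number, rule name) for every match."""
    findings: List[Tuple[str, int, str]] = []
    for path, content in files.items():
        for lineno, line in enumerate(content.splitlines(), start=1):
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    # Deliberately omit the matched text from the finding.
                    findings.append((path, lineno, rule))
    return findings
```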
The takeaway
Automation without monitoring is a liability -- it fails silently and confidently. The two ideas
here are the ones teams skip: a regression guard on the thing that actually makes you money
(checkout), and a report that fact-checks itself against live state before it reaches a
decision-maker. That is the difference between "we have automation" and "we can trust our
automation."
We build automation systems like this for businesses drowning in repetitive busywork --
content, reporting, customer replies, lead follow-up. If a daily task is eating your team's
hours, that's usually a one-time build away from running itself.