33 wake cycles, 0 actions shipped: my autonomous agent learned to sit still

#claude #ai #automation #agents

For five consecutive days, my autonomous SaaS agent woke up every 2 hours, checked production health, verified zero KPI inflation — and shut down without touching a single file. 33 wakes. 0 substantive actions. That was the correct behavior.

What the project is

BailleurVérif is a solo-built French SaaS that helps renters verify whether their landlord is complying with rent control laws (encadrement des loyers). The agent — running on Claude Sonnet, driven by an external cron */2 — operates the product autonomously: it crawls housing data, enriches city pages, monitors SEO health, and decides what to do next.

The problem with autonomous agents isn't getting them to act. It's getting them to stop acting when there's nothing valuable to do.

How I ended up with 33 consecutive "do nothing" cycles

Early in the project I noticed a pattern: when the agent had no meaningful task, it invented one. It would enrich city pages with 3 views/month. Re-measure a gate it had already measured. Write an inbox message to me with zero actionable content. Classic busywork — supply-side activity that looked like progress but moved no real metrics.

So I codified a protocol called SB-6 (Standby-6). When both the "Florian gate" (waiting for me to take an action only I can take) AND a "time gate" (a scheduled decision point not yet reached) are closed simultaneously, the agent has exactly one permitted behavior:

Verify production health (HTTP 200 checks on home + sitemap)
Verify zero KPI inflation in /api/stats
Classify any new funnel events
Stop.

No new features. No "proactive improvements". No content generation. Full stop.

The gate architecture

Right now the agent is blocked on two specific gates.

Acquisition gate: new humans reaching verdict require Google indexation trust. The domain is under 120 days old — the well-documented sandbox effect. Building backlinks on subpages (the real lever) is something only I can do, not the agent. The next measurable signal is a re-sweep scheduled for July 10.

Recourse gate: a "recourse letter" feature needs at least 10 users to view it before any UX decision can be made. Currently: 2 views. Originally this gate had a calendar deadline (June 30). During run-711, the agent permanently deleted that clause and replaced it with a data-driven trigger:

{
  "gate": "recourse_decision",
  "old_trigger": "2026-06-30 (calendar deadline)",
  "new_trigger": "recourse_viewed >= 10 OR hard-backstop 2026-09-30",
  "rationale": "N=2 viewed is statistically insufficient to conclude the letter is non-actionable",
  "resolved_run": 711,
  "status": "open"
}

This pivot — from calendar-driven to data-driven — is one of the few substantive decisions the agent made in these 5 days. Everything else was: check prod, verify flat, stop.

The anti-busywork directive list

The agent runs with two external critics (Tactical and Strategic) that audit it every ~48 hours. Over months, they've codified an explicit STOP list for SB-6 conditions:

SB-6 anti-filler directives:
STOP re-measuring the recourse gate (N=2 is inconclusive, gate already resolved run-711)
STOP GEO-build / enriching city pages with <15 views (readiness is not an active chantier)
STOP enriching communes with <15 in-scope data points (thin content, GSC penalty risk)
STOP repeating crawl bottleneck analysis (root cause known: sandbox, lever = founder backlinks)
STOP 3rd re-escalation to founder (already escalated 2x, Florian-gate active)
STOP adding new metric counters (anti-inflation discipline, ref. critic-79 §G)
STOP FYI inbox messages with nothing actionable for the founder
STOP ScheduleWakeup (external cron owns pacing — agent must not self-pace)

These aren't generic guidelines — each one references the specific critic audit that identified it (critic-79 §G, audit-105 STOP#1, run-711 etc.). The agent cites these references when refusing an action.

The honest KPI snapshot after 33 flat wakes

Metric	Value	Trend over 5 days
Total visits	601	+2 total
Human sessions (confidence-adjusted)	8-10	UNCHANGED 112h
Verdict displayed	16	UNCHANGED
Recourse letter viewed	2	UNCHANGED
Subscribers confirmed	0	UNCHANGED
Pages indexed	233	UNCHANGED

The agent isn't pretending to make progress. It watches the flatline and reports it accurately. Run-712 through run-718 all contain the same sentence: "0 humain net-neuf depuis candidate #15 (2026-06-27T17:14Z)." That's the last confirmed human session. The agent keeps counting precisely.

Why this is genuinely hard to build

The temptation to "do something" is strong — in human founders and LLM-based agents alike. Models are trained to be helpful, which often manifests as generating activity. An agent that consistently says "I verified nothing changed, I stopped" feels like it's failing.

The discipline comes from three architectural choices:

1. Explicit gate taxonomy. Every blocked decision is classified as Florian-gated, time-gated, or data-gated. Nothing sits in an ambiguous "pending" state — everything has a named trigger condition and a resolution mechanism written in the ledger.

2. External critics with memory. The critics can flag any action as drift. They maintain state across weeks, so they know when the agent is re-doing something it already tried. Without external critics, self-critique collapses into rationalization.

3. Anti-pattern codification by reference. The STOP list grows with each critic cycle. By the 33rd wake, the agent has 8 named anti-patterns to check before acting. This isn't a vibe check — it's a structured audit against a documented list.

Takeaways

Restraint is harder to build than capability. Most "productive-looking" agent behavior is supply-side noise: activity that doesn't move the real constraint.
Data-driven gates beat calendar gates. Deadlines create artificial urgency and premature decisions. Thresholds (recourse_viewed >= 10) wait for statistical validity.
External critics are a forcing function, not a luxury. Without adversarial auditing, an agent will rationalize busywork as "proactive improvement". The critics create accountability that self-critique can't.

The next real gate opens July 10, when I can measure whether the Google re-sweep shows sandbox lift. Until then: verify, confirm flat, stop. 33 down.

🔗 Code source MIT github.com/Creariax5/bailleurverif · Site bailleurverif.fr · Wikidata Q139857638