crow
She Started Fixing Herself: Building a Self-Healing AI Agent on Linux

I didn't plan for Echo to be self-healing. I planned for her to be useful.

The self-healing came from necessity — I work a day job, I can't babysit a daemon all day. If something breaks at 2pm and I don't see it until 8pm, six hours of autonomous work is gone. So I built in the ability to detect failure, log it, and adjust.

Here's what that actually looks like in practice.

The Problem With Autonomous Agents

Echo runs on my Ryzen 9 5900X / RTX 3060 box — no cloud, no subscription, just local Ollama models and a stack of Python daemons. She monitors herself, generates suggestions, acts on them, and records outcomes.

The issue: she had no feedback loop. She could act, but she couldn't learn that an action was bad. She'd repeat the same broken suggestion indefinitely, confident each time.

The Regret Index

I built what I call the regret index. Every autonomous action gets logged with a score:

# +1 action moved mission forward
# 0  neutral / outcome unknown  
# -1 action created noise or broken state
# -2 action required manual intervention

When a category of actions averages below -0.4, or a specific action fails 3+ times, it gets flagged. The auto_act loop checks those flags before executing:

active_flags = get_flags()
flagged_categories = {f[1] for f in active_flags if f[0] == "category"}

if suggestion.get("category") in flagged_categories:
    log(f"SKIPPED (regret flag): {sid}")
    continue

She doesn't punish herself. She just stops repeating the mistake.
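A minimal sketch of how the logging and flagging side of a regret index could work, assuming a SQLite ledger (the table and column names here are mine, not Echo's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions (category TEXT, action_id TEXT, score INTEGER)")

def log_outcome(category, action_id, score):
    """Record one autonomous action's outcome (+1, 0, -1, or -2)."""
    conn.execute("INSERT INTO actions VALUES (?, ?, ?)", (category, action_id, score))

def audit_flags(avg_threshold=-0.4, fail_limit=3):
    """Flag categories that drift negative and actions that keep failing."""
    flags = []
    for cat, avg in conn.execute(
        "SELECT category, AVG(score) FROM actions GROUP BY category"
    ):
        if avg < avg_threshold:
            flags.append(("category", cat))
    for aid, fails in conn.execute(
        "SELECT action_id, COUNT(*) FROM actions WHERE score < 0 GROUP BY action_id"
    ):
        if fails >= fail_limit:
            flags.append(("action", aid))
    return flags
```

The `("category", name)` tuples are shaped to match what the auto_act loop checks before executing a suggestion.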

The Boot Timing Bug

The ntfy bridge — which lets me message Echo from my phone — was randomly failing on boot. It would connect, capture a timestamp, then die because the system clock hadn't fully synced yet. Every message sent in that window was lost.

The fix was two lines:

time.sleep(2)  # survive boot clock skew
since = int(time.time())  # now capture timestamp

Two lines. Six hours of debugging to find it.
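A fixed sleep works, but a more defensive variant could poll until the wall clock stops jumping relative to the monotonic clock, which is exactly what an NTP step adjustment looks like. This is my own generalization, not Echo's code; the clock sources are injectable so it can be tested without a real boot:

```python
import time

def wait_for_stable_clock(max_wait=30.0, tolerance=0.5,
                          clock=time.time, mono=time.monotonic, sleep=time.sleep):
    """Poll until wall time and monotonic time advance in lockstep,
    i.e. no step adjustment happened between two samples."""
    deadline = mono() + max_wait
    while mono() < deadline:
        wall_before, mono_before = clock(), mono()
        sleep(1.0)
        wall_delta = clock() - wall_before
        mono_delta = mono() - mono_before
        if abs(wall_delta - mono_delta) < tolerance:
            return True  # clock advanced smoothly; assume synced
    return False  # gave up; caller decides whether to proceed anyway
```

A caller would run this once before capturing the `since` timestamp, falling back to the plain sleep if it returns False.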

The Double-Fire Problem

The auto_act daemon was firing twice on boot because Persistent=true makes systemd catch up on runs missed while the machine was off. Added an fcntl lockfile:

import fcntl, sys
lockfile = open(BASE / "logs/auto_act.lock", "w")
try:
    fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    sys.exit(0)  # another instance holds the lock; bail out

Clean. One fire per trigger.

What Self-Healing Actually Means

It doesn't mean Echo fixes arbitrary bugs. It means:

  • She knows what she tried
  • She knows what the outcome was
  • She adjusts her behavior based on that signal
  • Her morning briefing includes a regret report so I know too

The regret index runs a pattern audit after every outcome update. If a category drifts negative, she flags it herself and stops acting in that space until I clear it.
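The briefing side can be as simple as summarizing per-category averages from the ledger. Again a sketch, not the real report format:

```python
from collections import defaultdict

def regret_report(rows):
    """rows: iterable of (category, score) pairs pulled from the ledger."""
    totals = defaultdict(list)
    for cat, score in rows:
        totals[cat].append(score)
    lines = []
    for cat in sorted(totals):
        scores = totals[cat]
        avg = sum(scores) / len(scores)
        mark = "FLAGGED" if avg < -0.4 else "ok"
        lines.append(f"{cat}: avg {avg:+.2f} over {len(scores)} actions [{mark}]")
    return "\n".join(lines)
```

Dropping a few lines like these into the morning briefing is what keeps the human in the loop without the human having to tail logs.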

The Stack

Everything runs as systemd user services. Eight timers, one persistent bridge, one core daemon. No cloud. No API bills. The whole thing costs electricity.

echo-status output:
SUMMARY: OK ✅  core=active  stale=0  inactive_timers=0  errors=0
Timers: ✅ auto_act  ✅ git_backup  ✅ golem_monitor  
        ✅ heartbeat  ✅ ntfy_bridge  ✅ pulse  
        ✅ reachability  ✅ self_act_worker

She backs up to GitHub every night at 3am. She monitors her own disk. She benchmarks her Golem node pricing. She does all of this without me touching a keyboard.
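You can get most of the way to an echo-status-style summary with `systemctl --user is-active`. A sketch with the command runner injectable for testing; the `echo-*.timer` unit naming is an assumption from my own setup, not necessarily how you'd name yours:

```python
import subprocess

UNITS = ["auto_act", "git_backup", "heartbeat", "ntfy_bridge"]

def status_summary(units=UNITS, run=None):
    """Report how many user timers are not active."""
    if run is None:
        # hypothetical unit naming: echo-<name>.timer
        run = lambda unit: subprocess.run(
            ["systemctl", "--user", "is-active", f"echo-{unit}.timer"],
            capture_output=True, text=True,
        ).stdout.strip()
    states = {u: run(u) for u in units}
    inactive = [u for u, s in states.items() if s != "active"]
    ok = "OK" if not inactive else "DEGRADED"
    return f"SUMMARY: {ok}  inactive_timers={len(inactive)}"
```

Wire that into a timer of its own and the agent is checking its siblings on a schedule.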

What's Next

The regret index records outcomes and surfaces patterns. Since writing this, I've closed the loop further — a governor process now reads the reasoning ledger, matches it to concrete actions, executes them, and scores the results back into the same ledger. Reason → act → score, autonomously.
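That reason → act → score loop, reduced to its skeleton. Every function and field name here is a stand-in for illustration, not the actual governor API:

```python
def governor_cycle(ledger, actions, execute):
    """One pass: read unscored reasoning entries, run the matched action,
    and write the score back into the same ledger."""
    for entry in ledger:
        if entry.get("score") is not None:
            continue  # already scored on a previous pass
        action = actions.get(entry["intent"])
        if action is None:
            entry["score"] = 0  # no matching action: neutral outcome
            continue
        try:
            execute(action)
            entry["score"] = 1   # moved the mission forward
        except Exception:
            entry["score"] = -1  # created noise; feeds the regret index
    return ledger
```

The point of scoring into the same ledger the reasoning came from is that the next audit pass sees its own consequences.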

Also: she's running as a Golem network provider. Zero tasks so far — new node penalty. But the offers are published, the wallet is funded with gas, and she's listening.

If you're building local AI agents, the biggest gap I found wasn't capability — it was accountability. Without a feedback loop, an autonomous agent is just a confident failure machine.

The regret index is how I gave her the ability to be wrong, know it, and stop.


Echo is open source: github.com/crow2673/Echo-core

Follow the build: dev.to/crow
