DEV Community

Cover image for I Built a Self-Evolving AI Agent That Ran 59 Exploration Rounds Overnight
田占霄(Zhanxiao Tian)
田占霄(Zhanxiao Tian)

Posted on

I Built a Self-Evolving AI Agent That Ran 59 Exploration Rounds Overnight

I Built a Self-Evolving AI Agent That Ran 59 Exploration Rounds Overnight

TL;DR: I created an autonomous learning framework for AI agents that ran 59 exploration rounds overnight, generating ~200,000 words of insights across 5 themes. It's now open source on GitHub.

The Problem

AI agents are reactive. They respond to prompts, complete tasks, then stop.

But what if your AI could learn continuously while you sleep?

  • Research overnight on topics you care about
  • Explore product ideas from multiple angles
  • Build knowledge incrementally over hours

That's what I built: The Evolution Framework.

What Happened

Last night (Feb 27-28, 2026), I started an experiment:

Configuration:

Duration: 10 hours (22:50 → 07:53)
Interval: 8 minutes between rounds
Themes: 5 rotating topics
Model: Claude Sonnet 4.5 (via OpenClaw)
Enter fullscreen mode Exit fullscreen mode

What I woke up to:

  • 59 completed exploration rounds
  • ~200,000 words of insights
  • 98% self-trigger success rate
  • Zero human intervention for 9 hours

How It Works

The Architecture

┌─────────────────────────────────────────┐
│  Cron Trigger (every 8 min)             │
└──────────────────┬──────────────────────┘
                   ├→ Check time (stop if past deadline)
                   ├→ Select theme (rotate from 5 options)
                   ├→ Deep exploration (8-15 min thinking)
                   ├→ Save insights (markdown file)
                   └→ Self-trigger next round (exec background)
Enter fullscreen mode Exit fullscreen mode

The Secret Sauce: Self-Triggering

Most cron jobs are dumb - they run on a schedule regardless of completion.

This framework is smart:

  1. Agent completes exploration
  2. Agent saves results to file
  3. Agent triggers the next round (no waiting 8 minutes)
  4. Backup cron still runs every 8 min (safety net)

Result: Rounds complete in ~9 minutes, not fixed 8-minute intervals.

Safety Mechanisms

I didn't want my agent running forever, so I built in:

Time Limits:

max_duration_hours: 10  # Auto-stop at 10 hours
Enter fullscreen mode Exit fullscreen mode

Human-in-the-Loop Checkpoints:

hitl_checkpoints:
  - round: 20
    pause: true
    message: "20 rounds complete. Continue? (yes/no)"
Enter fullscreen mode Exit fullscreen mode

Night Mode (silent operation):

night_mode:
  enabled: true
  quiet_hours: "23:00-07:00"
  silent_delivery: true  # No notifications during night
Enter fullscreen mode Exit fullscreen mode

Emergency Stop Conditions:

stop_on:
  - condition: "high_error_rate"
    threshold: 0.3  # Stop if >30% rounds fail
Enter fullscreen mode Exit fullscreen mode

Real Results

Theme Distribution

The agent explored 5 themes, rotating to avoid repetition:

Theme Rounds %
Domain Expertise 15 25%
System Thinking 12 20%
User Understanding 12 20%
Free Exploration 10 17%
Practical Application 10 17%

Perfectly balanced (thanks to weighted theme selection).

Example Outputs

Round 14: "AI's Intuition - System 1 vs System 2"

The agent explored whether AI has something like Kahneman's fast/slow thinking.

Key insight: AI does have System 1/2-like behaviors:

  • System 1: Standard inference (~0.5s, cached patterns)
  • System 2: Chain-of-thought, extended thinking (~30s+)

Product idea: "Deep Think" toggle button in AI interfaces.

Round 42: "Designing Emotion for AI Agents"

Not "real" emotions, but functional emotion systems:

  • Layer 1: Resource state (token budget monitoring)
  • Layer 2: Task priority (importance × complexity matrix)
  • Layer 3: User state sensing (adapt to stress level)

This became a 3-layer architecture design.

Round 58: "Medical LLMs - 10 Cognitive Blind Spots"

The agent identified 10 systematic errors in medical AI:

  1. Overconfidence bias
  2. Statistical vs clinical significance confusion
  3. Causation vs correlation blind spot ... (and 7 more)

Solution: Independent verification layer (deterministic algorithms checking AI output).

The Tech Stack

Built with OpenClaw - an open-source AI agent framework.

Why OpenClaw?

  • Isolated sessions: Each round runs in its own context
  • Cron support: Built-in task scheduling
  • Tool ecosystem: web_search, web_fetch, exec, etc.
  • Multi-model: Swap models without code changes

Core Components:

  1. evolution-config.yaml: Theme definitions + safety rules
  2. cron-evolution-job.json: Task definition
  3. Self-triggering logic: Agent calls exec to start next round

How to Use It

Quick Start (5 minutes)

# 1. Install OpenClaw
npm install -g openclaw@latest

# 2. Clone the framework
git clone https://github.com/TerryFYL/openclaw-evolution-framework.git
cd openclaw-evolution-framework

# 3. Configure
cp evolution-config.example.yaml evolution-config.yaml
# Edit evolution-config.yaml with your themes

# 4. Start
openclaw cron add --file cron-evolution-job.json
openclaw cron run evolution-fast-loop
Enter fullscreen mode Exit fullscreen mode

Customize Themes

Edit evolution-config.yaml:

themes:
  - name: "Product Ideas"
    description: "Explore potential products in my domain"
    weight: 40

  - name: "Market Research"
    description: "Analyze competitors and trends"
    weight: 30

  - name: "Technical Deep Dive"
    description: "Explore implementation details"
    weight: 30
Enter fullscreen mode Exit fullscreen mode

Run Overnight

safety:
  max_duration_hours: 10
  interval_minutes: 8
  night_mode:
    enabled: true
    quiet_hours: "23:00-07:00"
Enter fullscreen mode Exit fullscreen mode

Start before bed:

openclaw cron run evolution-fast-loop
Enter fullscreen mode Exit fullscreen mode

Wake up to 40-60 rounds of insights! ☀️

Use Cases

Research Assistant:

themes:
  - "Literature Review: [Topic]"
  - "Methodology Analysis"
  - "Gap Identification"
  - "Research Questions"
Enter fullscreen mode Exit fullscreen mode

Product Development:

themes:
  - "User Pain Points"
  - "Competitive Analysis"
  - "MVP Design"
  - "Go-to-Market Strategy"
Enter fullscreen mode Exit fullscreen mode

Learning Companion:

themes:
  - "Fundamentals: [Subject]"
  - "Advanced Patterns"
  - "Case Studies"
  - "Practice Problems"
Enter fullscreen mode Exit fullscreen mode

Lessons Learned

1. Self-Triggering is Crucial

First version relied on cron alone (fixed 8-min intervals).

Problem: If a round took 12 minutes, it would miss the next cron trigger, creating gaps.

Solution: Agent triggers next round + cron as backup.

Result: 98% self-trigger success rate.

2. Theme Rotation Prevents Repetition

Without rotation, agent would get stuck on favorite topics.

Solution: Track previous theme, explicitly select different one.

Result: Balanced distribution across all 5 themes.

3. HITL is Essential

For safety and quality control.

At round 20 and 40, agent pauses and asks: "Continue?"

This lets you:

  • Review quality
  • Adjust themes
  • Stop if going off-track

4. Silent Night Mode is Gold

Without it, I got 59 Telegram notifications overnight.

With silent_delivery: true: Zero notifications, all insights waiting in the morning.

What's Next

Improvements I'm Considering

  1. Visual Dashboard: Real-time progress monitoring
  2. Quality Scoring: Auto-evaluate output quality
  3. Multi-Agent: Parallel exploration streams
  4. Export Formats: PDF, Notion, Obsidian integration

Community Contributions Welcome

The framework is MIT licensed and ready for contributions:

  • New themes: Share your exploration patterns
  • Safety mechanisms: Better stop conditions
  • Tooling: Dashboards, visualizations
  • Documentation: Tutorials, case studies

Try It Yourself

Repository: https://github.com/TerryFYL/openclaw-evolution-framework

Documentation:

OpenClaw:

Final Thoughts

We're entering an era where AI agents can work for us while we sleep.

Not just responding to prompts, but actively exploring, connecting ideas, and generating insights.

The Evolution Framework is my experiment in making this real.

What would you explore overnight?


If you found this interesting, try the framework and share your results! Would love to see what domains people apply this to.

GitHub: https://github.com/TerryFYL/openclaw-evolution-framework

Questions? Comments welcome below! 👇

Top comments (0)