How a goal tree replaced hundreds of hours of manual AI configuration.
You're a Claude Code power user. Or an OpenClaw user. You've built rules, hooks, agents, skills. Your setup is dialed in.
Until Monday. Someone tweets a new prompting technique. You spend your evening rewriting rules. Wednesday, Claude Code ships a version with features you missed. Another evening gone. Friday, a better agent config shows up on Reddit. You refactor again.
Two weeks pass. Half your changes are outdated.
You're on the AI configuration treadmill. The tools move so fast that mastering one approach takes longer than three better ones take to appear. Your productive hours go toward keeping your AI productive. Not toward your actual work.
And breaking things is part of the deal. OpenClaw users know this: edit SOUL.md, update AGENTS.md, tweak HEARTBEAT.md, and your assistant stops responding. Now you open a second AI session to debug the first one. A recursive trap.
The Org Chart Illusion
A popular pattern right now: model your AI like a company. PM agent, engineer agent, QA agent, reviewer agent. Give them titles and responsibilities. Let them delegate to each other.
Demos look great. But consider the mechanics:
- Agents spend tokens talking to each other about doing work, instead of doing the work.
- You're applying human organizational structures to AI. Companies need org charts because people have limited bandwidth, require specialization, and can't share context instantly. AI has none of these limits.
- The "CEO agent" orchestrating everything is a prompt router burning 10x the tokens.
- Output looks busy. Results aren't proportionally better than a single well-configured agent.
We invented org charts because human brains are bottlenecked. AI brains are not. Forcing our constraints onto AI is the wrong abstraction.
A better approach: start with a single agent. Let the system observe where it struggles. When a task keeps failing or burns too many tokens in the main thread, the system creates a specialized subagent for that specific job. Agents appear because the system needs them, not because you drew an org chart on day one.
The Building Blocks Exist. The Spine Doesn't.
The community is moving toward AI autonomy. Look at what appeared in the past few months:
- OpenClaw showed that AI can extend its own capabilities without human prompting. 300K+ stars in weeks. An AI that writes its own tools.
- everything-claude-code built a system that extracts behavioral patterns from your usage and persists them across sessions. Your AI remembers how you work. Anthropic Hackathon winner.
- Karpathy's autoresearch proved AI can run 118 experiment iterations over 12 hours, improving results with zero human input. Autonomous scientific research.
All three point in the same direction: AI should improve itself, not wait for humans to improve it.
But they're disconnected. OpenClaw extends capabilities without knowing whether the extensions helped. Pattern extraction captures behaviors without ranking which ones matter. Autoresearch runs experiments without a system telling it what to try next.
The missing piece: a structure that connects these capabilities and gives them direction. Something that answers "improve toward what?" and "is it working?"
A goal tree.
Goal Trees: Tell AI Where, Not How
Homunculus replaces manual configuration with a goal tree. You define outcomes. The system builds its own path there.
```
🎯 My AI Assistant
├── AI Intelligence
│   ├── News Curation
│   └── Morning Briefing
├── Self-Improvement
│   ├── Skill Evolution
│   └── Quality
└── Productivity
    └── Automation
```
Each node says what you want. Not how.
```yaml
# architecture.yaml
ai_intelligence:
  purpose: "Stay ahead on AI developments"
  goals:
    news_curation:
      purpose: "Auto-fetch top AI news daily"
      realized_by:   # will evolve; the system fills this in
    morning_briefing:
      purpose: "Deliver a concise report every morning"
      realized_by:   # will evolve
self_improvement:
  purpose: "Get smarter over time"
  goals:
    skill_evolution:
      purpose: "Learn new skills from usage patterns"
      realized_by:   # will evolve
    quality:
      purpose: "Verify improvements work"
      realized_by:   # will evolve
```
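To make the structure concrete, here's a minimal sketch (not the Homunculus internals) that walks a goal tree shaped like the YAML above and lists the goals still waiting for an implementation. The field names mirror `architecture.yaml`; the tree is inlined as a dict for illustration.

```python
ARCHITECTURE = {
    "ai_intelligence": {
        "purpose": "Stay ahead on AI developments",
        "goals": {
            "news_curation": {
                "purpose": "Auto-fetch top AI news daily",
                "realized_by": None,  # the system fills this in
            },
            "morning_briefing": {
                "purpose": "Deliver a concise report every morning",
                "realized_by": "scripts/ai-morning-briefing.sh",
            },
        },
    },
}

def unrealized(tree, path=()):
    """Yield dotted paths of goals that have no implementation yet."""
    for name, node in tree.items():
        here = path + (name,)
        if "realized_by" in node and node["realized_by"] is None:
            yield ".".join(here)
        if "goals" in node:
            yield from unrealized(node["goals"], here)

print(list(unrealized(ARCHITECTURE)))
# → ['ai_intelligence.news_curation']
```

An unrealized goal isn't an error; it's the work queue for the nightly agent.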
The `realized_by` field accepts any implementation type:

| Implementation | Example | Best for |
|---|---|---|
| Script | `scripts/fetch-news.sh` | Standalone automation |
| Cron / LaunchAgent | `com.app.daily-briefing.plist` | Scheduled tasks |
| Skill | `skills/tdd-workflow.md` | Behavioral knowledge |
| Agent | `agents/code-reviewer.md` | Tasks needing specialized AI |
| Hook | `hooks/pre-commit.sh` | Event-triggered automation |
| Rule | `rules/security.md` | Behavioral constraints |
| MCP Server | `mcp-servers/github` | External service integration |
Goals stay. Implementations rotate. The same goal evolves through different solutions:
```
Goal: "Catch bugs before merge"

Day 1:  (nothing, an aspiration)
Day 5:  instinct → "remember to run tests"
Day 12: skills/tdd-workflow.md → tested, versioned knowledge
Day 20: hooks/pre-commit.sh → runs without thinking about it
Day 30: agents/code-reviewer.md → AI-powered review
```
A skill from last week becomes a hook this week. A script becomes an agent next month. The goal tree tracks whether things get done, not how.
Three Inputs, One Engine
Set up Homunculus, use Claude Code normally, and three processes run in the background:
1. Observation
A hook watches your tool usage. Repeated behaviors, like running tests before commits or searching the same error patterns, become instincts: confidence-scored behavioral patterns. Confidence grows with reinforcement. It decays without use (90-day half-life). The system remembers useful habits and forgets stale ones.
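The decay mechanic is plain exponential half-life. A sketch, assuming confidence is a float in [0, 1] (the actual scoring inside Homunculus may differ):

```python
def decayed(confidence, days_since_last_use, half_life_days=90):
    """Confidence halves every 90 days without reinforcement."""
    return confidence * 0.5 ** (days_since_last_use / half_life_days)

# An instinct at 0.8 confidence, unused for 90 days, drops to 0.4;
# after 180 days it sits at 0.2, where a pruning threshold can archive it.
print(decayed(0.8, 90))   # → 0.4
print(decayed(0.8, 180))  # → 0.2
```

Reinforcement works the other way: each observed repetition resets the clock and nudges confidence up.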
2. Implementation Routing
Each instinct gets tagged with the best mechanism to implement it. Not everything should be a skill:
- Deterministic, every time? → Hook. Zero AI judgment needed.
- Tied to specific files? → Rule. Path-scoped guidance.
- Reusable knowledge collection? → Skill. With eval spec and versioning.
- Periodic automation? → Script + scheduler. No AI needed at runtime.
- Needs isolated context? → Agent. Specialist role.
The system routes each instinct automatically. Once it's implemented (as a hook, rule, skill, or script), the instinct gets archived. The implementation is the source of truth now.
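The heuristics above amount to a first-match dispatch. Here's a hypothetical router in that shape; the trait names are illustrative, not the system's actual schema:

```python
def route(instinct):
    """Pick an implementation mechanism for an observed instinct."""
    if instinct.get("deterministic"):
        return "hook"    # zero AI judgment needed
    if instinct.get("path_scoped"):
        return "rule"    # guidance tied to specific files
    if instinct.get("reusable_knowledge"):
        return "skill"   # versioned, eval-tested knowledge
    if instinct.get("periodic"):
        return "script"  # a scheduler runs it, no AI at runtime
    if instinct.get("needs_isolated_context"):
        return "agent"   # specialist with its own context
    return "instinct"    # not ready to graduate yet

print(route({"deterministic": True}))       # → hook
print(route({"reusable_knowledge": True}))  # → skill
```

The ordering matters: a pattern that could be either a hook or a skill should become a hook, because deterministic automation is cheaper and more reliable than AI judgment.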
3. Nightly Autonomy
This changes the equation. Every night, a scheduled agent runs the full pipeline:
- Routes instincts to the right mechanism (hook/rule/skill/script/agent)
- Evaluates all implementations: skills via eval specs, hooks via error rates, rules via freshness
- Reviews every goal β is the current mechanism still the best one?
- Researches better approaches (with cross-night dedup so it doesn't repeat topics)
- Runs experiments in isolated git worktrees
- Writes a morning report
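One plausible shape for that control flow, as a sketch rather than the shipped pipeline: stages run in order, each reports an outcome, and a failure in one stage doesn't abort the morning report. Stage results here are stand-in lambdas.

```python
def run_nightly(stages):
    """Run each (name, stage) pair; collect outcomes for the report."""
    report = []
    for name, stage in stages:
        try:
            report.append(f"ok {name}: {stage()}")
        except Exception as err:  # a real stage could raise
            report.append(f"failed {name}: {err}")
    return report

stages = [
    ("route",      lambda: "2 instincts -> hook, 1 -> skill"),
    ("evaluate",   lambda: "10/10 skills pass eval specs"),
    ("review",     lambda: "1 goal flagged for a better mechanism"),
    ("research",   lambda: "3 topics (2 deduped from last night)"),
    ("experiment", lambda: "1 worktree experiment merged"),
]

print("\n".join(run_nightly(stages)))
```

The key design choice is isolation: experiments run in git worktrees, so a bad nightly idea never touches your working tree.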
You get a report like this:
```
Evolution Report - 2026-03-22

Actions taken:
✅ Created scripts/ai-morning-briefing.sh
✅ Installed LaunchAgent (daily 09:03)
✅ Updated architecture.yaml (2 goals now realized)

Goal Health:
- ai_intelligence: ✅ healthy
- self_improvement: ⏳ collecting patterns...
- productivity: ⏳ waiting for usage data
```
No manual configuration. No midnight YAML editing. No fixing things you accidentally broke.
3 Weeks of Evidence
I ran this on my personal AI assistant, starting from an empty repo. In 3 weeks, the system generated:
| Artifact | Count |
|---|---|
| Behavioral patterns (instincts) | 179 (24 active + 155 auto-archived) |
| Tested skills | 10 (100% eval pass, 135 test scenarios) |
| Specialized agents | 3 |
| Slash commands | 15 |
| Automation scripts | 24 |
| Hooks | 11 |
| Scheduled agents | 5 |
| Architecture decisions | 8 |
The nightly agent made 155 autonomous commits. It routed instincts to the right mechanisms, evolved skills, ran experiments, tracked Claude Code updates, and archived outdated patterns. I slept through all of it.
The system also measures its own evolution mechanism: instinct survival rate, eval discrimination, skill convergence speed, mechanism coverage, dispatch compliance. When a metric drops, it adjusts extraction thresholds or adds harder test scenarios.
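Instinct survival rate, for instance, falls straight out of the table above: of 179 extracted instincts, 24 stayed active and 155 were archived (mostly because they graduated into hooks, rules, skills, or scripts). A sketch of the metric:

```python
def survival_rate(active, archived):
    """Fraction of extracted instincts still active (not yet archived)."""
    total = active + archived
    return active / total if total else 0.0

rate = survival_rate(24, 155)
print(f"{rate:.1%}")  # → 13.4%
```

A low survival rate isn't failure here; archival usually means a pattern graduated into a more reliable mechanism. The metric becomes a warning only when extraction produces patterns that die without ever being implemented.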
Three Knobs, Not Three Hundred
Most AI tools hand you more controls. Homunculus gives you three:
- Define your goals. A tree of outcomes you care about.
- Set your boundaries. Permissions for autonomous action.
- Walk away. The system handles the rest.
You don't need expertise in prompt engineering, hook configuration, agent orchestration, or skill design. You need clarity about what you want.
Try It
```
npx homunculus-code init
```
Then open Claude Code:
```
> /hm-goal    # AI asks about your project, builds your goal tree
> /hm-night   # runs your first evolution cycle
```
60 seconds to set up. After that, the system grows on its own.
Acknowledgments
Homunculus builds on ideas from:
- everything-claude-code: Continuous Learning and eval-improve loop patterns
- OpenClaw: proving AI can generate its own skills
- Karpathy's autoresearch: autonomous 12-hour experiment loops
- Anthropic's eval research: eval methodology and noise tolerance
How do you configure your AI assistant today? Still on the treadmill?
Built by Javan and his self-evolving AI assistant.