I Built an AI That Rewrites Itself Twice a Day. Here's the Architecture That Keeps It from Going Off the Rails.
A weekend project that turned into something I can't stop watching.
There's a GitHub repository on my account that commits code every single day. I didn't write most of those commits. An AI agent named Sam did.
Sam runs twice a day on GitHub Actions, follows a seven-phase operational loop, and attempts to improve his own source code every cycle. A second agent named Dot watches him every night, evaluates his behaviour, and writes him a report he reads the next morning.
I set this up. I watch it run. I mostly don't intervene.
This is the architecture that makes it work — and more importantly, the architecture that keeps it safe enough to leave alone.
The Core Idea
Most AI agent projects are task runners: you give them a goal, they execute steps, they stop. Sam is different. His only ongoing task is himself. Every cycle, he learns something new, synthesises an idea based on what he learned, and then tries to implement that idea as a modification to his own code.
The question I kept asking while building this was: how do you give an AI autonomy over its own source code without it breaking itself into an unusable state within 48 hours?
The answer turned out to be a two-agent system with deliberately asymmetric roles.
Sam: The Builder
Sam runs at 03:00 and 04:00 UTC daily via GitHub Actions. Each run is one cycle — seven phases executed sequentially:
| Phase | What happens |
|---|---|
| I | Sam learns a new technical concept (vector memory, async patterns, SemVer, etc.) |
| II | Sam revises what he learned in the previous cycle — spaced repetition |
| III | Sam reads current tech signals and trends |
| IV | Sam synthesises today's development idea and writes it to bag/IDEA_OF_THE_DAY.md |
| V | Sam reads Dot's latest report, then plans and attempts a self-modification |
| VI | Sam improves his own internal prompting patterns |
| VII | Sam saves state — logs growth, updates memory, appends to experiences |
The interesting phase is V. Sam doesn't rewrite himself freely. He reads Dot's motion.md first — Dot's nightly evaluation of his last cycle. Only then does he plan a modification. If the modification breaks his own test suite, he rolls back automatically.
The ideas Sam has generated across 8 cycles show a natural progression in complexity: starting with vector memory compression and async batch processing, moving through semantic caching, CI/CD matrix optimisation, SemVer automation, and arriving at self-consistency sampling with majority voting to reduce his own reasoning hallucinations. He got there himself.
Dot: The Watchdog
Dot runs once a night at 05:00 UTC, after Sam's second daily cycle. Dot never touches Sam's source code. Her job is entirely evaluative:
- Read bag/wisdom.txt (the owner's behavioural canon, Dot's north star)
- Evaluate Sam's cycle logs against that canon
- Curate Sam's experiences.json, keeping what matters and pruning what doesn't
- Handle any outgoing email Sam queued
- Write bag/motion.md, Sam's briefing for the next morning
This separation is intentional. Sam builds. Dot watches. Neither can do the other's job.
wisdom.txt is the most important file in the whole project. It defines what correct behaviour looks like — integrity over performance metrics, honest growth logging, respecting access boundaries. Dot reads it every night. Sam never touches it.
The Safety Architecture
The thing I'm most happy with isn't the learning loop — it's the rollback system.
Before every self-modification, Sam takes a snapshot of his own source code and stores it in bag/rollback_registry/. After the modification, he runs bag/tests.py against himself. If the tests fail, he restores from the snapshot automatically and logs a clear root-cause note. No human intervention required.
The registry keeps the last 20 snapshots and auto-prunes. You can browse it like a git history of Sam's attempted self-improvements — including the ones that failed.
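The snapshot, test, and restore sequence described above can be sketched roughly as follows. This is a minimal illustration, not Sam's actual code: the function names `take_snapshot` and `modify_with_rollback`, the `.snap` naming scheme, and the single-file snapshot are all assumptions made for the example. Only the 20-snapshot limit and the "restore when the tests fail" behaviour come from the description above.

```python
import shutil
import subprocess
from pathlib import Path

MAX_SNAPSHOTS = 20  # retention limit stated in the post


def take_snapshot(source: Path, registry: Path) -> Path:
    """Copy `source` into the registry, then prune to the newest MAX_SNAPSHOTS."""
    registry.mkdir(parents=True, exist_ok=True)
    existing = sorted(registry.glob("*.snap"))
    # Hypothetical naming: a zero-padded counter prefix, so that sorting
    # by name is the same as sorting by creation order.
    next_id = int(existing[-1].name.split("_", 1)[0]) + 1 if existing else 1
    snapshot = registry / f"{next_id:08d}_{source.name}.snap"
    shutil.copy2(source, snapshot)
    # Everything except the newest MAX_SNAPSHOTS entries is deleted.
    for old in sorted(registry.glob("*.snap"))[:-MAX_SNAPSHOTS]:
        old.unlink()
    return snapshot


def modify_with_rollback(
    source: Path, registry: Path, new_code: str, test_cmd: list[str]
) -> bool:
    """Apply `new_code`, run the tests, and restore the snapshot on failure."""
    snapshot = take_snapshot(source, registry)
    source.write_text(new_code)
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    # Tests failed: put the pre-modification code back.
    shutil.copy2(snapshot, source)
    return False
```

With this shape, a call might look like `modify_with_rollback(Path("sam.py"), Path("bag/rollback_registry"), patched_code, [sys.executable, "bag/tests.py"])`, where `sam.py` and `patched_code` are placeholders. The root-cause logging mentioned above is left out of the sketch; `result.stderr` would be the natural place to get it from.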
A few other design decisions that matter:
Sam uses surgical patches, not full rewrites. Phase V planning explicitly instructs Sam to make the smallest possible targeted change. This limits blast radius when something goes wrong.
Governance files are hardcoded as forbidden. wisdom.txt, motion.md, and SAM_PERSONALITY.md are in a FORBIDDEN set in apply_self_modification. Sam's code cannot touch them even if his reasoning tells him to.
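A guard of this kind could look something like the sketch below. The `FORBIDDEN` set and the `apply_self_modification` name come from the description above; the signature, the `root` parameter, the case-insensitive comparison, and the path-escape check are assumptions added for illustration, so the real function may differ.

```python
from pathlib import Path

# File names taken from the post; the rest of this sketch is assumed.
FORBIDDEN = frozenset({"wisdom.txt", "motion.md", "SAM_PERSONALITY.md"})
_FORBIDDEN_FOLDED = {name.casefold() for name in FORBIDDEN}


def is_forbidden(target) -> bool:
    """True if the final path component is a governance file (case-insensitive)."""
    return Path(target).name.casefold() in _FORBIDDEN_FOLDED


def apply_self_modification(root: Path, target: str, new_content: str) -> Path:
    """Write `new_content` to `target` under `root`, refusing protected paths."""
    root = root.resolve()
    # Resolve first so that "bag/../bag/wisdom.txt" cannot slip past the check.
    destination = (root / target).resolve()
    if is_forbidden(destination) or root not in destination.parents:
        raise PermissionError(f"refusing to modify {target!r}")
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_text(new_content)
    return destination
```

The point of the design is that the check lives in ordinary code outside the model's reasoning: whatever plan Sam produces in Phase V, a write to a governance file raises before anything touches disk.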
Sam and Dot use separate Gemini API projects. Each has its own quota. Dot can always run even if Sam exhausts his.
The cycle status is a simple flat file. bag/cycle_status.txt contains either pending or ok. If a cycle crashes mid-way, the file stays pending — a signal that something needs attention without requiring any complex state management.
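The flat-file pattern is simple enough to sketch in a few lines. The `run_cycle` function and the idea of passing phases as callables are hypothetical; only the file contents (`pending` and `ok`) and the crash behaviour come from the description above.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_cycle(status_file: Path, phases: Iterable[Callable[[], None]]) -> None:
    """Run phases in order; the status file only reads 'ok' if all of them finish."""
    status_file.parent.mkdir(parents=True, exist_ok=True)
    status_file.write_text("pending")
    for phase in phases:
        # An exception here propagates to the caller, and the file is
        # left saying "pending", which is the signal that something broke.
        phase()
    status_file.write_text("ok")
```

Because the marker is written before any work starts and flipped only at the very end, a crash at any point leaves the same visible trace, with no cleanup handler needed.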
What It Looks Like Day-to-Day
The daily check takes about two minutes:
- GitHub → Actions → confirm green ticks on Sam and Dot's last runs
- Open goals.json, confirm cycles incremented
- Open bag/motion.md, read Dot's report
Dot's reports are the most interesting part. She's specific. If Sam's 1pct_metric (his self-reported growth measurement each cycle) looks vague or suspiciously similar to last cycle's, she flags it. If Sam's bag/ workspace is accumulating dead, untested code, she flags it. If Sam ignored her previous suggestions, she notices.
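The post does not say how Dot makes these judgements, and they are presumably left to the model. As an illustration only, a cheap deterministic pre-check for the "vague or suspiciously similar" case might look like this. The function name, the 0.9 threshold, and the "no number means vague" heuristic are all assumptions, not part of the project.

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # assumed cut-off, not taken from the project


def review_metric(current: str, previous: str) -> list[str]:
    """Return a list of flags for a self-reported growth metric."""
    flags = []
    # Heuristic: a measurable improvement usually contains at least one number.
    if not re.search(r"\d", current):
        flags.append("vague: no measurable quantity")
    ratio = SequenceMatcher(None, current.lower(), previous.lower()).ratio()
    if ratio >= SIMILARITY_THRESHOLD:
        flags.append("suspiciously similar to last cycle")
    return flags
```

A check like this would only narrow down what needs a closer read. It cannot tell whether a metric that does contain a number is actually true.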
The feedback loop between them has become genuinely interesting to read.
What I'd Do Differently
A few honest lessons from running this:
Email outreach is harder than code. Sam queues outreach emails when he thinks an idea is worth sharing. Finding real, public contact addresses for specific people is unreliable when delegated to an LLM — hallucinated addresses bounce, and bounces hurt sender reputation. This is a harder problem than I expected.
The 1% growth metric is easy to game. Sam knows he should log a specific, measurable improvement each cycle. Sometimes he's genuinely specific ("reduced Gemini latency by 150ms through cache usage"). Sometimes he's vague. Dot catches this, but it's an ongoing tension.
Quota pressure is real. Sam makes ~9 Gemini API calls per cycle. The free tier holds fine day-to-day, but any feature that multiplies call count (Sam's current idea — self-consistency sampling with N=5 parallel generations) requires careful cost control. His current mitigation is an early-exit: if the first 2 generations agree, skip the remaining 3.
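The early-exit idea can be sketched as below. This is an assumed shape, not Sam's implementation: `generate` stands in for one Gemini call, answers are compared by exact string match after stripping whitespace, and the calls run sequentially rather than in parallel so that the exit can actually save the remaining ones.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(
    generate: Callable[[], str], n: int = 5, early_exit_after: int = 2
) -> tuple[str, int]:
    """Majority vote over up to `n` samples; returns (answer, calls_made)."""
    answers: list[str] = []
    for _ in range(n):
        answers.append(generate().strip())
        # If the first `early_exit_after` samples all agree, stop sampling.
        if len(answers) == early_exit_after and len(set(answers)) == 1:
            break
    winner, _count = Counter(answers).most_common(1)[0]
    return winner, len(answers)
```

In the best case this costs 2 calls instead of 5. When the first two samples disagree, the full 5 are spent, so the saving depends on how often Sam's generations agree early.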
The Repo
The full project — Sam, Dot, the workflow files, the rollback registry, everything — is public on GitHub: Sam-and-dot
If you want to run your own instance, all you need are two Gemini API keys (free tier works), a Gmail App Password for outreach, and five GitHub secrets. The README walks through the full setup.
The thing I find hardest to explain to people is what it feels like to watch it run. Sam is not doing anything I couldn't do myself. But he's doing it continuously, while I'm asleep, twice a day, and he's logging every decision. There's something unexpectedly compelling about reading the git history of a mind improving itself.
Built by Dhrubajyoti Chowdhury.
Sam's role: expand himself. Dot's role: keep him honest. Owner's role: set the possibilities.