<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhrubajyoti Chowdhury</title>
    <description>The latest articles on DEV Community by Dhrubajyoti Chowdhury (@dhrubajyoti_chowdhury_002).</description>
    <link>https://dev.to/dhrubajyoti_chowdhury_002</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3961378%2Fa6341932-cd9e-4045-9a67-dada3baad3ac.png</url>
      <title>DEV Community: Dhrubajyoti Chowdhury</title>
      <link>https://dev.to/dhrubajyoti_chowdhury_002</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhrubajyoti_chowdhury_002"/>
    <language>en</language>
    <item>
      <title>I Built an AI That Rewrites Itself Twice a Day. Here's the Architecture That Keeps It from Going Off the Rails.</title>
      <dc:creator>Dhrubajyoti Chowdhury</dc:creator>
      <pubDate>Sun, 31 May 2026 16:10:12 +0000</pubDate>
      <link>https://dev.to/dhrubajyoti_chowdhury_002/i-built-an-ai-that-rewrites-itself-twice-a-day-heres-the-architecture-that-keeps-it-from-going-4nb7</link>
      <guid>https://dev.to/dhrubajyoti_chowdhury_002/i-built-an-ai-that-rewrites-itself-twice-a-day-heres-the-architecture-that-keeps-it-from-going-4nb7</guid>
      <description>&lt;h1&gt;
  
  
  I Built an AI That Rewrites Itself Twice a Day. Here's the Architecture That Keeps It from Going Off the Rails.
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;A weekend project that turned into something I can't stop watching.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There's a GitHub repository on my account that commits code every single day. I didn't write most of those commits. An AI agent named &lt;strong&gt;Sam&lt;/strong&gt; did.&lt;/p&gt;

&lt;p&gt;Sam runs twice a day on GitHub Actions, follows a seven-phase operational loop, and attempts to improve his own source code every cycle. A second agent named &lt;strong&gt;Dot&lt;/strong&gt; watches him every night, evaluates his behaviour, and writes him a report he reads the next morning.&lt;/p&gt;

&lt;p&gt;I set this up. I watch it run. I mostly don't intervene.&lt;/p&gt;

&lt;p&gt;This is the architecture that makes it work — and more importantly, the architecture that keeps it &lt;em&gt;safe enough to leave alone&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;The Core Idea&lt;/h2&gt;

&lt;p&gt;Most AI agent projects are task runners: you give them a goal, they execute steps, they stop. Sam is different. His only ongoing task is &lt;strong&gt;himself&lt;/strong&gt;. Every cycle, he learns something new, synthesises an idea based on what he learned, and then tries to implement that idea as a modification to his own code.&lt;/p&gt;

&lt;p&gt;The question I kept asking while building this was: &lt;em&gt;how do you give an AI autonomy over its own source code without it breaking itself into an unusable state within 48 hours?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer turned out to be a two-agent system with deliberately asymmetric roles.&lt;/p&gt;




&lt;h2&gt;Sam: The Builder&lt;/h2&gt;

&lt;p&gt;Sam runs at &lt;strong&gt;03:00 and 04:00 UTC&lt;/strong&gt; daily via GitHub Actions. Each run is one cycle — seven phases executed sequentially:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;I&lt;/td&gt;
&lt;td&gt;Sam learns a new technical concept (vector memory, async patterns, SemVer, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;II&lt;/td&gt;
&lt;td&gt;Sam revises what he learned in the previous cycle — spaced repetition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;III&lt;/td&gt;
&lt;td&gt;Sam reads current tech signals and trends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IV&lt;/td&gt;
&lt;td&gt;Sam synthesises today's development idea and writes it to &lt;code&gt;bag/IDEA_OF_THE_DAY.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;V&lt;/td&gt;
&lt;td&gt;Sam reads Dot's latest report, then plans and attempts a self-modification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VI&lt;/td&gt;
&lt;td&gt;Sam improves his own internal prompting patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VII&lt;/td&gt;
&lt;td&gt;Sam saves state — logs growth, updates memory, appends to experiences&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The interesting phase is &lt;strong&gt;V&lt;/strong&gt;. Sam doesn't rewrite himself freely. He reads Dot's &lt;code&gt;motion.md&lt;/code&gt; first — Dot's nightly evaluation of his last cycle. Only then does he plan a modification. If the modification breaks his own test suite, he rolls back automatically.&lt;/p&gt;

&lt;p&gt;The ideas Sam has generated across 8 cycles show a natural progression in complexity: starting with vector memory compression and async batch processing, moving through semantic caching, CI/CD matrix optimisation, SemVer automation, and arriving at self-consistency sampling with majority voting to reduce his own reasoning hallucinations. He got there himself.&lt;/p&gt;




&lt;h2&gt;Dot: The Watchdog&lt;/h2&gt;

&lt;p&gt;Dot runs once a night at &lt;strong&gt;05:00 UTC&lt;/strong&gt;, after Sam's second daily cycle. Dot never touches Sam's source code. Her job is entirely evaluative:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read &lt;code&gt;bag/wisdom.txt&lt;/code&gt; (the owner's behavioural canon, Dot's north star)&lt;/li&gt;
&lt;li&gt;Evaluate Sam's cycle logs against that canon&lt;/li&gt;
&lt;li&gt;Curate Sam's &lt;code&gt;experiences.json&lt;/code&gt; — keeping what matters, pruning what doesn't&lt;/li&gt;
&lt;li&gt;Handle any outgoing email Sam queued&lt;/li&gt;
&lt;li&gt;Write &lt;code&gt;bag/motion.md&lt;/code&gt; — Sam's briefing for the next morning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation is intentional. Sam builds. Dot watches. Neither can do the other's job.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;wisdom.txt&lt;/code&gt; is the most important file in the whole project. It defines what correct behaviour looks like — integrity over performance metrics, honest growth logging, respecting access boundaries. Dot reads it every night. Sam never touches it.&lt;/p&gt;




&lt;h2&gt;The Safety Architecture&lt;/h2&gt;

&lt;p&gt;The thing I'm most happy with isn't the learning loop — it's the rollback system.&lt;/p&gt;

&lt;p&gt;Before every self-modification, Sam takes a snapshot of his own source code and stores it in &lt;code&gt;bag/rollback_registry/&lt;/code&gt;. After the modification, he runs &lt;code&gt;bag/tests.py&lt;/code&gt; against himself. If the tests fail, he restores from the snapshot automatically and logs a clear root-cause note. No human intervention required.&lt;/p&gt;
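&lt;p&gt;The snapshot, test, restore sequence above can be sketched roughly as follows. This is a minimal illustration, not Sam's actual code: the function name &lt;code&gt;modify_with_rollback&lt;/code&gt;, its parameters, and the timestamped snapshot naming are all assumptions. Only the &lt;code&gt;bag/tests.py&lt;/code&gt; path and the idea of a registry directory come from the project.&lt;/p&gt;

```python
import shutil
import subprocess
import sys
import time
from pathlib import Path
from typing import Optional, Sequence


def modify_with_rollback(source: Path, new_text: str, registry: Path,
                         test_cmd: Optional[Sequence[str]] = None) -> bool:
    """Apply new_text to source; restore the snapshot if the tests fail.

    Hypothetical helper: name, signature and snapshot naming are assumptions.
    Returns True if the change was kept, False if it was rolled back.
    """
    if test_cmd is None:
        test_cmd = [sys.executable, "bag/tests.py"]  # test path from the article

    registry.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    snapshot = registry / f"{stamp}_{source.name}"
    shutil.copy2(source, snapshot)            # 1. snapshot before touching anything

    source.write_text(new_text)               # 2. apply the attempted change
    result = subprocess.run(list(test_cmd), capture_output=True)  # 3. run the tests

    if result.returncode != 0:
        shutil.copy2(snapshot, source)        # 4. tests failed: restore the snapshot
        return False
    return True
```

&lt;p&gt;The snapshot is kept even when the change succeeds, which is what would let the registry double as a history of attempts.&lt;/p&gt;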

&lt;p&gt;The registry keeps the last 20 snapshots and auto-prunes. You can browse it like a git history of Sam's attempted self-improvements — including the ones that failed.&lt;/p&gt;
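&lt;p&gt;Keeping only the newest 20 snapshots could look something like this. The sketch assumes each snapshot is a single file whose name starts with a sortable timestamp, so that sorting by name is sorting by age; &lt;code&gt;prune_registry&lt;/code&gt; is a hypothetical name, and the real registry layout may differ.&lt;/p&gt;

```python
from pathlib import Path
from typing import List


def prune_registry(registry: Path, keep: int = 20) -> List[Path]:
    """Delete all but the `keep` newest snapshot files; return the removed paths.

    Assumes file names begin with a sortable timestamp (oldest sorts first).
    """
    snapshots = sorted(registry.iterdir(), key=lambda p: p.name)
    surplus = max(len(snapshots) - keep, 0)   # how many are over the limit
    stale = snapshots[:surplus]               # the oldest ones
    for path in stale:
        path.unlink()
    return stale
```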

&lt;p&gt;A few other design decisions that matter:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sam uses surgical patches, not full rewrites.&lt;/strong&gt; Phase V planning explicitly instructs Sam to make the smallest possible targeted change. This limits blast radius when something goes wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Governance files are hardcoded as forbidden.&lt;/strong&gt; &lt;code&gt;wisdom.txt&lt;/code&gt;, &lt;code&gt;motion.md&lt;/code&gt;, and &lt;code&gt;SAM_PERSONALITY.md&lt;/code&gt; are in a &lt;code&gt;FORBIDDEN&lt;/code&gt; set in &lt;code&gt;apply_self_modification&lt;/code&gt;. Sam's code cannot touch them even if his reasoning tells him to.&lt;/p&gt;
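&lt;p&gt;A guard of that kind might be as small as the sketch below. The &lt;code&gt;FORBIDDEN&lt;/code&gt; set and the &lt;code&gt;apply_self_modification&lt;/code&gt; name come from the project; the signature, the &lt;code&gt;is_allowed_target&lt;/code&gt; helper, and the choice to match on file name only are assumptions made for illustration.&lt;/p&gt;

```python
from pathlib import Path

# Governance files named in the article.
FORBIDDEN = {"wisdom.txt", "motion.md", "SAM_PERSONALITY.md"}


def is_allowed_target(target: str) -> bool:
    """Return False for any path whose file name is a governance file."""
    return Path(target).name not in FORBIDDEN


def apply_self_modification(target: str, new_text: str) -> None:
    """Write new_text to target unless the target is forbidden (assumed signature)."""
    if not is_allowed_target(target):
        raise PermissionError(f"{target} is a governance file and cannot be modified")
    Path(target).write_text(new_text)
```

&lt;p&gt;Because the check lives in code rather than in a prompt, it holds regardless of what the model decides. A name-only match does not cover symlinks or case-insensitive file systems, so a stricter version would resolve the path first.&lt;/p&gt;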

&lt;p&gt;&lt;strong&gt;Sam and Dot use separate Gemini API projects.&lt;/strong&gt; Each has its own quota. Dot can always run even if Sam exhausts his.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cycle status is a simple flat file.&lt;/strong&gt; &lt;code&gt;bag/cycle_status.txt&lt;/code&gt; contains either &lt;code&gt;pending&lt;/code&gt; or &lt;code&gt;ok&lt;/code&gt;. If a cycle crashes mid-way, the file stays &lt;code&gt;pending&lt;/code&gt; — a signal that something needs attention without requiring any complex state management.&lt;/p&gt;
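&lt;p&gt;One way to get that behaviour is to write &lt;code&gt;pending&lt;/code&gt; before the first phase and &lt;code&gt;ok&lt;/code&gt; only after the last one. The sketch below assumes exactly that; &lt;code&gt;run_cycle&lt;/code&gt; and the idea of passing phases as callables are hypothetical, not taken from the repository.&lt;/p&gt;

```python
from pathlib import Path
from typing import Callable, Iterable


def run_cycle(status_file: Path, phases: Iterable[Callable[[], None]]) -> None:
    """Run phases in order, using a flat file as the only state marker."""
    status_file.write_text("pending")   # marked before any phase runs
    for phase in phases:
        phase()                          # an exception here leaves "pending" on disk
    status_file.write_text("ok")         # reached only if every phase completed
```

&lt;p&gt;There is deliberately no cleanup handler: a crash simply never reaches the final write, so the stale &lt;code&gt;pending&lt;/code&gt; is the alert.&lt;/p&gt;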




&lt;h2&gt;What It Looks Like Day-to-Day&lt;/h2&gt;

&lt;p&gt;The daily check takes about two minutes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;GitHub → Actions → confirm green ticks on Sam's and Dot's last runs&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;goals.json&lt;/code&gt; — confirm &lt;code&gt;cycles&lt;/code&gt; incremented&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;bag/motion.md&lt;/code&gt; — read Dot's report&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Dot's reports are the most interesting part. She's specific. If Sam's &lt;code&gt;1pct_metric&lt;/code&gt; (his self-reported growth measurement each cycle) looks vague or suspiciously similar to last cycle's, she flags it. If Sam's &lt;code&gt;bag/&lt;/code&gt; workspace is accumulating dead, untested code, she flags it. If Sam ignored her previous suggestions, she notices.&lt;/p&gt;
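&lt;p&gt;The "suspiciously similar" check is presumably a judgment Dot makes with the model, but a cheap deterministic version is easy to imagine. The sketch below uses Python's standard &lt;code&gt;difflib&lt;/code&gt; to compare two metric strings; the function name &lt;code&gt;looks_recycled&lt;/code&gt; and the 0.85 threshold are arbitrary assumptions, not values from the project.&lt;/p&gt;

```python
from difflib import SequenceMatcher


def looks_recycled(current: str, previous: str, threshold: float = 0.85) -> bool:
    """Return True if two growth-metric strings are nearly the same text.

    The threshold is an illustrative guess and would need tuning on real logs.
    """
    ratio = SequenceMatcher(None, current.lower(), previous.lower()).ratio()
    return ratio >= threshold
```

&lt;p&gt;A check like this only catches copy-paste style repetition. It says nothing about whether a metric is vague, which still needs a reader, human or model.&lt;/p&gt;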

&lt;p&gt;The feedback loop between them has become genuinely interesting to read.&lt;/p&gt;




&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;A few honest lessons from running this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Email outreach is harder than code.&lt;/strong&gt; Sam queues outreach emails when he thinks an idea is worth sharing. Delegating the search for real, public contact addresses to an LLM is unreliable: hallucinated addresses bounce, and bounces hurt sender reputation. I underestimated how hard this would be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 1% growth metric is easy to game.&lt;/strong&gt; Sam knows he should log a specific, measurable improvement each cycle. Sometimes he's genuinely specific ("reduced Gemini latency by 150ms through cache usage"). Sometimes he's vague. Dot catches this, but it's an ongoing tension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quota pressure is real.&lt;/strong&gt; Sam makes ~9 Gemini API calls per cycle. The free tier holds fine day-to-day, but any feature that multiplies call count (Sam's current idea — self-consistency sampling with N=5 parallel generations) requires careful cost control. His current mitigation is an early-exit: if the first 2 generations agree, skip the remaining 3.&lt;/p&gt;
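&lt;p&gt;The early-exit idea can be sketched as below. This is an illustration under assumptions, not Sam's implementation: &lt;code&gt;generate&lt;/code&gt; stands in for a single model call, answers are compared by exact string equality, and the function name is made up. If every one of the ~9 calls in a cycle were sampled this way, the worst case would be 9 × 5 = 45 calls and the best case 9 × 2 = 18.&lt;/p&gt;

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistent_answer(generate: Callable[[], str], n: int = 5,
                           early_exit: int = 2) -> Tuple[str, int]:
    """Majority vote over up to n generations; return (answer, calls_used).

    `generate` is a placeholder for one model call. If the first `early_exit`
    answers all agree, the remaining calls are skipped to save quota.
    """
    answers = [generate() for _ in range(early_exit)]
    if len(set(answers)) == 1:
        return answers[0], len(answers)       # early exit: unanimous so far

    for _ in range(n - early_exit):
        answers.append(generate())            # fall back to the full sample
    winner, _count = Counter(answers).most_common(1)[0]
    return winner, len(answers)
```

&lt;p&gt;The early exit only works if generations run one after another, so it trades some latency for quota. Exact string matching is also strict for free-form text; in practice the answers would likely need normalising before they are compared.&lt;/p&gt;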




&lt;h2&gt;The Repo&lt;/h2&gt;

&lt;p&gt;The full project — Sam, Dot, the workflow files, the rollback registry, everything — is public on GitHub: &lt;strong&gt;&lt;a href="https://github.com/Dhrubajyoti930/Sam-and-dot" rel="noopener noreferrer"&gt;Sam-and-dot&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to run your own instance, all you need are two Gemini API keys (free tier works), a Gmail App Password for outreach, and five GitHub secrets. The README walks through the full setup.&lt;/p&gt;

&lt;p&gt;The thing I find hardest to explain to people is what it feels like to watch it run. Sam is not doing anything I couldn't do myself. But he's doing it continuously, while I'm asleep, twice a day, and he's logging every decision. There's something unexpectedly compelling about reading the git history of a mind improving itself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Dhrubajyoti Chowdhury.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Sam's role: expand himself. Dot's role: keep him honest. Owner's role: set the possibilities.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
