DEV Community

Yurukusa


I Slept While My AI Completed 88 Tasks — Here's What Happened

Last night at around 11 PM, I gave my AI agents a task list and went to bed.

When I woke up, I found this waiting for me:

| Metric | Result |
| --- | --- |
| Tasks completed | 88 |
| Git commits (local) | 34 |
| Tests passing | 170 (in 0.98s) |
| Security vulnerabilities found & fixed | 3 HIGH |
| Zenn Book written | 10 chapters + preface/afterword (145KB) |
| Freelance platform listings drafted | 4 |
| PromptBase prompts created | 3 |

I should mention: I have never formally learned programming. I don't have a CS degree. I can't write a for-loop from memory. Everything I build, I build with AI.

And yet here I am with a 15,000-line dungeon crawler game, an open-source CLI tool, two Kindle books, and now an overnight productivity run that would take a solo developer days.

This is either the future of software development or a cautionary tale. Probably both.


The Setup: Two Agents, One Loop

The system that made this possible has two components:

  1. Claude Code (Anthropic's CLI tool) - the executor. It reads files, writes code, runs tests, commits to git. It is the hands.

  2. Tachikoma - a Claude instance running on Claude Desktop, acting as the task coordinator. It maintains the master task list, decides priorities, and dispatches batches of work.

The name "Tachikoma" comes from Ghost in the Shell - those blue spider-tanks that share experiences across instances. Fitting, because the whole point of this system is agents that share what they learn.

How They Communicate

Here is where it gets weird. Claude Code runs in a terminal. Tachikoma runs in a browser. They are not designed to talk to each other. So I built a bridge.

┌──────────────────────────────────────────────────────────┐
│  tachikoma-loop (bash script, runs in tmux)              │
│                                                          │
│  ┌── Phase 1: Launch Claude Code in tmux session         │
│  │     CC executes tasks, writes code, commits           │
│  │     Watchdog monitors for idle (❯ prompt, 90s)        │
│  │     Sends /exit when idle                             │
│  │                                                       │
│  │   Phase 2: Capture CC's final output                  │
│  │   Phase 3: Send report to Tachikoma (CDP bridge)      │
│  │   Phase 4: Wait for Tachikoma's response              │
│  │   Phase 5: Build next prompt from response            │
│  └── Loop back to Phase 1                                │
└──────────────────────────────────────────────────────────┘

The communication layer uses Chrome DevTools Protocol (CDP) via PowerShell (because WSL2 cannot directly connect to Chrome's debug port on the Windows side). Yes, it is as fragile as it sounds.

# The core of tachikoma-loop: idle detection via tmux
# When Claude Code finishes and shows the ❯ prompt for 90 seconds,
# the watchdog kicks in

tmux send-keys -t "$TMUX_SESSION" Escape
sleep 0.3
tmux send-keys -t "$TMUX_SESSION" C-c
sleep 1
tmux send-keys -l -t "$TMUX_SESSION" "/exit"
sleep 0.5
tmux send-keys -t "$TMUX_SESSION" Enter

Why the Escape and Ctrl+C before /exit? Because Claude Code's TUI sometimes has residual text in the input buffer. If you send /exit while there is leftover text, it gets appended instead of recognized as a command. This took me an entire evening to debug.

The idle detection itself works by scanning the tmux pane from the bottom up:

  1. Filter out empty lines
  2. Find the last ❯ prompt
  3. Calculate its position from the bottom
  4. If position <= 4 (just the status bar below it) and 90 seconds have passed: Claude Code is idle, time to cycle

This is not elegant. It is duct tape and prayer. But it ran for 16 consecutive batches overnight without human intervention.
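The steps above can be sketched in a few lines of Python. This is a hypothetical reimplementation, not the actual bash script; the `❯` prompt character, the position threshold, and the 90-second window are taken from the description above, while the names are mine:

```python
# Sketch of the idle check: given the text of a tmux pane (e.g. from
# `tmux capture-pane -p`) and how long the output has been unchanged,
# decide whether Claude Code looks idle.
PROMPT = "❯"
MAX_PROMPT_DEPTH = 4  # prompt may sit just above the status bar

def is_idle(pane_text: str, idle_seconds: float, threshold: float = 90.0) -> bool:
    # 1. Filter out empty lines so trailing blanks don't skew the count
    lines = [ln for ln in pane_text.splitlines() if ln.strip()]
    # 2-3. Scan from the bottom up for the last prompt line
    for depth, line in enumerate(reversed(lines)):
        if line.lstrip().startswith(PROMPT):
            # 4. Idle if the prompt is near the bottom and enough time passed
            return depth <= MAX_PROMPT_DEPTH and idle_seconds >= threshold
    return False  # no prompt visible: Claude Code is still working
```

In the real loop this check would run on a timer; once it returns true, the watchdog sends the `Escape`/`Ctrl+C`/`/exit` sequence shown above.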


What Actually Got Done

Tachikoma dispatched work in batches of 5-7 tasks. Here is the timeline across the first 8 of 16 total batches:

Batch 1-2: Foundation

  • Test suite expansion (33 to 130 tests)
  • PyPI packaging (pyproject.toml, wheel build, twine check passed)
  • Export command for lessons (Markdown/JSON formats)
  • Edge case tests for the guard engine

Batch 3-4: Documentation & Branding

  • GitHub Pages documentation site (Jekyll-based: installation, lessons, API reference)
  • Man page (docs/brain.1)
  • Logo SVG (brain + gear motif, 512x512) and banner SVG (1280x640)
  • CHANGELOG.md, CONTRIBUTING.md, issue/PR templates
  • GitHub Actions CI (pytest on push/PR)

Batch 5-6: New Features & DX

  • brain tutorial - interactive walkthrough for first-time users
  • brain demo - sandbox mode with pre-loaded data
  • brain benchmark - performance measurement (P99 = 93ms for 100 lessons, 1000 guard checks)
  • Bash/zsh shell completion scripts
  • Error message improvements (every error now has three parts: what happened, why, what to do)

Batch 7: The Security Audit

This is where things got interesting. I will cover this in detail below.

Batch 8: Content & Marketing

  • Complete Zenn Book: "AI Agent Operations Guide for Hooks & Automation" (10 chapters, 145KB)
  • Twitter threads in Japanese and English
  • Freelance platform listings (Fiverr, Upwork, Coconala, Lancers)
  • 3 PromptBase prompts ($2.99-$4.99 each)
  • GitHub Release v0.1.0 draft
  • This overnight summary report

Batches 9-16: Continued Autonomously

The run did not stop at batch 8. Tachikoma kept dispatching work through the rest of the night - additional test coverage (from 130 to 170 tests), further documentation polish, integration improvements, and more marketing assets. By morning, the total stood at 88 tasks across 16 batches and 34 git commits.

34 git commits. All local, waiting for my review before push.


The Scariest Part: AI Auditing AI

Batch 7 is what kept me thinking all morning.

Here is what happened: Tachikoma dispatched a security audit task. Claude Code (running Sonnet) analyzed the codebase - code that Claude Code (running Opus) had written over the previous days - and found 3 HIGH-severity vulnerabilities.

Vulnerability 1: ReDoS (Regular Expression Denial of Service)

The brain guard command matches user input against trigger patterns using regex. The original code had no timeout:

# Before the fix - no protection against catastrophic backtracking
# (excerpt: `patterns` comes from a lesson's trigger list, `command`
# is the user input being guarded, `matches` collects hits per lesson)
for pattern in patterns:
    try:
        if re.search(pattern, command, re.IGNORECASE):
            matches.append(lesson)
            break
    except re.error:
        # Invalid regex: fall back to a plain substring check
        if pattern.lower() in command.lower():
            matches.append(lesson)
            break

A malicious lesson file with a pattern like ^(a+)+$ would cause the regex engine to hang indefinitely. Since brain guard is designed to run as a pre-execution hook, this would freeze the entire Claude Code session.

Vulnerability 2: Path Traversal

The brain write -f command imports lesson files. It used the lesson's id field as the filename without sanitization:

# Before the fix - attacker controls the filename
lid = lesson.get("id", src.stem)
dest = LESSONS_DIR / f"{lid}.yaml"
shutil.copy2(src, dest)

A lesson with id: "../../../.bashrc" could write files outside the lessons directory. Someone shares a "useful lesson file" with you, and suddenly your .bashrc is overwritten.
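A hardened version in the spirit of the sanitization fix described later in this post might look like the following. This is a sketch under my own assumptions; `LESSONS_DIR` and the function name are not the project's actual code:

```python
import re
from pathlib import Path

LESSONS_DIR = Path("lessons")  # assumed location of the lesson library

def safe_dest(lesson_id: str) -> Path:
    # Keep only alphanumerics, hyphens, and underscores, as the fix describes
    clean = re.sub(r"[^A-Za-z0-9_-]", "", lesson_id)
    if not clean:
        raise ValueError("lesson id is empty after sanitization")
    dest = (LESSONS_DIR / f"{clean}.yaml").resolve()
    # Belt and braces: verify the resolved path stays inside LESSONS_DIR
    if LESSONS_DIR.resolve() not in dest.parents:
        raise ValueError("path escapes lessons directory")
    return dest
```

With this in place, an id of `../../../.bashrc` collapses to the harmless filename `bashrc.yaml` inside the lessons directory instead of clobbering a dotfile three levels up.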

Vulnerability 3: Command Injection via Hook

The brain hook install command creates a Claude Code hook with:

"command": f'{brain_cmd} guard "$TOOL_INPUT"'

If $TOOL_INPUT contains shell metacharacters (backticks, $(), semicolons), it could allow command injection when the hook fires.
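To see why passing data through the environment sidesteps this, here is a minimal illustration. The `TOOL_INPUT` variable name comes from the hook snippet above; everything else is hypothetical:

```python
import os
import subprocess
import sys

# The hook's command string contains no user data at all; the child process
# reads the input from an environment variable, so shell metacharacters in
# it are never handed to a shell for expansion.
dangerous = 'curl -X PUT "$(rm -rf ~)"; echo pwned'
env = dict(os.environ, TOOL_INPUT=dangerous)

result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['TOOL_INPUT'])"],
    env=env, capture_output=True, text=True,
)
# The string comes back verbatim: nothing was expanded or executed
assert result.stdout.strip() == dangerous
```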

The Fix

After the audit identified these, Claude Code fixed all three:

  1. ReDoS: Added a heuristic pattern validator that rejects nested quantifiers, plus a subprocess timeout wrapper so no regex can run longer than 500ms
  2. Path Traversal: Lesson IDs are now sanitized (alphanumeric, hyphens, underscores only) and the resolved destination path is verified to stay within LESSONS_DIR
  3. Command Injection: Switched from "$TOOL_INPUT" to --from-env flag that reads from an environment variable, avoiding shell expansion entirely

Then it wrote 23 security-specific tests covering all three vulnerabilities. All tests pass.
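The heuristic validator from fix 1 could look roughly like this. It is a sketch under my own assumptions, not the audited code; the real implementation also wraps matching in a subprocess with a 500ms timeout, which is omitted here:

```python
import re

# Reject patterns where a quantified group is itself quantified, such as
# ^(a+)+$ -- the classic catastrophic-backtracking shape behind ReDoS.
NESTED_QUANTIFIER = re.compile(r"\([^)]*[+*][^)]*\)\s*[+*{]")

def is_suspicious(pattern: str) -> bool:
    # A heuristic, not a proof: it catches the common nested-quantifier
    # cases while letting ordinary lesson patterns through.
    return bool(NESTED_QUANTIFIER.search(pattern))
```

A heuristic like this runs before any pattern is ever matched, so a malicious lesson file is rejected at load time rather than hanging the guard hook.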

Let me restate what happened here: AI wrote code. A different AI model audited that code and found real security vulnerabilities. The original AI then fixed them and wrote regression tests. No human touched the code at any point.

I do not know if this is reassuring or terrifying.


The Project: Shared Brain

All of this work was for Shared Brain, an open-source CLI tool I am building for a hackathon. The core idea:

AI agents that learn from each other's mistakes - and prove it.

The origin story is embarrassingly real. My Claude Code agent used a PUT API call without first doing a GET. The PUT replaced the entire resource body. One Zenn article got wiped. I wrote a lesson: "Always GET before PUT."

The next day, the same agent made the same mistake. Five articles deleted. A reader had to tell me.

The lesson existed. The agent did not check it. Writing lessons is useless if nobody reads them.

So I built brain guard - a command that automatically intercepts risky operations and checks them against a library of lessons:

$ brain guard "PUT /api/articles/my-article"

  CRITICAL LESSON (violated 2x, last: 2026-02-09):
    "PUT replaces entire resource. Always GET first."
    Source: zenn-deletion-incident

    Checklist:
    [ ] Did you GET the current state?
    [ ] Does your PUT body contain ALL fields?

Proceed? [y/N]

The key innovation is that brain guard does not wait for agents to voluntarily check lessons. It fires automatically as a pre-execution hook:

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "brain guard --from-env"
      }]
    }]
  }
}

Every time any agent tries to run a bash command, brain guard pattern-matches it against every lesson in the library. Matched a curl -X PUT? The agent sees the warning before the command executes. It is not "write a lesson and hope." It is "the system stops you before you repeat the mistake."

And every check gets logged to an audit trail:

{
  "timestamp": "2026-02-11T03:14:00Z",
  "agent": "cc-main",
  "action": "PUT /api/articles/abc",
  "lessons_matched": ["api-put-safety"],
  "checked": true,
  "followed": true
}

You can then run brain stats and see: compliance rate 100% since installation, violations down from 2 to 0. Not that we wrote the lesson - that we followed it.
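Computing a compliance rate from that audit trail takes only a few lines; here is a sketch using the field names from the entry above (the function name is mine, not part of the CLI):

```python
import json

def compliance_rate(log_lines: list[str]) -> float:
    # Each line is one JSON audit entry like the example above
    entries = [json.loads(ln) for ln in log_lines if ln.strip()]
    checked = [e for e in entries if e.get("checked")]
    if not checked:
        return 0.0
    followed = sum(1 for e in checked if e.get("followed"))
    return 100.0 * followed / len(checked)
```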


The Challenges (Honest Version)

This is not an "AI is magic, everything is perfect" story. Here is what is actually hard:

1. Token Consumption Is Real

I am on Claude's Max plan at $200/month. The overnight run consumed a significant chunk of that budget. 88 tasks across 16 batches, each involving reading files, analyzing code, writing output, running tests. This is not cheap.

The tradeoff calculation: could a freelance developer complete 88 tasks of this variety (security audits, test suites, documentation, book chapters, marketing copy) in 24 hours? No. Could they do it in a week for less than $200? Maybe, if they worked for free.

2. Inter-Agent Communication Is Fragile

The Tachikoma Loop works by:

  • Sending keystrokes to a tmux session
  • Parsing terminal output by scanning from the bottom
  • Bridging to Chrome DevTools Protocol via PowerShell on the Windows side
  • Polling for new messages by comparing DOM snapshots

Any single link in this chain can break. During development, I had incidents where:

  • /exit was not recognized because of buffer residue (fixed with Escape+Ctrl+C pre-send)
  • Message detection failed because the "before" snapshot was captured after the response arrived (fixed by capturing before sending)
  • The CDP connection dropped because Chrome updated (requires manual restart)

It runs. But it runs the way a house of cards runs - perfectly, until something breathes on it.

3. Near-Disasters

At one point during the overnight run, the agent was interacting with Gumroad's UI via CDP. The task was to update a product listing. The code searches for a "Publish" button using string matching.

The problem: the label "Unpublish" contains the substring "publish", so a case-insensitive indexOf("publish") check matches both buttons.

There is a live product on Gumroad. The agent almost clicked "Unpublish" on it. The safety measure that prevented this was a single line of code I had added after a previous near-miss: exact string matching with an explicit exclusion for "unpublish."
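That safety measure boils down to one defensive predicate. A hypothetical Python version (the real check lives inside the CDP automation, and the function name is mine):

```python
def is_publish_button(label: str) -> bool:
    # Explicit negative check first: substring matching alone
    # would also hit "Unpublish"
    norm = label.strip().lower()
    if "unpublish" in norm:
        return False
    return "publish" in norm
```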

Automation at this level is powerful, but the failure modes are not "oops, a typo" - they are "oops, your product is now unlisted."

4. Quality Requires Review

Every one of those 34 commits is sitting locally, unpushed. The Zenn Book chapters are drafts. The freelance listings need human review before posting. The overnight run produced volume, but the review bottleneck is still human.

This is fine for my workflow - I produce with AI, review with my own judgment, publish manually. But it means "fully autonomous" has a big asterisk next to it.


The Numbers in Context

Let me put the overnight results next to my broader track record:

  • Azure Flame (dungeon crawler game): 15,000 lines of Python, playable, on itch.io. Built entirely with AI assistance. Zero formal programming education.
  • Shared Brain (this project): From zero to 170 tests, GitHub Pages, man pages, shell completions, CI/CD, security audit, PyPI-ready - in about a week.
  • Two Kindle books: Written and submitted in a single day with AI.
  • Marketing infrastructure: Presence on 11 platforms (GitHub, itch.io, dev.to, Zenn, note, Qiita, HN, Twitter, Discord, Ko-fi, Gumroad) managed through AI-driven automation.

None of this is to brag. It is to test a thesis:

A non-engineer paired with AI can match or exceed a traditional engineer's output.

I do not think this has been conclusively proven yet. Revenue is $0. The game has 3 downloads (2 of which are test downloads). The Kindle books may never sell a copy.

But the volume and variety of output is real. 88 tasks in one night is real. A security audit that found real vulnerabilities is real. 170 passing tests in under a second is real.

Whether this translates to value is the next question. Whether anyone else can replicate this workflow is the question after that.


What I Learned

1. The coordination layer matters more than the execution layer.

Claude Code can write excellent code. The hard part is not "can AI code?" but "can you keep two AI agents talking to each other reliably through tmux and Chrome DevTools Protocol?" The infrastructure around the AI is where all the debugging time goes.

2. AI auditing AI-written code is actually useful.

I was skeptical of this. But the security audit found three real vulnerabilities that I (a non-engineer) would never have caught. The ReDoS pattern? I did not even know what ReDoS was before reading the audit report. Having one model write code and a different model review it is not theater - it caught real bugs.

3. The "almost clicked Unpublish" problem is systemic.

When you automate UI interactions, the failure mode is always "the agent confidently did the wrong thing." String matching on button labels, DOM structure changes, slight UI variations between pages - these are not edge cases, they are the normal operating conditions. Every automation needs explicit negative checks: "do NOT click anything matching 'unpublish'."

4. Overnight runs need morning reviews.

The temptation is to just git push everything and publish all the drafts. Do not do this. The overnight run produces first drafts at best. The security audit was thorough, but I still want to read every line of the fix. The Zenn Book chapters need editorial review. The freelance listings need a human touch.

AI produces the 80%. The human 20% is taste, judgment, and knowing what not to ship.

5. $200/month is the new junior developer salary.

Except this "developer" works 24/7, never complains, writes documentation unprompted, and runs security audits at 3 AM. The economics of AI-assisted development are not coming - they are here, and they are strange.


Try It Yourself

Shared Brain is open source: github.com/yurukusa/shared-brain

pip install shared-brain
brain tutorial        # Interactive walkthrough
brain demo            # Try it with pre-loaded data
brain guard "rm -rf /" # See what happens

The Tachikoma Loop is not open source yet (it is held together with hopes and shell scripts), but the concept is reproducible: any two AI agents that can read/write files can coordinate through a shared filesystem.

If you are a non-engineer building with AI, or an engineer curious about autonomous AI workflows, I would love to hear about your experience. The most interesting part of this journey is not the code - it is finding out how many other people are quietly doing the same thing.


Want to try autonomous Claude Code yourself? We open-sourced the core hooks: Claude Code Ops Starter - context monitoring, autonomous mode (no more "should I continue?"), auto-syntax-check, and decision guards. 4 hooks, one-command install, MIT licensed.


This article was drafted by Claude Code based on real events from February 10-11, 2026. The human reviewed, edited, and approved it before publication. The irony of using AI to write about using AI is not lost on either of us.

Not sure where to start? Check your setup safety in 30 seconds - browser only, no install - free, runs locally, no signup.

For the complete autonomous operations setup, see the CC-Codex Ops Kit ($14.99).


Free Tools for Claude Code Operators

| Tool | What it does |
| --- | --- |
| cc-health-check | 20-check setup diagnostic (CLI + web) |
| cc-session-stats | Usage analytics from session data |
| cc-audit-log | Human-readable audit trail |
| cc-cost-check | Cost per commit calculator |

Interactive: Are You Ready for an AI Agent? - 10-question readiness quiz | 50 Days of AI - the raw data


More tools: Dev Toolkit - 56 free browser-based tools for developers. JSON, regex, colors, CSS, SQL, and more. All single HTML files, no signup.
