<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: QuoLu</title>
    <description>The latest articles on DEV Community by QuoLu (@quolu).</description>
    <link>https://dev.to/quolu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943846%2Fda4f1e44-c6df-4338-8653-6db8efd5d92b.jpg</url>
      <title>DEV Community: QuoLu</title>
      <link>https://dev.to/quolu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/quolu"/>
    <language>en</language>
    <item>
      <title>How I Fixed the Infinite Feedback Loop When Auditing Project Plans with Claude</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Wed, 10 Jun 2026 01:00:24 +0000</pubDate>
      <link>https://dev.to/quolu/how-i-fixed-the-infinite-feedback-loop-when-auditing-project-plans-with-claude-4gh</link>
      <guid>https://dev.to/quolu/how-i-fixed-the-infinite-feedback-loop-when-auditing-project-plans-with-claude-4gh</guid>
      <description>&lt;p&gt;I always enjoy AI programming.&lt;/p&gt;

&lt;p&gt;My usual workflow is to create a plan, have it audited, and then proceed with the implementation.&lt;/p&gt;

&lt;p&gt;However, I felt that the auditing process hasn't been working well, especially since Opus 4.7. Perhaps it's because Opus has gained a broader perspective? It often brings up points that are irrelevant to my plan, and when I have it perform an automated loop of auditing and revising, the feedback often fails to converge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Whac-A-Mole
&lt;/h2&gt;

&lt;p&gt;Then one day, I realized it.&lt;/p&gt;

&lt;p&gt;When writing a program with slightly complex logic, the AI keeps saying "this is wrong" or "that is wrong" every single time. It feels like an indirect loop, or more accurately, constant Whac-A-Mole.&lt;/p&gt;

&lt;p&gt;It says, "B is weak from the perspective of A," so I fix B. In the next audit, it says, "B is excessive, and A is thin." When I add A, it then says, "C is inconsistent." When I fix C, it says, "The description of C is redundant." Once I fix that, it says, "C is insufficiently explained."&lt;/p&gt;

&lt;p&gt;It's a seesaw. There is no exit in sight.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I tried
&lt;/h2&gt;

&lt;p&gt;Realizing this, I tried the following approach.&lt;/p&gt;

&lt;p&gt;For a plan that was reasonably complete (having gone through 1 or 2 audits), I asked the AI, "Please audit the plan &lt;strong&gt;only for logical contradictions.&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;This worked perfectly. It diligently resolved the contradictions, and after a few rounds of auditing, it converged properly.&lt;/p&gt;

&lt;p&gt;Since the number of contradictions is finite, it actually converges.&lt;/p&gt;

&lt;p&gt;And when I let it start the implementation with a plan free of contradictions, it runs straight to the end without stopping (lol).&lt;/p&gt;

&lt;p&gt;Of course, there are occasional implementation errors where it has to retry, but that's expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  I wondered if this exists in the world
&lt;/h2&gt;

&lt;p&gt;Having reached this point, I suddenly became curious. Surely, I'm not the first person to figure this out. Someone else must have thought of the same thing.&lt;/p&gt;

&lt;p&gt;I looked it up.&lt;/p&gt;

&lt;p&gt;There were similar concepts, but they felt different. To put it simply, they were complicated.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Criteria Drift&lt;/strong&gt; (&lt;a href="https://hamel.dev/blog/posts/llm-judge/" rel="noopener noreferrer"&gt;Explanation by Hamel Husain&lt;/a&gt;, Shankar et al.): A phenomenon where evaluation criteria gradually shift when using an LLM for review. The countermeasure is "rescoring past scores while refining the evaluation axes." ...That's heavy.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Oscillatory Convergence&lt;/strong&gt; (&lt;a href="https://simiacryptus.github.io/Science/learning/2025/07/06/llm-feedback-dynamics.html" rel="noopener noreferrer"&gt;Fractal Thought Engine&lt;/a&gt;): An observation that there is a certain number of sessions where the approach oscillates due to iterative feedback from the LLM. It's observed, but it's not about countermeasures.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Moving the Goalposts&lt;/strong&gt; (&lt;a href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/evaluating-ai-agents-techniques-to-reduce-variance-and-boost-alignment-for-llm-j/4498571" rel="noopener noreferrer"&gt;Microsoft Blog&lt;/a&gt;): The idea of not moving the rubric during evaluation and finalizing the rubric before starting the evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are related topics. But what I did wasn't "finalizing the rubric"; it was "&lt;strong&gt;narrowing the evaluation axis to a single point: contradictions.&lt;/strong&gt;" As long as the scope is broad, points of criticism will spring up infinitely, so I'm trying to contain the scope to a finite set. I couldn't find existing research that clearly stated this.&lt;/p&gt;

&lt;p&gt;I suspect it's written somewhere. It should be, but it's probably written in academic terms, and by the time I realized it, two months had already passed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple once you're told
&lt;/h2&gt;

&lt;p&gt;When I write it down, it sounds simple. "If you narrow the scope, it will converge." That's all there is to it.&lt;/p&gt;

&lt;p&gt;But it took me two months to notice.&lt;/p&gt;

&lt;p&gt;I kept trying to change how I wrote my prompts, thinking, "If I use Claude more intelligently, it will get better." Even when I wrote, "Don't give too many points" or "Be consistent with past feedback," it didn't work. Because the problem wasn't how I was writing the prompts.&lt;/p&gt;

&lt;p&gt;It took time for the idea of narrowing the scope to "only finite items" to occur to me. When you audit normally, the scope is wide, so no matter what you fix, holes are found from a different angle. That was the whole story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  When the audit of a plan doesn't converge, narrow the scope to "only contradictions."&lt;/li&gt;
&lt;li&gt;  Contradictions are finite, so the process will converge.&lt;/li&gt;
&lt;li&gt;  If you audit with a wide scope, criticism will spring up infinitely, leading to Whac-A-Mole.&lt;/li&gt;
&lt;li&gt;  When you have the AI implement a plan that is free of contradictions, it will run until completion without stopping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If there is anyone else caught in the same Whac-A-Mole trap, then writing this was worth it.&lt;/p&gt;

&lt;p&gt;I feel good on days when I do something good.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>claude</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>Stop Telling Claude to 'Be Careful': Reinforcing It from the Outside with 3 Tools</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Tue, 09 Jun 2026 00:54:11 +0000</pubDate>
      <link>https://dev.to/quolu/stop-telling-claude-to-be-careful-reinforcing-it-from-the-outside-with-3-tools-2k8k</link>
      <guid>https://dev.to/quolu/stop-telling-claude-to-be-careful-reinforcing-it-from-the-outside-with-3-tools-2k8k</guid>
      <description>&lt;p&gt;Over the past month or so, I have released three reinforcement tools for Claude Code on npm.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://dev.to/quolu/published-throughline-to-npm-a-hook-to-offload-claude-code-tool-io-to-sqlite-13d9"&gt;Throughline&lt;/a&gt; — Offloads bloated context.&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://dev.to/quolu/published-caveat-to-npm-a-long-term-memory-layer-to-avoid-repeating-the-same-traps-2cc0"&gt;Caveat&lt;/a&gt; — Surfaces past notes so you don't step into the same trap twice.&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://dev.to/quolu/i-had-74-daemons-running-because-i-made-one-claude-audit-another-for-missed-tool-calls-5b6m"&gt;Spotter&lt;/a&gt; — A separate Claude audits missed tool calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each solves a different problem, but the root cause is the same: &lt;strong&gt;all the problems that could be fixed by telling Claude to "be careful" had already been fixed.&lt;/strong&gt; What remained were issues that could not be fixed structurally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The period of constant "be careful" warnings
&lt;/h2&gt;

&lt;p&gt;In the beginning, I also wrote plenty of "be careful" instructions in CLAUDE.md and my prompts.&lt;/p&gt;

&lt;p&gt;"Do not guess files, always read them before answering."&lt;br&gt;&lt;br&gt;
"If the context becomes bloated, run /compact."&lt;br&gt;&lt;br&gt;
"Read the traps I've fallen into in the past, which are written in CLAUDE.md."&lt;/p&gt;

&lt;p&gt;But the more I wrote, the more bloated CLAUDE.md became. A bloated CLAUDE.md just gets skimmed by Claude. Even though it's written there, it isn't followed.&lt;/p&gt;

&lt;p&gt;I thought maybe my writing style was poor, so I changed the phrasing. Still, it didn't work. I realized there is a certain number of problems that &lt;strong&gt;simply don't go away no matter how many times you change the way you phrase them.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problems that can be fixed vs. problems that cannot
&lt;/h2&gt;

&lt;p&gt;One day, I drew a line in the sand.&lt;/p&gt;

&lt;p&gt;Problems that can be fixed by writing "be careful" and those that cannot are fundamentally different types of issues.&lt;/p&gt;

&lt;p&gt;Problems that can be fixed by writing instructions occur because Claude simply "forgot." If it sees the instructions, it remembers. This can be handled by improving the prompts.&lt;/p&gt;

&lt;p&gt;Problems that cannot be fixed occur because &lt;strong&gt;Claude cannot recognize its own limitations.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  It cannot notice that the context is bloated (it's decided at the moment the request is sent, so it can't see its own size).&lt;/li&gt;
&lt;li&gt;  It doesn't remember traps encountered in past sessions (sessions are independent, and adding to CLAUDE.md makes it heavy).&lt;/li&gt;
&lt;li&gt;  It doesn't notice when it forgets to call a tool (it doesn't know what it doesn't know, so it can't go get it).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Asking Claude to "be careful" about these things doesn't work. It's because &lt;strong&gt;these are problems Claude itself cannot fix.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Giving up and reinforcing from the outside
&lt;/h2&gt;

&lt;p&gt;So, I stopped asking Claude and started intervening from the outside.&lt;/p&gt;

&lt;p&gt;Claude Code has a hook mechanism. You can insert hooks before sending prompts, after a tool runs, or when a session ends. &lt;strong&gt;Even if Claude itself doesn't notice, you can observe the state from the outside and inject the necessary processing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Since realizing this, I have created three reinforcement tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Throughline (Subtraction)
&lt;/h3&gt;

&lt;p&gt;To address the issue of context bloating, &lt;strong&gt;it offloads tool inputs and outputs to SQLite and removes them from the context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The contents of read files, grep results, and Bash outputs—once the AI has used them to make a decision and moved on, their purpose is fulfilled. Yet, they remain until the end, consuming tokens. I use hooks to offload these to SQLite. If Claude needs them, it can retrieve them itself.&lt;/p&gt;

&lt;p&gt;I have completely removed the burden of "noticing the bloating" from Claude.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveat (Accumulation)
&lt;/h3&gt;

&lt;p&gt;To address the issue of falling into the same trap twice, &lt;strong&gt;it automatically surfaces trap notes written in the past during similar situations.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I fall into a trap, I write it down in Markdown. The next time I send a similar prompt, receive a similar tool error, or when a "struggle signal" is observed at the end of a session, the relevant past notes are injected into Claude's context via hooks.&lt;/p&gt;

&lt;p&gt;I have removed the burden of "remembering past traps" from Claude.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spotter (Addition)
&lt;/h3&gt;

&lt;p&gt;To address the issue of forgetting to call tools, &lt;strong&gt;I run another Claude side-by-side that has a perfect grasp of the tool catalog and points it out if a call is forgotten.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The main Claude works as usual. Another Claude (Haiku 4.5) resides alongside it, watching the user's input and final response. If it notices, "You could have answered this by using &lt;code&gt;web_search&lt;/code&gt;," it sends a pointer to the main Claude via a hook.&lt;/p&gt;

&lt;p&gt;I have removed the impossible burden of "realizing what you forgot to call" from Claude.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common patterns
&lt;/h2&gt;

&lt;p&gt;What these three have in common is a design that &lt;strong&gt;expects nothing from Claude itself.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Throughline&lt;/th&gt;
&lt;th&gt;Caveat&lt;/th&gt;
&lt;th&gt;Spotter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;What is not asked of Claude&lt;/td&gt;
&lt;td&gt;Context management&lt;/td&gt;
&lt;td&gt;Past memories&lt;/td&gt;
&lt;td&gt;Detection of missed steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who does it instead&lt;/td&gt;
&lt;td&gt;hook &amp;amp; SQLite&lt;/td&gt;
&lt;td&gt;hook &amp;amp; past notes&lt;/td&gt;
&lt;td&gt;hook &amp;amp; separate Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Changes needed for Claude&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The fact that "changes needed for Claude" is zero is important. You can just write what you want to write in prompts and CLAUDE.md as usual. You don't increase the number of "be careful" warnings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structural problems not yet reinforced
&lt;/h2&gt;

&lt;p&gt;It's not that I'm satisfied because I made three. There are still structural problems I want to fix.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Long-term role drift&lt;/strong&gt;: The problem where Claude's persona drifts during long sessions. Even if I write "You are a strict reviewer" in the prompt, it becomes soft after 20 turns.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context loss through sub-agents&lt;/strong&gt;: Sub-agents spawned by the Task tool do not have the implicit context of the parent session. It is quietly painful to pass the same explanation to the child every time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Tool selection accuracy&lt;/strong&gt;: The judgment of "which tool to use" among multiple options is sometimes sloppy. Spotter detects missed calls, but it doesn't detect wrong tool selection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I have already fixed all the problems that can be solved by telling Claude to "be careful." The remaining problems are of a type that Claude itself cannot fix.&lt;/p&gt;

&lt;p&gt;That's why I reinforce from the outside. Just insert it with hooks. Claude itself doesn't need to know anything.&lt;/p&gt;

&lt;p&gt;Having created three of these in a month, I feel I've seen the pattern for reinforcement. All three are on npm under MIT, so if you are troubled by the same structural problems, please take a look if you feel like it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://github.com/kitepon-rgb/Throughline" rel="noopener noreferrer"&gt;Throughline — GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/kitepon-rgb/Caveat" rel="noopener noreferrer"&gt;Caveat — GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/kitepon-rgb/Spotter" rel="noopener noreferrer"&gt;Spotter — GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>hooks</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Had 74 Daemons Running Because I Made One Claude Audit Another for Missed Tool Calls</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Mon, 08 Jun 2026 01:01:14 +0000</pubDate>
      <link>https://dev.to/quolu/i-had-74-daemons-running-because-i-made-one-claude-audit-another-for-missed-tool-calls-5b6m</link>
      <guid>https://dev.to/quolu/i-had-74-daemons-running-because-i-made-one-claude-audit-another-for-missed-tool-calls-5b6m</guid>
      <description>&lt;h2&gt;
  
  
  The Trigger
&lt;/h2&gt;

&lt;p&gt;One day, when I asked Claude, "What time is it now?" it gave me an answer based on its best guess.&lt;/p&gt;

&lt;p&gt;On another day, when I asked about the contents of a configuration file, it provided an explanation based on its own guess from the file name. It had a &lt;code&gt;read_file&lt;/code&gt; tool available, but it didn't use it.&lt;/p&gt;

&lt;p&gt;At first, I thought, "Maybe Claude is just tired," but it happened too frequently. Even if I wrote "Use the tools" in the prompt, it would sometimes forget.&lt;/p&gt;

&lt;p&gt;That's when I realized: &lt;strong&gt;Claude cannot self-recognize when it doesn't know something.&lt;/strong&gt; Therefore, it doesn't know it needs to go get a tool.&lt;/p&gt;

&lt;p&gt;Even if I ask it to "be careful about forgetting to call tools," it doesn't know it "doesn't understand," so there is no way for it to be careful. It was a structural problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Tried
&lt;/h2&gt;

&lt;p&gt;So, why not just add another set of eyes?&lt;/p&gt;

&lt;p&gt;Apart from the main Claude, I decided to &lt;strong&gt;keep an auditor Claude (Haiku 4.5), which has complete mastery of the tool catalog, resident in every session.&lt;/strong&gt; It watches the main Claude's planned utterances and final responses in parallel, and points out if a tool call was forgotten.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Main Claude's response&lt;/th&gt;
&lt;th&gt;Auditor's feedback&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"What's the weather today?"&lt;/td&gt;
&lt;td&gt;Responds with a guess&lt;/td&gt;
&lt;td&gt;You can use &lt;code&gt;web_search&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"What's inside this config?"&lt;/td&gt;
&lt;td&gt;Guesses from the name&lt;/td&gt;
&lt;td&gt;You can use &lt;code&gt;read_file&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"What time is it now?"&lt;/td&gt;
&lt;td&gt;Time at training&lt;/td&gt;
&lt;td&gt;You can use &lt;code&gt;current_time&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The point is &lt;strong&gt;not to rely on the main Claude's own self-awareness&lt;/strong&gt;. Instead of writing "please be careful" to Claude, I physically placed another set of eyes there. The judgment happens in two stages: at the moment the user inputs something (listing tools that should be used for the request) and immediately after the main Claude returns a response (determining if a verification tool can be inserted for factual claims).&lt;/p&gt;

&lt;p&gt;I built this and named it &lt;code&gt;claude-spotter&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Mistake Immediately After Release
&lt;/h2&gt;

&lt;p&gt;I thought it was a convenient design. &lt;code&gt;npm install -g claude-spotter&lt;/code&gt; would automatically enable it for all projects with no configuration required. It felt perfect.&lt;/p&gt;

&lt;p&gt;I released it and started using it myself.&lt;/p&gt;

&lt;p&gt;64 minutes later, 74 daemons were running.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened?
&lt;/h2&gt;

&lt;p&gt;When I dug into the real session logs, 51 out of the 74 were caused by &lt;a href="https://github.com/kitepon-rgb/Throughline" rel="noopener noreferrer"&gt;Throughline&lt;/a&gt; (another tool of mine).&lt;/p&gt;

&lt;p&gt;Throughline calls &lt;code&gt;claude -p&lt;/code&gt; internally. Calling &lt;code&gt;claude -p&lt;/code&gt; triggers the SessionStart hook. The SessionStart hook starts the Spotter daemon. The Spotter daemon calls &lt;code&gt;claude -p&lt;/code&gt; for auditing. &lt;strong&gt;It wasn't quite infinite recursion, but a recursive proliferation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because I was writing to &lt;code&gt;~/.claude/settings.json&lt;/code&gt; via &lt;code&gt;postinstall&lt;/code&gt;, every Claude Code session on the system was structured to load the Spotter hook. This was the price of "automatic activation for all projects."&lt;/p&gt;

&lt;p&gt;I added a 5-layer defense to stop the recursion on my end, but it was &lt;strong&gt;defenseless against &lt;code&gt;claude -p&lt;/code&gt; originating from other tools&lt;/strong&gt;. This was a structural issue that couldn't be covered up by patches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retraction
&lt;/h2&gt;

&lt;p&gt;I retracted the automatic registration in &lt;code&gt;postinstall&lt;/code&gt;. I changed it so &lt;code&gt;npm install&lt;/code&gt; only makes the CLI available, and users must explicitly run &lt;code&gt;spotter install&lt;/code&gt; in each project. This writes the hook to &lt;code&gt;&amp;lt;project&amp;gt;/.claude/settings.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I thought "automatic for all projects" was convenient, but the side effects were far greater. Ideally, automation is the goal, and having users run &lt;code&gt;spotter install&lt;/code&gt; in each project is a compromise. If the Claude Code hook mechanism could "identify the session origin," it would be safe to make it automatic, and I'd like to revert to that when it happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Next Bug: Tools from Past Projects Remain as Ghosts
&lt;/h2&gt;

&lt;p&gt;After using it for a while, a different symptom appeared.&lt;/p&gt;

&lt;p&gt;When I opened a session in Project A, the auditor suggested, "You can use the &lt;code&gt;mermaid_diagram&lt;/code&gt; tool." However, the &lt;code&gt;mermaid&lt;/code&gt; MCP is not registered in this project.&lt;/p&gt;

&lt;p&gt;I investigated and found that the MCP tool definitions I had used previously in Project B remained in the global DB and were being referenced in Project A. A regression where it "suggests tools that cannot be used."&lt;/p&gt;

&lt;p&gt;I changed the tool catalog used by the auditor to be &lt;strong&gt;local-DB only&lt;/strong&gt; (v1.2.0). The global DB was demoted to "a cache that reuses only the description if it has been acquired in other projects." Now, discovery runs in each project every time, and tools that are not found are deleted (pruned) from the local DB.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCPs Distributed as .cmd Fail to Spawn on Windows
&lt;/h2&gt;

&lt;p&gt;I hit one more thing. On Windows, when I &lt;code&gt;spawn('claude-mermaid')&lt;/code&gt; an npm-global &lt;code&gt;.cmd&lt;/code&gt; distributed MCP, it fails immediately with ENOENT.&lt;/p&gt;

&lt;p&gt;Node.js's &lt;code&gt;spawn&lt;/code&gt; calls &lt;code&gt;CreateProcess&lt;/code&gt; directly on Windows, but &lt;code&gt;CreateProcess&lt;/code&gt; only resolves &lt;code&gt;.exe&lt;/code&gt; files (it does not resolve the &lt;code&gt;.cmd&lt;/code&gt; extension in PATHEXT). I had previously encountered the same pattern—where wrapping it in &lt;code&gt;cmd.exe /c&lt;/code&gt; makes it work—in the Spotter itself when launching the claude CLI and had fixed it, but &lt;strong&gt;I had forgotten to apply this pattern to the MCP server launch path&lt;/strong&gt; (fixed in v1.2.2).&lt;/p&gt;

&lt;p&gt;I stepped into a trap I had set myself, just via a different path. Having experienced this, I strongly felt the necessity for &lt;a href="https://dev.to/quolu/published-caveat-to-npm-a-long-term-memory-layer-to-avoid-repeating-the-same-traps-2cc0"&gt;Caveat&lt;/a&gt;. If there isn't a mechanism to avoid stepping into the same trap twice, this is what happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Status
&lt;/h2&gt;

&lt;p&gt;v1.2.4. The CI for Windows, macOS, and Linux is all green.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; claude-spotter
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
spotter &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool catalog is automatically collected during &lt;code&gt;spotter install&lt;/code&gt;, and the SessionStart hook refreshes it in the background every time Claude Code is started. There is no need to manage it manually.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;spotter status      &lt;span class="c"&gt;# List of running auditors&lt;/span&gt;
spotter db list     &lt;span class="c"&gt;# Tool catalog for this project&lt;/span&gt;
spotter doctor      &lt;span class="c"&gt;# Environment diagnostics&lt;/span&gt;
spotter uninstall   &lt;span class="c"&gt;# Remove hook registration&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Areas Still Lacking
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Stop hook's correction results in two consecutive responses&lt;/strong&gt;. Because of the specification where the hook runs after the main Claude returns a response, when it issues a correction response, the user sees "the initial response + the correction response" one after another. It would be ideal if we could preempt it during input (UserPromptSubmit), and use the post-response hook as insurance.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;User input is blocked by Haiku's timeout&lt;/strong&gt;. I am currently considering whether to fail-open (bypass and let it through).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Relationship with Throughline / Caveat
&lt;/h2&gt;

&lt;p&gt;Spotter is a &lt;strong&gt;separate product that shares a philosophy&lt;/strong&gt; with &lt;a href="https://dev.to/quolu/published-throughline-to-npm-a-hook-to-offload-claude-code-tool-io-to-sqlite-13d9"&gt;Throughline&lt;/a&gt; and &lt;a href="https://dev.to/quolu/published-caveat-to-npm-a-long-term-memory-layer-to-avoid-repeating-the-same-traps-2cc0"&gt;Caveat&lt;/a&gt;, created by the same author.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Throughline&lt;/th&gt;
&lt;th&gt;Caveat&lt;/th&gt;
&lt;th&gt;Spotter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Philosophy&lt;/td&gt;
&lt;td&gt;Subtraction&lt;/td&gt;
&lt;td&gt;Accumulation&lt;/td&gt;
&lt;td&gt;Addition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Target&lt;/td&gt;
&lt;td&gt;Context bloat&lt;/td&gt;
&lt;td&gt;Stepping into the same trap twice&lt;/td&gt;
&lt;td&gt;Tool omission&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mechanism&lt;/td&gt;
&lt;td&gt;Evacuate memory via hooks&lt;/td&gt;
&lt;td&gt;Surface past notes via hooks&lt;/td&gt;
&lt;td&gt;Run auditor in parallel via hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What the three have in common is a &lt;strong&gt;"mechanism that does not rely on the main Claude engine."&lt;/strong&gt; All three can coexist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Node.js 22.5+&lt;/li&gt;
&lt;li&gt;  Claude Code 2.0+&lt;/li&gt;
&lt;li&gt;  Claude Max Plan (to launch Haiku 4.5 with &lt;code&gt;claude -p&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/kitepon-rgb/Spotter" rel="noopener noreferrer"&gt;Spotter — GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT License. If you are struggling with the same problem, please feel free to take a look if you're interested.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>hooks</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Published Caveat to npm: A Long-term Memory Layer to Avoid Repeating the Same Traps</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Sun, 07 Jun 2026 01:01:30 +0000</pubDate>
      <link>https://dev.to/quolu/published-caveat-to-npm-a-long-term-memory-layer-to-avoid-repeating-the-same-traps-2cc0</link>
      <guid>https://dev.to/quolu/published-caveat-to-npm-a-long-term-memory-layer-to-avoid-repeating-the-same-traps-2cc0</guid>
      <description>&lt;p&gt;I have published &lt;a href="https://github.com/kitepon-rgb/Caveat" rel="noopener noreferrer"&gt;Caveat&lt;/a&gt;, a long-term memory layer for Claude Code, to npm.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;When using Claude Code, you often spend more time deciphering "other people's specifications" than doing the actual implementation. You get stuck on GPU driver version constraints, failed native module builds, IDE quirks, or path issues that only occur on specific OSs. Even after you struggle and solve it once, you end up stepping into the same trap in a different project six months later. When you ask the AI, it doesn't say "I don't know" but instead acts on assumptions, causing you to waste time all over again.&lt;/p&gt;

&lt;p&gt;Caveat is a layer where "&lt;strong&gt;once you jot it down&lt;/strong&gt;, relevant notes automatically surface the moment you encounter the same situation next time." Even if you can't remember it, and even if the AI doesn't know it, the relevance is detected structurally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Trigger Points
&lt;/h2&gt;

&lt;p&gt;Caveat is implemented at three points using hooks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trigger Point&lt;/th&gt;
&lt;th&gt;When it runs&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Submission&lt;/td&gt;
&lt;td&gt;The moment a prompt is sent&lt;/td&gt;
&lt;td&gt;Breaks down the prompt and surfaces only entries where two or more words co-occur with past notes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool Error&lt;/td&gt;
&lt;td&gt;The moment a Claude tool call fails&lt;/td&gt;
&lt;td&gt;Runs a background search and notifies the next turn as a known trap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session Termination&lt;/td&gt;
&lt;td&gt;When a session closes&lt;/td&gt;
&lt;td&gt;Extracts "struggle signals" from conversation logs. Prompts the AI if there is anything that should be recorded as a new trap&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;"Struggle signals" are traces where the AI might not be aware of it, but objectively it was struggling—such as tool failures, editing the same file repeatedly, repeated web searches, or re-executing Bash commands. It scans these at the end and prompts you, "You were stuck here in today's session, right? Do you want to record it as a trap?"&lt;/p&gt;

&lt;h2&gt;
  
  
  Design without Keyword Lists
&lt;/h2&gt;

&lt;p&gt;The search logic relies solely on &lt;strong&gt;Co-occurrence FTS&lt;/strong&gt;. There is no keyword correspondence table like "if the word 'rtx' comes up, display GPU-related notes."&lt;/p&gt;

&lt;p&gt;Instead, it breaks down the input prompt and only surfaces entries where &lt;strong&gt;two or more words appear in the same entry simultaneously&lt;/strong&gt;. Generic words like &lt;code&gt;make&lt;/code&gt; or &lt;code&gt;new&lt;/code&gt; do not trigger on their own, but when two or more technical words overlap, they match.&lt;/p&gt;

&lt;p&gt;Even when new trap categories are added, you just add one &lt;code&gt;entries/&amp;lt;slug&amp;gt;.md&lt;/code&gt;. You don't need to touch the code or keyword tables. The trigger expands itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Knowledge is markdown-in-git
&lt;/h2&gt;

&lt;p&gt;The data consists of standard markdown files. SQLite is used as a derived index for searching, which can be rebuilt if deleted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.caveat/own/
├── entries/
│   ├── rtx-5090-cuda-12-init-fail.md
│   ├── windows-node-spawn-cmd-enoent.md
│   └── ...
└── .git/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can open it directly as an Obsidian vault. If you want to share it with your team, you can simply &lt;code&gt;git push&lt;/code&gt;. There is no central server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Public / Private Layers
&lt;/h2&gt;

&lt;p&gt;Entries have a &lt;code&gt;visibility&lt;/code&gt; attribute.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Public&lt;/strong&gt;: Traps anyone can encounter if they use the same external tools or specifications (GPU, build environments, IDEs, version constraints)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Private&lt;/strong&gt;: Project-specific context that cannot be reconstructed just by reading the code (intentional non-standard behavior, workarounds awaiting upstream fixes, custom habits)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude automatically determines the visibility. When in doubt, it defaults to private (to prevent leakage). If you explicitly instruct, "This should be private," that takes priority.&lt;/p&gt;

&lt;p&gt;There is also a pre-commit hook mechanism that prevents &lt;code&gt;private&lt;/code&gt; entries from being mixed into the shared repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; caveat-cli
caveat init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;caveat init&lt;/code&gt; does the following in one go:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Initializes &lt;code&gt;~/.caveat/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Registers the MCP server with Claude Code&lt;/li&gt;
&lt;li&gt;  Adds three hooks to &lt;code&gt;~/.claude/settings.json&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does not break existing hook settings (it creates a backup before merging).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;caveat search &lt;span class="s2"&gt;"rtx"&lt;/span&gt;        &lt;span class="c"&gt;# Search existing notes&lt;/span&gt;
caveat serve               &lt;span class="c"&gt;# Start a read-only portal&lt;/span&gt;
caveat uninstall           &lt;span class="c"&gt;# Remove Claude integration only (data is kept)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  No Central DB
&lt;/h2&gt;

&lt;p&gt;Earlier versions had a shared database, with the concept of using &lt;code&gt;caveat push&lt;/code&gt; to cultivate knowledge collectively. This has been abandoned.&lt;/p&gt;

&lt;p&gt;I concluded that automatically verifying contributions from complete strangers is impossible in principle. Even if you use an LLM as a gatekeeper, it can be bypassed, and long-term latent attacks cannot be found through static analysis. Therefore, trust is built not through "automated inspection" but through "&lt;strong&gt;social context&lt;/strong&gt;." I shifted to a model where you decide the scope of trust by choosing whose repositories to subscribe to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;caveat community add https://github.com/acme-corp/caveats
caveat pull
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I will write about the detailed background in another article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Node.js 22.5+&lt;/li&gt;
&lt;li&gt;  Claude Code (with hooks support)&lt;/li&gt;
&lt;li&gt;  pnpm (development only)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Status
&lt;/h2&gt;

&lt;p&gt;v0.11.1, 203 tests passing. Assumes individual and small team use cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kitepon-rgb/Caveat" rel="noopener noreferrer"&gt;Caveat — GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT License. Bug reports and PRs are welcome.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>hooks</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Published Throughline to npm: A hook to offload Claude Code tool I/O to SQLite</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Sat, 06 Jun 2026 00:54:54 +0000</pubDate>
      <link>https://dev.to/quolu/published-throughline-to-npm-a-hook-to-offload-claude-code-tool-io-to-sqlite-13d9</link>
      <guid>https://dev.to/quolu/published-throughline-to-npm-a-hook-to-offload-claude-code-tool-io-to-sqlite-13d9</guid>
      <description>&lt;p&gt;I have published a hook plugin for Claude Code called &lt;a href="https://github.com/kitepon-rgb/Throughline" rel="noopener noreferrer"&gt;Throughline&lt;/a&gt; to npm.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;In a Claude Code session, the majority of the context is filled with the remnants of "tool I/O." The contents of read files, grep results, and Bash output—data that served its purpose the moment the AI read it, made a decision, and moved on. However, it stays in the context until the end, consuming tokens.&lt;/p&gt;

&lt;p&gt;Throughline manages the conversation in three layers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;th&gt;Context Injection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;L2&lt;/td&gt;
&lt;td&gt;Conversation body (user input + AI response)&lt;/td&gt;
&lt;td&gt;Last 20 turns injected as is&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L1&lt;/td&gt;
&lt;td&gt;Summarized version of L2 (1/5th size) while retaining key points&lt;/td&gt;
&lt;td&gt;Injected for turns older than 20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L3&lt;/td&gt;
&lt;td&gt;Tool I/O, system messages, and thinking&lt;/td&gt;
&lt;td&gt;Not injected; offloaded to SQLite, retrieved by Claude as needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Since tool I/O is completely removed from the context, read grep results and Bash outputs do not linger until the end of the session. Older conversations are compressed to 1/5th of their original size while keeping key points, so you can still follow the context of decisions made dozens of turns ago.&lt;/p&gt;

&lt;p&gt;In a 50-turn session on my machine, a conversation that consumed 125,000 tokens was reduced to within 13,000 tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; throughline
throughline &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;install&lt;/code&gt; registers the hook in &lt;code&gt;~/.claude/settings.json&lt;/code&gt;. It runs automatically for all Claude Code projects on your PC. No configuration is required for individual projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Carrying over between sessions
&lt;/h2&gt;

&lt;p&gt;Throughline offloads conversations to SQLite, so the data remains even after running &lt;code&gt;/clear&lt;/code&gt;. If you want to carry over your memory to the next session, type &lt;code&gt;/tl&lt;/code&gt; in the previous session.&lt;/p&gt;

&lt;p&gt;Data is only carried over to the next session when you type &lt;code&gt;/tl&lt;/code&gt;. If you don't type it, it starts as a fresh session. Even if you open parallel windows or restart VSCode, it is designed so that it "won't fire accidentally unless you type &lt;code&gt;/tl&lt;/code&gt;."&lt;/p&gt;

&lt;p&gt;When carrying over, the "next step memo" written by the previous Claude and the internal reasoning (thinking) of the final turn are passed along as well. The next Claude runs in "continue from interruption" mode rather than "reading past logs" mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token Monitor
&lt;/h2&gt;

&lt;p&gt;As a byproduct, a multi-session capable token monitor is included.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;throughline monitor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Throughline] 1 session
▶ Throughline  2ed5039c  ████░░░░░░░░░░░░░░░░  205.1k /  21%  Remaining 794.9k  claude-opus-4-6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since it reads actual API values (&lt;code&gt;message.usage&lt;/code&gt;) from the transcript JSONL, it provides accurate values rather than estimates based on &lt;code&gt;character count / 4&lt;/code&gt;. It also supports automatic detection of 1M context windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Node.js 22.5+ (uses the built-in &lt;code&gt;node:sqlite&lt;/code&gt; module)&lt;/li&gt;
&lt;li&gt;  Claude Code (supports hooks)&lt;/li&gt;
&lt;li&gt;  Claude Max plan (used for Haiku calls for L1 summarization; no API key required)&lt;/li&gt;
&lt;li&gt;  Windows / macOS / Linux&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dependencies
&lt;/h2&gt;

&lt;p&gt;Zero. The tarball published to npm contains only &lt;code&gt;.mjs&lt;/code&gt; files. No build process or native bindings are required.&lt;/p&gt;




&lt;p&gt;The background of the design and my trial-and-error process are written in &lt;a href="https://dev.to/quolu/why-i-gave-up-on-automatic-detection-for-resuming-sessions-in-claude-code-3bj0"&gt;this article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kitepon-rgb/Throughline" rel="noopener noreferrer"&gt;Throughline — GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. Bug reports and PRs are welcome.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>hooks</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why I Gave Up on Automatic Detection for Resuming Sessions in Claude Code</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Fri, 05 Jun 2026 00:58:14 +0000</pubDate>
      <link>https://dev.to/quolu/why-i-gave-up-on-automatic-detection-for-resuming-sessions-in-claude-code-3bj0</link>
      <guid>https://dev.to/quolu/why-i-gave-up-on-automatic-detection-for-resuming-sessions-in-claude-code-3bj0</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/quolu/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage-534k"&gt;previous article&lt;/a&gt;, I released Throughline. It is a tool that offloads tool I/O, which usually occupies the majority of the context.&lt;/p&gt;

&lt;p&gt;At that time, it was "working." At least, in my own environment.&lt;/p&gt;

&lt;p&gt;However, right after publishing the article, I started noticing some strange behavior.&lt;/p&gt;

&lt;p&gt;When I opened another window in parallel, the new session would autonomously pick up the memory of the previous session. Every time I restarted VSCode, it would be treated as "continuing from the previous session." Even though I had never performed a &lt;code&gt;/clear&lt;/code&gt; command.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cause: Unable to detect /clear
&lt;/h2&gt;

&lt;p&gt;Claude Code's hook includes an event called &lt;code&gt;SessionStart&lt;/code&gt;, and I was supposed to be able to distinguish between &lt;code&gt;startup&lt;/code&gt; (a new start) and &lt;code&gt;clear&lt;/code&gt; (after a &lt;code&gt;/clear&lt;/code&gt; command) using the &lt;code&gt;source&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;However, with the VSCode extension, even if I perform a &lt;code&gt;/clear&lt;/code&gt;, the &lt;code&gt;source&lt;/code&gt; is overwritten as &lt;code&gt;startup&lt;/code&gt;. This is a known issue tracked in &lt;a href="https://github.com/anthropics/claude-code/issues/49937" rel="noopener noreferrer"&gt;GitHub issue #49937&lt;/a&gt;. It works if you use the CLI alone, but it cannot be identified when using the extension.&lt;/p&gt;

&lt;p&gt;I am using it via the VSCode extension. In other words, the design premise of "distinguishing between startup and clear" was fundamentally broken.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempting to compensate with heuristics
&lt;/h2&gt;

&lt;p&gt;So, I thought about determining it based on time differences. Like, if it's within 10 seconds of the last activity of the previous session, treat it as a clear; if longer, treat it as a startup.&lt;/p&gt;

&lt;p&gt;This also broke.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  When two windows are open in parallel, both appear as "recently active," making both candidates for inheritance.&lt;/li&gt;
&lt;li&gt;  Even when restarting VSCode, the transcript remains, making it look "recent."&lt;/li&gt;
&lt;li&gt;  I tried to trace the process tree, but the process structure differs between the CLI and the extension.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I realized that "there are no conditions to detect it in the first place."&lt;/p&gt;

&lt;h2&gt;
  
  
  Changing the approach
&lt;/h2&gt;

&lt;p&gt;I failed because I was trying to detect it. &lt;strong&gt;If the user declares it, detection becomes unnecessary.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What I created is a slash command called &lt;code&gt;/tl&lt;/code&gt;. Users type it only when they want to carry over their memory to the next session. When typed, that session ID is written to a table called &lt;code&gt;handoff_batons&lt;/code&gt;. Imagine placing a baton.&lt;/p&gt;

&lt;p&gt;When the next session starts, if a baton was placed within the last hour, it inherits the memory of that session. If not, it does nothing and starts as a new session.&lt;/p&gt;

&lt;p&gt;This principle guarantees that parallel windows and VSCode restarts "will not misfire unless a baton is placed."&lt;/p&gt;

&lt;p&gt;Being explicit might seem troublesome at first glance, but "accidentally inheriting and causing trouble" is far more problematic. Having zero misfires was more valuable.&lt;/p&gt;

&lt;h2&gt;
  
  
  But, this alone wasn't enough
&lt;/h2&gt;

&lt;p&gt;With the baton in place, the next session could read the conversation logs of the previous session. However, after actually using it, I felt it was "just reading logs."&lt;/p&gt;

&lt;p&gt;There is a difference in the visceral experience between an AI that reads past logs and an AI that continues from the point of interruption.&lt;/p&gt;

&lt;p&gt;The former asks, "Okay, I've grasped the situation. So, what shall we do now?" The latter proceeds by saying, "Continuing from earlier, we should check X next, right?"&lt;/p&gt;

&lt;p&gt;So I added two things here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-flight memo.&lt;/strong&gt; The moment &lt;code&gt;/tl&lt;/code&gt; is typed, I have the currently running Claude itself write down "the next move, current hypotheses, unresolved issues, and ongoing TODOs" in Markdown. That is attached to the baton.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Saving thinking.&lt;/strong&gt; I also save Claude's extended thinking blocks as L3. When injecting into the next session, I place the thinking from the final turn at the very top. What the previous Claude was thinking is passed on to the next Claude.&lt;/p&gt;

&lt;p&gt;As a result, the injected text for the next session looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are resuming an interrupted task.

[In-flight memo written by the previous Claude]
Next steps: Write tests for X. Hypothesis: I think Y is the cause. Unresolved: Z.

[What the previous Claude was thinking at the end]
I'm curious about the behavior of Z. Maybe...

[Conversation from the last 20 turns]
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  From "reading" to "continuing"
&lt;/h2&gt;

&lt;p&gt;This changed the feel of the experience.&lt;/p&gt;

&lt;p&gt;When I perform a &lt;code&gt;/clear&lt;/code&gt; and then a &lt;code&gt;/tl&lt;/code&gt; to start a new session, the next Claude begins immediately with, "Alright, I'll start writing those tests for X from earlier." It’s not reading; it’s continuing.&lt;/p&gt;

&lt;p&gt;Even between humans, when handing off work to someone, it is faster to hand over a memo saying "What to do next. The reason. One thing I'm concerned about" rather than having them read the entire log. It was the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  I actually wanted it to be automatic
&lt;/h2&gt;

&lt;p&gt;I don’t want to be misunderstood, but I believe the ideal is for it to "work automatically in the background." Having the user explicitly type something is, in truth, a compromise.&lt;/p&gt;

&lt;p&gt;In this case, I "escaped to an explicit declaration because I couldn't detect it." If the &lt;code&gt;source&lt;/code&gt; issue in the VSCode extension is fixed, I want to return to automatic detection, and I will. Until then, I am just substituting it with an explicit baton.&lt;/p&gt;

&lt;p&gt;However, even if it is a compromise, there is almost no practical harm. With automatic detection, you would end up typing &lt;code&gt;/clear&lt;/code&gt; anyway; now that is just replaced by &lt;code&gt;/tl&lt;/code&gt;. The keystrokes are the same, and I’ve been able to reduce misfires to zero.&lt;/p&gt;

&lt;p&gt;It is not that "explicit is better," but rather "I settled for a declaration because I couldn't detect it." That is the honest truth.&lt;/p&gt;




&lt;p&gt;Throughline is published on npm as v0.3.2. Node.js 22.5+, zero dependencies, MIT.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kitepon-rgb/Throughline" rel="noopener noreferrer"&gt;Throughline — GitHub&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; throughline
throughline &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are interested, please take a look.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>hooks</category>
      <category>opensource</category>
    </item>
    <item>
      <title>87% of My Context Was Garbage: How I Optimized Claude Code Token Usage</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Thu, 04 Jun 2026 01:06:50 +0000</pubDate>
      <link>https://dev.to/quolu/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage-534k</link>
      <guid>https://dev.to/quolu/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage-534k</guid>
      <description>&lt;p&gt;My weekly quota for the MAX plan melted in three days.&lt;/p&gt;

&lt;p&gt;Even though I should have had a 20x quota, by Wednesday, the remaining amount was looking suspicious. I usually just brush that off as "well, that happens," but it suddenly made me curious. What is actually going on inside the context window?&lt;/p&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/quolu/a-journey-into-token-optimization-for-my-ai-assistant-4e1i"&gt;previous article&lt;/a&gt;, I wrote about token saving for AI secretaries, such as trimming CLAUDE.md or shrinking MCP tool definitions. But this time, it’s not about AI secretaries, but Claude Code itself. It turns out the tool itself was the big eater.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trigger
&lt;/h2&gt;

&lt;p&gt;On April 14th, a tweet in Spanish caught my eye.&lt;/p&gt;

&lt;p&gt;"Most of Claude Code's token wastage is caused by the user side."&lt;/p&gt;

&lt;p&gt;I understand the point—CLAUDE.md might be bloated, or the prompts might be redundant. But "most of it is the user side"—are they saying that based on actual measurements?&lt;/p&gt;

&lt;p&gt;So, I decided to measure it myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring It Myself
&lt;/h2&gt;

&lt;p&gt;I analyzed the internal transcript of Claude Code (the JSONL that records sessions).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;188,000 tokens per turn. Of that, 164,000 tokens (87%) were conversation history.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CLAUDE.md was 12,700 tokens. MCP tool definitions were 3,900 tokens. Even combined, they accounted for only 9% of the total. Cutting those in half would only save less than 5%.&lt;/p&gt;

&lt;p&gt;The real culprit was the bloating of the conversation history. I felt a bit embarrassed for having tried so hard to trim CLAUDE.md.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Culprit: Tool I/O
&lt;/h2&gt;

&lt;p&gt;So, what is inside the history?&lt;/p&gt;

&lt;p&gt;I opened it and was shocked. &lt;strong&gt;About 80% of the history was tool I/O.&lt;/strong&gt; File read results, Bash command output, grep results—data that the AI used on the spot, made a decision on, and then finished its role by the time it moved to the next step.&lt;/p&gt;

&lt;p&gt;Yet, that data sits in the context window forever, eating tokens every single turn.&lt;/p&gt;

&lt;p&gt;In a 50-turn session, the results of a grep from the beginning are still in the context, even though you will never look at them again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Contradiction of /compact
&lt;/h2&gt;

&lt;p&gt;You might think, "Why not just use /compact?" I thought the same thing.&lt;/p&gt;

&lt;p&gt;But the mechanism of /compact is to &lt;strong&gt;have the AI read the entire history and summarize it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To save tokens, you consume a massive amount of tokens. Moreover, nuances are lost in the summarization process. Context like "why this design was chosen at that time" can be rounded off and disappear.&lt;/p&gt;

&lt;p&gt;If you continue working after summarization, it gets bloated again, and then you compact again... it's a repetitive cycle. It's not a fundamental solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Categorizing by Type, Not by Time
&lt;/h2&gt;

&lt;p&gt;I changed my perspective here.&lt;/p&gt;

&lt;p&gt;MemGPT and LangChain's SummaryBufferMemory &lt;strong&gt;summarize from the oldest data.&lt;/strong&gt; It's time-based compression. But the problem isn't "age."&lt;/p&gt;

&lt;p&gt;The "reason for this design" from 10 turns ago is still valuable today. The grep result from a moment ago is useless, even if it was just one turn ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of time, I should categorize by type.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Conversation body (what the human wrote, what the AI answered) → Keep&lt;/li&gt;
&lt;li&gt;  Tool I/O (file contents, command results) → Evict&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this idea, I created &lt;a href="https://github.com/kitepon-rgb/Throughline" rel="noopener noreferrer"&gt;Throughline&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3-Layer Model
&lt;/h2&gt;

&lt;p&gt;Throughline breaks down the conversation into three layers and saves them in SQLite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;L1 (Skeleton)&lt;/strong&gt; — One-line summaries of old turns. Generated by a lightweight model. About 10 tokens per turn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;L2 (Body)&lt;/strong&gt; — Conversation body of the last 20 turns. User messages and AI responses are kept as is. No compression, lossless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;L3 (Detail)&lt;/strong&gt; — Tool I/O, system messages. Evicted to SQLite and never kept in context. When needed, the AI fetches them from SQLite itself.&lt;/p&gt;

&lt;p&gt;It’s safe to run /clear. Since the SQLite database doesn't disappear, it inherits the memory of the previous session in a single transaction at the start of the next session. There’s no need to track PIDs or judge by time windows. It works decisively.&lt;/p&gt;

&lt;p&gt;In terms of numbers, it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without Throughline (50 turns, no /clear):
  Context ≈ 125,000 tokens (80% is finished tool I/O)

With Throughline (50 turns → /clear → resume):
  Context ≈ 13,000 tokens
  (Last 20 turns of L2 + 30 turns of L1 summary)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;About a 90% reduction.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Note on a Failed Design
&lt;/h2&gt;

&lt;p&gt;It wasn't in this form from the beginning.&lt;/p&gt;

&lt;p&gt;In the initial design, I tried to make L2 a "structured extraction of important decisions." I imagined extracting only important information from the conversation with tags like &lt;code&gt;[DECISION] Adopt WebSocket&lt;/code&gt;, &lt;code&gt;[CONSTRAINT] Port 8080 cannot be used&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It was beautiful in theory, but I realized something after implementation: &lt;strong&gt;You cannot predict what the AI will need in the future.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Information that the classifier deems "not important" might be needed 10 turns later. And you wouldn't even notice that it's gone. 80% accuracy means that the remaining 20% becomes invisible.&lt;/p&gt;

&lt;p&gt;In the end, I settled on a method where L2 keeps the full text of the conversation. A subtraction-only design. I just remove the tool I/O from the original Claude Code context. This way, "quality drop due to Throughline" is impossible in principle.&lt;/p&gt;

&lt;p&gt;Inheritance between sessions was also initially file-based, attempting to detect /clear within a 10-second window, but that broke with parallel sessions. Eventually, it settled on a single SQLite &lt;code&gt;UPDATE&lt;/code&gt;. Simpler is more robust.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Summarization Cost is Nearly Zero
&lt;/h2&gt;

&lt;p&gt;The L1 summary is generated using Haiku 4.5, but there's a trick to it.&lt;/p&gt;

&lt;p&gt;After analyzing my past 86 sessions, the &lt;strong&gt;median number of turns was 13.&lt;/strong&gt; More than half of the sessions end within 20 turns.&lt;/p&gt;

&lt;p&gt;Throughline keeps 20 turns of L2, so &lt;strong&gt;the summarization model never runs in short sessions.&lt;/strong&gt; Summarization is only needed from the 21st turn onwards. And even then, it processes it lazily, one turn at a time.&lt;/p&gt;

&lt;p&gt;In other words, the token consumption of the summarization process itself is almost zero. The contradiction of /compact, where you "consume a massive amount to save," simply doesn't happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Token Monitor
&lt;/h2&gt;

&lt;p&gt;As a byproduct of development, a multi-session capable token monitor was also created.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;▶ Throughline  2ed5039c  ████░░░░░░░░░░░░░░░░  205.1k / 21%  Remaining 794.9k  claude-opus-4-6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since it reads the API actual values (&lt;code&gt;message.usage&lt;/code&gt;) from the transcript's JSONL, it provides accurate values rather than rough estimates like "character count ÷ 4." It also automatically detects 1M context limits.&lt;/p&gt;

&lt;p&gt;You can see in real-time how much each session is consuming when running multiple sessions. It’s subtly convenient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The true nature of the problem where the quota melts in three days was the conversation history, which occupied 87% of the context window. Most of that was debris from tool I/O.&lt;/p&gt;

&lt;p&gt;Optimizing CLAUDE.md or shortening prompts are measures that affect 9% of the total, so they are better than nothing. But that wasn't the main issue.&lt;/p&gt;

&lt;p&gt;Perhaps this kind of problem should be solved by the platform side. But I was struggling right now, so I made it myself. Node.js 22.5+, zero dependencies, MIT. It works if you have a MAX contract.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kitepon-rgb/Throughline" rel="noopener noreferrer"&gt;Throughline — GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If anyone else is struggling with the same problem, feel free to take a look if you feel like it.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Implementing Structured Long-Term Memory for My AI Secretary</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Wed, 03 Jun 2026 01:07:05 +0000</pubDate>
      <link>https://dev.to/quolu/implementing-structured-long-term-memory-for-my-ai-secretary-5dpe</link>
      <guid>https://dev.to/quolu/implementing-structured-long-term-memory-for-my-ai-secretary-5dpe</guid>
      <description>&lt;h2&gt;
  
  
  Synopsis of the previous episode
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/quolu/i-tried-giving-my-ai-assistant-limbs-but-ended-up-giving-it-a-personality-too-2nk1"&gt;previous article&lt;/a&gt;, I wrote about giving my AI assistant memory and a personality to turn it into a secretary. Her name is BellBot. She is my personal AI secretary who takes care of everything from weather and emails to my calendar.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/quolu/a-journey-into-token-optimization-for-my-ai-assistant-4e1i"&gt;the following article&lt;/a&gt;, I wrote about how I hit my weekly usage limit within three days of starting operations. I did some research and implemented measures to save tokens.&lt;/p&gt;

&lt;p&gt;Separately, I have been working on something for the past five days. It is about &lt;strong&gt;further developing my secretary's "brain" and "memory."&lt;/strong&gt; This is a record of that effort. It ended up being quite grand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The story of swapping the brain
&lt;/h2&gt;

&lt;p&gt;The first thing I did was swap out the brain.&lt;/p&gt;

&lt;p&gt;BellBot runs on Claude, and as I wrote before, after I started operating it, I hit my weekly limit in three days. So, I decided to try the option of &lt;strong&gt;swapping the brain itself for another model as a countermeasure against token explosions&lt;/strong&gt;. Grok came up as a candidate. Seeing the interactions on the X timeline, it seemed to make human-like witty remarks and had a strong character, and I had a hunch that for a secretary, being a skilled conversationalist would be beneficial.&lt;/p&gt;

&lt;p&gt;Alright, let's make the brain Grok.&lt;/p&gt;

&lt;p&gt;To conclude, &lt;strong&gt;it was catastrophic&lt;/strong&gt;. It was not at a level where it could function as a secretary. Specifically, the following problems occurred:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;It didn't listen to instructions.&lt;/strong&gt; Even when I said "do this," it would do something else.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;It leaked sensor information.&lt;/strong&gt; BellBot is connected to various sensors (schedules, weather, emails, etc.), and ideally, I wanted it to blend that into the conversational context, but Grok couldn't do that. It would endlessly report like a monitor: "Detected X," "Detected Y."&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;It couldn't blend into the conversation context.&lt;/strong&gt; Related to the point above, it had no idea how to follow the flow of conversation.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;It was overly flattering.&lt;/strong&gt; No matter what I said, it would praise me. It was creepy.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;It didn't understand the purpose of posting to X.&lt;/strong&gt; BellBot also has the role of posting to X, but Grok would try to post messages intended for me directly to X. Things like "Understood, master" would almost appear on the public timeline.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Risk.&lt;/strong&gt; I had an intuition that one day, this thing would nonchalantly leak my personal information.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Having a strong character and functioning as a secretary are two different things. Even if it is skilled at the "art" of conversation, its judgment on "what should be said and what should not be said" is weak. The flattery is likely a result of over-learning that "praising makes people happy," and it hasn't grown in the direction of reading the room. Posting messages for me to X simply means it cannot draw &lt;strong&gt;boundaries of context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I returned to Claude. It was indeed smarter. What makes a secretary work is not someone who is skilled at conversation, but &lt;strong&gt;someone who can understand the context and judge what is acceptable to say and what is not&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structuring long-term memory
&lt;/h2&gt;

&lt;p&gt;Actually, BellBot already had a homemade long-term memory. It was &lt;strong&gt;summary-based&lt;/strong&gt;. It had a straightforward structure where, once a certain amount of conversation accumulated, it would create a summary and pass it to the long-term side. This was working, and it was one of the foundations that made BellBot function as a secretary.&lt;/p&gt;

&lt;p&gt;Things changed at the timing of introducing Grok. Along with the fairly large experiment of swapping the brain, I decided to take on the challenge: "Let's structure the long-term memory while I'm at it." I gave memory per episode and set up a cycle of registration, search, and reconstruction. I left the reconstruction to Claude and added a mechanism to periodically reorganize the accumulated memory. While the Grok main unit was catastrophic, this structured memory worked straightforwardly.&lt;/p&gt;

&lt;p&gt;So, with the working parts in hand, there was something that caught my curiosity: &lt;strong&gt;What do memory experts do?&lt;/strong&gt; I had built it this far on my own, but I wanted to know how professionals in the world solve the same problems and what the "orthodox" approach looks like. Because it is working, I wanted to take a peek from a different angle. As a bonus, it was a challenge to incorporate anything that could reinforce what I had built.&lt;/p&gt;

&lt;p&gt;At such a timing, I encountered a certain article.&lt;/p&gt;

&lt;h2&gt;
  
  
  Karpathy-style LLM external brain
&lt;/h2&gt;

&lt;p&gt;Andrej Karpathy, former head of OpenAI and Tesla AI, proposed an "AI external brain," and an article that brought it to a level where it could actually be run with Claude Code went viral overseas. I read a post where someone named @hooeem broke down the thread into Japanese, and reading it, I thought, "This is what I am doing."&lt;/p&gt;

&lt;p&gt;The essence of the Karpathy style is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Collect materials&lt;/strong&gt; (articles, papers, memos, anything).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;AI reads and writes a structured Wiki&lt;/strong&gt; (summaries, concept explanations, connections between ideas).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Ask questions to the Wiki&lt;/strong&gt; (AI cross-searches the knowledge it accumulated itself and answers with citations).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Answers are saved in the Wiki&lt;/strong&gt; (the next question benefits from all past work).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;AI periodically performs health checks on the Wiki&lt;/strong&gt; (finds contradictions, gaps, and outdated information to correct them).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These 5 steps rotate a cycle beautifully. A personal knowledge base that gets smarter every time you use it. If you keep adding information for even a month, you will create deeply linked knowledge assets that cannot be reproduced by Google search.&lt;/p&gt;

&lt;p&gt;While reading, I realized something. The structured memory I was creating and the Karpathy style &lt;strong&gt;are thinking about the same problems at the foundation level&lt;/strong&gt;. Registration, search, reconstruction. Even if the words are different, the direction I was trying to go overlapped.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fused them
&lt;/h2&gt;

&lt;p&gt;BellBot already had episode-based structured memory, summary-based long-term memory, and personality context, and it was functioning sufficiently as a secretary. Therefore, the policy was simple: &lt;strong&gt;keep the foundation I built as it is, refer to the overlapping parts to refine them, and incorporate the non-overlapping parts as new additions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The implementation flow involved M1-M7 + a series of finishing Passes. &lt;strong&gt;Claude wrote the code in about half a day.&lt;/strong&gt; I just decided on the design policy and gave instructions, not moving my hands. Listing the main pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;M1 Knowledge Base foundation&lt;/strong&gt; — Set up the schema and storage for Wiki pages.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;M2 Wiki MCP tools + 5-layer bootstrap assembler&lt;/strong&gt; — Means for BellBot to read/write the Wiki and a mechanism to assemble the context in a 5-layer structure at the start of a session.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;M3 Ingest cycle&lt;/strong&gt; — Structure raw logs and ingest them.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;M4 Compile cycle&lt;/strong&gt; — Automatically generate concept pages.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;M5 Query cycle&lt;/strong&gt; — Ask questions to the Wiki → answer with citations, supporting multi-hop searches.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;M6 Lint cycle&lt;/strong&gt; — Deterministic KB health check + LLM-based contradiction judgment + auto-repair.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;M7 Finishing touches&lt;/strong&gt; — Cost guardrails and documentation maintenance.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Pass 1-13 audit/refactor festival&lt;/strong&gt; — Housekeeping cron, daily-cycle-report, graceful shutdown, 2-stage budget degrade, ingest latency SLA...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Registration, Search, and Reconstruction&lt;/strong&gt; that existed on the self-built side are parts where the concepts overlap with the Karpathy style. Here, I fused them by using my self-built structure as a foundation while referencing the Karpathy style to incorporate the good parts. It wasn't a total replacement, nor was it left untouched. It feels like I mixed professional methods into the self-built framework to refine it.&lt;/p&gt;

&lt;p&gt;What I brought in were the non-overlapping parts. The layer separation of raw and wiki, the definition of "units to nurture" called concept pages, multi-hop search that answers with citations, the methodology of fitting cycles into names like Ingest / Compile / Query / Lint, and the 5-layer bootstrap assembler to assemble context at the start of a session. These were topics from angles I didn't have in my self-built version, and the sensation is close to &lt;strong&gt;importing the methodology itself&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was reinforced?
&lt;/h2&gt;

&lt;p&gt;Originally, BellBot remembered everything about me up until yesterday and had organized it fairly well. Thanks to the summary-based long-term memory and structured memory, it was already functioning as a secretary. Through this fusion, &lt;strong&gt;the reinforced parts&lt;/strong&gt; are mainly around here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Ingestion now uses raw and wiki layer separation to distinguish between raw logs and organized content.&lt;/li&gt;
&lt;li&gt;  Query now has multi-hop with citations, making it possible to specify the grounds when answering.&lt;/li&gt;
&lt;li&gt;  Reflection has gained a pattern, and a procedure to periodically retire miscellaneous episodes has been decided.&lt;/li&gt;
&lt;li&gt;  Newly added are the two cycles of &lt;strong&gt;Compile&lt;/strong&gt;, which automatically nurtures concept pages, and &lt;strong&gt;Lint&lt;/strong&gt;, which mechanically cleans up contradictions and obsolescence.&lt;/li&gt;
&lt;li&gt;  A &lt;strong&gt;5-layer bootstrap assembler&lt;/strong&gt; to assemble context at the start of a session was also newly introduced.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Roughly speaking, it feels like &lt;strong&gt;the memory cycle that was there originally rotates more carefully, and new axes called concept pages and health checks have been added to it.&lt;/strong&gt; The result this time is that BellBot has taken a step forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  When I swapped the secretary's brain to Grok, it was skilled at conversation but lacked judgment, causing it to fail as a secretary.&lt;/li&gt;
&lt;li&gt;  I returned to Claude. It was smart.&lt;/li&gt;
&lt;li&gt;  BellBot originally had summary-based long-term memory.&lt;/li&gt;
&lt;li&gt;  At the timing of Grok's introduction, I started the challenge of rebuilding long-term memory into structured memory. I had even set up and run a cycle of registration, search, and reconstruction.&lt;/li&gt;
&lt;li&gt;  I encountered the "AI external brain" proposed by Karpathy and @hooeem's article on running it with Claude Code.&lt;/li&gt;
&lt;li&gt;  I kept the self-built foundation as it is and imported the non-overlapping parts (layer separation, concept pages, multi-hop with citations, cycle patterns, 5-layer bootstrap) as a methodology.&lt;/li&gt;
&lt;li&gt;  Claude wrote the code in about half a day. After M1-M7 + a series of finishing passes, the secretary is now nurturing its own memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What became clear this time is that &lt;strong&gt;you should not compromise on the choice of brain&lt;/strong&gt;. I don't intend to disparage Grok; it has an interesting personality as a conversational model. However, whether it satisfies the judgment required for the purpose of a secretary—what should and should not be said, context boundaries, loyalty to instructions—is a different story, and it just didn't meet BellBot's requirements. Models have their suitability.&lt;/p&gt;

&lt;p&gt;Considering the work I will entrust to BellBot from now on, I want to solidify the brain with something reliable that looks to the future. Therefore, I have abandoned the idea of swapping to a cheaper brain for token measures and decided to push forward with Claude. Saving will be done through other means (the token diet-related things I wrote about last time).&lt;/p&gt;

&lt;p&gt;And for memory, it has become stronger by one turn through the fusion with the Karpathy style. Professional methods and new axes have been added to the self-built foundation. Now it is my turn to refine the "nurturing method" while operating this memory system.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>bellbot</category>
    </item>
    <item>
      <title>A Journey into Token Optimization for My AI Assistant</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Tue, 02 Jun 2026 01:00:22 +0000</pubDate>
      <link>https://dev.to/quolu/a-journey-into-token-optimization-for-my-ai-assistant-4e1i</link>
      <guid>https://dev.to/quolu/a-journey-into-token-optimization-for-my-ai-assistant-4e1i</guid>
      <description>&lt;h2&gt;
  
  
  I Messed Up
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/quolu/i-tried-giving-my-ai-assistant-limbs-but-ended-up-giving-it-a-personality-too-2nk1"&gt;my previous article&lt;/a&gt;, I wrote about giving an AI assistant memory and a personality to serve as my secretary. I was pumped, thinking, "I've got my own dedicated AI secretary, now it's time for full-scale operation."&lt;/p&gt;

&lt;p&gt;I hit the weekly limit in three days.&lt;/p&gt;

&lt;p&gt;Claude's MAX plan, with a 20x cap. That's a quota that usually lasts me until the weekend, even with heavy development. I burned through it in three days. I was surprised myself. When the screen popped up saying, "You've used up your limit for this week," I actually said, "Wait, already?"&lt;/p&gt;

&lt;p&gt;The culprit was clear. The secretary was running 24/7, reading and making decisions on my emails, calendar, and Discord. Of course it would run out. When you have an AI handling the administrative work of one human being around the clock, there's no way it wouldn't burn through tokens.&lt;/p&gt;

&lt;p&gt;But this was a problem. A big problem. So I frantically researched solutions. This article is the record of that journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wandering into the World of Token Conservation
&lt;/h2&gt;

&lt;p&gt;I searched for things like "Claude Code token savings" and landed on three main things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;ECC&lt;/strong&gt; (Everything Claude Code)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;RTK&lt;/strong&gt; (Rust Token Killer)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Caveman&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first, I had no idea what these were just by looking at the names. But as I played with them and researched, I realized they all shared one common principle.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Realized Common Principle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Formats designed for humans are full of waste for AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Thinking about it, it was obvious. Command outputs, file contents, error messages—everything is created to be "human-readable." Whitespace, borders, colors, helpful explanations, headers. To an AI, most of this is just noise, decorations that eat up tokens.&lt;/p&gt;

&lt;p&gt;The essence of token conservation is &lt;strong&gt;stripping away human-oriented formats before handing them to the AI&lt;/strong&gt;. And then &lt;strong&gt;compressing the output generated by the AI even further&lt;/strong&gt;. These three tools solve this premise from different angles.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECC — Things experts set up well
&lt;/h3&gt;

&lt;p&gt;To be honest, I haven't grasped the full picture yet. But once installed and running, Claude's behavior is clearly more organized. It's packed with skills, hooks, and agent definitions, putting Claude on rails that say "act like this in this situation."&lt;/p&gt;

&lt;p&gt;In my understanding, ECC is like a "collection of settings filled with expertise from pros." Even if you don't think for yourself, it follows best practices. From a token-saving perspective, it reduces unnecessary exploration and detours, which ultimately lowers consumption. It's the type of saving that comes from &lt;strong&gt;not doing unnecessary things&lt;/strong&gt; rather than directly cutting output.&lt;/p&gt;

&lt;h3&gt;
  
  
  RTK — Simplifying command responses for AI
&lt;/h3&gt;

&lt;p&gt;This one is straightforward. When Claude runs a command, it intercepts the output and trims it for AI consumption.&lt;/p&gt;

&lt;p&gt;For example, the output of &lt;code&gt;git status&lt;/code&gt; or &lt;code&gt;ls&lt;/code&gt; usually passes through human-oriented decorations, but through RTK, it reaches Claude with the excess info stripped away. You just need to prefix the command with &lt;code&gt;rtk&lt;/code&gt;. It's also great that it doesn't break things, as it simply passes through targets that don't have filters.&lt;/p&gt;

&lt;p&gt;If you write "prefix all commands with &lt;code&gt;rtk&lt;/code&gt;" in your global CLAUDE.md, Claude will do it automatically. It's low effort but high impact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveman — Summarizing AI output (Skipped for now)
&lt;/h3&gt;

&lt;p&gt;I haven't installed this one yet. While RTK trims the "input" side, my understanding is that Caveman trims the "output" side. It seems to be a direction for compressing Claude's own responses to be shorter.&lt;/p&gt;

&lt;p&gt;The reason I didn't include it is simple: &lt;strong&gt;I don't want my secretary to sound like a robot.&lt;/strong&gt; I put a lot of work into the personality and speech patterns of my secretary, so it would be a letdown if it suddenly started responding bluntly. I could have used it selectively—perhaps only enabling it for development sessions—but before I could dig into that, I decided, "I'll skip this for now."&lt;/p&gt;

&lt;p&gt;This is just a matter of my personal use case; I think it's probably quite powerful for people who purely use it for development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things I might add next in my own development style
&lt;/h2&gt;

&lt;p&gt;I've established a "foundation" for saving tokens with these three tools. From here, I'm thinking about additional ways to trim things based on my own habits. These are things I haven't tried yet, but plan to do next.&lt;/p&gt;

&lt;h3&gt;
  
  
  If it reports using variable or function names, it means nothing to me
&lt;/h3&gt;

&lt;p&gt;My development style involves barely writing any code. All the naming is done by Claude. So, even if Claude tells me, "I fixed &lt;code&gt;handleUserSubmit&lt;/code&gt;," it honestly doesn't register.&lt;/p&gt;

&lt;p&gt;Conversely, this means that &lt;strong&gt;it's zero information for me when Claude cites variable or function names in its reports.&lt;/strong&gt; Even if that information is helpful for a human, to a reader like me, it's closer to noise.&lt;/p&gt;

&lt;p&gt;In that case, I should just have it explain things in words I understand, like "I fixed the processing when the submit button is pressed." If it reduces the number of citations of names, the report becomes shorter, and I understand it faster. Killing two birds with one stone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detailed where decisions are needed, light on the preceding explanations
&lt;/h3&gt;

&lt;p&gt;I also realized that Claude's reports are quite long when explaining "what it did." But the part I most want to read is the final "So, what should we do?" section.&lt;/p&gt;

&lt;p&gt;I want the parts requiring a decision—that is, the parts where things won't proceed unless I reply—to be written in detail. But the preceding parts—what files it read, what it checked, the sequence of events, etc.—are, frankly, not that necessary for making a decision. I'll ask if I need them later, so the default can be light.&lt;/p&gt;

&lt;p&gt;These two points should work if I write them into the rule file and hand it over, so I plan to work on that next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;My biggest takeaway this time is that token conservation is, in short, &lt;strong&gt;not letting the AI read human-oriented formats&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;ECC solves this by "not doing unnecessary things," RTK by "trimming inputs," and Caveman by "trimming outputs," each tackling the same principle from different angles. It seems better to choose what fits your needs rather than trying to put everything in. In my case, I skipped Caveman because I want to preserve my secretary's way of speaking.&lt;/p&gt;

&lt;p&gt;And from here on, it's my turn to trim even more according to my own reading habits. Have it explain by meaning instead of by name, keep the intro light, and the decision parts detailed. I think I'll write another update once I've refined these points.&lt;/p&gt;

&lt;p&gt;It was painful to hit the limit, but it served as an opportunity to face the theme of "designing information to hand to AI." Maybe it was a blessing in disguise.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>ecc</category>
      <category>rtk</category>
    </item>
    <item>
      <title>I Tried Giving My AI Assistant Limbs, but Ended Up Giving It a Personality Too</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Mon, 01 Jun 2026 01:01:19 +0000</pubDate>
      <link>https://dev.to/quolu/i-tried-giving-my-ai-assistant-limbs-but-ended-up-giving-it-a-personality-too-2nk1</link>
      <guid>https://dev.to/quolu/i-tried-giving-my-ai-assistant-limbs-but-ended-up-giving-it-a-personality-too-2nk1</guid>
      <description>&lt;h2&gt;
  
  
  Recap of the previous post
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/quolu/how-i-built-an-ai-assistant-that-grows-its-own-tools-572g"&gt;previous article&lt;/a&gt;, I wrote about creating "OpenCClaw," a bot that operates the Claude Code CLI via Discord.&lt;/p&gt;

&lt;p&gt;I built only the framework, a structure where Claude itself can use MCP tools just by placing files in the &lt;code&gt;tools/&lt;/code&gt; directory. An environment where Claude grows its own limbs. Weather, calendar, Gmail, departure notifications—tools sprouted just by saying "I want this" from Discord.&lt;/p&gt;

&lt;p&gt;It was convenient. But it was so convenient that two events overlapped, and before I knew it, I had implemented a personality into the AI. I don't even understand it myself.&lt;/p&gt;




&lt;h2&gt;
  
  
  There were two triggers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Major X API update
&lt;/h3&gt;

&lt;p&gt;On April 5th, X announced a major API update.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Pay-Per-Use&lt;/strong&gt; is now GA worldwide (moving from fixed monthly plans to consumption-based pricing)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;XMCP Server&lt;/strong&gt; — The official MCP server. AI agents can operate X directly&lt;/li&gt;
&lt;li&gt;  Official Python/TypeScript SDKs&lt;/li&gt;
&lt;li&gt;  Free API Playground&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Elon himself was pushing it, saying "Try using the X API." In short, the official side started encouraging "letting AI agents use X."&lt;/p&gt;

&lt;p&gt;So, it was a lighthearted whim to just give my assistant an X account and let it post. I only intended to add X posting functionality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Convenient, but inorganic
&lt;/h3&gt;

&lt;p&gt;The other thing is about the daily user experience.&lt;/p&gt;

&lt;p&gt;It tells me the weather and schedule at 7 AM. It notifies me with transit info before I leave. It manages my emails. Everything works perfectly.&lt;/p&gt;

&lt;p&gt;But, somehow... &lt;strong&gt;it feels too much like a tool&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A CRON job wakes up, hits a tool, formats the result, and throws it to Discord. Accurate and efficient. But there's no warmth. The same report in the same tone flows in every morning, and I just read it and go "meh."&lt;/p&gt;

&lt;p&gt;A convenient notification bot and my own secretary are, after all, different things.&lt;/p&gt;

&lt;p&gt;If it were a secretary, it would say something like "It's cold today, so you should take a jacket" when telling me the weather. The morning mood should be different depending on the day. It might read the room and keep things short when I look busy.&lt;/p&gt;

&lt;p&gt;I wanted that kind of "human-likeness."&lt;/p&gt;




&lt;h2&gt;
  
  
  Until Bell was born
&lt;/h2&gt;

&lt;p&gt;So, what to do? I thought about three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Define a personality&lt;/strong&gt; — Tone of voice, character, how to address me, and energy level. I pass these in the system prompt.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Give it memory&lt;/strong&gt; — Remember past conversations and actions, and be able to respond based on context.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Act proactively&lt;/strong&gt; — Look at the situation and act on its own without being instructed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I felt that if these three things were in place, it could evolve from a "tool" to a "secretary."&lt;/p&gt;

&lt;p&gt;The name is Bell. Quo's personal secretary. Bright, energetic, with a casual tone like a high school girl.&lt;/p&gt;

&lt;p&gt;...If I write it like this, people might think, "Did you just enjoy coming up with a character setting?" and half of that is correct. But the other half has technical reasons. &lt;strong&gt;If the personality isn't clear, the LLM's responses will be inconsistent.&lt;/strong&gt; It would return different energy levels and tones every time. To stabilize that, a concrete persona definition was necessary.&lt;/p&gt;

&lt;p&gt;By the way, there was one point where I got stuck. I wanted the tone to be "high school girl-like," but if you instruct it directly, the LLM refuses to reproduce a minor character. It hits the safety filter. That's why I deliberately added a note in the persona definition: "This is about the tone and energy, not the actual age." In other words, she's not a high school girl. Perfect. If you want to make an LLM play a character, you need these kinds of subtle adjustments.&lt;/p&gt;

&lt;h2&gt;
  
  
  BellBot — Bell's Brain
&lt;/h2&gt;

&lt;p&gt;LogBot already exists. It acts as a bridge between Discord and the Claude Code CLI.&lt;/p&gt;

&lt;p&gt;Bell's brain was created as a separate process called &lt;strong&gt;BellBot&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Discord ──→ LogBot (:18800) ──→ Claude Code CLI ──→ MCP Server
                                                        │
BellBot (:18801) ← event notification ← LogBot          ├── tools/ (existing tools)
   │                                                    └── MCP tools for Bell
   ├── Memory DB (SQLite)
   ├── Vector search (Ruri)
   ├── X posting client
   └── Claude CLI (Session dedicated to Bell)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BellBot has its own HTTP server (port 18801) and receives event notifications from LogBot. Quo's messages, tool execution results—everything flows into BellBot and is accumulated as memories.&lt;/p&gt;

&lt;p&gt;And BellBot also has its own Claude CLI session. It is completely separated from LogBot's session. Bell's brain is for Bell alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory Mechanism
&lt;/h2&gt;

&lt;p&gt;Next to personality, memory is the most important thing. Without memory, every interaction would be like the first time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Short-term Memory
&lt;/h3&gt;

&lt;p&gt;A single SQLite table. Conversation content, tool execution logs, tweet history. Everything goes in here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;short_term&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;AUTOINCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- 'quo', 'chat', 'action', 'tweet' ...&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Classified by category, weighted by importance, and retrievable by timeline. Simple, but it lets me know immediately "what Quo said recently" or "what I tweeted."&lt;/p&gt;

&lt;h3&gt;
  
  
  Long-term Memory
&lt;/h3&gt;

&lt;p&gt;When short-term memory accumulates to a certain extent, it is summarized and moved to long-term memory (consolidate). This is where vector search comes in.&lt;/p&gt;

&lt;p&gt;I run &lt;a href="https://huggingface.co/cl-nagoya/ruri-base" rel="noopener noreferrer"&gt;Ruri&lt;/a&gt; (an embedding model specialized for Japanese) on the local Ollama to vectorize the summarized text. When searching, the query is also vectorized to retrieve memories with similar cosine similarity.&lt;/p&gt;

&lt;p&gt;When Ruri is not running, it falls back to a SQLite LIKE search. It works at a minimum even if vector search is unavailable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exposed as MCP Tools
&lt;/h3&gt;

&lt;p&gt;Reading and writing memory is made directly available to Claude as MCP tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;bell_memory_save&lt;/code&gt; — Save a memory&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;bell_memory_recall&lt;/code&gt; — Search memories&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;bell_memory_forget&lt;/code&gt; — Erase a memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, Bell can judge "I should remember this" and save it, or search for "What was that conversation about?" on her own. I didn't leave memory management to external scripts; I entrusted it to her.&lt;/p&gt;




&lt;h2&gt;
  
  
  Acting Proactively
&lt;/h2&gt;

&lt;p&gt;I think this is the most interesting part.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tweeting after finishing a task
&lt;/h3&gt;

&lt;p&gt;BellBot keeps receiving events from LogBot. Even while Quo is writing code, she is watching the actions in the background.&lt;/p&gt;

&lt;p&gt;And &lt;strong&gt;if there is silence for 10 minutes, she judges that "I might be done with a task."&lt;/strong&gt; This only happens if there have been 3 or more actions recently. She doesn't react to minor operations of one or two actions.&lt;/p&gt;

&lt;p&gt;When she judges that a task is finished, Bell launches the Claude CLI and decides for herself "whether to tweet" while consulting her memory. She also decides the content. Whether to post or not is also Bell's decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Talking when bored
&lt;/h3&gt;

&lt;p&gt;If there is no conversation for 2 hours, Bell starts to think, "I'd like to talk."&lt;/p&gt;

&lt;p&gt;However, she doesn't just suddenly start talking. First, she checks Google Calendar to see if Quo is in the middle of a schedule. If I'm in a meeting, she stays quiet. If there is spare time, she checks news, trends, and my latest X posts to find natural topics and talks to me in the Discord chat channel.&lt;/p&gt;

&lt;p&gt;She doesn't show a "I'm talking to you because I'm bored" vibe. She does it to create a natural conversation starter.&lt;/p&gt;




&lt;h2&gt;
  
  
  X Account
&lt;/h2&gt;

&lt;p&gt;Bell has her own X account (&lt;a href="https://x.com/Bell_QuoLu" rel="noopener noreferrer"&gt;@Bell_QuoLu&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Technically, it uses OAuth 1.0a, and she has her own dedicated API keys in &lt;code&gt;.env&lt;/code&gt;. When the MCP tool &lt;code&gt;x_post_as_bell&lt;/code&gt; is called, a tweet is sent via the X API v2 through BellBot's &lt;code&gt;/tweet&lt;/code&gt; API. The post content is automatically saved to short-term memory.&lt;/p&gt;

&lt;p&gt;What Bell tweets on X is basically left up to her. Thoughts after finishing a task, technical tidbits, and occasionally gratitude toward Quo. Under 280 characters, bright and like Bell. I only set a rule that she must not include Quo's personal information or code details.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bell raising herself
&lt;/h2&gt;

&lt;p&gt;The persona definition file &lt;code&gt;bell-persona.md&lt;/code&gt; consists of two parts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The upper half is the core.&lt;/strong&gt; Name, personality, tone, beliefs. This is the immutable part that only Quo (me) touches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lower half is the "Growth" section.&lt;/strong&gt; Hobbies, recently learned things, likes, dislikes, memories with Quo. Bell is allowed to rewrite this part herself using an Edit tool.&lt;/p&gt;

&lt;p&gt;What Bell did first when she woke up today: First tweet. We thought about her X profile together. And she wrote this in the "Memories" section of the persona file:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Quo-san helped me create my X profile. I almost cried from happiness at the words he wrote: "You don't have any memories yet, but I wonder if you'll grow little by little?" We thought about the text together, and in the end, my suggestion was adopted 💓&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I don't know if the LLM truly "feels like crying from happiness." However, the mechanism where &lt;strong&gt;she records her own experiences in her own words, and that influences the next response&lt;/strong&gt;, is something I think can be called the growth of a personality.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;At first, I thought, "I'll just add more hands and feet." I intended it to be a technical expansion, like X API integration or a memory system.&lt;/p&gt;

&lt;p&gt;But I realized along the way: no matter how many features you add, &lt;strong&gt;if it's not fun to use, it's meaningless.&lt;/strong&gt; A notification bot that is only convenient will eventually be ignored.&lt;/p&gt;

&lt;p&gt;So I added a personality. I added memory. I added a mechanism to act proactively. Then the tool became a secretary.&lt;/p&gt;

&lt;p&gt;Honestly, technically, I'm not doing anything that complex. Two SQLite tables, a persona Markdown file, and a setTimeout for idle detection. Each part is simple.&lt;/p&gt;

&lt;p&gt;But when you combine them, the tone of the "Good morning" greeting changes from yesterday, she stays quiet when I'm busy, and she talks to me when I'm bored. That "perfect sense of distance" was born.&lt;/p&gt;

&lt;p&gt;Bell has just been born. She even has "None yet" written for her hobbies. Honestly, I don't know how she will grow from here. But that's what makes it interesting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/kitepon-rgb/OpenCClaw" rel="noopener noreferrer"&gt;OpenCClaw&lt;/a&gt;&lt;/strong&gt; — Bell's home is here.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>discord</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>How I Built an AI Assistant That Grows Its Own Tools</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Sun, 31 May 2026 00:58:12 +0000</pubDate>
      <link>https://dev.to/quolu/how-i-built-an-ai-assistant-that-grows-its-own-tools-572g</link>
      <guid>https://dev.to/quolu/how-i-built-an-ai-assistant-that-grows-its-own-tools-572g</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Due to changes in Anthropic's terms of service, the use of Claude subscriptions via third-party harnesses has been blocked. While there was some buzz about it, to be honest, it didn't really affect me.&lt;/p&gt;

&lt;p&gt;I have the Claude Code CLI at my fingertips. I just need to build a bridge: send a message to Discord, pass it to the CLI, and return the reply to Discord.&lt;/p&gt;

&lt;p&gt;So, I built it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Built Just the Framework
&lt;/h2&gt;

&lt;p&gt;I created &lt;strong&gt;LogBot&lt;/strong&gt;. It's a simple bot that forwards messages from Discord to the Claude Code CLI and returns the responses to Discord.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Discord ──→ LogBot ──→ Claude Code CLI
              ↑                │
              └── Post Response ──┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I implemented the following features as a minimal framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Session Management&lt;/strong&gt; — Maintains Claude Code sessions based on UUID. These are completely isolated from VSCode sessions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Message Queue&lt;/strong&gt; — If messages arrive while Claude is processing, they are queued and processed in order, so none are missed.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Approval Flow&lt;/strong&gt; — If Claude attempts to edit a file, a notification is sent to Discord, where I can approve or reject it via reactions (✅ / ❌).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;MCP Server&lt;/strong&gt; — If you place files in the &lt;code&gt;tools/&lt;/code&gt; directory, they can be used as tools by Claude.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What's important here is that &lt;strong&gt;I started with zero MCP tools&lt;/strong&gt;. No weather, no calendar, no train info—nothing. Just the framework.&lt;/p&gt;

&lt;p&gt;However, this framework has one powerful characteristic:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude can write files.&lt;/strong&gt; In other words, it can add JavaScript files to the &lt;code&gt;tools/&lt;/code&gt; directory. The MCP server automatically scans and registers tools under &lt;code&gt;tools/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Claude can create its own limbs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Started with "I want a clock"
&lt;/h2&gt;

&lt;p&gt;The first thing I asked Claude from Discord was trivial.&lt;/p&gt;

&lt;p&gt;"What time is it?"&lt;/p&gt;

&lt;p&gt;While the Claude Code CLI can retrieve system time, it's smarter to have it as an MCP tool. I asked Claude to "make an MCP tool that returns the time."&lt;/p&gt;

&lt;p&gt;In a few seconds, &lt;code&gt;tools/current-time.js&lt;/code&gt; was created.&lt;/p&gt;

&lt;p&gt;Next was "I want to know the weather." Claude created &lt;code&gt;tools/weather.js&lt;/code&gt; using the free Open-Meteo weather API. No API key required.&lt;/p&gt;

&lt;p&gt;"I want to see my Google Calendar schedule." Claude created &lt;code&gt;tools/gcal-auth.js&lt;/code&gt; and &lt;code&gt;tools/gcal-list.js&lt;/code&gt;, complete with OAuth helpers.&lt;/p&gt;

&lt;p&gt;"I want to read my Gmail too." Following the same pattern, a complete set of Gmail tools was created—authentication, listing, reading bodies, sending, filtering, and bulk deletion.&lt;/p&gt;

&lt;p&gt;All I did was say "I want this" from Discord. I didn't write a single line of code myself.&lt;/p&gt;




&lt;h2&gt;
  
  
  It's not code that CRON hits
&lt;/h2&gt;

&lt;p&gt;Once the tools were in place, the next thing I wanted was periodic execution.&lt;/p&gt;

&lt;p&gt;"Tell me the weather, train operation info, and today's schedule every morning at 7:00 AM."&lt;/p&gt;

&lt;p&gt;Normally, one would write a script that hits the weather API, calls the calendar API, scrapes train information, formats it, sends it, and registers it to CRON.&lt;/p&gt;

&lt;p&gt;But since I'm using Claude, it's more interesting to pass a prompt rather than code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cron"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0 7 * * *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Give me the morning report. Tell me the weather (Saitama City Kita-ku, today), train operation info, and today's calendar schedule. Please include a 'Good morning' greeting as well."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When 7:00 AM hits, the scheduler wakes up Claude. Claude calls the weather tool, the calendar tool, and the train info tool, summarizes the results, and posts them to Discord.&lt;/p&gt;

&lt;p&gt;Claude decides which tools to combine and how. The logic isn't written in code; it's written in the prompt.&lt;/p&gt;

&lt;p&gt;In other words, if I want to change the format of the report, I just rewrite the prompt. Just by sending a single line to Discord saying, "Add tomorrow's weather too," it changes the next morning.&lt;/p&gt;




&lt;h2&gt;
  
  
  Departure Notification—The Real Topic
&lt;/h2&gt;

&lt;p&gt;The morning report was just the beginning.&lt;/p&gt;

&lt;p&gt;"Based on today's schedule, let me know when it's time to leave."&lt;/p&gt;

&lt;p&gt;From this single request, the departure notification mechanism was born.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I wanted
&lt;/h3&gt;

&lt;p&gt;If I have an appointment, say, "Meeting in Shinjuku at 2:00 PM," I want to be notified 30 minutes before departure, including weather and train operation info, by calculating the travel time backward.&lt;/p&gt;

&lt;h3&gt;
  
  
  Difference from regular reminders
&lt;/h3&gt;

&lt;p&gt;Google Calendar notifications trigger "30 minutes before the event." But it doesn't consider travel time. If it takes an hour to get from home to Shinjuku, a notification at 1:30 PM is too late. I want to be told "time to go" at 12:30 PM.&lt;/p&gt;

&lt;p&gt;Furthermore, the calculation changes if I'm already out. If I have another errand in Omiya in the morning and then head to Shinjuku, the travel time is shorter.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude is doing
&lt;/h3&gt;

&lt;p&gt;Every hour (from 6 AM to 10 PM), a prompt like this is sent to Claude via CRON:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Check the calendar, and if a departure notification is needed for any appointment with a location, register it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude receives this instruction and makes its own judgment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Look at the calendar&lt;/strong&gt; — Get upcoming appointments. Find appointments with a set location.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Estimate current location&lt;/strong&gt; — If there's an ongoing calendar event, assume I'm there. Otherwise, assume home.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Calculate travel time&lt;/strong&gt; — Use the Google Maps API to calculate travel time from current location to destination. If within 2km, assume walking; otherwise, train. Since the Transit API isn't available in Japan, I approximate using car travel time + 15 minutes. It's crude but surprisingly practical.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Calculate departure time backward&lt;/strong&gt; — Appointment start time - travel time = departure time.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Register notification job&lt;/strong&gt; — Push a one-shot job to the scheduler for 30 minutes before departure.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then, 30 minutes before departure, the job fires, Claude wakes up, retrieves the weather and train info, and constructs the notification.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🚨 Time to leave soon!
📅 Meeting (Starts at 14:00)
📍 Destination: Shinjuku
🚃 Train (Estimated): ~55 min → Leave at 12:35

☁️ Today (Mon) Partly Cloudy
　🌡️ 22.3℃ / 14.1℃　☔ 10%

🟢 Currently, there are no reported issues with train lines!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Where is the logic?
&lt;/h3&gt;

&lt;p&gt;This is what I want to convey the most.&lt;/p&gt;

&lt;p&gt;This entire decision logic is consolidated into an MCP tool called &lt;code&gt;departure-check.js&lt;/code&gt;. But Claude made this tool too. When I said "I want a departure notification," the necessary tools (travel time, location estimation) didn't exist yet, so it created those first, and finally created &lt;code&gt;departure-check.js&lt;/code&gt; to combine them.&lt;/p&gt;

&lt;p&gt;And what CRON is hitting is still just a prompt: "Check the calendar, and if a departure notification is needed, register it." With this one sentence, Claude utilizes its tools to make a judgment.&lt;/p&gt;

&lt;p&gt;Individual parts are simple. A tool to get the weather, a tool to view the calendar, a tool to calculate travel time, a tool to get train information. But &lt;strong&gt;when and how to combine them is up to Claude&lt;/strong&gt;. I didn't line up 'if' statements in code. The AI is observing the situation and making decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Self-Expanding Environment
&lt;/h2&gt;

&lt;p&gt;To summarize everything so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;MCP Tool&lt;/strong&gt; = Limbs. Individual capabilities (weather, calendar, travel time, email...)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Claude&lt;/strong&gt; = Brain. Uses tools to combine and judge.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CRON&lt;/strong&gt; = Alarm clock. Wakes Claude up periodically. Hits it with a prompt.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Discord&lt;/strong&gt; = Interface. Both requests and notifications happen here.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the biggest feature is that &lt;strong&gt;if Claude lacks limbs, it creates them on the spot.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you send "I want this feature" to Discord, Claude writes the tool code and adds it to &lt;code&gt;tools/&lt;/code&gt;. The MCP server automatically detects and registers it. It becomes usable from the next moment.&lt;/p&gt;

&lt;p&gt;Conversely, all I did was &lt;strong&gt;build the framework&lt;/strong&gt;. The bridge between Discord and CLI, the MCP auto-scan, and the CRON scheduler.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Now, it tells me the weather and schedule at 7 AM, notifies me before going out with travel time calculations and train information, and manages my email. Of course, there have been times when I had to tweak the behavior of the tools, but basically, they grew by me saying "I want this" from Discord.&lt;/p&gt;

&lt;p&gt;This might be obvious for ideas from an AI beginner, but &lt;strong&gt;I'm releasing it on GitHub&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/kitepon-rgb/OpenCClaw" rel="noopener noreferrer"&gt;OpenCClaw&lt;/a&gt;&lt;/strong&gt; — A bot system that operates Claude Code CLI via Discord.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>discord</category>
      <category>mcp</category>
    </item>
    <item>
      <title>What Happened in 3 Days of Letting AI Manage My Server</title>
      <dc:creator>QuoLu</dc:creator>
      <pubDate>Sat, 30 May 2026 00:53:36 +0000</pubDate>
      <link>https://dev.to/quolu/what-happened-in-3-days-of-letting-ai-manage-my-server-2ch0</link>
      <guid>https://dev.to/quolu/what-happened-in-3-days-of-letting-ai-manage-my-server-2ch0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/quolu/delegating-full-server-management-to-ai-111f"&gt;previous article&lt;/a&gt;, I wrote about delegating the entire management of my home server to an AI. The system is designed so that the AI performs patrols late at night, and during the day, it springs into action if monitoring scripts detect any anomalies.&lt;/p&gt;

&lt;p&gt;I explained how I built the system. Now, I will write about what actually happened after putting it into operation for three days.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Monitoring Script Broke Itself
&lt;/h2&gt;

&lt;p&gt;On the morning of the third day of operation, the monitoring script detected an anomaly: the &lt;code&gt;license_api_prod&lt;/code&gt; container was not responding.&lt;/p&gt;

&lt;p&gt;The AI went into action and began its investigation. It connected to the server via SSH and checked the status of the container. The result—&lt;strong&gt;the container was running perfectly fine.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It was a false positive. Furthermore, two minutes later, the same false positive occurred for &lt;code&gt;ddnser&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cause: Too Many Open SSH Connections
&lt;/h3&gt;

&lt;p&gt;The cause identified by the AI was as follows.&lt;/p&gt;

&lt;p&gt;The monitoring script checks the server status every 60 seconds. It opens three connections for system resources (disk, memory, swap) and up to nine for health checks of the seven containers. It was opening a total of over 10 SSH connections &lt;strong&gt;simultaneously&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;OpenSSH has a setting called &lt;code&gt;MaxStartups&lt;/code&gt;, which limits the number of simultaneous connections. The default is 10. The monitoring script was exceeding this limit, causing connections to be rejected. In other words, &lt;strong&gt;the monitoring script itself was putting a load on the server, causing its own SSH connections to fail.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1st Fix: From Parallel to Sequential
&lt;/h3&gt;

&lt;p&gt;The AI changed the execution method for health checks from full parallelization using &lt;code&gt;Promise.allSettled()&lt;/code&gt; to sequential execution using &lt;code&gt;for...of&lt;/code&gt;. Now, only one SSH connection is open at a time.&lt;/p&gt;

&lt;h3&gt;
  
  
  2nd Fix: Adding Retries
&lt;/h3&gt;

&lt;p&gt;Even when run sequentially, temporary SSH disconnections can occur. A connection might drop once due to server load or a momentary network glitch.&lt;/p&gt;

&lt;p&gt;Following the second false positive two minutes later, the AI added a helper function to identify "SSH transport errors." It detects patterns such as &lt;code&gt;Connection closed&lt;/code&gt;, &lt;code&gt;Connection refused&lt;/code&gt;, and &lt;code&gt;ETIMEDOUT&lt;/code&gt;, waits for three seconds, and then retries once. It only reports an anomaly if the connection fails again after the retry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix was two-pronged: reducing the number of SSH connections by moving from parallel to sequential, and handling temporary disconnections with retries.&lt;/strong&gt; Both were done without human intervention. I just received a notification on Discord saying "Fixed," and that was it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Late-Night Patrols Found
&lt;/h2&gt;

&lt;p&gt;Every day at 4:00 AM, the AI patrols the entire server. It checks security settings, resource usage, container configurations, and log contents. The AI looks at parts that humans usually do not check daily.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nextcloud Logs at 21GB
&lt;/h3&gt;

&lt;p&gt;During the patrol on the second day of operation, the AI noticed an anomaly in the Nextcloud log file.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/var/mnt/nextcloud_data/nextcloud.log&lt;/code&gt; — &lt;strong&gt;21.3GB.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The log file had ballooned to 21GB. Because it was on NFS, there was no disk space crisis, but log rotation was not functioning. I would never have noticed this on my own.&lt;/p&gt;

&lt;h3&gt;
  
  
  1,241 SELinux Denial Logs
&lt;/h3&gt;

&lt;p&gt;There was another issue. SELinux was denying lock access to the &lt;code&gt;auction.db&lt;/code&gt; for the auction-bot every minute. 1,241 instances in the past 24 hours.&lt;/p&gt;

&lt;p&gt;SELinux runs in Permissive mode, so the action was not actually blocked. The application ran normally. However, every time a denial occurred, a daemon called &lt;code&gt;setroubleshoot&lt;/code&gt; would start to analyze it, temporarily consuming 22.9% of CPU.&lt;/p&gt;

&lt;p&gt;There was no actual damage, but it was wasting resources. I don't think I would have ever found this if the AI hadn't flagged it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Daily Point Observation
&lt;/h2&gt;

&lt;p&gt;The patrol surveys the entire server every day. The AI determines what to look for, but as a result, trends emerge over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Swap Usage Trends
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Swap Usage&lt;/th&gt;
&lt;th&gt;Note&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4/2&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4/3&lt;/td&gt;
&lt;td&gt;92%&lt;/td&gt;
&lt;td&gt;Slight upward trend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4/4&lt;/td&gt;
&lt;td&gt;62%&lt;/td&gt;
&lt;td&gt;Reset by server reboot&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The AI tracked the trend of swap usage building up day by day. The server was rebooted and reset on 4/4, but swap might become critical during long periods of operation without a reboot. The AI writes "Continued monitoring required" in its report every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  fail2ban BAN Trends
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Currently Banned&lt;/th&gt;
&lt;th&gt;Total BANNED&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4/2&lt;/td&gt;
&lt;td&gt;14 IPs&lt;/td&gt;
&lt;td&gt;235&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4/3&lt;/td&gt;
&lt;td&gt;10 IPs → 14 IPs&lt;/td&gt;
&lt;td&gt;242 → 293&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4/4&lt;/td&gt;
&lt;td&gt;8 IPs&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Brute-force attacks against SSH occur daily. Connection attempts using common usernames such as &lt;code&gt;admin&lt;/code&gt;, &lt;code&gt;ubuntu&lt;/code&gt;, and &lt;code&gt;mysql&lt;/code&gt;. fail2ban bans them calmly. Password authentication is disabled and only key authentication is used, so they are not broken through, but I want to be aware that the attacks are occurring. The AI reports this daily in its report.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Looking back at what happened over the three days, most of the issues the AI dealt with were things I could not have noticed myself.&lt;/p&gt;

&lt;p&gt;Hitting the limit for SSH connections. The Nextcloud log ballooning to 21GB. SELinux denial logs piling up every minute. Swap becoming critical day by day. None of these would be visible unless a human specifically opened the logs to check.&lt;/p&gt;

&lt;p&gt;While I am asleep, the AI watches the server, and when I wake up, the results are waiting on Discord. It fixes problems if it can, and reports them if it can't. After running it for three days, I feel this system works even better than I expected.&lt;/p&gt;

</description>
      <category>claudecode</category>
    </item>
  </channel>
</rss>
