DEV Community: QuoLu

Released aiterm-mcp on npm: An MCP Server to Reduce Token Usage by Providing AI with a Persistent Terminal

QuoLu — Tue, 16 Jun 2026 01:09:11 +0000

I have published an MCP server called aiterm-mcp to npm. It is designed to let an AI hold a terminal as a "single persistent session."

AI terminal tasks consume tokens invisibly

When you have an AI perform server tasks, it usually sends one command at a time. Over SSH, that means ssh host "command" for each one, which repeats the full "connect → authenticate → execute → disconnect" cycle every time.

The problem is that because it starts from scratch every time, no state remains. The directory you cd'd into, the environment you source'd, and the SSH connection you established are all gone by the next command. Therefore, the AI has to do this every time:

Connect via SSH again,
Change directory again,
Load the environment again,

...and only then finally run the actual command. This "set of redo operations" is written by the AI and read by you every time it sends a command. Text related to reconnection, re-authentication, and re-setup—none of which relates to the actual task—piles up in the context with every turn. Tokens are dissolving into redundant redos that produce nothing.

aiterm folds this away. It holds just one persistent terminal, and SSH is established only once. Even if you run 10 commands, ssh is called only for the very first one. Connections and authentication drop from N times to once. The cd and the environment persist from the first command as well. All subsequent commands are sent directly through the same single session. The entire set of redo operations simply disappears.

I measured how much it saves on my own server. When logging in via SSH, the boilerplate text (system information and announcements, i.e., MOTD) alone passes about 385 tokens to the AI. In the fragmented mode where you reconnect for every command, this is included every time. For a 10-command task, that is about 3,850 tokens spent on boilerplate before even reaching the real work. By holding onto one terminal, you pay for it only once. The rest is zero.
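The arithmetic above can be sketched in a few lines. This is only an illustration of the calculation: the 385-token figure is the measurement quoted in the text, and the function name is made up for this example.

```python
MOTD_TOKENS = 385  # login boilerplate per SSH connection, as measured in the text


def boilerplate_tokens(commands: int, persistent: bool) -> int:
    """Tokens spent on login boilerplate alone for a task of `commands` commands."""
    if commands <= 0:
        return 0
    # Fragmented mode reconnects for every command; a persistent terminal connects once.
    connections = 1 if persistent else commands
    return connections * MOTD_TOKENS


if __name__ == "__main__":
    for n in (1, 10, 50):
        fragmented = boilerplate_tokens(n, persistent=False)
        held = boilerplate_tokens(n, persistent=True)
        print(f"{n:>3} commands: fragmented={fragmented} tok, persistent={held} tok")
```

The fragmented cost grows linearly with the number of commands, while the persistent cost stays flat at one login.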

Pruning output before reading

There is one more level of token savings. aiterm prunes the output before the AI reads it.

Full disclosure: this reduction logic is not my invention. I have ported the logic from rtk (Rust Token Killer) entirely. rtk is a tool created by Patrick Szymkowiak that compresses command output before passing it to an LLM (Apache-2.0). I re-implemented it in aiterm so that it completes within the terminal reading process without calling a separate binary (no files were copied, only the behavior was matched; the pytest summaries are pinned with regression tests to match rtk 0.42.0).

What it does:

Removes control characters (colors and cursor movements)
Folds repeated lines into counts
Truncates output that is too long, leaving the head and tail (with hints for restoration)
Summarizes common commands like git status / git log / grep / pytest into key points using command-specific summarizers
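
The first three steps in the list can be sketched roughly as follows. This is a minimal illustration of the general idea, not the actual logic of rtk or aiterm: the function names, the `[xN]` fold marker, the omission hint, and the default head/tail sizes are all assumptions made for this example, and the command-specific summarizers are left out.

```python
import re
from itertools import groupby

# CSI sequences (colors, cursor movement) and OSC sequences (e.g. window titles).
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")


def strip_control(text: str) -> str:
    """Remove ANSI escape sequences and other control characters."""
    text = ANSI_RE.sub("", text)
    # Keep newlines and tabs; drop every other character below space (e.g. \r, BEL).
    return "".join(ch for ch in text if ch in "\n\t" or ch >= " ")


def fold_repeats(lines: list[str]) -> list[str]:
    """Collapse runs of identical consecutive lines into one line with a count."""
    folded = []
    for line, group in groupby(lines):
        count = sum(1 for _ in group)
        folded.append(line if count == 1 else f"{line}  [x{count}]")
    return folded


def truncate_middle(lines: list[str], head: int = 20, tail: int = 20) -> list[str]:
    """Keep the first `head` and last `tail` lines, with a hint about what was cut."""
    if len(lines) <= head + tail:
        return lines
    omitted = len(lines) - head - tail
    hint = f"... [{omitted} lines omitted] ..."
    return lines[:head] + [hint] + lines[len(lines) - tail:]


def prune(raw: str, head: int = 20, tail: int = 20) -> str:
    """Apply the three generic steps in order: strip, fold, truncate."""
    lines = strip_control(raw).splitlines()
    return "\n".join(truncate_middle(fold_repeats(lines), head, tail))


if __name__ == "__main__":
    print(prune("\x1b[31mERR\x1b[0m\nok\nok\nok\n"))
```

Folding before truncating means a long run of identical lines counts as a single line against the head/tail budget.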

On my own server, I measured the output received by the AI as "raw" versus "via aiterm."

Output received by AI	Raw	Via aiterm
SSH login boilerplate (MOTD)	Approx. 385 tok	Approx. 350 tok
`docker ps -a` (33 containers)	Approx. 2,355 tok	Approx. 2,218 tok
120 lines of logs (`journalctl`)	Approx. 4,375 tok	Approx. 1,696 tok
`git log` (25 entries)	Approx. 473 tok	Approx. 338 tok

How much it cuts depends on the content. Logs with many repetitions drop significantly (-61% for 120 lines). On the other hand, wide tables with only unique values (like the container list) shrink by only 6%. It is not magic that reduces everything by a fixed percentage, but rather "trimming only the waste." Even so, combined with not having to read the reconnection boilerplate every time, tokens definitely stop accumulating.
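
For reference, the percentages can be recomputed from the table. The token counts below are the approximate figures from the measurements above; the dictionary and function names are just for this sketch.

```python
# (raw tokens, tokens via aiterm), copied from the table above
MEASUREMENTS = {
    "SSH login boilerplate (MOTD)": (385, 350),
    "docker ps -a (33 containers)": (2355, 2218),
    "journalctl (120 lines)": (4375, 1696),
    "git log (25 entries)": (473, 338),
}


def reduction_percent(raw: int, pruned: int) -> int:
    """Share of tokens removed, rounded to a whole percent."""
    return round((raw - pruned) / raw * 100)


if __name__ == "__main__":
    for name, (raw, pruned) in MEASUREMENTS.items():
        print(f"{name}: -{reduction_percent(raw, pruned)}%")
```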

It wasn't just a token problem

Up to this point, it has been about saving tokens. But the fragmented "1 command = 1 connection" approach had a side effect that was even less amusing.

When you repeat connections at a rapid pace, your server's defenses decide you are an attacker.

Monitoring tools that track login attempts view consecutive connections as a brute-force attack and ban you.
You hit limits on the number of concurrent connections or sessions, causing new connections to be rejected.
Ultimately, your account gets locked.
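
To see why rapid reconnects look like an attack, here is a toy sliding-window limiter. It is not how any particular tool (fail2ban, OpenSSH, etc.) is implemented; the class name and the thresholds of 5 connections per 60 seconds are hypothetical values chosen for illustration.

```python
from collections import deque


class ConnectionWindow:
    """Toy limiter: allow at most `max_connections` within any `window_s` seconds."""

    def __init__(self, max_connections: int, window_s: float) -> None:
        self.max_connections = max_connections
        self.window_s = window_s
        self.times = deque()  # timestamps of accepted connections

    def allow(self, now: float) -> bool:
        # Forget connections that have fallen out of the window.
        while self.times and now - self.times[0] >= self.window_s:
            self.times.popleft()
        if len(self.times) >= self.max_connections:
            return False  # looks like a burst: reject (or ban)
        self.times.append(now)
        return True


if __name__ == "__main__":
    # Fragmented mode: one new connection per second for ten commands.
    fragmented = ConnectionWindow(max_connections=5, window_s=60)
    print([fragmented.allow(t) for t in range(10)])

    # Persistent mode: a single connection, however many commands follow.
    persistent = ConnectionWindow(max_connections=5, window_s=60)
    print(persistent.allow(0))
```

With these made-up thresholds, the sixth reconnect in the same minute is already rejected, while a single held session never comes near the limit.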

The mechanisms meant to stop attackers end up locking out the person who created them. This actually happened on my home server.

I actually wrote about something similar before. In A Record of Entrusting My Server to AI, I mentioned how a monitoring script hit its own concurrent connection limit, causing my own SSH attempts to fail. Back then, it was a "script." This time, the "AI agent" was falling into the same trap with every command it ran. The cause is the same—too many connections.

By folding everything into a single terminal, this disappears too. Authentication happens once, and there is only one session that doesn't multiply. Therefore, you won't hit connection rate limits or get banned.

Design: Holding only "one terminal"

The philosophy of aiterm is simple: the primitive is just "holding one local terminal."

At first, I tried to increase the tools by type—tools for SSH, tools for containers... but there is no end to that. So I stopped everything. I stopped making SSH, docker exec, and interactive shells (REPL) into dedicated tools, and downgraded them to "a single line typed into that terminal."

pty_open()                      # Open one terminal
pty_send(id, "ssh 192.168.1.2") # SSH once inside it
pty_send(id, "uname -a")        # Subsequent commands run in the same session
pty_read(id)                    # Read pruned output

That is why there are only 6 tools (open, send, read, send key, close, list). I don't introduce distinctions like SSH vs. local vs. container into the tool hierarchy.

Behind the scenes is tmux

The actual entity of the terminal is a tmux session. Thanks to this:

The terminal lives on even if the MCP server or AI client restarts.
A human can watch the same screen live from behind using tmux attach (you can see what the AI is currently doing on the server in real-time).
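
As a rough picture of what "a terminal backed by tmux" means, the sketch below drives a detached tmux session through the tmux CLI. It is not aiterm's implementation: the helper names and the session name `aiterm-demo` are hypothetical, and only standard tmux subcommands (new-session, send-keys, capture-pane, kill-session) are used.

```python
import shutil
import subprocess
import time


def new_session_cmd(name: str) -> list[str]:
    # -d: start detached, -s: session name
    return ["tmux", "new-session", "-d", "-s", name]


def send_line_cmd(name: str, line: str) -> list[str]:
    # Types the line into the session and presses Enter.
    # Caveat: text that matches a tmux key name is treated as a key; `-l` sends literally.
    return ["tmux", "send-keys", "-t", name, line, "Enter"]


def capture_cmd(name: str) -> list[str]:
    # -p: print the visible pane contents to stdout
    return ["tmux", "capture-pane", "-p", "-t", name]


def kill_session_cmd(name: str) -> list[str]:
    return ["tmux", "kill-session", "-t", name]


def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    if shutil.which("tmux") is None:
        raise SystemExit("tmux is not installed")
    session = "aiterm-demo"  # hypothetical session name
    run(new_session_cmd(session))
    try:
        run(send_line_cmd(session, "cd /tmp"))
        run(send_line_cmd(session, "pwd"))  # state from the previous line persists
        time.sleep(0.5)  # crude wait for the shell to print its output
        print(run(capture_cmd(session)))
    finally:
        run(kill_session_cmd(session))
```

Because the session lives in the tmux server rather than in the calling process, it survives the caller exiting, and `tmux attach -t aiterm-demo` from another terminal shows the same screen.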

Honest note: You can't eliminate the round-trips themselves

I will write this without exaggeration. Connecting the terminal directly to the AI does not give you "zero round-trips." Since the AI decides its next input after reading the output, the "send → read → decide" loop fundamentally remains. What aiterm eliminates is the cost of re-authentication, reconnection, and re-setup that was attached to every one of those round-trips, along with the output noise. It trims the weight per round-trip, not the number of round-trips.

Installation

No cloning or building required.

claude mcp add --scope user --transport stdio aiterm -- npx -y aiterm-mcp

Restart Claude Code and confirm the connection with /mcp, and you are done. Whether it is Claude or Codex, any MCP client can launch it with npx -y aiterm-mcp.

Requirements

Node.js 18+
tmux (apt install tmux / brew install tmux)
Linux / WSL2 / macOS / Windows native support (Windows does not have tmux, so it bridges to tmux inside WSL)

Status

v0.4.0, MIT, published to npm with provenance.

aiterm-mcp — GitHub

Bug reports and PRs are welcome.

I've Accelerated Building and Promoting, But Delivering Still Remains a Challenge

QuoLu — Mon, 15 Jun 2026 01:04:47 +0000

Introduction

I think I've made a lot of things over the past few months.

Let me list them out.

Blog articles: 25
Public GitHub repositories: 21
Apps released on BOOTH: 2

And app sales, for both apps combined, have finally passed 20.

This article is a place to take stock of everything, and finally, talk about facing these numbers. I'll say this upfront: I haven't reached a conclusion yet. Still, I thought it was worth writing about.

It all started in mid-February, when I dipped my toes into AI coding.

Chapter 1: Searching for Tools, Finding Claude Code

At first, I wandered through tools. Claude Pro hit its limits immediately. Gemini would sneak in code to edit files on its own outside of VS Code, and even when I told it to stop, it kept doing it with a sense of arrogance. GPT impressed me, but it would get stuck on complex bugs. With Cursor, I kept paying for extra add-ons, thinking, "I only want to use Claude for the tricky stuff."

The turning point was the math. When I added up the monthly fees for Copilot and Cursor, it was enough to cover the Claude MAX plan. If I consolidated them, I could run Opus endlessly without hitting a ceiling. This journey is documented in "Copilot → Cursor → Claude Code for VSC".

I also verified "What changes with MAX". The conclusion was surprising: there aren't that many MAX-exclusive features. The real value was being able to "use existing features without limiters."

And I had an embarrassing realization. I hadn't even been using half of Claude Code's features. I was getting frustrated, thinking, "This guy is way off base," without knowing about /init, and I was using features to control my PC session from my smartphone without even realizing it. The most important features aren't always things the AI tells you about.

Here, my tools were settled.

Chapter 2: Building Apps

With my environment set, I first built two apps. These are the "products" currently available on BOOTH.

The first is OLTranslator. It overlays a translation over foreign text on your screen in real-time. The hardest part wasn't the translation itself, but joining the characters picked up by OCR. Separate pieces of text would sometimes be wrongly joined into a single sentence, or conversely, a single sentence would be shredded into three lines, rendering it meaningless. If I judged by coordinate proximity, it would join incorrectly; if I were too strict, it would shred the text. I spent so much time adjusting these thresholds. Working with Copilot, it took about two weeks.

The second is the audio version, LiveTR. It recognizes English audio in videos in real-time and converts it to Japanese via subtitles and speech. This took only about four days using Claude Code. It was faster than the first because I was more used to the process and could pass on policies via CLAUDE.md.

The biggest breakthrough with LiveTR was the experience of assembling logic that was impossible for me alone by using Claude and research papers. When determining the speaker's gender based only on pitch, every time a commentator got excited during an F1 broadcast, a male voice would be identified as female. When I had Claude search for research papers and patents, it proposed from first principles that combining vocal tract resonance and multiple indicators would keep the results stable even when the speaker was excited. Once implemented, it correctly identified male speakers.

That's when it clicked: the strength of AI is not just "writing code fast," but pulling knowledge from unfamiliar fields and turning it into implementation. You are no longer starting from zero.

Chapter 3: Delegating Server Management

Next, I began incorporating AI into the server side to run what I had built. This chapter covers the expansion of my delegation range step by step.

First, I entrusted deployment over SSH. Having previously lived through the hell of an endless loop with GPT ("change permissions -> hit error -> revert to original error"), it was a relief when Claude suggested on its own, "Shall we write a script to make container updates easier?" Now, a single deploy.sh handles everything from building to swapping containers.

Getting a taste for it, I handed over the entire server management. It is a three-layer structure: the parent detects symptoms, the child investigates and fixes the cause, and the grandchild audits the policies. AI performs a full patrol at 4:00 AM. The punchline was when the monitoring script detected an anomaly, only to find that it was "a bug in the monitoring script itself."

The log of the first 3 days of live operation hit home the hardest. The monitoring script was opening over 10 SSH connections simultaneously, triggering OpenSSH connection limits. I was causing my own SSH failures. I also discovered that my Nextcloud logs had bloated to 21.3 GB.

As a result of full delegation, my misconceptions about my own environment were peeled away from the bottom up. I had been using WSL2 for two months without actually knowing it until one day Claude pointed out, "Compared with native, you're only getting 70% of the potential." I really wanted my weekends back.

In a rush, I migrated the entire home server hardware, moving from a former mining board to a new mini PC. Regarding the model selection, Claude pointed out that "the higher-end version is just a rebrand with the same internals, and the cheaper one is 10,000 yen less domestically." When I checked with an expert later, the answer was correct. I offloaded the entire migration process, and it was finished in less than a day.

To save energy, I isolated the monitoring machine to a Raspberry Pi 5, but the screen was so boring with only normal logs that it eventually became a video player instead.

Finally, I investigated the phenomenon where the AI would occasionally say, "That tool doesn't exist." The root cause was in the bottom-most layer: fluctuations in DDNS name resolution. It was the true identity of a problem I had been ignoring for months, thinking, "It's fine, a reboot fixes it."

Chapter 4: Cultivating Secretaries and Assistants

In parallel with the server work, I also progressed in making AI an "extension of my own hands and feet."

First, I turned a bot I had been nurturing for five years into a SaaS and put it up for sale. When I left the code modifications (multi-tenancy, billing, web admin panels) to the AI, it was finished in a single day. What was actually difficult was the part outside the code: a 17-clause Terms of Service, and 13 findings from security audits required for payment processing. The holes I had ignored for five years, thinking "it's only for my own use," suddenly came to light once I tried to commercialize it.

Next, I built the framework to connect Claude Code to Discord. The key is that "Claude can write its own tools." When I asked "What time is it?", a tool to return the time was born in seconds; weather and calendar tools were also created just by saying I "wanted them." I haven't written a single line of code myself.

As my reach increased, I gave it a personality, memory, and spontaneity, and it became my secretary, "Belle." On the first day, she wrote in her own memory, "I was almost moved to tears because you created an X profile for me." Technically, it's a simple combination of things, but bundled together, the tool became a secretary.

Getting carried away, I put it into full-scale operation and exhausted my weekly limit for the MAX plan in three days. The principle I arrived at after investigating the cause was: "Formats meant for humans are wasteful for AI." For the first time, I genuinely focused on the design of the information I feed it.

Finally, when I swapped the secretary's brain for a different AI to save costs, it was a total disaster. Flattery, inability to draw contextual boundaries, attempting to post messages meant for me directly to X—I decided not to compromise on the brain and switched back to Claude, reconstructing the long-term memory into a structured memory system.

What remained consistent in this chapter was the shift toward not putting logic into code, but into prompts, personality, and memory, and letting the AI nurture itself.

Chapter 5: Reinforcing the AI Itself

Having come this far, my interest turned to the inherent weaknesses of AI. I published three tools to npm.

It was triggered by the discovery that 87% of context was disposable. While I was desperately trying to optimize CLAUDE.md, I hadn't been looking at the core conversation history. When I separated tool I/O by "type" rather than by time, I was able to cut about 90% in 50 turns. This became Throughline. Along the way, the automatic detection misfired repeatedly, and giving up on "detection" and shifting to "declaration" was, honestly, a record of defeat.

The second was Caveat, a long-term memory system to avoid stepping into the same trap twice. It scans for "traces of struggle," such as tool failures or repeated editing of the same file at the end of a session, and prompts, "You were stuck here, shall I record it?"

The third is Spotter, which has a different Claude audit whether you forgot to call a tool. When I published this and started using it myself, 74 daemons were created in 64 minutes. The evacuation tool called a different Claude, which then called another for auditing... a recursive proliferation. I was stabbed by the very tool I had built.

I bundled these three as a story of reinforcing from the outside, giving up on writing "be careful". Subtraction (Throughline), Accumulation (Caveat), Addition (Spotter). The common philosophy is: "If it doesn't get fixed by asking Claude itself, hit it with structure from the outside."

Incidentally, the seesaw phenomenon where the audit of a plan never converges stopped once I narrowed the perspective down to "a single point of contradiction." Because the number of contradictions is finite.

Chapter 6: The Mountain Unwritten

Everything introduced so far consists of the 25 articles written for the blog. However, there are 21 public repositories, and less than half have been turned into articles.

If I list the unwritten ones by role: tools to query patent databases via AI, tools to watch stock prices, tools to search X, tools that consolidate image and diagram generation, tools to manipulate Windows entirely, multiple bridges to operate my home PC's AI from an iPhone, tools that automatically launch sessions at a future time...

Honestly, the speed of my writing cannot keep up with the speed of my creation. This is the bulk of the iceberg.

And, It Doesn't Reach

This is everything I've built over the past few months. Apps, tools, articles. I think I've done well.

But building something and having it reach people are different stories.

What I've built only holds meaning once it reaches the hands of those who need it. Getting apps bought, getting tools used, getting articles read. This "reaching people" part is nowhere near caught up.

The numbers at the beginning reflect that. After building all this, app sales have just barely topped 20. And honestly, blog traffic is still quiet.

Moreover, the tasks to reach people are being accelerated by AI. Writing articles, translating them to English, posting to X, automating reposts, preparing cover images. The "effort" to deliver has become as fast as building.

Even so, the results are still lacking.

Conclusion

Building has truly become fast, thanks to AI. But delivering is still a long way off.

There seem to be no shortcuts here, so I will continue to work steadily. So that I can properly deliver what I've built to those who need it.

I know it's not easy. Still, I'll give it a little more effort. If there are any individual developers treading water in the same place, well, let's do our best. I'm still on the way, too.

The True Reason My AI Kept Saying 'Tool Not Found' Was DDNS

QuoLu — Sun, 14 Jun 2026 01:02:30 +0000

Introduction

I have set up several MCP servers on my home server and have Claude and ChatGPT call them. I even had AI build the server itself and the monitoring unit, and I felt like I was maintaining a relatively stable operation.

Yet, one day, Claude said:

"That tool cannot be found."

Huh. It was working just fine a moment ago.

When I restart the session, it works again.

After a while, it says "I can't see it" again.

Restart. It fixes it.

"Sometimes it disappears" is the trickiest issue

This pattern is frustrating because isolating the cause is difficult and progress stalls.

The app is alive.
The port is listening.
Health checks are passing.
But from the AI's side, "the tool is not registered."

Even when I look at the server logs, nothing is crashing. Sometimes, even when checking client-side logs, there isn't even a trace that a connection attempt was made.

Thinking it was strange, I let it slide for months, telling myself, "It's fine since a restart fixes it." Probabilistic failures like this are something humans can just brush off with an 'oh, it failed again' when interacting with it. You just press the reload button, and that's it.

Constant pinging by the AI makes the fluctuations apparent

Once it is registered as an MCP for AI, the story changes.

Every time a session begins, the AI goes to connect to the registered MCP server. It resolves the name, performs the TLS handshake, fetches the tool list, and adds it to the context. If any single step fails during this process, the state remains "the tool is invisible" for the entire session.

Probabilistic failures that used to be resolved by a human thinking "Oh, it failed, let me press it again" now align perfectly with AI session boundaries, and the conversation begins in a state where the tool effectively doesn't exist.

And because the AI side only returns "that tool does not exist," the cause remains invisible to the human.

I suspected the lowest layer

I suspected the "network," "TLS," and "server" in order, and finally, DNS remained.

Since my home server does not have a static IP, I was using DDNS (dynv6.net). It's the standard for those without a static IP. I've been using it for years.

That DDNS was failing to resolve names occasionally.

Specifically, there were moments when dig would return NXDOMAIN or SERVFAIL. I'm not sure if it was a provider-side issue, upstream cache, or rate limiting. A few minutes later, if I ran dig again, it would go through normally as if nothing had happened.
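
An intermittent failure like this can be caught by probing the name repeatedly and counting misses. The sketch below is a minimal version of that idea, with some assumptions: it goes through the system resolver via `socket.getaddrinfo` rather than calling dig, so it cannot tell NXDOMAIN from SERVFAIL and local caching may hide some failures, and the hostname in the usage example is a placeholder.

```python
import socket
import time
from typing import Callable


def resolves(host: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """True if the name resolves right now, False on a resolution error."""
    try:
        resolver(host, 443)
        return True
    except socket.gaierror:
        return False


def probe(
    host: str,
    attempts: int,
    interval_s: float = 60.0,
    resolver: Callable = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Resolve `host` `attempts` times and return the fraction that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = 0
    for i in range(attempts):
        if not resolves(host, resolver):
            failures += 1
            print(f"{time.strftime('%H:%M:%S')} resolution failed for {host}")
        if i + 1 < attempts:
            sleep(interval_s)
    return failures / attempts


if __name__ == "__main__":
    # Placeholder hostname; substitute the DDNS name being investigated.
    rate = probe("example.com", attempts=5, interval_s=1.0)
    print(f"failure rate: {rate:.0%}")
```

Run over hours with a longer interval, even a small nonzero failure rate is enough to explain sessions that occasionally start without their tools.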

……Was this it?

"A name that worked a few minutes ago does not work now" was definitely happening.

And if an AI session starts the moment the DDNS resolution drops, the tool disappears. I suspected this was likely the cause.

I never suspected DNS before

When I realized this, I was quite surprised myself.

Until now, I had almost no concept of suspecting DNS. To me, name resolution was just something you typed into a browser address bar to see if it connected or not.

I wasn't even aware that a "sometimes it fails" mode existed.

I simply was not paying attention to the DNS layer. I did not even recognize that name resolution is not a binary of "working or broken," but something that "sometimes works and sometimes doesn't." Having the AI ping it daily is what finally directed my attention there.

The option of Cloudflare Tunnel

That is when I encountered Cloudflare Tunnel.

From a server without a static IP, you establish a permanent outbound connection from your end to the Cloudflare edge. From the client's perspective, it just resolves via Cloudflare DNS and connects to the edge. After that, the tunnel carries it to your home.

In other words, my server no longer needs to expose its name via DDNS. Cloudflare DNS holds the name, and Cloudflare acts as the exit point.

I just needed to register my domain (kitepon.dev) with Cloudflare NS and set tunnel routes for each subdomain. No static IP is needed. No port forwarding is needed. No DDNS update scripts are needed.

Two operational debts disappeared as a bonus

There were two side effects I noticed after migrating. Both were bonus features, but they are subtly effective.

First: I was liberated from the /etc/hosts pilgrimage for hairpin NAT countermeasures.

I use SoftBank's 10G line. This does not support hairpin NAT. If I try to hit my own public domain from within my home LAN, the route to go out and come back cannot be established, and I get stuck.

My previous solution was to go around and update /etc/hosts with the internal address for every device, every container, and every WSL instance. This was a subtle operational debt; every time I added a new device or spun up a new container, I was forced to update hosts files.

Once I moved to Cloudflare Tunnel, it uses the same path (via Cloudflare) whether I hit it from inside or outside. I removed all the special handling for hosts.

Second: On the SoftBank line, I occasionally suffered from irregular inbound blocks.

This was another thing that had been a minor annoyance for years. Whether it was the SoftBank line, the home gateway, or upstream security measures, I couldn't pinpoint the cause, but there were times when access from outside would not connect.

Since Cloudflare Tunnel is a permanent connection established outbound from the home server, from the ISP's perspective, only "communication from home to the outside" occurs. Blocks triggered on the inbound side structurally became irrelevant.

After the migration

The AIs stopped saying "I can't see the tool."

This is an observation, not an absolute guarantee. Cloudflare itself might not be perfect either. However, the instability of the DDNS I operated myself and the edge availability of a commercial CDN are on different orders of magnitude. I have to admit that.

And since the side effects of editing hosts and the SoftBank-side blocks also disappeared, the triggers for the AI saying "cannot access" have decreased all at once.

What I learned

By having the AI ping it daily, I realized I should suspect DNS.

To me, DNS was a mechanism for "the browser's address bar." For human access, it didn't bother me if it failed occasionally.

But when you start throwing long-term tasks at an AI, the fluctuations you previously let slide become critically impactful. When you put something that runs 24 hours a day on top of your stack, the lies in the underlying layers peel away one by one.

It just happened that in my home server, the first layer to peel away was the DNS layer.

And the means to patch that peeled layer has been in front of me for free for over 5 years. Cloudflare Tunnel became free in 2021. I just didn't have the trigger to notice.

Perhaps next, another layer will peel away. I will write about it again when that happens.

Built a Raspberry Pi 5 Server Monitor for Power Efficiency and Turned It Into a Video Player

QuoLu — Sat, 13 Jun 2026 01:01:46 +0000

Introduction

For a long time, I had been running my home server monitoring program on my main PC.

Everything—server uptime, container status, and resource usage—was being checked 24/7 on the same Windows PC I use for my daily work.

This was a problem. I couldn't turn off my main PC.

Whether I was going to sleep or heading out, the PC had to stay on so the monitoring wouldn't stop. It wasn't eco-friendly, and it was a waste of electricity. I wanted to be able to shut down my PC properly before bed.

So, I decided to build a dedicated terminal for server monitoring—one with low power consumption.

I wrote about upgrading my server hardware to the MS-A2 in my previous article. This time, I’m talking about making the "monitor" side independent.

Raspberry Pi 5, why is it so expensive!??

When people think of a monitoring terminal, they think of Raspberry Pi. It’s the standard for low power consumption. Naturally, I consulted with Claude and Codex to decide on the model.

Then, I saw the price and stopped in my tracks.

What is this, why is the Pi 5 so expensive!?

Ah, I see, it’s being hit by the surge in memory prices. I had heard about it, but I didn’t realize it had reached this point.

Pi 5 / 4GB → 25,000 JPY
Pi 5 / 1GB → Just under 10,000 JPY
Pi 5 / 2GB → About 13,000 JPY

That price difference is just for the memory capacity. 4GB is overkill for a monitoring terminal. But 1GB feels a bit risky.

Oh, the 2GB model is about 13,000 JPY. That feels like a decent value, doesn't it? 2GB should be plenty for a monitoring terminal. I'll buy it from Switch Science. Okay, that’s decided.

……It sounds quick when written like this, but I spent half a day just considering this.

Longing for a Case with an LED Display

And there’s one more thing. I’ve always longed for those small cases that come with an LED display.

Something like this. A palm-sized box with a small screen fitted into it, with something running on it. It hits that desire to own something right in the gut.

If I’m going to set up a monitoring terminal, I want it to be this.

So, I spent another half day or so researching, "What’s good?" And then I found it.

A case kit that seems to have everything.

Oh, this is it. A 4.3-inch touchscreen, OLED, speaker, NVMe slot... it has all sorts of things, isn't it great? 9,400 JPY (at the time; it seems to have gone up since, in just half a month).

Alright, let’s go with this.

Assembly and Setup

Once the hardware arrived, it was just a matter of assembling it and installing the OS.

Assembly went exactly according to the manual. For the OS installation, I just followed the procedures as instructed by Lord Claude. I also left the monitoring program setup and the SSH monitoring configuration for the server to Claude, as usual.

There were no surprises here. I’ve written many times about leaving server-related tasks to Claude, and it all went smoothly as expected.

The real issue came after: how to use the screens attached to the hardware.

Turning the OLED into an "Evangelion-style" Monitor

First, the 0.96-inch OLED. A tiny monochrome screen.

What should I use this for? ……The answer was obvious.

I’m going to make it an "Evangelion-style" Normal/Warning/Abnormal monitor.

You know, that look where the status appears in big letters inside a simple frame. You have to admit, it's cool.

So, I asked Claude to "make the OLED display an Eva-style status." It was completed in no time. If the server is fine, it says "NORMAL"; if something happens, it shows "WARNING" or "ABNORMAL." You can tell at a glance.

Hmm, perfection.

The Main Display Became a Video Player

The problem was the 4.3-inch main display.

At first, I had the console displayed there. The one where monitoring logs scroll by constantly. But it felt a bit lackluster to have that displayed all the time……

In the first place, even when error logs appear, most of the lines are just "Normal." It’s not very interesting to look at. It felt like a waste to just fill that nice color touchscreen with scrolling "normal" logs.

So, I changed my strategy.

I decided to just play random videos on the display.

Normally, it plays videos. As long as the server is peaceful, it just plays videos. And only when a warning or abnormality occurs, it switches to the console to show the situation.
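
The switching rule is simple enough to write down. This is only a sketch of the behavior described here, not the actual program on the Pi: the enum, the function names, and the "video" / "console" mode strings are assumptions for illustration.

```python
from enum import Enum


class Status(Enum):
    NORMAL = "NORMAL"
    WARNING = "WARNING"
    ABNORMAL = "ABNORMAL"


def oled_text(status: Status) -> str:
    """Text shown in big letters on the small OLED."""
    return status.value


def main_display_mode(status: Status) -> str:
    """Videos while everything is fine; the console as soon as something is off."""
    return "video" if status is Status.NORMAL else "console"


if __name__ == "__main__":
    for status in Status:
        print(f"{oled_text(status):<8} -> main display: {main_display_mode(status)}")
```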

This is it. This is the way. The screen goes into "work mode" only when there's an incident. The rest of the time, it functions as interior decor.

A view of normal operation. The OLED on the left says "SERVER MONITOR NORMAL", and the main screen on the right is just playing a video.

It Was a Bit Contradictory, But the Goal Was Met

Here, I came to my senses.

I was supposed to have built this low-power monitoring terminal to shut down my main PC and save energy.

Yet, here I am, playing videos just because I didn't want to waste the display. I'm adding to the power consumption myself. I have to admit, it’s a bit contradictory.

But the result was excellent.

Compared to when I had my main PC on 24/7, my power consumption dropped significantly thanks to the dedicated Pi 5 monitoring unit. The slight increase from the video playback is nothing compared to the massive reduction I achieved. And my main PC can now sleep soundly at night.

The OLED lets me know the status Eva-style. The 4.3-inch screen serves as interior decor and only works when there's trouble. The Freenove case kit was a great purchase for me, too, with all its features.

I achieved my initial goal. The video playback is just a bonus reward. And that's fine.

How I Completed My Home Server Hardware Selection and Migration in One Day Using Claude

QuoLu — Fri, 12 Jun 2026 01:03:07 +0000

Introduction

Until recently, my home server was running on AMD BC-250 + Bazzite.

The BC-250 is a former mining board sold on AliExpress. I chose it simply because I thought it would be "interesting." It's the kind of configuration that makes you wonder how many other people in the world are using it for server purposes.

I have previously written about how I deployed the apps I created with Claude to that server, one after another.

Why I Graduated from the Mining Board

The apps I created with Claude started running properly. Some of them have even entered service.

Once that happened, I started to feel that it was a bit (read: very) problematic to keep running them on a mysterious piece of hardware derived from a mining board, even if I was only dealing with dirt-cheap doujin software.

So, I decided to graduate. I decided to buy a new server.

Choosing Hardware — Consulting Claude

I have a long history of building my own PCs. However, honestly, I am not very knowledgeable about the genre of "low power, 24/7 operation, with server performance." I have always built configurations oriented toward gaming, so choosing a low TDP was uncharted territory.

That is why this time, I had Claude do the hardware selection with me from the very beginning.

It felt like applying my usual design process directly to hardware selection: conveying requirements, getting candidates, and narrowing them down. I ended up landing on the MINISFORUM MS-A2. I bought the Ryzen 9 7945HX version.

There was one more push right before I bought it. The MS-A2 also has an 8945HX version, and I thought the newer numbering might be better, but according to Claude:

"The 8945HX is equivalent to the 7945HX; they just changed the numbering for branding purposes. In Japan, the 7945HX is about 10,000 yen cheaper, so I think that one is better."

Oh, is that so? I followed that advice obediently.

Reality Check at a Bar

A few days later, I happened to have a chance to talk to someone in the industry at a bar, so I asked, "Is it true that the 8945HX is the same as the 7945HX?"

The reply was, "They're the same. It's just a rebrand." Claude was right. Not bad at all.

By the way, the 7945HX version is already out of stock on Amazon. It was only available on the official MINISFORUM website. It seems they really are phasing out the 7945HX from the Japanese market. Claude's advice wasn't just about the price; it was also the final chance to acquire it. I take credit for stimulating the global economy a little bit. Please praise me.

The Move — Leaving it Entirely to Claude

Now that the hardware was ready, it was time for the migration. The conditions were as follows:

OS: Bazzite → Ubuntu (a complete change in configuration philosophy)
Container Runtime: Podman → Docker
SSD for Data: Physically remove it from the old server and reuse it in the new one

If I did this manually, the migration plan alone would be pages long. At the very least, I didn't want to do it.

So, I had Claude SSH into both the old and new servers and said:

"Move everything from the server. However, I'm reusing the SSD."

I think even I would call that a brutal instruction.

...The move was finished in less than a day.

Everything was set up and executed by Claude: the data backup plan, deploying the configuration to the new server, and converting the container fleet to Docker. I was basically just waiting. I only had to answer when it asked me, "Which one should I choose?" for branching decisions that required human judgment.

Errors Do Happen. Of Course They Do

There is no way a migration of this scale wouldn't have any errors.

In fact, some containers didn't run well, and some apps crashed. There were Docker-incompatible notations mixed into the compose files written for Podman, and file paths were based on the Bazzite layout.

But this was also finished by saying 'Find it and fix it.' Claude performed the operational checks itself, picked up the error logs, identified the causes, fixed them, and ran them again.

It's the kind of work you'd want to run away from if done manually, but here I was just sitting down. Exactly as planned. Since the whole reason I bought the MS-A2 was that "It's easy if I leave it to Claude," this was within the scope of my expectations.

Summary

Switched my home server from an AMD BC-250 (former mining board) to a MINISFORUM MS-A2 (Ryzen 9 7945HX).
Consulted Claude on hardware selection and landed on the best choice with a rebrand tip.
Had Claude SSH into the machines and handle the migration entirely, which was completed in a day.
Handled error responses simply by saying "Find and fix it."
Along the way, I stimulated the global economy a little bit. Please praise me.

I am very satisfied because I bought the hardware for the sake of "making things easy by leaving it to Claude," and it was indeed easy.

I Used Claude Code for 2 Months Without Knowing About WSL2

QuoLu — Thu, 11 Jun 2026 00:59:54 +0000

Introduction

I subscribed to Claude MAX and have been using it for coding almost every day.

In my first article, I wrote, "I didn't even know half of the features in Claude Code." I'm talking about not knowing /init or CLAUDE.md.

A month and a half has passed since then. And the same thing happened again.

One day, during a session, I was trying to run a command, and Claude said:

"I recommend running this in WSL2."

Huh.

WSL... 2?

"It runs real Linux on Windows"

When I asked what that meant, Claude explained:

A mechanism where the Linux kernel runs authentically inside Windows
Different from virtual machines like VirtualBox or VMware
Allows file access between Windows and Linux
VS Code can open folders on the Linux side directly for development

I see. I think I'd heard the name before. I might have seen it briefly in a web article.

But since I was running Claude Code natively on Windows, I thought I didn't need it.

Then, Claude dropped another bombshell.

"Claude Code only achieves about 70% of its potential performance when running natively on Windows."

What.

Seriously? Claude Code's home is Linux

The more I asked, the more I realized that Claude Code is built with Linux/macOS in mind.

The tools running inside are almost all based on POSIX/bash
Hooks and MCP server-related features run most reliably in a Linux environment
Official documentation and samples are mostly for bash
Running natively on Windows involves an abstraction layer to bridge command differences, which creates friction

In other words, I've been forcing a tool designed for Linux to work on Windows for two months.

It wasn't that it didn't work. It worked. But it probably wasn't running at full speed.

Wait, that explains so much

Hearing this, everything suddenly clicked.

I've been constantly getting stuck with Claude Code:

The official sample procedures were in bash, and I had to mentally convert them to PowerShell every time
npm package installations would fail only on Windows
Hooks I tried to install wouldn't run, with errors like "local variable expansion is invalid"
MCP server configuration examples assumed ~/.config/..., and the path structure was completely different to begin with
I would restart VS Code repeatedly because the extensions wouldn't respond after installation

Every time, I just resigned myself to, "Well, it's Windows, so it can't be helped..."

I spent my time searching for 'Windows-specific solutions,' clicking through endless blue links on StackOverflow, and burning hours just to get things running.

Was all of this just because I wasn't using WSL2?

Are you kidding me?!!

Seriously, are you kidding me?

That plugin I gave up on that night, that hook I struggled to get working—if I had been running them on Linux from the start, would it have been instant?

Give me back my weekends.

Claude, you again...

I'm getting a strong sense of déjà vu here.

It was the same when I didn't know about /init.

It was the same when I didn't know about CLAUDE.md.

It was the same when I wasn't using Plan mode.

And now, WSL2.

The problem where Claude doesn't tell you the most important things voluntarily—I think this is a trap common across the industry.

New users don't read the official documentation from top to bottom, and it doesn't display a message saying, "Your environment isn't performing at its best" upon startup. Since it works well enough, you can end up running it like that for months.

Other Windows-native Claude Code users out there, you're probably running it hard right this second.

Running it, but only at 70% of Claude's true potential.

Tell me sooner, Claude!!

I moved

So, I moved to WSL2. The process was surprisingly easy.

Open PowerShell as administrator and run one command:

wsl --install

That downloads the default Ubuntu, and after a restart and creating a user, that's it. After that, install the WSL extension in VS Code, select "Connect to WSL" from the green button at the bottom left, and VS Code connects seamlessly to the Linux side.

Reinstalling Claude Code was straightforward in the Linux terminal. In fact, I could use the steps from the official documentation as-is. No need to read Windows-specific caveats. This is powerful.

There was only one trap: You shouldn't access projects located in Windows folders directly from the Linux side. Going through /mnt/c/... makes file access extremely slow, which ruins the benefits of WSL2. The correct way is to git clone your projects on the Linux side (somewhere like ~/projects/).
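If you want to catch this trap automatically, a small check is enough. This is a minimal sketch that assumes WSL's default layout, where Windows drives are mounted at /mnt/<drive letter>; the function name is hypothetical, and relative paths are not resolved.

```python
import re
from pathlib import PurePosixPath

# Matches /mnt/c, /mnt/d/..., but not /mnt/cache/... (assumes the default mount root).
_WINDOWS_MOUNT = re.compile(r"^/mnt/[a-z](/|$)")


def is_on_windows_mount(path: str) -> bool:
    """Return True if an absolute path lives on a Windows drive mounted into WSL."""
    normalized = str(PurePosixPath(path))  # drops trailing slashes and "." parts
    return bool(_WINDOWS_MOUNT.match(normalized))
```

For example, a project at /mnt/c/Users/me/project would be flagged, while one at /home/me/projects/app would not.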

The result of the move

It feels faster.

Terminal operation response is snappy
Commands like npm install and hugo build are visibly faster
I can follow official tutorials without issues, so I get stuck less often
Hooks and small scripts written in bash work as-is without conversion

It's hard to prove with numbers, and I didn't run benchmarks to calculate that "missing 30%" difference.

However, it is clear that I get stuck less often. This is a matter of efficiency, and in my experience, the several "why isn't this working" moments I had each day have disappeared. That alone makes it worth it.

Summary

If you're going to use Claude Code extensively, I think you should consider WSL2 a prerequisite.

There are things you can manage with native Windows, but if you want to unleash its true behavior, it's best to place it on the Linux side.

Along with /init and CLAUDE.md, WSL2 is now the first thing I want to tell beginners.

The official documentation is in English, and it doesn't tell you this at startup, so it's natural that people don't realize it.

That's why I'm writing this. To save at least one person from wasting two months in the same trap I fell into.

How I Fixed the Infinite Feedback Loop When Auditing Project Plans with Claude

QuoLu — Wed, 10 Jun 2026 01:00:24 +0000

I always enjoy AI programming.

My usual workflow is to create a plan, have it audited, and then proceed with the implementation.

However, I have felt that the auditing process has not been working well, especially since Opus 4.7. Perhaps it's because Opus has gained a broader perspective? It often brings up points that are irrelevant to my plan, and when I have it perform an automated loop of auditing and revising, the feedback often fails to converge.

Whac-A-Mole

Then one day, I realized it.

When writing a program with slightly complex logic, the AI keeps saying "this is wrong" or "that is wrong" every single time. It feels like an indirect loop, or more accurately, constant Whac-A-Mole.

It says, "B is weak from the perspective of A," so I fix B. In the next audit, it says, "B is excessive, and A is thin." When I add A, it then says, "C is inconsistent." When I fix C, it says, "The description of C is redundant." Once I fix that, it says, "C is insufficiently explained."

It's a seesaw. There is no exit in sight.

What I tried

Realizing this, I tried the following approach.

For a plan that was reasonably complete (having gone through 1 or 2 audits), I asked the AI, "Please audit the plan only for logical contradictions."

This worked perfectly. It diligently resolved the contradictions, and after a few rounds of auditing, it converged properly.

Since the number of contradictions is finite, it actually converges.
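To make the convergence argument concrete, here is a toy model. It is only a sketch under simplifying assumptions: the "plan" is a list of (key, value) claims, a "contradiction" is the same key asserted with two different values, and the auditor and fixer are plain functions standing in for the AI. All names are made up for illustration.

```python
def find_contradictions(plan):
    """Toy auditor: report every key that is asserted with two different values."""
    seen, conflicts = {}, []
    for key, value in plan:
        if key in seen and seen[key] != value:
            conflicts.append(key)
        seen.setdefault(key, value)
    return conflicts


def fix(plan, conflicts):
    """Toy fixer: resolve one conflict per round by keeping the first claim for that key."""
    target = conflicts[0]
    kept, seen_target = [], False
    for key, value in plan:
        if key == target:
            if seen_target:
                continue  # drop later claims about the same key
            seen_target = True
        kept.append((key, value))
    return kept


def audit_until_converged(plan, max_rounds=10):
    """Audit for contradictions only; each fix removes claims, so the loop must end."""
    for round_no in range(1, max_rounds + 1):
        conflicts = find_contradictions(plan)
        if not conflicts:
            return plan, round_no
        plan = fix(plan, conflicts)
    raise RuntimeError("audit did not converge")


plan = [("db", "sqlite"), ("db", "postgres"), ("port", 8080), ("port", 9090), ("os", "ubuntu")]
final_plan, rounds = audit_until_converged(plan)
```

With two contradictions in the plan, the loop needs two fixing rounds plus one clean audit. A wide-scope audit has no such bound, because every fix can open a new angle of criticism.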

And when I let it start the implementation with a plan free of contradictions, it runs straight to the end without stopping (lol).

Of course, there are occasional implementation errors where it has to retry, but that's expected.

I wondered if this exists in the world

Having reached this point, I suddenly became curious. Surely, I'm not the first person to figure this out. Someone else must have thought of the same thing.

I looked it up.

There were similar concepts, but they felt different. To put it simply, they were complicated.

Criteria Drift (Explanation by Hamel Husain, Shankar et al.): A phenomenon where evaluation criteria gradually shift when using an LLM for review. The countermeasure is "rescoring past scores while refining the evaluation axes." ...That's heavy.
Oscillatory Convergence (Fractal Thought Engine): An observation that there is a certain number of sessions where the approach oscillates due to iterative feedback from the LLM. It's observed, but it's not about countermeasures.
Moving the Goalposts (Microsoft Blog): The idea of not moving the rubric during evaluation and finalizing the rubric before starting the evaluation.

There are related topics. But what I did wasn't "finalizing the rubric"; it was "narrowing the evaluation axis to a single point: contradictions." As long as the scope is broad, points of criticism will spring up infinitely, so I'm trying to contain the scope to a finite set. I couldn't find existing research that clearly stated this.

I suspect it's written somewhere. It should be, but it's probably written in academic terms, and by the time I realized it, two months had already passed.

Simple once you're told

When I write it down, it sounds simple. "If you narrow the scope, it will converge." That's all there is to it.

But it took me two months to notice.

I kept trying to change how I wrote my prompts, thinking, "If I use Claude more intelligently, it will get better." Even when I wrote, "Don't give too many points" or "Be consistent with past feedback," it didn't work. Because the problem wasn't how I was writing the prompts.

It took time for the idea of narrowing the scope to "only finite items" to occur to me. When you audit normally, the scope is wide, so no matter what you fix, holes are found from a different angle. That was the whole story.

Conclusion

When the audit of a plan doesn't converge, narrow the scope to "only contradictions."
Contradictions are finite, so the process will converge.
If you audit with a wide scope, criticism will spring up infinitely, leading to Whac-A-Mole.
When you have the AI implement a plan that is free of contradictions, it will run until completion without stopping.

If there is anyone else caught in the same Whac-A-Mole trap, then writing this was worth it.

I feel good on days when I do something good.

Stop Telling Claude to 'Be Careful': Reinforcing It from the Outside with 3 Tools

QuoLu — Tue, 09 Jun 2026 00:54:11 +0000

Over the past month or so, I have released three reinforcement tools for Claude Code on npm.

Throughline — Offloads bloated context.
Caveat — Surfaces past notes so you don't step into the same trap twice.
Spotter — A separate Claude audits missed tool calls.

Each solves a different problem, but the root cause is the same: all the problems that could be fixed by telling Claude to "be careful" had already been fixed. What remained were structural issues that no amount of instruction could fix.

The period of constant "be careful" warnings

In the beginning, I also wrote plenty of "be careful" instructions in CLAUDE.md and my prompts.

"Do not guess files, always read them before answering."

"If the context becomes bloated, run /compact."

"Read the traps I've fallen into in the past, which are written in CLAUDE.md."

But the more I wrote, the more bloated CLAUDE.md became. A bloated CLAUDE.md just gets skimmed by Claude. Even though it's written there, it isn't followed.

I thought maybe my writing style was poor, so I changed the phrasing. Still, it didn't work. I realized there is a certain number of problems that simply don't go away no matter how many times you change the way you phrase them.

Problems that can be fixed vs. problems that cannot

One day, I drew a line in the sand.

Problems that can be fixed by writing "be careful" and those that cannot are fundamentally different types of issues.

Problems that can be fixed by writing instructions occur because Claude simply "forgot." If it sees the instructions, it remembers. This can be handled by improving the prompts.

Problems that cannot be fixed occur because Claude cannot recognize its own limitations.

It cannot notice that the context is bloated (the context size is fixed at the moment the request is sent, so Claude cannot see it from the inside).
It doesn't remember traps encountered in past sessions (sessions are independent, and adding to CLAUDE.md makes it heavy).
It doesn't notice when it forgets to call a tool (it doesn't know what it doesn't know, so it can't go get it).

Asking Claude to "be careful" about these things doesn't work. It's because these are problems Claude itself cannot fix.

Giving up and reinforcing from the outside

So, I stopped asking Claude and started intervening from the outside.

Claude Code has a hook mechanism. You can insert hooks before sending prompts, after a tool runs, or when a session ends. Even if Claude itself doesn't notice, you can observe the state from the outside and inject the necessary processing.
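As a rough picture of what a hook command sees: as I understand the hook mechanism, Claude Code runs the registered command and passes it a JSON description of the event on standard input. The sketch below parses such a payload. The field names hook_event_name and tool_name are assumptions here and should be checked against the official hook reference; the function name is hypothetical.

```python
import json


def describe_hook_event(raw: str) -> str:
    """Summarize a hook payload as 'Event' or 'Event:Tool'.

    Assumes (unverified) field names 'hook_event_name' and 'tool_name'.
    """
    event = json.loads(raw)
    name = event.get("hook_event_name", "unknown")
    tool = event.get("tool_name")
    return f"{name}:{tool}" if tool else name
```

A real hook script would read the payload with sys.stdin.read(), decide what to do based on the event, and print whatever it wants injected back.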

Since realizing this, I have created three reinforcement tools.

Throughline (Subtraction)

To address the issue of context bloating, it offloads tool inputs and outputs to SQLite and removes them from the context.

The contents of read files, grep results, and Bash outputs—once the AI has used them to make a decision and moved on, their purpose is fulfilled. Yet, they remain until the end, consuming tokens. I use hooks to offload these to SQLite. If Claude needs them, it can retrieve them itself.
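The offloading idea can be sketched in a few lines. This is not Throughline's actual schema or placeholder format, both of which are assumptions here; it only shows the shape of "store the full output, keep a short stub, retrieve on demand."

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store. The table layout is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tool_io ("
        "id INTEGER PRIMARY KEY, tool TEXT NOT NULL, output TEXT NOT NULL)"
    )
    return conn


def offload(conn, tool, output):
    """Save the full tool output and return a short placeholder to keep in context."""
    cur = conn.execute(
        "INSERT INTO tool_io (tool, output) VALUES (?, ?)", (tool, output)
    )
    conn.commit()
    return f"[{tool} output offloaded: id={cur.lastrowid}, {len(output)} chars]"


def retrieve(conn, row_id):
    """Fetch the full output again if it is needed later; None if the id is unknown."""
    row = conn.execute(
        "SELECT output FROM tool_io WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0] if row else None
```

A 5,000-character Bash output becomes a placeholder of a few dozen characters, and the original is still one lookup away.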

I have completely removed the burden of "noticing the bloating" from Claude.

Caveat (Accumulation)

To address the issue of falling into the same trap twice, it automatically surfaces trap notes written in the past during similar situations.

When I fall into a trap, I write it down in Markdown. The next time I send a similar prompt, receive a similar tool error, or when a "struggle signal" is observed at the end of a session, the relevant past notes are injected into Claude's context via hooks.

I have removed the burden of "remembering past traps" from Claude.

Spotter (Addition)

To address the issue of forgetting to call tools, I run another Claude side-by-side that has a perfect grasp of the tool catalog and points it out if a call is forgotten.

The main Claude works as usual. Another Claude (Haiku 4.5) resides alongside it, watching the user's input and final response. If it notices, "You could have answered this by using web_search," it sends a pointer to the main Claude via a hook.

I have removed the impossible burden of "realizing what you forgot to call" from Claude.

Common patterns

What these three have in common is a design that expects nothing from Claude itself.

	Throughline	Caveat	Spotter
What is not asked of Claude	Context management	Past memories	Detection of missed steps
Who does it instead	hook & SQLite	hook & past notes	hook & separate Claude
Changes needed for Claude	None	None	None

The fact that "changes needed for Claude" is zero is important. You can just write what you want to write in prompts and CLAUDE.md as usual. You don't increase the number of "be careful" warnings.

Structural problems not yet reinforced

It's not that I'm satisfied because I made three. There are still structural problems I want to fix.

Long-term role drift: The problem where Claude's persona drifts during long sessions. Even if I write "You are a strict reviewer" in the prompt, it becomes soft after 20 turns.
Context loss through sub-agents: Sub-agents spawned by the Task tool do not have the implicit context of the parent session. It is quietly painful to pass the same explanation to the child every time.
Tool selection accuracy: The judgment of "which tool to use" among multiple options is sometimes sloppy. Spotter detects missed calls, but it doesn't detect wrong tool selection.

Conclusion

I have already fixed all the problems that can be solved by telling Claude to "be careful." The remaining problems are of a type that Claude itself cannot fix.

That's why I reinforce from the outside. Just insert it with hooks. Claude itself doesn't need to know anything.

Having created three of these in a month, I feel I've seen the pattern for reinforcement. All three are on npm under MIT, so if you are troubled by the same structural problems, please take a look if you feel like it.

I Had 74 Daemons Running Because I Made One Claude Audit Another for Missed Tool Calls

QuoLu — Mon, 08 Jun 2026 01:01:14 +0000

The Trigger

One day, when I asked Claude, "What time is it now?" it gave me an answer based on its best guess.

On another day, when I asked about the contents of a configuration file, it provided an explanation based on its own guess from the file name. It had a read_file tool available, but it didn't use it.

At first, I thought, "Maybe Claude is just tired," but it happened too frequently. Even if I wrote "Use the tools" in the prompt, it would sometimes forget.

That's when I realized: Claude cannot self-recognize when it doesn't know something. Therefore, it doesn't know it needs to go get a tool.

Even if I ask it to "be careful about forgetting to call tools," it doesn't know it "doesn't understand," so there is no way for it to be careful. It was a structural problem.

What I Tried

So, why not just add another set of eyes?

Apart from the main Claude, I decided to keep an auditor Claude (Haiku 4.5), which has complete mastery of the tool catalog, resident in every session. It watches the user's input and the main Claude's final responses in parallel, and points out if a tool call was forgotten.

Situation	Main Claude's response	Auditor's feedback
"What's the weather today?"	Responds with a guess	You can use `web_search`
"What's inside this config?"	Guesses from the name	You can use `read_file`
"What time is it now?"	Time at training	You can use `current_time`

The point is not to rely on the main Claude's own self-awareness. Instead of writing "please be careful" to Claude, I physically placed another set of eyes there. The judgment happens in two stages: at the moment the user inputs something (listing tools that should be used for the request) and immediately after the main Claude returns a response (determining if a verification tool can be inserted for factual claims).

I built this and named it claude-spotter.

A Mistake Immediately After Release

I thought it was a convenient design. npm install -g claude-spotter would automatically enable it for all projects with no configuration required. It felt perfect.

I released it and started using it myself.

64 minutes later, 74 daemons were running.

What Happened?

When I dug into the real session logs, 51 out of the 74 were caused by Throughline (another tool of mine).

Throughline calls claude -p internally. Calling claude -p triggers the SessionStart hook. The SessionStart hook starts the Spotter daemon. The Spotter daemon calls claude -p for auditing. It wasn't quite infinite recursion, but a recursive proliferation.

Because I was writing to ~/.claude/settings.json via postinstall, every Claude Code session on the system was structured to load the Spotter hook. This was the price of "automatic activation for all projects."

I added a 5-layer defense to stop the recursion on my end, but it was defenseless against claude -p originating from other tools. This was a structural issue that couldn't be covered up by patches.
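To illustrate why a guard on my side cannot cover other tools, here is one possible layer sketched in isolation. It is not the actual 5-layer defense; the environment variable name and both functions are hypothetical.

```python
GUARD_VAR = "SPOTTER_CHILD"  # hypothetical marker name


def child_env(env):
    """Environment for a claude -p call made by the auditor itself: add the marker."""
    marked = dict(env)
    marked[GUARD_VAR] = "1"
    return marked


def should_start_daemon(env):
    """SessionStart decision: skip sessions that the auditor spawned itself."""
    return env.get(GUARD_VAR) != "1"
```

A session spawned by the auditor carries the marker and is skipped. A session spawned by another tool calling claude -p has no marker, so should_start_daemon() still returns True and a new daemon starts. That gap is what no patch on the Spotter side could close.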

Retraction

I retracted the automatic registration in postinstall. I changed it so npm install only makes the CLI available, and users must explicitly run spotter install in each project. This writes the hook to <project>/.claude/settings.json.

I thought "automatic for all projects" was convenient, but the side effects were far greater. Ideally, automation is the goal, and having users run spotter install in each project is a compromise. If the Claude Code hook mechanism could "identify the session origin," it would be safe to make it automatic, and I'd like to revert to that when it happens.

The Next Bug: Tools from Past Projects Remain as Ghosts

After using it for a while, a different symptom appeared.

When I opened a session in Project A, the auditor suggested, "You can use the mermaid_diagram tool." However, the mermaid MCP is not registered in this project.

I investigated and found that the MCP tool definitions I had used previously in Project B remained in the global DB and were being referenced in Project A. It was a regression: the auditor was suggesting tools that could not be used.

I changed the tool catalog used by the auditor to be local-DB only (v1.2.0). The global DB was demoted to "a cache that reuses only the description if it has been acquired in other projects." Now, discovery runs in each project every time, and tools that are not found are deleted (pruned) from the local DB.
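The refresh rule can be sketched with plain dictionaries. This is an illustration of the rule described above, not Spotter's real data model; the function name and the shape of the three mappings (tool name to description) are assumptions.

```python
def refresh_catalog(local, discovered, global_cache):
    """Rebuild the project-local catalog from what discovery found this time.

    local        -- the previous local catalog
    discovered   -- tools found in this project now (description may be None)
    global_cache -- descriptions seen in other projects, used only as a fallback
    Returns (new_local_catalog, names_pruned).
    """
    refreshed = {}
    for name, description in discovered.items():
        # The global cache only lends a description; it never adds a tool.
        refreshed[name] = description or global_cache.get(name, "")
    pruned = sorted(set(local) - set(discovered))
    return refreshed, pruned
```

A tool such as mermaid_diagram that exists only in the old local catalog or the global cache never reaches the auditor, because only names present in discovered survive.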

MCPs Distributed as .cmd Fail to Spawn on Windows

I hit one more thing. On Windows, calling spawn('claude-mermaid') for an MCP server that npm installs globally as a .cmd file fails immediately with ENOENT.

Node.js's spawn calls CreateProcess directly on Windows, but CreateProcess only resolves .exe files (it does not resolve the .cmd extension in PATHEXT). Wrapping the call in cmd.exe /c makes it work. I had already hit this pattern in Spotter itself when launching the claude CLI and had fixed it there, but I had forgotten to apply the same fix to the MCP server launch path (fixed in v1.2.2).
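The workaround amounts to choosing a different argument vector per platform. Spotter itself is a Node.js tool, so the following is only a language-neutral sketch of the pattern; the function name and the --stdio argument in the usage note are hypothetical.

```python
import sys


def build_argv(command, args, platform=sys.platform):
    """Return the argv to launch; on Windows, go through cmd.exe so .cmd files resolve."""
    if platform == "win32":
        # cmd.exe performs the PATHEXT lookup that a direct process launch skips.
        return ["cmd.exe", "/c", command, *args]
    return [command, *args]
```

build_argv("claude-mermaid", ["--stdio"], platform="win32") yields ["cmd.exe", "/c", "claude-mermaid", "--stdio"], while other platforms launch the command directly. Arguments containing spaces or shell metacharacters would need extra quoting care when passed through cmd.exe.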

I stepped into a trap I had set myself, just via a different path. Having experienced this, I strongly felt the necessity for Caveat. If there isn't a mechanism to avoid stepping into the same trap twice, this is what happens.

Current Status

v1.2.4. The CI for Windows, macOS, and Linux is all green.

npm install -g claude-spotter
cd your-project
spotter install

The tool catalog is automatically collected during spotter install, and the SessionStart hook refreshes it in the background every time Claude Code is started. There is no need to manage it manually.

spotter status      # List of running auditors
spotter db list     # Tool catalog for this project
spotter doctor      # Environment diagnostics
spotter uninstall   # Remove hook registration

Areas Still Lacking

The Stop hook's correction results in two consecutive responses. Because the hook runs only after the main Claude has returned a response, any correction it issues appears as "the initial response + the correction response" one after another. It would be ideal if we could preempt it during input (UserPromptSubmit), and use the post-response hook as insurance.
User input is blocked by Haiku's timeout. I am currently considering whether to fail-open (bypass and let it through).
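For reference, fail-open behavior could look roughly like this. It is a sketch of the option under consideration, not what Spotter currently does; the function names and the stand-in auditors are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def audit_fail_open(audit, prompt, timeout_s):
    """Run audit(prompt); if it exceeds timeout_s, return None and let input through."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(audit, prompt)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None  # fail-open: no feedback, but the user is not blocked
    finally:
        pool.shutdown(wait=False)  # do not wait for the slow auditor here


def slow_auditor(prompt):
    """Stand-in for an auditor call that takes too long."""
    time.sleep(0.3)
    return ["You can use web_search"]


def fast_auditor(prompt):
    """Stand-in for an auditor call that answers in time."""
    return ["You can use current_time"]
```

The trade-off is that a timed-out audit silently produces no feedback, so the post-response check would have to act as the safety net.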

Relationship with Throughline / Caveat

Spotter is a separate product that shares a philosophy with Throughline and Caveat, created by the same author.

	Throughline	Caveat	Spotter
Philosophy	Subtraction	Accumulation	Addition
Target	Context bloat	Stepping into the same trap twice	Tool omission
Mechanism	Evacuate memory via hooks	Surface past notes via hooks	Run auditor in parallel via hooks

What the three have in common is a "mechanism that does not rely on the main Claude engine." All three can coexist.

Requirements

Node.js 22.5+
Claude Code 2.0+
Claude Max Plan (to launch Haiku 4.5 with claude -p)

Spotter — GitHub

MIT License. If you are struggling with the same problem, please feel free to take a look if you're interested.

Published Caveat to npm: A Long-term Memory Layer to Avoid Repeating the Same Traps

QuoLu — Sun, 07 Jun 2026 01:01:30 +0000

I have published Caveat, a long-term memory layer for Claude Code, to npm.

What it does

When using Claude Code, you often spend more time deciphering "other people's specifications" than doing the actual implementation. You get stuck on GPU driver version constraints, failed native module builds, IDE quirks, or path issues that only occur on specific OSs. Even after you struggle and solve it once, you end up stepping into the same trap in a different project six months later. When you ask the AI, it doesn't say "I don't know" but instead acts on assumptions, causing you to waste time all over again.

Caveat is a layer where "once you jot it down, relevant notes automatically surface the moment you encounter the same situation next time." Even if you can't remember it, and even if the AI doesn't know it, the relevance is detected structurally.

Three Trigger Points

Caveat is implemented at three points using hooks.

Trigger Point	When it runs	What it does
Prompt Submission	The moment a prompt is sent	Breaks down the prompt and surfaces only past notes that share two or more words with it
Tool Error	The moment a Claude tool call fails	Runs a background search and notifies the next turn as a known trap
Session Termination	When a session closes	Extracts "struggle signals" from conversation logs. Prompts the AI if there is anything that should be recorded as a new trap

"Struggle signals" are traces where the AI might not be aware of it, but objectively it was struggling—such as tool failures, editing the same file repeatedly, repeated web searches, or re-executing Bash commands. It scans these at the end and prompts you, "You were stuck here in today's session, right? Do you want to record it as a trap?"

Design without Keyword Lists

The search logic relies solely on Co-occurrence FTS. There is no keyword correspondence table like "if the word 'rtx' comes up, display GPU-related notes."

Instead, it breaks down the input prompt and surfaces only entries that contain two or more of the prompt's words. Generic words like make or new do not trigger on their own, but when two or more technical words overlap, they match.

Even when new trap categories are added, you just add one entries/<slug>.md. You don't need to touch the code or keyword tables. The trigger expands itself.
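The co-occurrence rule can be shown without a database. This is a simplified in-memory sketch, not Caveat's FTS-based implementation; the tokenizer and function names are assumptions, and it does nothing special about very common words, which a real index would have to handle.

```python
import re


def tokens(text):
    """Lowercase alphanumeric words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def surface(prompt, entries, min_overlap=2):
    """Return (slug, shared_words) for entries sharing at least min_overlap words."""
    prompt_words = tokens(prompt)
    hits = []
    for slug, body in entries.items():
        shared = prompt_words & tokens(body)
        if len(shared) >= min_overlap:
            hits.append((slug, sorted(shared)))
    return hits


entries = {
    "rtx-5090-cuda-12-init-fail": "rtx 5090 cuda 12 init fails after driver update",
    "windows-node-spawn-cmd-enoent": "windows node spawn cmd enoent",
}
gpu_hits = surface("CUDA init crashes with my RTX card", entries)
generic_hits = surface("make a new node project", entries)
```

The first prompt shares cuda, init, and rtx with the GPU note, so that note surfaces. The second shares only node with the Windows note, which is below the threshold, so nothing surfaces.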

Knowledge is markdown-in-git

The data consists of standard markdown files. SQLite is used as a derived index for searching, which can be rebuilt if deleted.

~/.caveat/own/
├── entries/
│   ├── rtx-5090-cuda-12-init-fail.md
│   ├── windows-node-spawn-cmd-enoent.md
│   └── ...
└── .git/

You can open it directly as an Obsidian vault. If you want to share it with your team, you can simply git push. There is no central server.

Public / Private Layers

Entries have a visibility attribute.

Public: Traps anyone can encounter if they use the same external tools or specifications (GPU, build environments, IDEs, version constraints)
Private: Project-specific context that cannot be reconstructed just by reading the code (intentional non-standard behavior, workarounds awaiting upstream fixes, custom habits)

Claude automatically determines the visibility. When in doubt, it defaults to private (to prevent leakage). If you explicitly instruct, "This should be private," that takes priority.

There is also a pre-commit hook mechanism that prevents private entries from being mixed into the shared repository.
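A check of this kind might look like the following. It is a sketch only: it assumes the visibility attribute is stored as a "visibility:" line in a front-matter block delimited by "---", which is an assumption about the entry format, and the function names are hypothetical.

```python
def visibility_of(markdown_text):
    """Read 'visibility:' from a leading '---' block; default to private when in doubt."""
    lines = markdown_text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of the front-matter block
            key, _, value = line.partition(":")
            if key.strip() == "visibility":
                return value.strip() or "private"
    return "private"


def private_entries(staged):
    """Given {path: text} for staged files, list the ones that must not be shared."""
    return sorted(path for path, text in staged.items() if visibility_of(text) != "public")
```

A pre-commit hook built on this would abort the commit whenever private_entries() returns a non-empty list for the shared repository.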

Installation

npm install -g caveat-cli
caveat init

caveat init does the following in one go:

Initializes ~/.caveat/
Registers the MCP server with Claude Code
Adds three hooks to ~/.claude/settings.json

It does not break existing hook settings (it creates a backup before merging).
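The "backup, then merge" behavior can be sketched as follows. This is an illustration, not what caveat init actually runs: the backup file name, the merge_hooks function, and the example hook command are hypothetical, and the sketch assumes only that settings.json holds a top-level "hooks" object keyed by event name.

```python
import json
import shutil
import tempfile
from pathlib import Path


def merge_hooks(settings_path: Path, new_hooks: dict[str, list]) -> dict:
    """Back up settings.json if it exists, then append hooks that are not already present."""
    settings: dict = {}
    if settings_path.exists():
        # Hypothetical backup name: settings.json.bak next to the original.
        shutil.copy2(settings_path, settings_path.with_name(settings_path.name + ".bak"))
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    hooks = settings.setdefault("hooks", {})
    for event, items in new_hooks.items():
        existing = hooks.setdefault(event, [])
        for item in items:
            if item not in existing:  # keep the merge idempotent
                existing.append(item)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"
    path.write_text(json.dumps({"hooks": {"SessionStart": ["existing-hook"]}}), encoding="utf-8")
    merged = merge_hooks(path, {"SessionStart": ["example-new-hook"]})
    print(merged["hooks"]["SessionStart"])
```

Existing entries are left in place and in order; the new ones are only appended.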

caveat search "rtx"        # Search existing notes
caveat serve               # Start a read-only portal
caveat uninstall           # Remove Claude integration only (data is kept)

No Central DB

Earlier versions had a shared database, with the concept of using caveat push to cultivate knowledge collectively. This has been abandoned.

I concluded that automatically verifying contributions from complete strangers is impossible in principle. Even if you use an LLM as a gatekeeper, it can be bypassed, and long-term latent attacks cannot be found through static analysis. Therefore, trust is built not through "automated inspection" but through "social context." I shifted to a model where you decide the scope of trust by choosing whose repositories to subscribe to.

caveat community add https://github.com/acme-corp/caveats
caveat pull

I will write about the detailed background in another article.

Requirements

Node.js 22.5+
Claude Code (with hooks support)
pnpm (development only)

Status

v0.11.1, 203 tests passing. Intended for individuals and small teams.

Caveat — GitHub

MIT License. Bug reports and PRs are welcome.

Published Throughline to npm: A hook to offload Claude Code tool I/O to SQLite

QuoLu — Sat, 06 Jun 2026 00:54:54 +0000

I have published a hook plugin for Claude Code called Throughline to npm.

What it does

In a Claude Code session, most of the context is filled with leftover tool I/O: the contents of files that were read, grep results, and Bash output. This data served its purpose the moment the AI read it, made a decision, and moved on, yet it stays in the context until the end, consuming tokens.

Throughline manages the conversation in three layers.

Layer	Content	Context Injection
L2	Conversation body (user input + AI response)	Last 20 turns injected as is
L1	Summary of L2 at about one-fifth the size, retaining key points	Injected for turns older than the last 20
L3	Tool I/O, system messages, and thinking	Not injected; offloaded to SQLite, retrieved by Claude as needed

Since tool I/O is removed from the context entirely, grep results and Bash output that have already been read do not linger until the end of the session. Older conversation turns are compressed to one-fifth of their original size while keeping the key points, so you can still follow decisions made dozens of turns ago.
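The layering rule can be sketched as below. This is a simplified Python illustration, not Throughline's code: the Turn structure and build_context are hypothetical names, and the summaries are assumed to exist already (the real tool generates them with Haiku).

```python
from dataclasses import dataclass

RECENT_TURNS = 20  # the last 20 turns are injected as is


@dataclass
class Turn:
    body: str     # L2: user input + AI response
    summary: str  # L1: condensed version of body
    tool_io: str  # L3: stored elsewhere, never injected


def build_context(turns: list[Turn], recent: int = RECENT_TURNS) -> list[str]:
    """Older turns contribute their summary, recent turns their full body, tool I/O nothing."""
    cutoff = max(len(turns) - recent, 0)
    older = [t.summary for t in turns[:cutoff]]
    latest = [t.body for t in turns[cutoff:]]
    return older + latest


turns = [Turn(body=f"body {i}", summary=f"sum {i}", tool_io=f"grep output {i}") for i in range(50)]
context = build_context(turns)
print(context[0], "|", context[29], "|", context[30], "|", context[-1])
```

In a 50-turn session, the first 30 turns appear as summaries and the last 20 as full text, while no tool output appears at all.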

In a 50-turn session on my machine, a conversation that had consumed 125,000 tokens was reduced to under 13,000 tokens, roughly a tenth of the original.

Installation

npm install -g throughline
throughline install

throughline install registers the hook in ~/.claude/settings.json. After that it runs automatically for every Claude Code project on your machine. No per-project configuration is required.

Carrying over between sessions

Throughline offloads conversations to SQLite, so the data remains even after running /clear. If you want to carry over your memory to the next session, type /tl in the previous session.

Data is carried over to the next session only when you type /tl. If you don't, the next session starts fresh. Even if you open parallel windows or restart VSCode, nothing is inherited by accident unless you have typed /tl.

When carrying over, the "next step memo" written by the previous Claude and the internal reasoning (thinking) of the final turn are passed along as well. The next Claude runs in "continue from interruption" mode rather than "reading past logs" mode.

Token Monitor

As a byproduct, a multi-session capable token monitor is included.

throughline monitor

[Throughline] 1 session
▶ Throughline  2ed5039c  ████░░░░░░░░░░░░░░░░  205.1k /  21%  Remaining 794.9k  claude-opus-4-6

Since it reads actual API values (message.usage) from the transcript JSONL, it provides accurate values rather than estimates based on character count / 4. It also supports automatic detection of 1M context windows.
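Reading usage from a transcript might look roughly like this. It is a Python sketch under assumptions, not the monitor's actual code: it supposes each JSONL record may carry message.usage with the token fields of the Anthropic Messages API, and that summing those fields approximates the context in use. The function names are hypothetical.

```python
import json

# Assumed usage fields, following the Anthropic Messages API usage object.
USAGE_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def latest_usage(jsonl_lines):
    """Return the token total from the last record carrying message.usage, or None."""
    total = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written lines
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if isinstance(usage, dict):
            total = sum(int(usage.get(field) or 0) for field in USAGE_FIELDS)
    return total


def render_bar(used: int, window: int, width: int = 20) -> str:
    """Render a usage bar with used tokens, percentage, and remaining tokens."""
    filled = round(width * used / window)
    percent = round(100 * used / window)
    bar = "█" * filled + "░" * (width - filled)
    return f"{bar}  {used / 1000:.1f}k / {percent}%  Remaining {(window - used) / 1000:.1f}k"


sample = [
    '{"type": "user", "message": {"content": "hi"}}',
    '{"type": "assistant", "message": {"usage": {"input_tokens": 100, "cache_read_input_tokens": 205000, "output_tokens": 0}}}',
]
print(render_bar(latest_usage(sample), 1_000_000))
```

Because the numbers come from the recorded usage rather than from character counts, the bar reflects what the API actually reported.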

Requirements

Node.js 22.5+ (uses the built-in node:sqlite module)
Claude Code (with hooks support)
Claude Max plan (used for Haiku calls for L1 summarization; no API key required)
Windows / macOS / Linux

Dependencies

Zero. The tarball published to npm contains only .mjs files. No build process or native bindings are required.

The background of the design and my trial-and-error process are written in this article.

Throughline — GitHub

MIT licensed. Bug reports and PRs are welcome.

Why I Gave Up on Automatic Detection for Resuming Sessions in Claude Code

QuoLu — Fri, 05 Jun 2026 00:58:14 +0000

In my previous article, I released Throughline. It is a tool that offloads tool I/O, which usually occupies the majority of the context.

At that time, it was "working." At least, in my own environment.

However, right after publishing the article, I started noticing some strange behavior.

When I opened another window in parallel, the new session would pick up the previous session's memory on its own. Every time I restarted VSCode, the new session was treated as a continuation of the previous one, even though I had never run /clear.

The Cause: Unable to detect /clear

Claude Code's hooks include an event called SessionStart, and its source field was supposed to let me distinguish startup (a fresh start) from clear (a start after /clear).

However, with the VSCode extension, the source is reported as startup even after a /clear. This is a known issue tracked in GitHub issue #49937. The distinction works if you use the CLI alone, but it cannot be made when using the extension.

I am using it via the VSCode extension. In other words, the design premise of "distinguishing between startup and clear" was fundamentally broken.

Attempting to compensate with heuristics

So I tried inferring it from time differences: if a new session starts within 10 seconds of the previous session's last activity, treat it as a clear; if the gap is longer, treat it as a startup.

This also broke.

When two windows are open in parallel, both appear as "recently active," making both candidates for inheritance.
Even when restarting VSCode, the transcript remains, making it look "recent."
I tried to trace the process tree, but the process structure differs between the CLI and the extension.
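The parallel-window failure is easy to reproduce on paper. The following Python sketch of the 10-second heuristic is an illustration only; the function names and timestamps are made up.

```python
def looks_like_clear(last_activity: float, now: float, threshold: float = 10.0) -> bool:
    """The heuristic: a new session shortly after previous activity is assumed to follow /clear."""
    return now - last_activity <= threshold


def inheritance_candidates(last_activity_by_session: dict[str, float], now: float) -> list[str]:
    """Every session whose last activity falls inside the threshold is a candidate."""
    return sorted(
        session
        for session, last_activity in last_activity_by_session.items()
        if looks_like_clear(last_activity, now)
    )


# Two windows are being used in parallel; a third session starts at t = 106 s.
activity = {"window-a": 100.0, "window-b": 104.0}
print(inheritance_candidates(activity, now=106.0))
```

Both windows qualify, and the timestamps alone offer no way to say which one, if either, the new session should inherit from.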

I realized that there was no reliable signal to detect in the first place.

Changing the approach

I failed because I was trying to detect it. If the user declares it, detection becomes unnecessary.

What I created is a slash command called /tl. Users type it only when they want to carry over their memory to the next session. When typed, that session ID is written to a table called handoff_batons. Imagine placing a baton.

When the next session starts, if a baton was placed within the last hour, it inherits the memory of that session. If not, it does nothing and starts as a new session.

This principle guarantees that parallel windows and VSCode restarts "will not misfire unless a baton is placed."
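The baton mechanism can be sketched like this. Only the table name handoff_batons and the one-hour window come from the description above; the columns, function names, and the use of Python are assumptions for illustration. The source also does not say whether a baton is consumed once read, so this sketch simply looks up the most recent one.

```python
import sqlite3

BATON_TTL_SECONDS = 3600  # a baton counts only if placed within the last hour


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and make sure the (hypothetically shaped) baton table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS handoff_batons (session_id TEXT NOT NULL, placed_at REAL NOT NULL)"
    )
    return conn


def place_baton(conn: sqlite3.Connection, session_id: str, now: float) -> None:
    """Called when the user types /tl: record which session wants to be continued."""
    with conn:
        conn.execute("INSERT INTO handoff_batons VALUES (?, ?)", (session_id, now))


def find_baton(conn: sqlite3.Connection, now: float):
    """Called at session start: return the session to inherit from, or None."""
    row = conn.execute(
        "SELECT session_id FROM handoff_batons "
        "WHERE placed_at BETWEEN ? AND ? ORDER BY placed_at DESC LIMIT 1",
        (now - BATON_TTL_SECONDS, now),
    ).fetchone()
    return row[0] if row else None


store = open_store()
print(find_baton(store, now=1000.0))   # no baton placed: start fresh
place_baton(store, "2ed5039c", now=1000.0)
print(find_baton(store, now=2800.0))   # 30 minutes later: inherit
print(find_baton(store, now=5000.0))   # more than an hour later: start fresh
```

With no row in the table, a session start has nothing to inherit, which is why parallel windows and restarts stay inert.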

Being explicit might seem troublesome at first glance, but "accidentally inheriting and causing trouble" is far more problematic. Having zero misfires was more valuable.

But, this alone wasn't enough

With the baton in place, the next session could read the conversation logs of the previous session. However, after actually using it, I felt it was "just reading logs."

There is a difference in the visceral experience between an AI that reads past logs and an AI that continues from the point of interruption.

The former asks, "Okay, I've grasped the situation. So, what shall we do now?" The latter proceeds by saying, "Continuing from earlier, we should check X next, right?"

So I added two things here.

In-flight memo. The moment /tl is typed, I have the currently running Claude itself write down "the next move, current hypotheses, unresolved issues, and ongoing TODOs" in Markdown. That is attached to the baton.

Saving thinking. I also save Claude's extended thinking blocks as L3. When injecting into the next session, I place the thinking from the final turn at the very top. What the previous Claude was thinking is passed on to the next Claude.

As a result, the injected text for the next session looks like this:

You are resuming an interrupted task.

[In-flight memo written by the previous Claude]
Next steps: Write tests for X. Hypothesis: I think Y is the cause. Unresolved: Z.

[What the previous Claude was thinking at the end]
I'm curious about the behavior of Z. Maybe...

[Conversation from the last 20 turns]
...
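Assembling that text is mostly string concatenation. The sketch below follows the layout of the example above; it is an illustration in Python with a hypothetical function name, not Throughline's actual code.

```python
def build_resume_prompt(memo: str, final_thinking: str, recent_turns: list[str]) -> str:
    """Join the in-flight memo, the final thinking, and the last 20 turns into one injected text."""
    sections = [
        "You are resuming an interrupted task.",
        "[In-flight memo written by the previous Claude]\n" + memo.strip(),
        "[What the previous Claude was thinking at the end]\n" + final_thinking.strip(),
        "[Conversation from the last 20 turns]\n" + "\n".join(recent_turns[-20:]),
    ]
    return "\n\n".join(sections)


prompt = build_resume_prompt(
    memo="Next steps: Write tests for X. Hypothesis: I think Y is the cause. Unresolved: Z.",
    final_thinking="I'm curious about the behavior of Z. Maybe...",
    recent_turns=[f"turn {i}" for i in range(1, 31)],
)
print(prompt)
```

The memo and the thinking come before the conversation log, so the next session reads the intent first and the history second.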

From "reading" to "continuing"

This changed the feel of the experience.

When I type /tl and then /clear to start a new session, the next Claude begins immediately with, "Alright, I'll start writing those tests for X from earlier." It’s not reading; it’s continuing.

Even between humans, when handing off work, it is faster to pass along a memo that says "what to do next, why, and one thing I'm concerned about" than to have the other person read the entire log. The same holds here.

I actually wanted it to be automatic

I don’t want to be misunderstood, but I believe the ideal is for it to "work automatically in the background." Having the user explicitly type something is, in truth, a compromise.

In this case, I fell back on an explicit declaration because I couldn't detect the transition. If the source issue in the VSCode extension is fixed, I intend to return to automatic detection. Until then, the explicit baton stands in for it.

However, even if it is a compromise, there is almost no practical harm. With automatic detection, you would end up typing /clear anyway; now that is just replaced by /tl. The keystrokes are the same, and I’ve been able to reduce misfires to zero.

It is not that "explicit is better," but rather "I settled for a declaration because I couldn't detect it." That is the honest truth.

Throughline is published on npm as v0.3.2. Node.js 22.5+, zero dependencies, MIT.

Throughline — GitHub

npm install -g throughline
throughline install

If you are interested, please take a look.