DEV Community

QuoLu


Released aiterm-mcp on npm: An MCP Server to Reduce Token Usage by Providing AI with a Persistent Terminal

I have published an MCP server called aiterm-mcp to npm. It lets an AI hold a terminal as a single, persistent session.

AI terminal tasks consume tokens invisibly

When an AI performs server tasks, it usually sends one command at a time. Over SSH, that means running ssh host "command" for each one, which repeats the full "connect → authenticate → execute → disconnect" cycle on every attempt.

The problem is that each command starts from scratch, so no state carries over. The directory you cd'd into, the environment you sourced, and the SSH connection you established are all gone by the next command. So every time, the AI has to:

  • Connect via SSH again
  • Change directory again
  • Load the environment again

...and only then run the actual command. This "set of redo operations" is written out, and its output read back, every time a command is sent. Text about reconnection, re-authentication, and re-setup, none of which has anything to do with the actual task, piles up in the context turn after turn. Tokens drain away into redundant redos that produce nothing.

aiterm folds all of this away. It holds just one persistent terminal, and SSH is established only once. Even if you run 10 commands, ssh is invoked only for the first one. Connections and authentications drop from N to one. The working directory and environment set up the first time also stay in place. Every subsequent command goes straight through the same session, and the whole set of redo operations simply disappears.
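To make the state problem concrete, here is a toy TypeScript model. It is not aiterm's code, it tracks only the working directory, and the path /srv/app and the command list are made-up examples. In the fragmented version every command opens a fresh session, so the cd is lost; in the persistent version the same session carries it forward.

```typescript
// Toy model only, not aiterm's implementation: a shell whose state
// (just the working directory here) lives as long as the session does.
class ShellState {
  cwd = "~";

  // Runs one command and returns the directory it ran in.
  run(cmd: string): string {
    const ranIn = this.cwd;
    if (cmd.startsWith("cd ")) this.cwd = cmd.slice(3).trim();
    return ranIn;
  }
}

// "1 command = 1 connection": every command gets a brand-new session.
function fragmented(commands: string[]) {
  let connections = 0;
  const ranIn: string[] = [];
  for (const cmd of commands) {
    connections += 1;               // ssh again: connect + authenticate
    const shell = new ShellState(); // fresh login, the previous cd is gone
    ranIn.push(shell.run(cmd));
  }
  return { connections, ranIn };
}

// One persistent terminal: ssh once, then every command reuses the session.
function persistent(commands: string[]) {
  const connections = 1;            // the single initial ssh
  const shell = new ShellState();
  const ranIn = commands.map((cmd) => shell.run(cmd));
  return { connections, ranIn };
}

// Hypothetical commands and path, for illustration only.
const cmds = ["cd /srv/app", "ls", "docker compose ps"];
console.log(fragmented(cmds)); // 3 connections; ls and docker run in ~
console.log(persistent(cmds)); // 1 connection; ls and docker run in /srv/app
```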

I measured the savings on my own server. When logging in via SSH, the boilerplate text alone (system information and announcements, i.e., the MOTD) hands about 385 tokens to the AI. In the fragmented mode, where you reconnect for every command, it is included every time. For a 10-command task, that is about 3,850 tokens (10 × 385) spent on boilerplate before any real work starts. Holding onto one terminal means you pay it only once; every later command adds nothing for it.
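The boilerplate arithmetic fits in a small function. It is a sketch that models only the login banner, using the roughly 385 tokens per login from my measurement; the function name is my own.

```typescript
// Login-banner (MOTD) tokens under each mode. Everything except the
// banner is ignored; tokensPerLogin is the measured ~385.
function motdOverhead(commands: number, tokensPerLogin: number) {
  const fragmented = commands * tokensPerLogin;         // one login per command
  const persistent = commands > 0 ? tokensPerLogin : 0; // one login in total
  return { fragmented, persistent, saved: fragmented - persistent };
}

console.log(motdOverhead(10, 385)); // fragmented 3850, persistent 385, saved 3465
```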

Pruning output before reading

There is one more level of token savings. aiterm prunes the output before the AI reads it.

Full disclosure: this reduction logic is not my invention. It is ported wholesale from rtk (Rust Token Killer), a tool by Patrick Szymkowiak that compresses command output before it is passed to an LLM (Apache-2.0). I re-implemented it inside aiterm so that pruning happens as part of reading the terminal, without calling a separate binary. No files were copied, but the behavior was matched, and the pytest summaries are pinned by regression tests to match rtk 0.42.0.

What it does:

  • Removes control characters (colors and cursor movements)
  • Folds repeated lines into counts
  • Truncates output that is too long, leaving the head and tail (with hints for restoration)
  • Summarizes the output of common commands like git status / git log / grep / pytest down to the key points, using command-specific summarizers
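For a rough feel of what the first three steps do, here is a simplified TypeScript sketch. It is not rtk's algorithm or aiterm's code: it folds only consecutive duplicate lines, handles only CSI escape sequences, and the marker strings and the 20-line head/tail limits are arbitrary assumptions. The command-specific summarizers are left out.

```typescript
// Illustrative only: a simplified take on three of the steps listed above.
// NOT rtk's or aiterm's code; the marker text and limits are made up.

// CSI escape sequences (colors, cursor movement), e.g. "\x1b[31m", "\x1b[2K".
const CSI = /\x1b\[[0-?]*[ -\/]*[@-~]/g;
// Remaining C0 control characters except tab and newline (also drops \r).
const CONTROL = /[\x00-\x08\x0b-\x1f\x7f]/g;

// Collapse runs of identical consecutive lines into one line plus a count.
function foldRepeats(lines: string[]): string[] {
  const out: string[] = [];
  let i = 0;
  while (i < lines.length) {
    let j = i + 1;
    while (j < lines.length && lines[j] === lines[i]) j++;
    const run = j - i;
    out.push(run > 1 ? `${lines[i]}  [repeated ${run}x]` : lines[i]);
    i = j;
  }
  return out;
}

// Keep the first `head` and last `tail` lines, replacing the middle with a marker.
function keepHeadAndTail(lines: string[], head: number, tail: number): string[] {
  if (lines.length <= head + tail) return lines;
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `[... ${omitted} lines omitted ...]`,
    ...lines.slice(lines.length - tail),
  ];
}

// Strip control sequences, fold repeats, then truncate the middle.
function prune(raw: string, head = 20, tail = 20): string {
  const clean = raw.replace(CSI, "").replace(CONTROL, "");
  return keepHeadAndTail(foldRepeats(clean.split("\n")), head, tail).join("\n");
}

console.log(prune("\x1b[32mok\x1b[0m\nretry\nretry\nretry\ndone"));
```

The sample call prints three lines (ok, retry  [repeated 3x], done): the color codes are gone and the three identical lines have become one.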

On my own server, I measured the output received by the AI as "raw" versus "via aiterm."

Output received by AI          | Raw (approx. tokens) | Via aiterm (approx. tokens)
SSH login boilerplate (MOTD)   | 385                  | 350
docker ps -a (33 containers)   | 2,355                | 2,218
120 lines of logs (journalctl) | 4,375                | 1,696
git log (25 entries)           | 473                  | 338

How much gets cut depends on the content. Logs with lots of repetition drop sharply (61% for the 120 lines). Wide tables made up only of unique values, like the container list, shrink by only about 6%. It is not magic that reduces everything by a fixed percentage; it trims only the waste. Even so, combined with no longer reading the reconnection boilerplate every time, tokens stop piling up turn after turn.
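The percentages quoted here are simply (raw - via aiterm) / raw for each row of the table. A quick check of that arithmetic, with row labels shortened by me:

```typescript
// Percentage reduction for one row: how much of the raw output was trimmed.
function reductionPct(raw: number, pruned: number): number {
  return ((raw - pruned) / raw) * 100;
}

// [label, raw tokens, tokens via aiterm], copied from the table above.
const rows: Array<[string, number, number]> = [
  ["SSH login boilerplate (MOTD)", 385, 350],
  ["docker ps -a (33 containers)", 2355, 2218],
  ["journalctl, 120 lines", 4375, 1696],
  ["git log (25 entries)", 473, 338],
];

for (const [label, raw, pruned] of rows) {
  console.log(`${label}: -${reductionPct(raw, pruned).toFixed(1)}%`);
}
```

This gives about 9.1%, 5.8%, 61.2%, and 28.5% respectively, matching the rounded -6% and -61% figures.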

It wasn't just a token problem

Up to this point, it has been about saving tokens. But the fragmented "1 command = 1 connection" approach had an even less funny side effect.

When you repeat connections at a rapid pace, your server's defenses decide you are an attacker.

  • Monitoring tools that track login attempts view consecutive connections as a brute-force attack and ban you.
  • You hit limits on the number of concurrent connections or sessions, causing new connections to be rejected.
  • Ultimately, your account gets locked.
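The failure mode is easy to reproduce with a toy sliding-window monitor. This is a hypothetical sketch, not fail2ban or any specific tool: the rule (ban after more than 5 connections within 60 seconds) and the 2-second reconnect interval are assumed numbers chosen for illustration.

```typescript
// Hypothetical monitor: ban a client that opens more than `maxAttempts`
// connections within a sliding window of `windowMs` milliseconds.
class ConnectionMonitor {
  private attempts: number[] = [];
  private readonly maxAttempts: number;
  private readonly windowMs: number;
  banned = false;

  constructor(maxAttempts: number, windowMs: number) {
    this.maxAttempts = maxAttempts;
    this.windowMs = windowMs;
  }

  recordConnection(atMs: number): void {
    // Forget attempts that have slid out of the window, then count this one.
    this.attempts = this.attempts.filter((t) => atMs - t < this.windowMs);
    this.attempts.push(atMs);
    if (this.attempts.length > this.maxAttempts) this.banned = true;
  }
}

// Fragmented agent: 10 commands, one new ssh connection every 2 seconds.
const fragmented = new ConnectionMonitor(5, 60_000);
for (let i = 0; i < 10; i++) fragmented.recordConnection(i * 2_000);

// Persistent terminal: one ssh connection, then commands flow through it.
const persistent = new ConnectionMonitor(5, 60_000);
persistent.recordConnection(0);

console.log(fragmented.banned, persistent.banned); // true false
```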

The mechanisms meant to stop attackers end up locking out the person who created them. This actually happened on my home server.

I have written about something similar before. In "A Record of Entrusting My Server to AI," I described how a monitoring script hit its own concurrent connection limit, causing my own SSH attempts to fail. Back then, the culprit was a script. This time, it was an AI agent falling into the same trap with every command it ran. The cause is the same: too many connections.

Folding everything into a single terminal makes this go away too. Authentication happens once, and there is only one session, which never multiplies. As a result, you no longer trip connection rate limits or get banned.

Design: Holding only "one terminal"

The philosophy of aiterm is simple: the primitive is just "holding one local terminal."

At first, I tried adding tools by type: one for SSH, one for containers... but there is no end to that. So I scrapped them all. Instead of giving SSH, docker exec, and interactive shells (REPLs) their own dedicated tools, I downgraded each of them to "a single line typed into that terminal."

id = pty_open()                 # Open one terminal; its id is used below
pty_send(id, "ssh 192.168.1.2") # SSH once inside it
pty_send(id, "uname -a")        # Subsequent commands run in the same session
pty_read(id)                    # Read pruned output

That is why there are only 6 tools (open, send, read, send key, close, list). I don't introduce distinctions like SSH vs. local vs. container into the tool hierarchy.
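To show what leaving out SSH vs. local vs. container distinctions means at the type level, here is a hypothetical TypeScript shape for a six-tool surface. Only pty_open, pty_send, and pty_read appear above; the method names, signatures, key syntax, and the container name web below are my assumptions, not aiterm-mcp's actual tool schema. The in-memory stand-in just records what was typed.

```typescript
// Hypothetical six-tool surface; names and signatures are illustrative only.
interface TerminalTools {
  open(): string;                         // returns a session id
  send(id: string, line: string): void;   // type one line + Enter
  sendKey(id: string, key: string): void; // e.g. "C-c", "C-d"
  read(id: string): string;               // screen output
  close(id: string): void;
  list(): string[];
}

// In-memory stand-in: read() returns the typed transcript, not real output.
class RecordingTerminals implements TerminalTools {
  private sessions = new Map<string, string[]>();
  private next = 0;

  open(): string {
    const id = `t${this.next++}`;
    this.sessions.set(id, []);
    return id;
  }
  send(id: string, line: string): void {
    this.typed(id).push(line);
  }
  sendKey(id: string, key: string): void {
    this.typed(id).push(`<${key}>`);
  }
  read(id: string): string {
    return this.typed(id).join("\n");
  }
  close(id: string): void {
    this.sessions.delete(id);
  }
  list(): string[] {
    return [...this.sessions.keys()];
  }
  private typed(id: string): string[] {
    const s = this.sessions.get(id);
    if (!s) throw new Error(`unknown session ${id}`);
    return s;
  }
}

const tools = new RecordingTerminals();
const id = tools.open();
tools.send(id, "ssh 192.168.1.2");
tools.send(id, "docker exec -it web sh"); // hypothetical container name
tools.send(id, "python3");                // a REPL is just another line
tools.sendKey(id, "C-d");
```

Nothing in the interface knows about SSH, docker exec, or Python; they are all plain strings passed to send.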

Behind the scenes is tmux

Under the hood, each terminal is a tmux session. Thanks to this:

  • The terminal lives on even if the MCP server or AI client restarts.
  • A human can watch the same screen live with tmux attach, seeing in real time what the AI is doing on the server.
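For readers curious how a terminal can be held through tmux at all, the sketch below drives tmux directly from Node with execFileSync. It is an illustration under assumptions, not aiterm's implementation: the session name ai-demo is made up, and it reads the pane right after pressing Enter without waiting for the command to finish. The tmux subcommands themselves (new-session, send-keys, capture-pane, kill-session) are standard.

```typescript
import { execFileSync } from "node:child_process";

// Argument lists for the tmux subcommands a persistent terminal needs.
const tmux = {
  open: (name: string) => ["new-session", "-d", "-s", name],                   // detached session
  send: (name: string, line: string) => ["send-keys", "-t", name, "-l", line], // literal text
  enter: (name: string) => ["send-keys", "-t", name, "Enter"],                 // press Enter
  read: (name: string) => ["capture-pane", "-p", "-t", name],                  // print visible pane
  close: (name: string) => ["kill-session", "-t", name],
};

function run(args: string[]): string {
  return execFileSync("tmux", args, { encoding: "utf8" });
}

// Needs tmux on PATH. The session is left running so that
// `tmux attach -t ai-demo` shows the same screen; remove it with tmux.close.
function demo(): string {
  const name = "ai-demo"; // hypothetical session name
  run(tmux.open(name));
  run(tmux.send(name, "uname -a"));
  run(tmux.enter(name));
  return run(tmux.read(name)); // may capture before uname has printed
}

if (process.argv.includes("--demo")) console.log(demo());
```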

Honest note: You can't eliminate the round-trips themselves

To put it plainly, without exaggeration: connecting the terminal directly to the AI does not get you to zero round-trips. Since the AI decides its next input after reading the output, the "send → read → decide" loop fundamentally remains. What aiterm eliminates is the re-authentication, reconnection, and re-setup cost that used to be attached to every one of those round-trips, along with the output noise. It trims the weight of each round-trip, not the number of round-trips.

Installation

No cloning or building required.

claude mcp add --scope user --transport stdio aiterm -- npx -y aiterm-mcp

Restart Claude Code, confirm the connection with /mcp, and you are done. Any MCP client, Claude or Codex alike, can launch it with npx -y aiterm-mcp.

Requirements

  • Node.js 18+
  • tmux (apt install tmux / brew install tmux)
  • Linux / WSL2 / macOS / native Windows (Windows has no tmux, so aiterm bridges to tmux inside WSL)

Status

v0.4.0, MIT, published to npm with provenance.

aiterm-mcp on GitHub

Bug reports and PRs are welcome.
