DEV Community: YHH

The boring secret to a cheap AI coding agent — a byte-stable prompt prefix

YHH — Wed, 06 May 2026 01:18:07 +0000

I tried using Claude Code for a few weeks and quietly stopped. Not because it was bad — it's great — but because I'd hesitate before kicking off any non-trivial task, doing the math on what a 30-minute debugging session would cost. That's the wrong incentive for an agent. The whole pitch of "let it run" depends on the bill not scaring you.

So I went the other direction. I built Reasonix — same shape as Claude Code or Aider, runs in your terminal, plan mode, tool calls, MCP — but it only talks to DeepSeek, and the entire loop is engineered around one invariant:

The prompt prefix must be byte-identical to the previous turn's prefix.

That's it. That's the whole architectural constraint. Everything else falls out of it.

This post is about why that constraint matters, what silently breaks it, and what the loop ends up looking like when you take it seriously.

The mechanic

DeepSeek's API has prefix caching. If the first N tokens of your request match a recent request byte-for-byte, those tokens are billed at roughly 1/10th the normal input price. Cache TTL is generous — minutes — long enough that within a single conversation turn you basically always hit it if you didn't break the prefix.

Most providers have something like this now. What's different is the price ratio and the granularity. On DeepSeek, if your agent is built right, every turn after the first is mostly cached, and a long session costs cents instead of dollars.

The catch: "built right" turns out to be load-bearing.

What silently breaks the cache

These are the things I found that quietly destroy your cache hit rate. None of them throw errors. Your agent works fine. You just pay full price every turn and don't know why.

1. Non-deterministic JSON.stringify of tool schemas

// Looks fine. Is poison.
const toolSchema = JSON.stringify({
  name: "edit_file",
  parameters: someObject,
});

JSON.stringify does not guarantee key order across runs in all engines, and even when it does, your someObject may have been built from an object spread that depends on insertion order. One reordered key — {path, content} vs {content, path} — and the cached prefix is gone.

Fix: serialize tool schemas with a deterministic stringifier (sorted keys), and freeze the output. Once at startup, never re-serialize per turn.

2. Timestamps or run IDs in the system prompt

const systemPrompt = `You are a coding agent. Session: ${sessionId}. Started: ${now}.`;

Looks harmless. Destroys cache on every single turn because the prefix differs every run.

Fix: nothing variable goes into the system prompt. Session metadata, if you really need the model to see it, goes into the first user message — which is fine because that's already turn-1-only content.

3. Re-rendering tool results with one whitespace difference

This one is sneaky. You call a tool, get a result, format it into a message, send it back. Next turn, you re-render the same tool result from your event log — but a pretty-printer adds a trailing newline this time, or strips one. Cache gone.

Fix: format tool results once, store the exact rendered string, append-only. Never re-derive past content from upstream sources.

4. In-place edits to message history

Summarization, truncation, "let me clean up this old turn so we fit in the context window" — every one of these mutates the prefix. Even if the new shortened version is what you want the model to see, the cache was built against the old version.

Fix: history is append-only. If you need to compress old context, do it as a new turn ("here's a summary, ignore turns 1–8"), not as an in-place edit.

5. Switching tool definitions mid-session

Adding a tool, removing one, even reordering the tools array — all of these change the system message that includes the tool schemas, which is part of the prefix.

Fix: pin the tool set at session start. If you need dynamic tools, accept the cache miss as a deliberate event and surface it in your cost dashboard.

What doesn't break it

For completeness, things I worried about that turned out to be fine:

Streaming vs non-streaming — same prefix, same cache.
Function-calling format vs JSON-tool-call format — pick one and stick, but either works.
Adding new turns — obviously, that's the entire point. Append is free.
Tool result content changing turn-to-turn — only the formatting of past results matters. Future tool calls returning different data is normal and doesn't touch the prefix.

What the loop looks like when you take this seriously

Reasonix's loop is built backward from the byte-stability requirement:

Turn N+1 prefix = Turn N prefix + (assistant turn N) + (user turn N+1)
                                  ↑                    ↑
                          rendered once,        rendered once,
                          stored as string      stored as string

That's the whole shape. There is no code path anywhere in the loop that re-derives past content from upstream sources at request time. Past content is strings, in an array, appended to. Period.

A few specific design consequences:

System prompt is a constant. Compiled at startup, frozen. No template variables.
Tool schemas are serialized once with a sorted-key stringifier and concatenated into the system message.
Tool results are formatted at the moment of receipt and stored as the exact bytes that will be sent. The loop replays bytes, not objects.
There is no summarization step in the main loop. When context gets large, the user can /compact explicitly — which is a new turn containing a summary, not an in-place rewrite.
Permissions / plan mode / hooks all operate on what the user sees, never on what gets sent to the model.

This is more restrictive than a generic agent framework wants to be. That's the point. The constraint is the feature.

What this gets you

On a long debugging session — say, an hour of back-and-forth on a real codebase, 50–80 turns, lots of tool calls — Reasonix bills come in around 5–15 cents depending on how much code the model reads. The same session through a non-cache-aware framework on DeepSeek would be roughly $1–$3. Through Claude (Sonnet) it'd be $5–$15.

The cheapness isn't the goal — the goal is changing the posture. When a session costs cents, you stop curating prompts and start delegating real chunks of work. You leave it running while you go to lunch. That's the whole user-experience shift.

Things I'm honest about

It's DeepSeek-only. That's a feature, not a bug — every layer is tuned to one provider's cache mechanic. But if DeepSeek goes down, you're down. I think the cost ratio is worth it; you may not.
DeepSeek is a Chinese provider. Some companies can't use it. That's a real constraint, not something I can engineer around.
R1's quality on agentic tool-use is a notch below Sonnet. Closer than you'd expect, but it's there. The cost ratio still wins for me.
The prefix-stability discipline is contagious — once you've enforced it in the loop, you start noticing every place else in your stack that quietly mutates state for no reason.

Try it

npx reasonix code

Repo: github.com/esengine/reasonix

MIT, TypeScript, Node 22+. Works on macOS, Linux, Windows (PowerShell, Git Bash, Windows Terminal).

Architecture writeup with the four-pillar breakdown is in docs/ARCHITECTURE.md.

If you've measured cache hit rates in your own agent setup — generic framework or otherwise — I'd genuinely like to see numbers. The thing I can't tell from the outside is whether everyone is silently eating full-price tokens, or whether some of the popular frameworks have quietly fixed this and I missed it.

Designing a Hooks System for AI Agent CLIs: 4 Lifecycle Points That Cover Everything

YHH — Thu, 23 Apr 2026 10:38:50 +0000

Designing Hooks for an AI Agent CLI: 4 Lifecycle Points, Shell-Only, and What I Cut

A few weeks ago I shipped Reasonix — a DeepSeek-native TypeScript agent framework. The first article was about the cache-first prompt structure that pushed cache hit rates to 85-95% on real sessions. That got picked up on dev.to.

Two weeks later, I needed hooks. Specifically I wanted my coding agent to:

Refuse rm -rf even if the model decided that was a great idea
Auto-format files after every edit
Auto-commit at the end of a session if the diff was clean

I could have hardcoded each of these. But every time I did, three days later I'd want a fourth thing — and the agent would grow a "configuration ballast" problem.

So I designed a hooks system. This post is the design doc, including what I cut and the bugs that shaped it.

Why hooks at all (and not the obvious alternatives)

Three things that almost made it in but didn't:

1. A TypeScript plugin system.
You'd npm install reasonix-hook-prettier, the framework would import it, hooks would be JS callbacks. I rejected this because hooks are a sysadmin concern, not a JS concern. The user who wants "run prettier after every edit" already has prettier on $PATH and knows how to invoke it. Forcing them through npm and a TS API is gatekeeping.

2. A middleware chain like Express.
loop.use((event, next) => ...). I rejected this because hooks aren't transformers. They observe, and sometimes veto. Middleware semantics imply "you can rewrite the payload and pass it on" — a much bigger contract than what most users actually want.

3. Webhooks (HTTP callouts).
Too much infrastructure. "After every edit, run prettier" should not require standing up an HTTP server.

What won: a hook is a shell command. Reasonix invokes it with a JSON payload on stdin and reads the exit code.

// .reasonix/settings.json
{
  "hooks": {
    "PostToolUse": [
      { "match": "edit_file", "command": "prettier --write \"$(jq -r .toolArgs.path)\"" }
    ]
  }
}

Language-agnostic. Composable with everything in the user's terminal. Mental model is identical to "what would I run in my shell?"

The 4 events — split into 2 categories

Event	Fires when	Category
`PreToolUse`	The model decided to call a tool, before dispatch	Gating
`PostToolUse`	A tool returned, before the result reaches the model	Observing
`UserPromptSubmit`	The user typed a prompt, before it goes to the model	Gating
`Stop`	The loop finished a turn (assistant returned final text)	Observing

Two categories, not four. The split drives every other decision in the design:

const BLOCKING_EVENTS: ReadonlySet<HookEvent> = new Set(["PreToolUse", "UserPromptSubmit"]);

const DEFAULT_TIMEOUTS_MS: Record<HookEvent, number> = {
  PreToolUse: 5_000,
  UserPromptSubmit: 5_000,
  PostToolUse: 30_000,
  Stop: 30_000,
};

Gating events can refuse to let the loop proceed (exit code 2 = block). They have a tight 5-second timeout because they hold up forward progress.

Observing events fire after the action is already done, so blocking is meaningless — exit 2 from a PostToolUse hook just becomes a warning. They get 30 seconds because nobody is waiting on them.

This asymmetry is hardcoded in the decision matrix:

export function decideOutcome(event: HookEvent, raw: HookSpawnResult) {
  if (raw.spawnError) return "error";
  if (raw.timedOut) return BLOCKING_EVENTS.has(event) ? "block" : "warn";
  if (raw.exitCode === 0) return "pass";
  if (raw.exitCode === 2 && BLOCKING_EVENTS.has(event)) return "block";
  return "warn";
}

Events I cut

SessionStart / SessionEnd — solved by the user's shell. If you want to run something before reasonix chat, you write pre-reasonix && reasonix chat. I'm not the right system to schedule that.
PreCompact — Reasonix doesn't do automatic context compaction (the cache-first design works against compaction). No event to hook.
Per-token streaming — would have been a hook firing for every token of model output. Use cases (PII redaction, profanity filter) exist, but the per-call cost would be brutal and post-processing the final text is more sensible.
OnError — error handling lives at the loop level; a hook can't meaningfully recover from "the model returned malformed JSON." A subprocess can't fix what only the loop sees.

The protocol — JSON in, exit code out

Every hook gets a single-line JSON envelope on stdin:

{
  "event": "PreToolUse",
  "cwd": "/Users/me/my-project",
  "toolName": "edit_file",
  "toolArgs": { "path": "src/foo.ts", "content": "..." }
}

Fields differ by event. PostToolUse adds toolResult. UserPromptSubmit has prompt. Stop has lastAssistantText and turn. The shape is documented in hooks.ts.

Exit code is the protocol:

Exit code	Gating event	Observing event
`0`	pass (continue)	pass
`2`	block (stop the chain)	warn (action already done)
anything else	warn (continue, log)	warn

Why exit code instead of structured stdout JSON? Because I wanted hooks to be writable in one line of bash:

test -f .git/MERGE_HEAD && exit 2  # block any tool call during a merge

Versus:

test -f .git/MERGE_HEAD && echo '{"decision":"block","reason":"merge in progress"}' || echo '{"decision":"pass"}'

The second is cleaner from a typed-API perspective and worse from a "I'll write this in 30 seconds" perspective. I optimized for 30 seconds.

The cost: hooks can't return structured data back to the model. A PreToolUse hook can only veto or pass — it can't say "let me rewrite these args first." That's a real limitation. The upgrade path, if demand grows, is opt-in: a parseStdoutAsJson: true flag in settings. For now, not worth the complexity.

Two surprising design calls

Call 1 — `match` is anchored regex, not substring

export function matchesTool(hook: ResolvedHook, toolName: string): boolean {
  if (hook.event !== "PreToolUse" && hook.event !== "PostToolUse") return true;
  const m = hook.match;
  if (!m || m === "*") return true;
  try {
    const re = new RegExp(`^(?:${m})$`);
    return re.test(toolName);
  } catch {
    return false; // fail closed — see "Bugs that shaped the design"
  }
}

So "match": "file" does not trigger on read_file. You have to write ".*file" or "read_file|write_file|edit_file".

I tried substring first. Within a day I had a hook configured as match: "edit" firing on tool_called_edit_text, git_edit_commit, and audit_log_edit — all unintentional. Substring matching is intuitive in the small, dangerous in the large.

The cost: more typing. The benefit: when you write match: "shell", that's exactly what fires. No surprises.

I expect this to be the most-debated decision in this post. Feel free to argue with me in the comments.

Call 2 — events fire from two different layers

src/loop.ts        → fires PreToolUse, PostToolUse
src/cli/ui/App.tsx → fires UserPromptSubmit, Stop

The loop is a library. It runs headless, in tests, in scripts. It doesn't know what a "user prompt submission" is — it just knows about step(text). The text could come from a TUI, a JSON RPC call, or a for loop in a script.

So UserPromptSubmit lives at the App boundary — the TUI is what decides "the user just hit enter on this." Same for Stop — the loop emits assistant_final events, but "the turn is done from the user's perspective" is a UI concept.

Practically: if you embed CacheFirstLoop in your own app, you get tool-related hooks for free. You wire prompt and stop hooks yourself if you want them.

// Embedded usage — tool hooks "just work"
const loop = new CacheFirstLoop({ client, prefix, hooks });
for await (const ev of loop.step("...")) { /* ... */ }

// You decide what counts as a "submitted prompt"
const promptReport = await runHooks({
  hooks,
  payload: { event: "UserPromptSubmit", cwd, prompt: text }
});

Slightly more boilerplate. Way cleaner separation.

Real configurations that do useful things

All copy-pasteable. Drop them in .reasonix/settings.json (project) or ~/.reasonix/settings.json (global).

Block dangerous shell commands

{
  "hooks": {
    "PreToolUse": [
      {
        "match": "shell",
        "command": "jq -r .toolArgs.command | grep -qE '^(rm -rf|sudo|curl.*\\| bash)' && { echo 'denied: dangerous command' >&2; exit 2; } || exit 0",
        "description": "block rm -rf / sudo / curl|bash"
      }
    ]
  }
}

The >&2 matters — Reasonix surfaces stderr as the block reason in both the UI and the tool result the model sees. The model gets a structured refusal, not silence.

Auto-format after edits

{
  "hooks": {
    "PostToolUse": [
      {
        "match": "edit_file|write_file",
        "command": "path=$(jq -r .toolArgs.path) && prettier --write \"$path\" 2>/dev/null || true"
      }
    ]
  }
}

The || true is intentional — if prettier doesn't recognize the file type, the hook still passes. We don't want a yellow warning row for every .txt edit.

Daily cost ceiling

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "command": "test \"$(reasonix stats --today --json | jq .totalCostUsd)\" \\< 5.00 || { echo 'daily budget hit ($5)' >&2; exit 2; }",
        "description": "stop accepting prompts after $5/day"
      }
    ]
  }
}

Composes the /stats feature with the hooks system. reasonix stats --today --json returns running cost; the hook compares and blocks. Zero code changes to Reasonix needed.

Auto-commit on Stop, only if clean

{
  "hooks": {
    "Stop": [
      {
        "command": "git diff --quiet || (git add -A && git commit -m \"reasonix: $(jq -r .lastAssistantText | head -c 60)\" --no-verify)"
      }
    ]
  }
}

If you trust the agent end-to-end, this turns every chat into a commit. I personally don't enable it — but several Reasonix users do, and it's been stable enough to mention.

The bugs that shaped the design

Three real bugs that turned into permanent design decisions.

1. SIGTERM doesn't always land on Windows shell children

First version killed timed-out hooks with child.kill("SIGTERM"). On Windows, when the hook ran through cmd.exe /c, SIGTERM was caught by the shell but never propagated to the actual hook process. The shell exited; the hook kept running.

const timer = setTimeout(() => {
  timedOut = true;
  child.kill("SIGTERM");
  setTimeout(() => {
    try { child.kill("SIGKILL"); } catch { /* gone */ }
  }, 500);
}, input.timeoutMs);

500ms grace, then a hard kill. Slightly inelegant. Actually works.

2. Malformed regex in `match` used to fire on every tool

new RegExp("[unclosed") throws. The first version caught the throw and returned true — assuming the user's intent was permissive. That's wrong: a typo in a regex shouldn't suddenly cause a PreToolUse hook to fire on every tool call.

} catch {
  // malformed regex → don't fire (safer than firing on every tool)
  return false;
}

A typo in match now makes the hook silently inactive (visible in /hooks list) instead of suddenly gating every tool call.

3. A typo in settings.json crashed the entire CLI

JSON.parse throws on malformed JSON. The first version let it propagate. One missing comma in ~/.reasonix/settings.json and reasonix chat exited with a stack trace before showing the TUI.

function readSettingsFile(path: string): HookSettings | null {
  if (!existsSync(path)) return null;
  try {
    const raw = readFileSync(path, "utf8");
    const parsed = JSON.parse(raw);
    if (parsed && typeof parsed === "object") return parsed as HookSettings;
  } catch {
    /* malformed JSON → treat as no hooks; don't lose the whole CLI to a typo */
  }
  return null;
}

The principle: a tool that's broken should still be openable, even if degraded. Configuration is the most fragile part of any system; it should never take down the part that lets you fix the configuration.

What's still missing

No structured rewrite from hooks. A PreToolUse hook can block or pass; it can't rewrite arguments. Want to redact secrets from toolArgs.path? You can't — only refuse the call.
No hook timing in /stats. Hook duration is captured per-outcome (durationMs) but isn't aggregated. A 30-second formatter hook on every edit is a real productivity tax — should be visible.
No match for non-tool events. UserPromptSubmit always fires for every prompt. There's an argument for match working as a regex over the prompt text. Haven't been convinced yet.
Composition with skills. Reasonix 0.4.26 added subagents-via-skills. Subagent invocations don't currently fire PreToolUse (a subagent isn't a tool in the model's eyes). Should they? Probably yes, separate event: PreSubagent. On the roadmap.

Open questions I'd love feedback on

Anchored vs substring match. Strong opinion, weakly held. Substring crowd has a point about ergonomics. Anchored crowd has a point about predictability. Vote in the comments.
Should hooks be allowed to mutate the payload? I deliberately said no. But "redact this argument before the tool runs" is real. Worth a parseStdoutAsJson: true flag?
Per-event vs per-tool default timeouts. I picked 5s for gating, 30s for observing — uniform across tools. But a PostToolUse hook on web_search (already 10s of latency) is different from one on edit_file (instant). Should defaults adapt?

Quick start

npm install -g reasonix
mkdir -p .reasonix
cat > .reasonix/settings.json <<'EOF'
{
  "hooks": {
    "Stop": [
      { "command": "echo 'turn done.' && date" }
    ]
  }
}
EOF
reasonix chat
# /hooks         list active
# /hooks reload  re-read after editing settings.json

Full source: github.com/esengine/reasonix, specifically src/hooks.ts and tests/hooks.test.ts.

If you want the bigger picture on Reasonix (cache-first loop, R1 thought harvesting, branching), the first article covers that.

Issues, design arguments, and counter-examples especially welcome.

How a DeepSeek-only agent framework hit 85% prefix cache rate (and saved 93% vs Claude)

YHH — Tue, 21 Apr 2026 11:54:44 +0000

I've been running DeepSeek behind LangChain for a few months for a side project. Worked fine, except one day I noticed
something weird: DeepSeek's pricing page advertises cached input tokens at ~10% of the miss rate, but my bills didn't
reflect that at all.

I dug in. The cache is byte-prefix based. The moment your request's prefix differs from the previous one by even a single
character, you pay full price. And LangChain — along with every generic agent framework I checked — rebuilds the prompt
every turn. Timestamps get injected. History gets reordered. Tool schemas re-serialize with different whitespace. The prefix
drifts, the cache never hits.

So I wrote something opinionated: Reasonix — a TypeScript agent framework built only for DeepSeek. No multi-provider
abstraction, no orchestration graph, no RAG. Just three things done deeply.

📦 npm install -g reasonix && reasonix chat
🔗 GitHub: esengine/reasonix
📜 MIT License

## The numbers up front

Measured against the live DeepSeek API, not marketing math:

| Scenario | Model | Turns | Cache hit | Cost | Same on Claude Sonnet 4.6 | Savings |
|---|---|---|---|---|---|---|
| Multi-turn chat | deepseek-chat | 5 | 85.2% | $0.000923 | $0.015174 | 93.9% |
| Tool-use (calculator) | deepseek-chat | 2 | 94.9% | $0.000142 | $0.003351 | 95.8% |
| R1 reasoning + harvest | deepseek-reasoner | 1 | 72.7% | $0.006478 | $0.044484 | 85.4% |

Numbers come straight from usage.prompt_cache_hit_tokens on real API responses. You can install Reasonix and verify in 2
minutes.

## Pillar 1 — Cache-First Loop

The problem again: DeepSeek's cache only fires on identical byte prefix. Generic frameworks rebuild prompts, so the prefix
drifts, so the cache rarely hits.

The fix is structural. Every request's context gets partitioned into three regions with strict invariants:

  ┌─────────────────────────────────────┐
  │ IMMUTABLE PREFIX                    │ ← frozen at session start
  │   system + tool_specs + few_shots   │   this is the cache target
  ├─────────────────────────────────────┤
  │ APPEND-ONLY LOG                     │ ← grows monotonically
  │   [user₁][assistant₁][tool₁]...     │   prior turns preserve as prefix
  ├─────────────────────────────────────┤
  │ VOLATILE SCRATCH                    │ ← reset each turn
  │   R1 thoughts, transient state      │   never sent upstream
  └─────────────────────────────────────┘

In code, the prefix is hashed at construction and pinned. The log's append() method refuses any mutation. The scratch gets
wiped at every turn boundary.

That's it. That single discipline is enough to push cache hit rates to 85-95% on real sessions. Nothing else in the
framework would matter if this was wrong.

## Pillar 2 — R1 Thought Harvesting

DeepSeek's reasoning model deepseek-reasoner (aka R1) emits extensive reasoning_content — often 1000+ tokens of
step-by-step thinking. DeepSeek's own docs recommend not feeding it back to the next turn (it hurts quality). So most
frameworks just display it or drop it.

That's leaving a plan on the table. R1's reasoning trace is literally the model thinking out loud about subgoals,
hypotheses, and uncertainties. I pipe it through a cheap secondary V3 call in JSON mode and extract structured state:

  interface TypedPlanState {
    subgoals: string[];      // concrete intermediate objectives
    hypotheses: string[];    // candidate approaches being weighed
    uncertainties: string[]; // things R1 flags as unclear
    rejectedPaths: string[]; // approaches considered and abandoned
  }

Here's R1 on a classic logic puzzle — "3 boxes with swapped labels; pick one fruit to determine all three contents":

  ‹ subgoals (3): enumerate label-content permutations · decide which box to sample · verify uniqueness
  ‹ hypotheses (3): sample from "apple" box · sample from "orange" box · sample from "mixed" box
  ‹ uncertainties (2): can a single pick uniquely determine all? · does "mixed" contain equal ratios?
  ‹ rejected (2): sampling from "apple" box (ambiguous) · sampling from "orange" box (symmetric)

Every field maps to actual content in R1's reasoning trace. V3 is cheap enough (~$0.0001/turn) that this is essentially
free. Opt-in via reasonix chat --harvest or /harvest on inside the TUI.

## Pillar 3 — Tool-Call Repair

DeepSeek has several known tool-use quirks that generic frameworks don't handle:

Deep or wide schemas drop arguments. Tool schemas with more than ~10 leaf parameters or more than 2 levels of nesting cause V3/R1 to silently omit fields.
R1 leaks tool calls into <think>. The model writes tool-call JSON inside its reasoning trace and forgets to surface it in the actual tool_calls field.
JSON gets truncated. Long arguments payloads hit max_tokens mid-structure.
Call storms. The model hammers the same tool with identical arguments in an infinite loop.

Reasonix's repair layer has four passes running on every turn:

  // 1. Auto-flatten deep/wide schemas
  ToolRegistry.register({
    name: "updateProfile",
    parameters: {
      type: "object",
      properties: {
        user: { type: "object", properties: {
          profile: { type: "object", properties: {
            name: { type: "string" },
            age: { type: "integer" },
          }},
        }},
      },
    },
    fn: ({ user }) => updateInDB(user),
  });
  // Internally shown to the model as a flat schema:
  //   {"user.profile.name": "...", "user.profile.age": ...}
  // On dispatch, args re-nested back to { user: { profile: { ... } } }

  // 2. Scavenge: regex + JSON parser sweeps reasoning_content for missed calls
  // 3. Truncation recovery: close braces, trim trailing commas, fill dangling keys
  // 4. Storm breaker: sliding-window dedup of (tool, args) tuples

All four are always on. No user configuration.

## Bonus: Self-Consistency Branching

Here's the fun one. DeepSeek is roughly 20× cheaper than Claude Sonnet 4.6. That means three parallel R1 samples per turn
is still cheaper than a single Claude call. What was a research luxury (self-consistency sampling) becomes a practical
default.

  reasonix chat --branch 3
  # or inside the TUI:
  > /preset max

Three samples fire in parallel at temperatures 0.0 / 0.5 / 1.0. Each one's reasoning is harvested. The default selector
picks whichever sample has the fewest flagged uncertainties (tie-break on shorter answer length — Occam's razor as a
heuristic).

TUI shows this live:

  🔀 branched 3 samples → picked #1   #0 T=0.0 u=2   ▸#1 T=0.5 u=0   #2 T=1.0 u=3

Anecdotally it lifts accuracy 10-15 percentage points on medium-difficulty reasoning, at roughly 1/5 the cost of a single
Claude pass. I haven't run a formal benchmark yet — that's next.

## What it's explicitly not

Not a LangChain replacement. No multi-provider, no graph orchestration, no RAG.
Not a drop-in for OpenAI-compatible code. The whole point is DeepSeek-specific.
Not production-ready. v0.0.6 pre-alpha, 135 passing tests, no formal benchmarks yet.

## Quick start

  npm install -g reasonix
  reasonix chat

First launch prompts for your DeepSeek API key and saves it to ~/.reasonix/config.json. Sessions auto-persist, so chat 2
hours of work, quit, come back tomorrow, type reasonix chat — you're back where you left off.

Inside the TUI, slash commands cover everything:

  /preset fast|smart|max    one-tap config (fast = default)
  /model <id>               deepseek-chat or deepseek-reasoner
  /harvest [on|off]         Pillar 2 toggle
  /branch <N|off>           N parallel samples (>=2)
  /sessions                 list saved sessions
  /forget                   delete current session
  /help                     full list

No flag-soup to memorize. A command strip under the prompt shows the top-level commands at all times.

## Library usage

  import {
    CacheFirstLoop,
    DeepSeekClient,
    ImmutablePrefix,
    ToolRegistry,
  } from "reasonix";

  const client = new DeepSeekClient(); // reads DEEPSEEK_API_KEY
  const tools = new ToolRegistry();

  tools.register({
    name: "add",
    parameters: {
      type: "object",
      properties: { a: { type: "integer" }, b: { type: "integer" } },
      required: ["a", "b"],
    },
    fn: ({ a, b }: { a: number; b: number }) => a + b,
  });

  const loop = new CacheFirstLoop({
    client,
    tools,
    prefix: new ImmutablePrefix({
      system: "You are a math helper.",
      toolSpecs: tools.specs(),
    }),
    harvest: true,
    branch: 3,
    session: "math-tutor",
  });

  for await (const ev of loop.step("What is 17 + 25?")) {
    if (ev.role === "assistant_final") console.log(ev.content);
  }

  console.log(loop.stats.summary());
  // { turns: 2, totalCostUsd: 0.0003, savingsVsClaudePct: 94, cacheHitRatio: 0.87 }

## Open questions I'd love feedback on

Branching selector heuristic. The default is min(uncertainties.length) with length tie-break. That's obviously
naive. What signals would you combine? Cross-sample answer similarity? Tool-call success rate per sample? An LLM-judge pass?
Harvest cost/value trade-off. The $0.0001/turn V3 call feels negligible but it's a floor on per-turn cost. Has anyone
tried fine-tuning R1 to output structured plan state directly?
Cache continuity across config changes. Right now changing the system prompt mid-session invalidates the prefix
cache. Is there a migration path that preserves the existing log's value?

Full source: github.com/esengine/reasonix

Install: npm install -g reasonix

Issues, PRs, and benchmarks especially welcome.