I've been running DeepSeek behind LangChain for a few months for a side project. Worked fine, except one day I noticed
something weird: DeepSeek's pricing page advertises cached input tokens at ~10% of the miss rate, but my bills didn't
reflect that at all.
I dug in. The cache is byte-prefix based. The moment your request's prefix differs from the previous one by even a single
character, you pay full price. And LangChain, along with every generic agent framework I checked, rebuilds the prompt every turn. Timestamps get injected. History gets reordered. Tool schemas re-serialize with different whitespace. The prefix drifts, and the cache never hits.
So I wrote something opinionated: Reasonix, a TypeScript agent framework built only for DeepSeek. No multi-provider abstraction, no orchestration graph, no RAG. Just three things done deeply.
```
npm install -g reasonix && reasonix chat
```
GitHub: esengine/reasonix
MIT License
## The numbers up front
Measured against the live DeepSeek API, not marketing math:
| Scenario | Model | Turns | Cache hit | Cost | Same on Claude Sonnet 4.6 | Savings |
|---|---|---|---|---|---|---|
| Multi-turn chat | deepseek-chat | 5 | 85.2% | $0.000923 | $0.015174 | 93.9% |
| Tool-use (calculator) | deepseek-chat | 2 | 94.9% | $0.000142 | $0.003351 | 95.8% |
| R1 reasoning + harvest | deepseek-reasoner | 1 | 72.7% | $0.006478 | $0.044484 | 85.4% |
Numbers come straight from `usage.prompt_cache_hit_tokens` on real API responses. You can install Reasonix and verify in 2 minutes.
## Pillar 1: Cache-First Loop
The problem again: DeepSeek's cache only fires on identical byte prefix. Generic frameworks rebuild prompts, so the prefix
drifts, so the cache rarely hits.
The fix is structural. Every request's context gets partitioned into three regions with strict invariants:
```
┌─────────────────────────────────┐
│ IMMUTABLE PREFIX                │ ← frozen at session start
│ system + tool_specs + few_shots │   this is the cache target
├─────────────────────────────────┤
│ APPEND-ONLY LOG                 │ ← grows monotonically
│ [user₁][assistant₁][tool₁]...   │   prior turns preserved as prefix
├─────────────────────────────────┤
│ VOLATILE SCRATCH                │ ← reset each turn
│ R1 thoughts, transient state    │   never sent upstream
└─────────────────────────────────┘
```
In code, the prefix is hashed at construction and pinned. The log's `append()` method refuses any mutation. The scratch gets wiped at every turn boundary.
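A minimal sketch of that discipline; the class and method names here are illustrative, not the framework's actual exports:

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch of the three-region invariants, not Reasonix's source.
class FrozenPrefix {
  readonly hash: string;
  constructor(readonly text: string) {
    // Hashed once at construction; any later divergence is a bug, not a rebuild.
    this.hash = createHash("sha256").update(text).digest("hex");
  }
}

class AppendOnlyLog {
  private entries: readonly string[] = [];
  append(entry: string): void {
    this.entries = [...this.entries, entry]; // grow only; no splice, no reorder
  }
  serialize(): string {
    return this.entries.join("\n");
  }
}

// Every upstream request is frozen prefix + log; the volatile scratch
// region never appears in what gets serialized.
function buildRequest(prefix: FrozenPrefix, log: AppendOnlyLog): string {
  return `${prefix.text}\n${log.serialize()}`;
}
```

Because `append()` only ever extends the log, request N is a byte prefix of request N+1, which is exactly the shape DeepSeek's cache rewards.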
That's it. That single discipline is enough to push cache hit rates to 85-95% on real sessions. Nothing else in the framework would matter if this were wrong.
## Pillar 2: R1 Thought Harvesting
DeepSeek's reasoning model deepseek-reasoner (aka R1) emits extensive `reasoning_content`, often 1000+ tokens of step-by-step thinking. DeepSeek's own docs recommend not feeding it back to the next turn (it hurts quality), so most frameworks just display it or drop it.
That's leaving a plan on the table. R1's reasoning trace is literally the model thinking out loud about subgoals,
hypotheses, and uncertainties. I pipe it through a cheap secondary V3 call in JSON mode and extract structured state:
```typescript
interface TypedPlanState {
  subgoals: string[];       // concrete intermediate objectives
  hypotheses: string[];     // candidate approaches being weighed
  uncertainties: string[];  // things R1 flags as unclear
  rejectedPaths: string[];  // approaches considered and abandoned
}
```
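Under the hood, a harvest pass can be as small as one JSON-mode request to deepseek-chat plus a defensive parse. The sketch below uses DeepSeek's documented `response_format: { type: "json_object" }`; the prompt wording and function names are my assumptions, not Reasonix's internals:

```typescript
// Assumed sketch of the harvest step: one JSON-mode call to the cheap V3 model.
function buildHarvestRequest(reasoningContent: string) {
  return {
    model: "deepseek-chat",
    response_format: { type: "json_object" as const },
    messages: [
      {
        role: "system",
        content:
          "Extract JSON with string arrays: subgoals, hypotheses, uncertainties, rejectedPaths.",
      },
      { role: "user", content: reasoningContent },
    ],
  };
}

// Defensive parse into the TypedPlanState shape: missing or malformed
// fields degrade to empty arrays instead of throwing mid-session.
function parsePlanState(raw: string) {
  const asList = (v: unknown): string[] => (Array.isArray(v) ? v.map(String) : []);
  let o: Record<string, unknown> = {};
  try { o = JSON.parse(raw); } catch { /* fall through to empty state */ }
  return {
    subgoals: asList(o.subgoals),
    hypotheses: asList(o.hypotheses),
    uncertainties: asList(o.uncertainties),
    rejectedPaths: asList(o.rejectedPaths),
  };
}
```

The defensive parse matters because JSON mode constrains the output to valid JSON, not to your exact schema.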
Here's R1 on a classic logic puzzle ("3 boxes with swapped labels; pick one fruit to determine all three contents"):
```
subgoals (3):      enumerate label-content permutations · decide which box to sample · verify uniqueness
hypotheses (3):    sample from "apple" box · sample from "orange" box · sample from "mixed" box
uncertainties (2): can a single pick uniquely determine all? · does "mixed" contain equal ratios?
rejected (2):      sampling from "apple" box (ambiguous) · sampling from "orange" box (symmetric)
```
Every field maps to actual content in R1's reasoning trace. V3 is cheap enough (~$0.0001/turn) that this is essentially free. Opt-in via `reasonix chat --harvest` or `/harvest on` inside the TUI.
## Pillar 3: Tool-Call Repair
DeepSeek has several known tool-use quirks that generic frameworks don't handle:
- Deep or wide schemas drop arguments. Tool schemas with more than ~10 leaf parameters or more than 2 levels of nesting cause V3/R1 to silently omit fields.
- R1 leaks tool calls into `<think>`. The model writes tool-call JSON inside its reasoning trace and forgets to surface it in the actual `tool_calls` field.
- JSON gets truncated. Long `arguments` payloads hit `max_tokens` mid-structure.
- Call storms. The model hammers the same tool with identical arguments in an infinite loop.
Reasonix's repair layer has four passes running on every turn:
```typescript
// 1. Auto-flatten deep/wide schemas
ToolRegistry.register({
  name: "updateProfile",
  parameters: {
    type: "object",
    properties: {
      user: { type: "object", properties: {
        profile: { type: "object", properties: {
          name: { type: "string" },
          age: { type: "integer" },
        }},
      }},
    },
  },
  fn: ({ user }) => updateInDB(user),
});
// Internally shown to the model as a flat schema:
// {"user.profile.name": "...", "user.profile.age": ...}
// On dispatch, args re-nested back to { user: { profile: { ... } } }

// 2. Scavenge: regex + JSON parser sweeps reasoning_content for missed calls
// 3. Truncation recovery: close braces, trim trailing commas, fill dangling keys
// 4. Storm breaker: sliding-window dedup of (tool, args) tuples
```
All four are always on. No user configuration.
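As one concrete illustration, the storm-breaker pass can be as simple as a sliding-window dedup of `(tool, args)` keys. This is a sketch of the idea, not the shipped implementation:

```typescript
// Sketch of a storm breaker: block a (tool, args) pair already seen within
// the last `size` calls. Keys use JSON.stringify, so key order in args matters.
class StormBreaker {
  private window: string[] = [];
  constructor(private readonly size = 8) {}

  allow(tool: string, args: unknown): boolean {
    const key = `${tool}:${JSON.stringify(args)}`;
    if (this.window.includes(key)) return false; // identical call inside window: drop it
    this.window.push(key);
    if (this.window.length > this.size) this.window.shift(); // slide the window
    return true;
  }
}
```

A model that loops on `add({a: 1, b: 2})` gets its second identical call suppressed, which breaks the loop before it burns tokens, while legitimately repeated calls far apart in the session still go through.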
## Bonus: Self-Consistency Branching
Here's the fun one. DeepSeek is roughly 20× cheaper than Claude Sonnet 4.6. That means three parallel R1 samples per turn is still cheaper than a single Claude call. What was a research luxury (self-consistency sampling) becomes a practical default.
```
reasonix chat --branch 3
# or inside the TUI:
> /preset max
```
Three samples fire in parallel at temperatures 0.0 / 0.5 / 1.0. Each one's reasoning is harvested. The default selector picks whichever sample has the fewest flagged uncertainties, tie-breaking on shorter answer length (Occam's razor as a heuristic).
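The selector itself is only a few lines. Here's a sketch of the heuristic as described; the sample shape is an assumption:

```typescript
// Sketch of the default branch selector: fewest flagged uncertainties wins,
// ties broken by shorter answer. Returns the index of the chosen sample.
interface BranchSample {
  answer: string;
  uncertainties: string[];
}

function selectBranch(samples: BranchSample[]): number {
  let best = 0;
  for (let i = 1; i < samples.length; i++) {
    const cand = samples[i];
    const lead = samples[best];
    const fewer = cand.uncertainties.length < lead.uncertainties.length;
    const tieShorter =
      cand.uncertainties.length === lead.uncertainties.length &&
      cand.answer.length < lead.answer.length;
    if (fewer || tieShorter) best = i;
  }
  return best;
}
```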
The TUI shows this live:
```
branched 3 samples → picked #1   #0 T=0.0 u=2  ▸#1 T=0.5 u=0  #2 T=1.0 u=3
```
Anecdotally it lifts accuracy 10-15 percentage points on medium-difficulty reasoning, at roughly 1/5 the cost of a single Claude pass. I haven't run a formal benchmark yet; that's next.
## What it's explicitly not
- Not a LangChain replacement. No multi-provider, no graph orchestration, no RAG.
- Not a drop-in for OpenAI-compatible code. The whole point is DeepSeek-specific.
- Not production-ready. v0.0.6 pre-alpha, 135 passing tests, no formal benchmarks yet.
## Quick start
```
npm install -g reasonix
reasonix chat
```
First launch prompts for your DeepSeek API key and saves it to `~/.reasonix/config.json`. Sessions auto-persist: put in two hours of work, quit, come back tomorrow, type `reasonix chat`, and you're back where you left off.
Inside the TUI, slash commands cover everything:
```
/preset fast|smart|max   one-tap config (fast = default)
/model <id>              deepseek-chat or deepseek-reasoner
/harvest [on|off]        Pillar 2 toggle
/branch <N|off>          N parallel samples (>=2)
/sessions                list saved sessions
/forget                  delete current session
/help                    full list
```
No flag-soup to memorize. A command strip under the prompt shows the top-level commands at all times.
## Library usage
```typescript
import {
  CacheFirstLoop,
  DeepSeekClient,
  ImmutablePrefix,
  ToolRegistry,
} from "reasonix";

const client = new DeepSeekClient(); // reads DEEPSEEK_API_KEY
const tools = new ToolRegistry();

tools.register({
  name: "add",
  parameters: {
    type: "object",
    properties: { a: { type: "integer" }, b: { type: "integer" } },
    required: ["a", "b"],
  },
  fn: ({ a, b }: { a: number; b: number }) => a + b,
});

const loop = new CacheFirstLoop({
  client,
  tools,
  prefix: new ImmutablePrefix({
    system: "You are a math helper.",
    toolSpecs: tools.specs(),
  }),
  harvest: true,
  branch: 3,
  session: "math-tutor",
});

for await (const ev of loop.step("What is 17 + 25?")) {
  if (ev.role === "assistant_final") console.log(ev.content);
}

console.log(loop.stats.summary());
// { turns: 2, totalCostUsd: 0.0003, savingsVsClaudePct: 94, cacheHitRatio: 0.87 }
```
## Open questions I'd love feedback on
1. Branching selector heuristic. The default is `min(uncertainties.length)` with a length tie-break. That's obviously naive. What signals would you combine? Cross-sample answer similarity? Tool-call success rate per sample? An LLM-judge pass?
2. Harvest cost/value trade-off. The $0.0001/turn V3 call feels negligible, but it's a floor on per-turn cost. Has anyone tried fine-tuning R1 to output structured plan state directly?
3. Cache continuity across config changes. Right now, changing the system prompt mid-session invalidates the prefix cache. Is there a migration path that preserves the existing log's value?
Full source: github.com/esengine/reasonix
Install: npm install -g reasonix
Issues, PRs, and benchmarks especially welcome.