TL;DR — Open-sourcing Agent Amplifier v1.0 today. One install command turns your existing AI coding agent (Claude Code, Cursor, GitHub Copilot, LangGraph, CrewAI, AgentScope, LangChain) into a deterministically-managed runtime — effort routing, goal anchoring, convergence detection, phase prompts, persona escalation, token budgeting. No extra LLM calls. AGPL-3.0. Dogfooded across 22 of my own real sessions and 1.71 billion tokens over three days. 60-second demo on YouTube.
```bash
pip install agent-amplifier && agent-amp install claude-code
```
The problem nobody is talking about
Last Tuesday I ran a Claude Code session at Opus 4.7 extra-high. One turn spent 486.9 million tokens. It came back with a clean answer to a question I hadn't asked. Goal drift, eight iterations deep.
The same week, another session — same model, same effort tier — converged in three iterations and 9.7 million tokens. Different problem, sure, but the variance wasn't proportional to the problem. It was proportional to the runtime.
That gap — between an agent that holds its goal under load and one that doesn't — is where the reliability engineering of AI coding agents actually happens. And the place to close it isn't in the model. It's in the hook layer.
"Same model. Same week. Roughly fifty-fold variance in spend. The model isn't broken. The runtime is unmanaged."
Today I'm shipping Agent Amplifier v1.0 — a deterministic runtime amplification layer for AI coding agents. It installs in one command, runs in Python, makes zero extra LLM calls, and works across seven host agents at launch: Claude Code, Cursor, GitHub Copilot, LangGraph, CrewAI, AgentScope, and LangChain.
Watch the 60-second launch video: youtube.com/watch?v=arfkIS00eKg
What hooks became — and what they still don't do
Claude Code, Cursor, GitHub Copilot, Aider, OpenHands — every serious AI coding agent in 2026 exposes lifecycle hooks: UserPromptSubmit, PreToolUse, PostToolUse, Stop, PreCompact. The community filled those hooks with guardrails: AgentSteer, Straiker Defend AI, Sysdig — products whose job is to block dangerous actions. Stop the agent from rm -rf $HOME. Block prompt-injection-driven exfiltration. Gate destructive bash before it runs.
That work is necessary. It's also one job out of several the hook layer can do.
Hooks can do more than say no. They can dynamically shape how the agent runs — what effort to apply, what phase to operate in, when to stop iterating, how strict the audit gets after iteration four. None of the existing hook-layer products do this. The slot was empty.
Guardrails block dangerous actions. Amplification improves agent quality. Same hook layer, different problem. Both are needed.
The six primitives
Agent Amplifier intercepts at five hook events and applies six deterministic operations. Each one was built because I watched my own coding agent fail at it.
1. Effort routing
The agent doesn't know whether you're asking for a syntax fix or a race-condition audit. It applies the effort tier you set globally. So you either burn ultrathink on a typo, or you under-think a concurrency bug.
Agent Amplifier classifies prompt complexity at UserPromptSubmit — five tiers, deterministic features (token count, code-fence presence, question-word density, file-touch breadth) — and suggests the effort tier per turn. If you asked a one-liner, it routes to think. If you handed it a distributed-systems prompt, it routes to ultrathink. The model still decides. The hook just stopped letting effort be a global constant.
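Here is a minimal sketch of the idea in plain Python. The feature set, thresholds, and three of the tier names are illustrative assumptions for this post, not the shipped classifier:

```python
import re

# Illustrative tier ladder; the post only names "think" and "ultrathink".
TIERS = ["minimal", "think", "think-hard", "think-harder", "ultrathink"]

def classify_effort(prompt: str) -> str:
    """Score a prompt with cheap deterministic features, then map the score to a tier."""
    score = 0
    score += min(len(prompt.split()) // 80, 2)                                   # prompt length
    score += 1 if "```" in prompt else 0                                          # code-fence presence
    score += 1 if len(re.findall(r"\b(why|how|should|tradeoff)\b", prompt.lower())) >= 2 else 0  # question-word density
    score += 1 if len(re.findall(r"\S+\.(?:py|ts|go|rs)\b", prompt)) >= 3 else 0  # file-touch breadth
    return TIERS[min(score, len(TIERS) - 1)]

print(classify_effort("fix the typo in README.md"))  # -> "minimal"
```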
2. Goal anchoring
After iteration four, the agent forgets what you asked. It starts solving the problem in front of it, not the problem you posed. This is the single most expensive failure mode in long sessions.
The goal anchor re-injects your original request verbatim every N tool calls. It's a one-line system reminder. It works because LLMs respect repetition in context far more than they respect anything they wrote three turns ago.
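In sketch form, assuming an interval of five and illustrative reminder wording:

```python
ANCHOR_EVERY = 5  # assumed interval for this sketch; the post just says "every N tool calls"

class GoalAnchor:
    """Re-injects the user's original request every ANCHOR_EVERY tool calls."""

    def __init__(self, original_request: str) -> None:
        self.original_request = original_request
        self.tool_calls = 0

    def on_post_tool_use(self) -> str | None:
        self.tool_calls += 1
        if self.tool_calls % ANCHOR_EVERY == 0:
            # One-line system reminder; the exact wording here is illustrative.
            return f"Reminder of the original request, verbatim: {self.original_request}"
        return None
```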
3. Convergence detection
You don't always know when to stop iterating. Three loops of "looks better but I'm not sure" burn tokens without raising quality. The hook layer has the full transcript — it can do this math.
Agent Amplifier treats the iteration sequence as a discrete-time signal and applies a small LTI stability test: when iteration deltas fall below threshold over a window, it marks the session converged and stops generating new amplification prompts. The agent doesn't know it's converged; the runtime does.
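Stripped to its core, that test is a windowed threshold check on the delta sequence. The window size and threshold below are made-up defaults, not the shipped values:

```python
def is_converged(iteration_deltas: list[float], window: int = 3, threshold: float = 0.05) -> bool:
    """True once the last `window` per-iteration deltas all fall below `threshold`."""
    if len(iteration_deltas) < window:
        return False
    return all(delta < threshold for delta in iteration_deltas[-window:])

# Deltas shrink and stay small, so the session is marked converged.
print(is_converged([0.9, 0.4, 0.04, 0.03, 0.02]))  # -> True
```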
4. Phase prompts
Agents don't natively switch between "explore the problem space" and "execute the chosen path." They do one continuous thing, and the continuous thing is usually a blend of both — which is why they over-investigate and under-execute, or under-investigate and over-execute.
Five phases — EXPLORE → EVALUATE → EXECUTE → VERIFY → REFINE — each with a different prompt prefix. The selector picks based on iteration index and convergence state. Same agent. Different phase. Measurable difference in what comes out.
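A rough sketch of the selector. The phase names come from above; the cutoffs and prefix wording are assumptions:

```python
PHASES = ["EXPLORE", "EVALUATE", "EXECUTE", "VERIFY", "REFINE"]

# Illustrative prompt prefixes, one per phase.
PREFIXES = {
    "EXPLORE": "Map the problem space before committing to a path.",
    "EVALUATE": "Compare the candidate approaches and pick one.",
    "EXECUTE": "Implement the chosen approach; defer new exploration.",
    "VERIFY": "Check the result against the original request.",
    "REFINE": "Polish only; do not introduce new scope.",
}

def select_phase(iteration: int, converged: bool) -> str:
    """Pick a phase from the iteration index and convergence state."""
    if converged:
        return "REFINE"
    return PHASES[min(iteration, 3)]  # walk EXPLORE -> EVALUATE -> EXECUTE -> VERIFY

print(PREFIXES[select_phase(iteration=0, converged=False)])  # EXPLORE prefix
```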
5. Persona composition
Our first design confounded two dimensions of persona: who is auditing (a security engineer? a senior reviewer? a performance specialist?) and how strict they are being right now (gentle first pass, ruthless final pass).
The v1.0 architecture treats them as orthogonal axes:
```
rendered_persona = compose(flavor.description, flavor.review_focus, strictness_profile(iteration))
```
Four flavors at launch — senior-engineer, security-paranoid-engineer, plus two derived from the legacy level ladder. Strictness escalates from 0.6 at iteration zero to 1.0 by iteration four. You pick the flavor; the runtime escalates the strictness. Custom flavors live in ~/.config/agent-amplifier/personas.toml with prompt-injection defense at save, load, and render time.
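A sketch of that composition, assuming a minimal Flavor shape and illustrative prompt wording; only the 0.6-to-1.0 escalation over four iterations comes from the actual design:

```python
from dataclasses import dataclass

@dataclass
class Flavor:
    description: str
    review_focus: str

def strictness_profile(iteration: int) -> float:
    """Escalate linearly from 0.6 at iteration zero to 1.0 by iteration four."""
    return min(0.6 + 0.1 * iteration, 1.0)

def compose(description: str, review_focus: str, strictness: float) -> str:
    """Render the final persona from a flavor plus the current strictness."""
    return (
        f"{description} Focus the review on {review_focus}. "
        f"Apply a strictness of {strictness:.1f} on a 0-to-1 scale."
    )

senior = Flavor("You are a senior engineer reviewing this change.", "correctness and maintainability")
print(compose(senior.description, senior.review_focus, strictness_profile(4)))  # strictness 1.0
```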
6. Token budgeting
Amplification has a ceiling. If we don't enforce one, the runtime becomes the new cost problem. Agent Amplifier reads real per-turn token deltas from Claude Code's transcript JSONL, summed across all assistant messages, and applies a session budget. Hit the ceiling and the runtime stops adding amplification overhead — the agent keeps working, just without further hook-layer steering.
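In sketch form the budget check is a sum over the transcript followed by a comparison. The JSONL field names and the ceiling below are assumptions for illustration, not the real schema:

```python
import json
from pathlib import Path

SESSION_BUDGET = 50_000_000  # hypothetical ceiling for this sketch

def session_tokens(transcript: Path) -> int:
    """Sum input and output tokens across assistant messages in a transcript JSONL."""
    total = 0
    for line in transcript.read_text().splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # fail-open: skip lines we cannot parse
        usage = (entry.get("message") or {}).get("usage") or {}
        total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return total

def over_budget(transcript: Path) -> bool:
    return session_tokens(transcript) >= SESSION_BUDGET
```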
Why deterministic?
The first design instinct, when you set out to manage one LLM well, is to add another LLM to manage it. We rejected that. Turtles all the way down is not a runtime — it's an apology.
Every primitive in Agent Amplifier is deterministic Python. No LLM call in the hook path. No external service in the hot loop. Fail-open at every layer — if the runtime can't read a transcript, the agent runs unamplified, not broken. Determinism gives you debuggability, latency you can predict, and a system where the agent's mistakes can be traced to the agent and the runtime's mistakes can be traced to the runtime.
This is the design choice that lets a hook-layer product run inside ~/.claude/settings.json without adding 800ms to every Stop event.
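The fail-open behavior is easy to picture as a wrapper around each amplification step. This is an illustration of the pattern, not the shipped error handling:

```python
import functools
import logging

def fail_open(step):
    """If an amplification step raises, pass the prompt through untouched."""
    @functools.wraps(step)
    def wrapper(prompt: str) -> str:
        try:
            return step(prompt)
        except Exception:
            logging.exception("amplification step failed; running unamplified")
            return prompt  # the agent keeps working, just without this step
    return wrapper
```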
Proof, not claim
The numbers I'm going to give you are the numbers in my own dashboard, on my own machine, from my own sessions over the last three days.
- 22 real coding sessions dogfooded across the build window
- 1,708,022,914 tokens tracked end-to-end via the Claude Code transcript JSONL reader
- Per-session spread: 486.9M tokens (session 4c53227e, turn 35), 325.2M tokens (session e856c1fe, turn 30), and 9.7M tokens for a tight session
- 1,741 tests passing, 1 skipped, 100% coverage (5,473 statements, 1,496 branches)
- `mypy --strict` clean across 64 source files
- `ruff check` clean across `src/` and `tests/`
- Seven host adapters at v1.0: Claude Code, Cursor, GitHub Copilot, LangGraph, CrewAI, AgentScope, LangChain
The per-session spread is the data point that should make you suspicious of any agent-only benchmark. Roughly fifty-fold variance in spend, same model, same agent, same week. The model isn't broken. The runtime isn't managed. Now it is.
Install in one command
```bash
pip install agent-amplifier
agent-amp install claude-code
```
That's the path I dogfooded. Cursor and GitHub Copilot work the same way: agent-amp install cursor, agent-amp install github-copilot. The framework adapters (LangGraph, CrewAI, AgentScope, LangChain) attach at construction time via a wrapper.
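The framework attachment looks roughly like the sketch below. The wrapper name, its arguments, and the assumption that the wrapped agent exposes invoke() are all invented for illustration; this is not the published API:

```python
def wrap_agent(agent, on_prompt, on_result):
    """Return an object whose invoke() routes prompts and results through the amplification hooks."""
    class Amplified:
        def invoke(self, prompt: str):
            prompt = on_prompt(prompt)      # e.g. effort routing, phase prefix, goal anchor
            result = agent.invoke(prompt)   # delegate to the wrapped framework agent
            on_result(result)               # e.g. convergence check, budget accounting
            return result
    return Amplified()
```

Once installed, the CLI gives you a few more entry points: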
```bash
agent-amp dashboard                         # Streamlit UI on :8501 + FastAPI backend on :8766
agent-amp status --watch                    # live token bar in your terminal
agent-amp demo "Refactor auth to use JWT"   # preview the amplified envelope
agent-amp doctor                            # environment diagnostics
agent-amp persona list                      # show built-in flavors + your custom ones
agent-amp persona add my-ml-reviewer        # add a custom flavor
```
Where Agent Amplifier sits in the AI Reliability Engineering stack
Memory has SuperLocalMemory. Eval has AgentAssert and AgentAssay. Guardrails have AgentSteer and Straiker. Observability has Langfuse and Helicone. Frameworks have LangChain and friends.
The amplification slot was empty. Agent Amplifier is what we put in it.
This is the first deterministic amplification shim for existing AI coding agents — drop-in middleware that hooks into Claude Code, Cursor, GitHub Copilot and friends without replacing them. Other deterministic-runtime work in 2026 (Voltropy's Volt, the Lossless Context Management paper, the Hermes context engine) builds new agents with deterministic guts. We took the other path: leave the agent you're already using alone, and shape its runtime from the outside.
Both architectures will exist. The shim approach is the only one that doesn't ask you to migrate.
AI Reliability Engineering — the category Qualixar is building — gets one missing primitive at a time. Memory was the first. Amplification is the next.
Frequently Asked Questions
Does Agent Amplifier slow down my coding agent?
No. The hook handlers run deterministic Python with no extra LLM calls. Per-turn overhead is sub-50ms on a 2024 MacBook. The transcript reader is fail-open — if it can't read, your agent runs unamplified, not broken.
Will it work with my agent that isn't Claude Code or Cursor?
If your agent supports any of the five intercepted hooks (UserPromptSubmit, PreToolUse, PostToolUse, Stop, PreCompact) or implements AdapterBase, yes. v1.0 ships seven host adapters out of the box: Claude Code, Cursor, GitHub Copilot, LangGraph, CrewAI, AgentScope, LangChain. Adding a new adapter takes ~80 lines of Python.
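A new adapter is mostly a mapping from your agent's lifecycle events onto those hooks. The sketch below is illustrative only; the import path and every helper method name are assumptions, not the real AdapterBase interface:

```python
from agent_amplifier.adapters import AdapterBase  # import path is an assumption

class MyAgentAdapter(AdapterBase):
    """Maps a host agent's lifecycle events onto the intercepted hooks."""

    def on_user_prompt_submit(self, prompt: str) -> str:
        return self.amplify_prompt(prompt)     # hypothetical helper: effort routing + phase prefix

    def on_post_tool_use(self, tool_name: str, result: str) -> None:
        self.record_tool_call(tool_name)       # hypothetical helper: feeds goal anchoring and convergence

    def on_stop(self, transcript_path: str) -> None:
        self.update_budget(transcript_path)    # hypothetical helper: token accounting from the transcript
```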
How is this different from observability tools like Langfuse or Helicone?
Observability tells you what happened after a turn finished. Agent Amplifier shapes the turn while it runs — different effort tier, different phase prompt, anchored goal, escalating persona. Observability is downstream of the runtime. Amplification is the runtime.
How is this different from guardrails like AgentSteer or Straiker?
Guardrails sit on the same hook layer doing a different job: blocking dangerous actions before they execute. Agent Amplifier improves agent quality while it runs — effort routing, goal anchoring, convergence, phase, persona, budget. You use both together. Neither replaces the other.
Is it really open source?
Yes — AGPL-3.0-or-later. Source on GitHub. Commercial licensing available for organizations that need to embed it in proprietary products — reach out via varunpratap.com.
Where do I see it in action?
Sixty-second walkthrough on YouTube: youtube.com/watch?v=arfkIS00eKg. The README at the GitHub repo has the same demo plus copy-paste install snippets.
Try it · Star it · Share it
| Action | Link |
|---|---|
| Install | `pip install agent-amplifier` · `agent-amp install claude-code` |
| Star the repo | github.com/qualixar/agent-amplifier |
| Watch the demo (60s) | youtube.com/watch?v=arfkIS00eKg |
| PyPI | pypi.org/project/agent-amplifier |
| npm | npmjs.com/package/agent-amplifier |
| Follow the build | X: @varunPbhardwaj · YouTube: @qualixar-ai · Personal: varunpratap.com |
If you ship AI coding agents in production, the runtime variance problem is going to find you eventually. Better to find it first.
Varun Pratap Bhardwaj is the founder of Qualixar — the AI Reliability Engineering category creator — and the author of seven research papers in agent reliability. Agent Amplifier was dogfooded on his own Claude Code sessions for three days before launch.
Built in public. Open source. AGPL-3.0-or-later. Follow @varunPbhardwaj for the daily build thread, subscribe to @qualixar-ai for the video log series, and read the longer architecture writeups at varunpratap.com.


