Every bug in my trading bot required the same ritual: add a print, restart, wait for a price move, hope it reproduces. Half the time it didn't. Then last week Claude debugged a service running in Docker — picked the variables to watch, observed them live, found the root cause. No print statements. No restarts. Here's how I got there.
The problem
Last year I was migrating a Python trading bot to a new broker API. The broker changed their price format. My bot silently failed — orders looked successful but nothing actually executed. I only discovered it by opening the broker app and seeing zero trades.
Debugging this meant: add prints, restart the bot, manually trigger a buy, and wait for the market to move in the right direction to hit the code path. Half the time the price moved the wrong way, the event got filtered out, and I started over.
The real problem wasn't the bug itself. It was that restarting the bot changed the exact conditions I needed to reproduce it. With timing-sensitive code that depends on external services, the act of debugging destroys what you're trying to debug.
The insight (and the failed first attempt)
About a year ago I had the idea to give AI agents a debugger. My first attempt went nowhere — I tried building it like a human debugger, stepping through code line by line. Agents aren't humans. They don't sit there clicking "step over" and staring at a call stack.
The shift happened during a microservices architecture course in late 2025. The observability module covered Graylog, Prometheus, and Grafana, and we also had special endpoints for inspecting application performance profiles. It clicked: we already set up metrics for monitoring, and we already expose profiles for performance. Why can't we set up dynamic metrics, ones we choose at runtime, without code changes?
That's how Detrix was born. (First name was "Detrics" — from Dynamic Metrics — but it was already taken.)
Your debugger is already an observability tool; you just can't use it that way. DAP (Debug Adapter Protocol) logpoints can evaluate any expression in a running process without pausing it. The problem: your IDE keeps this power locked behind a GUI, and your AI agent can't click buttons.
Detrix unlocks it.
What I built
Detrix is an open-source MCP server written in Rust that exposes debugger capabilities to AI coding agents — Claude Code, Cursor, Windsurf.
Agent (Claude/Cursor) → detrix mcp → Detrix Daemon → Debugger → Your running process
Two components: detrix mcp is a lightweight MCP bridge that runs next to your agent; Detrix Daemon is the server that manages debugger connections and can run locally or alongside your service in Docker/cloud.
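Assuming `detrix mcp` speaks MCP over stdio (the transport Claude Code uses for a server registered as a command, as in `claude mcp add ... -- detrix mcp`), each agent request crossing the first arrow is one line of JSON-RPC 2.0. The sketch below builds a `tools/call` message. The tool name `query_metrics` is taken from the demo trace further down; the empty `arguments` object is a placeholder, because Detrix's actual argument schema isn't shown in this post.

```go
package main

import (
	"encoding/json"
	"os"
)

// rpcRequest is a JSON-RPC 2.0 request, the message format MCP uses.
type rpcRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  map[string]any `json:"params"`
}

// toolCall builds an MCP "tools/call" request for the named tool.
func toolCall(id int, tool string, args map[string]any) rpcRequest {
	return rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  map[string]any{"name": tool, "arguments": args},
	}
}

// encodeLine serializes one message for the stdio transport: a single
// line of JSON terminated by '\n'. json.Marshal emits compact output and
// escapes newlines inside strings, so the line never contains a raw '\n'.
func encodeLine(msg any) ([]byte, error) {
	b, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	// Hypothetical call: tool name from the demo trace, arguments unknown.
	line, err := encodeLine(toolCall(1, "query_metrics", map[string]any{}))
	if err != nil {
		panic(err)
	}
	os.Stdout.Write(line) // a real client would write this to the server's stdin
}
```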
The key properties:
- Pull-based — the agent queries when ready, nothing streams into context
- Non-breaking — logpoints don't pause the process
- Throttling on hot paths — won't blow up if you observe a tight loop
- Expression safety — allow/block lists prevent accidental exposure of secrets
- Local and remote — Detrix server runs alongside your service in Docker or on a remote host, so the agent on your laptop observes a process running in the cloud
Once the Claude Code skill is installed, Claude reaches for Detrix before suggesting print statements, without you having to ask. If you'd rather not use an agent at all, the CLI works too, which is handy for scripts or manual inspection:
# Add an observation point
detrix metric add my_metric -l main.go#83 -e "txn.Amount" -e "txn.Unit" -C <conn_id>
# Query captured values
detrix event query --format json
13 Rust crates and 2,300+ tests. Supported runtimes: Python (debugpy), Go (Delve), and Rust (lldb-dap). MIT licensed.
Where this matters most
The demo below is intentionally simple — small codebase, one service, easy to follow. In practice there are two kinds of bugs where live observation makes the biggest difference.
When you don't know what to blame. Active development means multiple suspects: the refactor from last week, the new API integration, something else entirely. In my trading bot case, the new API had been confirmed working, then suddenly I got a cryptic error. My first instinct was to dig into my own code, since a refactor was the obvious suspect. The actual cause was the API behaving differently for certain stock types, ones my news-driven bot had simply never traded before. The bug might have been there since launch. You can curl the API once you know to look at it; the hard part is knowing where to look at all. Watching live values across the full request flow tells you that immediately.
When rebuilding is expensive. I ran into this while working on a Zed issue (Rust, 10+ minute builds). I spent several days on it, only to find it was already a duplicate: the issue was closed and all the work was gone. The loop was: add prints, rebuild, click through the UI to reproduce the sequence, hope it triggers, read the output, go back to Claude, add more prints, rebuild again. With Detrix, Claude just watches what's happening inside the running app while you interact with it. No rebuild. No cleanup. No fortune-hunting.
Demo: Claude debugging a Docker service
Same class of bug as the trading bot, but easier to show on video. A Go microservice in Docker starts reporting wildly inflated revenue. The service fetches pricing from an external API whose source lives completely outside the project root, so Claude cannot read it.
Me: "The order-service in Docker is reporting wildly inflated revenue.
Daily total is in the thousands when transactions should be $20-200."
Claude: "I can see a potential issue in the code.
Let me wake the service to confirm with live values."
Claude read the source, spotted dead unit-conversion code, and suspected it might be relevant. But static analysis couldn't tell it why the totals had suddenly broken. For that, it needed to see what the API was actually sending.
I didn't tell it what to observe. It decided on its own. Here's what that looks like in Claude Code:
detrix → wake order-service:8091 → woke, debug port auto-assigned
detrix → observe txn.Amount, txn.Unit, total
find_variable: "total" ← no line number given
Detrix found it: main.go:83
detrix → query_metrics
Claude: "Found it. The pricing API changed what it's returning.
txn.Amount = 8957, txn.Unit = "cents" → should be $89.57, summed as $8957
txn.Amount = 13030, txn.Unit = "cents" → should be $130.30, summed as $13030
The API now returns some amounts in cents, some in dollars.
The client code fetches the unit field but discards it:
unit := txn.Unit
_ = unit // ← conversion was intended, never implemented
total += amount // ← adds cents as if they were dollars
This didn't matter before. It matters now."
Then Claude fixed what it owned:
- unit := txn.Unit
- _ = unit
+ if txn.Unit == "cents" {
+ amount /= 100.0
+ }
And reported the contract change upstream — the pricing API now returns mixed units with no per-response indicator.
The old workflow: add a log line, rebuild the container, redeploy, and wait for the bug to reproduce. Claude just watched it live. The container never stopped.
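The arithmetic behind Claude's diagnosis is easy to replay. The sketch below runs the two observed transactions through the original summing logic and through the patched version. The `Txn` struct and its `float64` `Amount` field are assumptions; the post only shows `txn.Amount` and `txn.Unit`.

```go
package main

import "fmt"

// Txn mirrors the two fields observed in the demo. The real struct is not
// shown, so the float64 type for Amount is an assumption.
type Txn struct {
	Amount float64
	Unit   string
}

// buggyTotal reproduces the original behavior: Unit is read and discarded.
func buggyTotal(txns []Txn) float64 {
	total := 0.0
	for _, txn := range txns {
		total += txn.Amount
	}
	return total
}

// fixedTotal converts cents to dollars before summing, as in Claude's diff.
func fixedTotal(txns []Txn) float64 {
	total := 0.0
	for _, txn := range txns {
		amount := txn.Amount
		if txn.Unit == "cents" {
			amount /= 100.0
		}
		total += amount
	}
	return total
}

func main() {
	// The two values Claude observed in the demo.
	observed := []Txn{{8957, "cents"}, {13030, "cents"}}
	fmt.Printf("buggy: %.2f\n", buggyTotal(observed)) // buggy: 21987.00
	fmt.Printf("fixed: %.2f\n", fixedTotal(observed)) // fixed: 219.87
}
```

The `float64` assumption matters. If the real `Amount` were an integer type, `amount /= 100.0` would still compile in Go (the untyped constant converts to the integer 100) but would truncate 8957 to 89, silently dropping the cents; the value would need converting to `float64` first.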
Getting started
brew install flashus/tap/detrix && detrix init
claude mcp add --scope user detrix -- detrix mcp
Then add two lines to your app (Python shown):
import detrix
detrix.init(name="my-app") # starts in background; debugger attaches only when agent wakes it
From a clone of the repo, install the Claude Code skill to make this the default debugging behavior:
mkdir -p ~/.claude/skills/detrix && cp skills/detrix/* ~/.claude/skills/detrix/
Try it
The full Docker demo with video is in examples/docker-demo/ in the repo.
GitHub: github.com/flashus/detrix — MIT licensed, free to use.
I'd love to hear: what edge cases would break this in your stack? What languages should be next? Open an issue or drop a comment — I'm reading all of them.