DEV Community

侯垒

Why I quit SaaS AI observability tools and built a local proxy instead

A confession

I've been using Langfuse and Helicone for the last 6 months. They're great products. Their teams are sharp.

But they don't work for coding agents.

The mismatch

| Tool | Architecture | Works for coding agents? |
| --- | --- | --- |
| Langfuse | SDK + async upload to SaaS | ❌ Need to instrument the agent |
| Helicone | HTTPS proxy via HTTP_PROXY | ❌ CLIs ignore HTTP_PROXY |
| Datadog LLM Obs | APM agent | ❌ Same problem |
| ccglass | Local loopback reverse proxy | ✅ Yes |

The reason: Claude Code, Codex, OpenCode, Kimi, etc. are native CLIs (Node, Rust, Go). They make HTTPS calls directly to the API endpoint. They do not respect HTTP_PROXY environment variables.

So the standard observability play — "just point your SDK at our proxy" — doesn't work. The agent isn't using a library that knows to call your endpoint.

What I actually needed

I needed something that would:

  1. Be a man-in-the-middle on the loopback (so it sees plain HTTP)
  2. Forward to the real API (so the agent works)
  3. Be zero-config (the agent already trusts http://127.0.0.1)
  4. Not require a CA cert (loopback is plain HTTP)
  5. Be local-only (no SaaS, no account)

I built it. It's called ccglass. It does those 5 things. Nothing else.
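To make the five requirements concrete, here is a minimal sketch of a loopback reverse proxy in Python. It is not ccglass's actual code. The upstream origin, the port, and the helper names `upstream_url` and `forwardable_headers` are assumptions chosen for illustration. It only handles POST and buffers the whole response, so streamed responses would arrive all at once.

```python
import http.server
import urllib.error
import urllib.request

# Assumption: the real API origin the agent would normally call.
UPSTREAM = "https://api.anthropic.com"

# Headers that describe one specific connection (plus Accept-Encoding, which
# is dropped so the upstream reply stays uncompressed and easy to inspect).
DROPPED = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
    "host", "content-length", "accept-encoding",
}


def upstream_url(path: str, upstream: str = UPSTREAM) -> str:
    """Map a path requested on the loopback to the real API URL."""
    return upstream.rstrip("/") + path


def forwardable_headers(headers) -> dict:
    """Keep only the headers that are safe to copy to the other side."""
    return {k: v for k, v in headers.items() if k.lower() not in DROPPED}


class ProxyHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        # Requirement 1: the request arrives as plain HTTP, so it is readable.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(f"captured {self.command} {self.path} ({len(body)} bytes)")

        # Requirement 2: forward to the real API over HTTPS.
        req = urllib.request.Request(
            upstream_url(self.path),
            data=body,
            headers=forwardable_headers(self.headers),
            method="POST",
        )
        try:
            with urllib.request.urlopen(req) as resp:
                status, resp_headers, payload = resp.status, resp.headers, resp.read()
        except urllib.error.HTTPError as err:
            # Pass upstream error responses back to the agent unchanged.
            status, resp_headers, payload = err.code, err.headers, err.read()

        self.send_response(status)
        for name, value in forwardable_headers(resp_headers).items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8123) -> None:
    # Requirements 3 to 5: bind to loopback only, no CA cert, no account.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), ProxyHandler)
    server.serve_forever()
```

Calling `serve()` starts the listener. Because it binds to 127.0.0.1 only, nothing outside the machine can reach it.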

What it looks like in practice

$ npm i -g ccglass
$ ccglass claude
# → starts proxy on http://127.0.0.1:8123
# → overrides ANTHROPIC_BASE_URL to point at it
# → spawns claude
# → opens dashboard at http://127.0.0.1:8123
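The "override the base URL, then spawn the agent" step can be sketched in a few lines of Python. This is an illustration under assumptions, not how ccglass itself is implemented: `child_env` and `launch` are hypothetical helper names, and the only thing taken from the steps above is the `ANTHROPIC_BASE_URL` variable and the proxy address.

```python
import os
import subprocess


def child_env(proxy_url: str, base_env=None) -> dict:
    """Copy the environment and point the agent's API base URL at the proxy."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = proxy_url
    return env


def launch(command: list[str], proxy_url: str = "http://127.0.0.1:8123") -> int:
    """Run the agent CLI as a child process with the overridden environment."""
    return subprocess.run(command, env=child_env(proxy_url)).returncode
```

With the proxy already listening, `launch(["claude"])` would start the agent so that its API calls go to the loopback address. The parent shell's environment is left untouched because only the copy is modified.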

The dashboard shows:

  • Live request log with the full system prompt, tool calls, responses
  • Per-request cost (with cache-aware pricing)
  • Per-turn diff (what changed in the context this turn)
  • Cache hit rate (how often your system prompt is being cached)
  • Token breakdown (input / output / cache_read / cache_write)
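For a sense of how the cost and cache numbers can be derived from the token breakdown, here is a rough Python sketch. The prices are hypothetical placeholders in USD per million tokens, and the hit-rate definition (cache reads as a share of all input-side tokens) is my assumption for illustration, not necessarily the formula ccglass uses. It also assumes `input_tokens` counts only the uncached part of the input.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int = 0   # uncached input tokens (assumption)
    output_tokens: int = 0
    cache_read: int = 0     # input tokens served from the prompt cache
    cache_write: int = 0    # input tokens written to the prompt cache


# Hypothetical prices, USD per million tokens. Substitute your provider's rates.
PRICES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}


def request_cost(u: Usage, prices: dict = PRICES) -> float:
    """Cache-aware cost: each token class is billed at its own rate."""
    total = (
        u.input_tokens * prices["input"]
        + u.output_tokens * prices["output"]
        + u.cache_read * prices["cache_read"]
        + u.cache_write * prices["cache_write"]
    )
    return total / 1_000_000


def cache_hit_rate(u: Usage) -> float:
    """Share of input-side tokens that were read from cache."""
    input_side = u.input_tokens + u.cache_read + u.cache_write
    return u.cache_read / input_side if input_side else 0.0
```

For example, `Usage(input_tokens=1000, output_tokens=500, cache_read=8000, cache_write=1000)` has 10,000 input-side tokens, of which 8,000 were cache reads, so the hit rate is 0.8.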

What's different from Langfuse / Helicone

  • Local-only. No data leaves your machine. No account. No API key on their side.
  • Works for coding agents specifically. Built for the HTTP_PROXY-bypass problem.
  • Single binary, 1-command install. No SDK to integrate.
  • Open source under MIT. You can read every line.

What's the same

  • Token accounting
  • Per-request cost
  • Latency tracking
  • Provider routing (multiple model providers)

Why I'm sharing this

If you use a coding agent heavily and you don't know which of your prompts carry 4,000 tokens of accidental repetition, you're leaving money on the table.

The first time I saw my own cache hit rate, it was 38%. Only 38% of the time was my system prompt served from cache; the other 62% of the time I was re-sending it at full price and not knowing it. I had a "wait, that's literally me paying for nothing" moment.

Try it once. The data is eye-opening.

🔗 GitHub: https://github.com/jianshuo/ccglass
