I thought plugging in LangSmith would solve agentic AI monitoring
In my last post, I shared the "costs are a black box" problem.
Some issues used 10,000 tokens, others 1,000,000. A 100x difference with no explanation.
I figured switching to LangGraph and adding LangSmith would fix it.
Wrong.
First wall: CLI calls can't be traced
The original setup ran Claude Code CLI as a subprocess from GitHub Actions.
GitHub Actions → npx claude-code → (black box) → result
Even with LangSmith connected, LLM calls inside the CLI were invisible. The problem wasn't the tooling—it was that the architecture wasn't observable in the first place.
So I switched from CLI to the Anthropic SDK.
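In concrete terms, the change looks something like this. A minimal sketch: the model name, prompt, and variable names are placeholders, not the actual pipeline code.

```python
# Before: the LLM calls happened inside the CLI subprocess, out of reach.
# subprocess.run(["npx", "claude-code", ...])

# After: call the model through the Anthropic SDK, in our own process.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "Triage this GitHub issue: ..."}],
)

# Every response now carries its own token counts, something the CLI never exposed.
print(response.usage.input_tokens, response.usage.output_tokens)
```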
Second wall: tokens don't show up in LangSmith
Switched to the SDK. Still no token counts.
After debugging, I learned two things (sketched in the code below):
- You need run_type=llm for LangSmith to track token counts
- input_cost and output_cost must be added as metadata manually
I assumed an "observability tool" would show everything automatically. Turns out, defining what you want to see is your job.
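Roughly what the fix ends up looking like with the langsmith SDK. Treat it as a sketch: the per-token prices are made-up numbers, the exact mechanism for attaching metadata to the current run can differ between langsmith versions, and input_cost / output_cost are just the key names described above.

```python
from anthropic import Anthropic
from langsmith import traceable
from langsmith.run_helpers import get_current_run_tree

client = Anthropic()

# Hypothetical prices in USD per million tokens, only to make the example concrete.
PRICE_IN, PRICE_OUT = 3.00, 15.00

@traceable(run_type="llm", name="analyze")  # run_type="llm" is what makes tokens show up
def analyze(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    usage = response.usage

    # Costs are not computed for you: attach input_cost and output_cost
    # to the current run's metadata yourself.
    run = get_current_run_tree()
    if run is not None:
        run.extra = run.extra or {}
        run.extra.setdefault("metadata", {}).update({
            "input_cost": usage.input_tokens * PRICE_IN / 1_000_000,
            "output_cost": usage.output_tokens * PRICE_OUT / 1_000_000,
        })
    return response.content[0].text
```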
Third wall: the trap of over-engineering
I tried to build for scale and include LangGraph V1's new features:
- Durable State & Built-in Persistence
- Scoring, observability, model config
- Fallback API logic
In the end, I deleted all of it.
Once I actually ran it, none of that was needed. Most settings were better off hardcoded, and the extra logic only added complexity.
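For reference, what survived is essentially a plain three-node graph: triage → analyze → fix, the same steps as in the numbers below. The state fields and node bodies here are stand-ins, not the real pipeline:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class IssueState(TypedDict, total=False):
    issue: str      # raw issue text
    analysis: str   # output of the analyze step
    fix: str        # output of the fix step

# Each node is an ordinary function; the real ones wrap the traceable LLM calls above.
def triage(state: IssueState) -> IssueState:
    return {"issue": state["issue"].strip()}

def analyze(state: IssueState) -> IssueState:
    return {"analysis": f"analysis of: {state['issue']}"}

def fix(state: IssueState) -> IssueState:
    return {"fix": f"patch for: {state['analysis']}"}

builder = StateGraph(IssueState)
builder.add_node("triage", triage)
builder.add_node("analyze", analyze)
builder.add_node("fix", fix)
builder.add_edge(START, "triage")
builder.add_edge("triage", "analyze")
builder.add_edge("analyze", "fix")
builder.add_edge("fix", END)

graph = builder.compile()  # no checkpointer, no fallbacks: plain and hardcoded

result = graph.invoke({"issue": "example issue text ..."})
```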
Result: now I can see the costs
The cause of that 100x difference finally became visible:
Triage → input 1.5K / output 102 tokens → $0.002
Analyze → input 18K / output 703K tokens → $0.105
Fix → input 50K / output 3,760K tokens → $0.512
The shift from "running on gut feel" to "running on numbers."
Plugging in an observability tool is easy. Building an observable architecture is the real work.
Next: a dashboard that shows whether these numbers actually deliver value—productivity metrics in real time.