In my last post, I shared that agentic AI cut issue handling time by 66%.
The numbers looked great on paper. But the moment I asked "can we actually keep running this?"—unexpected problems started piling up.
## Costs are a black box
Agents make multiple LLM calls internally: reading code, analyzing, deciding, sometimes double-checking.
But I had no way to track how many tokens any of this used.
Some issues consumed 10,000 tokens. Others burned 1,000,000.
A 100x difference—and I couldn't tell why.
The current setup runs Claude Code CLI inside GitHub Actions, and there's no mechanism in place to track and analyze token usage systematically.
I discovered that tools like LangSmith are built to solve exactly this. Next step: either parse CLI logs into structured data, or integrate a dedicated monitoring tool.
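As a starting point for the "parse CLI logs into structured data" option, here's a minimal sketch of the aggregation step. It assumes each agent run leaves behind one JSON file per issue (e.g. `issue-123.json`) containing a list of LLM call records with a `usage` object; those file and field names are assumptions about the log format, not a documented Claude Code interface.

```python
import json
from pathlib import Path


def summarize_token_usage(log_dir: str) -> list[dict]:
    """Aggregate per-issue token usage from JSON logs.

    Assumes one JSON file per issue run, each holding a list of LLM
    call records with a `usage` object. The field names below are
    placeholders for whatever the actual CLI output contains.
    """
    summaries = []
    for log_file in sorted(Path(log_dir).glob("issue-*.json")):
        records = json.loads(log_file.read_text())
        input_tokens = sum(r.get("usage", {}).get("input_tokens", 0) for r in records)
        output_tokens = sum(r.get("usage", {}).get("output_tokens", 0) for r in records)
        summaries.append({
            "issue": log_file.stem,
            "calls": len(records),
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
        })
    return summaries


if __name__ == "__main__":
    for row in summarize_token_usage("agent-logs"):
        print(f"{row['issue']}: {row['calls']} calls, {row['total_tokens']:,} tokens")
```

Dumping this into a CSV on every run would already make the 10,000-vs-1,000,000 outliers visible and attributable to a specific issue.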
## The workflow only exists as code
Right now the workflow is 4 YAML files. Each one is 100-300 lines, and the conditional branches keep getting more complex. When a teammate asks "what exactly is the AI doing right now?"—I have to open the code and explain line by line.
Trust is everything for agentic AI.
To explain "why did the AI make this decision?", you need to see the intermediate steps visually.
I'm looking into low-code platforms like LangFlow and n8n. If the visual workflow actually reflects what's running, with logs at each node, that would help a lot.
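Until one of those platforms is in place, part of that visibility could come from structured step logs: one machine-readable line per decision point, which a teammate (or a future dashboard) can read without opening the YAML. The sketch below is purely illustrative; the step names and fields are invented, not taken from the real workflow.

```python
import json
import sys
import time


def log_step(step: str, decision: str, **details) -> None:
    """Emit one JSON line per workflow step so intermediate decisions
    can be reviewed (or rendered by a log viewer) after the run."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "step": step,
        "decision": decision,
        **details,
    }
    print(json.dumps(record), file=sys.stderr)


# Example: trace the branch points the YAML currently hides.
log_step("triage", "needs_code_change", issue=123, confidence="high")
log_step("analysis", "root_cause_found", files=["src/auth.py"])
log_step("fix", "opened_pr", pr_branch="ai/issue-123")
```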
## No real-time KPIs
"AI saved 66% of developer time" came from manually analyzing 10 issues.
But I have no idea how well the AI is performing right now.
Metrics I need:
- AI filtering rate: % of issues AI resolved without developer involvement
- Average first response time: how quickly we reply after an issue is created
- Auto-resolution rate: % where AI fixed code and opened a PR
- Daily/weekly/monthly trends: is performance improving or degrading
Without a dashboard, I'm left judging by feel—"seems to be working fine."
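To make those metrics concrete, here's a rough sketch of how the first two could be computed from the GitHub REST API, assuming the workflow labels the issues it handles. The label names (`ai-resolved`, `ai-pr-opened`) are hypothetical and would have to be applied by the workflow itself.

```python
import requests

# Hypothetical labels the workflow would need to apply; they are
# assumptions for illustration, not labels that exist today.
AI_RESOLVED_LABEL = "ai-resolved"
AI_PR_LABEL = "ai-pr-opened"


def fetch_issues(owner: str, repo: str, token: str) -> list[dict]:
    """Fetch recent issues (excluding pull requests) via the GitHub REST API."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        headers={"Authorization": f"Bearer {token}"},
        params={"state": "all", "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    return [i for i in resp.json() if "pull_request" not in i]


def kpi_snapshot(issues: list[dict]) -> dict:
    """Compute the AI filtering rate and auto-resolution rate from labels."""
    total = len(issues) or 1
    label_sets = [{l["name"] for l in i["labels"]} for i in issues]
    filtered = sum(AI_RESOLVED_LABEL in ls for ls in label_sets)
    auto_fixed = sum(AI_PR_LABEL in ls for ls in label_sets)
    return {
        "issues": len(issues),
        "ai_filtering_rate": filtered / total,
        "auto_resolution_rate": auto_fixed / total,
    }
```

First response time would take one extra call per issue to the comments endpoint to find the earliest bot reply, and the trend lines are just this snapshot run on a schedule and appended to a CSV.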
Agentic AI isn't "build and done"—it's "the beginning of operations."
Results came fast, but making this a sustainable system is a completely different challenge.
Next time I'll share how I'm tackling these problems one by one, starting with cost monitoring and real-time dashboards.