侯垒

Posted on Jun 21

Debugging AI Coding Agents: How to See Prompts, Tool Calls, Token Usage, and Cost

#ai #debugging #opensource #devtools

When a coding agent fails, the visible error is rarely the whole story.

You might see:

a tool call that never ran
a command repeated again and again
a sudden token spike
a provider rejecting a request with 400 Bad Request
an agent that says it edited a file but did not
a long session that starts producing shallow or confused answers

The usual reaction is to tweak the prompt and try again.

Sometimes that works. But for agentic coding tools, guessing is not enough. You need to inspect what the agent actually sent to the model.

That is the problem ccglass is built for.

GitHub: https://github.com/jianshuo/ccglass

The debugging problem with coding agents

Modern coding agents are not simple chatbots.

Tools like Claude Code, Codex, OpenCode, CodeBuddy, Qoder, and similar systems usually run a loop like this:

user request
  -> model request
  -> tool call
  -> local command / file read / edit / search
  -> tool result
  -> next model request
  -> final answer

When something goes wrong, the bug can be in any part of that loop.

For example:

The model never saw the tool schema you thought it saw.
The tool schema was too large or malformed.
The model returned a malformed tool call.
The local client dropped part of the tool result.
A huge tool result entered the next request and inflated token usage.
The provider rejected a request shape that another provider accepts.
A proxy or gateway translated Anthropic and OpenAI formats incorrectly.

You cannot debug that reliably from the final answer alone.

What to inspect first

When an agent behaves strangely, I usually want to see five things.

1. The system prompt

The system prompt often explains behavior that looks mysterious from the outside.

It may contain rules about:

when to ask permission
when to use tools
how much work to do before stopping
whether to run tests
whether to preserve existing files
how to summarize results

If the agent ignores your instruction, first check whether the system prompt is pushing it in a different direction.

2. The tool schema

Tool calling depends heavily on the schema sent to the model.

If a tool is described vaguely, has confusing parameter names, or contains a schema shape the provider does not like, the model may choose the wrong tool or produce invalid arguments.

This matters even more with MCP servers and custom tools.

The question is not "what did my code define?" The real question is:

What tool schema was actually sent in the model request?

3. The tool call

A tool call bug can come from the model, the client, or the provider adapter.

You want to inspect:

tool name
call id
arguments
malformed fields
missing required fields
whether the tool call was emitted as structured data or plain text

For example, if the model emits something that looks like a tool call but the client renders it as text, the agent may continue as if the tool ran even though no tool result exists.

4. The tool result

Tool results are often the hidden source of context bloat.

A single file read, search result, stack trace, or command output can add thousands of tokens to the next turn.

If the agent suddenly becomes expensive or confused, check what tool results were fed back into the model.

5. Token usage and latency

Token totals are useful, but per-request token usage is better.

You want to know:

which request got expensive
whether input, output, or cache tokens dominated
whether a request was slow before the first token
whether repeated turns reused the same large context
whether a provider returned usage data at all

That is the difference between "this session was expensive" and "this specific tool result caused the spike."

Using ccglass for request-level debugging

ccglass is a local proxy and dashboard for coding-agent traffic.

It lets you inspect what supported agents actually send to the model:

system prompts
messages
tool schemas
tool calls
tool results
raw request and response bodies
token/cache/cost
latency
turn-to-turn diffs

It works locally. It is open source.

Install:

npm install -g ccglass

Start it:

ccglass

Or choose a client directly:

ccglass claude
ccglass codex
ccglass opencode
ccglass qoder
ccglass codebuddy

For generic OpenAI-compatible or Anthropic-compatible clients, you can also run proxy-only mode:

ccglass proxy --provider openai
ccglass proxy --provider claude

Then point your client or IDE at the printed local base URL.

Example debugging workflow

Suppose an agent repeatedly fails to call a tool correctly.

Instead of changing the prompt first, inspect the actual request flow:

Open the ccglass dashboard.
Find the request where the model was expected to call the tool.
Expand the system prompt and tool schema.
Check whether the tool was visible to the model.
Check the model response for the tool call.
Check whether the tool result was paired correctly.
Compare the next request to see what context was carried forward.

That gives you a factual answer to questions like:

Did the model see the tool?
Did it call the wrong tool?
Were the arguments malformed?
Did the client drop the tool result?
Did the next turn include the right result?

Example: debugging token spikes

Another common problem:

Why did this one coding-agent session use so many tokens?

In ccglass, inspect the request list and session summary.

Look for:

a request with unusually high input tokens
a large tool result entering the next request
many repeated requests with similar context
cache usage that is lower than expected
a slow request with high input size

Then use turn-to-turn diff to see what changed between two requests.

This is often more useful than looking only at the final cost.

Example: debugging provider 400 errors

Provider errors are another good use case.

If an Anthropic-compatible or OpenAI-compatible endpoint rejects a request, you need the exact payload.

Check:

request body
tool schema
message order
tool_use / tool_result pairing
response or error body
provider/model name

This is useful when working with:

internal gateways
OpenRouter
Ollama-compatible endpoints
Bedrock or Vertex routes
Anthropic-compatible translation layers
OpenAI-compatible coding-agent backends

The failure is often not "the model is bad." It is often a request-shape problem.

Exporting evidence

ccglass can export captured requests:

ccglass export <session>/<seq> --format raw
ccglass export <session>/<seq> --format md
ccglass export <session>/<seq> --format json
ccglass export <session>/<seq> --format har

That is useful when reporting bugs to an agent project, provider, or proxy maintainer.

Instead of saying:

The agent failed.

You can show:

This exact request contained this tool schema, this model response emitted this malformed tool call, and this provider returned this error.

That is much easier to debug.

A few practical notes

ccglass is not a universal network sniffer.

It works best when the client can be pointed at a local base URL or local proxy. For example, API-key based OpenAI-compatible and Anthropic-compatible traffic is a good fit.

Some clients have special transports. For example, Codex authenticated through ChatGPT login may use a WebSocket path that does not honor OPENAI_BASE_URL, so local base URL inspection will not see that traffic.

For CodeBuddy, ccglass uses a forward-proxy mode because CodeBuddy hardcodes its upstream endpoint.

Why this matters

As coding agents become more autonomous, debugging needs to move one layer deeper.

It is no longer enough to ask:

Did the agent produce the right diff?

You also need to ask:

What did the agent see, what tool did it choose, what result came back, and what context entered the next turn?

That is what ccglass tries to make visible.

GitHub:

https://github.com/jianshuo/ccglass

Install:

npm install -g ccglass

If you build with coding agents, request-level debugging is worth having in your toolbox.

Top comments (1)

Alex Shev • Jun 22

Agent debugging needs observability at the prompt/tool boundary. If you cannot see the prompt, selected tools, token spikes, and final file changes together, you end up debugging symptoms instead of the actual failure path.