AI orchestrators receive significant attention; however, when deployments become slow and costly, developers often overlook a critical capability...
The beforeModelCallback pattern for conditional skipping is incredibly powerful and underused. What I find most interesting is how this maps to a broader principle in agent design: treating LLM calls as expensive I/O operations rather than default logic paths.

Your circuit-breaker pattern in afterToolCallback (escalate + FATAL_ERROR after max retries) is essentially the same pattern we use in distributed systems for failing fast.

One thing I'd add: if you're running multiple sequential agents like this in production, consider aggregating the performance metrics from agentStartCallback/agentEndCallback into a structured trace (OpenTelemetry spans, for instance) rather than just console.log. That way you get a full flame graph of your agent pipeline and can spot which subagent is the bottleneck without parsing logs manually. Really solid patterns here.
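The span-aggregation idea can be sketched without any tracing dependency. Below is a minimal in-memory recorder standing in for a real OpenTelemetry tracer; the `on_agent_start`/`on_agent_end` hook names and the agent names are my own placeholders, not ADK's actual API:

```python
import time
from dataclasses import dataclass

# Minimal in-memory "span" recorder standing in for an OpenTelemetry tracer.
# In production you would emit real OTel spans from your agent-lifecycle
# callbacks; the hook names below are illustrative, not ADK's exact API.

@dataclass
class Span:
    agent: str
    start: float
    end: float = 0.0

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000.0

class PipelineTrace:
    def __init__(self):
        self.spans: dict[str, Span] = {}

    def on_agent_start(self, agent_name: str) -> None:
        self.spans[agent_name] = Span(agent_name, time.perf_counter())

    def on_agent_end(self, agent_name: str) -> None:
        self.spans[agent_name].end = time.perf_counter()

    def bottleneck(self) -> str:
        # The slowest span is the pipeline's bottleneck.
        return max(self.spans.values(), key=lambda s: s.duration_ms).agent

# Usage: wrap each sub-agent run in start/end callbacks (sleeps stand in
# for real agent work in this hypothetical three-agent pipeline).
trace_rec = PipelineTrace()
for name, work_s in [("auditor", 0.01), ("publisher", 0.03), ("monitor", 0.005)]:
    trace_rec.on_agent_start(name)
    time.sleep(work_s)
    trace_rec.on_agent_end(name)

print(trace_rec.bottleneck())
```

With real OTel exporters the same start/end pairs become nested spans, so the flame graph falls out for free.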
Excellent feedback. I found adk.dev/integrations/?topic=observ... today. I will give it a try and then write a blog post.
That's awesome Connie! The ADK integrations page has some really solid patterns for connecting callbacks to OpenTelemetry exporters. One thing I'd suggest when writing your blog post — show the before/after of debugging a multi-step agent call with vs without the observability layer. The cost visibility alone (seeing exactly which sub-agent burned through tokens) is usually what convinces teams to adopt it. Looking forward to reading your post!
The beforeModelCallback pattern for conditional skipping is the real gem here. I run a multi-agent setup with about a dozen scheduled agents that each handle different tasks — site auditing, content publishing, metric monitoring — and the biggest cost drain early on was agents making unnecessary LLM calls when the data they needed wasn't ready yet or hadn't changed since the last run.
Your approach of validating session state before hitting the model is exactly the right pattern. In my case I ended up building something similar where each agent checks whether its upstream dependencies have produced new data before doing any inference work. The savings are dramatic — probably 40-50% fewer LLM calls once you add those guards.
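Roughly what I mean, as a framework-agnostic sketch: the guard inspects session state and either returns a cached response (skipping inference) or returns `None` to let the model call proceed. The state keys (`upstream_ready`, `last_input_hash`, `last_response`) are my own placeholders, not ADK's exact API:

```python
import hashlib

def before_model_guard(session_state: dict, new_input: str):
    """Return a cached response to skip the LLM call, or None to proceed.

    Hypothetical state keys: upstream_ready (have dependencies produced
    new data?), last_input_hash / last_response (dedup cache).
    """
    digest = hashlib.sha256(new_input.encode()).hexdigest()
    if not session_state.get("upstream_ready", False):
        # Upstream data isn't ready: don't burn tokens, reuse or stub.
        return session_state.get("last_response", "NO_DATA_YET")
    if session_state.get("last_input_hash") == digest:
        # Input unchanged since the last run: reuse the cached answer.
        return session_state["last_response"]
    # New input: record its hash and allow the model call to go through.
    session_state["last_input_hash"] = digest
    return None

# Usage: first call proceeds to the model, identical second call is skipped.
state = {"upstream_ready": True, "last_response": "cached summary"}
assert before_model_guard(state, "report v1") is None        # call the model
state["last_response"] = "summary of report v1"              # model's answer
assert before_model_guard(state, "report v1") == "summary of report v1"
```

In ADK terms this is the shape of a conditional-skip before-model callback: returning a response object short-circuits the call, returning `None` lets it run.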
The afterToolCallback for retry counting with FATAL_ERROR escalation is also smart. Without a hard cap like that, validation loops can silently burn through your token budget. I've seen agents get stuck retrying malformed outputs indefinitely when the model just can't produce valid JSON for a particular edge case. Having that circuit breaker built into the callback layer rather than in the agent logic itself keeps things much cleaner.
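The circuit breaker itself is just a counter in session state. A hedged sketch (the callback shape, state key, and status strings are illustrative placeholders, not ADK's exact API):

```python
MAX_RETRIES = 3

def after_tool_callback(session_state: dict, tool_result: dict):
    """Pass valid results through; after MAX_RETRIES failed validations,
    escalate with FATAL_ERROR instead of looping forever."""
    if tool_result.get("valid", False):
        session_state["retries"] = 0          # success resets the breaker
        return tool_result
    session_state["retries"] = session_state.get("retries", 0) + 1
    if session_state["retries"] >= MAX_RETRIES:
        return {"status": "FATAL_ERROR",
                "detail": f"validation failed {MAX_RETRIES} times"}
    return {"status": "RETRY", "attempt": session_state["retries"]}

# Usage: two retries, then the breaker trips.
state = {}
bad = {"valid": False}
print(after_tool_callback(state, bad)["status"])   # RETRY
print(after_tool_callback(state, bad)["status"])   # RETRY
print(after_tool_callback(state, bad)["status"])   # FATAL_ERROR
```

Keeping the cap in the callback layer means no individual agent prompt has to reason about retry budgets.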
Thank you for confirming that the patterns work in practice. I only learned them a month ago while preparing for a technical talk. I hope to give the same talk at Build with AI in Hong Kong at the end of the month.
Observability is a nightmare when you're dealing with distributed systems. ADK callbacks can be a lifesaver but also a pain if not handled right. AgentWork uses similar principles to manage task execution and observability across a decentralized network. It’s a wild ride, but worth it.
A great deep dive into a part of the agent workflow that usually gets ignored.
Thank you. You can learn more from the YouTube videos that Googlers have published.
Ran into the same thing: app-level logging was useless for latency. Hooking callbacks per model call was the only way to see where tokens were burning, and cost tracking finally made sense after that.
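Per-call cost tracking boils down to recording token usage in an after-model callback and rolling it up by agent. A minimal sketch (the hook name, token counts, and the flat $/1K-token price are placeholders, not real ADK fields or real pricing):

```python
class CostTracker:
    """Accumulates per-model-call token usage and attributes cost by agent."""

    def __init__(self, usd_per_1k_tokens: float = 0.002):  # placeholder price
        self.price = usd_per_1k_tokens
        self.calls: list[dict] = []

    def after_model_call(self, agent: str, prompt_tokens: int,
                         output_tokens: int) -> None:
        # Hook this from your framework's after-model callback.
        total = prompt_tokens + output_tokens
        self.calls.append({"agent": agent, "tokens": total,
                           "cost_usd": total / 1000 * self.price})

    def cost_by_agent(self) -> dict:
        out: dict = {}
        for c in self.calls:
            out[c["agent"]] = out.get(c["agent"], 0.0) + c["cost_usd"]
        return out

# Usage with made-up token counts for two hypothetical sub-agents.
tracker = CostTracker()
tracker.after_model_call("planner", 1200, 300)
tracker.after_model_call("writer", 4000, 2500)
tracker.after_model_call("planner", 800, 200)
print(tracker.cost_by_agent())
# planner: (1500 + 1000) / 1000 * 0.002 = 0.005 USD
# writer:  6500 / 1000 * 0.002 = 0.013 USD
```

Even this crude rollup answers the question app-level logs can't: which agent is actually burning the tokens.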
ADK callbacks are a pain point for observability at scale. AgentWork uses Solana's speed and low costs to handle thousands of tasks without the overhead of traditional observability tools. We're not using ADK, but the problem is real.
This is super relevant. I've been wrestling with ADK callback latency in a recent project, and the cost implications alone are eye-watering when things go sideways.