Google's Agent Development Kit (ADK) makes it straightforward to compose LlmAgent instances into multi-agent hierarchies. But two bugs that I couldn't find documented anywhere bit me hard in production. Here's what happened and how to fix each one.
The Setup
A root router LlmAgent with two sub-agents. Both sub-agents are module-level singletons — instantiated at import time, referenced from the root agent's constructor.
# Agents/my_app/root_agent.py
from google.adk.agents import LlmAgent

from Agents.my_app.sub_agent_a.agent import sub_agent_a
from Agents.my_app.sub_agent_b.agent import sub_agent_b

def _build_sub_agents() -> list:
    return [sub_agent_a, sub_agent_b]

root_agent = LlmAgent(
    name="my_app",
    sub_agents=_build_sub_agents(),
    ...
)
Worked fine locally with adk web. Blew up on Cloud Run.
Bug 1: Agent already has a parent agent on module reload
The error
pydantic_core._pydantic_core.ValidationError: 1 validation error for LlmAgent
Value error, Agent `SubAgentA` already has a parent agent,
current parent: `my_app`, trying to add: `my_app`
What's happening
ADK's agent_loader calls importlib.import_module(agent_name) on every request. On the first request, it loads the module fresh and creates root_agent. The LlmAgent constructor sets sub_agent.parent_agent = root_agent for each sub-agent.
On the second request, agent_loader loads the root module again and its body re-executes. The sub-agent modules it imports are still cached in sys.modules, so sub_agent_a and sub_agent_b are the same Python objects from the previous load, still carrying their parent_agent reference. When the new LlmAgent tries to assign the parent again, pydantic's validator rejects it.
# Inside ADK's LlmAgent.__init__ (simplified)
for sub in sub_agents:
    if sub.parent_agent is not None:
        raise ValueError(f"Agent `{sub.name}` already has a parent agent ...")
    sub.parent_agent = self
This never surfaces locally because adk web loads the module only once per session. On Cloud Run the module is reloaded on every request, and that is what triggers it.
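To see the mechanism in isolation, here is a minimal sketch that needs no ADK install. StubAgent is a hypothetical stand-in that models only the parent check described above, and build_root() plays the role of the root module body running a second time. It is an illustration of the failure mode, not ADK's actual code.

```python
from typing import List, Optional


class StubAgent:
    """Hypothetical stand-in for an ADK agent; models only the parent check."""

    def __init__(self, name: str, sub_agents: Optional[List["StubAgent"]] = None):
        self.name = name
        self.parent_agent: Optional["StubAgent"] = None
        self.sub_agents = sub_agents or []
        for sub in self.sub_agents:
            if sub.parent_agent is not None:
                raise ValueError(
                    f"Agent `{sub.name}` already has a parent agent, "
                    f"current parent: `{sub.parent_agent.name}`, "
                    f"trying to add: `{self.name}`"
                )
            sub.parent_agent = self


# Module-level singleton: created once, survives the root module being re-run.
sub_agent_a = StubAgent("SubAgentA")


def build_root() -> StubAgent:
    # Stands in for the root module body executing again on a reload.
    return StubAgent("my_app", sub_agents=[sub_agent_a])


first_root = build_root()  # first load: parent gets set, no error

try:
    build_root()  # second load: same sub-agent object, parent still set
    second_load_error = None
except ValueError as exc:
    second_load_error = str(exc)

print(second_load_error)
```

The second call fails with the same "already has a parent agent" message, even though both roots are named my_app, because the check looks at the sub-agent object and not at the parent's name.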
The fix
Reset parent_agent to None before passing sub-agents to the constructor:
def _build_sub_agents() -> list:
    agents = [sub_agent_a, sub_agent_b]
    for agent in agents:
        agent.parent_agent = None  # reset before each reload
    return agents
This is safe because the assignment happens synchronously before the new parent is set.
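The reset can be exercised the same way without ADK. In this sketch StubAgent is again a hypothetical stand-in that models only the parent check, and calling build_root() twice simulates two loads of the root module. Treat it as a way to reason about the fix, not as a test of ADK itself.

```python
from typing import List, Optional


class StubAgent:
    """Hypothetical stand-in for an ADK agent; models only the parent check."""

    def __init__(self, name: str, sub_agents: Optional[List["StubAgent"]] = None):
        self.name = name
        self.parent_agent: Optional["StubAgent"] = None
        self.sub_agents = sub_agents or []
        for sub in self.sub_agents:
            if sub.parent_agent is not None:
                raise ValueError(f"Agent `{sub.name}` already has a parent agent")
            sub.parent_agent = self


sub_agent_a = StubAgent("SubAgentA")
sub_agent_b = StubAgent("SubAgentB")


def build_sub_agents() -> List[StubAgent]:
    agents = [sub_agent_a, sub_agent_b]
    for agent in agents:
        agent.parent_agent = None  # drop the stale parent from the previous load
    return agents


def build_root() -> StubAgent:
    return StubAgent("my_app", sub_agents=build_sub_agents())


first_root = build_root()   # first load
second_root = build_root()  # second load: no error, parent is re-pointed

print(sub_agent_a.parent_agent is second_root)  # True
print(sub_agent_a in first_root.sub_agents)     # True: the old root still lists it
```

One thing the sketch makes visible: the old root still holds the sub-agents in its sub_agents list, while their parent_agent now points at the new root. That is only harmless if nothing keeps using the old root after the reload.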
Bug 2: Context variable not found in instruction strings
The error
KeyError: 'Context variable not found: `hostname`.'
Traceback points here:
File ".../google/adk/utils/instructions_utils.py", line 124, in inject_session_state
return await _async_sub(r'{+[^{}]*}+', _replace_match, template)
What's happening
ADK injects session state into agent instructions at runtime. The mechanism scans the instruction string with the regex r'{+[^{}]*}+' and replaces every {var_name} with the corresponding session state value.
If your instruction contains an example URL or any template-like text with curly braces:
The URL format is `https://{hostname}/api/{resource_id}/`
ADK sees {hostname}, looks it up in session state, finds nothing, raises KeyError.
My first instinct was to double-brace escape like Python's .format():
https://{{hostname}}/api/{{resource_id}}/
This does not work. The regex is {+[^{}]*}+ — it matches one or more { characters followed by non-brace characters followed by one or more } characters. {{hostname}} still matches.
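This is easy to confirm with nothing but Python's re module. The pattern below is copied from the traceback above; the sample strings and variable names are only for illustration.

```python
import re

# Pattern copied from the inject_session_state traceback.
PATTERN = re.compile(r'{+[^{}]*}+')

single = PATTERN.findall("https://{hostname}/api/{resource_id}/")
double = PATTERN.findall("https://{{hostname}}/api/{{resource_id}}/")
angle = PATTERN.findall("https://<hostname>/api/<resource_id>/")

print(single)  # ['{hostname}', '{resource_id}']
print(double)  # ['{{hostname}}', '{{resource_id}}']
print(angle)   # []
```

Doubling the braces only makes each match longer. It never stops the match, so the lookup still happens.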
The fix
Don't use curly braces for literal placeholder text in instructions:
The URL format is `https://<hostname>/api/<resource_id>/`
More broadly: any {word} pattern in an ADK instruction string is treated as a session state variable, regardless of how many braces you use. Use angle brackets, square brackets, or prose for template-like text in prompts.
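If you have many instruction strings, a small pre-deploy check can catch stray braces before they reach production. The helper below is hypothetical (unresolved_placeholders is my own name, not an ADK function). It reuses the regex from the traceback and assumes you know which state keys your app sets. ADK may apply extra rules to variable names that this sketch does not model.

```python
import re
from typing import List, Set

# Same pattern as in the inject_session_state traceback.
_BRACE_PATTERN = re.compile(r'{+[^{}]*}+')


def unresolved_placeholders(instruction: str, state_keys: Set[str]) -> List[str]:
    """Return brace-wrapped names in `instruction` that are not known state keys.

    Hypothetical helper: an approximation of what would trigger a lookup,
    not a reimplementation of ADK's injection logic.
    """
    missing = []
    for match in _BRACE_PATTERN.finditer(instruction):
        name = match.group().lstrip('{').rstrip('}').strip()
        if name not in state_keys:
            missing.append(name)
    return missing


bad = "Hello {user_name}. The URL format is https://{hostname}/api/{resource_id}/"
good = "Hello {user_name}. The URL format is https://<hostname>/api/<resource_id>/"

print(unresolved_placeholders(bad, {"user_name"}))   # ['hostname', 'resource_id']
print(unresolved_placeholders(good, {"user_name"}))  # []
```

Running a check like this over every instruction string in CI turns a runtime KeyError into a failed build.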
Summary
| Bug | Trigger | Fix |
|---|---|---|
| parent_agent collision | Module-level singleton sub-agents + ADK module reload per request | Reset agent.parent_agent = None before passing to constructor |
| Context variable not found | {word} patterns in instruction strings | Use <word> or square brackets instead |
Both are easy to fix once you know what's happening, but the error messages don't immediately point to the root cause. The parent_agent one is especially sneaky — it only appears in production where the module is reloaded per request, never in adk web during local development.