This is a submission for the Google AI Agents Writing Challenge: Learning Reflections & Capstone Showcase
My Learning Journey
Five days ago, I submitted an AI Photography Coach, my multi-agent capstone project for the 5-Day AI Agents Intensive Course with Google, and building it fundamentally changed how I think about intelligent applications. This wasn't just another capstone project. It forced me to confront a question I'd been wrestling with for months: What separates a system that appears intelligent from one that genuinely solves problems?
Coming into the Google AI Agents Intensive, I understood agents conceptually. I'd read the papers. I'd tinkered with LLMs. But there was a critical gap between understanding and architecting—and this course obliterated that gap.
The breakthrough came on Day 2 when we deconstructed the difference between monolithic LLM calls and specialized agent systems. Most people use LLMs like Swiss Army knives—one model trying to do everything. The course showed me something radical: the power isn't in having one smart model; it's in having many focused ones working together.
Key Concepts & Technical Deep Dive
The ADK Native Orchestrator Pattern: Architecture That Actually Works
The game-changer for my photography coach was understanding the ADK-native orchestrator pattern. Instead of building a custom routing system, I leveraged Google's Agent Development Kit's built-in orchestration capabilities.
Here's the architecture that makes this work:
Core Agents (Shared):
- Vision Agent (Sub-Agent 1): Uses Gemini 2.5 Flash Vision for image analysis—EXIF extraction, composition analysis, defect detection with severity scoring, and strength identification
- Orchestrator Agent (Parent): The intelligent coordinator that manages session state, routes requests to specialized sub-agents, implements context compaction, and persists memory using SQLite + ADK Cloud Memory adapters
- Knowledge Agent (Sub-Agent 2): Powered by Gemini 2.5 Flash with hybrid CASCADE RAG for query understanding, knowledge retrieval, response generation, citation grounding, and skill-level adaptation
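To make this concrete, here's a minimal sketch of that parent/sub-agent wiring using ADK's LlmAgent. The model IDs, descriptions, and instruction strings are illustrative stand-ins, not the project's exact code:

```python
from google.adk.agents import LlmAgent

# Sub-Agent 1: image analysis only. Emits structured findings, never coaching.
vision_agent = LlmAgent(
    name="vision_agent",
    model="gemini-2.5-flash",  # illustrative model ID
    description="Analyzes photos: EXIF, composition, defects, strengths.",
    instruction=(
        "Analyze the supplied image and return structured fields: exif, "
        "composition_summary, detected_issues, strengths. Do NOT give advice."
    ),
)

# Sub-Agent 2: teaching only. Works from the Vision Agent's structured output.
knowledge_agent = LlmAgent(
    name="knowledge_agent",
    model="gemini-2.5-flash",
    description="Generates grounded, skill-adapted coaching with citations.",
    instruction=(
        "Given a structured image analysis and user context, produce a "
        "coaching response with citations. Do NOT re-analyze the image."
    ),
)

# Parent: ADK's LLM-driven delegation routes requests to sub-agents
# based on their descriptions, so no custom router is needed.
orchestrator = LlmAgent(
    name="photo_coach_orchestrator",
    model="gemini-2.5-flash",
    instruction="Coordinate analysis and coaching. Delegate; never analyze.",
    sub_agents=[vision_agent, knowledge_agent],
)
```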
Key Pattern: Orchestrator Mediates All Communication
This is critical: the orchestrator mediates all agent communication. The Vision Agent doesn't talk directly to the Knowledge Agent. Instead:
- Vision Agent outputs structured analysis (exif dict, composition_summary, detected_issues, strengths)
- Orchestrator aggregates this with session context
- Knowledge Agent receives unified input context and generates the coaching response
- Orchestrator updates conversation history and persists session state
This eliminates cascading errors and makes the entire system debuggable in ways direct agent-to-agent communication could never achieve. It's a pattern, not just a feature—and it's built into ADK natively.
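In plain Python, the mediation step looks roughly like this. The dataclass mirrors the structured fields listed above; build_knowledge_context is a hypothetical orchestrator helper, not ADK API:

```python
from dataclasses import dataclass, field

@dataclass
class VisionOutput:
    """Structured analysis the Vision Agent emits, never free-form prose."""
    exif: dict
    composition_summary: str
    detected_issues: list = field(default_factory=list)
    strengths: list = field(default_factory=list)

def build_knowledge_context(vision: VisionOutput, session: dict) -> dict:
    """Orchestrator step: merge vision output with session state into the
    single unified input the Knowledge Agent sees."""
    return {
        "analysis": {
            "exif": vision.exif,
            "composition": vision.composition_summary,
            "issues": vision.detected_issues,
            "strengths": vision.strengths,
        },
        "skill_level": session.get("skill_level", "beginner"),
        "recent_turns": session.get("history", [])[-5:],  # compacted history
    }
```

Because every hop passes through a function like this, a bad vision result surfaces at one inspectable seam instead of propagating silently.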
Three Infrastructure Approaches: One System, Multiple Deployments
What fascinated me was how the same core agents can run through three different interfaces. This distinction between agent architecture and deployment architecture was the second major revelation:
1. ADK Runner (Cloud)
- Components: LlmAgent, Runner, Sessions
- Interface: Vertex AI / Cloud Run
- When to use: Production-grade photo coaching with cloud scalability and managed infrastructure
2. MCP Server (Desktop)
- Components: JSON-RPC 2.0 over stdio transport
- Capabilities: 3 tools exposed per agent
- Deploy: Claude Desktop, local machine
- When to use: Local development, integration with Claude, running alongside other MCP-compatible tools
- This was the breakthrough for me: the MCP protocol meant I could integrate my agents with any compatible application without rewriting core logic
3. Python API (Custom)
- Components: Direct imports, function calls
- Deploy: Notebooks, custom apps, Streamlit dashboards
- When to use: Research, experimentation, embedded systems, educational contexts
The realization: agent architecture is orthogonal to deployment architecture. Design the agent system once (orchestrator + specialized agents), then expose it through whichever interface makes sense for your use case. This separation of concerns is elegant and powerful.
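As an illustration of interface #2, here's roughly what exposing one capability over MCP looks like with the official Python SDK's FastMCP. The tool and the photo_coach.core module are hypothetical wrappers around the same shared agents:

```python
from mcp.server.fastmcp import FastMCP

# FastMCP speaks JSON-RPC 2.0 over stdio by default, which is exactly
# what Claude Desktop expects from a local MCP server.
mcp = FastMCP("photo-coach")

@mcp.tool()
def analyze_photo(image_path: str) -> dict:
    """Run the Vision Agent on a local image; return its structured output."""
    # Hypothetical import: the same core logic used by the other interfaces.
    from photo_coach.core import run_vision_agent
    return run_vision_agent(image_path)

if __name__ == "__main__":
    mcp.run(transport="stdio")
```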
The Critical Insight: Negative Space Design
During debugging, I discovered something counterintuitive: the best agent isn't the one with the smartest prompts; it's the one with the clearest responsibilities.
I spent as much time defining what each agent should not do as defining what it should do:
- Vision Agent: Analyzes only what's in the image. Never generates teaching advice or pedagogical content.
- Knowledge Agent: Teaches based on provided analysis. Never re-analyzes images or duplicates vision work.
- Orchestrator: Routes and aggregates. Never generates original analysis or coaching—only synthesis.
This negative space design—drawing boundaries tighter than seemed necessary—eliminated entire categories of bugs. It forced each agent's responsibility to be so crystalline that context compaction became natural, error handling became obvious, and delegation logic became transparent.
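One way to make those boundaries executable rather than aspirational is a guard at the orchestrator seam. This validator is a hypothetical sketch of the idea, not the project's code:

```python
def validate_vision_output(output: dict) -> dict:
    """Enforce the Vision Agent's negative space: structured analysis only.
    Any field outside the contract is treated as a boundary violation."""
    allowed = {"exif", "composition_summary", "detected_issues", "strengths"}
    extra = set(output) - allowed
    if extra:
        raise ValueError(f"Vision Agent exceeded its boundary: {sorted(extra)}")
    return output
```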
Context Engineering and Memory as Foundation
The course's emphasis on context compaction changed how I architect systems. In a multi-agent ecosystem, context is a resource, not a convenience.
The photography coach uses a two-tier memory system:
- Session memory: Short-term context about current analysis and conversation
- User model: Long-term history of preferences, skill progression, learning patterns
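A minimal sketch of the two tiers in SQLite (schema and column names are illustrative):

```python
import sqlite3

def init_memory(db_path: str = "coach_memory.db") -> sqlite3.Connection:
    """Two tables mirroring the two memory tiers: short-lived session
    context and a long-lived per-user model."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS session_memory (
            session_id TEXT PRIMARY KEY,
            current_analysis TEXT,   -- JSON: latest vision output
            history TEXT             -- JSON: compacted recent turns
        );
        CREATE TABLE IF NOT EXISTS user_model (
            user_id TEXT PRIMARY KEY,
            skill_level TEXT,
            preferences TEXT,        -- JSON: genres, gear, goals
            progression TEXT         -- JSON: skills observed over time
        );
    """)
    return conn
```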
The orchestrator implements context compaction before passing context between agents:
- Summarizing vision analysis into structured fields (rather than raw model output)
- Truncating conversation history intelligently
- Maintaining only relevant user profile context
This isn't optimization; it's architectural necessity. With three agents and multiple turns, uncompressed context balloons quickly. Compaction forces rigor in what information actually matters for decision-making.
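In code, compaction can be a pure function that whitelists what crosses the boundary; everything else is dropped by construction. Field names here are illustrative:

```python
def compact_context(vision_output: dict, history: list[str],
                    user_profile: dict, max_turns: int = 5) -> dict:
    """Reduce inter-agent context to the fields that actually inform coaching."""
    return {
        # Structured fields only, never the raw model transcript.
        "issues": vision_output.get("detected_issues", []),
        "strengths": vision_output.get("strengths", []),
        # Keep only the last few conversation turns.
        "history": history[-max_turns:],
        # Only the profile fields the Knowledge Agent uses.
        "skill_level": user_profile.get("skill_level"),
        "goals": user_profile.get("goals"),
    }
```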
Tools: The Backbone of Agent Capability
The course reframed my entire thinking: agents aren't intelligent because of their prompts; they're intelligent because of their tools.
For the photography coach:
- Vision APIs: Constrain analysis to structured outputs
- Vector Database (CASCADE hybrid RAG): Guarantee knowledge comes from grounded sources
- Custom Tools: Photography-specific calculations (depth of field relationships, shutter speed ratios, focal length conversions)
- Memory Tools: SQLite adapters for persistence
Each tool is a constraint that prevents hallucination. When a Vision Agent can only output structured EXIF data and composition summaries, it can't invent. When the Knowledge Agent can only pull from photography principles via RAG, its advice has traceable citations. Tools aren't features you add; they're guardrails you build into the system's fabric.
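As an example of the custom-tool category, a depth-of-field helper can be an ordinary Python function built on the standard thin-lens formulas (ADK can wrap plain functions as tools). The name and defaults are illustrative:

```python
def depth_of_field(focal_mm: float, f_number: float,
                   subject_mm: float, coc_mm: float = 0.03) -> tuple:
    """Near/far limits of acceptable focus, thin-lens model.

    coc_mm defaults to the ~0.03 mm circle of confusion commonly quoted
    for full-frame sensors; all distances are in millimetres.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = (subject_mm * (hyperfocal - focal_mm)
            / (hyperfocal + subject_mm - 2 * focal_mm))
    if subject_mm >= hyperfocal:
        return near, float("inf")  # focus extends to infinity
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far

# 50 mm lens at f/8, subject at 3 m:
# depth_of_field(50, 8, 3000) -> approx (2338, 4185) mm, i.e. ~2.3 m to ~4.2 m
```

Handing the model a calculator like this means the optics numbers in a coaching response come from arithmetic, not token prediction.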
Reflections & Takeaways
What the Course Got Right
The hands-on codelabs genuinely built intuition. I didn't just read about multi-agent systems; I implemented them, broke them, debugged them, rebuilt them. The guest speakers—engineers shipping agentic AI at scale—grounded theory in production reality. Learning about the ADK's orchestrator pattern in isolation, then building it into a real system, created understanding that no lecture could achieve.
The emphasis on architecture as design constraint was transformative. Before this course, I thought about features and interfaces. Now I think about specialization, coordination, failure modes, and the boundaries between components.
Honest Critique
The course could dive deeper into failure modes in multi-agent systems. They fail in new ways: cascading errors compounding across agents, subtle bugs in delegation logic, context compaction artifacts that only emerge in production. A dedicated deep-dive would be invaluable.
More explicit guidance on choosing deployment interfaces would help practitioners. The fact that one agent system can work through ADK Runner, MCP Server, or custom Python API is powerful—but knowing when to use each requires hands-on experience or mentorship.
How This Changes What I Build Next
I'm now architecting systems fundamentally differently:
- Define agent specialization and boundaries first, before any code
- Treat the orchestrator pattern as primitive, not optional
- Make context compaction a first-class design concern
- Use tools to constrain behavior, not enhance capability
- Choose deployment interface after agent architecture is finalized, not before
The photography coach is just the beginning. The real power is understanding that intelligent systems are built through specialization and clear boundaries, not through smarter prompts or larger models. Architecture beats parameters every time.
The Bigger Picture
If you're considering the AI Agents Intensive: do it. But go in expecting it to change your architecture mindset, not just teach you new libraries.
The future of AI isn't smarter models—it's smarter systems. Systems that know their limitations, delegate to specialists, maintain clear boundaries, and communicate through structured protocols. Systems where architecture is a design tool, not an afterthought. That's what this course teaches. That's what matters now.
Technical Stack & Architecture
Core Agents (ADK Native):
- Vision Agent: Gemini 2.5 Flash Vision (image analysis, EXIF extraction, composition scoring, defect detection)
- Orchestrator Agent: Session management, context compaction, routing, memory persistence
- Knowledge Agent: Gemini 2.5 Flash + Hybrid CASCADE RAG (knowledge retrieval, citations, skill adaptation)
Memory & Persistence:
- SQLite for session state
- ADK Cloud Memory adapters
- Conversation history management
- User model tracking
Deployment Options:
- ADK Runner: Cloud/Vertex AI production deployment
- MCP Server: Desktop deployment with JSON-RPC 2.0 (Claude Desktop, local tools)
- Python API: Notebooks, Streamlit, custom applications
Integration Patterns:
- Orchestrator-mediated agent communication (no direct agent-to-agent)
- Structured context passing between agents
- RAG-grounded knowledge retrieval with citations
- Context compaction before inter-agent communication
Project Links:

