This is a submission for the "New Year, New You" Portfolio Challenge
About Me
I am a Head of AI working across frontier research and cross-sector environments, designing and deploying AI systems intended for real-world use rather than demonstration alone.
I am an engineer by training and inclination, with a strategic lens shaped by working at the intersection of advanced capability, organisational decision-making, and societal impact.
My work focuses on making AI behaviour legible: exposing reasoning, intent, and system boundaries so that decisions are not merely produced, but understood.
This portfolio reflects that philosophy. It is not a showcase of outcomes, but of thinking — how complex AI systems are designed, constrained, and made observable in practice.
Portfolio
This portfolio demonstrates how modern AI systems can be designed to be inspectable, interpretable, and production-ready.
True multi-agent coordination
Queries are handled by four specialised agents rather than a single generalist model. A lightweight Coordinator analyses intent and routes each request to the appropriate specialist, covering technical implementation, research and methodology, or professional context. This mirrors how enterprise AI systems are structured in practice.
Observable reasoning traces
Every interaction exposes how the system arrives at an answer:
- which agent handled the query
- how prior context influenced the response
- token usage and latency
- the decision logic behind routing and generation
Observability here is not a visual flourish. It is a debugging and audit surface, designed to make AI behaviour understandable and accountable.
Session memory
Conversations retain context across turns, enabling coherent, multi-step dialogue that evolves naturally over time rather than resetting on each request.
Production-grade architecture
The system is built with Next.js 14, TypeScript, and Tailwind CSS, powered by Gemini 2.0 Flash and deployed on Google Cloud Run with automatic scaling. Performance targets are treated as constraints, not afterthoughts, with sub-two-second response times and a consistently high Lighthouse score.
Try It Yourself
Ask technical, research, or background-oriented questions and observe how the system responds.
Pay attention to:
- the agent routing indicator
- the live reasoning trace
- how earlier context is incorporated
- performance metrics updating in real time
The goal is not just to receive an answer, but to see how the answer was produced.
How I Built It
This portfolio was designed and delivered in less than a week, from concept to production deployment, with observability and safety treated as first-class architectural constraints rather than add-ons.
Architecture Overview
At its core is a true multi-agent system, following enterprise AI design patterns rather than prompt orchestration. Four specialised agents operate under a lightweight coordination layer:
- a Coordinator agent performs intent analysis and routing
- Projects, Research, and Career agents each maintain a narrow, well-defined knowledge domain
The Coordinator receives the user query plus a summarised conversation context and returns a single agent identifier. This avoids the common anti-pattern of querying all agents in parallel, reducing latency, cost, and response inconsistency while improving semantic routing accuracy.
Routing is LLM-based rather than keyword-driven, allowing intent to be inferred from both language and conversational history. In practice, this yields ~95% routing accuracy with a negligible latency overhead (~200–300 ms).
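The routing step described above can be sketched in TypeScript. This is an illustrative sketch rather than the actual implementation: `AgentId`, `buildRoutingPrompt`, and `parseAgentId` are hypothetical names, and the model call itself is omitted. The essential pattern is that the Coordinator's reply is validated against a closed set of identifiers, with a safe fallback if the model returns anything unexpected.

```typescript
// Hypothetical sketch of the Coordinator's routing step. The agent
// identifiers are assumed from the Projects / Research / Career split
// described in the post; the actual names may differ.

type AgentId = "projects" | "research" | "career";

const AGENT_IDS: AgentId[] = ["projects", "research", "career"];

// Build the routing prompt from the user query plus a summarised
// conversation context, asking for exactly one agent identifier back.
function buildRoutingPrompt(query: string, contextSummary: string): string {
  return [
    "You are a routing coordinator. Reply with exactly one agent id:",
    AGENT_IDS.join(", "),
    `Conversation so far: ${contextSummary}`,
    `User query: ${query}`,
  ].join("\n");
}

// Validate the model's raw reply against the known identifiers; fall
// back to a default agent rather than failing on malformed output.
function parseAgentId(raw: string, fallback: AgentId = "projects"): AgentId {
  const candidate = raw.trim().toLowerCase() as AgentId;
  return AGENT_IDS.includes(candidate) ? candidate : fallback;
}
```

Constraining the reply to a single identifier, and validating it before use, is what keeps LLM-based routing cheap and predictable compared with querying every agent in parallel.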
Session Memory & Context Management
Because Gemini APIs are stateless, I implemented an in-memory session management layer to support coherent multi-turn dialogue.
Each session maintains:
- full conversational history
- the last active agent
- inferred user intent and topic context
To prevent unbounded token growth, only the most recent exchanges are passed verbatim into prompts. Older interactions are compressed into a structured summary of previously discussed topics and active context. This hybrid strategy reduces prompt size by ~60–70% in long conversations while preserving response quality and reducing latency.
The current implementation uses in-memory storage for speed (<1 ms lookup) and simplicity; the architecture is intentionally compatible with a Redis-backed persistence layer for scale.
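The hybrid context strategy can be sketched as follows. This is a simplified illustration, not the real code: `summarise` is a placeholder standing in for the LLM-backed compression step, and the session shape is assumed from the fields listed above.

```typescript
// Sketch of the hybrid memory strategy: recent turns are kept verbatim,
// older turns are collapsed into a compact summary for the prompt.

interface Turn { role: "user" | "assistant"; text: string }

interface Session {
  history: Turn[];
  lastAgent?: string; // the last active agent, per the session fields above
}

// In-memory store; the Map could be swapped for a Redis-backed client
// with the same key/value shape when persistence is needed.
const sessions = new Map<string, Session>();

// Placeholder for the model-backed summariser: a real implementation
// would ask the LLM to compress these turns into topic context.
function summarise(older: Turn[]): string {
  const topics = older
    .filter(t => t.role === "user")
    .map(t => t.text.slice(0, 40));
  return `Earlier discussion (${older.length} turns): ${topics.join("; ")}`;
}

// Split history into a verbatim window plus a summary of everything older.
function buildPromptContext(
  session: Session,
  window = 4,
): { summary: string; recent: Turn[] } {
  const recent = session.history.slice(-window);
  const older = session.history.slice(0, -window);
  return { summary: older.length ? summarise(older) : "", recent };
}
```

The window size and summary format here are illustrative; the trade-off they encode (bounded prompt size versus fidelity of older context) is the one described above.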
Observable Reasoning as a System Primitive
Rather than treating reasoning traces as a UI feature, the system emits a structured reasoning trace alongside every response.
Each trace records:
- selected agent and routing rationale
- context summary used in the prompt
- token usage and response latency
- model version and timestamp
These traces are rendered in a dedicated side panel with progressive disclosure: high-level signals are always visible, while full prompts and metadata are expandable. This makes AI behaviour inspectable without overwhelming the user.
In practice, observability became a core development tool. Debugging, prompt iteration, and performance tuning were significantly faster because routing decisions and context usage were immediately visible.
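The trace fields listed above can be captured in a single structure emitted with each response. A hedged sketch follows; the field names are illustrative rather than the actual schema.

```typescript
// Illustrative shape of the structured reasoning trace; field names are
// assumptions based on the categories described in the post.

interface ReasoningTrace {
  agent: string;            // which specialist handled the query
  routingRationale: string; // why the Coordinator selected it
  contextSummary: string;   // compressed history injected into the prompt
  tokensIn: number;
  tokensOut: number;
  latencyMs: number;
  model: string;
  timestamp: string;        // ISO 8601
}

// Stamp a trace at emission time so every response carries its metadata.
function makeTrace(partial: Omit<ReasoningTrace, "timestamp">): ReasoningTrace {
  return { ...partial, timestamp: new Date().toISOString() };
}
```

Keeping the trace as a typed structure, rather than ad hoc log lines, is what lets the UI apply progressive disclosure: always-visible signals come from a few fields, and the rest expands on demand.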
Model & Tooling Choices
All agents are powered by Gemini 2.0 Flash, selected for its balance of latency, reasoning quality, and structured output reliability. Streaming responses are used to keep perceived latency low, while retries and fallbacks ensure graceful degradation.
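The retry-and-fallback behaviour can be sketched as a generic wrapper around the model call. `withRetries` is a hypothetical helper, not the actual code, and the backoff constants are illustrative; the Gemini call itself is abstracted behind the `call` parameter.

```typescript
// Sketch of graceful degradation around a model call: retry with
// exponential backoff, then fall back to a static response rather
// than surfacing an error to the user.

async function withRetries<T>(
  call: () => Promise<T>,   // e.g. the streaming Gemini request
  fallback: () => T,        // degraded response when all retries fail
  attempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await call();
    } catch {
      // Exponential backoff before the next attempt.
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  return fallback(); // graceful degradation once retries are exhausted
}
```

Pairing this with streamed output means the user sees tokens as soon as the first successful attempt begins, while transient API failures stay invisible.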
Development was accelerated using Google Antigravity, which materially shifted effort away from boilerplate and towards architecture, prompt design, and system boundaries. The result was a measurable reduction in development time (≈40–50%) and cleaner, more consistent patterns across the codebase.
Deployment & Performance
The application is deployed on Google Cloud Run using a Next.js standalone build. This enables automatic scaling from zero to ten instances, HTTPS by default, and cost-efficient operation within the free tier.
Cold starts were reduced to ~2–3 seconds through container and bundle optimisation, while warm requests complete in under 100 ms excluding model inference. End-to-end response times typically fall in the 1–3 second range.
Key Engineering Takeaways
- Coordination matters more than agent count: multi-agent systems require explicit routing authority.
- Observability pays for itself: traces built for safety became indispensable for debugging and optimisation.
- Summarisation beats brute force context: intelligent memory handling outperforms full-history prompts.
The result is not just a portfolio of projects, but a working demonstration of modern AI systems engineering: observable, coordinated, and designed to operate safely under real-world constraints.
The portfolio can be viewed on the website.
Why Evaluating AI Systems Matters
Building AI systems is only half the work; rigorous evaluation of outputs, internal processes, and safety properties is what determines whether those systems can be trusted, governed, and deployed responsibly at scale.
The portfolio currently evaluates its multi-agent system through observable reasoning traces and real-time performance metrics. Planned enhancements include LLM-as-a-judge automated quality scoring, user feedback collection, and cross-agent verification, aimed at systematically improving response accuracy and user satisfaction.
The cover image was created using Google AI's Flow. Gemini 3 Pro, Gemini 3 Flash, and Google Antigravity were used for development.