A deep-dive into building a production-ready, multi-agent AI chat application using Angular 20, Firebase Cloud Functions, and Google Genkit — powered by Gemini 2.5 Flash.
A single LLM can answer questions, but real-world complexity usually requires specialized roles. For example, in a typical company setting, you wouldn't ask your backend engineer to design an app's UI, and you wouldn't ask the designer to optimize database queries. Similarly, we can create specialized AI agents that focus on one task and coordinate with each other to solve complex problems.
Overview
For this project, we have Concierge AI, a conversational travel assistant that helps users plan day trips, discover restaurants, find weekend events, and navigate routes, all through a single chat interface.
What makes it interesting from an engineering perspective is the multi-agent architecture under the hood. Rather than one monolithic AI prompt trying to do everything, the system uses a root orchestrator agent that intelligently delegates to four specialist sub-agents, each with its own system prompt and grounding via Google Search.
The frontend is built with modern Angular 20, using signals, OnPush change detection, and SSR, while the backend runs on Firebase Cloud Functions with Google Genkit orchestrating the AI pipeline. The Large Language Model (LLM) powering it is Gemini 2.5 Flash.
Architecture
The Multi-Agent Pattern
The system is built around a hierarchical multi-agent pattern:
                   User Message
                         │
                         ▼
        ┌─────────────────────────────────┐
        │         Concierge Agent         │ ← Orchestrator
        │      (Root agent / router)      │
        └────────────────┬────────────────┘
                         │ delegates via tool calls
     ┌───────────┬───────┴────────┬───────────────┐
     ▼           ▼                ▼               ▼
┌─────────┐ ┌──────────┐ ┌──────────────┐ ┌────────────────┐
│ Day Trip│ │  Foodie  │ │Weekend Guide │ │Find & Navigate │
│  Agent  │ │  Agent   │ │    Agent     │ │     Agent      │
└─────────┘ └──────────┘ └──────────────┘ └────────────────┘
Each sub-agent is a Genkit tool: a typed, callable function that the orchestrator can invoke when it determines the user's request falls within that agent's domain.
Why Multi-Agent?
A single monolithic prompt becomes difficult to manage as capabilities grow. The multi-agent approach gives you:
- Separation of concerns: each agent has a focused system prompt
- Independent grounding: sub-agents can use Google Search; the orchestrator doesn't need to
- Composability: agents can be added, removed, or swapped independently
- Debuggability: you can test each agent in isolation via the Genkit Dev UI
- Evaluation & observability: Genkit's built-in tracing and the Firebase Genkit plugin's telemetry (enableFirebaseTelemetry()) let you monitor agent behaviour, latency, and cost in production
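Enabling that telemetry is a one-time call at module level. A minimal sketch, assuming the @genkit-ai/firebase plugin is installed alongside the Genkit setup shown below:

```typescript
import { enableFirebaseTelemetry } from '@genkit-ai/firebase';

// Exports traces, metrics, and logs for every flow and tool invocation
// to Google Cloud Observability. Call once when the module loads,
// alongside the genkit() initialisation.
enableFirebaseTelemetry();
```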
Backend: Firebase Functions + Genkit
Configuring Genkit
Genkit is initialised once at the module level. The Google AI plugin is configured with the API key, which is stored as a Firebase Secret and injected at runtime:
import { defineSecret } from 'firebase-functions/params';
import { googleAI } from '@genkit-ai/google-genai';
import { genkit } from 'genkit';

const GEMINI_API_KEY = defineSecret('GEMINI_API_KEY');

const ai = genkit({
  plugins: [googleAI({ apiKey: process.env.GEMINI_API_KEY })],
  model: googleAI.model('gemini-2.5-flash'),
});
Note: defineSecret registers the secret with Firebase so it's automatically injected into the function's environment at runtime. You never hard-code the key.
Defining Sub-Agent Tools
Each specialist agent is defined using ai.defineTool. The tool has a typed input/output schema (via Zod) and its own system prompt. Crucially, it also accepts the conversation history so it has full context:
export const _dayTripAgentToolLogic = ai.defineTool(
  {
    name: 'dayTripAgentTool',
    description: 'Assists with planning day trips',
    inputSchema: z.object({
      input: z.string(),
      history: z.array(conversationMessageSchema).optional(),
    }),
    outputSchema: z.string(),
  },
  async ({ input, history }) => {
    const response = await ai.generate({
      system: DAY_TRIP_AGENT_PROMPT,
      messages: [
        ...toGenkitMessages(history ?? []),
        { role: 'user', content: [{ text: input }] },
      ],
      config: {
        googleSearchRetrieval: {}, // ← live web search grounding
      },
    });
    if (!response.text) throw new Error('No output from AI');
    return response.text;
  }
);
The description field is critical; it's what the orchestrator's LLM reads to decide whether to invoke this tool.
The Orchestrator Agent
The root agent is a Genkit flow (not a tool). It receives the user's message and history, then calls ai.generate with all four sub-agent tools registered. The LLM decides autonomously which tool(s) to call:
export const _conciergeAgentLogic = ai.defineFlow(
  {
    name: 'conciergeAgentFlow',
    inputSchema: z.object({
      input: z.string(),
      history: z.array(conversationMessageSchema).optional(),
    }),
    outputSchema: z.string(),
  },
  async ({ input, history }) => {
    const response = await ai.generate({
      system: CONCIERGE_AGENT_PROMPT,
      messages: [
        ...toGenkitMessages(history ?? []),
        { role: 'user', content: [{ text: input }] },
      ],
      tools: [
        _dayTripAgentToolLogic,
        _foodieAgentToolLogic,
        _weekendGuideAgentToolLogic,
        _findAndNavigateAgentToolLogic,
      ],
    });
    const result = response.text || response.output;
    if (!result) throw new Error('No output from AI');
    return result;
  }
);
Key insight: The orchestrator does not use googleSearchRetrieval; only the sub-agents do. The orchestrator's job is routing and synthesis, not raw research.
Exposing the Flow as a Callable Function
onCallGenkit wraps the Genkit flow as a Firebase Callable Function, handling authentication, CORS, and secret injection automatically:
// FUNCTIONS_EMULATOR is set automatically by the Firebase emulator suite
const isEmulated = process.env.FUNCTIONS_EMULATOR === 'true';

const GENKIT_FUNCTION_CONFIG = {
  secrets: [GEMINI_API_KEY],
  region: 'africa-south1',
  cors: isEmulated
    ? true
    : [
        'http://localhost:4200',
        /^https:\/\/agents-concierge(--[a-z0-9-]+)?\.web\.app$/,
      ],
};

export const conciergeAgentFlow = onCallGenkit(
  GENKIT_FUNCTION_CONFIG,
  _conciergeAgentLogic
);
The regex in the cors array allows both the production domain and Firebase preview channel URLs (e.g. agents-concierge--pr-42-abc123.web.app).
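The pattern is easy to sanity-check in isolation. A standalone sketch, independent of the Firebase config, using the same regular expression:

```typescript
// Same pattern as in the cors array: the production origin plus
// Firebase Hosting preview-channel origins (agents-concierge--<channel>.web.app).
const allowedOrigin = /^https:\/\/agents-concierge(--[a-z0-9-]+)?\.web\.app$/;

console.log(allowedOrigin.test('https://agents-concierge.web.app'));               // true
console.log(allowedOrigin.test('https://agents-concierge--pr-42-abc123.web.app')); // true
console.log(allowedOrigin.test('https://agents-concierge.web.app.evil.com'));      // false — anchored by $
```

The `^` and `$` anchors matter: without them, an attacker-controlled domain that merely *contains* the project hostname would pass the CORS check.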
Conversation Context: Keeping the AI Stateful
The Problem with Stateless Functions
Every invocation of a Firebase Cloud Function is stateless. Without extra work, the AI has no memory of previous turns:
Turn 1 — User: "Plan a day trip"
AI: "Sure! What city are you in?"
Turn 2 — User: "Cape Town"
AI: "I'm not sure what you're referring to." ← no context!
Client-Managed History
Rather than storing history server-side (e.g. in Firestore), the Angular client maintains the conversation history in memory and sends it with every request. The backend stays stateless and horizontally scalable.
User types message
        │
        ▼
ChatComponent snapshots current history
        │
        ▼
AiService.sendMessage(query, historySnapshot)
        │
        ▼
Firebase Function: conciergeAgentFlow({ input, history })
        │
        ▼
ai.generate({ messages: [...history, currentUserMessage], tools })
        │
        ▼
Sub-agent tool receives { input, history } → same context
        │
        ▼
Response returned → client appends both turns to history
The shared message schema (defined with Zod on the backend, mirrored as a TypeScript interface on the frontend) ensures both sides speak the same language:
// Backend (Zod schema)
const conversationMessageSchema = z.object({
  role: z.enum(['user', 'model']),
  content: z.string(),
});

// Frontend (TypeScript interface — chat.model.ts)
export interface ConversationMessage {
  role: 'user' | 'model';
  content: string;
}
A small helper on the backend converts this flat format into Genkit's MessageData structure:
function toGenkitMessages(history: ConversationMessage[]) {
  return history.map((msg) => ({
    role: msg.role as 'user' | 'model',
    content: [{ text: msg.content }],
  }));
}
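For example, a two-turn history converts like this (a self-contained sketch with the helper and interface inlined):

```typescript
interface ConversationMessage {
  role: 'user' | 'model';
  content: string;
}

// Flat client format → Genkit's MessageData shape:
// the content string becomes a one-element parts array.
function toGenkitMessages(history: ConversationMessage[]) {
  return history.map((msg) => ({
    role: msg.role,
    content: [{ text: msg.content }],
  }));
}

const history: ConversationMessage[] = [
  { role: 'user', content: 'Plan a day trip' },
  { role: 'model', content: 'Sure! What city are you in?' },
];

console.log(JSON.stringify(toGenkitMessages(history)[0]));
// → {"role":"user","content":[{"text":"Plan a day trip"}]}
```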
The History Snapshot Pattern
There's a subtle but important detail in how the component sends history. The snapshot must be taken before appending the new user turn, otherwise the current message appears twice in the context:
sendMessage(): void {
  const query = this.queryControl.value.trim();

  // ✅ Snapshot BEFORE appending the new user turn
  const historySnapshot = this.conversationHistory();

  // Now append the new user turn to local history
  this.conversationHistory.update((h) => [...h, { role: 'user', content: query }]);

  this.aiService.sendMessage(query, historySnapshot).subscribe({
    next: (response) => {
      // Append AI turn after receiving the response
      this.conversationHistory.update((h) => [
        ...h,
        { role: 'model', content: response.data },
      ]);
    },
  });
}
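The failure mode is easy to reproduce outside Angular. A minimal sketch with a plain array standing in for the signal:

```typescript
// Why the snapshot must precede the append: if history were read AFTER
// the new user turn is pushed, the current message would appear twice
// in the context sent to the backend (once in history, once as input).
type Turn = { role: 'user' | 'model'; content: string };

let history: Turn[] = [{ role: 'user', content: 'Plan a day trip' }];
const query = 'Cape Town';

const snapshot = [...history];                              // ✅ taken before append
history = [...history, { role: 'user', content: query }];   // local UI state

// Payload sent to the backend: one prior turn, plus query as the new input
console.log(snapshot.length); // → 1
console.log(history.length);  // → 2
```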
Frontend: Angular 20+
The AI Service
The service is a thin wrapper around the Firebase Callable Function. It uses inject() for dependency injection and returns an Observable so the component can use RxJS operators:
@Injectable({ providedIn: 'root' })
export class AiService {
  private readonly functions = inject(Functions);

  sendMessage(
    query: string,
    history: ConversationMessage[] = []
  ): Observable<{ data: string }> {
    const conciergeAgentFlow = httpsCallable<
      { input: string; history: ConversationMessage[] },
      string
    >(this.functions, 'conciergeAgentFlow');
    return from(conciergeAgentFlow({ input: query, history }));
  }
}
The Chat Component
The component uses ChangeDetectionStrategy.OnPush, so Angular only re-renders when signal values change, keeping the UI performant even with a long message list:
@Component({
  selector: 'app-chat',
  templateUrl: './chat.component.html',
  styleUrls: ['./chat.component.scss'],
  imports: [ReactiveFormsModule],
  changeDetection: ChangeDetectionStrategy.OnPush,
})
export class ChatComponent implements AfterViewChecked {
  private readonly aiService = inject(AiService);
  private readonly sanitizer = inject(DomSanitizer);

  messages = signal<Message[]>([]);
  conversationHistory = signal<ConversationMessage[]>([]);
  isLoading = signal(false);

  queryControl = new FormControl('', {
    nonNullable: true,
    validators: [Validators.required],
  });

  // ...
}
Signals for State Management
All mutable state is held in signals. The isLoading signal is managed with RxJS's finalize() operator, which fires whether the observable completes successfully or errors, thus ensuring the loading state is always cleaned up:
this.isLoading.set(true);

this.aiService
  .sendMessage(query, historySnapshot)
  .pipe(
    finalize(() => {
      this.isLoading.set(false); // ← always runs, even on error
      this.shouldScrollToBottom = true;
    })
  )
  .subscribe({
    next: (response) => {
      this.messages.update((msgs) => [
        ...msgs,
        {
          text: response.data,
          formattedText: this.formatMarkdown(response.data),
          sender: 'ai',
          timestamp: new Date(),
        },
      ]);
      this.conversationHistory.update((h) => [
        ...h,
        { role: 'model', content: response.data },
      ]);
    },
    error: (err) => {
      // append error message to chat
    },
  });
Rendering AI Markdown Responses
Gemini returns responses in Markdown. A custom MarkdownUtils class converts the most common Markdown constructs to HTML. Critically, the raw text is HTML-escaped first to prevent XSS, and then Angular's DomSanitizer.bypassSecurityTrustHtml is used only after that sanitisation step:
export class MarkdownUtils {
  static formatMarkdown(text: string): string {
    if (!text) return '';

    // 1. Escape HTML first to prevent XSS
    let escaped = text
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#39;');

    // 2. Then apply Markdown → HTML transformations
    // (headers, bold, italic, lists, paragraphs...)
    return result;
  }
}

// In the component:
private formatMarkdown(text: string): SafeHtml {
  const html = MarkdownUtils.formatMarkdown(text);
  return this.sanitizer.bypassSecurityTrustHtml(html);
}
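The elided transformation step can be illustrated with a couple of regex rules. This is an illustrative sketch only, not the project's full implementation; it handles just bold and inline code, applied to already-escaped text:

```typescript
// Illustrative only: two of the simpler Markdown → HTML rules.
// The real MarkdownUtils also handles headers, italics, lists, and paragraphs.
function miniMarkdown(escaped: string): string {
  return escaped
    .replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>') // **bold** (non-greedy)
    .replace(/`([^`]+)`/g, '<code>$1</code>');        // `inline code`
}

console.log(miniMarkdown('Try the **tasting menu** at `La Colombe`'));
// → Try the <strong>tasting menu</strong> at <code>La Colombe</code>
```

The order matters in the full implementation too: escaping must always run before these rules, so that the only HTML in the output is HTML the converter itself produced.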
The template then binds the SafeHtml value using [innerHTML]:
@if (message.formattedText) {
  <div class="message-text" [innerHTML]="message.formattedText"></div>
} @else {
  <div class="message-text">{{ message.text }}</div>
}
Google Search Grounding
Sub-agents are configured with googleSearchRetrieval: {} in their ai.generate config. This tells Gemini to perform a live Google Search and use the results as grounding context before generating its response, ensuring answers about restaurants, events, and routes are based on current, real-world data rather than the model's training data alone.
config: {
  googleSearchRetrieval: {},
}
Why only on sub-agents?
The orchestrator's role is to understand intent and route to the right specialist. It doesn't need live search data to do that. Enabling search grounding on the orchestrator would add latency and cost without benefit. The sub-agents are where domain-specific, up-to-date information matters.
Project Structure
concierge/
├── src/
│ └── app/
│ ├── components/
│ │ └── chat/ # Main chat UI component
│ │ ├── chat.component.ts
│ │ ├── chat.component.html
│ │ └── chat.component.scss
│ ├── services/
│ │ └── core/
│ │ └── ai/
│ │ └── ai.service.ts # Firebase callable wrapper
│ ├── models/
│ │ └── chat.model.ts # ConversationMessage, Message interfaces
│ ├── utils/
│ │ └── markdown-utils.ts # Markdown → HTML renderer
│ ├── app.config.ts # Angular app config (Firebase, SSR)
│ └── app.routes.ts # Route definitions
├── functions/
│ └── src/
│ ├── index.ts # All Genkit agents & Firebase Function export
│ └── system-prompt.ts # System prompts for each agent
├── styles/
│ ├── _variables.scss # Design tokens
│ ├── _typography.scss
│ ├── _buttons.scss
│ ├── _forms.scss
│ ├── _animations.scss
│ └── _utilities.scss
├── documentation/
│ └── conversation-context.md # Deep-dive on history management
└── firebase.json # Firebase Hosting + Functions config
Conclusion
Building Concierge AI was an exercise in composing modern tools — Angular 20, Firebase, and Google Genkit — into a coherent, production-ready architecture. Each layer of the stack was chosen deliberately, and the patterns that emerged from that process are worth carrying into future projects.
What We Built
A fully functional, multi-agent AI chat application where:
- A root orchestrator understands user intent and delegates to the right specialist.
- Four sub-agents each bring focused expertise, grounded in real-time Google Search results.
- The Angular frontend manages conversation state locally using signals, keeping the backend stateless and scalable.
- The entire pipeline is secured, deployed, and served through Firebase.
Check out the project, Concierge AI, here: https://agents-concierge.web.app/
GitHub Repo: https://github.com/waynegakuo/concierge