If you have built a chat UI for a large language model in the last two years, you probably reached for RxJS, an OnPush component, an async pipe, and a BehaviorSubject per piece of state. It worked, but it was a lot of plumbing for what is fundamentally a very simple shape: one string that grows over time.
Angular Signals collapse that plumbing into a single primitive. And it turns out that streaming Gemini responses with Signals is one of the cleanest, most satisfying pieces of code you can write in modern Angular today.
In this tutorial we will build a working Google AI chat component, in roughly one hundred lines, that streams tokens from Gemini in real time, supports a stop button, and feels native on desktop and mobile. Then we will ship it safely on Cloud Run with a thin proxy, so you can drop a live, embedded demo into your post.
Why Signals are a perfect fit for streaming AI
A streaming LLM response is, mechanically, a sequence of small text deltas arriving over a fetch stream. Old-school Angular handled this with Subjects, async pipes, and a lot of trust that change detection would do the right thing.
Signals reframe the problem. A signal<string>('') is just a value that you call .update() on. Each update notifies only the views that read that signal, and Angular 20 with zoneless change detection skips the whole-tree dirty check entirely. That means you can call .update() thirty times a second from inside a for await loop and your UI will not break a sweat.
There is also a smaller, ergonomic win. With Signals the rendering rule is "whatever the signal is at this instant." Streaming chat is a value that is visibly mid-update, and Signals give you the perfect vocabulary for that — the in-flight token buffer is just another signal, alongside the committed message history.
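To make that concrete, here is the whole motion in isolation. This is a throwaway sketch, not part of the app we build below, and deltas stands in for whatever async iterable of text your model client hands you:

import { signal } from '@angular/core';

const reply = signal('');

async function consume(deltas: AsyncIterable<string>) {
  for await (const delta of deltas) {
    // Each update notifies only the views that read reply(); no tree-wide dirty check.
    reply.update((current) => current + delta);
  }
}

A template that binds {{ reply() }} simply rerenders as the buffer grows. That is the entire mental model we will scale up in this post.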
What we are building
A single-page Angular app with one component. You type a question, hit send, and watch Gemini's answer stream in word by word. There is a stop button that cancels the stream, a running history of messages, and that is it. We will use Angular 20 standalone components, Signals, the new control flow (@for, @if), and the official @google/genai SDK.
You can find the finished repo on GitHub at the link at the bottom of this post.
Prerequisites
You will need Node 20 or newer, the Angular CLI (npm i -g @angular/cli), and a Gemini API key from Google AI Studio. The free tier is more than enough to follow along.
A note on the API key, because this matters: in the local version we read the key from an environment variable that gets bundled into the client. That is fine for local exploration. It is not fine for production. Anything in your bundle is visible to anyone who opens DevTools. We will fix this in the deploy section by adding a small proxy on Cloud Run — the key stays on the server, and the Angular code barely changes.
Project setup
Spin up a new Angular project with the CLI:
ng new gemini-stream --standalone --routing=false --style=css --skip-tests
cd gemini-stream
npm i @google/genai
Open src/environments/environment.ts (create it if the CLI did not) and add your key:
export const environment = {
geminiApiKey: 'YOUR_AI_STUDIO_KEY_HERE',
};
Add the same file under environment.development.ts if you use a separate dev environment, and make sure .gitignore keeps these out of source control if you put a real key in.
In src/app/app.config.ts, opt into zoneless change detection. By Angular 20 this is a stable provider, and it gives you the per-signal update path that makes streaming feel snappy:
import { ApplicationConfig, provideZonelessChangeDetection } from '@angular/core';
export const appConfig: ApplicationConfig = {
providers: [provideZonelessChangeDetection()],
};
That is the entire setup. On to the interesting bits.
The Gemini service
Create src/app/gemini.service.ts. The job of this service is small: take a chat history, return an async iterable of text deltas, and let the caller stop early.
import { Injectable } from '@angular/core';
import { GoogleGenAI } from '@google/genai';
import { environment } from '../environments/environment';
export type ChatRole = 'user' | 'model';
export interface ChatMessage {
role: ChatRole;
content: string;
}
@Injectable({ providedIn: 'root' })
export class GeminiService {
private ai = new GoogleGenAI({ apiKey: environment.geminiApiKey });
async *stream(
history: ChatMessage[],
shouldStop: () => boolean = () => false,
): AsyncGenerator<string> {
const response = await this.ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: history.map((m) => ({
role: m.role,
parts: [{ text: m.content }],
})),
});
for await (const chunk of response) {
if (shouldStop()) return;
const text = chunk.text;
if (text) yield text;
}
}
}
Three things worth pointing out here.
First, generateContentStream returns an async iterable of chunks. Each chunk has a text getter that gives you the new tokens for that step. That is all the SDK asks of you.
Second, we accept a shouldStop predicate instead of an AbortController. This keeps cancellation logic on our side, where it composes nicely with Signals — the predicate is going to read a signal, and the moment the user clicks Stop, the next iteration of the loop bails out.
Third, the service yields strings, not chunks. By the time anything else in the app sees a delta, it is already plain text. That keeps our chat component free of any SDK-specific types.
Signals-based chat state
Now the chat component. Create src/app/chat.component.ts and start with the state. The whole point of this article is in this section, so read it slowly.
import {
ChangeDetectionStrategy,
Component,
computed,
effect,
inject,
signal,
viewChild,
ElementRef,
} from '@angular/core';
import { GeminiService, ChatMessage } from './gemini.service';
@Component({
selector: 'app-chat',
standalone: true,
changeDetection: ChangeDetectionStrategy.OnPush,
template: `<!-- coming up next -->`,
styles: [`/* coming up next */`],
})
export class ChatComponent {
private gemini = inject(GeminiService);
readonly messages = signal<ChatMessage[]>([]);
readonly draft = signal('');
readonly streaming = signal('');
readonly isStreaming = signal(false);
readonly stopRequested = signal(false);
readonly canSend = computed(
() => this.draft().trim().length > 0 && !this.isStreaming(),
);
private scroller = viewChild<ElementRef<HTMLDivElement>>('scroller');
constructor() {
effect(() => {
// Read the streaming buffer and message count to re-trigger on every update,
// then scroll to the bottom on the next animation frame.
this.streaming();
this.messages().length;
const el = this.scroller()?.nativeElement;
if (el) requestAnimationFrame(() => (el.scrollTop = el.scrollHeight));
});
}
async send() {
if (!this.canSend()) return;
const userMessage: ChatMessage = { role: 'user', content: this.draft().trim() };
this.messages.update((m) => [...m, userMessage]);
this.draft.set('');
this.streaming.set('');
this.isStreaming.set(true);
this.stopRequested.set(false);
try {
for await (const delta of this.gemini.stream(
this.messages(),
() => this.stopRequested(),
)) {
this.streaming.update((s) => s + delta);
}
} catch (err) {
this.streaming.update((s) => s + `\n\n_Error: ${(err as Error).message}_`);
} finally {
const final = this.streaming();
if (final) {
this.messages.update((m) => [...m, { role: 'model', content: final }]);
}
this.streaming.set('');
this.isStreaming.set(false);
}
}
stop() {
this.stopRequested.set(true);
}
}
Five signals carry the entire state of the chat. messages is the committed history. draft is what is in the textarea. streaming is the buffer for the in-flight assistant reply, separate from the history so we can render it differently. isStreaming and stopRequested are the control flags.
Notice that canSend is a computed. We never write to it, we never subscribe to it; we just read it from the template and Angular figures out when it changes. That single line replaces the form-validation observable boilerplate you might be used to.
The effect is doing the auto-scroll. By reading streaming() and messages().length inside the effect, we tell Angular: "rerun me whenever either of these changes." Then we scroll the chat container to the bottom on the next frame. This is the kind of small DOM concern that used to require AfterViewChecked and a flag; here it is six lines.
The send method is where streaming meets state. We push the user message, clear the buffer, then iterate over the service's async generator and call .update() on the streaming signal for each delta. When the loop ends (or the user hits Stop, which makes shouldStop return true on the next iteration), we commit whatever was in the buffer to the message history and reset.
The template
Replace the placeholder template and styles in the same file:
template: `
<div class="shell">
<div class="scroller" #scroller>
@for (m of messages(); track $index) {
<div class="msg {{ m.role }}">{{ m.content }}</div>
}
@if (isStreaming() && streaming()) {
<div class="msg model streaming">{{ streaming() }}<span class="cursor"></span></div>
}
</div>
<form class="composer" (submit)="$event.preventDefault(); send()">
<textarea
rows="2"
placeholder="Ask Gemini something..."
[value]="draft()"
(input)="draft.set($any($event.target).value)"
(keydown.enter)="$event.preventDefault(); send()"
></textarea>
@if (isStreaming()) {
<button type="button" (click)="stop()">Stop</button>
} @else {
<button type="submit" [disabled]="!canSend()">Send</button>
}
</form>
</div>
`,
styles: [`
.shell { display: flex; flex-direction: column; height: 100dvh; max-width: 720px; margin: 0 auto; font-family: system-ui, sans-serif; }
.scroller { flex: 1; overflow-y: auto; padding: 1rem; display: flex; flex-direction: column; gap: 0.75rem; }
.msg { padding: 0.75rem 1rem; border-radius: 12px; white-space: pre-wrap; line-height: 1.5; max-width: 85%; }
.msg.user { align-self: flex-end; background: #4285f4; color: white; }
.msg.model { align-self: flex-start; background: #f1f3f4; color: #202124; }
.cursor { display: inline-block; width: 0.5ch; background: currentColor; margin-left: 2px; animation: blink 1s steps(1) infinite; }
@keyframes blink { 50% { opacity: 0; } }
.composer { display: flex; gap: 0.5rem; padding: 1rem; border-top: 1px solid #eee; }
textarea { flex: 1; resize: none; padding: 0.75rem; border-radius: 12px; border: 1px solid #ddd; font: inherit; }
button { padding: 0 1.25rem; border-radius: 12px; border: none; background: #4285f4; color: white; font-weight: 600; cursor: pointer; }
button:disabled { opacity: 0.5; cursor: not-allowed; }
`]
The new control flow (@for, @if, @else) makes this template read like a small story: render every committed message, then render the in-flight reply if there is one, then show Send or Stop based on whether we are mid-stream. The blinking cursor on the streaming bubble is a tiny detail that makes the whole thing feel alive.
Wire the component into src/app/app.component.ts as the only thing rendered, run ng serve, and you should have a working streaming chat at http://localhost:4200.
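If you want something to copy, the root component is a one-liner. A minimal version, assuming the AppComponent the CLI generated for you, looks something like this:

import { Component } from '@angular/core';
import { ChatComponent } from './chat.component';

@Component({
  selector: 'app-root',
  standalone: true,
  imports: [ChatComponent],
  template: `<app-chat />`,
})
export class AppComponent {}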
Shipping it on Cloud Run
The local app calls Gemini directly with a key in the bundle. To ship it safely we need two small moves: a tiny server proxy that holds the key, and Cloud Run to host both the proxy and the static Angular build.
Install Express (npm i express, plus npm i -D @types/express for the types), then create server/index.ts at the project root:
import express from 'express';
import { GoogleGenAI } from '@google/genai';
const app = express();
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
app.use(express.json({ limit: '4mb' }));
app.use(express.static('dist/gemini-stream/browser'));
app.post('/api/stream', async (req, res) => {
res.setHeader('Content-Type', 'text/plain; charset=utf-8');
res.setHeader('Transfer-Encoding', 'chunked');
const stream = await ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: req.body.contents,
});
for await (const chunk of stream) {
if (chunk.text) res.write(chunk.text);
}
res.end();
});
app.listen(process.env.PORT || 8080);
Update gemini.service.ts to read from the proxy with fetch instead of calling the SDK in the browser. The SDK and the API key never leave the server:
async *stream(history: ChatMessage[], shouldStop = () => false) {
const res = await fetch('/api/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
contents: history.map((m) => ({ role: m.role, parts: [{ text: m.content }] })),
}),
});
const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
while (true) {
if (shouldStop()) { reader.cancel(); return; }
const { value, done } = await reader.read();
if (done) return;
if (value) yield value;
}
}
This is the part I love about the Signals architecture: the component code does not change at all. The signals do not care that the bytes are coming from a Cloud Run service now instead of the SDK. Same loop, same streaming.update() call.
Add a Dockerfile at the project root:
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && npx tsc -p server
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/server/dist ./server
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/package*.json ./
ENV NODE_ENV=production
CMD ["node", "server/index.js"]
Then ship it with one command — Cloud Run will build the container from source for you:
gcloud run deploy gemini-stream \
--source . \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars GEMINI_API_KEY=YOUR_AI_STUDIO_KEY
You will get back a URL like https://gemini-stream-xxxxxx.us-central1.run.app. Test it in the browser, confirm the chat works end to end, and you are done.
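If you want to sanity-check the proxy on its own, a plain curl against your service URL works too. Substitute your real URL; -N just disables curl's output buffering so you can watch the chunks arrive:

curl -N -X POST https://YOUR-SERVICE-URL/api/stream \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"role":"user","parts":[{"text":"Say hello in exactly five words"}]}]}'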
The fun part: dev.to has a first-class Cloud Run embed, so you can drop that URL into your post and readers get the live demo inline.
What you actually built
The whole thing — service, component, template, styles — comes in just over a hundred lines. Compare that to an equivalent app two years ago and you will notice what is missing: there is no Subject, no BehaviorSubject, no async pipe, no OnPush boilerplate that you have to think about, no manual subscription cleanup. Signals plus the new control flow plus zoneless change detection is genuinely a different programming model, and streaming AI is the application that shows it off best.
A couple of small things to try next, in roughly increasing order of effort:
Add a systemInstruction to the generateContentStream call to give your model a persona. In @google/genai it rides along in the config object next to contents on the proxy side; see the sketch after this list.
Switch from text-only input to multimodal: drop an image into the chat and forward it from the proxy as a parts entry of { inlineData: { mimeType, data } }. Gemini handles the rest.
Prefer Firebase to Cloud Run? Firebase AI Logic gives you the same proxy pattern with less infra — install firebase and @firebase/ai, and the SDK shape stays almost identical. You give up the dev.to Cloud Run embed, but the Angular code is unchanged.
Try the same UI against Chrome's Built-in AI (Gemini Nano running on-device, no key, no network). The Prompt API has its own streaming primitive that drops into the same Signal-based shell with almost no changes — and you get an offline-capable chat for free.
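To make the first of those concrete, here is roughly what a persona looks like on the proxy side. This is a sketch against the @google/genai request shape as I understand it, and the persona string is obviously yours to change:

const stream = await ai.models.generateContentStream({
  model: 'gemini-2.5-flash',
  contents: req.body.contents,
  config: {
    // Example persona; anything here shapes every reply in the conversation.
    systemInstruction: 'You are a concise assistant that answers in plain language.',
  },
});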
Wrap-up
If you take one thing away from this post, let it be that Signals were designed for values that change a lot, and an LLM stream is the canonical example of a value that changes a lot. The pieces fit so cleanly that the resulting code reads more like a description of the UI than like a program.
Repo: https://github.com/TomWebwalker/gemini-stream-angular
If you build something with this drop a link in the comments — I would love to see what people make of it.