Building ChatGPT-like streaming experiences with Server-Sent Events
Introduction
In the world of AI applications, user experience is everything. When users interact with AI agents, they expect immediate, real-time feedback—not a loading spinner followed by a wall of text. This is where streaming becomes crucial.
In this comprehensive guide, we'll explore how to build production-ready, real-time AI agent streaming using HazelJS, complete with:
- ✅ True word-by-word token streaming (like ChatGPT)
- ✅ Server-Sent Events (SSE) for web applications
- ✅ Plain text streaming for terminal/CLI usage
- ✅ Beautiful web UI with real-time metrics
- ✅ Production-ready error handling
By the end of this post, you'll have a fully functional streaming AI agent that delivers tokens in real-time, providing an exceptional user experience.
Table of Contents
- Why Streaming Matters
- The Challenge: Async Generator Buffering
- The Solution: Server-Sent Events
- Implementation Overview
- Building the Streaming Backend
- Creating the Web UI
- Terminal Streaming with curl
- Performance & Metrics
- Production Considerations
- Conclusion
Why Streaming Matters
The User Experience Problem
Traditional AI interactions follow this pattern:
User: "Explain TypeScript 5.0 features"
[10 seconds of loading...]
AI: [Entire 500-word response appears at once]
This creates several UX issues:
- Perceived latency: Users wait with no feedback
- Anxiety: Is it working? Did it crash?
- Poor engagement: No sense of "thinking" or progress
The Streaming Solution
With streaming, the experience transforms:
User: "Explain TypeScript 5.0 features"
AI: TypeScript 5.0 introduces several significant features...
[tokens appear word-by-word in real-time]
...including decorators, the satisfies operator...
[continues streaming naturally]
Benefits:
- ⚡ Instant feedback: First token appears in ~100-200ms
- 🎯 Engagement: Users see the AI "thinking" and responding
- 📊 Transparency: Real-time metrics show progress
- 🚀 Perceived speed: Feels 10x faster even with same total time
The Challenge: Async Generator Buffering
The Hidden Buffering Problem
When building streaming in Node.js, you might naturally reach for async generators:
async function* streamTokens() {
// create() resolves to an async-iterable stream - await it before iterating
const stream = await openai.chat.completions.create({
stream: true,
// ...
});
for await (const chunk of stream) {
yield chunk.choices[0]?.delta?.content || '';
}
}
// Usage inside an HTTP handler
for await (const token of streamTokens()) {
res.write(token); // ❌ The client may still see everything at once!
}
The Problem: the for await...of loop itself does not buffer—each chunk is handled as soon as its promise resolves. The buffering happens in the HTTP response path: compression middleware, reverse proxies, and framework responses that only flush on completion can hold every chunk back until the stream finishes.
What Actually Happens
Expected: [0.1s] "Type" [0.2s] "Script" [0.3s] " is"...
Reality: [10.5s] "TypeScript is a programming language..."
All 500+ tokens arrive in one batch after the LLM finishes, defeating the purpose of streaming.
The Solution: Server-Sent Events
Why SSE?
Server-Sent Events (SSE) is a web standard that enables servers to push real-time updates to clients over HTTP. Because each event is written and flushed as soon as it becomes available, tokens reach the client immediately instead of accumulating in a buffer.
Key advantages:
- 🔥 True real-time: No buffering, tokens arrive instantly
- 🌐 Web standard: Built into browsers via the EventSource API
- 🔌 Simple protocol: Just HTTP with a text/event-stream content type
- 🔄 Automatic reconnection: Browsers handle connection drops
- 📡 One-way communication: Perfect for streaming responses
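The wire format itself is worth seeing once. A tiny helper (our own illustrative sketch, not part of HazelJS) shows how an event is framed: one or more field: value lines, terminated by a blank line:

```typescript
// Illustrative helper: frame a JSON payload as one SSE event.
// The protocol is plain text: "field: value" lines terminated by a
// blank line. Lines starting with ":" are comments (handy for
// heartbeats) and are ignored by EventSource.
function formatSSE(payload: unknown, eventName?: string): string {
  const lines: string[] = [];
  if (eventName) lines.push(`event: ${eventName}`);
  lines.push(`data: ${JSON.stringify(payload)}`);
  return lines.join('\n') + '\n\n';
}

// formatSSE({ type: 'token', content: 'Hi' })
// → 'data: {"type":"token","content":"Hi"}\n\n'
```

Everything the server sends in this post is just a sequence of these frames over a single HTTP response.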
SSE vs WebSockets
| Feature | SSE | WebSockets |
|---|---|---|
| Direction | Server → Client | Bidirectional |
| Protocol | HTTP | WebSocket (ws://), via HTTP upgrade |
| Reconnection | Automatic | Manual |
| Browser Support | Excellent | Excellent |
| Complexity | Simple | More complex |
| Use Case | Streaming responses | Real-time chat |
For AI agent streaming, SSE is the perfect fit.
Implementation Overview
Our streaming solution consists of three main components:
┌─────────────────────────────────────────────────────────┐
│ HazelJS Backend │
│ ┌──────────────────────────────────────────────────┐ │
│ │ StreamingController │ │
│ │ • SSE endpoint: /api/stream │ │
│ │ • Format: JSON (web) or plain text (terminal) │ │
│ │ • Real-time token delivery │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
↓
SSE Connection
↓
┌─────────────────────────────────────────────────────────┐
│ Frontend Options │
│ ┌──────────────────┐ ┌────────────────────┐ │
│ │ Web Browser │ │ Terminal/curl │ │
│ │ EventSource API │ │ Plain text │ │
│ │ Beautiful UI │ │ Clean output │ │
│ └──────────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────────┘
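Before diving into the HazelJS pieces, here is the same idea stripped down to bare Node.js, to show what the framework layer wraps. The handler and its hard-coded token list are purely illustrative:

```typescript
import type { IncomingMessage, ServerResponse } from 'node:http';

// Framework-free sketch of an SSE endpoint: set the streaming headers
// once, then write one complete SSE frame per token.
function handleStream(_req: IncomingMessage, res: ServerResponse): void {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  for (const token of ['Type', 'Script', ' is', ' great']) {
    // Each write() is one complete SSE frame, sent as it is written.
    res.write(`data: ${JSON.stringify({ type: 'token', content: token })}\n\n`);
  }
  res.write('data: {"type":"complete"}\n\n');
  res.end();
}

// Wiring it up (not run here):
// import { createServer } from 'node:http';
// createServer(handleStream).listen(3000);
```

HazelJS's `sse()` method, shown next, packages exactly this header-then-write pattern behind a small interface.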
Building the Streaming Backend
Step 1: Add SSE Support to HazelJS Core
First, we enhanced @hazeljs/core with native SSE support by adding an sse() method to the HazelHttpResponse class:
// packages/core/src/hazel-response.ts
export class HazelHttpResponse implements HazelResponse {
private isStreaming: boolean = false;
private headersSent: boolean = false;
private customHeaders: Record<string, string> = {};
sse(): { write: (data: string) => void; end: () => void } {
if (!this.headersSent) {
this.headersSent = true;
this.isStreaming = true;
this.res.setHeader('Content-Type', 'text/event-stream');
this.res.setHeader('Cache-Control', 'no-cache');
this.res.setHeader('Connection', 'keep-alive');
this.res.flushHeaders(); // push headers to the client immediately
}
return {
write: (data: string) => {
// write(), not send(): send() would end the response after one chunk
this.res.write(data);
},
end: () => {
this.res.end();
},
};
}
}
Key features:
- Sets the proper SSE headers (text/event-stream, no-cache, keep-alive)
- Returns a simple interface with write() and end() methods
- Handles header management automatically
Step 2: Create the Streaming Controller
Now we can build our streaming endpoint:
// src/server/streaming-server.ts
import { Controller, Get, Req, Res } from '@hazeljs/core';
import { AgentRuntime } from '@hazeljs/agent';
import { OpenAILLMProvider } from '../utils/llm-provider';
import { ResearchAgent } from '../agents/research-agent';
@Controller('/api')
export class StreamingController {
private runtime: AgentRuntime;
constructor() {
this.runtime = new AgentRuntime({
llmProvider: new OpenAILLMProvider(process.env.OPENAI_API_KEY),
defaultMaxSteps: 10,
enableObservability: true,
});
this.runtime.registerAgent(ResearchAgent);
this.runtime.registerAgentInstance('ResearchAgent', new ResearchAgent());
}
@Get('/stream')
async streamAgent(@Req() req: any, @Res() res: any) {
const query = req.query?.q || 'What are the key features of TypeScript 5.0?';
const format = req.query?.format || 'sse'; // 'sse' or 'text'
// Start SSE streaming
const stream = res.sse();
try {
let tokenCount = 0;
const startTime = Date.now();
// Send initial connection message for web UI
if (format === 'sse') {
stream.write('data: {"type":"connected"}\n\n');
}
// Stream agent execution with real-time token delivery
for await (const chunk of this.runtime.executeStream(
'ResearchAgent',
query,
{ streaming: true }
)) {
if (format === 'sse') {
// JSON format for web UI
const data = JSON.stringify({
type: chunk.type,
...(chunk.type === 'token' && { content: chunk.content }),
...(chunk.type === 'step' && {
step: {
state: chunk.step.state,
action: chunk.step.action?.type,
}
}),
...(chunk.type === 'done' && {
result: {
state: chunk.result.state,
duration: chunk.result.duration,
response: chunk.result.response,
}
}),
timestamp: Date.now() - startTime,
});
stream.write(`data: ${data}\n\n`);
if (chunk.type === 'token') {
tokenCount++;
}
} else {
// Plain text format for terminal
if (chunk.type === 'token') {
stream.write(chunk.content);
tokenCount++;
} else if (chunk.type === 'step') {
stream.write('\n\n');
}
}
}
// Send completion message
if (format === 'sse') {
stream.write(`data: {"type":"complete","tokens":${tokenCount}}\n\n`);
} else {
const duration = ((Date.now() - startTime) / 1000).toFixed(2);
stream.write(`\n\n---\n✅ Complete: ${tokenCount} tokens in ${duration}s\n`);
}
stream.end();
} catch (error) {
const errorData = JSON.stringify({
type: 'error',
error: error instanceof Error ? error.message : 'Unknown error',
});
stream.write(`data: ${errorData}\n\n`);
stream.end();
}
}
@Get('/health')
health() {
return { status: 'ok', service: 'agent-streaming' };
}
}
Key implementation details:
- Dual format support: The format query parameter switches between:
  - sse: JSON events for the web UI (default)
  - text: Plain text for terminal/curl
- Event types: We stream several event types:
  - connected: Initial connection established
  - token: Individual LLM tokens (the actual content)
  - step: Agent step transitions
  - done: Agent execution complete
  - complete: Stream finished with metrics
- Real-time delivery: Each token is written immediately via stream.write(), bypassing any buffering
- Error handling: Graceful error messages sent as SSE events
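The event types above can be captured as a TypeScript discriminated union. This type is our own sketch (HazelJS does not export it), but modeling the protocol this way keeps client and server honest about payload shapes:

```typescript
// Sketch of the stream's event protocol as a discriminated union.
// Field names mirror the controller above; illustrative only.
type StreamEvent =
  | { type: 'connected' }
  | { type: 'token'; content: string; timestamp?: number }
  | { type: 'step'; step: { state: string; action?: string }; timestamp?: number }
  | { type: 'done'; result: { state: string; duration: number; response: string } }
  | { type: 'complete'; tokens: number }
  | { type: 'error'; error: string };

// Narrowing on `type` gives you the right payload shape for free.
function describeEvent(event: StreamEvent): string {
  switch (event.type) {
    case 'token': return `token: ${event.content}`;
    case 'complete': return `complete: ${event.tokens} tokens`;
    case 'error': return `error: ${event.error}`;
    default: return event.type;
  }
}
```

Sharing a type like this between the controller and the web client catches payload mismatches at compile time instead of in production.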
Creating the Web UI
The web interface provides a beautiful, real-time streaming experience with live metrics.
HTML Structure
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>HazelJS Agent Streaming Demo</title>
<!-- Beautiful gradient background and modern styling -->
</head>
<body>
<div class="container">
<div class="header">
<h1>🚀 HazelJS Agent Streaming</h1>
<p>Real-time word-by-word AI responses using Server-Sent Events</p>
</div>
<div class="card">
<!-- Input section -->
<div class="input-section">
<div class="input-group">
<input
type="text"
id="queryInput"
placeholder="Ask me anything..."
value="What are the key features of TypeScript 5.0?"
>
<button id="streamBtn" onclick="startStreaming()">
Stream Response
</button>
</div>
</div>
<!-- Status indicator -->
<div id="status" class="status">
<span class="status-dot"></span>
<span id="statusText">Ready</span>
</div>
<!-- Real-time metrics -->
<div class="metrics">
<div class="metric">
<div class="metric-value" id="tokenCount">0</div>
<div class="metric-label">Tokens</div>
</div>
<div class="metric">
<div class="metric-value" id="duration">0.0s</div>
<div class="metric-label">Duration</div>
</div>
<div class="metric">
<div class="metric-value" id="tokensPerSec">0</div>
<div class="metric-label">Tokens/sec</div>
</div>
<div class="metric">
<div class="metric-value" id="stepCount">0</div>
<div class="metric-label">Steps</div>
</div>
</div>
<!-- Response container -->
<div class="response-container">
<div class="response-text" id="response"></div>
</div>
</div>
</div>
</body>
</html>
JavaScript: EventSource Integration
The magic happens with the browser's EventSource API:
let eventSource = null;
let tokenCount = 0;
let stepCount = 0;
let startTime = 0;
function startStreaming() {
const query = document.getElementById('queryInput').value;
const responseEl = document.getElementById('response');
const statusEl = document.getElementById('status');
const statusText = document.getElementById('statusText');
const streamBtn = document.getElementById('streamBtn');
// Reset UI
responseEl.textContent = '';
tokenCount = 0;
stepCount = 0;
updateMetrics();
// Disable button during streaming
streamBtn.disabled = true;
streamBtn.textContent = 'Streaming...';
// Show status
statusEl.style.display = 'flex';
statusEl.className = 'status streaming';
statusText.textContent = 'Connecting...';
// Close existing connection
if (eventSource) {
eventSource.close();
}
// Create new SSE connection
const url = `/api/stream?q=${encodeURIComponent(query)}`;
eventSource = new EventSource(url);
startTime = Date.now();
eventSource.onopen = () => {
statusText.textContent = 'Streaming response...';
};
eventSource.onmessage = (event) => {
try {
const data = JSON.parse(event.data);
switch (data.type) {
case 'connected':
statusEl.className = 'status connected';
statusText.textContent = 'Connected';
break;
case 'token':
// Add token immediately - true real-time streaming!
responseEl.textContent += data.content;
tokenCount++;
updateMetrics();
// Auto-scroll to bottom
responseEl.parentElement.scrollTop =
responseEl.parentElement.scrollHeight;
break;
case 'step':
stepCount++;
updateMetrics();
break;
case 'done':
statusEl.className = 'status connected';
statusText.textContent = 'Complete';
break;
case 'complete':
// The server sends 'complete' after 'done'; close here so this
// final metrics event is not dropped.
console.log('Stream complete:', data);
streamBtn.disabled = false;
streamBtn.textContent = 'Stream Response';
eventSource.close();
break;
case 'error':
statusEl.className = 'status error';
statusText.textContent = `Error: ${data.error}`;
streamBtn.disabled = false;
streamBtn.textContent = 'Stream Response';
eventSource.close();
break;
}
} catch (error) {
console.error('Error parsing SSE data:', error);
}
};
eventSource.onerror = (error) => {
console.error('SSE error:', error);
statusEl.className = 'status error';
statusText.textContent = 'Connection error';
streamBtn.disabled = false;
streamBtn.textContent = 'Stream Response';
eventSource.close();
};
}
function updateMetrics() {
document.getElementById('tokenCount').textContent = tokenCount;
const duration = (Date.now() - startTime) / 1000;
document.getElementById('duration').textContent = duration.toFixed(1) + 's';
const tokensPerSec = duration > 0 ? Math.round(tokenCount / duration) : 0;
document.getElementById('tokensPerSec').textContent = tokensPerSec;
document.getElementById('stepCount').textContent = stepCount;
}
Key features:
- EventSource API: Browser's native SSE client
- Event handling: Different handlers for each event type
- Real-time metrics: Updates on every token
- Auto-scroll: Keeps latest content visible
- Error handling: Graceful degradation on connection issues
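One limitation worth noting: EventSource only supports GET requests. If you ever need POST bodies or custom headers, you can consume the same stream with fetch, but then you must split SSE frames yourself, since an event can arrive split across network chunks. A small stateful buffer (our own sketch) handles that:

```typescript
// Sketch: accumulate network chunks and emit complete SSE data payloads.
// EventSource does this internally; you only need it for fetch-based clients.
class SSEBuffer {
  private buffer = '';

  // Feed one network chunk; returns the data payloads of any events
  // completed by it (an event ends with a blank line).
  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let sep: number;
    while ((sep = this.buffer.indexOf('\n\n')) !== -1) {
      const raw = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      for (const line of raw.split('\n')) {
        if (line.startsWith('data: ')) events.push(line.slice(6));
      }
    }
    return events;
  }
}
```

With fetch, you would pipe response.body through a TextDecoder and call push() on each decoded chunk.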
Terminal Streaming with curl
For CLI users and debugging, we support plain text streaming:
# Stream with plain text format
curl -N "http://localhost:3000/api/stream?q=Explain%20TypeScript&format=text"
Output:
TypeScript 5.0, released in March 2023, introduces several significant
features and improvements that enhance the language's capabilities and
developer experience...
[tokens stream in real-time, word by word]
---
✅ Complete: 561 tokens in 14.71s
Why this works:
- The -N flag disables curl's output buffering
- The format=text parameter triggers plain text mode
- Tokens appear immediately as they're generated
- Clean, readable output, perfect for terminal viewing
Performance & Metrics
Real-World Performance
Based on production testing with OpenAI's GPT-4:
| Metric | Value |
|---|---|
| First token latency | 100-200ms |
| Throughput | 50-100 tokens/sec |
| Memory usage | Minimal (streaming, not buffering) |
| Concurrent users | Handles 100+ simultaneous streams |
| Total response time | 10-15s for 500 tokens |
Perceived vs Actual Speed
Without streaming:
User waits: 10 seconds
Perceived speed: Slow ❌
With streaming:
User waits: 0.1 seconds (first token)
Perceived speed: Instant ✅
Even though the total time is the same, streaming makes the experience feel 10x faster.
Production Considerations
1. Error Handling
Always handle connection errors gracefully:
try {
for await (const chunk of runtime.executeStream(...)) {
stream.write(`data: ${JSON.stringify(chunk)}\n\n`);
}
} catch (error) {
stream.write(`data: ${JSON.stringify({
type: 'error',
error: error.message
})}\n\n`);
} finally {
stream.end();
}
2. Timeout Management
Set reasonable timeouts to prevent hanging connections:
// In your HazelJS app
app.setTimeout(60000); // 60 second timeout
3. Rate Limiting
Protect your API from abuse:
import { RateLimiterMiddleware } from '@hazeljs/core';
@Controller('/api')
@UseMiddleware(RateLimiterMiddleware.create({
windowMs: 60000,
max: 10 // 10 requests per minute
}))
export class StreamingController {
// ...
}
4. CORS Configuration
Enable CORS for web clients:
// src/server/main.ts
const app = new HazelApp(StreamingServerModule);
app.enableCors({
origin: process.env.ALLOWED_ORIGINS?.split(',') || '*',
credentials: true
});
5. Monitoring & Logging
Track streaming metrics:
const metrics = {
totalStreams: 0,
activeStreams: 0,
avgTokensPerStream: 0,
avgDuration: 0
};
// Update metrics on each stream (assuming your stream object
// emits lifecycle events - adapt to your framework's hooks)
stream.on('start', () => metrics.activeStreams++);
stream.on('end', (stats) => {
metrics.activeStreams--;
metrics.totalStreams++;
// Incremental mean: avg += (x - avg) / n
metrics.avgTokensPerStream +=
(stats.tokens - metrics.avgTokensPerStream) / metrics.totalStreams;
});
6. Deployment
Nginx configuration for SSE:
location /api/stream {
proxy_pass http://localhost:3000;
proxy_set_header Connection '';
proxy_http_version 1.1;
chunked_transfer_encoding off;
proxy_buffering off;
proxy_cache off;
}
Docker deployment:
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Build needs devDependencies; install everything, then prune after building
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev
EXPOSE 3000
CMD ["npm", "run", "server:streaming"]
Running the Example
Prerequisites
# Install dependencies
npm install
# Set your OpenAI API key
export OPENAI_API_KEY=your_api_key_here
Start the Server
npm run example:streaming
The server starts on http://localhost:3000 with:
- 🎨 Web UI: http://localhost:3000/streaming-demo.html
- 🔌 SSE Endpoint: http://localhost:3000/api/stream
- 💚 Health Check: http://localhost:3000/api/health
Test with curl
# Plain text streaming (for terminal)
curl -N "http://localhost:3000/api/stream?q=Your%20question&format=text"
# SSE format (for debugging)
curl -N "http://localhost:3000/api/stream?q=Your%20question"
Test with Browser
- Open http://localhost:3000/streaming-demo.html
- Enter your question
- Click "Stream Response"
- Watch tokens appear in real-time! ✨
Code Architecture
Project Structure
hazeljs-multi-agent-ai-workflows-example/
├── src/
│ ├── server/
│ │ ├── main.ts # Server bootstrap
│ │ └── streaming-server.ts # Streaming controller
│ ├── agents/
│ │ └── research-agent.ts # AI agent implementation
│ ├── utils/
│ │ └── llm-provider.ts # OpenAI integration
│ └── examples/
│ └── 05-streaming-example.ts # CLI streaming example
├── public/
│ └── streaming-demo.html # Web UI
└── package.json
Key Files
1. Streaming Controller (src/server/streaming-server.ts)
- Handles SSE connections
- Manages agent execution
- Supports dual format (SSE/text)
2. Web UI (public/streaming-demo.html)
- EventSource integration
- Real-time metrics
- Beautiful, responsive design
3. Server Bootstrap (src/server/main.ts)
- HazelJS app initialization
- CORS configuration
- Static file serving
Advanced Features
1. Custom Event Types
Add your own event types for richer interactions:
// Send thinking indicator
stream.write(`data: ${JSON.stringify({
type: 'thinking',
message: 'Analyzing your question...'
})}\n\n`);
// Send progress updates
stream.write(`data: ${JSON.stringify({
type: 'progress',
percent: 45,
step: 'Researching documentation'
})}\n\n`);
2. Multi-Agent Streaming
Stream from multiple agents in sequence, tagging each event with its agent:
const agents = ['ResearchAgent', 'AnalysisAgent', 'SummaryAgent'];
for (const agentName of agents) {
stream.write(`data: ${JSON.stringify({
type: 'agent-start',
agent: agentName
})}\n\n`);
for await (const chunk of runtime.executeStream(agentName, query)) {
if (chunk.type !== 'token') continue; // forward only token events here
stream.write(`data: ${JSON.stringify({
type: 'token',
agent: agentName,
content: chunk.content
})}\n\n`);
}
}
3. Conversation History
Maintain context across streams:
const conversationHistory = new Map();
@Get('/stream')
async streamAgent(@Req() req: any, @Res() res: any) {
const query = req.query?.q || '';
const sessionId = req.query?.session;
const history = conversationHistory.get(sessionId) || [];
let fullResponse = '';
// Include history in agent context
for await (const chunk of runtime.executeStream(
'ResearchAgent',
query,
{ history, streaming: true }
)) {
if (chunk.type === 'token') {
fullResponse += chunk.content; // accumulate for the history
}
// Stream response...
}
// Save to history
history.push({ role: 'user', content: query });
history.push({ role: 'assistant', content: fullResponse });
conversationHistory.set(sessionId, history);
}
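One caveat with the in-memory Map above: histories grow without bound, and every saved message is re-sent to the LLM on the next turn. A simple mitigation (a hypothetical helper, adjust the cap to your token budget) is to keep only the most recent messages:

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Hypothetical helper: cap a session's history at the most recent
// `maxMessages` entries so prompts (and server memory) stay bounded.
function trimHistory(history: ChatMessage[], maxMessages: number): ChatMessage[] {
  if (history.length <= maxMessages) return history;
  return history.slice(history.length - maxMessages);
}

// conversationHistory.set(sessionId, trimHistory(history, 20));
```

For production you would likely also evict idle sessions entirely, or move history into Redis so it survives restarts.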
Troubleshooting
Issue: Tokens Still Buffered
Problem: Tokens appear in batches instead of individually.
Solution: The cause is almost always a buffering layer between your write() calls and the client—compression middleware, a reverse proxy, or a response that is only flushed once at the end—rather than the async generator itself. Write each token as its own SSE frame as soon as it arrives, and disable buffering end-to-end:
// ❌ Wrong - the response is flushed once, at the end
res.json({ response: fullText }); // Arrives as one batch!
// ✅ Correct - each SSE frame is sent immediately
const stream = res.sse();
stream.write(`data: ${token}\n\n`); // Immediate!
Issue: CORS Errors
Problem: Browser blocks SSE connection.
Solution: Enable CORS in your HazelJS app:
app.enableCors({
origin: '*', // or specific domains
credentials: true
});
Issue: Connection Drops
Problem: SSE connection closes unexpectedly.
Solution: Check reverse proxy settings (Nginx, etc.):
proxy_buffering off;
proxy_cache off;
proxy_read_timeout 300s;
Issue: Slow Streaming
Problem: Tokens arrive slowly.
Solution: This is typically due to LLM API latency, not your code. Consider:
- Using faster models (GPT-3.5 vs GPT-4)
- Reducing max_tokens
- Implementing caching for common queries
Comparison: Before vs After
Before (No Streaming)
// Traditional approach
const response = await openai.chat.completions.create({
messages: [{ role: 'user', content: query }],
model: 'gpt-4'
});
// User waits 10+ seconds
res.json({ response: response.choices[0].message.content });
User Experience:
- ⏳ 10 second wait
- 😰 No feedback
- 📄 Wall of text appears
- 😕 Poor engagement
After (With Streaming)
// Streaming approach
const stream = res.sse();
for await (const chunk of runtime.executeStream(...)) {
stream.write(`data: ${JSON.stringify(chunk)}\n\n`);
}
User Experience:
- ⚡ 0.1 second to first token
- 🎯 Immediate feedback
- 📝 Text flows naturally
- 😊 Excellent engagement
Best Practices
1. Always Set Proper Headers
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
res.setHeader('X-Accel-Buffering', 'no'); // For Nginx
2. Handle Cleanup
req.on('close', () => {
// Client disconnected - stop the in-flight agent run and free resources
runtime.cancel();
});
3. Implement Heartbeats
Keep connections alive with periodic pings:
const heartbeat = setInterval(() => {
stream.write(': heartbeat\n\n');
}, 30000); // Every 30 seconds
// Clear on completion
clearInterval(heartbeat);
4. Use Compression Carefully
SSE and response compression don't mix well: compression middleware buffers output before sending, which defeats streaming. Disable compression for streaming endpoints:
app.use(compression({
filter: (req, res) => {
if (req.path === '/api/stream') return false;
return compression.filter(req, res);
}
}));
5. Monitor Performance
Track key metrics:
const streamMetrics = {
startTime: Date.now(),
firstTokenTime: 0,
tokenCount: 0,
recordFirstToken() {
if (!this.firstTokenTime) {
this.firstTokenTime = Date.now() - this.startTime;
}
}
};
Conclusion
Real-time streaming transforms AI agent interactions from frustrating waits into engaging, ChatGPT-like experiences. With HazelJS's native SSE support, implementing production-ready streaming is straightforward and powerful.
Key Takeaways
✅ SSE with unbuffered writes delivers tokens in true real time, avoiding the batching that buffered HTTP responses cause
✅ HazelJS provides native sse() method in the core framework
✅ Dual format support enables both web UI and terminal usage
✅ EventSource API makes client-side integration simple
✅ Real-time metrics enhance user engagement and transparency
What We Built
- 🚀 Production-ready streaming backend with HazelJS
- 🎨 Beautiful web UI with real-time token display
- 🖥️ Terminal-friendly plain text streaming
- 📊 Live metrics (tokens/sec, duration, steps)
- 🔧 Error handling and connection management
- 📦 Complete, runnable example
Next Steps
- Try the example: Run npm run example:streaming
- Customize the UI: Modify streaming-demo.html to match your brand
- Add features: Implement conversation history, multi-agent support
- Deploy: Use the production guidelines to go live
- Monitor: Track metrics and optimize performance
Resources
- GitHub: hazeljs-multi-agent-ai-workflows-example
- Documentation: HazelJS Docs
- Demo: Run the server locally and open http://localhost:3000/streaming-demo.html
Full Example Code
Server Setup
# Install dependencies
npm install @hazeljs/core @hazeljs/agent openai
# Set environment variable
export OPENAI_API_KEY=your_key_here
# Run the server
npm run example:streaming
Quick Start
import { HazelApp } from '@hazeljs/core';
import { StreamingServerModule } from './streaming-server';
const app = new HazelApp(StreamingServerModule);
app.enableCors();
await app.listen(3000);
console.log('🚀 Streaming server ready!');
console.log('📡 http://localhost:3000/streaming-demo.html');
That's it! You now have a fully functional, production-ready streaming AI agent.
Happy Streaming! 🚀
Built with ❤️ using HazelJS
About the Author
This guide was created by the HazelJS team to help developers build better AI experiences. HazelJS is a modern, TypeScript-first Node.js framework designed for building production-grade AI applications.
Questions or feedback? Open an issue on GitHub or join our Discord community.