React + AI: Building Intelligent Web Applications in 2026

This article was originally published in Towards AI on Medium. Canonical link: https://medium.com/gitconnected/react-ai-building-intelligent-web-applications-in-2026-6d412830f705

Every React app will need AI features within 18 months. The question isn't whether — it's how to build them without turning your codebase into a tangled mess of API calls and loading spinners.

Most "React + AI" tutorials stop at calling the OpenAI API with fetch(). That's not AI integration — that's an API call with a bigger bill.

In this article, we'll build three real AI features in React — streaming chat, semantic search, and AI-powered form validation — with clean, typed patterns you can copy into your projects today: full TypeScript, production-ready hooks, backend routes, and the cost math nobody else shows you.

Section 1: Architecture — Where AI Fits in a React App

Before writing a single component, get the architecture right. AI features fail in production due to poor architecture, not bad prompts.

The Three-Layer Pattern

Every AI interaction flows through three layers: the UI layer (your React components and hooks), the API layer (your backend routes, which handle auth, rate limiting, and caching), and the provider layer (the LLM APIs themselves). Your React components never talk to LLM providers directly.

Key Architectural Decisions

Never call AI APIs directly from the frontend. Always proxy through your API layer. Your OpenAI/Anthropic API key in client-side code is a security disaster. The API layer provides rate limiting, cost tracking, response caching, and prompt-injection filtering.

Use Server-Sent Events (SSE) for streaming — not WebSockets. LLM streaming is unidirectional. SSE handles this natively over HTTP, works with CDNs and load balancers out of the box, and auto-reconnects on failure.

Implement a cost tracking middleware from day one. Cache aggressively — same prompt = same response. In production, 30–60% cache hit rates translate directly to 30–60% cost savings.
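The caching advice above can start as a few lines. The `cachedCompletion` helper below is a hypothetical sketch using an in-memory Map keyed by a hash of model + prompt; production would swap the Map for Redis with a TTL:

```typescript
// Hypothetical prompt cache: same model + prompt -> same cached response.
// In production this Map would be Redis with a TTL; in-memory is fine for a sketch.
import { createHash } from 'crypto';

const cache = new Map<string, { response: string; hits: number }>();

function cacheKey(model: string, prompt: string): string {
  // Hash model + prompt so the key stays small and uniform.
  return createHash('sha256').update(model + '\u0000' + prompt).digest('hex');
}

export async function cachedCompletion(
  model: string,
  prompt: string,
  callModel: (prompt: string) => Promise<string>
): Promise<{ response: string; cached: boolean }> {
  const key = cacheKey(model, prompt);
  const hit = cache.get(key);
  if (hit) {
    hit.hits += 1;
    return { response: hit.response, cached: true };
  }
  const response = await callModel(prompt);
  cache.set(key, { response, hits: 0 });
  return { response, cached: false };
}
```

Tracking the `hits` counter per key is what lets you verify the 30–60% hit-rate claim against your own traffic instead of taking it on faith.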

Section 2: Pattern 1 — Streaming AI Chat

Backend: The SSE Endpoint

// app/api/chat/route.ts (Next.js App Router)
import Anthropic from '@anthropic-ai/sdk';
import { NextRequest } from 'next/server';

const anthropic = new Anthropic();

export async function POST(req: NextRequest) {
  const { messages } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const response = anthropic.messages.stream({
          model: 'claude-sonnet-4-20250514',
          max_tokens: 1024,
          messages: messages.map(
            ({ role, content }: { role: string; content: string }) => ({
              role,
              content,
            })
          ),
        });

        response.on('text', (text) => {
          controller.enqueue(
            encoder.encode('data: ' + JSON.stringify({ token: text }) + '\n\n')
          );
        });

        response.on('end', () => {
          controller.enqueue(encoder.encode('data: [DONE]\n\n'));
          controller.close();
        });

        response.on('error', (error) => {
          controller.enqueue(
            encoder.encode('data: ' + JSON.stringify({ error: error.message }) + '\n\n')
          );
          controller.close();
        });
      } catch (error) {
        controller.enqueue(
          encoder.encode('data: ' + JSON.stringify({ error: 'Stream initialization failed' }) + '\n\n')
        );
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
      'X-Accel-Buffering': 'no',
    },
  });
}

The X-Accel-Buffering: no header is critical. Without it, nginx buffers the entire SSE response before forwarding it, completely killing the streaming effect.

The Custom Hook: useStreamingChat

// hooks/useStreamingChat.ts
import { useReducer, useCallback, useRef } from 'react';

interface Message {
  id: string;
  role: 'user' | 'assistant';
  content: string;
  status: 'complete' | 'streaming' | 'error';
  timestamp: number;
}

type ChatAction =
  | { type: 'ADD_MESSAGE'; message: Message }
  | { type: 'APPEND_TOKEN'; id: string; token: string }
  | { type: 'SET_STATUS'; id: string; status: Message['status'] }
  | { type: 'CLEAR' };

interface ChatState {
  messages: Message[];
  isStreaming: boolean;
  error: Error | null;
}

function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case 'ADD_MESSAGE':
      return { ...state, messages: [...state.messages, action.message], isStreaming: action.message.status === 'streaming', error: null };
    case 'APPEND_TOKEN':
      return { ...state, messages: state.messages.map((msg) => msg.id === action.id ? { ...msg, content: msg.content + action.token } : msg) };
    case 'SET_STATUS':
      return { ...state, messages: state.messages.map((msg) => msg.id === action.id ? { ...msg, status: action.status } : msg), isStreaming: action.status === 'streaming', error: action.status === 'error' ? new Error('Generation failed') : state.error };
    case 'CLEAR':
      return { messages: [], isStreaming: false, error: null };
    default:
      return state;
  }
}

export function useStreamingChat(endpoint: string = '/api/chat') {
  const [state, dispatch] = useReducer(chatReducer, { messages: [], isStreaming: false, error: null });
  const abortRef = useRef<AbortController | null>(null);

  const sendMessage = useCallback(async (content: string) => {
    const userMsg: Message = { id: crypto.randomUUID(), role: 'user', content, status: 'complete', timestamp: Date.now() };
    dispatch({ type: 'ADD_MESSAGE', message: userMsg });

    const assistantId = crypto.randomUUID();
    dispatch({ type: 'ADD_MESSAGE', message: { id: assistantId, role: 'assistant', content: '', status: 'streaming', timestamp: Date.now() } });

    abortRef.current = new AbortController();

    try {
      const response = await fetch(endpoint, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ messages: [...state.messages, userMsg].map(({ role, content }) => ({ role, content })) }),
        signal: abortRef.current.signal,
      });

      if (!response.ok) throw new Error('HTTP ' + response.status);
      if (!response.body) throw new Error('No response body');

      const reader = response.body.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        const chunk = decoder.decode(value, { stream: true });
        for (const line of chunk.split('\n')) {
          if (!line.startsWith('data: ')) continue;
          const data = line.slice(6);
          if (data === '[DONE]') continue; // the reader's `done` flag ends the outer loop
          let parsed: { token?: string; error?: string } | null = null;
          try {
            parsed = JSON.parse(data);
          } catch {
            // incomplete line split across chunks; skip (a production hook would buffer it)
            continue;
          }
          if (parsed?.error) throw new Error(parsed.error);
          if (parsed?.token) dispatch({ type: 'APPEND_TOKEN', id: assistantId, token: parsed.token });
        }
      }
      dispatch({ type: 'SET_STATUS', id: assistantId, status: 'complete' });
    } catch (err: unknown) {
      dispatch({ type: 'SET_STATUS', id: assistantId, status: err instanceof Error && err.name === 'AbortError' ? 'complete' : 'error' });
    }
  }, [endpoint, state.messages]);

  const cancelStream = useCallback(() => { abortRef.current?.abort(); }, []);

  return { messages: state.messages, isStreaming: state.isStreaming, error: state.error, sendMessage, cancelStream };
}

Why useReducer instead of useState? With useState, you hit stale closures and race conditions when streaming. useReducer handles all state transitions atomically.
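The atomicity claim is easy to check outside React, because the reducer is a pure function. Here is a stripped-down replay of the APPEND_TOKEN transition (types simplified from the hook above):

```typescript
// Stripped-down version of chatReducer's APPEND_TOKEN case: pure, so a burst
// of token appends can be replayed deterministically and never lose a token.
type Msg = { id: string; content: string };
type Action = { type: 'APPEND_TOKEN'; id: string; token: string };

function reducer(messages: Msg[], action: Action): Msg[] {
  return messages.map((m) =>
    m.id === action.id ? { ...m, content: m.content + action.token } : m
  );
}

// Each transition starts from the previous state, not from a stale snapshot —
// which is exactly what a setState call captured in a closure cannot guarantee.
let state: Msg[] = [{ id: 'a1', content: '' }];
for (const token of ['Hel', 'lo ', 'world']) {
  state = reducer(state, { type: 'APPEND_TOKEN', id: 'a1', token });
}
// state[0].content === 'Hello world'
```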

Frontend: The Chat Component

// components/AIChat.tsx
import { useState, useRef, useEffect, memo } from 'react';
import { useStreamingChat } from '../hooks/useStreamingChat';
import ReactMarkdown from 'react-markdown';

type ChatMessage = {
  id: string;
  role: 'user' | 'assistant';
  content: string;
  status: 'complete' | 'streaming' | 'error';
};

const MessageBubble = memo(function MessageBubble({ message }: { message: ChatMessage }) {
  return (
    <div className={'message ' + message.role} style={{ contentVisibility: 'auto' }}>
      {message.role === 'assistant' ? <ReactMarkdown>{message.content}</ReactMarkdown> : <p>{message.content}</p>}
      {message.status === 'streaming' && <span className="cursor-blink">|</span>}
      {message.status === 'error' && <span className="error-badge">Failed to generate — try again</span>}
    </div>
  );
});

export function AIChat() {
  const { messages, isStreaming, sendMessage, cancelStream } = useStreamingChat();
  const [input, setInput] = useState('');
  const bottomRef = useRef<HTMLDivElement>(null);

  useEffect(() => { bottomRef.current?.scrollIntoView({ behavior: 'smooth' }); }, [messages]);

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isStreaming) return;
    sendMessage(input.trim());
    setInput('');
  };

  return (
    <div className="chat-container">
      <div className="messages" role="log" aria-live="polite">
        {messages.map((msg) => <MessageBubble key={msg.id} message={msg} />)}
        <div ref={bottomRef} />
      </div>
      <form onSubmit={handleSubmit} className="chat-input-form">
        <input value={input} onChange={(e) => setInput(e.target.value)} placeholder="Ask anything..." disabled={isStreaming} />
        {isStreaming ? <button type="button" onClick={cancelStream}>Stop</button> : <button type="submit" disabled={!input.trim()}>Send</button>}
      </form>
    </div>
  );
}

Key decisions: contentVisibility: 'auto' cuts rendering time by 40–60% in long conversations. The AbortController pattern lets users cancel mid-stream. memo on MessageBubble prevents re-rendering every message when a single token arrives.

Section 3: Pattern 2 — Semantic Search with Hybrid Scoring

Traditional search is string matching. Pure semantic search loses exact-match precision. Hybrid search combines both — published benchmarks typically report 15–30% better precision than either approach alone.

Backend: Hybrid Search with pgvector

// app/api/search/route.ts
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';
import { Pool } from 'pg';

const openai = new OpenAI();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function POST(req: NextRequest) {
  const { query } = await req.json();
  if (!query?.trim()) return NextResponse.json([]);

  const embeddingRes = await openai.embeddings.create({ model: 'text-embedding-3-small', input: query });
  const queryEmbedding = embeddingRes.data[0].embedding;

  const { rows } = await pool.query(
    `WITH semantic AS (
        SELECT id, title, snippet, 1 - (embedding <=> $1::vector) AS semantic_score
        FROM documents ORDER BY embedding <=> $1::vector LIMIT 20
      ),
      keyword AS (
        SELECT id, title, snippet, ts_rank(search_vector, plainto_tsquery('english', $2)) AS keyword_score
        FROM documents WHERE search_vector @@ plainto_tsquery('english', $2)
        ORDER BY keyword_score DESC LIMIT 20
      )
      SELECT COALESCE(s.id, k.id) AS id, COALESCE(s.title, k.title) AS title, COALESCE(s.snippet, k.snippet) AS snippet,
        (COALESCE(s.semantic_score, 0) * 0.7 + COALESCE(k.keyword_score, 0) * 0.3) AS score,
        CASE WHEN s.id IS NOT NULL AND k.id IS NOT NULL THEN 'both'
             WHEN s.id IS NOT NULL THEN 'semantic' ELSE 'keyword' END AS source
      FROM semantic s FULL OUTER JOIN keyword k ON s.id = k.id
      ORDER BY score DESC LIMIT 10`,
    [JSON.stringify(queryEmbedding), query]
  );

  return NextResponse.json(rows);
}

The 0.7 semantic + 0.3 keyword weighting is a strong default. Tune it based on content type. If you're already on PostgreSQL, pgvector is the pragmatic choice for under 1M documents — no separate vector database infrastructure needed.
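If you want to experiment with the weighting before committing it to SQL, the same blend can be computed in application code. The `hybridMerge` function below is a sketch that mirrors the two CTEs above, assuming each list arrives with normalized scores:

```typescript
// Blend two ranked result lists with a tunable weight, mirroring the SQL above.
// Assumes both score ranges are already normalized to a comparable scale.
type Hit = { id: string; score: number };

export function hybridMerge(
  semanticHits: Hit[],
  keywordHits: Hit[],
  semanticWeight = 0.7 // keyword weight is the complement, 1 - semanticWeight
): Hit[] {
  const combined = new Map<string, number>();
  for (const h of semanticHits) {
    combined.set(h.id, (combined.get(h.id) ?? 0) + h.score * semanticWeight);
  }
  for (const h of keywordHits) {
    combined.set(h.id, (combined.get(h.id) ?? 0) + h.score * (1 - semanticWeight));
  }
  return [...combined.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Documents found by both strategies accumulate both weighted scores, which is what pushes "both"-source results to the top — the same effect as the FULL OUTER JOIN in the query.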

Section 4: Pattern 3 — AI-Powered Form Validation

Standard validation checks structure. AI validation checks meaning. At ~$0.001 per validation call, you get spam detection, content categorization, and meaning-based validation that regex cannot do.

Backend: Validation Endpoint

// app/api/validate/route.ts
import { NextRequest, NextResponse } from 'next/server';
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

export async function POST(req: NextRequest) {
  const { field, value, rules, context } = await req.json();

  if (typeof value !== 'string' || value.length < 10) {
    return NextResponse.json({ valid: true, suggestions: [], corrected: value });
  }

  try {
    const message = await anthropic.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 256,
      messages: [{
        role: 'user',
        content: `Validate this form field and respond ONLY with JSON.
Field: "${field}"
Value: "${value}"
Validation rules: ${rules}

Respond in this exact JSON format:
{ "valid": boolean, "suggestions": ["list of improvement suggestions if invalid"], "corrected": "corrected version or original if valid", "category": "auto-detected category if applicable" }`,
      }],
    });

    const textContent = message.content.find((c) => c.type === 'text');
    if (!textContent || textContent.type !== 'text') throw new Error('No text response');

    return NextResponse.json(JSON.parse(textContent.text));
  } catch (error) {
    // AI validation failure should NEVER block the form
    return NextResponse.json({ valid: true, suggestions: [], corrected: value });
  }
}

Key patterns: Trigger on onBlur (not onChange) to avoid paying for every keystroke. Use an 800ms debounce as a safety net. Cache by field+value. If AI fails, always return { valid: true } — AI validation is additive, never blocking.
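Those client-side guards can be sketched in one small helper. `makeValidator` below is hypothetical, combining the debounce window and the field+value cache (the `onBlur` wiring is left to the form component):

```typescript
// Hypothetical client-side guard for AI validation: debounce + cache keyed by
// field+value, so identical inputs never trigger a second paid call.
type ValidationResult = { valid: boolean; suggestions: string[] };

const validationCache = new Map<string, ValidationResult>();

export function makeValidator(
  validate: (field: string, value: string) => Promise<ValidationResult>,
  debounceMs = 800
) {
  // One shared debounce window; per-field timers are an easy extension.
  let timer: ReturnType<typeof setTimeout> | null = null;
  return (field: string, value: string): Promise<ValidationResult> =>
    new Promise((resolve) => {
      const key = field + '\u0000' + value;
      const cached = validationCache.get(key);
      if (cached) return resolve(cached);
      if (timer) clearTimeout(timer); // restart the debounce window
      // Note: a superseded call's promise never resolves — acceptable for
      // fire-and-forget validation, where only the latest input matters.
      timer = setTimeout(async () => {
        const result = await validate(field, value);
        validationCache.set(key, result);
        resolve(result);
      }, debounceMs);
    });
}
```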

Section 5: Cost Management

Monthly cost per 1,000 DAU (with optimizations):

  • Streaming Chat: $50–200
  • Semantic Search: $15–50
  • Form Validation: $10–30
  • Dashboard Insights: $5–20
  • Total: $80–300/month

5 Cost Control Patterns:

  1. Prompt caching — 30–60% hit rate. Use Redis for shared cache. Embedding calls are the most cacheable (same input = same vector, always).
  2. Model routing — save 80%. Use GPT-4o-mini/Claude Haiku for simple tasks. Reserve premium models for complex work.
  3. Token budgets — per-user daily limits with graceful degradation.
  4. Streaming cancellation — abort both client and server when users stop reading. Without server-side cleanup, you pay for every token until response completes.
  5. Batch embeddings — send arrays of 100 to the embedding API instead of one at a time.

A simple per-request cost estimator (rates in USD per million tokens):
const COST_TABLE: Record<string, { input: number; output: number }> = {
  'claude-sonnet-4-20250514': { input: 3.0, output: 15.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'text-embedding-3-small': { input: 0.02, output: 0 },
};

export function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const rates = COST_TABLE[model] || { input: 1, output: 5 };
  return (inputTokens / 1_000_000) * rates.input + (outputTokens / 1_000_000) * rates.output;
}
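Pattern 2 (model routing) can be as simple as a pure function over the task. `routeModel` below is a hypothetical sketch whose model names match the cost table; the routing criteria are illustrative:

```typescript
// Hypothetical model router: cheap model by default, premium only when the
// caller flags the task as complex or the prompt is unusually long.
type Task = { prompt: string; complex?: boolean };

export function routeModel(task: Task): string {
  if (task.complex || task.prompt.length > 4000) {
    return 'claude-sonnet-4-20250514'; // premium: reserved for hard work
  }
  return 'gpt-4o-mini'; // ~20x cheaper per input token, per the table above
}
```

Even a crude router like this captures most of the 80% savings, because the bulk of real traffic — short classification, extraction, and validation calls — never needs the premium model.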

Section 6: Production Checklist

  • Rate limiting per user (not just per IP)
  • Cost alerts when daily spend exceeds threshold
  • Prompt injection defense — never pass raw user input directly into system prompts
  • Fallback UI when AI is unavailable — degrade gracefully, don't crash
  • A/B test AI features — measure actual engagement
  • GDPR compliance — strip PII from prompts when possible
  • Monitor latency — alert when p95 exceeds 3 seconds
  • Log all prompts + responses for debugging and cost tracking
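The first item on that list can start as small as an in-memory token bucket keyed by user ID. `allowRequest` is a hypothetical helper; production would back it with Redis so limits survive restarts and scale across instances:

```typescript
// Minimal per-user token bucket: `capacity` requests, refilled at `refillPerSec`.
type Bucket = { tokens: number; last: number };

const buckets = new Map<string, Bucket>();

export function allowRequest(
  userId: string,
  capacity = 20,
  refillPerSec = 0.5,
  now = Date.now() // injectable for testing
): boolean {
  const b = buckets.get(userId) ?? { tokens: capacity, last: now };
  // Refill proportionally to elapsed time, capped at capacity.
  b.tokens = Math.min(capacity, b.tokens + ((now - b.last) / 1000) * refillPerSec);
  b.last = now;
  if (b.tokens < 1) {
    buckets.set(userId, b);
    return false; // caller should respond 429 and show the fallback UI
  }
  b.tokens -= 1;
  buckets.set(userId, b);
  return true;
}
```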

Closing

Three patterns. Three hooks. Three backend routes.

Streaming chat turns 5-second waits into experiences that feel instantaneous. Semantic search with hybrid scoring gives you a single PostgreSQL deployment with no separate vector database infrastructure. AI-powered form validation is the sleeper feature most teams overlook — for $0.001 per validation, you get spam detection and meaning-based validation that regex cannot do.

The winners won't be developers who build the most impressive AI demos. They'll be the ones who build AI features that are fast (streaming, not spinners), cheap (cached, rate-limited, model-routed), and gracefully degradable (the app works fine when the AI service doesn't).

Start with streaming chat. Get it into production. Measure. Then expand.


Written by Pratik K Rupareliya, Co-Founder & Head of Strategy at Intuz — building React, Next.js, and AI-powered web applications for enterprise clients.
