Building an AI Chat Interface in React Native with Streaming Responses

Famitha M A · Originally published at fami-blog.hashnode.dev

Tap "send", wait three seconds, then a wall of text appears at once. That's the difference between an AI chat that feels alive and one that feels broken. Streaming — token-by-token rendering as the model thinks — turns a slow API call into something users perceive as a conversation. On the web, the Vercel AI SDK and useChat make it almost trivial. On React Native, you have to think harder: there's no EventSource in the runtime, FlatList can't be naïve about re-renders, and Markdown doesn't render itself like it does in the browser.

This post walks through a production-ready streaming AI chat in React Native and Expo end-to-end — server endpoint, client wiring, and the mobile-specific gotchas (network drops, scroll behavior, persistence) most tutorials skip.

Stack: Expo SDK 54+, React Native 0.81, the Vercel AI SDK (ai v4.3+), and any major LLM provider — Anthropic Claude, OpenAI GPT, or xAI Grok via the @ai-sdk/* adapters.

Why streaming changes the UX

Streaming isn't a performance optimization, it's a perception fix. A non-streaming chat call that takes 6 seconds feels broken — the user assumes the app froze. The same call streamed token-by-token feels responsive within 200ms because the first words land before the brain decides "this is taking too long."

On mobile this matters more than on web. Cellular latency varies wildly, App Store reviews punish perceived slowness, and a static spinner doesn't telegraph progress the way moving text does.

Architecture: three moving parts

  1. The mobile client — Expo app, holding messages in state, rendering tokens as they arrive.
  2. An edge endpoint — Next.js API route or similar HTTP boundary that holds your provider API key and proxies the stream.
  3. The model provider — Anthropic, OpenAI, Bedrock, etc., emitting SSE chunks.

You never call the provider directly from the mobile app. Your API key would leak into the JS bundle, and most providers reject mobile origins or rate-limit them aggressively. The edge endpoint is mandatory.

Setting up

bunx create-expo-app@latest chat-app --template tabs
cd chat-app
bun install ai @ai-sdk/anthropic zod react-native-markdown-display @react-native-async-storage/async-storage

Same ai package on server and client — AI SDK 4 normalizes the streaming protocol (the x-vercel-ai-data-stream header) so both sides speak the same wire format.

The server endpoint

// app/api/chat/route.ts (Next.js)
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';
import { z } from 'zod';

const Body = z.object({
  messages: z.array(z.object({
    role: z.enum(['user', 'assistant', 'system']),
    content: z.string(),
  })).max(50),
});

export async function POST(req: Request) {
  const { messages } = Body.parse(await req.json());

  const result = streamText({
    model: anthropic('claude-sonnet-4-5'),
    system: 'You are a concise, helpful assistant. Use Markdown for code.',
    messages,
    maxTokens: 1024,
  });

  return result.toDataStreamResponse();
}

streamText returns lazily, toDataStreamResponse() emits the AI SDK's data-stream protocol, and Zod validates the message shape. Always cap the message count — 50 is a reasonable ceiling — so a client can't stuff the history with junk that bloats your token bill and widens the prompt-injection surface.

For production, add a per-user rate limiter (Upstash sliding window works well) and an auth check before you ever hit the provider. Every dropped request that never reaches the model is dollars saved.
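
Here's a minimal sketch of that guard with @upstash/ratelimit — the window size, limit, and getUserId helper are illustrative placeholders, not part of the post's stack:

// Rate-limit sketch for the top of the POST handler (assumes Upstash Redis env vars are set)
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

// 20 requests per user per minute, sliding window (numbers are illustrative)
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, '1 m'),
});

export async function POST(req: Request) {
  const userId = await getUserId(req); // hypothetical auth helper — plug in your own
  const { success } = await ratelimit.limit(`chat:${userId}`);
  if (!success) {
    return new Response('Too many requests', { status: 429 });
  }
  // ...then parse the body with Zod and call streamText as shown above
}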

The React Native client

// app/(tabs)/index.tsx
import { useChat } from 'ai/react';
import { FlatList, View } from 'react-native';
import { useRef } from 'react';

export default function ChatScreen() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, stop } =
    useChat({
      api: 'https://your-api.com/api/chat',
      streamProtocol: 'data',
    });

  const listRef = useRef<FlatList>(null);

  return (
    <View style={{ flex: 1 }}>
      <FlatList
        ref={listRef}
        data={messages}
        keyExtractor={(m) => m.id}
        renderItem={({ item }) => <MessageBubble message={item} />}
        onContentSizeChange={() =>
          listRef.current?.scrollToEnd({ animated: true })
        }
      />
      <ChatInput {...{ input, handleInputChange, handleSubmit, stop, isLoading }} />
    </View>
  );
}

Two React Native specifics. First, handleInputChange expects a web-style change event, but TextInput's onChangeText only gives you a string, so you wrap the string in a minimal event-shaped object (see the ChatInput sketch below). Second, auto-scrolling on onContentSizeChange is the right hook because every new token grows the content height — don't scroll on onChangeText or you'll fight users who are reading earlier messages.
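
A sketch of the ChatInput the screen above spreads its props into — the styling is placeholder, and the as any cast is the "fake event" mentioned above:

// ChatInput.tsx — TextInput gives you a string; useChat wants a change event,
// so we wrap the string in a minimal event-shaped object.
import { View, TextInput, Button } from 'react-native';

type Props = {
  input: string;
  handleInputChange: (e: any) => void;
  handleSubmit: () => void;
  stop: () => void;
  isLoading: boolean;
};

export function ChatInput({ input, handleInputChange, handleSubmit, stop, isLoading }: Props) {
  return (
    <View style={{ flexDirection: 'row', padding: 8, gap: 8 }}>
      <TextInput
        style={{ flex: 1, borderWidth: 1, borderRadius: 8, padding: 8 }}
        value={input}
        onChangeText={(text) =>
          handleInputChange({ target: { value: text } } as any) // fake synthetic event
        }
        placeholder="Ask something…"
      />
      {isLoading ? (
        <Button title="Stop" onPress={stop} />
      ) : (
        <Button title="Send" onPress={() => handleSubmit()} />
      )}
    </View>
  );
}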

Rendering Markdown

LLMs return Markdown. React Native's Text won't render it. Three options:

  • react-native-markdown-display — pure JS, fast for chat-sized output, custom renderers for code blocks. Default choice.
  • @expo/html-elements + Markdown-to-HTML — heavier but reusable if you render web content elsewhere.
  • Custom parser — only if you need strict typography control.

Don't use react-native-render-html. It's slow and visibly janks during streaming.
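
A minimal bubble built on react-native-markdown-display — the colors and spacing here are placeholder values, not a prescribed theme:

// MessageBubble.tsx — renders one message; assistant output is Markdown.
import { View } from 'react-native';
import Markdown from 'react-native-markdown-display';

export function MessageBubble({ message }: { message: { role: string; content: string } }) {
  const isUser = message.role === 'user';
  return (
    <View
      style={{
        alignSelf: isUser ? 'flex-end' : 'flex-start',
        backgroundColor: isUser ? '#2563eb' : '#f1f5f9',
        borderRadius: 12,
        padding: 10,
        marginVertical: 4,
        maxWidth: '85%',
      }}
    >
      <Markdown
        style={{
          body: { color: isUser ? '#ffffff' : '#0f172a' },
          code_block: { fontFamily: 'monospace', backgroundColor: '#0f172a', color: '#e2e8f0', padding: 8, borderRadius: 6 },
          fence: { fontFamily: 'monospace', backgroundColor: '#0f172a', color: '#e2e8f0', padding: 8, borderRadius: 6 },
        }}
      >
        {message.content}
      </Markdown>
    </View>
  );
}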

Network resilience

Mobile networks drop. A 30-second Claude response can lose its connection three times when the user walks into an elevator.

stop() from useChat calls AbortController.abort() on the underlying fetch — wire it to a stop button. For retries, restart-on-fail is almost always the right call: discard the partial response and re-send. Resume-marker patterns are complex and only worth it for very long generations. Add exponential backoff (1s, 2s, 4s, give up at 8s).
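
One way to wire restart-on-fail with the hooks useChat already exposes — a sketch that treats reload() (which re-sends the last user message) as the "discard and re-send" step; the hook name and attempt cap are my own:

// useResilientChat.ts — retry a failed stream with exponential backoff, then give up.
import { useRef } from 'react';
import { useChat } from 'ai/react';

export function useResilientChat(api: string) {
  const attempts = useRef(0);
  const reloadRef = useRef<() => void>(() => {});

  const chat = useChat({
    api,
    streamProtocol: 'data',
    onError: () => {
      if (attempts.current >= 3) return;            // tried 1s, 2s, 4s — stop
      const delay = 1000 * 2 ** attempts.current;   // exponential backoff
      attempts.current += 1;
      setTimeout(() => reloadRef.current(), delay); // re-send the last user message
    },
    onFinish: () => {
      attempts.current = 0;                         // reset once a response completes
    },
  });

  reloadRef.current = chat.reload;
  return chat;
}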

Persistence

useChat keeps messages in memory. Persist them with AsyncStorage:

import AsyncStorage from '@react-native-async-storage/async-storage';

// inside ChatScreen — runs whenever a response finishes
useEffect(() => {
  if (!isLoading && messages.length > 0) {
    AsyncStorage.setItem('chat:messages', JSON.stringify(messages));
  }
}, [isLoading, messages]);

Don't write on every token (hundreds of writes per response). Debounce to 300ms or only write when isLoading flips to false.
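
If you prefer the debounce route, here's a sketch — the 300ms value and storage key are arbitrary, and the hook name is my own:

import { useEffect } from 'react';
import AsyncStorage from '@react-native-async-storage/async-storage';

// Every messages update schedules a write; the next update cancels it,
// so AsyncStorage is only hit once the stream pauses for 300ms.
function usePersistMessages(messages: unknown[]) {
  useEffect(() => {
    if (messages.length === 0) return;
    const timer = setTimeout(() => {
      AsyncStorage.setItem('chat:messages', JSON.stringify(messages));
    }, 300);
    return () => clearTimeout(timer);
  }, [messages]);
}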

Performance on real devices

A naïve FlatList re-renders the entire conversation per token. At 30 tokens/sec on 50 messages, you drop frames.

  • Memoize bubbles with React.memo + a custom comparator — re-render only on id, content, or isStreaming change (see the sketch after this list).
  • Use keyExtractor correctly — return message.id, not array index.
  • Batch updates — wrap mutations in requestAnimationFrame if managing messages manually.
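
A memoized variant of the bubble — a sketch that assumes you pass an isStreaming flag for the message currently being generated; it would stand in for the plain MessageBubble from the Markdown section:

import React from 'react';
import { View, Text } from 'react-native';

type BubbleProps = {
  message: { id: string; role: string; content: string };
  isStreaming?: boolean;
};

function BubbleInner({ message, isStreaming }: BubbleProps) {
  // Render Markdown here, as in the MessageBubble sketch above.
  return (
    <View>
      <Text>{message.content}{isStreaming ? ' ▍' : ''}</Text>
    </View>
  );
}

// Re-render a bubble only when its own id, content, or streaming flag changes —
// sibling messages streaming new tokens won't touch it.
export const MemoBubble = React.memo(
  BubbleInner,
  (prev, next) =>
    prev.message.id === next.message.id &&
    prev.message.content === next.message.content &&
    prev.isStreaming === next.isStreaming
);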

Test on a real low-end Android device. Streaming jank is invisible on M-series Macs and brutal on a $150 Pixel.

Production checklist

  • Edge endpoint validates with Zod
  • Per-user rate limit
  • Stop button visible during isLoading
  • Markdown renderer with code highlighting
  • Messages persist across restarts
  • Auto-scroll on content-size change, not input change
  • Tested on real low-end Android
  • Sentry on the streaming endpoint
  • Stream timeout configured (30s soft cap)

Wrapping up

A great React Native AI chat is mostly mobile craft, not AI craft. Streaming is solved at the SDK layer; the work is rendering tokens smoothly on a $150 Android, persisting messages, and surviving cellular handoffs.

If you want to skip the boilerplate, RapidNative generates the streaming endpoint + useChat wiring + Markdown bubbles + persistence layer from a natural-language prompt. Output is plain Expo code you own and can extend.
