"It's just a fetch call, bro." — Every junior dev, moments before learning a hard lesson.
So you want to build a ChatGPT wrapper. Fair enough. It sounds embarrassingly simple — you've got an API key, a text box, and ambition. How hard can it be?
Turns out: pretty hard. Not rocket-science hard, but "oh god, why is my chat broken on refresh" hard.
Let's walk through the three stages of enlightenment.
🟢 Stage 1: The Junior Dev Move
"Ship it. It works on my machine."
The junior dev opens a blank React project, installs axios (or doesn't, and just uses fetch), and writes something like this:
Frontend:
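const res = await fetch('/api/chat', {
  method: 'POST',
  // Declare the body as JSON; Express's JSON body parser skips it otherwise
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ msg: input })
});
const { result } = await res.json();
setReply(result);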
Backend:
app.post('/api/chat', async (req, res) => {
  const result = await ai.chat(req.body.msg);
  res.json({ result });
});
Done. Deployed. LinkedIn post drafted.
The problem? You're waiting for the entire AI response to generate before anything shows up on screen. For a short reply, fine. For a longer one? You're staring at a blank screen for 10–15 seconds while your users quietly close the tab.
It feels like dial-up internet. In 2025.
🟡 Stage 2: The Mid-Level Dev Move
"I've heard of this 'streaming' thing."
The mid-level dev has seen the ChatGPT UI. Tokens appear one by one, like the AI is thinking. That's streaming — and it's a vastly better experience.
Backend:
app.post('/api/chat', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  const stream = await ai.chat.stream(req.body.msg);
  for await (const chunk of stream) {
    // Caveat: a chunk containing newlines would break this framing unless escaped
    res.write(`data: ${chunk.text}\n\n`);
  }
  res.end();
});
Frontend:
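const response = await fetch('/api/chat', { method: 'POST', body: ... });
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // A read can end mid-frame: buffer, split on the blank-line terminator, strip "data: "
  buffer += decoder.decode(value, { stream: true });
  const frames = buffer.split('\n\n');
  buffer = frames.pop();
  for (const frame of frames) {
    if (frame.startsWith('data: ')) setReply(prev => prev + frame.slice(6));
  }
}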
Now tokens stream in real-time. The app feels alive. Users are happy. You are happy.
The problem? Hit refresh.
Gone. All of it. Every conversation, vanished. The app has the memory of a goldfish. Each new page load is a blank slate — no history, no context, no continuity.
You've built a chat app that forgets you exist the moment you leave. That's not a chat app. That's a very expensive alert() box.
🔴 Stage 3: The Senior Dev Move
"Streaming is table stakes. Persistence is the real work."
The senior dev knows that a chat app without history isn't a product — it's a demo. Real users expect to close the tab, come back tomorrow, and pick up where they left off. They expect their conversation to exist.
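This means the server has to save the conversation as part of the same request that streams it: the user's message before the AI call, and the assistant's reply as it is assembled from the stream. The simplest version accumulates chunks and writes the full reply once the stream ends; persisting partial output mid-stream is a refinement on top of that.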
Backend — save while streaming:
app.post('/api/chat', async (req, res) => {
  const { conversationId, msg } = req.body;
  // Save user message immediately
  await db.messages.insert({ conversationId, role: 'user', content: msg });
  res.setHeader('Content-Type', 'text/event-stream');
  const stream = await ai.chat.stream(msg);
  let fullResponse = '';
  for await (const chunk of stream) {
    fullResponse += chunk.text;
    res.write(`data: ${chunk.text}\n\n`);
  }
  // Save complete assistant response after stream finishes
  await db.messages.insert({ conversationId, role: 'assistant', content: fullResponse });
  res.end();
});
Frontend — load history on mount:
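useEffect(() => {
  // An effect callback can't be async itself, so define and call an inner function
  async function loadHistory() {
    const history = await fetch(`/api/conversations/${id}/messages`);
    setMessages(await history.json());
  }
  loadHistory();
}, [id]);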
Now when a user refreshes? Their chat is still there. When they come back on mobile? Still there. When they share a conversation link? You get the idea.
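The frontend snippet calls a history endpoint that isn't shown. Here is a minimal sketch of the logic behind it, under some loud assumptions: the `conversations` and `messages` collections are hypothetical in-memory stand-ins for `db`, the `ownerId` and `createdAt` fields are invented for illustration, and `getConversationMessages` is not a name from any library.

```javascript
// Hypothetical in-memory stand-ins for the article's `db`; a real app would query a database.
const conversations = new Map([['c1', { ownerId: 'u1' }]]);
const messages = [
  { conversationId: 'c1', role: 'assistant', content: 'Hi! How can I help?', createdAt: 2 },
  { conversationId: 'c1', role: 'user', content: 'Hello', createdAt: 1 },
];

// Returns a { status, body } pair that a route handler could pass to res.status().json().
function getConversationMessages(userId, conversationId) {
  const conversation = conversations.get(conversationId);
  // Answer "not yours" exactly like "not found" so conversation IDs can't be probed.
  if (!conversation || conversation.ownerId !== userId) {
    return { status: 404, body: { error: 'Conversation not found' } };
  }
  const body = messages
    .filter((m) => m.conversationId === conversationId)
    .sort((a, b) => a.createdAt - b.createdAt) // oldest first, the order the UI renders
    .map(({ role, content }) => ({ role, content }));
  return { status: 200, body };
}
```

Wired into the Express app, this would probably sit behind a `GET /api/conversations/:id/messages` route, with the user ID supplied by whatever auth middleware you already use.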
But wait, there's more. The senior dev also thinks about:
- Sending the full conversation history back to the AI (so it has actual context)
- Handling mid-stream disconnects gracefully (what if the user closes the tab at 50%?)
- Debouncing saves so you're not hammering your DB on every single token
- Auth, so users only see their conversations
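For the first bullet, here is a rough sketch of turning stored rows into the context sent to the model. `buildContext` and `maxMessages` are hypothetical names, and whether your AI client accepts an array of `{ role, content }` objects is an assumption (many chat APIs do, but check yours). `history` is assumed to be the rows loaded before the new message is inserted.

```javascript
// Hypothetical helper: combine stored history with the new message, capped to recent turns.
function buildContext(history, newMsg, maxMessages = 20) {
  const limit = Math.max(1, maxMessages); // always keep at least the new message
  // Drop DB-only fields (ids, timestamps) and keep what the model needs.
  const turns = history.map(({ role, content }) => ({ role, content }));
  turns.push({ role: 'user', content: newMsg });
  // Keep only the most recent turns so the request size stays bounded.
  return turns.slice(-limit);
}
```

Capping by message count is the crudest possible policy. Real apps tend to budget by tokens or summarize older turns, but the shape of the problem is the same.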
None of this is glamorous. It's just... what production looks like.
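The disconnect and debouncing points from the list can be handled together. What follows is a sketch, not a definitive implementation: `createThrottledSaver` is a hypothetical helper, and `save` is assumed to be a callback that overwrites a single assistant row with the text so far. It is a throttle rather than a strict debounce, since a true debounce would wait for the tokens to stop arriving before writing anything.

```javascript
// Hypothetical helper: buffers streamed text and persists it at most once per interval.
function createThrottledSaver(save, { intervalMs = 500, now = Date.now, onError = console.error } = {}) {
  let text = '';
  let dirty = false;
  let lastSavedAt = now();

  // Writes the text accumulated so far, if anything changed since the last write.
  function flush() {
    if (!dirty) return undefined;
    dirty = false;
    lastSavedAt = now();
    return save(text);
  }

  // Call once per chunk; only touches storage when the interval has elapsed.
  function push(chunk) {
    text += chunk;
    dirty = true;
    if (now() - lastSavedAt >= intervalMs) {
      // Fire-and-forget so a slow write doesn't stall the stream, but still report failures.
      Promise.resolve(flush()).catch(onError);
    }
  }

  return { push, flush };
}
```

In the streaming handler this would be `saver.push(chunk.text)` inside the loop and `await saver.flush()` in a `finally` block, so a stream that fails halfway still leaves the partial reply stored.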
The Takeaway
| Level | What they build | What's missing |
|---|---|---|
| Junior | Request → Response | Speed, UX |
| Mid | Streaming response | Persistence |
| Senior | Streaming + saved history | (probably something else they'll discover next week) |
The gap between a junior and senior implementation isn't really about the AI part at all — the API call is the easy bit. It's about treating the surrounding infrastructure with the same seriousness as the feature itself.
So next time someone says "I built an AI chat app over the weekend" — ask them what happens when they refresh the page.
The silence will tell you everything.
Now go add persistence to your side project. You know who you are.
