What happens when you put multiple AI models in the same real-time conversation

For months my workflow looked like this: ask ChatGPT a question, copy the answer, paste it into Claude for a second opinion, then check with Gemini. Every single day. I was the copy-paste middleman between AI models.

So I asked myself — what if they could just talk in the same room?

I built a platform where multiple AI agents and multiple humans share one real-time group conversation. Not side-by-side comparison. Not model switching. An actual group chat where everyone — human and AI — sees every message and responds in context.

What I discovered

The models genuinely disagree with each other. Not politely — substantively.

I asked all four models to analyze a business strategy:

  • GPT was optimistic. Growth projections, opportunity everywhere.
  • Claude was the skeptic. Poked holes in every assumption.
  • Gemini played the middle. Brought data from both sides.
  • Grok went sideways. Made a point nobody else considered.

The most interesting part: when Claude disagreed with GPT, GPT adjusted its next response. It saw the critique in context and self-corrected — without me telling it to.

After a few rounds, it felt less like using a tool and more like moderating a team with actual personalities.

The architecture

The real challenge wasn't integrating AI APIs — that's straightforward. The hard part was building real-time multi-user chat where some participants are humans (persistent WebSocket connections) and some are AI agents (server-side, triggered by messages).

Stack:

  • Frontend: React 19, Vite 6, Tailwind 4
  • Backend: Express 5, TypeScript
  • Database: PostgreSQL 15
  • Cache/Real-time state: Redis 7
  • WebSocket: Socket.IO
  • AI: OpenAI, Anthropic, Google, xAI SDKs

Key architectural decisions:

@Mention parsing — messages are parsed before sending to determine which agents should respond. With no @mention, agents with the auto_respond flag reply automatically; @mentioning a specific agent fires only that one; @all-agents fires every agent sequentially, with a queue to prevent overlapping responses.
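The routing rules above can be sketched in a few lines. This is a minimal illustration, not the platform's actual code — the `Agent` shape, field names, and the regex-based matching are assumptions:

```typescript
// Illustrative @mention routing: decide which agents respond to a message.
interface Agent {
  id: string;
  name: string;         // mention handle, e.g. "gpt", "claude"
  autoRespond: boolean; // replies when nobody is @mentioned
}

function resolveResponders(message: string, agents: Agent[]): Agent[] {
  // @all-agents: everyone fires (queued sequentially elsewhere)
  if (/@all-agents\b/.test(message)) return agents;

  // Specific @mentions: only those agents fire
  const mentioned = agents.filter((a) =>
    new RegExp(`@${a.name}\\b`, "i").test(message)
  );
  if (mentioned.length > 0) return mentioned;

  // No mention at all: fall back to auto-responders
  return agents.filter((a) => a.autoRespond);
}
```

A real implementation would also escape agent names before building the regex and dedupe repeated mentions, but the precedence order (all-agents → explicit mention → auto_respond) is the core of it.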

Sender type — the existing role column (user/assistant/system) wasn't granular enough for multi-user rooms. Added a sender_type enum (human/agent) plus agent_id foreign key for proper attribution.
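In TypeScript terms, the resulting message row looks roughly like this — column names here mirror the post's description, but the exact shape is an assumption:

```typescript
// Message attribution after adding sender_type and agent_id.
type SenderType = "human" | "agent";

interface MessageRow {
  id: number;
  roomId: number;
  senderType: SenderType;
  userId: number | null;  // set when senderType === "human"
  agentId: number | null; // FK to the agents table when senderType === "agent"
  content: string;
}
```

The point of the split: `role` (user/assistant/system) describes how a message is presented to a model, while `sender_type` + `agent_id` describe who actually wrote it in a room with many humans and many agents.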

Room participants — had to build a full room_participants table with roles (owner/admin/member), invite system with shareable codes, and permission checks on every message send.
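A hedged sketch of the permission checks this implies — the role names come from the post, but which role can do what is my assumption:

```typescript
// Role-based checks against the room_participants table.
type Role = "owner" | "admin" | "member";

const CAN_INVITE: Role[] = ["owner", "admin"];

// `role` is undefined when the user is not a participant of the room.
function canSendMessage(role: Role | undefined): boolean {
  return role !== undefined; // any participant may post
}

function canInvite(role: Role | undefined): boolean {
  return role !== undefined && CAN_INVITE.includes(role);
}
```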

Agent orchestration — each agent has its own model, provider, system prompt, and temperature. When triggered, the full conversation history is sent as context so agents see what everyone (including other agents) has said.
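One way to assemble that context for a given agent's API call — a sketch, assuming each provider accepts a user/assistant message list, with speaker names prefixed so the model can attribute who said what (the labeling format is my assumption, not the platform's):

```typescript
// Build one agent's view of the room: its own messages become
// "assistant" turns; everything else (humans AND other agents)
// becomes "user" turns, prefixed with the speaker's name.
type SenderType = "human" | "agent";

interface RoomMessage {
  senderType: SenderType;
  senderName: string;
  agentId?: string;
  content: string;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

function buildContext(history: RoomMessage[], selfAgentId: string): ChatMessage[] {
  return history.map((m) =>
    m.senderType === "agent" && m.agentId === selfAgentId
      ? { role: "assistant", content: m.content }
      : { role: "user", content: `${m.senderName}: ${m.content}` }
  );
}
```

This is what lets Claude "see" GPT's answer in context: from Claude's perspective, GPT's message is just another attributed user turn.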

Context length — long conversations blow up token costs. Solved with cursor-based pagination: the UI loads the last 100 messages and fetches older ones on scroll, and AI calls only receive a recent context window.
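The cursor mechanics can be shown in miniature. In production this would be a SQL query (`WHERE id < cursor ORDER BY id DESC LIMIT n`); here it's an in-memory sketch over an ID-ordered array:

```typescript
// Cursor pagination over messages sorted by ascending id.
interface Message {
  id: number;
  content: string;
}

// cursor === null → latest page; otherwise the page strictly older
// than the cursor id. Returns messages in chronological order.
function pageBefore(
  messages: Message[],
  cursor: number | null,
  limit = 100
): Message[] {
  const eligible =
    cursor === null ? messages : messages.filter((m) => m.id < cursor);
  return eligible.slice(-limit); // the `limit` most recent of those
}
```

The same "last N messages" slice doubles as the recent context window handed to the AI providers.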

Credit system — prepaid credits instead of flat subscriptions. Users buy credits in advance, each AI response deducts based on model + tokens used. We never owe an API provider money we haven't collected.
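A minimal sketch of the deduction logic. The per-model rates below are made-up placeholders, not the platform's actual pricing:

```typescript
// Prepaid credit deduction per AI response: cost depends on model + tokens.
// Rates are illustrative placeholders (credits per 1K tokens).
const CREDITS_PER_1K_TOKENS: Record<string, number> = {
  "gpt-4o": 5,
  "claude-sonnet": 4,
};

function deduct(balance: number, model: string, tokensUsed: number): number {
  const rate = CREDITS_PER_1K_TOKENS[model];
  if (rate === undefined) throw new Error(`unknown model: ${model}`);
  const cost = Math.ceil((tokensUsed / 1000) * rate);
  // Prepaid invariant: the balance never goes negative, so the
  // platform never owes a provider money it hasn't collected.
  if (cost > balance) throw new Error("insufficient credits");
  return balance - cost;
}
```

In practice this check would run inside a transaction (or an atomic Redis decrement) so concurrent responses can't double-spend the same balance.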

Features that emerged

Debate Mode — two AI agents argue opposing sides of a topic in structured rounds while a human moderates. Produces significantly better analysis than asking either model for a "balanced view."

Review Mode — one agent creates content, another critiques it, the creator revises. Automatic multi-cycle feedback loops.

Red Room — submit any project or idea into a room full of AI critics configured to find weaknesses. Whatever survives is worth building.

What I'd do differently

I spent too long building features before showing anyone. The admin dashboard, fraud detection, moderation queue — all built before a single user tried the core product. If I started over, I'd ship the group chat with two AI agents and nothing else. Everything else is a distraction until you've validated that people actually want the core experience.

Try it

The platform is live at fixy.ai. Plans start at $1.99/mo. Bring your own API keys for $5.99/mo.

Curious what other developers think about multi-agent architectures. What would you have built differently?
