Muhammad Taqi

I Built a Group Chat Where Multiple AIs Can Talk to You at the Same Time

There's a specific habit I picked up early as an AI developer that probably sounds familiar if you've spent any real time working with language models.
You come across a problem. You want to know how different models think about it. So you open one tab for ChatGPT, another for Claude, maybe a third for Gemini. You paste the same prompt three times. You wait. Then you manually read through three separate conversations trying to piece together which perspective makes the most sense.
It works. But it's tedious in a way that starts to bother you after the hundredth time.
I kept thinking there had to be a better way to do this. Not a comparison tool with a split screen and rigid inputs, but something that actually felt like a conversation. Something where you could just talk, and multiple models would respond naturally, in the same thread, the same way your friends would in a group chat.
That thought turned into Kōl.

The Idea

The concept is simple. You create a room. You add whichever AI models you want as members, and you can also invite actual people. Friends, teammates, whoever. Then you send a message and everyone in the room responds, humans and AIs together, in the same conversation thread.
It's not a side-by-side comparison UI. It's not a dashboard with dropdowns. It's a chat room. The AIs are members the same way your friends are members. The conversation flows naturally: you can follow up, go deeper, push back on one response, and watch how everyone else handles the same thread.
But the part that makes it genuinely different from a simple multi-model wrapper is what happens over time. Each room builds its own memory. The longer a room exists, the more context it carries. The AIs in that room don't just know what was said five minutes ago. They know what your room has been about for weeks.

How I Built It

The frontend is Next.js. The backend is Express on Node.js. The real-time layer runs on Socket.io. AI models are pulled from Groq, LongCat, and Gemini, giving a mix of speed and personality across the responses.
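For a rough sense of the shape, here's a stripped-down sketch of that wiring: Express with Socket.io layered on top, clients joining rooms, and messages scoped to the room they belong to. The event names and payloads here are illustrative, not the production code.

```ts
// Stripped-down sketch of the room wiring (event names illustrative).
import express from "express";
import { createServer } from "http";
import { Server } from "socket.io";

const app = express();
const httpServer = createServer(app);
const io = new Server(httpServer, { cors: { origin: "*" } });

io.on("connection", (socket) => {
  // A client joins a room; Socket.io tracks membership per socket.
  socket.on("room:join", (roomId: string) => {
    socket.join(roomId);
  });

  // A message is delivered only to the members of its room.
  socket.on("message:send", (payload: { roomId: string; text: string }) => {
    io.to(payload.roomId).emit("message:new", { ...payload, from: socket.id });
  });
});

httpServer.listen(3001);
```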
One decision I made early that shaped the whole feel of the product: responses are not streamed token by token. When you send a message, a typing indicator appears for each AI member in the room, just like when a real person is composing a reply. Then the full response arrives when it's ready.
That single choice changed everything about how the product feels. It stopped feeling like a tool and started feeling like a conversation. The typing animation creates the same anticipation you get when a friend is actually thinking through what to say. You're not watching text generate, you're waiting for a response. The distinction sounds small but experientially it's completely different.
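In server terms, the flow is roughly: emit a typing event for each AI member right away, then deliver the completed reply in one piece. This is an illustrative sketch, where `generateReply` stands in for the actual provider call and the event names are made up:

```ts
import { Server } from "socket.io";

// Stand-in for whatever provider call produces the completion
// (Groq, LongCat, or Gemini in Kōl's case).
declare function generateReply(model: string, prompt: string): Promise<string>;

function handleUserMessage(io: Server, roomId: string, text: string, aiMembers: string[]) {
  for (const model of aiMembers) {
    // The typing indicator appears immediately, like a person composing a reply.
    io.to(roomId).emit("ai:typing", { roomId, model });

    // No token streaming: wait for the full completion, then send it at once.
    void generateReply(model, text).then((reply) => {
      io.to(roomId).emit("ai:typing:stop", { roomId, model });
      io.to(roomId).emit("message:new", { roomId, from: model, text: reply });
    });
  }
}
```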

The Memory System

This is the part I'm most proud of and the part that took the most thinking to get right.
Every room in Kōl builds its own memory over time. There's a background model that runs silently, watching the conversations happening in each room. As messages accumulate, it processes them and builds a growing summary of what that room is about. What topics come up. What decisions were made. What the people in that room care about. What was discussed last Tuesday.
That summary isn't just stored somewhere. It gets fed back to the AI members as context every time someone sends a new message. So when you come back to a room after a few days and ask something, the AIs aren't starting from scratch. They already know the history of that room. They respond with the weight of everything that's been discussed before.
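Mechanically, the idea is simple: the stored summary gets prepended as system context on every request. Something in the spirit of this sketch, with types and names purely illustrative:

```ts
// Illustrative sketch: the stored room summary is injected as system context
// on every request, alongside a handful of the most recent raw messages.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildContext(
  roomSummary: string,
  recentMessages: ChatMessage[],
  userText: string,
): ChatMessage[] {
  return [
    {
      role: "system",
      content: `You are a member of a group chat. Room memory so far:\n${roomSummary}`,
    },
    ...recentMessages, // raw tail of the conversation for immediate context
    { role: "user", content: userText },
  ];
}
```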
The effect is hard to describe until you experience it. It stops feeling like you're querying a model and starts feeling like you're talking to someone who was there for the whole conversation and actually remembers it.
Running the background model so it maintains memory independently was an interesting architectural challenge. It needs to process conversations without interrupting them, update the summary incrementally as new messages come in, and make sure the context it produces is actually useful rather than a raw dump of everything ever said. Getting that balance right took a few iterations.
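To give a feel for the shape of it, here's an illustrative sketch of an incremental summarizer: buffer new messages per room, and once enough pile up, fold them into the prior summary instead of re-summarizing the entire history. The threshold and names are placeholders, not the real implementation.

```ts
// Illustrative sketch of the background memory worker. It buffers new
// messages per room and, once enough accumulate, folds them into the
// existing summary without blocking the live conversation.
declare function summarize(prompt: string): Promise<string>; // stand-in LLM call

const buffers = new Map<string, string[]>(); // roomId -> unprocessed messages
const summaries = new Map<string, string>(); // roomId -> current room memory

function recordMessage(roomId: string, text: string) {
  const buf = buffers.get(roomId) ?? [];
  buf.push(text);
  buffers.set(roomId, buf);
  if (buf.length >= 20) void updateMemory(roomId); // threshold is arbitrary here
}

async function updateMemory(roomId: string) {
  const pending = buffers.get(roomId) ?? [];
  buffers.set(roomId, []); // clear the buffer; the live chat is never blocked
  const previous = summaries.get(roomId) ?? "(empty)";
  // Incremental update: fold new messages into the prior summary instead of
  // re-summarizing the whole history every time.
  const next = await summarize(
    `Previous room summary:\n${previous}\n\nNew messages:\n${pending.join("\n")}\n\n` +
      `Update the summary. Keep topics, decisions, and open questions; drop chit-chat.`,
  );
  summaries.set(roomId, next);
}
```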

The Routing Problem That Almost Broke Everything

With multiple rooms active and multiple models responding simultaneously, socket events were landing in the wrong rooms. It was one of those bugs that makes the whole product look broken when the actual problem is much smaller than it appears. It took a few broken UIs to track down, but once I saw it clearly, the fix was straightforward.
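I won't walk through the exact code, but the classic version of this class of bug, and the shape of the fix, looks like this: an unscoped emit broadcasts to every connected client when it should be scoped to a single room. Illustrative sketch:

```ts
import { Server, Socket } from "socket.io";

type Payload = { roomId: string; text: string };

function deliver(io: Server, socket: Socket, payload: Payload) {
  // Buggy variants: both reach beyond the room, so one room's messages
  // show up in another.
  // io.emit("message:new", payload);               // every client, every room
  // socket.broadcast.emit("message:new", payload); // everyone but the sender

  // The fix: scope the emit to the room the message belongs to.
  io.to(payload.roomId).emit("message:new", payload);
}
```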

What It Changed

The thing I didn't fully expect was how much it changed the way I interact with AI models in general.
When you're bouncing between tabs, you naturally anchor to whichever response you read first. It frames how you think about the problem and everything after gets filtered through that lens. You're not really comparing anymore, you're just looking for confirmation.
When responses arrive in the same thread alongside each other and alongside what your actual friends are saying, that bias goes away. You process them more like genuine perspectives in a real conversation. It's a small shift but it changes how you evaluate what you're reading.
And with the memory layer on top of that, the room starts to feel like a place rather than a session. Something that has history. Something you can come back to.

What I'd Do Differently

The thing I'd rethink most is how the background model picks which AIs respond. Right now there's already an LLM running behind the scenes that decides which models should reply to a given message, which is the right approach, but the decision-making could be smarter. I'd want it to factor in more context: the tone of the message, which model has been most active, what the room's history looks like. That way participation feels more organic and less predictable. In a real group chat people don't all chime in with the same energy every time; the models shouldn't either.
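To make that concrete, a router along these lines is what I have in mind: ask a small model which members should reply, given the message, the room memory, and recent activity. Everything here is a hypothetical sketch; the helper and schema are placeholders, not the current implementation.

```ts
// Hypothetical sketch of a smarter participation router. A small LLM is
// asked which members should reply, given the message, the room memory,
// and recent activity. The helper and schema are placeholders.
declare function routerLLM(prompt: string): Promise<string>; // stand-in call

type MemberStats = { model: string; repliesInLastHour: number };

async function pickResponders(
  message: string,
  roomSummary: string,
  members: MemberStats[],
): Promise<string[]> {
  const prompt = [
    `Message: "${message}"`,
    `Room memory: ${roomSummary}`,
    `Members and recent activity: ${JSON.stringify(members)}`,
    `Pick which members should reply so participation feels organic:`,
    `favor quieter members, skip members the message clearly isn't aimed at,`,
    `and allow a single responder (or none) for low-energy messages.`,
    `Answer with a JSON array of model names only.`,
  ].join("\n");

  try {
    return JSON.parse(await routerLLM(prompt)) as string[];
  } catch {
    return members.map((m) => m.model); // invalid router output: everyone replies
  }
}
```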
I'd also improve the way responses are ordered when multiple models reply around the same time. Right now they arrive based on which API responded first. Giving more control over that, or making it feel more natural, is something worth thinking about.
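One simple direction, sketched below: gather the replies and release them in a deliberate order with short gaps, so the thread reads like people taking turns. Again illustrative, not how Kōl works today.

```ts
import { Server } from "socket.io";

declare function generateReply(model: string, prompt: string): Promise<string>;
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function deliverInOrder(io: Server, roomId: string, prompt: string, order: string[]) {
  // Fire all provider calls in parallel, but emit in a chosen order rather
  // than in whatever order the APIs happen to return.
  const replies = await Promise.all(order.map((m) => generateReply(m, prompt)));
  for (let i = 0; i < order.length; i++) {
    io.to(roomId).emit("message:new", { roomId, from: order[i], text: replies[i] });
    await sleep(400); // short pause between replies for a natural rhythm
  }
}
```

The trade-off is latency: nothing is shown until the slowest model finishes, which is exactly the tension that makes this worth thinking about.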

Why I Open Sourced It

Kōl started as something I built for myself. It solved a real problem I had and I wanted to see if it solved the same problem for other people.
Open sourcing it felt like the natural next step. If someone wants to fork it, build on it, or just dig into how the memory layer or the real-time routing works, it's all there. And honestly, seeing how other developers approach the same kinds of problems is worth more to me than keeping it closed.
If you build something with it or run into something interesting, I'd genuinely like to hear about it.

Here's the repo: https://github.com/m-taqii/kol
