Mr.Pan
How I Built an AI Chat Platform with Multi-Model Comparison and Git-Like Branching

I've been working on YeeroAI — an AI conversation platform with a few ideas I haven't seen elsewhere. Let me walk through the key technical decisions.

The Problem

Most AI chat tools treat conversations as disposable. You close the tab, and the knowledge is gone. I wanted to build something where:

  1. Every conversation accumulates knowledge
  2. You can compare multiple AI models at once
  3. Conversations can branch like Git
  4. Everything is semantically searchable

1. Multi-Model Parallel Chat

Architecture

User Message
    ├── Model A (GPT-4o) ──→ SSE Stream A
    ├── Model B (Claude)  ──→ SSE Stream B
    └── Model C (Gemini)  ──→ SSE Stream C

We use OpenRouter as a unified API to access 300+ models. Each model request runs on a virtual thread (JDK 25), and responses stream back via SSE.

Frontend: one EventSource listener per model, each with independent state tracking (preparing → reasoning → streaming → done/error/cancelled).

Backend: Spring Boot's SseEmitter with virtual threads for parallel execution. The first token typically arrives in under 3 seconds.
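The fan-out itself is straightforward once each model call gets its own thread. Here's a minimal, framework-free sketch of the pattern; `fanOut`, `callModel`, and `onToken` are hypothetical names standing in for the OpenRouter call and the `SseEmitter.send` side (the real stack would use `Executors.newVirtualThreadPerTaskExecutor()` on JDK 25):

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors

// Sketch: send one prompt to several models in parallel, streaming each
// model's tokens back through a callback as they arrive.
fun fanOut(
    prompt: String,
    models: List<String>,
    callModel: (model: String, prompt: String) -> Sequence<String>, // stand-in for the OpenRouter call
    onToken: (model: String, token: String) -> Unit,                // stand-in for SseEmitter.send
): Map<String, String> {
    val results = ConcurrentHashMap<String, String>()
    // In the real stack: Executors.newVirtualThreadPerTaskExecutor() (JDK 25).
    val pool = Executors.newFixedThreadPool(models.size)
    try {
        models.map { model ->
            pool.submit {
                val full = StringBuilder()
                callModel(model, prompt).forEach { token ->
                    onToken(model, token) // stream chunk to the client immediately
                    full.append(token)
                }
                results[model] = full.toString() // keep the final text for persistence
            }
        }.forEach { it.get() } // wait for all models to finish
    } finally {
        pool.shutdown()
    }
    return results
}
```

Because every model runs on its own thread, a slow model never blocks a fast one; the frontend's per-model EventSource state machines stay independent for the same reason.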

2. Git-Like Conversation Branching

Data Model

Chat
 └── Branch (1:N)
      └── BranchMessageNode (1:N)
           └── Message (1:N, one per model response)

Key operations:

  • Fork: Create a new Branch from a Node, copying prior messages
  • Merge: Append source Branch's Messages to target Branch
  • Pick: Mark which Message in a Node is the "main path"
  • Force Point: Redirect a Branch's HEAD to another Branch's position

Each Branch has independent config: model list, tools, temperature, reasoning effort, context window size, etc.

This turns conversations from linear chains into tree structures — perfect for exploring multiple approaches to a problem.
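The four operations above are simple once the tree is explicit. A minimal in-memory sketch, reusing the post's entity names (`Branch`, `BranchMessageNode`, `Message`) but with assumed fields; the real versions would of course be JPA entities with persistence and config:

```kotlin
// Assumed, simplified shapes of the entities described above.
data class Message(val model: String, val content: String)
data class BranchMessageNode(
    val messages: MutableList<Message> = mutableListOf(), // one Message per model response
    var pickedIndex: Int = 0,                             // which message is the "main path"
)
data class Branch(val name: String, val nodes: MutableList<BranchMessageNode> = mutableListOf())

// Fork: new Branch containing copies of all nodes up to and including the fork point.
fun fork(source: Branch, atNodeIndex: Int, newName: String): Branch =
    Branch(newName, source.nodes.take(atNodeIndex + 1).map { node ->
        BranchMessageNode(node.messages.toMutableList(), node.pickedIndex)
    }.toMutableList())

// Merge: append the source Branch's picked (main-path) messages onto the target.
fun merge(source: Branch, target: Branch) {
    source.nodes.forEach { node ->
        target.nodes += BranchMessageNode(mutableListOf(node.messages[node.pickedIndex]))
    }
}

// Pick: mark which model's Message in a node is the main path.
fun pick(node: BranchMessageNode, index: Int) {
    node.pickedIndex = index
}
```

Copying nodes on fork (rather than sharing them) keeps branches fully independent, so picking a different message on the original branch never mutates the fork.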

3. Knowledge Extraction + Vector Search

Pipeline

  1. AI completes a response
  2. Async task extracts key insights → knowledge card (title + summary + tags)
  3. bge-m3 model generates 1024-dim vector embedding
  4. Store in PostgreSQL + pgvector
  5. Users can browse, search, and reference knowledge in the knowledge base
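Steps 3-5 map naturally onto pgvector. A hypothetical schema sketch (table and column names are my assumptions, not the actual schema):

```sql
-- Hypothetical sketch; table/column names are assumptions.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE knowledge_card (
    id        BIGSERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    summary   TEXT NOT NULL,
    tags      TEXT[],
    embedding VECTOR(1024)  -- bge-m3 output dimension
);

-- Nearest-neighbor search using pgvector's cosine-distance operator (<=>).
SELECT id, title, 1 - (embedding <=> :query_embedding) AS similarity
FROM knowledge_card
ORDER BY embedding <=> :query_embedding
LIMIT 5;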

Memory System

The AI also extracts user preferences and habits as "memories", each assigned one of six stability levels:

FIXED → VERY_STABLE → HABIT → PREFERENCE → MUTABLE → WEAK

Higher stability means longer retention. Memories are vectorized and automatically retrieved by cosine similarity on each user message, then injected into the system prompt as context.
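The retrieval step is just a top-k cosine ranking over the memory embeddings. A minimal sketch (in the real system this ranking happens inside pgvector; `Memory` and `retrieve` are hypothetical names):

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var na = 0.0; var nb = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

// Assumed shape of a stored memory: text, its embedding, and a stability level.
data class Memory(val text: String, val embedding: FloatArray, val stability: String)

// Top-k memories most similar to the current user message's embedding;
// the winners get injected into the system prompt as context.
fun retrieve(query: FloatArray, memories: List<Memory>, k: Int = 3): List<Memory> =
    memories.sortedByDescending { cosine(query, it.embedding) }.take(k)
```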

Context Compression

An optional feature replaces the full message history with AI-generated summaries, drastically reducing token costs for long conversations.

4. Semantic Search

  • Messages: Vector similarity search with similarity percentage display
  • Files: Dual mode — keyword + AI semantic search (describe what you're looking for)
  • Global: Ctrl+K across all types (chats, messages, files, apps, folders)

5. AI App Generation

Users describe an idea in natural language → AI generates a complete HTML app via SSE streaming → rendered in a sandboxed iframe. It supports version history, manual code editing, and live preview.

Just shipped discussion panels for collaborative app review.

6. Background Streaming

One feature I haven't seen in other AI chat platforms: when you close the tab, the backend keeps receiving AI responses. When you reopen the conversation, SSE auto-reconnects and you pick up where you left off.

Implementation:

  • A backgroundStreamEnabled field on the IceUser table (persistent, global toggle)
  • A Redis List stores every chunk, so nothing is lost while the client is away
  • Redis Pub/Sub broadcasts new chunks (decoupling across instances)
  • ChatStreamHub manages background stream state
  • /checkActiveStream and /resume API endpoints for reconnection
  • Frontend: resumeStream in use-chat-sse.ts handles the CATCH_UP event (replacement mode) and subsequent stream events
  • MessageList.tsx auto-detects STREAMING messages and triggers reconnection
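The resume flow above boils down to "append chunks whether or not anyone is listening; on reconnect, replay everything so far as one CATCH_UP, then go live." A stripped-down in-memory sketch of that logic (`StreamHub` mirrors the role of ChatStreamHub, with a plain list standing in for the Redis List and callbacks standing in for Pub/Sub):

```kotlin
// In-memory stand-in for the Redis-backed stream state. Not the real
// ChatStreamHub: a plain list replaces the Redis List, and direct
// callbacks replace Redis Pub/Sub.
class StreamHub {
    private val chunks = mutableMapOf<String, MutableList<String>>()
    private val live = mutableMapOf<String, MutableList<(String) -> Unit>>()

    // Backend keeps appending even if no client is connected (tab closed).
    fun append(streamId: String, chunk: String) {
        chunks.getOrPut(streamId) { mutableListOf() } += chunk
        live[streamId]?.forEach { it(chunk) } // broadcast to any live subscribers
    }

    // On reconnect: replay everything so far as one CATCH_UP (replacement
    // mode), then subscribe for subsequent chunks.
    fun resume(streamId: String, onCatchUp: (String) -> Unit, onChunk: (String) -> Unit) {
        onCatchUp(chunks[streamId].orEmpty().joinToString(""))
        live.getOrPut(streamId) { mutableListOf() } += onChunk
    }
}
```

Replaying the full buffer as a single replacement (rather than re-emitting individual chunks) keeps the client logic simple: there is no risk of duplicating chunks that arrived between the /checkActiveStream call and the subscription.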

7. Reasoning Visibility + Multi-Modal

Reasoning: For reasoning models (o1/o3), the MessageKnowledgeReasoningBlock component provides an expandable chain-of-thought display, and SseMessageCard shows reasoning content in real time during streaming.

Multi-modal: Image, audio, video input supported. AI can understand and respond across media types. Image generation with configurable sizes (1K-4K) and aspect ratios.

Model Marketplace: A dedicated page for browsing all 300+ models, built from ModelCard, ModelFilters, and ModelPagination components. Filter by provider, modality, pricing, or free-only.

Tech Stack

| Layer | Tech |
| --- | --- |
| Frontend | Next.js 16 + React 19 + TypeScript + shadcn/ui |
| Backend | Spring Boot 3.5 + Kotlin + JDK 25 (virtual threads) |
| Database | PostgreSQL + pgvector |
| Cache | Redis (Redisson) + Caffeine (local) |
| AI | Spring AI → OpenRouter |
| Embeddings | bge-m3 (1024-dim) |
| Streaming | SSE |
| i18n | next-intl (English + Chinese) |

Try it free: yeero.ai — 10,000 credits on signup, no card needed.

Happy to answer questions about any of the technical decisions!
