Mr.Pan
How I Built an AI Chat Platform with Multi-Model Comparison and Git-Like Branching

I've been working on YeeroAI — an AI conversation platform with a few ideas I haven't seen elsewhere. Let me walk through the key technical decisions.

The Problem

Most AI chat tools treat conversations as disposable. You close the tab, and the knowledge is gone. I wanted to build something where:

  1. Every conversation accumulates knowledge
  2. You can compare multiple AI models at once
  3. Conversations can branch like Git
  4. Everything is semantically searchable

1. Multi-Model Parallel Chat

Architecture

User Message
    ├── Model A (GPT-4o) ──→ SSE Stream A
    ├── Model B (Claude)  ──→ SSE Stream B
    └── Model C (Gemini)  ──→ SSE Stream C

We use OpenRouter as a unified API to access 300+ models. Each model request runs on a virtual thread (JDK 25), and responses stream back via SSE.

Frontend: one EventSource listener per model, each with independent state tracking (preparing → reasoning → streaming → done/error/cancelled).

Backend: Spring Boot's SseEmitter with virtual threads for parallel execution. The first token typically arrives in under 3 seconds.
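The fan-out itself is straightforward once each model call gets its own thread. Here's a minimal, framework-free sketch of the pattern; `fanOut`, `callModel`, and `onToken` are hypothetical names standing in for the OpenRouter call and the `SseEmitter.send` side (the real stack would use `Executors.newVirtualThreadPerTaskExecutor()` on JDK 25):

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors

// Sketch: send one prompt to several models in parallel, streaming each
// model's tokens back through a callback as they arrive.
fun fanOut(
    prompt: String,
    models: List<String>,
    callModel: (model: String, prompt: String) -> Sequence<String>, // stand-in for the OpenRouter call
    onToken: (model: String, token: String) -> Unit,                // stand-in for SseEmitter.send
): Map<String, String> {
    val results = ConcurrentHashMap<String, String>()
    // In the real stack: Executors.newVirtualThreadPerTaskExecutor() (JDK 25).
    val pool = Executors.newFixedThreadPool(models.size)
    try {
        models.map { model ->
            pool.submit {
                val full = StringBuilder()
                callModel(model, prompt).forEach { token ->
                    onToken(model, token) // stream chunk to the client immediately
                    full.append(token)
                }
                results[model] = full.toString() // keep the final text for persistence
            }
        }.forEach { it.get() } // wait for all models to finish
    } finally {
        pool.shutdown()
    }
    return results
}
```

Because every model runs on its own thread, a slow model never blocks a fast one; the frontend's per-model EventSource state machines stay independent for the same reason.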

2. Git-Like Conversation Branching

Data Model

Chat
 └── Branch (1:N)
      └── BranchMessageNode (1:N)
           └── Message (1:N, one per model response)

Key operations:

  • Fork: Create a new Branch from a Node, copying prior messages
  • Merge: Append source Branch's Messages to target Branch
  • Pick: Mark which Message in a Node is the "main path"
  • Force Point: Redirect a Branch's HEAD to another Branch's position

Each Branch has independent config: model list, tools, temperature, reasoning effort, context window size, etc.

This turns conversations from linear chains into tree structures — perfect for exploring multiple approaches to a problem.
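The four operations above are simple once the tree is explicit. A minimal in-memory sketch, reusing the post's entity names (`Branch`, `BranchMessageNode`, `Message`) but with assumed fields; the real versions would of course be JPA entities with persistence and config:

```kotlin
// Assumed, simplified shapes of the entities described above.
data class Message(val model: String, val content: String)
data class BranchMessageNode(
    val messages: MutableList<Message> = mutableListOf(), // one Message per model response
    var pickedIndex: Int = 0,                             // which message is the "main path"
)
data class Branch(val name: String, val nodes: MutableList<BranchMessageNode> = mutableListOf())

// Fork: new Branch containing copies of all nodes up to and including the fork point.
fun fork(source: Branch, atNodeIndex: Int, newName: String): Branch =
    Branch(newName, source.nodes.take(atNodeIndex + 1).map { node ->
        BranchMessageNode(node.messages.toMutableList(), node.pickedIndex)
    }.toMutableList())

// Merge: append the source Branch's picked (main-path) messages onto the target.
fun merge(source: Branch, target: Branch) {
    source.nodes.forEach { node ->
        target.nodes += BranchMessageNode(mutableListOf(node.messages[node.pickedIndex]))
    }
}

// Pick: mark which model's Message in a node is the main path.
fun pick(node: BranchMessageNode, index: Int) {
    node.pickedIndex = index
}
```

Copying nodes on fork (rather than sharing them) keeps branches fully independent, so picking a different message on the original branch never mutates the fork.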

3. Knowledge Extraction + Vector Search

Pipeline

  1. AI completes a response
  2. Async task extracts key insights → knowledge card (title + summary + tags)
  3. bge-m3 model generates 1024-dim vector embedding
  4. Store in PostgreSQL + pgvector
  5. Users can browse, search, and reference knowledge in the knowledge base
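Steps 3-5 map naturally onto pgvector. A hypothetical schema sketch (table and column names are my assumptions, not the actual schema):

```sql
-- Hypothetical sketch; table/column names are assumptions.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE knowledge_card (
    id        BIGSERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    summary   TEXT NOT NULL,
    tags      TEXT[],
    embedding VECTOR(1024)  -- bge-m3 output dimension
);

-- Nearest-neighbor search using pgvector's cosine-distance operator (<=>).
SELECT id, title, 1 - (embedding <=> :query_embedding) AS similarity
FROM knowledge_card
ORDER BY embedding <=> :query_embedding
LIMIT 5;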

Memory System

The AI also extracts user preferences and habits as "memories", each assigned one of six stability levels:

FIXED → VERY_STABLE → HABIT → PREFERENCE → MUTABLE → WEAK

Higher stability means longer retention. Memories are vectorized and automatically retrieved by cosine similarity on each user message, then injected into the system prompt as context.
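The retrieval step is just a top-k cosine ranking over the memory embeddings. A minimal sketch (in the real system this ranking happens inside pgvector; `Memory` and `retrieve` are hypothetical names):

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var na = 0.0; var nb = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

// Assumed shape of a stored memory: text, its embedding, and a stability level.
data class Memory(val text: String, val embedding: FloatArray, val stability: String)

// Top-k memories most similar to the current user message's embedding;
// the winners get injected into the system prompt as context.
fun retrieve(query: FloatArray, memories: List<Memory>, k: Int = 3): List<Memory> =
    memories.sortedByDescending { cosine(query, it.embedding) }.take(k)
```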

Context Compression

An optional feature replaces the full message history with AI-generated summaries, drastically reducing token costs for long conversations.

4. Semantic Search

  • Messages: Vector similarity search with similarity percentage display
  • Files: Dual mode — keyword + AI semantic search (describe what you're looking for)
  • Global: Ctrl+K across all types (chats, messages, files, apps, folders)

5. AI App Generation

Users describe an idea in natural language → AI generates a complete HTML app via SSE streaming → rendered in a sandboxed iframe. It supports version history, manual code editing, and live preview.

Just shipped discussion panels for collaborative app review.

6. Background Streaming

One feature I haven't seen in other AI chat platforms: when you close the tab, the backend keeps receiving AI responses. When you reopen the conversation, SSE auto-reconnects and you pick up where you left off.

Implementation:

  • A backgroundStreamEnabled field on the IceUser table (persistent, global toggle)
  • A Redis List stores every chunk, so nothing is lost while the client is away
  • Redis Pub/Sub broadcasts new chunks (decoupling across instances)
  • ChatStreamHub manages background stream state
  • /checkActiveStream and /resume API endpoints for reconnection
  • Frontend: resumeStream in use-chat-sse.ts handles the CATCH_UP event (replacement mode) and subsequent stream events
  • MessageList.tsx auto-detects STREAMING messages and triggers reconnection
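The resume flow above boils down to "append chunks whether or not anyone is listening; on reconnect, replay everything so far as one CATCH_UP, then go live." A stripped-down in-memory sketch of that logic (`StreamHub` mirrors the role of ChatStreamHub, with a plain list standing in for the Redis List and callbacks standing in for Pub/Sub):

```kotlin
// In-memory stand-in for the Redis-backed stream state. Not the real
// ChatStreamHub: a plain list replaces the Redis List, and direct
// callbacks replace Redis Pub/Sub.
class StreamHub {
    private val chunks = mutableMapOf<String, MutableList<String>>()
    private val live = mutableMapOf<String, MutableList<(String) -> Unit>>()

    // Backend keeps appending even if no client is connected (tab closed).
    fun append(streamId: String, chunk: String) {
        chunks.getOrPut(streamId) { mutableListOf() } += chunk
        live[streamId]?.forEach { it(chunk) } // broadcast to any live subscribers
    }

    // On reconnect: replay everything so far as one CATCH_UP (replacement
    // mode), then subscribe for subsequent chunks.
    fun resume(streamId: String, onCatchUp: (String) -> Unit, onChunk: (String) -> Unit) {
        onCatchUp(chunks[streamId].orEmpty().joinToString(""))
        live.getOrPut(streamId) { mutableListOf() } += onChunk
    }
}
```

Replaying the full buffer as a single replacement (rather than re-emitting individual chunks) keeps the client logic simple: there is no risk of duplicating chunks that arrived between the /checkActiveStream call and the subscription.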

7. Reasoning Visibility + Multi-Modal

Reasoning: For reasoning models (o1/o3), the MessageKnowledgeReasoningBlock component provides an expandable chain-of-thought display, and SseMessageCard shows reasoning content in real time during streaming.

Multi-modal: Image, audio, video input supported. AI can understand and respond across media types. Image generation with configurable sizes (1K-4K) and aspect ratios.

Model Marketplace: A dedicated page for browsing all 300+ models, built from ModelCard, ModelFilters, and ModelPagination components. Filter by provider, modality, pricing, or free-only.

Tech Stack

| Layer | Tech |
| --- | --- |
| Frontend | Next.js 16 + React 19 + TypeScript + shadcn/ui |
| Backend | Spring Boot 3.5 + Kotlin + JDK 25 (virtual threads) |
| Database | PostgreSQL + pgvector |
| Cache | Redis (Redisson) + Caffeine (local) |
| AI | Spring AI → OpenRouter |
| Embeddings | bge-m3 (1024-dim) |
| Streaming | SSE |
| i18n | next-intl (English + Chinese) |

Try it free: yeero.ai — 10,000 credits on signup, no card needed.

Happy to answer questions about any of the technical decisions!
