I've been working on YeeroAI — an AI conversation platform with a few ideas I haven't seen elsewhere. Let me walk through the key technical decisions.
## The Problem
Most AI chat tools treat conversations as disposable. You close the tab, and the knowledge is gone. I wanted to build something where:
- Every conversation accumulates knowledge
- You can compare multiple AI models at once
- Conversations can branch like Git
- Everything is semantically searchable
## 1. Multi-Model Parallel Chat

### Architecture
```
User Message
├── Model A (GPT-4o) ──→ SSE Stream A
├── Model B (Claude) ──→ SSE Stream B
└── Model C (Gemini) ──→ SSE Stream C
```
We use OpenRouter as a unified API to access 300+ models. Each model request runs on a virtual thread (JDK 25), and responses stream back via SSE.
Frontend: an `EventSource` listener per model, with independent state tracking (preparing → reasoning → streaming → done/error/cancelled).
Backend: Spring Boot's `SseEmitter` with virtual threads for parallel execution. First token arrives in under 3 seconds.
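As a sketch of the frontend side, each model's stream can be reduced to a small state machine over the states listed above. This is illustrative TypeScript, not YeeroAI's actual code; the `advance` helper and the `ModelStream` shape are assumptions.

```typescript
// Per-model stream state, using the state names from the post.
type StreamState =
  | "preparing" | "reasoning" | "streaming"
  | "done" | "error" | "cancelled";

interface ModelStream {
  model: string;
  state: StreamState;
  text: string; // accumulated streamed content
}

// Allowed forward transitions; terminal states accept no further events.
const NEXT: Record<StreamState, StreamState[]> = {
  preparing: ["reasoning", "streaming", "error", "cancelled"],
  reasoning: ["streaming", "error", "cancelled"],
  streaming: ["done", "error", "cancelled"],
  done: [], error: [], cancelled: [],
};

// Apply one SSE event to a model's stream; out-of-order or
// post-terminal events are ignored so each model stays independent.
function advance(s: ModelStream, to: StreamState, chunk = ""): ModelStream {
  const valid =
    NEXT[s.state].includes(to) ||
    (to === "streaming" && s.state === "streaming"); // repeated chunk events
  if (!valid) return s;
  return { ...s, state: to, text: s.text + chunk };
}
```

Because each model owns its own `ModelStream`, one model erroring or being cancelled never touches the others' streams.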
## 2. Git-Like Conversation Branching

### Data Model
```
Chat
└── Branch (1:N)
    └── BranchMessageNode (1:N)
        └── Message (1:N, one per model response)
```
Key operations:
- Fork: Create a new Branch from a Node, copying prior messages
- Merge: Append source Branch's Messages to target Branch
- Pick: Mark which Message in a Node is the "main path"
- Force Point: Redirect a Branch's HEAD to another Branch's position
Each Branch has independent config: model list, tools, temperature, reasoning effort, context window size, etc.
This turns conversations from linear chains into tree structures — perfect for exploring multiple approaches to a problem.
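A minimal sketch of the fork operation under the data model above, in TypeScript. The type shapes and the `fork` signature are illustrative assumptions; the real schema lives in PostgreSQL.

```typescript
interface Message { id: string; model: string; content: string; }

interface BranchMessageNode {
  id: string;
  messages: Message[];       // one per model response
  pickedMessageId?: string;  // the "main path" choice
}

interface Branch {
  id: string;
  nodes: BranchMessageNode[];
  config: { models: string[]; temperature: number }; // per-branch config
}

// Fork: copy all nodes up to and including the fork point into a new
// branch, which then evolves independently with its own config.
function fork(source: Branch, forkNodeId: string, newId: string): Branch {
  const idx = source.nodes.findIndex(n => n.id === forkNodeId);
  if (idx < 0) throw new Error("fork node not in branch");
  return {
    id: newId,
    nodes: source.nodes
      .slice(0, idx + 1)
      .map(n => ({ ...n, messages: [...n.messages] })),
    config: { ...source.config, models: [...source.config.models] },
  };
}
```

Merge and Pick fall out of the same shapes: merge appends one branch's nodes to another, and pick just sets `pickedMessageId` on a node.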
## 3. Knowledge Extraction + Vector Search

### Pipeline
- AI completes a response
- Async task extracts key insights → knowledge card (title + summary + tags)
- bge-m3 model generates 1024-dim vector embedding
- Store in PostgreSQL + pgvector
- Users can browse, search, and reference knowledge in the knowledge base
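The card shape implied by the pipeline can be sketched like this. `KnowledgeCard` and `validateCard` are hypothetical names; only the title/summary/tags structure and the 1024-dim bge-m3 constraint come from the post.

```typescript
// Illustrative shape for an extracted knowledge card plus its embedding.
interface KnowledgeCard {
  title: string;
  summary: string;
  tags: string[];
  embedding: number[]; // bge-m3 output, stored in pgvector
}

const BGE_M3_DIM = 1024;

// Guard inserted rows against dimension mismatches: pgvector columns are
// declared with a fixed dimension, so a wrong-sized vector fails at write time.
function validateCard(card: KnowledgeCard): KnowledgeCard {
  if (card.embedding.length !== BGE_M3_DIM) {
    throw new Error(
      `expected ${BGE_M3_DIM}-dim embedding, got ${card.embedding.length}`
    );
  }
  return card;
}
```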
### Memory System
AI also extracts user preferences and habits as "memories" with 6 stability levels:
FIXED → VERY_STABLE → HABIT → PREFERENCE → MUTABLE → WEAK
Higher stability = longer retention. Memories are vectorized and auto-retrieved by cosine similarity on each user message, then injected into the system prompt as context.
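The retrieval step can be sketched as plain cosine similarity over the stored memory vectors (the embedding call itself is omitted). `topMemories` is an illustrative helper, not the production query, which presumably runs inside pgvector.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored memories by similarity to the embedded user message and
// keep the top k for injection into the system prompt.
function topMemories<T extends { vector: number[] }>(
  query: number[],
  memories: T[],
  k: number
): T[] {
  return [...memories]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}
```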
### Context Compression
An optional feature replaces the full message history with AI-generated summaries, which drastically reduces token costs in long conversations.
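The idea can be sketched as: once the history grows past a budget, collapse everything but the most recent messages into a single model-written summary. `compress` and its parameters are assumptions, not YeeroAI's actual algorithm.

```typescript
interface Msg { role: "system" | "user" | "assistant"; content: string; }

// Replace all but the last `keep` messages with one summary message.
// The summary text itself would be produced by the model elsewhere.
function compress(history: Msg[], summary: string, keep: number): Msg[] {
  if (history.length <= keep) return history;
  return [
    { role: "system", content: `Summary of earlier conversation: ${summary}` },
    ...history.slice(-keep),
  ];
}
```

The token savings scale with conversation length: every compacted turn is paid for once (when summarized) instead of on every subsequent request.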
## 4. Semantic Search
- Messages: Vector similarity search with similarity percentage display
- Files: Dual mode — keyword + AI semantic search (describe what you're looking for)
- Global: Ctrl+K across all types (chats, messages, files, apps, folders)
## 5. AI App Generation
Users describe an idea in natural language → AI generates a complete HTML app via SSE streaming → rendered in a sandboxed iframe. Version history, manual code editing, and live preview.
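A minimal sketch of the sandboxing step: the generated HTML is embedded via `srcdoc` with a `sandbox` attribute that omits `allow-same-origin`, so generated scripts can run but cannot reach the host page's DOM, cookies, or storage. The helper below is illustrative, not the actual renderer.

```typescript
// Wrap generated app HTML in a sandboxed iframe tag. Minimal attribute
// escaping only: & and " must be escaped inside the srcdoc attribute value.
function sandboxedFrameHtml(appHtml: string): string {
  const escaped = appHtml
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;");
  // allow-scripts without allow-same-origin keeps the app in an opaque origin.
  return `<iframe sandbox="allow-scripts" srcdoc="${escaped}"></iframe>`;
}
```

Leaving out `allow-same-origin` is the key choice: combining it with `allow-scripts` would let the embedded code remove its own sandbox.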
Just shipped discussion panels for collaborative app review.
## 6. Background Streaming
One feature I haven't seen in other AI chat platforms: when you close the tab, the backend keeps receiving AI responses. When you reopen the conversation, SSE auto-reconnects and you pick up where you left off.
Implementation:
- `backgroundStreamEnabled` field on the `IceUser` table (persistent global toggle)
- Redis List stores every chunk (guaranteed no data loss)
- Redis Pub/Sub broadcasts new chunks (multi-instance decoupling)
- `ChatStreamHub` manages background stream state
- `/checkActiveStream` + `/resume` API endpoints for reconnection
- Frontend: `resumeStream` in `use-chat-sse.ts` handles CATCH_UP (replacement mode) and subsequent stream events
- `MessageList.tsx` auto-detects STREAMING messages and triggers reconnection
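The client side of this resume flow can be sketched as a small reducer: a CATCH_UP event replaces the local buffer with everything replayed from the Redis List, and subsequent chunk events append. Apart from CATCH_UP, the event and field names here are assumptions.

```typescript
// Illustrative event shape for the resumed SSE stream.
interface StreamEvent {
  type: "CATCH_UP" | "CHUNK" | "DONE";
  data: string;
}

// CATCH_UP replaces the buffer (replacement mode) so any partial text
// rendered before the tab was closed is discarded in favor of the
// authoritative replay; CHUNK appends; DONE finalizes.
function applyEvent(buffer: string, ev: StreamEvent): string {
  switch (ev.type) {
    case "CATCH_UP":
      return ev.data;
    case "CHUNK":
      return buffer + ev.data;
    case "DONE":
      return buffer;
  }
}
```

Replacement mode is what makes reconnection idempotent: however stale the client's buffer is, one CATCH_UP brings it to the server's exact state.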
## 7. Reasoning Visibility + Multi-Modal
Reasoning: For reasoning models (o1/o3), the `MessageKnowledgeReasoningBlock` component renders an expandable chain-of-thought display, and `SseMessageCard` shows reasoning content in real time during streaming.
Multi-modal: Image, audio, video input supported. AI can understand and respond across media types. Image generation with configurable sizes (1K-4K) and aspect ratios.
Model Marketplace: Independent page to browse all 300+ models, built from `ModelCard`, `ModelFilters`, and `ModelPagination` components. Filter by provider, modality, pricing, or free-only.
## Tech Stack
| Layer | Tech |
|---|---|
| Frontend | Next.js 16 + React 19 + TypeScript + shadcn/ui |
| Backend | Spring Boot 3.5 + Kotlin + JDK 25 (virtual threads) |
| Database | PostgreSQL + pgvector |
| Cache | Redis (Redisson) + Caffeine (local) |
| AI | Spring AI → OpenRouter |
| Embeddings | bge-m3 (1024-dim) |
| Streaming | SSE |
| i18n | next-intl (English + Chinese) |
Try it free: yeero.ai — 10,000 credits on signup, no card needed.
Happy to answer questions about any of the technical decisions!