System Overview
Kinetiq is a global education platform with real-time collaboration and multi-language support. Current scale:
- 500+ tutors, 30 countries
- 3,500+ completed sessions
- 150+ languages supported
- 99.9% uptime in production
This post covers the technical infrastructure and key architectural decisions.
Architecture Layers
┌─────────────────────────────────────────────────────┐
│ Frontend: Next.js 15.5.4 + React 19.1.0             │
│ - Server Components for data-heavy pages            │
│ - React Query 5.90.2 for server state               │
│ - Zustand 5.0.8 for client state                    │
│ - TypeScript 5.x                                    │
└─────────────────────────────────────────────────────┘
                      ▼ HTTPS/WSS
┌─────────────────────────────────────────────────────┐
│ Backend: .NET 10 Minimal APIs                       │
│ - Dapper ORM for queries                            │
│ - SignalR for WebSocket connections                 │
│ - PostgreSQL 16 (primary data)                      │
│ - Redis 7 (cache, sessions, pub/sub)                │
└─────────────────────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────┐
│ Real-Time Services                                  │
│ - LiveKit (self-hosted WebRTC SFU)                  │
│ - Yjs CRDT (collaborative whiteboard + code editor) │
│ - Deepgram (real-time STT, <500ms latency)          │
│ - Azure Cognitive Services (translation)            │
│ - OpenAI GPT-4 + Whisper (content generation)       │
└─────────────────────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────┐
│ Infrastructure: Kubernetes + Docker                 │
│ - Dapr 1.16 (distributed app runtime)               │
│ - Cloudflare R2 (video/file storage)                │
│ - Serilog → Seq (structured logging)                │
│ - Zipkin (distributed tracing)                      │
└─────────────────────────────────────────────────────┘
Key Technical Decisions
1. .NET 10 Minimal APIs Over Node.js
Decision: Use .NET 10 for backend instead of Node.js/Express.
Rationale:
- Performance: TechEmpower benchmarks show .NET 10 Minimal APIs at 7M req/s vs Node.js (Fastify) at 1.2M req/s
- Concurrency: SignalR handles 100K+ concurrent WebSocket connections on a single instance
- CPU-intensive workloads: Better for AI prompt processing and real-time translation orchestration
- Memory: Lower memory footprint under load than an equivalent Node.js service
Production metrics:
- 100K concurrent WebSocket connections at 40% CPU
- P50 API response: 8ms
- P95 API response: 45ms
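To make the backend shape concrete, here is a minimal sketch of how a Minimal API endpoint, a Dapper query, and a SignalR hub with the Redis backplane fit together. The names (SessionHub, SessionDto, the /api/sessions route, the connection-string keys) are illustrative, not the actual Kinetiq code.

// Program.cs -- minimal sketch with illustrative names
using Dapper;
using Microsoft.AspNetCore.SignalR;
using Npgsql;

var builder = WebApplication.CreateBuilder(args);

// SignalR with a Redis backplane so WebSocket traffic fans out across instances
builder.Services.AddSignalR()
    .AddStackExchangeRedis(builder.Configuration.GetConnectionString("Redis")!);

// Pooled PostgreSQL connections, queried with Dapper
builder.Services.AddNpgsqlDataSource(builder.Configuration.GetConnectionString("Postgres")!);

var app = builder.Build();

// Minimal API endpoint: fetch a session by id with a hand-written SQL query
app.MapGet("/api/sessions/{id:guid}", async (Guid id, NpgsqlDataSource db) =>
{
    await using var conn = await db.OpenConnectionAsync();
    var session = await conn.QuerySingleOrDefaultAsync<SessionDto>(
        "SELECT id, tutor_id AS TutorId, started_at AS StartedAt FROM sessions WHERE id = @id",
        new { id });
    return session is null ? Results.NotFound() : Results.Ok(session);
});

app.MapHub<SessionHub>("/hubs/session"); // real-time session events over WebSockets

app.Run();

record SessionDto(Guid Id, Guid TutorId, DateTime StartedAt);

class SessionHub : Hub
{
    // Clients join a per-session group so events are broadcast to one room only
    public Task JoinSession(string sessionId) =>
        Groups.AddToGroupAsync(Context.ConnectionId, sessionId);
}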
2. Yjs CRDT for Collaborative Editing
Decision: Use Yjs (Conflict-free Replicated Data Types) for real-time collaboration.
Rationale:
- Conflict-free: Multiple users can edit simultaneously without merge conflicts
- Eventual consistency: All clients converge to identical state
- Offline-capable: Operations queue locally and sync when reconnected
- Performance: Operations are lightweight, minimal network overhead
Implementation:
Whiteboard (tldraw):
import { Tldraw, createTLStore } from 'tldraw'
import { YjsEditor } from '@tldraw/yjs'
import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'

// Shared Yjs document, synced over y-websocket for the current room
const doc = new Y.Doc()
const provider = new WebsocketProvider('wss://kinetiq.one/sync', roomId, doc)

// Bind the tldraw store to a shared Y.Map so edits replicate to every client
const store = createTLStore()
new YjsEditor(store, doc.getMap('tldraw'))
Code editor (Monaco):
import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'
import { MonacoBinding } from 'y-monaco'

// Shared text type backing the Monaco model
const doc = new Y.Doc()
const yText = doc.getText('monaco')
const provider = new WebsocketProvider('wss://kinetiq.one/sync', roomId, doc)

// Bind the shared text to an existing Monaco editor instance (`editor`);
// cursors and selections are shared via the provider's awareness API
new MonacoBinding(yText, editor.getModel(), new Set([editor]), provider.awareness)
Production metrics:
- Zero sync conflicts since deployment (6 months, 3,500+ sessions)
- P50 sync latency: 12ms
- P95 sync latency: 85ms (cross-continent)
3. Self-Hosted LiveKit Over Managed Services
Decision: Self-host LiveKit WebRTC SFU instead of using Twilio/Agora/Vonage.
Rationale:
- Cost: Managed services charge $0.004-0.015/minute/participant; at our session volume that adds up to $2K+/month
- Control: Full control over SFU configuration, recording format, storage location
- Scalability: Horizontal scaling in Kubernetes with auto-healing
- Integration: Direct recording to Cloudflare R2 (zero egress fees)
Infrastructure:
- LiveKit deployed as Kubernetes StatefulSet
- Auto-scaling based on active rooms
- Health checks + liveness probes
- Recording pipeline: LiveKit → Cloudflare R2 → Deepgram/Whisper
Cost comparison:
- Managed service: ~$2,000/month
- Self-hosted (Kubernetes + compute): ~$200/month
- Savings: $1,800/month
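On the integration side, the backend issues the room-join tokens that clients present to the SFU. The sketch below hand-rolls one with System.IdentityModel.Tokens.Jwt, following LiveKit's documented access-token format (HS256-signed JWT, iss = API key, sub = participant identity, a "video" grant object); in practice a LiveKit server SDK does the same job, and the class and method names here are illustrative.

// Illustrative sketch: mint a LiveKit room-join token on the backend
using System.IdentityModel.Tokens.Jwt;
using System.Security.Claims;
using System.Text;
using System.Text.Json;
using Microsoft.IdentityModel.Tokens;

static class LiveKitTokens
{
    public static string CreateRoomToken(string apiKey, string apiSecret, string room, string identity)
    {
        var signingKey = new SymmetricSecurityKey(Encoding.UTF8.GetBytes(apiSecret));
        var credentials = new SigningCredentials(signingKey, SecurityAlgorithms.HmacSha256);

        // "video" grant: which room the participant may join and what they may do there
        var videoGrant = JsonSerializer.Serialize(new
        {
            room,
            roomJoin = true,
            canPublish = true,
            canSubscribe = true
        });

        var token = new JwtSecurityToken(
            issuer: apiKey, // iss = LiveKit API key
            claims: new[]
            {
                new Claim(JwtRegisteredClaimNames.Sub, identity),
                new Claim("video", videoGrant, JsonClaimValueTypes.Json)
            },
            notBefore: DateTime.UtcNow,
            expires: DateTime.UtcNow.AddHours(2), // short-lived: one tutoring session
            signingCredentials: credentials);

        return new JwtSecurityTokenHandler().WriteToken(token);
    }
}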
4. PostgreSQL + Redis Architecture
Decision: PostgreSQL 16 as primary database, Redis 7 for caching and real-time features.
PostgreSQL usage:
- User data, courses, sessions
- Full-text search with GIN indexes
- JSONB for flexible schemas (course content, user preferences)
Redis usage:
- Session storage (distributed across instances)
- Translation cache (85% hit rate, saves $680/month on Azure API calls)
- Real-time presence (online users, typing indicators)
- Rate limiting (sliding window algorithm; sketched after this list)
- Pub/Sub for SignalR backplane (multi-instance WebSocket)
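To illustrate the rate-limiting entry above: a sliding window maps naturally onto a Redis sorted set, with request timestamps as scores. The sketch below uses StackExchange.Redis; the class name, key prefix, and default limits are illustrative, and a production version would wrap the remove/count/add steps in a Lua script so they execute atomically.

// Sliding-window rate limiter backed by a Redis sorted set (illustrative names/limits)
using StackExchange.Redis;

public sealed class SlidingWindowRateLimiter
{
    private readonly IDatabase _redis;
    private readonly int _limit;
    private readonly TimeSpan _window;

    public SlidingWindowRateLimiter(IConnectionMultiplexer mux, int limit = 100, TimeSpan? window = null)
    {
        _redis = mux.GetDatabase();
        _limit = limit;
        _window = window ?? TimeSpan.FromMinutes(1);
    }

    public async Task<bool> IsAllowedAsync(string clientId)
    {
        var key = $"ratelimit:{clientId}";
        var now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
        var windowStart = now - (long)_window.TotalMilliseconds;

        // Drop requests that fell out of the window, then count what is left
        await _redis.SortedSetRemoveRangeByScoreAsync(key, 0, windowStart);
        var count = await _redis.SortedSetLengthAsync(key);

        if (count >= _limit)
            return false; // over the limit for this window

        // Record this request and keep the key from outliving the window
        await _redis.SortedSetAddAsync(key, $"{now}:{Guid.NewGuid():N}", now);
        await _redis.KeyExpireAsync(key, _window);
        return true;
    }
}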
Query optimization example:
-- Tutor search with full-text + filters
CREATE INDEX idx_tutors_search ON tutors USING GIN(to_tsvector('english', name || ' ' || bio));
CREATE INDEX idx_tutors_skills ON tutors USING GIN(skills);
-- Query time: 8s → 40ms after indexing
SELECT * FROM tutors
WHERE to_tsvector('english', name || ' ' || bio) @@ to_tsquery('python & react')
AND skills @> ARRAY['typescript', 'node.js']
LIMIT 20;
5. Translation Caching Strategy
Problem: The Azure Translator API costs $10 per 1M characters. At our volume that was $800/month, much of it spent re-translating the same phrases.
Solution: Redis caching with intelligent key strategy.
// C# caching implementation
public async Task<string> TranslateAsync(string text, string targetLang)
{
    var cacheKey = $"translate:{ComputeHash(text)}:{targetLang}";

    // Check cache first
    var cached = await _redis.StringGetAsync(cacheKey);
    if (cached.HasValue)
        return cached.ToString();

    // Miss: call Azure API
    var translated = await _azureTranslator.TranslateAsync(text, targetLang);

    // Cache for 7 days
    await _redis.StringSetAsync(cacheKey, translated, TimeSpan.FromDays(7));
    return translated;
}
Results:
- Cache hit rate: 85%
- API calls reduced from 8M/month to 1.2M/month
- Cost: $800/month → $120/month
- P50 latency: 450ms → 15ms (cache hit)
Production Metrics
Uptime & Reliability:
- 99.9% uptime (Kubernetes auto-healing)
- Zero-downtime deployments (rolling updates)
- Mean time to recovery: <2 minutes
Performance:
- API P50: 8ms, P95: 45ms
- WebSocket sync P50: 12ms, P95: 85ms
- Translation P50: 15ms (cached), P95: 450ms
- Full-text search: 40ms average
Scale:
- 100K+ concurrent WebSocket connections supported
- 500 concurrent tutoring sessions (peak)
- 3,500+ completed sessions
- 150+ languages supported
Cost Efficiency:
- Infrastructure: ~$400/month (compute + storage + CDN)
- LiveKit self-hosted: $200/month vs $2K managed
- Translation caching: $120/month vs $800 without cache
- Total savings: ~$2,500/month vs managed alternatives
Technology Stack
Frontend:
- Next.js 15.5.4 (App Router, Server Components)
- React 19.1.0
- TypeScript 5.x
- Tailwind CSS 4.0
- React Query 5.90.2
- Zustand 5.0.8
Backend:
- .NET 10 (Minimal APIs)
- Dapper ORM
- SignalR (WebSockets)
- PostgreSQL 16
- Redis 7
Real-Time:
- LiveKit (self-hosted SFU)
- Yjs CRDT + y-websocket
- tldraw 2.4.6 (whiteboard)
- Monaco Editor + y-monaco (code editor)
AI/ML:
- OpenAI GPT-4 (content generation)
- Whisper (transcription)
- Deepgram (real-time STT)
- Azure Cognitive Services (translation)
Infrastructure:
- Kubernetes + Docker
- Dapr 1.16 (distributed application runtime)
- Cloudflare R2 (object storage)
- Serilog + Seq (logging)
- Zipkin (distributed tracing)
Read More
Full case study with implementation details, challenges, and lessons learned:
wojciechowski.app/en/articles/kinetiq-case-study
Portfolio: wojciechowski.app
Questions about the architecture? Drop a comment.