Modern apps are expected to feel live.
Chat apps, AI products, collaborative tools, notifications, activity feeds — they all need fast writes, real-time updates, and storage that does not become expensive as data grows.
But the usual backend stack gets complicated fast.
You store data in one system.
You push live updates with another.
You archive old data somewhere cheaper.
You add extra glue code to keep everything in sync.
That complexity is exactly what pushed me to build KalamDB.
KalamDB is an open-source, SQL-first real-time database designed around a simple idea: keep recent data fast, move older data to cheaper storage, and let developers subscribe to changes directly from the database. The project explicitly targets speed and efficiency, with the goal of reducing CPU, memory, storage, and infrastructure costs.
At the core of KalamDB is a two-tier storage model.
New writes first land in RocksDB hot storage, which gives very fast acknowledgments and keeps recent data close for quick reads. Later, that data is flushed into columnar Parquet files in filesystem or object storage, where it becomes much cheaper to keep for the long term. Queries can read across both tiers, so applications get fresh data from hot storage and older history from cold storage without needing separate systems.
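The read path described above can be sketched conceptually. This is an illustrative model, not KalamDB's actual implementation: two in-memory maps stand in for RocksDB (hot) and Parquet files (cold), writes land in the hot tier, a flush moves data down, and reads check both tiers.

```typescript
// Conceptual sketch of a two-tier read/write path (not KalamDB's real code):
// recent rows live in a fast "hot" store, older rows in a cheap "cold" store,
// and a read checks hot first, then falls back to cold.

type Row = { id: number; content: string };

class TwoTierStore {
  private hot = new Map<number, Row>();  // stands in for RocksDB hot storage
  private cold = new Map<number, Row>(); // stands in for Parquet cold storage

  write(row: Row): void {
    this.hot.set(row.id, row); // new writes land in hot storage
  }

  // Flush everything currently in hot storage down to the cold tier.
  flush(): void {
    for (const [id, row] of this.hot) this.cold.set(id, row);
    this.hot.clear();
  }

  // Reads see both tiers, newest tier first.
  read(id: number): Row | undefined {
    return this.hot.get(id) ?? this.cold.get(id);
  }
}
```

The key property the sketch demonstrates is that a flush changes where a row lives, not whether it is visible: queries keep returning the same data before and after the move.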
KalamDB also supports three table types, and that matters a lot for cost and design.
USER tables isolate data per user and flush each user’s partition into separate Parquet files. That makes them a strong fit for chat history, tenant data, and privacy-sensitive workloads.
SHARED tables store global data once for everyone, which works well for configuration data, reference datasets, and cross-user analytics.
STREAM tables are designed for short-lived real-time events like typing indicators, presence, AI thinking signals, or notifications. They stay in memory or RocksDB hot storage, use TTL-based eviction, and are never flushed to Parquet.
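The difference between the three table types comes down to storage policy. The sketch below is illustrative only (it is not KalamDB's API): it encodes which properties each type has, based on the descriptions above.

```typescript
// Illustrative sketch (not KalamDB's actual API) of how the three table
// types differ in storage policy: whether rows are partitioned per user,
// whether they ever reach Parquet cold storage, and whether they expire
// from hot storage via a TTL.

type TableType = 'USER' | 'SHARED' | 'STREAM';

interface StoragePolicy {
  partitionedPerUser: boolean; // separate Parquet files per user?
  flushesToParquet: boolean;   // does data reach cold storage at all?
  ttlEviction: boolean;        // evicted from hot storage after a TTL?
}

function policyFor(type: TableType): StoragePolicy {
  switch (type) {
    case 'USER':   // chat history, tenant data, privacy-sensitive workloads
      return { partitionedPerUser: true, flushesToParquet: true, ttlEviction: false };
    case 'SHARED': // configuration, reference datasets, cross-user analytics
      return { partitionedPerUser: false, flushesToParquet: true, ttlEviction: false };
    case 'STREAM': // typing indicators, presence, short-lived notifications
      return { partitionedPerUser: false, flushesToParquet: false, ttlEviction: true };
  }
}
```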
Another big part of the design is the table-per-user architecture.
Instead of putting all users into one giant shared table and filtering constantly, KalamDB stores each user’s data in isolated partitions. That makes real-time subscriptions simpler and helps the system scale more naturally for workloads like conversations, notifications, and AI history. The idea is to reduce complexity while keeping performance predictable as the number of users grows.
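One way to picture the table-per-user layout: each user's rows map to their own partition, so a subscription or flush only touches that partition instead of scanning a shared table. The path format below is purely hypothetical, not KalamDB's actual on-disk layout.

```typescript
// Hypothetical sketch of table-per-user partitioning: instead of one giant
// table filtered by user_id on every query, each user's rows resolve to
// their own partition. The path scheme here is illustrative only.

function partitionPath(namespace: string, table: string, userId: string): string {
  return `${namespace}/${table}/user=${userId}`;
}
```

Because two users never share a partition, a per-user subscription needs no cross-user filtering, and flushing one user's data never rewrites anyone else's files.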
## What it feels like to use KalamDB
Instead of constantly polling the database for updates, applications can subscribe to changes directly from SQL.
For example, a chat application can subscribe to new messages in a conversation:
```sql
SUBSCRIBE TO chat.messages
WHERE conversation_id = 1
OPTIONS (last_rows = 20);
```
Whenever a new row is inserted into the table, the client instantly receives the update through a WebSocket connection.
The same idea can be used from application code using the JavaScript SDK:
```javascript
import { createClient, Auth, SeqId } from 'kalam-link';

const client = createClient({
  url: 'http://localhost:8080',
  authProvider: async () => Auth.jwt('<ACCESS_TOKEN>'),
});

const stopMessages = await client.live(
  `SELECT id, room, role, content, created_at
   FROM chat.messages`,
  (rows) => {
    // `chat.messages` can be a USER table.
    // The same query runs for every signed-in customer, but KalamDB only
    // returns that caller's rows.
    renderRows(rows);
  },
  {
    subscriptionOptions: {
      last_rows: 200,
      from: SeqId.from('7262216745594062848'),
    },
  },
);

await client.query(
  'INSERT INTO chat.messages (room, role, content) VALUES ($1, $2, $3)',
  ['main', 'user', 'hello from the browser']
);
```
This is the type of pattern KalamDB was designed for: real-time apps where data storage and live updates happen in the same system.
That is the real reason I built KalamDB.
Not to add another database to the world, but to cut down the amount of infrastructure developers have to stitch together just to build something real-time.
Less glue code.
Less storage waste.
Less operational overhead.
And a simpler path from fast writes to cheaper long-term storage.
KalamDB is still in development, and I would describe file storage/BLOB support and high availability as part of the roadmap rather than production-ready features today. But the core idea is already clear: a database built to make real-time apps simpler and cheaper to run.
If you want to explore KalamDB:
- 🌐 Website: kalamdb.org
- ⭐ GitHub: KalamDB on GitHub
- 🚀 Quick Start: Run KalamDB locally
- 🧠 Docs: Learn the SQL syntax

Top comments (2)
This resonates a lot. We ran into the exact same "5 systems" problem building a real-time collaboration tool. You end up with Postgres for persistence, Redis for pub/sub, some kind of CRDT layer, a separate search index, and then a caching layer on top.
The SQL-first approach is interesting because most of these unified solutions sacrifice query flexibility. Being able to just write normal SQL but still get real-time updates sounds like the best of both worlds honestly.
Curious about the conflict resolution strategy when multiple clients write to the same row simultaneously. Is it last-write-wins or something more sophisticated?
Appreciate that — that’s exactly the pain point I kept running into too. You start with “just store some messages,” and suddenly you’re operating a database, cache, pub/sub system, search index, and coordination glue.
That’s really the motivation behind KalamDB: keep the SQL model developers already know, but make real-time subscriptions part of the database itself instead of something you bolt on later.
For conflicts, KalamDB follows a simple MVCC model where the latest change wins. One design choice that helps a lot is isolating each user's data: every user effectively has their own tables/tenant space. Because writes from different users don't hit the same rows, it removes a large class of conflicts in the first place.
Looking forward, especially for offline-first use cases, I’m interested in moving closer to CRDT-style conflict resolution. That could mean merging columns or applying diffs when concurrent edits happen. I haven’t finalized the design yet, but there are some good ideas from existing systems that could fit well here.